callout: Move per-CPU callout state into the dpcpu region

This eliminates some static bloat in amd64 kernels, since the state is
no longer sized by MAXCPU at compile time, and reduces the penalty of
increasing MAXCPU.  The structures now also have NUMA affinity with
their CPUs.  No functional change intended.

PR: 269572
Reviewed by: mjg, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39807