Allocate arm64 per-CPU data in the correct domain
To minimise NUMA traffic allocate the pcpu, dpcpu, and boot stacks in
the correct domain when possible.
Submitted by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32338