arm64: Use store-pair to zero the kernel bss
Most users won't notice it, but zeroing the bss takes a noticeable
amount of time when instruction tracing is enabled in the Arm FVP
models (simulators).

Reduce this time by using a store-pair instruction to double the amount
of memory zeroed on each iteration of the loop.
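As a rough illustration only, the effect can be sketched in C. This is
not the kernel code: the change concerns the arm64 store-pair (stp)
instruction, while the function names below are hypothetical and the
paired version assumes the region holds an even number of 64-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: zero one 64-bit word per iteration, similar in
 * spirit to a loop built around a single store.
 * Returns the number of loop iterations executed.
 */
static size_t
zero_single(uint64_t *p, uint64_t *end)
{
	size_t iters = 0;

	while (p < end) {
		*p++ = 0;
		iters++;
	}
	return (iters);
}

/*
 * Hypothetical helper: zero two 64-bit words per iteration, similar in
 * spirit to a loop built around a store-pair.
 * Assumption: the region contains an even number of words.
 * Returns the number of loop iterations executed.
 */
static size_t
zero_pair(uint64_t *p, uint64_t *end)
{
	size_t iters = 0;

	while (p < end) {
		p[0] = 0;
		p[1] = 0;
		p += 2;
		iters++;
	}
	return (iters);
}
```

For the same region, zero_pair() runs half as many iterations as
zero_single(), which is where the saving under instruction tracing
would be expected to come from.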
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42733
(cherry picked from commit f1bc3750cf9a6623b0c0861984ef2a8ac966a4e3)