arm64: bzero optimization
This optimization attempts to utylize as wide as possible register store instructions to zero large buffers.
The implementation, if possible, will use 'dc zva' to zero buffer by cache lines.
Speedup: 60x faster memory zeroing
Submitted by: Dominik Ermel <der@semihalf.com>
Obtained from: Semihalf
Sponsored by: Cavium
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D5726