lib/libc/aarch64/string: add memcpy SIMD implementation
I noticed that we have a SIMD-optimized memcpy in the
arm-optimized-routines in /contrib.
This patch ensures we use the SIMD variant instead of the
scalar-optimized variant.
Benchmarks were generated with fuz's strperf utility.
See the Differential Revision for benchmark results.
Tested by: fuz (exprun)
Reviewed by: fuz, emaste
Sponsored by: Google LLC (GSoC 2024)
PR: 281175
Differential Revision: https://reviews.freebsd.org/D46251