lib/libc/aarch64/string: add strlcpy SIMD implementation
Port the amd64 SIMD implementation of strlcpy to AArch64.
It is based on memccpy (D46170) with some minor differences.
Performance is significantly better than that of the scalar
implementation.
Benchmark results were generated, as usual, with the strperf utility
written by fuz; see the Differential Revision for the numbers.
Tested by: fuz (exprun)
Reviewed by: fuz, emaste
Sponsored by: Google LLC (GSoC 2024)
PR: 281175
Differential Revision: https://reviews.freebsd.org/D46243