This changeset adds a baseline implementation of memcmp and bcmp
for amd64. The same code is used for both functions, with conditional
code where the behaviour differs (memcmp needs a more precise result
than bcmp).
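The shape of such a shared routine can be sketched in portable C. This is only an illustration assuming a simple byte loop: the actual change is amd64 assembly, and the names `compare_core`, `sketch_memcmp` and `sketch_bcmp` are hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical shared core: both entry points scan for the first
 * mismatching byte; only memcmp has to report the byte difference.
 */
static int
compare_core(const void *b1, const void *b2, size_t len, int want_difference)
{
	const unsigned char *p = b1, *q = b2;

	for (size_t i = 0; i < len; i++) {
		if (p[i] != q[i]) {
			if (want_difference)
				return (p[i] - q[i]);	/* memcmp: byte difference */
			return (1);			/* bcmp: any non-zero value */
		}
	}
	return (0);
}

int
sketch_memcmp(const void *b1, const void *b2, size_t len)
{
	return (compare_core(b1, b2, len, 1));
}

int
sketch_bcmp(const void *b1, const void *b2, size_t len)
{
	return (compare_core(b1, b2, len, 0));
}
```

bcmp only has to say "equal or not", so its mismatch path can return as soon as any difference is seen, without working out which byte differed.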

FreeBSD documents that memcmp returns the difference between the
first pair of mismatching characters. Slightly faster code would be
possible if we could relax this to the ISO/IEC 9899:1999 requirement
of merely returning a negative integer, zero, or a positive integer.
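One way the relaxed contract could be exploited is shown in the hypothetical word-at-a-time comparison below, written in C with both return conventions. It is not the code in this change. It assumes a little-endian host such as amd64 and the GCC/Clang builtins `__builtin_ctzll` and `__builtin_bswap64`; the function names are made up. The FreeBSD variant must locate the first mismatching byte and reload it to form the difference, while the ISO C variant only byte-swaps both words and compares them as unsigned integers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
#error "this sketch assumes a little-endian host such as amd64"
#endif

/* Load 8 bytes from an arbitrarily aligned address. */
static uint64_t
load64(const unsigned char *p)
{
	uint64_t w;

	memcpy(&w, p, sizeof(w));
	return (w);
}

/* FreeBSD contract: return the difference of the first mismatching bytes. */
int
word_memcmp_diff(const void *b1, const void *b2, size_t len)
{
	const unsigned char *p = b1, *q = b2;
	size_t i = 0;

	for (; i + 8 <= len; i += 8) {
		uint64_t x = load64(p + i) ^ load64(q + i);

		if (x != 0) {
			/* Little endian: lowest set bit is in the first differing byte. */
			size_t k = (size_t)__builtin_ctzll(x) / 8;

			return (p[i + k] - q[i + k]);
		}
	}
	for (; i < len; i++)
		if (p[i] != q[i])
			return (p[i] - q[i]);
	return (0);
}

/* ISO C contract: only the sign of the result matters. */
int
word_memcmp_sign(const void *b1, const void *b2, size_t len)
{
	const unsigned char *p = b1, *q = b2;
	size_t i = 0;

	for (; i + 8 <= len; i += 8) {
		/*
		 * Byte-swapping moves the lowest-addressed byte to the most
		 * significant position, so an unsigned compare orders the
		 * words the same way as the byte strings.
		 */
		uint64_t a = __builtin_bswap64(load64(p + i));
		uint64_t b = __builtin_bswap64(load64(q + i));

		if (a != b)
			return (a < b ? -1 : 1);
	}
	for (; i < len; i++)
		if (p[i] != q[i])
			return (p[i] - q[i]);
	return (0);
}
```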

__FBSDID is dropped in anticipation of @imp's announced change.
The new implementations are documented in simd(7).

This change also extends the memcmp test suite entry to accept an
externally defined memcmp function, simplifying the development of
additional test cases.
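A minimal sketch of the idea, assuming the harness reaches the function under test through a pointer so that an out-of-tree implementation can be plugged in. The names `memcmp_fn`, `ref_memcmp` and `check_memcmp` are hypothetical and do not reflect the actual layout of the FreeBSD test suite.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: the function under test is reached through this type. */
typedef int (*memcmp_fn)(const void *, const void *, size_t);

/* Hypothetical reference implementation with FreeBSD's documented semantics. */
int
ref_memcmp(const void *b1, const void *b2, size_t len)
{
	const unsigned char *p = b1, *q = b2;

	for (size_t i = 0; i < len; i++)
		if (p[i] != q[i])
			return (p[i] - q[i]);
	return (0);
}

/*
 * Plant a single mismatch at every position of a len-byte buffer and
 * check that fn returns the byte difference in both argument orders.
 * 'a' - 'z' is -25, so a sign-only implementation returning -1/1 fails.
 * Returns 0 on success, 1 on the first failure, -1 if len is too large.
 */
int
check_memcmp(memcmp_fn fn, size_t len)
{
	unsigned char a[64], b[64];

	if (len > sizeof(a))
		return (-1);
	memset(a, 'a', sizeof(a));
	for (size_t pos = 0; pos < len; pos++) {
		memset(b, 'a', sizeof(b));
		b[pos] = 'z';
		if (fn(a, a, len) != 0 ||
		    fn(a, b, len) != 'a' - 'z' ||
		    fn(b, a, len) != 'z' - 'a')
			return (1);
	}
	return (0);
}
```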

Performance is better than bionic and glibc, except for long strings,
where the two are up to 13% faster. This could be because they use
the SSE4.1 ptest instruction, which we cannot use in a baseline kernel.

            │ memcmp.baseline.out │          memcmp.bionic.out           │          memcmp.scalar.out           │
            │       sec/op        │    sec/op     vs base                │    sec/op     vs base                │
    Short             26.41µ ± 0%   65.81µ ± 0%  +149.15% (p=0.000 n=30+20)   61.40µ ± 0%  +132.46% (p=0.000 n=30+20)
    Mid               8.175µ ± 0%  21.077µ ± 1%  +157.82% (p=0.000 n=30+20)  13.580µ ± 1%   +66.12% (p=0.000 n=30+20)
    Long              3.469µ ± 0%   3.055µ ± 6%   -11.92% (p=0.000 n=30+20)   4.807µ ± 0%   +38.58% (p=0.000 n=30+20)
    geomean           9.082µ        16.18µ        +78.19%                     15.89µ        +74.91%

            │ memcmp.baseline.out │          memcmp.bionic.out          │          memcmp.scalar.out          │
            │         B/s         │      B/s       vs base              │      B/s       vs base              │
    Short            4.407Gi ± 0%   1.769Gi ± 0%  -59.86% (p=0.000 n=30+20)   1.896Gi ± 0%  -56.98% (p=0.000 n=30+20)
    Mid             14.240Gi ± 0%   5.523Gi ± 1%  -61.21% (p=0.000 n=30+20)   8.572Gi ± 1%  -39.80% (p=0.000 n=30+20)
    Long             33.56Gi ± 0%   38.10Gi ± 6%  +13.53% (p=0.000 n=30+20)   24.22Gi ± 0%  -27.84% (p=0.000 n=30+20)
    geomean          12.82Gi        7.194Gi       -43.88%                     7.328Gi       -42.83%

    os: Linux
    arch: x86_64
    cpu:
            │ memcmp.glibc.out │
            │      sec/op      │
    Short         32.29µ ± 1%
    Mid           10.25µ ± 0%
    Long          3.111µ ± 0%
    geomean       10.10µ

            │ memcmp.glibc.out │
            │       B/s        │
    Short        3.605Gi ± 1%
    Mid          11.36Gi ± 0%
    Long         37.42Gi ± 0%
    geomean      11.53Gi
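For comparison with the ptest remark above, the SSE2-only approach available at the baseline level can be sketched with compiler intrinsics. `_mm_cmpeq_epi8` (pcmpeqb) marks equal byte lanes and `_mm_movemask_epi8` (pmovmskb) condenses them into a 16-bit mask that is then tested with an ordinary integer compare. This is an illustrative C sketch, not the assembly in this change. `sse2_memcmp` is a hypothetical name, and the sketch assumes an x86 compiler providing `<emmintrin.h>` and `__builtin_ctz`. With SSE4.1, the equality test could instead be a ptest of the XOR of the two blocks.

```c
#include <emmintrin.h>	/* SSE2 intrinsics, part of the amd64 baseline */
#include <stddef.h>

/* Compare 16 bytes per iteration using only SSE2, then finish bytewise. */
int
sse2_memcmp(const void *b1, const void *b2, size_t len)
{
	const unsigned char *p = b1, *q = b2;
	size_t i = 0;

	for (; i + 16 <= len; i += 16) {
		__m128i x = _mm_loadu_si128((const __m128i *)(p + i));
		__m128i y = _mm_loadu_si128((const __m128i *)(q + i));
		/* Bit n of mask is set iff byte n of both blocks is equal. */
		unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(x, y));

		if (mask != 0xffff) {
			/* Lowest clear bit of mask = first mismatching byte. */
			size_t k = (size_t)__builtin_ctz(~mask);

			return (p[i + k] - q[i + k]);
		}
	}
	for (; i < len; i++)
		if (p[i] != q[i])
			return (p[i] - q[i]);
	return (0);
}
```

Because `mask` only has its low 16 bits populated, `~mask` always has a set bit within those 16 bits whenever `mask != 0xffff`, so the `__builtin_ctz` argument is never zero.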