As part of an ongoing FreeBSD Foundation project to enhance libc
string functions with SIMD on amd64, enhance timingsafe_bcmp(3).
As usual, two implementations are provided, selectable by ARCHLEVEL
(see simd(7)): one (scalar) without SIMD and one (baseline) with SSE/SSE2.
AVX or AVX-512 implementations may be provided in a future changeset.
The implementation is straightforward and similar to memcmp(3).
The code has been written to use only instructions that Intel
specifies as having data operand independent timing.
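
For illustration, the timing-safe comparison idea can be sketched in
portable C. This is only a sketch, not the amd64 assembly from this
change; the name ts_bcmp_sketch is hypothetical. It accumulates the
XOR of every byte pair and inspects the result only after the loop,
so the amount of work does not depend on where the buffers differ.
A C compiler gives no timing guarantee for such code.

```c
#include <stddef.h>

/*
 * Hypothetical sketch, not the committed implementation: returns 0 if
 * the first n bytes of a and b are equal and 1 otherwise.
 */
int
ts_bcmp_sketch(const void *a, const void *b, size_t n)
{
	const unsigned char *p = a, *q = b;
	unsigned char acc = 0;
	size_t i;

	/* Visit every byte; never leave the loop early on a mismatch. */
	for (i = 0; i < n; i++)
		acc |= p[i] ^ q[i];

	/* Look at the accumulated difference only once, after the loop. */
	return (acc != 0);
}
```

Unlike memcmp(3), the result only says whether the buffers differ,
not which one sorts first.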
Performance is good. Against the generic C implementation (the “pre”
benchmark set), the scalar implementation takes 61% less time per
operation and the baseline implementation 72% less (geomean):

os: FreeBSD
arch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
            │ memcmp.pre.out │          memcmp.scalar.out          │         memcmp.baseline.out         │
            │     sec/op     │   sec/op     vs base                │   sec/op     vs base                │
TsBcmpShort     101.65µ ± 1%   56.70µ ± 1%  -44.22% (p=0.000 n=20)   36.65µ ± 0%  -63.95% (p=0.000 n=20)
TsBcmpMid       29.106µ ± 0%   8.412µ ± 0%  -71.10% (p=0.000 n=20)   7.028µ ± 0%  -75.85% (p=0.000 n=20)
TsBcmpLong      13.974µ ± 0%   5.096µ ± 0%  -63.53% (p=0.000 n=20)   3.481µ ± 0%  -75.09% (p=0.000 n=20)
geomean          34.58µ        13.44µ       -61.12%                  9.643µ       -72.11%

            │ memcmp.pre.out │           memcmp.scalar.out            │          memcmp.baseline.out           │
            │      B/s       │     B/s       vs base                  │     B/s       vs base                  │
TsBcmpShort     1.145Gi ± 1%    2.053Gi ± 1%   +79.28% (p=0.000 n=20)    3.177Gi ± 0%  +177.36% (p=0.000 n=20)
TsBcmpMid       4.000Gi ± 0%   13.840Gi ± 0%  +246.02% (p=0.000 n=20)   16.565Gi ± 0%  +314.14% (p=0.000 n=20)
TsBcmpLong      8.331Gi ± 0%   22.845Gi ± 0%  +174.23% (p=0.000 n=20)   33.443Gi ± 0%  +301.44% (p=0.000 n=20)
geomean         3.367Gi         8.659Gi       +157.18%                   12.07Gi       +258.60%

Sponsored by: The FreeBSD Foundation