
lib/libc/amd64/string: add timingsafe_bcmp(3) scalar, baseline implementations
ClosedPublic

Authored by fuz on Aug 31 2023, 3:44 PM.
Tags
None
Referenced Files
F102751375: D41673.diff
Sat, Nov 16, 4:42 PM

Details

Summary

As part of an ongoing FreeBSD Foundation project to enhance libc
string functions with SIMD on amd64, this change enhances timingsafe_bcmp(3).
As usual, two implementations, selectable by ARCHLEVEL (see simd(7)),
are provided: one (scalar) without SIMD, and one (baseline) with SSE/SSE2.
AVX or AVX-512 implementations may be provided in a future changeset.
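
For readers unfamiliar with the approach, the idea behind an SSE2 variant can be sketched in C with intrinsics. This is only an illustration under assumptions: the revision itself is hand-written assembly, the function name ts_bcmp_sse2_sketch is hypothetical, and a compiler is free to choose instructions other than the ones the intrinsics suggest, so this sketch makes no timing guarantee of its own.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Hypothetical sketch, not the code from this revision: OR together the
 * XOR of every 16-byte block so that no branch depends on the data.
 */
int
ts_bcmp_sse2_sketch(const void *b1, const void *b2, size_t n)
{
	const unsigned char *p1 = b1, *p2 = b2;
	unsigned int tail = 0;
	size_t i = 0;
	int diff = 0;

#if defined(__SSE2__)
	__m128i acc = _mm_setzero_si128();

	for (; n - i >= 16; i += 16) {
		__m128i x = _mm_loadu_si128((const __m128i *)(p1 + i));
		__m128i y = _mm_loadu_si128((const __m128i *)(p2 + i));
		acc = _mm_or_si128(acc, _mm_xor_si128(x, y));
	}
	/* All 16 lanes of acc compare equal to zero iff no block differed. */
	diff = _mm_movemask_epi8(_mm_cmpeq_epi8(acc, _mm_setzero_si128())) != 0xffff;
#endif
	/* Remaining bytes (all of them without SSE2) go through a scalar loop. */
	for (; i < n; i++)
		tail |= (unsigned int)(p1[i] ^ p2[i]);

	return (diff | (tail != 0));
}
```

The function returns 0 when the buffers are equal and 1 otherwise, and the number of loop iterations depends only on n, not on where the first difference occurs.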

The implementation is straightforward and similar to memcmp(3). The code
has been written to use only instructions that Intel specifies as
having data operand independent timing.
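
The property being preserved can be illustrated with a minimal C sketch of a constant-time comparison. The name ts_bcmp_scalar_sketch is hypothetical and this is not the assembly from the revision: it accumulates the XOR of every byte pair and only inspects the result once at the end, so the loop never exits early on a mismatch. A C compiler may still transform such code, which is one reason to write the real thing in assembly.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: compare n bytes without any data-dependent
 * branch. Returns 0 if the buffers are equal, 1 otherwise.
 */
int
ts_bcmp_scalar_sketch(const void *b1, const void *b2, size_t n)
{
	const unsigned char *p1 = b1, *p2 = b2;
	unsigned int acc = 0;

	/* Visit every byte; differences are collected, never acted on. */
	for (size_t i = 0; i < n; i++)
		acc |= (unsigned int)(p1[i] ^ p2[i]);

	return (acc != 0);
}
```

Unlike memcmp(3), the result carries no ordering information, only equal or not equal.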

Performance improves noticeably: by geometric mean, time per operation
drops by about 61% (scalar) and 72% (baseline).
The “pre” benchmark set refers to the generic C implementation.

os: FreeBSD
arch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
            │ memcmp.pre.out │          memcmp.scalar.out          │         memcmp.baseline.out         │
            │     sec/op     │   sec/op     vs base                │   sec/op     vs base                │
TsBcmpShort     101.65µ ± 1%   56.70µ ± 1%  -44.22% (p=0.000 n=20)   36.65µ ± 0%  -63.95% (p=0.000 n=20)
TsBcmpMid       29.106µ ± 0%   8.412µ ± 0%  -71.10% (p=0.000 n=20)   7.028µ ± 0%  -75.85% (p=0.000 n=20)
TsBcmpLong      13.974µ ± 0%   5.096µ ± 0%  -63.53% (p=0.000 n=20)   3.481µ ± 0%  -75.09% (p=0.000 n=20)
geomean          34.58µ        13.44µ       -61.12%                  9.643µ       -72.11%

            │ memcmp.pre.out │           memcmp.scalar.out            │          memcmp.baseline.out           │
            │      B/s       │      B/s       vs base                 │      B/s       vs base                 │
TsBcmpShort     1.145Gi ± 1%    2.053Gi ± 1%   +79.28% (p=0.000 n=20)    3.177Gi ± 0%  +177.36% (p=0.000 n=20)
TsBcmpMid       4.000Gi ± 0%   13.840Gi ± 0%  +246.02% (p=0.000 n=20)   16.565Gi ± 0%  +314.14% (p=0.000 n=20)
TsBcmpLong      8.331Gi ± 0%   22.845Gi ± 0%  +174.23% (p=0.000 n=20)   33.443Gi ± 0%  +301.44% (p=0.000 n=20)
geomean         3.367Gi         8.659Gi       +157.18%                   12.07Gi       +258.60%

Sponsored by: The FreeBSD Foundation

Test Plan

Passes the extended memcmp() tests of D41528. Constant-time
properties are to be verified by manual code review.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 53396
Build 50287: arc lint + arc unit

Event Timeline

fuz requested review of this revision. Aug 31 2023, 3:44 PM
markj added inline comments.
lib/libc/amd64/string/timingsafe_bcmp.S, line 34

Is there a reason not to pad with int3 instead?

fuz marked an inline comment as done. Aug 31 2023, 10:06 PM
fuz added inline comments.
lib/libc/amd64/string/timingsafe_bcmp.S, line 34

The padding must be executable as it is traversed to get to the loop entrance.

fuz marked an inline comment as done. Aug 31 2023, 10:35 PM
  • lib/libc/amd64/string/timingsafe_bcmp.S: fix off-by-one error
  • lib/libc/amd64/string/timingsafe_bcmp.S: fix jump to wrong label

These updates fix two bugs I found in the code that the test suite
unfortunately did not catch. I hope it's all correct now.

This revision is now accepted and ready to land. Oct 11 2023, 7:48 PM