lib/libc/aarch64/string: add ASIMD-enhanced timingsafe_bcmp implementation
Accepted · Public

Authored by fuz on Sep 23 2024, 9:54 AM.
Tags
None

Details

Reviewers
andrew
cperciva
getz
Group Reviewers
security
Summary

This adds an implementation of the timingsafe_bcmp function for AArch64,
patterned after the implementation in D41673. Performance is quite nice
(measured on a Cortex-X1 core in a Windows Dev Kit 2023; the benchmark also
includes an upcoming timingsafe_memcmp implementation):

              │ memcmp.A.pre.out │            memcmp.A.post.out             │
              │       B/s        │      B/s        vs base                  │
MemcmpShort        1.564Gi ±  1%    1.573Gi ±  1%          ~ (p=0.355 n=20)
MemcmpMid          3.118Gi ±  0%    3.118Gi ±  1%          ~ (p=0.512 n=20)
MemcmpLong         12.80Gi ±  1%    12.82Gi ±  0%          ~ (p=0.157 n=20)
BcmpShort          1.541Gi ± 14%    1.322Gi ± 14%          ~ (p=0.134 n=20)
BcmpMid            2.908Gi ±  1%    2.899Gi ±  1%          ~ (p=0.602 n=20)
BcmpLong           12.93Gi ±  1%    12.99Gi ±  1%          ~ (p=0.327 n=20)
TsBcmpShort        516.2Mi ±  0%    970.7Mi ±  3%    +88.06% (p=0.000 n=20)
TsBcmpMid          1.456Gi ±  2%    3.340Gi ±  1%   +129.37% (p=0.000 n=20)
TsBcmpLong         4.523Gi ±  0%   13.200Gi ±  1%   +191.83% (p=0.000 n=20)
TsMemcmpShort      284.4Mi ±  1%    841.5Mi ±  1%   +195.94% (p=0.000 n=20)
TsMemcmpMid        336.8Mi ±  0%   3177.0Mi ±  2%   +843.22% (p=0.000 n=20)
TsMemcmpLong       351.7Mi ±  0%   7350.1Mi ±  1%  +1990.09% (p=0.000 n=20)
geomean            1.639Gi          3.401Gi         +107.46%

Please review to ensure that this function fulfills the required constant-time
properties. @andrew and @cperciva have agreed to do a joint review of the code
during EuroBSDcon 2024.

We have considered adding a wrapper that would set the DIT (data-independent
timing) bit before the comparison and restore it to its prior state afterwards,
but after discussion with @imp and others we have decided to leave this setting
to a future portable function (i.e. the caller is responsible for enabling DIT
mode if desired).
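
For illustration only, a hypothetical sketch of what such a caller-side wrapper
could look like on hardware that implements FEAT_DIT (ARMv8.4-A); the wrapper
name is made up and the feature check is omitted, so this is a sketch rather
than anything proposed for the tree:

        .arch   armv8.4-a
        .globl  timingsafe_bcmp_dit     // hypothetical name, not part of libc
timingsafe_bcmp_dit:
        mrs     x9, dit                 // save the current PSTATE.DIT value
        msr     dit, #1                 // request data-independent timing
        stp     x9, x30, [sp, #-16]!    // keep saved DIT state and return address
        bl      timingsafe_bcmp
        ldp     x9, x30, [sp], #16
        msr     dit, x9                 // restore the previous DIT state
        ret                             // result from timingsafe_bcmp is in x0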

Event: EuroBSDcon 2024

Test Plan

Passes our test suite; the test suite does not exercise the constant-time
properties.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 59564
Build 56451: arc lint + arc unit

Event Timeline

fuz requested review of this revision. Sep 23 2024, 9:54 AM
fuz created this revision.

Looks good, simple and nice. :-)

This revision is now accepted and ready to land. Mon, Oct 21, 6:10 PM

A review with hat security, i.e. by @andrew and @cperciva, is still needed to proceed here.

lib/libc/aarch64/string/timingsafe_bcmp.S
71

Question because I don't know arm64 assembly: ldr q0, ... loads 16 bytes into v0?

89

s/64/32/ ?

fuz marked 2 inline comments as done. Sat, Oct 26, 12:30 AM

Reply to in-line comments.

lib/libc/aarch64/string/timingsafe_bcmp.S
71

Correct; q0 and v0 name the same register, viewed either as a 128-bit quadword
scalar or as a vector.
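
For illustration (not lines from the diff):

        ldr     q0, [x0]                        // loads 16 bytes into q0
        ldr     q1, [x1]
        eor     v0.16b, v0.16b, v1.16b          // the same data is then used as
                                                // the 16-lane byte vector v0.16b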

89

At this point, we know that x2 > 32. We subtract 64 to account for the 32 bytes
of the current iteration as well as the 32 bytes of the next iteration. If the
result becomes negative (i.e. x2 < 64), there aren't 32 bytes left to complete
the next iteration and we have to bail into the tail-handling code. At the
beginning of each loop iteration, x2 is thus always 32 lower than the number of
remaining bytes.
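
A simplified sketch of that bookkeeping (the register assignments, labels, and
tail handling below are illustrative and do not reproduce the code in this
diff):

        // x2 > 32 on entry; the first 32 bytes are compared unconditionally.
        subs    x2, x2, #64             // 32 for this chunk plus 32 for the next
        ldp     q0, q1, [x0], #32
        ldp     q2, q3, [x1], #32
        eor     v0.16b, v0.16b, v2.16b  // nonzero lanes mark differing bytes
        eor     v1.16b, v1.16b, v3.16b
        orr     v4.16b, v0.16b, v1.16b  // running "any difference" accumulator
        b.mi    .Ltail                  // x2 went negative: <32 bytes are left
.Lloop:
        ldp     q0, q1, [x0], #32
        ldp     q2, q3, [x1], #32
        eor     v0.16b, v0.16b, v2.16b
        eor     v1.16b, v1.16b, v3.16b
        orr     v4.16b, v4.16b, v0.16b
        orr     v4.16b, v4.16b, v1.16b
        subs    x2, x2, #32             // keeps x2 = remaining bytes - 32
        b.pl    .Lloop
.Ltail:
        add     x0, x0, x2              // x2 is negative: step the pointers back
        add     x1, x1, x2              // so a final 32-byte pass ends exactly at
                                        // the end of each buffer (re-checking
                                        // already-compared bytes is harmless)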

lib/libc/aarch64/string/timingsafe_bcmp.S
89

Ah, I didn't realize that x2 was meant to go negative and that the final
addition brings x0/x1 back to pointing into the buffer.

92

Does ldp advance the pointer? If not, how are we moving through the buffer as opposed to checking the same initial 32 bytes repeatedly?

fuz marked 4 inline comments as done. Wed, Oct 30, 10:47 PM
fuz added inline comments.
lib/libc/aarch64/string/timingsafe_bcmp.S
92

Yes, [reg], #imm is post-indexed addressing. The effective address is [reg] and after memory access, reg is incremented by #imm. ARM64 also supports pre-increment (writeback) using a syntax like [reg, #imm]!, but that isn't used here.
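
For comparison, the three immediate forms side by side (illustrative, not lines
from the diff):

        ldp     q0, q1, [x0, #32]       // offset: load from x0+32, x0 unchanged
        ldp     q0, q1, [x0], #32       // post-indexed: load from x0, then x0 += 32
        ldp     q0, q1, [x0, #32]!      // pre-indexed: x0 += 32 first, then load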

Took me a while to understand how arm64 assembly language works, but I'm now convinced that this is correct.