This adds an implementation of the timingsafe_bcmp function for AArch64,
patterned after the implementation in D41673. Performance is quite nice
(measured on a Cortex-X1, Windows Dev Kit 2023; this benchmark also covers an
upcoming timingsafe_memcmp implementation):
                │ memcmp.A.pre.out │          memcmp.A.post.out            │
                │       B/s        │      B/s        vs base               │
MemcmpShort        1.564Gi ±  1%     1.573Gi ±  1%          ~ (p=0.355 n=20)
MemcmpMid          3.118Gi ±  0%     3.118Gi ±  1%          ~ (p=0.512 n=20)
MemcmpLong         12.80Gi ±  1%     12.82Gi ±  0%          ~ (p=0.157 n=20)
BcmpShort          1.541Gi ± 14%     1.322Gi ± 14%          ~ (p=0.134 n=20)
BcmpMid            2.908Gi ±  1%     2.899Gi ±  1%          ~ (p=0.602 n=20)
BcmpLong           12.93Gi ±  1%     12.99Gi ±  1%          ~ (p=0.327 n=20)
TsBcmpShort        516.2Mi ±  0%     970.7Mi ±  3%    +88.06% (p=0.000 n=20)
TsBcmpMid          1.456Gi ±  2%     3.340Gi ±  1%   +129.37% (p=0.000 n=20)
TsBcmpLong         4.523Gi ±  0%    13.200Gi ±  1%   +191.83% (p=0.000 n=20)
TsMemcmpShort      284.4Mi ±  1%     841.5Mi ±  1%   +195.94% (p=0.000 n=20)
TsMemcmpMid        336.8Mi ±  0%    3177.0Mi ±  2%   +843.22% (p=0.000 n=20)
TsMemcmpLong       351.7Mi ±  0%    7350.1Mi ±  1%  +1990.09% (p=0.000 n=20)
geomean            1.639Gi           3.401Gi         +107.46%
Please review to ensure that this function fulfills the required constant-time
properties. @andrew and @cpercival have agreed to do a joint review of the code
during EuroBSDcon 2024.
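As a reference point for reviewers, the property in question can be sketched in
portable C. This is only an illustration and not the AArch64 assembly under
review; the name ref_timingsafe_bcmp is hypothetical. The idea is that the loop
always visits all n bytes and accumulates differences without any branch or
early exit that depends on the data, so the running time should depend on n
alone. A C compiler is in principle free to transform such code, which is one
reason a hand-written assembly version is attractive.

```c
#include <stddef.h>

/*
 * Illustrative reference only (hypothetical name), not the code under review.
 * Returns 0 if the first n bytes of b1 and b2 are equal, nonzero otherwise.
 */
static int
ref_timingsafe_bcmp(const void *b1, const void *b2, size_t n)
{
	const unsigned char *p1 = b1, *p2 = b2;
	unsigned char acc = 0;
	size_t i;

	/* Always process every byte; never exit early on a mismatch. */
	for (i = 0; i < n; i++)
		acc |= p1[i] ^ p2[i];

	/* Collapse the accumulated differences into 0 or 1. */
	return (acc != 0);
}
```

An optimized implementation would be expected to keep the same shape while
processing wider chunks per iteration: fold the differences of each chunk into
an accumulator and inspect the accumulator only once at the end.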
We considered adding a wrapper that would set the DIT (data-independent
timing) bit before the code runs and restore it to its prior state afterwards.
After discussion with @imp and others, we decided to leave this setting to a
future portable function (i.e. the caller is responsible for enabling DIT mode
if desired).
Event: EuroBSDcon 2024