lib/libc/amd64/string: add strchrnul implementations (scalar, baseline)

Description

Both are considerably faster than the generic (pre) implementation. For
long strings we still do not beat glibc, likely because glibc switches
to AVX once the input is sufficiently long. x86-64-v3 and x86-64-v4
implementations may be added at a future time.
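
For readers unfamiliar with the technique, the following is a minimal C
sketch of how a baseline (SSE2) strchrnul can process 16 bytes per
iteration. It is not the code added by this commit, which is written in
assembly. The names strchrnul_ref, match_mask, strchrnul_sse2_sketch and
check_agree are hypothetical, and __builtin_ctz assumes GCC or Clang.

```c
#include <assert.h>
#include <emmintrin.h>	/* SSE2 intrinsics, available on every amd64 CPU */
#include <stdint.h>

/* Byte-at-a-time reference: first occurrence of c in s, else the NUL. */
static char *
strchrnul_ref(const char *s, int c)
{
	while (*s != '\0' && *s != (char)c)
		s++;
	return ((char *)s);
}

/* Bit i of the result is set if byte i of chunk is the needle or NUL. */
static unsigned int
match_mask(__m128i chunk, __m128i needle)
{
	__m128i hit = _mm_or_si128(_mm_cmpeq_epi8(chunk, needle),
	    _mm_cmpeq_epi8(chunk, _mm_setzero_si128()));

	return ((unsigned int)_mm_movemask_epi8(hit));
}

/*
 * Illustrative sketch only (assumption: not the committed algorithm).
 * Loads are 16-byte aligned, so a load never crosses a page boundary.
 * Reading the bytes before s and after the NUL is outside what ISO C
 * allows; libc internals commonly rely on it being safe in practice.
 */
static char *
strchrnul_sse2_sketch(const char *s, int c)
{
	const __m128i needle = _mm_set1_epi8((char)c);
	const char *p = (const char *)((uintptr_t)s & ~(uintptr_t)15);
	unsigned int skip = (unsigned int)((uintptr_t)s & 15);
	unsigned int mask;

	/* First chunk: drop matches that lie before the start of s. */
	mask = match_mask(_mm_load_si128((const __m128i *)p), needle);
	mask &= 0xffffu << skip;

	while (mask == 0) {
		p += 16;
		mask = match_mask(_mm_load_si128((const __m128i *)p), needle);
	}

	/* The lowest set bit is the first matching byte in this chunk. */
	return ((char *)p + __builtin_ctz(mask));
}

/* Hypothetical self-check helper: the sketch must agree with the reference. */
static void
check_agree(const char *s, int c)
{
	assert(strchrnul_sse2_sketch(s, c) == strchrnul_ref(s, c));
}
```

The sketch handles the unaligned head by masking rather than by a
separate scalar loop, which keeps the main loop free of length checks.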

os: FreeBSD
arch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz

        │ strchrnul_pre.out │         strchrnul_scalar.out         │       strchrnul_baseline.out        │
        │      sec/op       │    sec/op     vs base                │   sec/op     vs base                │
Short           129.68µ ± 3%    59.91µ ± 1%  -53.80% (p=0.000 n=20)    44.37µ ± 1%  -65.79% (p=0.000 n=20)
Mid              21.15µ ± 0%    19.30µ ± 0%   -8.76% (p=0.000 n=20)    12.30µ ± 0%  -41.85% (p=0.000 n=20)
Long            13.772µ ± 0%   11.028µ ± 0%  -19.92% (p=0.000 n=20)    3.285µ ± 0%  -76.15% (p=0.000 n=20)
geomean          33.55µ         23.36µ       -30.37%                   12.15µ       -63.80%

        │ strchrnul_pre.out │          strchrnul_scalar.out          │         strchrnul_baseline.out         │
        │        B/s        │      B/s       vs base                 │      B/s       vs base                 │
Short           919.3Mi ± 3%   1989.7Mi ± 1%  +116.45% (p=0.000 n=20)   2686.8Mi ± 1%  +192.28% (p=0.000 n=20)
Mid             5.505Gi ± 0%    6.033Gi ± 0%    +9.60% (p=0.000 n=20)    9.466Gi ± 0%   +71.97% (p=0.000 n=20)
Long            8.453Gi ± 0%   10.557Gi ± 0%   +24.88% (p=0.000 n=20)   35.441Gi ± 0%  +319.26% (p=0.000 n=20)
geomean         3.470Gi         4.983Gi        +43.62%                   9.584Gi       +176.22%

For comparison, glibc on the same machine:

        │ strchrnul_glibc.out │
        │       sec/op        │
Short            49.73µ ± 0%
Mid              14.60µ ± 0%
Long             1.237µ ± 0%
geomean          9.646µ

        │ strchrnul_glibc.out │
        │         B/s         │
Short           2.341Gi ± 0%
Mid             7.976Gi ± 0%
Long            94.14Gi ± 0%
geomean         12.07Gi

Sponsored by: The FreeBSD Foundation
Approved by: mjg
Differential Revision: https://reviews.freebsd.org/D41333

Details

Provenance
fuz authored on Jun 30 2023, 2:45 PM
Differential Revision
D41333: SIMD-enhanced strchrnul(3)
Parents
rGf1d955be2a73: hidraw(4): Implement HIDRAW_GET_DEVICEINFO ioctl