string: add strlcpy SIMD implementation
Closed, Public

Authored by getz on Aug 8 2024, 1:54 PM.

Details

Reviewers

fuz
emaste
andrew

Commits

rG756b7fc80837: lib/libc/aarch64/string: add strlcpy SIMD implementation

Summary

This changeset ports the amd64 SIMD implementation of strlcpy to
AArch64.

It is based on memccpy (D46170) with some minor differences.
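
To illustrate why memccpy is a natural starting point, here is a minimal C sketch of strlcpy expressed in terms of memccpy. It is not the committed AArch64 assembly and says nothing about how the SIMD code is structured; the name strlcpy_sketch is hypothetical, and the sketch only assumes the standard POSIX memccpy and the documented strlcpy contract (always NUL-terminate when the buffer size is nonzero, return the length of the source).

```c
#define _XOPEN_SOURCE 700	/* expose the POSIX memccpy declaration */
#include <stddef.h>
#include <string.h>

/*
 * Illustrative only: strlcpy semantics built on memccpy.
 * strlcpy_sketch is a hypothetical name, not part of this change.
 */
static size_t
strlcpy_sketch(char *dst, const char *src, size_t dstsize)
{
	char *end;

	/* With no room in dst, only the source length is reported. */
	if (dstsize == 0)
		return (strlen(src));

	/* Copy up to dstsize bytes, stopping after the terminating NUL. */
	end = memccpy(dst, src, '\0', dstsize);
	if (end != NULL) {
		/* end points one past the copied NUL; exclude the NUL. */
		return ((size_t)(end - dst) - 1);
	}

	/*
	 * No NUL within the first dstsize bytes: the copy was truncated.
	 * Terminate dst and finish measuring the rest of the source.
	 */
	dst[dstsize - 1] = '\0';
	return (dstsize + strlen(src + dstsize));
}
```

The visible differences from memccpy in this sketch are the return value (a length rather than a pointer) and the truncation path, which must still terminate the destination and keep scanning the source for its length.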

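Performance is significantly better than the scalar implementation:
the geometric mean time per operation drops by 73.06% on the Cortex-A76
and by 70.94% on the Neoverse-V1. The gain grows with string length,
from 3-17% on short strings to about 93% on long strings.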

As usual, the benchmark results were generated with the strperf utility
written by fuz.

os: FreeBSD
arch: arm64
cpu: ARM Cortex-A76 r4p1
        │ strlcpyScalar │             strlcpySIMD             │
        │    sec/op     │   sec/op     vs base                │
Short       202.7µ ± 1%   167.3µ ± 0%  -17.48% (p=0.000 n=20)
Mid        121.67µ ± 1%   39.75µ ± 1%  -67.33% (p=0.000 n=20)
Long      109.359µ ± 0%   7.928µ ± 3%  -92.75% (p=0.000 n=20)
geomean     139.2µ        37.50µ       -73.06%

        │ strlcpyScalar │               strlcpySIMD               │
        │      B/s      │      B/s       vs base                  │
Short      588.1Mi ± 1%    712.6Mi ± 0%    +21.18% (p=0.000 n=20)
Mid        979.7Mi ± 1%   2998.8Mi ± 1%   +206.08% (p=0.000 n=20)
Long       1.065Gi ± 0%   14.684Gi ± 3%  +1279.42% (p=0.000 n=20)
geomean    856.4Mi         3.105Gi        +271.24%

os: FreeBSD
arch: arm64
cpu: ARM Neoverse-V1 r1p1
        │ strlcpyScalar │             strlcpySIMD             │
        │    sec/op     │   sec/op     vs base                │
Short       143.4µ ± 1%   138.9µ ± 1%   -3.17% (p=0.000 n=20)
Mid         66.48µ ± 0%   24.06µ ± 1%  -63.81% (p=0.000 n=20)
Long       70.863µ ± 0%   4.961µ ± 0%  -93.00% (p=0.000 n=20)
geomean     87.75µ        25.50µ       -70.94%

        │ strlcpyScalar │               strlcpySIMD               │
        │      B/s      │      B/s       vs base                  │
Short      831.2Mi ± 1%    858.5Mi ± 1%     +3.28% (p=0.000 n=20)
Mid        1.751Gi ± 0%    4.839Gi ± 1%   +176.32% (p=0.000 n=20)
Long       1.643Gi ± 0%   23.466Gi ± 0%  +1328.41% (p=0.000 n=20)
geomean    1.327Gi         4.566Gi        +244.17%

Test Plan

Passes all the unit tests