
lib/libc/aarch64/string: add strlcpy SIMD implementation
Closed (Public)

Authored by getz on Aug 8 2024, 1:54 PM.

Details

Summary

This changeset ports the amd64 SIMD implementation of strlcpy
to AArch64.
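For reference, the contract that both the scalar and SIMD versions must satisfy: copy at most dstsize - 1 bytes, NUL-terminate the result whenever dstsize > 0, and return strlen(src) so the caller can detect truncation (return value >= dstsize). The following is a minimal C sketch of that contract only; `ref_strlcpy` is a hypothetical name and this is not FreeBSD's scalar code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reference strlcpy (not the libc implementation):
 * copies at most dstsize - 1 bytes, NUL-terminates when dstsize > 0,
 * and always returns the full length of src. */
static size_t ref_strlcpy(char *dst, const char *src, size_t dstsize)
{
    size_t srclen = strlen(src);

    if (dstsize != 0) {
        /* Copy as much as fits, leaving room for the terminator. */
        size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;   /* result >= dstsize means the copy was truncated */
}
```

For example, copying "truncated string" (16 bytes) into an 8-byte buffer stores "truncat" and returns 16.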

It is based on the memccpy implementation from D46170, with some minor differences.
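The relationship between the two functions is easiest to see at the C level: strlcpy is essentially "memccpy up to the NUL, bounded by dstsize", plus truncation handling and a return value of strlen(src). The sketch below expresses that with POSIX memccpy. It only illustrates the semantics; `strlcpy_memccpy` is a hypothetical name, and the assembly in this change is not necessarily structured this way.

```c
#define _XOPEN_SOURCE 700   /* expose memccpy on glibc; FreeBSD declares it by default */
#include <stddef.h>
#include <string.h>

/* Hypothetical strlcpy built on memccpy (illustration only). */
static size_t strlcpy_memccpy(char *dst, const char *src, size_t dstsize)
{
    if (dstsize == 0)
        return strlen(src);

    /* Copy up to dstsize bytes, stopping right after the first NUL. */
    char *end = memccpy(dst, src, '\0', dstsize);
    if (end != NULL)
        return (size_t)(end - dst) - 1;   /* end points just past the copied NUL */

    /* No NUL within dstsize bytes: truncate, then finish measuring src. */
    dst[dstsize - 1] = '\0';
    return dstsize + strlen(src + dstsize);
}
```

The only differences from plain memccpy are the forced terminator on truncation and the extra strlen over the uncopied tail of src.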

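Performance is significantly better than the scalar implementation: time per operation drops by about 64 to 67% for mid-length strings and about 93% for long strings, while short strings see a smaller gain (3 to 17%).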
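The gap between the Short and Long results is consistent with examining a whole vector of bytes per iteration instead of one byte at a time (a 128-bit NEON register holds 16 bytes). The NEON code itself is not reproduced here. As a portable stand-in, this C sketch uses the SWAR ("SIMD within a register") zero-byte trick on 64-bit words to locate the terminating NUL 8 bytes per step. All helper names are hypothetical, and the technique is a substitute for, not a copy of, the patch's vector code.

```c
#include <stddef.h>
#include <stdint.h>

/* Load 8 bytes as a little-endian word, independent of host byte order. */
static uint64_t load_le64(const unsigned char *p)
{
    uint64_t w = 0;
    for (int i = 7; i >= 0; i--)
        w = (w << 8) | p[i];
    return w;
}

/* Index (0..7) of the first zero byte in w, or -1 if there is none.
 * t is nonzero iff w contains a zero byte; its lowest set bit is bit 7
 * of the first zero byte (higher flags may be borrow false positives). */
static int first_zero_byte(uint64_t w)
{
    const uint64_t ones = 0x0101010101010101ULL;
    const uint64_t highs = 0x8080808080808080ULL;
    uint64_t t = (w - ones) & ~w & highs;

    if (t == 0)
        return -1;
    int bit = 0;
    while (!(t & 1)) {   /* count trailing zero bits */
        t >>= 1;
        bit++;
    }
    return bit / 8;
}

/* Offset of the first NUL in s[0..n), or n if none.
 * Assumes all n bytes of s are readable. */
static size_t swar_strnlen(const char *s, size_t n)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;

    for (; i + 8 <= n; i += 8) {          /* 8 bytes per step */
        int z = first_zero_byte(load_le64(p + i));
        if (z >= 0)
            return i + (size_t)z;
    }
    for (; i < n; i++)                    /* byte-wise tail */
        if (p[i] == '\0')
            return i;
    return n;
}
```

Unlike the assembly, this sketch never reads outside the n bytes it is given; real SIMD string functions instead rely on aligned loads that cannot cross a page boundary.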

As usual, the benchmark results were generated with the strperf utility written
by fuz.

os: FreeBSD
arch: arm64
cpu: ARM Cortex-A76 r4p1
        │ strlcpyScalar │             strlcpySIMD             │
        │    sec/op     │   sec/op     vs base                │
Short       202.7µ ± 1%   167.3µ ± 0%  -17.48% (p=0.000 n=20)
Mid        121.67µ ± 1%   39.75µ ± 1%  -67.33% (p=0.000 n=20)
Long      109.359µ ± 0%   7.928µ ± 3%  -92.75% (p=0.000 n=20)
geomean     139.2µ        37.50µ       -73.06%

        │ strlcpyScalar │               strlcpySIMD               │
        │      B/s      │      B/s       vs base                  │
Short      588.1Mi ± 1%    712.6Mi ± 0%    +21.18% (p=0.000 n=20)
Mid        979.7Mi ± 1%   2998.8Mi ± 1%   +206.08% (p=0.000 n=20)
Long       1.065Gi ± 0%   14.684Gi ± 3%  +1279.42% (p=0.000 n=20)
geomean    856.4Mi         3.105Gi        +271.24%

os: FreeBSD
arch: arm64
cpu: ARM Neoverse-V1 r1p1
        │ strlcpyScalar │             strlcpySIMD             │
        │    sec/op     │   sec/op     vs base                │
Short       143.4µ ± 1%   138.9µ ± 1%   -3.17% (p=0.000 n=20)
Mid         66.48µ ± 0%   24.06µ ± 1%  -63.81% (p=0.000 n=20)
Long       70.863µ ± 0%   4.961µ ± 0%  -93.00% (p=0.000 n=20)
geomean     87.75µ        25.50µ       -70.94%

        │ strlcpyScalar │               strlcpySIMD               │
        │      B/s      │      B/s       vs base                  │
Short      831.2Mi ± 1%    858.5Mi ± 1%     +3.28% (p=0.000 n=20)
Mid        1.751Gi ± 0%    4.839Gi ± 1%   +176.32% (p=0.000 n=20)
Long       1.643Gi ± 0%   23.466Gi ± 0%  +1328.41% (p=0.000 n=20)
geomean    1.327Gi         4.566Gi        +244.17%
Test Plan

Passes all unit tests.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

getz requested review of this revision (Aug 8 2024, 1:54 PM).
  • use an unsigned comparison for the limit (b.mi -> b.lo)
  • label the function using the __$FUNC convention
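The first change replaces a sign test with an unsigned comparison. On AArch64, b.mi is taken when the N flag (bit 63 of the subtraction result) is set, which treats the size_t limit as a signed number, whereas b.lo after cmp/subs is taken when the first operand is unsigned-lower than the second (carry clear). The C sketch below models both branch conditions. The function names and the idea of comparing the limit against a fixed chunk size are assumptions for illustration; the exact instruction sequence in the patch may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of "subs tmp, limit, chunk; b.mi": taken when the
 * N flag, i.e. bit 63 of limit - chunk, is set (a signed interpretation). */
static bool below_chunk_mi(uint64_t limit, uint64_t chunk)
{
    return ((limit - chunk) >> 63) != 0;
}

/* Hypothetical model of "cmp limit, chunk; b.lo": taken when
 * limit < chunk as unsigned values (carry flag clear). */
static bool below_chunk_lo(uint64_t limit, uint64_t chunk)
{
    return limit < chunk;
}
```

Both agree for small limits, but for a size_t limit of 2^63 or more (for example SIZE_MAX) the sign test wrongly reports that less than one chunk remains, while the unsigned test does not.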

exp-run says it's fine.

This revision is now accepted and ready to land (Nov 6 2024, 2:24 PM).