
lib/libc/aarch64/string: add strlcpy SIMD implementation
Accepted, Public

Authored by getz on Aug 8 2024, 1:54 PM.
Tags
None

Details

Reviewers
fuz
emaste
andrew
Summary

This changeset ports the amd64 SIMD implementation of strlcpy to
AArch64.

It is based on memccpy (D46170) with some minor differences.
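
To illustrate why memccpy is a natural starting point, here is a minimal
C sketch of strlcpy semantics expressed through memccpy. It is not the
AArch64 assembly from this revision, and the name sketch_strlcpy is
hypothetical. It only shows the shared core (copy until a NUL or until
the buffer is full) and where strlcpy differs: the terminator is forced
on truncation and the return value is always the length of src.

```c
#define _XOPEN_SOURCE 700	/* for memccpy() under strict ISO C modes */
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper for illustration only, not the code under review.
 * Copies src into dst (at most dstsize - 1 bytes plus a NUL) and returns
 * strlen(src), like strlcpy(3).
 */
static size_t
sketch_strlcpy(char *restrict dst, const char *restrict src, size_t dstsize)
{
	char *end;

	if (dstsize != 0) {
		/* Copy up to and including the NUL, at most dstsize bytes. */
		end = memccpy(dst, src, '\0', dstsize);
		if (end != NULL) {
			/* end points just past the copied NUL. */
			return ((size_t)(end - dst) - 1);
		}

		/* No NUL within dstsize bytes: truncate and terminate. */
		dst[dstsize - 1] = '\0';
	}

	/*
	 * The first dstsize bytes of src hold no NUL (trivially true when
	 * dstsize is 0), so the rest of the length is counted from there.
	 */
	return (dstsize + strlen(src + dstsize));
}
```

A caller detects truncation the usual way, by checking whether the
return value is greater than or equal to the buffer size.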

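Performance is significantly better than the scalar implementation: the
geometric mean of time per operation drops by about 73% on the
Cortex-A76 and about 71% on the Neoverse-V1. The gain grows with string
length, from 3-17% on short strings to about 93% on long ones.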

As usual, the benchmark results were generated with the strperf utility
written by fuz.

os: FreeBSD
arch: arm64
cpu: ARM Cortex-A76 r4p1
        │ strlcpyScalar │             strlcpySIMD             │
        │    sec/op     │   sec/op     vs base                │
Short       202.7µ ± 1%   167.3µ ± 0%  -17.48% (p=0.000 n=20)
Mid        121.67µ ± 1%   39.75µ ± 1%  -67.33% (p=0.000 n=20)
Long      109.359µ ± 0%   7.928µ ± 3%  -92.75% (p=0.000 n=20)
geomean     139.2µ        37.50µ       -73.06%

        │ strlcpyScalar │               strlcpySIMD               │
        │      B/s      │      B/s       vs base                  │
Short      588.1Mi ± 1%    712.6Mi ± 0%    +21.18% (p=0.000 n=20)
Mid        979.7Mi ± 1%   2998.8Mi ± 1%   +206.08% (p=0.000 n=20)
Long       1.065Gi ± 0%   14.684Gi ± 3%  +1279.42% (p=0.000 n=20)
geomean    856.4Mi         3.105Gi        +271.24%

os: FreeBSD
arch: arm64
cpu: ARM Neoverse-V1 r1p1
        │ strlcpyScalar │             strlcpySIMD             │
        │    sec/op     │   sec/op     vs base                │
Short       143.4µ ± 1%   138.9µ ± 1%   -3.17% (p=0.000 n=20)
Mid         66.48µ ± 0%   24.06µ ± 1%  -63.81% (p=0.000 n=20)
Long       70.863µ ± 0%   4.961µ ± 0%  -93.00% (p=0.000 n=20)
geomean     87.75µ        25.50µ       -70.94%

        │ strlcpyScalar │               strlcpySIMD               │
        │      B/s      │      B/s       vs base                  │
Short      831.2Mi ± 1%    858.5Mi ± 1%     +3.28% (p=0.000 n=20)
Mid        1.751Gi ± 0%    4.839Gi ± 1%   +176.32% (p=0.000 n=20)
Long       1.643Gi ± 0%   23.466Gi ± 0%  +1328.41% (p=0.000 n=20)
geomean    1.327Gi         4.566Gi        +244.17%
Test Plan

Passes all unit tests.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 58992
Build 55879: arc lint + arc unit