lib/libc/aarch64/string: add memcpy SIMD implementation
Closed, Public

Authored by getz on Aug 9 2024, 1:18 PM.
Tags
None
Referenced Files
F108591890: D46251.id141946.diff
Sun, Jan 26, 5:54 PM

Details

Summary

I noticed that arm-optimized-routines in /contrib already includes a
SIMD-optimized memcpy.

This patch makes libc on aarch64 use that SIMD variant instead of the
scalar-optimized variant.

Benchmarks generated with fuz's strperf utility are below.

os: FreeBSD
arch: arm64
cpu: ARM Neoverse-V1 r1p1
        │ memcpyScalar │             memcpySIMD              │
        │    sec/op    │   sec/op     vs base                │
64         30.71µ ± 0%   22.47µ ± 1%  -26.83% (p=0.000 n=20)
4k         7.875µ ± 0%   4.069µ ± 0%  -48.33% (p=0.000 n=20)
256k       6.608µ ± 0%   5.126µ ± 0%  -22.43% (p=0.000 n=20)
16m        512.0µ ± 0%   503.0µ ± 0%   -1.75% (p=0.000 n=20)
1g         41.42m ± 0%   39.73m ± 0%   -4.08% (p=0.000 n=20)
geomean    127.7µ        98.70µ       -22.68%

        │ memcpyScalar │              memcpySIMD               │
        │     B/s      │      B/s       vs base                │
64        7.582Gi ± 0%   10.362Gi ± 1%  +36.68% (p=0.000 n=20)
4k        29.57Gi ± 0%    57.22Gi ± 0%  +93.55% (p=0.000 n=20)
256k      35.23Gi ± 0%    45.42Gi ± 0%  +28.91% (p=0.000 n=20)
16m       29.11Gi ± 0%    29.62Gi ± 0%   +1.78% (p=0.000 n=20)
1g        23.02Gi ± 0%    24.00Gi ± 0%   +4.26% (p=0.000 n=20)
geomean   22.12Gi         28.60Gi       +29.33%
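
For readers unfamiliar with the "vs base" column, the following is a minimal sketch of the arithmetic, assuming the column is the relative change of the SIMD result against the scalar result. The function names percent_change and throughput_change_from_times are hypothetical and not part of strperf or of this patch. Because the tables show rounded values, results recomputed from them can differ from the printed percentages in the last digit.

```c
#include <assert.h>

/*
 * Relative change of `value` against `base`, in percent.
 * Negative means `value` is smaller than `base`
 * (for sec/op, smaller is faster).
 */
double
percent_change(double base, double value)
{
	assert(base > 0.0);
	return (value - base) / base * 100.0;
}

/*
 * Throughput is inversely proportional to time per operation, so the
 * B/s change can be derived from the two sec/op figures alone.
 */
double
throughput_change_from_times(double base_secs, double new_secs)
{
	assert(base_secs > 0.0 && new_secs > 0.0);
	return (base_secs / new_secs - 1.0) * 100.0;
}
```

With the 64-byte Neoverse-V1 row, percent_change(30.71, 22.47) gives roughly -26.83, and throughput_change_from_times(30.71, 22.47) gives roughly +36.67, which agrees with the printed +36.68% to within rounding of the displayed inputs.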

os: FreeBSD
arch: arm64
cpu: ARM Cortex-A76 r4p1
        │ memcpyScalar │             memcpySIMD              │
        │    sec/op    │   sec/op     vs base                │
64         51.55µ ± 0%   46.25µ ± 0%  -10.29% (p=0.000 n=20)
4k         9.866µ ± 0%   7.253µ ± 0%  -26.48% (p=0.000 n=20)
256k       7.044µ ± 0%   7.793µ ± 0%  +10.64% (p=0.000 n=20)
16m        3.523m ± 6%   3.707m ± 5%        ~ (p=0.602 n=20)
1g         209.3m ± 1%   211.3m ± 1%   +0.93% (p=0.035 n=20)
geomean    305.1µ        289.9µ        -4.97%

        │ memcpyScalar │              memcpySIMD              │
        │     B/s      │     B/s       vs base                │
64        4.516Gi ± 0%   5.035Gi ± 0%  +11.48% (p=0.000 n=20)
4k        23.60Gi ± 0%   32.10Gi ± 0%  +36.02% (p=0.000 n=20)
256k      33.05Gi ± 0%   29.88Gi ± 0%   -9.62% (p=0.000 n=20)
16m       4.230Gi ± 5%   4.020Gi ± 5%        ~ (p=0.602 n=20)
1g        4.556Gi ± 1%   4.514Gi ± 1%   -0.92% (p=0.035 n=20)
geomean   9.255Gi        9.739Gi        +5.23%

os: FreeBSD
arch: arm64
cpu: ARM Cortex-A78C r0p0
        │ memcpyScalar │             memcpySIMD             │
        │    sec/op    │   sec/op     vs base               │
64         67.58µ ± 0%   64.87µ ± 0%  -4.00% (p=0.000 n=20)
4k         14.42µ ± 0%   14.43µ ± 0%       ~ (p=0.478 n=20)
256k       14.68µ ± 1%   14.76µ ± 1%       ~ (p=0.192 n=20)
16m        1.513m ± 1%   1.500m ± 1%       ~ (p=0.301 n=20)
1g         86.77m ± 2%   87.08m ± 1%       ~ (p=0.640 n=20)
geomean    284.9µ        282.7µ       -0.78%

        │ memcpyScalar │             memcpySIMD              │
        │     B/s      │     B/s       vs base               │
64        3.445Gi ± 0%   3.589Gi ± 0%  +4.17% (p=0.000 n=20)
4k        16.15Gi ± 0%   16.14Gi ± 0%       ~ (p=0.478 n=20)
256k      15.86Gi ± 1%   15.77Gi ± 1%       ~ (p=0.192 n=20)
16m       9.850Gi ± 1%   9.931Gi ± 1%       ~ (p=0.301 n=20)
1g        10.99Gi ± 2%   10.95Gi ± 1%       ~ (p=0.640 n=20)
geomean   9.909Gi        9.987Gi       +0.78%
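
To spot-check numbers like these on other hardware, a minimal timing loop along the following lines could be used. This is a sketch only and is not how strperf works: memcpy_throughput is a hypothetical helper, the empty asm statement is a GCC/Clang extension used as a compiler barrier, and a single run like this has none of the repetition or statistics behind the tables above.

```c
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() */

#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Time `iters` calls of memcpy() on `len`-byte buffers and return the
 * throughput in bytes per second. Returns 0.0 if allocation fails or
 * the clock did not advance.
 * Hypothetical helper for illustration; this is not strperf.
 */
double
memcpy_throughput(size_t len, size_t iters)
{
	struct timespec t0, t1;
	unsigned char *src, *dst;
	double secs;

	assert(len > 0 && iters > 0);
	src = malloc(len);
	dst = malloc(len);
	if (src == NULL || dst == NULL) {
		free(src);
		free(dst);
		return 0.0;
	}
	memset(src, 0xa5, len);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (size_t i = 0; i < iters; i++) {
		memcpy(dst, src, len);
		/* Compiler barrier so the copy is not optimized away. */
		__asm__ volatile("" : : "r"(dst) : "memory");
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	secs = (double)(t1.tv_sec - t0.tv_sec) +
	    (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
	free(src);
	free(dst);
	return secs > 0.0 ? (double)len * (double)iters / secs : 0.0;
}
```

Calling memcpy_throughput(4096, 100000) before and after installing the patched libc would give a rough B/s figure for the 4k case. The buffer sizes in the tables (64, 4k, 256k, 16m, 1g) can be passed as len in the same way.
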
Test Plan

No regressions noticed in the test suite; all tests pass.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 58975
Build 55862: arc lint + arc unit