Page MenuHomeFreeBSD

lib/libc/amd64/string: add strncmp scalar, baseline implementation
ClosedPublic

Authored by fuz on Oct 8 2023, 3:24 AM.
Tags
None
Referenced Files
F102681520: D42122.diff
Fri, Nov 15, 7:17 PM
Unknown Object (File)
Oct 2 2024, 7:28 PM
Unknown Object (File)
Oct 2 2024, 7:28 PM
Unknown Object (File)
Oct 2 2024, 7:27 PM
Unknown Object (File)
Oct 2 2024, 7:14 PM
Unknown Object (File)
Sep 26 2024, 5:32 AM
Unknown Object (File)
Sep 24 2024, 6:26 PM
Unknown Object (File)
Sep 23 2024, 3:14 PM

Details

Summary

The scalar implementation is fairly straightforward and merely unrolled
four times. The baseline implementation closely follows D41971 with
appropriate extensions and extra code paths to pay attention to string
length.

The strcmp unit tests from D41970 have been copied and adapted to
strncmp. As always, the man page simd(7) is updated to account for the
new code.

Performance is quite good. We beat both glibc (except for very long
strings, but they likely use AVX which we don't) and Bionic (except for
medium-sized aligned strings, where we are still in the same ballpark).
Curiously enough, the fastest implementation for the qsort benchmark
(sorting random arrays of strings) is the old generic C implementation,
though the scalar and baseline implementations are only slightly slower
than it.

Please see benchmark results at the link or below:

os: FreeBSD
arch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
                      │ strncmp.pre.out │          strncmp.scalar.out          │         strncmp.baseline.out         │          strncmp.bionic.out          │
                      │     sec/op      │    sec/op     vs base                │    sec/op     vs base                │    sec/op     vs base                │
StrncmpShortAligned        137.51µ ± 0%   137.32µ ± 1%        ~ (p=0.398 n=20)    40.22µ ± 0%  -70.75% (p=0.000 n=20)    62.65µ ± 1%  -54.44% (p=0.000 n=20)
StrncmpMidAligned           88.42µ ± 0%    56.69µ ± 1%  -35.89% (p=0.000 n=20)    13.13µ ± 0%  -85.15% (p=0.000 n=20)    11.24µ ± 0%  -87.29% (p=0.000 n=20)
StrncmpLongAligned         71.648µ ± 0%   36.185µ ± 0%  -49.50% (p=0.000 n=20)    4.753µ ± 0%  -93.37% (p=0.000 n=20)    4.855µ ± 0%  -93.22% (p=0.000 n=20)
StrncmpShortUnaligned      135.76µ ± 0%   146.14µ ± 3%   +7.65% (p=0.000 n=20)    42.91µ ± 0%  -68.39% (p=0.000 n=20)   111.93µ ± 0%  -17.55% (p=0.000 n=20)
StrncmpMidUnaligned         88.90µ ± 0%    57.62µ ± 5%  -35.19% (p=0.000 n=20)    13.22µ ± 0%  -85.13% (p=0.000 n=20)    23.69µ ± 0%  -73.35% (p=0.000 n=20)
StrncmpLongUnaligned       71.640µ ± 0%   36.176µ ± 0%  -49.50% (p=0.000 n=20)    4.758µ ± 0%  -93.36% (p=0.000 n=20)    6.225µ ± 0%  -91.31% (p=0.000 n=20)
StrncmpShortQsort           901.1µ ± 0%    976.8µ ± 0%   +8.40% (p=0.000 n=20)   1118.2µ ± 0%  +24.10% (p=0.000 n=20)   1339.6µ ± 0%  +48.66% (p=0.000 n=20)
StrncmpMidQsort             203.5µ ± 0%    206.9µ ± 0%   +1.63% (p=0.000 n=20)    220.1µ ± 0%   +8.13% (p=0.000 n=20)    316.5µ ± 0%  +55.48% (p=0.000 n=20)
geomean                     138.8µ         107.1µ       -22.85%                   33.71µ       -75.72%                   47.03µ       -66.13%

                      │ strncmp.pre.out │          strncmp.scalar.out           │          strncmp.baseline.out           │           strncmp.bionic.out            │
                      │       B/s       │      B/s       vs base                │      B/s       vs base                  │      B/s       vs base                  │
StrncmpShortAligned        866.9Mi ± 0%    868.1Mi ± 1%        ~ (p=0.398 n=20)   2963.9Mi ± 0%   +241.90% (p=0.000 n=20)   1902.8Mi ± 1%   +119.49% (p=0.000 n=20)
StrncmpMidAligned          1.317Gi ± 0%    2.054Gi ± 1%  +55.98% (p=0.000 n=20)    8.867Gi ± 0%   +573.46% (p=0.000 n=20)   10.360Gi ± 0%   +686.86% (p=0.000 n=20)
StrncmpLongAligned         1.625Gi ± 0%    3.217Gi ± 0%  +98.00% (p=0.000 n=20)   24.491Gi ± 0%  +1407.32% (p=0.000 n=20)   23.977Gi ± 0%  +1375.68% (p=0.000 n=20)
StrncmpShortUnaligned      878.1Mi ± 0%    815.7Mi ± 3%   -7.10% (p=0.000 n=20)   2778.3Mi ± 0%   +216.40% (p=0.000 n=20)   1065.0Mi ± 0%    +21.29% (p=0.000 n=20)
StrncmpMidUnaligned        1.309Gi ± 0%    2.020Gi ± 4%  +54.29% (p=0.000 n=20)    8.804Gi ± 0%   +572.35% (p=0.000 n=20)    4.914Gi ± 0%   +275.26% (p=0.000 n=20)
StrncmpLongUnaligned       1.625Gi ± 0%    3.218Gi ± 0%  +98.03% (p=0.000 n=20)   24.466Gi ± 0%  +1405.57% (p=0.000 n=20)   18.702Gi ± 0%  +1050.89% (p=0.000 n=20)
StrncmpShortQsort         132.29Mi ± 0%   122.05Mi ± 0%   -7.74% (p=0.000 n=20)   106.61Mi ± 0%    -19.42% (p=0.000 n=20)    88.99Mi ± 0%    -32.73% (p=0.000 n=20)
StrncmpMidQsort            585.7Mi ± 0%    576.3Mi ± 0%   -1.60% (p=0.000 n=20)    541.6Mi ± 0%     -7.52% (p=0.000 n=20)    376.7Mi ± 0%    -35.68% (p=0.000 n=20)
geomean                    858.5Mi         1.087Gi       +29.62%                   3.453Gi        +311.89%                   2.476Gi        +195.26%

os: Linux
arch: x86_64
cpu:
                      │ strncmp.glibc.out │
                      │      sec/op       │
StrncmpShortAligned           52.57µ ± 0%
StrncmpMidAligned             17.31µ ± 0%
StrncmpLongAligned            2.318µ ± 0%
StrncmpShortUnaligned         55.26µ ± 0%
StrncmpMidUnaligned           26.62µ ± 0%
StrncmpLongUnaligned          2.323µ ± 0%
StrncmpShortQsort             1.260m ± 0%
StrncmpMidQsort               279.3µ ± 0%
geomean                       35.53µ

                      │ strncmp.glibc.out │
                      │        B/s        │
StrncmpShortAligned          2.214Gi ± 0%
StrncmpMidAligned            6.725Gi ± 0%
StrncmpLongAligned           50.21Gi ± 0%
StrncmpShortUnaligned        2.107Gi ± 0%
StrncmpMidUnaligned          4.373Gi ± 0%
StrncmpLongUnaligned         50.11Gi ± 0%
StrncmpShortQsort            94.64Mi ± 0%
StrncmpMidQsort              426.8Mi ± 0%
geomean                      3.277Gi

Sponsored by: The FreeBSD Foundation

lib/libc/amd64/string/strncmp.S: fix whitespace issue

Test Plan

passes test suite and newly added unit tests.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable