
lib/libc/amd64/string: add memrchr scalar, baseline implementation
ClosedPublic

Authored by fuz on Dec 6 2023, 2:06 PM.
Tags
None
Referenced Files
F109306674: D42925.diff
Mon, Feb 3, 8:38 AM

Details

Summary

The scalar implementation is fairly simplistic and performs only
modestly better than the generic C implementation (19-35% less time
per operation in the benchmarks below). It could be improved by
using the same algorithm as for memchr, but that would have been
a lot more complicated.
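
For reference, the semantics every variant has to preserve can be sketched as a byte-at-a-time reverse scan in portable C. This is a minimal illustration, not the generic libc source and not the assembly added by this revision; the name memrchr_ref is hypothetical and chosen only to avoid clashing with the libc symbol.

```c
#include <stddef.h>

/*
 * Reference sketch: return a pointer to the last occurrence of the
 * byte c in the first n bytes of s, or NULL if there is none.
 * memrchr_ref is a hypothetical name used for illustration only.
 */
static void *
memrchr_ref(const void *s, int c, size_t n)
{
	const unsigned char *p = (const unsigned char *)s + n;
	const unsigned char ch = (unsigned char)c;

	/* Walk backwards from one past the end of the buffer. */
	while (n-- > 0) {
		if (*--p == ch)
			return ((void *)p);
	}
	return (NULL);
}
```

A scalar assembly version following this shape examines one byte per iteration, which would be consistent with the limited gain over compiled C reported above.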

The baseline implementation performs well and is similar to
timingsafe_memcmp in the way it operates. See the usual place
for benchmark results:

os: FreeBSD
arch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
        │ memrchr.pre.out │          memrchr.scalar.out          │        memrchr.baseline.out         │
        │     sec/op      │    sec/op     vs base                │   sec/op     vs base                │
Short        120.95µ ± 0%    98.08µ ± 0%  -18.90% (p=0.000 n=20)   37.75µ ± 1%  -68.79% (p=0.000 n=20)
Mid          74.374µ ± 0%   48.394µ ± 0%  -34.93% (p=0.000 n=20)   9.120µ ± 0%  -87.74% (p=0.000 n=20)
Long         52.181µ ± 0%   38.607µ ± 0%  -26.01% (p=0.000 n=20)   4.110µ ± 0%  -92.12% (p=0.000 n=20)
geomean       77.72µ         56.80µ       -26.91%                  11.23µ       -85.55%

        │ memrchr.pre.out │          memrchr.scalar.out           │          memrchr.baseline.out           │
        │       B/s       │      B/s       vs base                │      B/s       vs base                  │
Short        985.6Mi ± 0%   1215.4Mi ± 0%  +23.31% (p=0.000 n=20)   3158.2Mi ± 1%   +220.42% (p=0.000 n=20)
Mid          1.565Gi ± 0%    2.406Gi ± 0%  +53.68% (p=0.000 n=20)   12.765Gi ± 0%   +715.52% (p=0.000 n=20)
Long         2.231Gi ± 0%    3.015Gi ± 0%  +35.16% (p=0.000 n=20)   28.323Gi ± 0%  +1169.56% (p=0.000 n=20)
geomean      1.498Gi         2.050Gi       +36.82%                   10.37Gi        +592.26%
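
The gap between the scalar and baseline columns is what one would expect from examining 16 bytes per step instead of one. The sketch below illustrates that general idea with SSE2 intrinsics, assuming "baseline" refers to the SSE2 instructions available on every amd64 CPU. It is not the assembly from this revision and does not reproduce its handling of alignment or of the head of the buffer; the function name is hypothetical and __builtin_clz assumes GCC or Clang.

```c
#include <emmintrin.h>	/* SSE2 intrinsics */
#include <stddef.h>

/*
 * Illustrative sketch only: scan backwards in 16-byte chunks using
 * unaligned loads, then finish the remaining head bytes one by one.
 * memrchr_sse2_sketch is a hypothetical name.
 */
static void *
memrchr_sse2_sketch(const void *s, int c, size_t n)
{
	const unsigned char *base = s;
	const __m128i needle = _mm_set1_epi8((char)c);

	/* Full 16-byte chunks, walking from the end towards the start. */
	while (n >= 16) {
		__m128i chunk = _mm_loadu_si128(
		    (const __m128i *)(base + n - 16));
		unsigned mask = (unsigned)_mm_movemask_epi8(
		    _mm_cmpeq_epi8(chunk, needle));

		if (mask != 0) {
			/* The highest set bit marks the last match. */
			size_t idx = 31 - (size_t)__builtin_clz(mask);
			return ((void *)(base + n - 16 + idx));
		}
		n -= 16;
	}

	/* Fewer than 16 bytes remain at the start of the buffer. */
	while (n-- > 0) {
		if (base[n] == (unsigned char)c)
			return ((void *)(base + n));
	}
	return (NULL);
}
```

The compare and movemask pair turns 16 byte comparisons into a single 16-bit mask, so locating the last match in a chunk reduces to finding the highest set bit.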

New unit tests to cover this function are provided, too.

Test Plan

Passes the newly added unit tests; no new kyua failures.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable