Fletcher4 algorithm implemented in pure NEON for Aarch64 / ARMv8 64 bits

Description

This is not useful on micro-architectures with a weak NEON
implementation (only 64 bits wide): there, the native version is
slower than scalar and the byteswap version is barely faster.
On A53 or A57, it is a small improvement over scalar for native
and a worthwhile one for byteswap.
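
For context, here is a minimal scalar sketch of the Fletcher4 recurrence that the NEON code is measured against. It is not the code from this commit: the names fletcher4_sums, bswap32 and fletcher4_scalar are hypothetical, and the single function with a byteswap flag is an assumption made for brevity. "native" reads each 32-bit word as stored; "byteswap" reverses its bytes first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Four running 64-bit sums (hypothetical type name, for illustration). */
typedef struct {
    uint64_t a, b, c, d;
} fletcher4_sums;

/* Reverse the byte order of one 32-bit word. */
static uint32_t bswap32(uint32_t w)
{
    return (w >> 24) | ((w >> 8) & 0x0000ff00u) |
           ((w << 8) & 0x00ff0000u) | (w << 24);
}

/*
 * Scalar Fletcher4 over a buffer of 32-bit words.
 * byteswap == 0: use each word as stored ("native").
 * byteswap != 0: byte-reverse each word first ("byteswap").
 */
static fletcher4_sums fletcher4_scalar(const void *buf, size_t len, int byteswap)
{
    fletcher4_sums s = { 0, 0, 0, 0 };
    const unsigned char *p = buf;

    /* The input must be a whole number of 32-bit words. */
    assert(len % sizeof(uint32_t) == 0);

    for (size_t i = 0; i < len; i += sizeof(uint32_t)) {
        uint32_t w;
        memcpy(&w, p + i, sizeof(w));   /* avoids unaligned access */
        if (byteswap)
            w = bswap32(w);
        s.a += w;      /* sum of words */
        s.b += s.a;    /* sum of the a values */
        s.c += s.b;    /* sum of the b values */
        s.d += s.c;    /* sum of the c values */
    }
    return s;
}
```

Each step depends on the previous one, so a SIMD version has to keep separate per-lane sums and combine them at the end. The byteswap path also needs an extra byte reversal per word, which may be why it shows the larger gain in the results.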

Results from an A53 system:
0 0 0x01 -1 0 1499068294333000 1499101101878000
implementation   native         byteswap
scalar           1008227510     755880264
aarch64_neon     1198098720     1044818671
fastest          aarch64_neon   aarch64_neon

Results from an A57 system:
0 0 0x01 -1 0 4407214734807033 4407233933777404
implementation   native         byteswap
scalar           2302071241     1124873346
aarch64_neon     2542214946     2245570352
fastest          aarch64_neon   aarch64_neon
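
To make the comparison explicit, the sketch below computes the NEON-to-scalar ratio for each row of the two result tables. The figures are copied from the tables. The struct and function names are hypothetical, and the code assumes larger figures mean higher throughput.

```c
#include <assert.h>
#include <stdint.h>

/* One scalar/NEON pair from the result tables (hypothetical type). */
struct bench_row {
    const char *system;    /* "A53" or "A57" */
    const char *variant;   /* "native" or "byteswap" */
    uint64_t scalar;       /* figure reported for scalar */
    uint64_t neon;         /* figure reported for aarch64_neon */
};

/* Figures copied verbatim from the A53 and A57 results. */
static const struct bench_row bench_rows[] = {
    { "A53", "native",   1008227510ULL, 1198098720ULL },
    { "A53", "byteswap",  755880264ULL, 1044818671ULL },
    { "A57", "native",   2302071241ULL, 2542214946ULL },
    { "A57", "byteswap", 1124873346ULL, 2245570352ULL },
};

/* A ratio above 1.0 means aarch64_neon outperformed scalar. */
static double neon_speedup(const struct bench_row *r)
{
    assert(r->scalar != 0);   /* guard the division */
    return (double)r->neon / (double)r->scalar;
}
```

On these figures the native ratios come out at roughly 1.1 to 1.2 on both cores, while byteswap is about 1.4 on the A53 and close to 2.0 on the A57.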

Reviewed-by: Gvozden Neskovic <neskovic@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@atos.net>
Closes #5248

Details

Provenance
Romain Dolbeau <romain.github@dolbeau.name> Authored on Oct 21 2016, 5:55 PM
Brian Behlendorf <behlendorf1@llnl.gov> Committed on Oct 21 2016, 5:55 PM
Parents
rGe4ffa98dcaf2: Fix userquota_compare() function

Event Timeline

Brian Behlendorf <behlendorf1@llnl.gov> committed rG24cdeaf12e9e: Fletcher4 algorithm implemented in pure NEON for Aarch64 / ARMv8 64 bits (authored by Romain Dolbeau <romain.github@dolbeau.name>). Oct 21 2016, 5:55 PM