Rework of fletcher_4 module
fc897b24b2ef

Description

The benchmark memory block is increased to 128 KiB to reflect real block sizes
more accurately. Measurements include all three stages needed for checksum
generation, i.e. init()/compute()/fini(). The inner loop is repeated multiple
times to offset the overhead of the time function.
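The measurement described above can be sketched roughly as follows. This is an illustration only, not the module's benchmark: the f4_* names, the context type, and the use of clock() are all assumptions made to keep the example self-contained, and a plain scalar loop stands in for whichever implementation is being timed.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical checksum state; not the module's real context type. */
typedef struct {
	uint64_t a, b, c, d;
} f4_ctx_t;

void
f4_init(f4_ctx_t *ctx)
{
	ctx->a = ctx->b = ctx->c = ctx->d = 0;
}

/* Scalar Fletcher-4 over 32-bit words; size is expected to be a multiple of 4. */
void
f4_compute(f4_ctx_t *ctx, const uint32_t *ip, size_t size)
{
	const uint32_t *end = ip + size / sizeof (uint32_t);

	for (; ip < end; ip++) {
		ctx->a += *ip;
		ctx->b += ctx->a;
		ctx->c += ctx->b;
		ctx->d += ctx->c;
	}
}

void
f4_fini(const f4_ctx_t *ctx, uint64_t out[4])
{
	out[0] = ctx->a;
	out[1] = ctx->b;
	out[2] = ctx->c;
	out[3] = ctx->d;
}

/* Time `reps` full init/compute/fini passes; return bytes per second. */
uint64_t
f4_bench(const uint32_t *buf, size_t size, unsigned reps)
{
	/* Reloaded on every pass so the work cannot be hoisted out of the loop. */
	const uint32_t *volatile src = buf;
	volatile uint64_t sink = 0;	/* keeps the results observable */
	uint64_t out[4];
	clock_t start, ticks;

	start = clock();
	for (unsigned i = 0; i < reps; i++) {
		f4_ctx_t ctx;

		f4_init(&ctx);
		f4_compute(&ctx, src, size);
		f4_fini(&ctx, out);
		sink += out[3];
	}
	ticks = clock() - start;
	if (ticks <= 0)
		ticks = 1;	/* run was shorter than the clock resolution */

	return ((uint64_t)((double)size * reps * CLOCKS_PER_SEC /
	    (double)ticks));
}
```

Only one pair of clock() calls brackets the whole loop, so the timer cost and its coarse resolution are spread over `reps` passes instead of being charged to a single 128 KiB block.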

The fastest implementation selects native and byteswap methods independently in
the benchmark. To support this, new function pointers
init_byteswap()/fini_byteswap() are introduced.
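A minimal sketch of what independent selection means, assuming the benchmark results are held in a simple table. The row and result types and the f4_pick_fastest() name are hypothetical; in the module, the winning rows would presumably supply the init/compute/fini pointers for their respective direction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical benchmark row: one implementation, two throughputs in B/s. */
typedef struct {
	const char *name;
	uint64_t native_bps;
	uint64_t byteswap_bps;
} f4_bench_row_t;

/* Indices of the winning rows; the two may differ. */
typedef struct {
	size_t native;
	size_t byteswap;
} f4_fastest_t;

/* Scan the table once, tracking the best row for each direction separately. */
f4_fastest_t
f4_pick_fastest(const f4_bench_row_t *rows, size_t n)
{
	f4_fastest_t best = { 0, 0 };

	for (size_t i = 1; i < n; i++) {
		if (rows[i].native_bps > rows[best.native].native_bps)
			best.native = i;
		if (rows[i].byteswap_bps > rows[best.byteswap].byteswap_bps)
			best.byteswap = i;
	}
	return (best);
}
```

Applied to the Xeon Phi figures quoted later in this message with the avx512f row left out, this would pick avx2 for native and sse2 for byteswap, which is the case where a single shared winner would lose throughput.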

The implementation mutex lock is replaced by an atomic variable.

To save time, the benchmark is not executed in userspace. Instead, the highest
supported implementation is used for fastest. The default userspace selector is
still 'cycle'.
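The selector behaviour described in the two paragraphs above might look roughly like this with C11 atomics. Everything here is an assumption for illustration: the F4_IMPL_* constants, the function names, and the convention that supported implementations are ordered so the highest one comes last. The kernel module uses its own atomic primitives rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical selector values; real implementations are indexed from 0. */
#define	F4_IMPL_FASTEST	UINT32_MAX
#define	F4_IMPL_CYCLE	(UINT32_MAX - 1)

/* A single atomic word replaces a mutex-protected selector. */
static _Atomic uint32_t f4_impl_chosen = F4_IMPL_FASTEST;
static _Atomic uint32_t f4_cycle_count = 0;

void
f4_impl_set(uint32_t impl)
{
	atomic_store(&f4_impl_chosen, impl);
}

/* Return the index of the implementation to use for the next checksum. */
uint32_t
f4_impl_select(uint32_t n_supported)
{
	uint32_t impl = atomic_load(&f4_impl_chosen);

	if (impl == F4_IMPL_CYCLE) {
		/* Round-robin: each caller gets the next index, lock-free. */
		return (atomic_fetch_add(&f4_cycle_count, 1) % n_supported);
	}
	if (impl == F4_IMPL_FASTEST) {
		/* No benchmark here: assume the last entry is the highest. */
		return (n_supported - 1);
	}
	return (impl);
}
```

Readers on the checksum path only perform an atomic load, so changing the selector never blocks a checksum in progress.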

fletcher_4_native/byteswap() methods use incremental methods to finish the
calculation if the data size is not a multiple of the vector stride (currently
64B).
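The bulk-plus-tail split described above can be sketched as follows. The f4_* names and the checksum type are hypothetical, and a scalar loop stands in for the vector kernel so the example stays runnable anywhere; only the 64B stride comes from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	F4_STRIDE	64	/* vector stride in bytes, per the text above */

typedef struct {
	uint64_t a, b, c, d;
} f4_cksum_t;

/* Scalar update that continues from whatever state *zc already holds. */
void
f4_incremental(f4_cksum_t *zc, const void *buf, size_t size)
{
	const uint32_t *ip = buf;
	const uint32_t *end = ip + size / sizeof (uint32_t);

	for (; ip < end; ip++) {
		zc->a += *ip;
		zc->b += zc->a;
		zc->c += zc->b;
		zc->d += zc->c;
	}
}

/* Stand-in for a SIMD kernel that only accepts multiples of F4_STRIDE. */
void
f4_bulk(f4_cksum_t *zc, const void *buf, size_t size)
{
	assert(size % F4_STRIDE == 0);
	f4_incremental(zc, buf, size);	/* a vector loop would go here */
}

/* Checksum a buffer whose size is a multiple of 4 but maybe not of 64. */
void
f4_native(const void *buf, size_t size, f4_cksum_t *zc)
{
	const size_t bulk = size - size % F4_STRIDE;

	assert(size % sizeof (uint32_t) == 0);
	zc->a = zc->b = zc->c = zc->d = 0;

	if (bulk > 0)
		f4_bulk(zc, buf, bulk);
	if (size > bulk) {
		/* Finish the remaining (size % 64) bytes incrementally. */
		f4_incremental(zc, (const uint8_t *)buf + bulk, size - bulk);
	}
}
```

For an 80-byte buffer, for example, the first 64 bytes go through the bulk path and the last 16 through the incremental one, and the result matches a single scalar pass over all 80 bytes.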

Added fletcher_4_native_varsize(), a special-purpose method for use when the
buffer size is not known in advance. The method does not enforce 4B alignment on
the buffer size, and will ignore the last (size % 4) bytes of the data buffer.
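The truncation behaviour can be illustrated with a short sketch. The function name, signature, and checksum type below are assumptions and this is not the module's code; it only shows how rounding the size down to whole 32-bit words drops the trailing (size % 4) bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
	uint64_t a, b, c, d;
} f4_cksum_t;

/* Hypothetical variable-size variant: accepts any size, uses whole words. */
void
f4_native_varsize(const void *buf, size_t size, f4_cksum_t *zc)
{
	const uint8_t *p = buf;
	const size_t words = size / sizeof (uint32_t);	/* rounds down */

	zc->a = zc->b = zc->c = zc->d = 0;

	for (size_t i = 0; i < words; i++) {
		uint32_t w;

		/* memcpy avoids assuming the buffer is 4-byte aligned. */
		memcpy(&w, p + i * sizeof (w), sizeof (w));
		zc->a += w;
		zc->b += zc->a;
		zc->c += zc->b;
		zc->d += zc->c;
	}
	/* The last (size % 4) bytes, if any, are never read. */
}
```

With this sketch, a 10-byte buffer produces the same checksum as its first 8 bytes, and any buffer shorter than 4 bytes produces an all-zero checksum.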

The benchmark kstat is changed to match that of vdev_raidz. It now shows
throughput for all supported implementations (in B/s), native and byteswap,
as well as the implementation that [fastest] is running.

Example of fletcher_4_bench running on Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz:
implementation   native         byteswap
scalar           4768120823     3426105750
sse2             7947841777     4318964249
ssse3            7951922722     6112191941
avx2             13269714358    11043200912
fastest          avx2           avx2

Example of fletcher_4_bench running on Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz:
implementation   native         byteswap
scalar           1291115967     1031555336
sse2             2539571138     1280970926
ssse3            2537778746     1080016762
avx2             4950749767     1078493449
avx512f          9581379998     4010029046
fastest          avx512f        avx512f

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4952

Details

Provenance

Gvozden Neskovic <neskovic@gmail.com>	Authored on Jul 12 2016, 3:50 PM
Brian Behlendorf <behlendorf1@llnl.gov>	Committed on Aug 16 2016, 9:11 PM

Parents

rG70b258fc962f: Fletcher4 implementation using avx512f instruction set

Event Timeline

Brian Behlendorf <behlendorf1@llnl.gov> committed rGfc897b24b2ef: Rework of fletcher_4 module (authored by Gvozden Neskovic <neskovic@gmail.com>). Aug 16 2016, 9:11 PM