pf: Convert pfi_kkif to use counter_u64
Improve caching behaviour by using per-CPU counter_u64 counters rather
than variables shared between cores.
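The idea can be illustrated with a minimal userland sketch. This is not the
kernel counter(9) implementation and not code from this patch; the names
sharded_counter, NSLOTS and CACHE_LINE are hypothetical, and a 64-byte cache
line is assumed. Each CPU updates only its own cache-line-aligned slot, so
writers on different cores do not bounce a shared line between them, and a
reader sums the slots when it needs the total.

```c
#include <stddef.h>
#include <stdint.h>

#define NSLOTS     4	/* hypothetical number of CPUs */
#define CACHE_LINE 64	/* assumed cache line size in bytes */

/* One slot per CPU, aligned so that no two slots share a cache line. */
struct slot {
	_Alignas(CACHE_LINE) uint64_t value;
};

struct sharded_counter {
	struct slot slots[NSLOTS];
};

/* Update path: touches only the slot belonging to the calling CPU. */
static void
sharded_counter_add(struct sharded_counter *c, unsigned cpu, uint64_t v)
{
	c->slots[cpu % NSLOTS].value += v;
}

/* Read path: sums every slot; slower, but reads are rare for statistics. */
static uint64_t
sharded_counter_fetch(const struct sharded_counter *c)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < NSLOTS; i++)
		sum += c->slots[i].value;
	return (sum);
}
```

In the kernel the equivalent operations are provided by counter(9):
counter_u64_alloc(), counter_u64_add(), counter_u64_fetch() and
counter_u64_free().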
The result of converting all counters to counter(9) (i.e. this full
patch series) is a significant improvement in throughput. As tested by
olivier@, on Intel Xeon E5-2697Av4 (16 cores, 32 threads) hardware with
Mellanox ConnectX-4 MCX416A-CCAT (100GBase-SR4) NICs we see:
x FreeBSD 20201223: inet packets-per-second
+ FreeBSD 20201223 with pf patches: inet packets-per-second
[ministat ASCII distribution plot not reproduced; the x and + samples
form two non-overlapping clusters]
    N           Min           Max        Median           Avg        Stddev
x   5       9216962       9526356       9343902     9371057.6     116720.36
+   5      19427190      19698400      19502922      19546509     109084.92
Difference at 95.0% confidence
        1.01755e+07 +/- 164756
        108.584% +/- 2.9359%
        (Student's t, pooled s = 112967)
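As a cross-check of the figures above, the following sketch recomputes the
mean difference, the relative difference, and the pooled variance from the
Avg and Stddev columns. The function names are hypothetical, and the
pooled-variance formula is the standard textbook one, which is assumed (not
confirmed from the ministat source) to be what ministat uses; the reported
"pooled s" is its square root.

```c
/* Absolute difference between the patched and baseline averages. */
static double
mean_difference(double avg_new, double avg_ref)
{
	return (avg_new - avg_ref);
}

/* Difference expressed as a percentage of the baseline average. */
static double
relative_difference_pct(double avg_new, double avg_ref)
{
	return ((avg_new - avg_ref) * 100.0 / avg_ref);
}

/* Pooled variance of two samples with sizes n1, n2 and stddevs s1, s2. */
static double
pooled_variance(int n1, double s1, int n2, double s2)
{
	return (((n1 - 1) * s1 * s1 + (n2 - 1) * s2 * s2) /
	    (n1 + n2 - 2));
}
```

With the table values, mean_difference(19546509, 9371057.6) is about
1.01755e+07 and relative_difference_pct() is about 108.584, i.e. the patched
kernel forwards a little more than twice as many packets per second.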
Reviewed by: philip
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27763
(cherry picked from commit 5a3b9507d784aaa6a7ce35432b2111a7eec12cba)