armv8crypto: Use cursors to access crypto buffer data
Currently armv8crypto copies the scheme used in aesni(9), where payload
data and output buffers are allocated on the fly if the crypto buffer is
not virtually contiguous. This scheme is simple but incurs a lot of
overhead: for an encryption request with a separate output buffer we
have to
- allocate a temporary buffer to hold the payload
- copy input data into the buffer
- copy the encrypted payload to the output buffer
- zero the temporary buffer before freeing it
We have a handy crypto buffer cursor abstraction now, so reimplement the
armv8crypto routines using that instead of temporary buffers. This
introduces some extra complexity, but gallatin@ reports a 10% throughput
improvement with a KTLS workload without additional CPU usage. The
driver still allocates an AAD buffer for AES-GCM if necessary.
Reviewed by: jhb
Tested by: gallatin
Sponsored by: Ampere Computing LLC
Submitted by: Klara Inc.
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D28950