When ktls_bind_thread is 2 or more, we should pick a ktls worker thread that is bound to the same domain as the TCP connection associated with the socket. We use roughly the same code as netinet/tcp_hpts.c to do this, so that crypto runs on the domain the TCP connection is associated with. Assuming TCP_REUSPORT_LB_NUMA (D21636) is in place and in use, this ensures that the crypto source buffers are local to the NUMA domain we're running crypto on.
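The worker selection described above can be illustrated with a small userland sketch. This is not the kernel code from this change: the table layout, the names (domain_tbl, register_worker, pick_worker_cpu, NO_DOMAIN) and the fallback to a global CPU list are assumptions made for illustration only.

```c
#include <assert.h>

#define MAX_DOMAINS 8
#define MAX_CPUS    64
#define NO_DOMAIN   (-1)	/* hypothetical "connection has no known domain" value */

/* Hypothetical per-domain list of CPUs that have a bound worker thread. */
struct domain_workers {
	int count;
	int cpu[MAX_CPUS];
};

static struct domain_workers domain_tbl[MAX_DOMAINS];
static int all_cpus[MAX_CPUS];	/* every worker CPU, regardless of domain */
static int all_count;

/* Record that a worker thread is bound to 'cpu', which lives in 'domain'. */
static void
register_worker(int domain, int cpu)
{
	struct domain_workers *d;

	assert(domain >= 0 && domain < MAX_DOMAINS);
	d = &domain_tbl[domain];
	assert(d->count < MAX_CPUS && all_count < MAX_CPUS);
	d->cpu[d->count++] = cpu;
	all_cpus[all_count++] = cpu;
}

/*
 * Pick the worker CPU for a connection.  With bind_mode >= 2 and a known
 * connection domain, the flow ID indexes only the workers in that domain;
 * otherwise it indexes the global list.
 */
static int
pick_worker_cpu(int bind_mode, int conn_domain, unsigned int flowid)
{
	const struct domain_workers *d;

	assert(all_count > 0);
	if (bind_mode >= 2 && conn_domain != NO_DOMAIN) {
		assert(conn_domain >= 0 && conn_domain < MAX_DOMAINS);
		d = &domain_tbl[conn_domain];
		if (d->count > 0)
			return (d->cpu[flowid % d->count]);
	}
	return (all_cpus[flowid % all_count]);
}
```

Indexing by flow ID within the domain keeps a given connection on one worker while still spreading connections across all workers in that domain.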
Additionally, we need to set our worker threads' domainset policy to prefer the local domain, so that the VM pages we allocate as destination crypto buffers come from memory on the local NUMA node. Doing this eliminates writes across the QPI (or other) NUMA interconnect when doing encryption.
This change (when TCP_REUSPORT_LB_NUMA, D21636, is used) reduces cross-domain traffic from over 37% down to about 13%, as measured by pcm.x.