Inline hardware TLS offload is efficient in most NIC implementations only when packets are transmitted in order, because the NIC can then track encryption state from one transmit to the next. When packets are transmitted out of order, as they are during a TCP re-transmit, the NIC must re-establish encryption state by re-DMA'ing the TLS record from its start up to and including the segment that TCP is re-transmitting. The worst case is a TCP segment that falls at the end of a full-sized 16KB TLS record. In that case, we DMA over 10x as much data as we send in order to properly encrypt the re-transmitted bytes. If this happens too often, it can easily overwhelm the PCIe bus and starve the NIC of PCIe bandwidth.
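To make the end-of-record case described above concrete, here is a minimal arithmetic sketch. The function names and the 1448-byte segment payload are assumptions chosen for illustration, not values taken from any driver; the record size ignores TLS header and trailer bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sizes only (assumptions, not driver constants). */
#define TLS_RECORD_LEN	16384u	/* full-sized TLS record payload */
#define TCP_SEG_LEN	1448u	/* assumed TCP payload per segment */

/*
 * Bytes the NIC must DMA to re-encrypt one re-transmitted segment:
 * everything from the start of the record through the end of the segment.
 * (Hypothetical helper, not a kernel function.)
 */
static size_t
rexmit_dma_bytes(size_t record_len, size_t seg_off, size_t seg_len)
{
	assert(seg_off + seg_len <= record_len);
	return (seg_off + seg_len);
}

/*
 * DMA cost as a whole-number percentage of the bytes actually sent.
 * 100 means no amplification; integer division truncates.
 */
static size_t
rexmit_dma_amplification_pct(size_t record_len, size_t seg_off, size_t seg_len)
{
	assert(seg_len > 0);
	return (rexmit_dma_bytes(record_len, seg_off, seg_len) * 100 / seg_len);
}
```

Under these assumptions, re-transmitting the last 1448 bytes of a 16384-byte record requires DMA'ing all 16384 bytes, roughly 11x the bytes sent, while re-transmitting the first segment of a record costs no extra DMA.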
This change adds a tunable and sysctl, kern.ipc.tls.ifnet_max_rexmit. It specifies the maximum percentage of bytes that can be re-transmitted on a TCP connection before we automatically disable inline hw (aka ifnet) ktls offload for that connection and switch it to software ktls. This dramatically reduces output drops and increases bandwidth on Netflix servers using ifnet ktls offload with Mellanox CX6-DX NICs, and allows us to get much closer to the link bandwidth.
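The threshold check can be sketched roughly as follows. This is an illustration of the policy only, not the code in this change: the struct, its fields, and the function name are hypothetical, and the real implementation may count bytes and decide when to evaluate the threshold differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection counters (not the kernel's real structure). */
struct conn_tx_stats {
	uint64_t bytes_sent;	/* total bytes transmitted */
	uint64_t bytes_rexmit;	/* bytes that were re-transmitted */
};

/*
 * Return true when the re-transmitted share of bytes exceeds
 * max_rexmit_pct percent, i.e. when the connection should fall back
 * from inline hw (ifnet) ktls to software ktls.
 */
static bool
should_disable_ifnet_tls(const struct conn_tx_stats *st,
    unsigned int max_rexmit_pct)
{
	assert(st != NULL);
	if (st->bytes_sent == 0)
		return (false);
	/* rexmit / sent > pct / 100, rearranged to avoid division. */
	return (st->bytes_rexmit * 100 > st->bytes_sent * max_rexmit_pct);
}
```

Cross-multiplying avoids a division on the transmit path and keeps the comparison exact in integer arithmetic; with a 5 percent limit, 50 re-transmitted bytes out of 1000 stays on hardware offload, while 51 triggers the switch.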