Add TCP LRO support for VLAN and VxLAN.
This change makes the TCP LRO code more generic and flexible with regards
to supporting multiple different TCP encapsulation protocols and in general
lays the ground for broader TCP LRO support. The main job of the TCP LRO code is
to merge TCP packets for the same flow, to reduce the number of calls to upper
layers. This reduces CPU and increases performance, due to being able to send
larger TSO offloaded data chunks at a time. Basically the TCP LRO makes it
possible to avoid per-packet interaction by the host CPU.
Because the current TCP LRO code was tightly bound and optimized for TCP/IP
over ethernet only, several larger changes were needed. Also a minor bug was
fixed in the flushing mechanism for inactive entries, where the expire time,
"le->mtime" was not always properly set.
To avoid having to re-run time consuming regression tests for every change,
it was chosen to squash the following list of changes into a single commit:
- Refactor parsing of all address information into the "lro_parser" structure. This easily allows to reuse parsing code for inner headers.
- Speedup header data comparison. Don't compare field by field, but instead use an unsigned long array, where the fields get packed.
- Refactor the IPv4/TCP/UDP checksum computations, so that they may be computed recursivly, only applying deltas as the result of updating payload data.
- Make smaller inline functions doing one operation at a time instead of big functions having repeated code.
- Refactor the TCP ACK compression code to only execute once per TCP LRO flush. This gives a minor performance improvement and keeps the code simple.
- Use sbintime() for all time-keeping. This change also fixes flushing of inactive entries.
- Try to shrink the size of the LRO entry, because it is frequently zeroed.
- Removed unused TCP LRO macros.
- Cleanup unused TCP LRO statistics counters while at it.
- Try to use __predict_true() and predict_false() to optimise CPU branch predictions.
Bump the __FreeBSD_version due to changing the "lro_ctrl" structure.
Tested by: Netflix
Reviewed by: rrs (transport)
Differential Revision: https://reviews.freebsd.org/D29564
MFC after: 2 week
Sponsored by: Mellanox Technologies // NVIDIA Networking