Add tso_tcp_flags_mask_first_segment, tso_tcp_flags_mask_middle_segment, and tso_tcp_flags_mask_last_segment sysctl-variables to control the handling of TCP flags during TSO.
This allows to fix the handling appropriate for classical ECN and to configure a handling appropriate for accurate ECN.
Details
- Reviewers
rscheff rrs kbowling erj - Group Reviewers
Restricted Owners Package (Owns No Changed Paths) - Commits
- rGe5149ac66c2c: ixgbe: sysctl for TCP flag handling during TSO
rG52ff9ccb5b25: ixgbe: sysctl for TCP flag handling during TSO
rGeea2e089f8da: ixgbe: sysctl for TCP flag handling during TSO
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Not Applicable - Unit
Tests Not Applicable
Event Timeline
As at least two, maybe three different NIC vendors have similar capabilities, wouldn't it make sense to have global, driver independent sysctls?
Are you sure this can't be dealt with dynamically in ix_txrx? Admittedly I have no reason to spend a lot of time digging into this but my intuition is you can stuff the offsets into if_pkt_info_t in iflib.c and make the right decisions when constructing the TSO packet descriptor.
sys/dev/ixgbe/if_ix.c | ||
---|---|---|
4797 | The handler will run any time the sysctl tree is queried. I would gate the write to only apply on changes. |
Are you saying you would write the register with every TSO packet? I have no experience, but I thought this might be too expensive. Right now, the NIC is using one behavior for all TSO packets for all connections...
No, I was speculating maybe you can accomplish whatever header change is desired in the TSO descriptor rather than globally altering those registers. But I do not know.
I believe by default whatever is in the pseudoheader is copied to all segments.
What are we trying to accomplish.. is a default mask ruining the ECN flag or are you trying to drop/alter the ECN flag in the middles or last segment?
Here is what I'm trying to accomplish:
The TCP header contains 12 flags (see IANA. The Intel NIC uses three 12 bit masks to compute, which flags are copied to the first, the middle, and the last segment when performing TSO. When doing ECN as specified in RFC 3168, the masks should be:
mask.first | 0xFF6 | |
mask.middle | 0xF76 | |
mask.last | 0xF7F | |
This means that the FIN and PSH flag only appear in the last segment and the CWR flag only appear in the first segment. Several Intel NICs behave like this.
However,
ix0: <Intel(R) X520 82599ES (SFI/SFP+)> mem 0x6004000000000-0x600400007ffff,0x6004000100000-0x6004000103fff irq 1040376 at device 0.0 numa-domain 0 on pci3 ix0: Using 2048 TX descriptors and 2048 RX descriptors ix0: Using 16 RX queues 16 TX queues ix0: Using MSI-X interrupts with 17 vectors ix0: allocated for 16 queues ix0: allocated for 16 rx queues ix0: Ethernet address: 90:e2:ba:f7:48:74 ix0: PCI Express Bus: Speed 5.0GT/s Width x8 ix0: eTrack 0x000161c1
reports
dev.ix.0.tso_tcp_flags_mask_first_segment: 0x00000ff6 dev.ix.0.tso_tcp_flags_mask_middle_segment: 0x00000ff6 dev.ix.0.tso_tcp_flags_mask_last_segment: 0x00000f7f
This means that the PSH and FIN flag correctly appear only in the last segment, but the CWR flag does not only appear in the first segment, but also in all middle segments.
Using this patch, this could be fixed.
When doing Accurate ECN as currently being specified in Accurate ECN, the masks should be:
mask.first | 0xFF6 | |
mask.middle | 0xFF6 | |
mask.last | 0xFFF | |
The proposed patch would allow a NIC to be configured to do TSO for either classical ECN or accurate ECN.
Does this description make the motivation clear?
sys/dev/ixgbe/if_ix.c | ||
---|---|---|
4797 | I think if you just perform a read operation req->newptr == NULL is true, and you would not perform the IXGBE_WRITE_REG(). Are you suggesting to optimize the case of a write operation where the new value is equal to the old value and skipping the IXGBE_WRITE_REG() in this special case? |
sys/dev/ixgbe/if_ix.c | ||
---|---|---|
4797 |
Yes kib corrected me in the other review it is fine as is. |
It does, thank you. There is nothing wrong with the patch as is. Should we program the registers automatically for normal ECN in a followup as the newer NICs do? I am guessing they are the way they are due to the age of the hardware/driver.
This means that the PSH and FIN flag correctly appear only in the last segment, but the CWR flag does not only appear in the first segment, but also in all middle segments.
Using this patch, this could be fixed.When doing Accurate ECN as currently being specified in Accurate ECN, the masks should be:
mask.first 0xFF6 mask.middle 0xFF6 mask.last 0xFFF The proposed patch would allow a NIC to be configured to do TSO for either classical ECN or accurate ECN.
Does this description make the motivation clear?
Your description definitely helps; thanks for asking the question @kbowling. @tuexen have you checked these masks on the other devices that use ixl/ice, or is this what you've observed so far?
I haven't heard anything internally about needing the TSO masks to be changed, but I'm certainly fine with adding sysctls to configure these values since it seems to fix a valid issue.
Yes, such a follow up commit makes sense. I can propose one. Procedural question: should I wait for someone in the O11: Intel Networking group to approve this review or go ahead and commit it with the two approvals right now?
Depends on the meaning of "checking"... Here is what is based on Intel datasheets:
Chip | driver | first mask | middle mask | last mask |
82576EB | igb | 0xFF6 | 0xF76 | 0xF7F |
i210 | igb | 0xFF6 | 0xF76 | 0xF7F |
i211 | igb | 0xFF6 | 0xF76 | 0xF7F |
i350 | igb | 0xFF6 | 0xF76 | 0xF7F |
82599 | ix | 0xFF6 | 0xFF6 | 0xF7F |
X710 | ixl | 0xFF6 | 0xF76 | 0xF7F |
E810 | ice | 0xFF6 | 0xF76 | 0xF7F |
So all cards except the 82599 uses the expected masks for classical ECN. I have tested the NICs except the last two. Give me a couple of days and I'll actually test the X710, since I have these cards in my lab. I can't test the E810, since I don't have these cards in my lab.
I haven't heard anything internally about needing the TSO masks to be changed, but I'm certainly fine with adding sysctls to configure these values since it seems to fix a valid issue.
Changing these masks came up in relation to Accurate ECN, TSO Requirements. This document is in its final phase at the IETF.
The incorrect retaining of the CWR header flag for middle segments usually is benign. It may (due to a popular misimplementation, which may exist in hardware) mask a CE mark on that very packet carrying the CWR (fbsd had this, see https://reviews.freebsd.org/rG9c251892c0a8612ee1cc6f53e5a7b89039b0f480 ). This would at least delay, possibly induce additional loss - and in a subsequent round, the CE marks (or loss) would get noted and appropriate action taken.
So, unless one specifically looks for this, it's very hard to notice. And in most cases, the CWR-flagged packet would not get marked with CE by the network. However, for correctness - and since the possibility exists to fix this - I'm in favor of changing these masks even for older hardware.