On the receiver side, when new in-sequence data is
processed in tcp_reass(), a socket upcall will inform
the consumer of the newly arrived data. If the
application reads more than 1/4 of the socket receive
buffer during that very same upcall, a window update
can be sent out.
If this window update happens during a SACK loss
recovery episode, it carries the updated ACK number but
old SACK information, because sackblks[] is only updated
once tcp_reass() has finished. Effectively, such a
(partial) ACK looks like a DSACK.
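
The ordering problem can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not the
FreeBSD kernel code: the types and functions (struct receiver,
fill_hole(), looks_like_dsack(), and so on) are hypothetical, only
a single SACK block is tracked, and the sender-side check loosely
follows the RFC 2883 rule that a first SACK block below the
cumulative ACK reports a duplicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified types; NOT the FreeBSD kernel structures. */
struct sack_block {
	uint32_t start;	/* left edge (inclusive) */
	uint32_t end;	/* right edge (exclusive) */
};

struct receiver {
	uint32_t rcv_nxt;		/* next expected sequence number */
	struct sack_block sackblk;	/* one block only, for brevity */
	bool have_sack;
};

struct ack_segment {
	uint32_t ack;
	bool has_sack;
	struct sack_block first;
};

/* Sequence-space comparisons, modulo 2^32. */
static bool seq_lt(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }
static bool seq_leq(uint32_t a, uint32_t b) { return (int32_t)(a - b) <= 0; }

/* Drop SACK information that rcv_nxt already covers. */
static void clean_sackblk(struct receiver *r)
{
	if (r->have_sack && seq_leq(r->sackblk.end, r->rcv_nxt))
		r->have_sack = false;
}

/* Build an ACK from whatever state the receiver holds right now. */
static struct ack_segment build_ack(const struct receiver *r)
{
	struct ack_segment seg = { r->rcv_nxt, r->have_sack, r->sackblk };
	return seg;
}

/*
 * A retransmission fills the hole and rcv_nxt advances. The window
 * update is built either before (stale) or after (fresh) the SACK
 * state has been brought up to date.
 */
static struct ack_segment fill_hole(struct receiver *r, uint32_t new_rcv_nxt,
    bool update_sack_first)
{
	assert(seq_leq(r->rcv_nxt, new_rcv_nxt)); /* never moves backwards */
	r->rcv_nxt = new_rcv_nxt;
	if (update_sack_first)
		clean_sackblk(r);
	struct ack_segment seg = build_ack(r);	/* sent from the upcall */
	clean_sackblk(r);			/* state is updated eventually */
	return seg;
}

/* Sender side: a first SACK block below the cumulative ACK reads as DSACK. */
static bool looks_like_dsack(const struct ack_segment *seg)
{
	return seg->has_sack && seq_lt(seg->first.start, seg->ack);
}
```

With rcv_nxt at 1000 and a SACKed block of [2000, 3000), filling
the hole with update_sack_first set to false yields an ACK for 3000
that still carries the [2000, 3000) block, which the model sender
classifies as a DSACK. With update_sack_first set to true, the same
ACK carries no SACK block.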
Certain other stacks have begun to dynamically adjust their
dupthresh or RTO limits when a receiver informs the sender
of unnecessary retransmissions. Such a TCP sender consumes
the DSACK information to adapt to an environment with a
high reordering extent.
Overall, the erroneous coupling of old SACK information
with an updated ACK number in a window update can lead to
very delayed and inefficient loss recovery for the remainder
of an affected TCP session.
Thanks to Cheng for the significant effort to investigate
the root cause of this sporadic and hard-to-pinpoint
performance issue when sending data towards BSD.