Page MenuHomeFreeBSD

tcp: fix erroneous transmission selection after RTO w/ SACK incoming
ClosedPublic

Authored by rscheff on Jan 7 2024, 4:24 PM.
Tags
None
Referenced Files
F107124715: D43355.diff
Fri, Jan 10, 12:17 PM
F107094467: D43355.id133386.diff
Fri, Jan 10, 12:47 AM
Unknown Object (File)
Wed, Dec 18, 6:23 PM
Unknown Object (File)
Dec 5 2024, 9:23 PM
Unknown Object (File)
Dec 4 2024, 10:37 PM
Unknown Object (File)
Dec 2 2024, 1:24 AM
Unknown Object (File)
Nov 24 2024, 3:12 PM
Unknown Object (File)
Nov 19 2024, 2:09 PM
Subscribers

Details

Summary

A subtle bug was present in the base TCP stack, virtually since the
inception of SACK loss recovery. However, for a number of reasons
it doesn't show up easily.

Under normal circumstances, when SACK loss recovery is initiated,
snd_nxt and snd_max track each other. Only during non-SACK loss
recovery, snd_nxt deviates from snd_max.

On RTO, snd_nxt is reset to snd_una. Prior to RFC6675 SACK loss
recovery initialization, SACK loss recovery wouldn't have been
initiated right away with the first incoming SACK ACK.

In addition, cwnd used to be 1 MSS right after RTO, increasing
to 2 MSS more recently.

Furthermore, TSO/LRO typically deliver one ACK covering two or
more segments, thus masking the issue of tcp_output alternating
between retransmitting from snd_nxt (without SACK, thus not
dragging hole->rxmit right), followed by retransmitting
from the SACK hole - resulting in the same data sent twice
until a full ACK without SACK (and an empty scoreboard) is reached.

Address this by setting up snd_recover just in cc_cong_signal.

MFC after: 1 week

Test Plan

Disable TSO & LRO on sender and receiver:

ifconfig <if> -lro -tso

Disable behavior like LRD (Lost Retransmission Detection),
PRR (Proportional Rate Reduction) to demonstrate the issue similar to
what exists in older FreeBSD versions:

sysctl net.inet.tcp.do_prr=0
sysctl net.inet.tcp.do_lrd=0 (or net.inet.tcp.sack.lrd=0)

Start an iperf3 server in the background on the receiver:

iperf3 -s

Start a script to use ipfw to induce sudden, and longer loss periods:

while [ 1 ]; do

  1. induce a little loss for ~10-20 ms, to induce SACK loss recovery ipfw add 10 deny tcp from any to any 5201; ipfw delete 10;
  2. induce a longer loss period ~100ms, to trigger RTO ipfw add 10 deny tcp from any to any 5201; sleep 0.1; ipfw delete 10;
  3. allow all packets for 4000 ms to build up the congestion window sleep 4; done

Finally start a tcpdump on the sender:

tcpdump -i <if> -w <file> -s 128 tcp and port 5201 &

and start traffic:

iperf3 -c <client> -t 45

Finally, stop the trace and inspect it for occurances of SACK blocks
and that every ACK during the loss episode is covering at most one
MSS (typically ~1500 bytes). After the RTO with SACK, it is normal
that cwnd remains at 2 MSS, but the presence of DSACK blocks, or
segments covering the same data twice demonstrates this issue.

packetdrill demonstration of packets getting sent twice:

#cat RTO-while-SACK-LR.pkt
//
// A simple test to verify behavior when
// during SACK LR, an RTO happens - and packets
// are sent twice - once from the sackhole, once
// from the pulled back snd_nxt - D43355
//
// Runs clean against FreeBSD 15.0-CURRENT #17 main-n272693-dfcb8de5ef80
//
--tolerance_usecs=25000

0.0 `
ifconfig tun0 tso
sysctl net.inet.tcp.sack.lrd=0
sysctl net.inet.tcp.do_prr=0
sysctl net.inet.tcp.sack.revised=1
sysctl net.inet.tcp.sack.enable=1
sysctl net.inet.tcp.cc.algorithm=cubic
sysctl net.inet.tcp.hostcache.purgenow=1
`
// Create a listening TCP socket.

0.1 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0.001 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0.001 setsockopt(3, SOL_SOCKET, SO_SNDBUF, [1048576], 4) = 0
+0.001 setsockopt(3, SOL_SOCKET, SO_DEBUG, [1], 4) = 0
+0.005 bind(3, ..., ...) = 0
+0.005 listen(3, 1) = 0

// Establish a TCP connection.
+0.035 < S  0:0(0) win 65535 <mss 1000, sackOK, wscale 9, nop, nop, nop >
+0.000 > S. 0:0(0) ack 1 win 65535 <...>
+0.000 <  . 1:1(0) ack 1 win 65535
+0.000 accept(3, ..., ...) = 4

+0.01 setsockopt(4, SOL_SOCKET, SO_DEBUG, [1], 4) = 0
0.2  write(4, ..., 100000) = 100000

+0.0     >  .     1:1001(1000) ack 1
+0.0     >  .  1001:2001(1000) ack 1
+0.0     >  .  2001:3001(1000) ack 1
+0.0     >  .  3001:4001(1000) ack 1
+0.0     >  .  4001:5001(1000) ack 1
+0.0     >  .  5001:6001(1000) ack 1
+0.0     >  .  6001:7001(1000) ack 1
+0.0     >  .  7001:8001(1000) ack 1
+0.0     >  .  8001:9001(1000) ack 1
+0.0     >  .  9001:10001(1000) ack 1
+0.001     <  .     1:1(0) ack 1001 win 65535  // ACKing every packet
+0.001     <  .     1:1(0) ack 2001 win 65535  // to built up cwnd fast
+0.001     <  .     1:1(0) ack 3001 win 65535
+0.001     <  .     1:1(0) ack 4001 win 65535
+0.001     <  .     1:1(0) ack 5001 win 65535
+0.001     <  .     1:1(0) ack 6001 win 65535
+0.001     <  .     1:1(0) ack 7001 win 65535
+0.001     <  .     1:1(0) ack 8001 win 65535
+0.001     <  .     1:1(0) ack 9001 win 65535
+0.001     <  .     1:1(0) ack 10001 win 65535
+0      >  . 10001:11001(1000) ack 1
+0      >  . 11001:12001(1000) ack 1
+0      >  . 12001:13001(1000) ack 1
+0      >  . 13001:14001(1000) ack 1
+0      >  . 14001:15001(1000) ack 1
+0      >  . 15001:16001(1000) ack 1
+0      >  . 16001:17001(1000) ack 1
+0      >  . 17001:18001(1000) ack 1
+0      >  . 18001:19001(1000) ack 1
+0      >  . 19001:20001(1000) ack 1
+0.001       <  .  1:1(0) ack 11001 win 65535
+0.001       <  .  1:1(0) ack 12001 win 65535
+0.001       <  .  1:1(0) ack 13001 win 65535
+0.001       <  .  1:1(0) ack 14001 win 65535
+0.001       <  .  1:1(0) ack 15001 win 65535
+0.001       <  .  1:1(0) ack 16001 win 65535
+0.001       <  .  1:1(0) ack 17001 win 65535
+0.001       <  .  1:1(0) ack 18001 win 65535
+0.001       <  .  1:1(0) ack 19001 win 65535
+0.001       <  .  1:1(0) ack 20001 win 65535
+0     >  . 20001:21001(1000) ack 1
+0     >  . 21001:22001(1000) ack 1
+0     >  . 22001:23001(1000) ack 1
+0     >  . 23001:24001(1000) ack 1
+0     >  . 24001:25001(1000) ack 1
+0     >  . 25001:26001(1000) ack 1
+0     >  . 26001:27001(1000) ack 1
+0     >  . 27001:28001(1000) ack 1
+0     >  . 28001:29001(1000) ack 1
+0     >  . 29001:30001(1000) ack 1
+0.001       <  .  1:1(0) ack 21001 win 65535
+0.001       <  .  1:1(0) ack 22001 win 65535
+0.001       <  .  1:1(0) ack 23001 win 65535
+0.001       <  .  1:1(0) ack 24001 win 65535
+0.001       <  .  1:1(0) ack 25001 win 65535
+0.001       <  .  1:1(0) ack 26001 win 65535
+0.001       <  .  1:1(0) ack 27001 win 65535
+0.001       <  .  1:1(0) ack 28001 win 65535
+0.001       <  .  1:1(0) ack 29001 win 65535
+0.001       <  .  1:1(0) ack 30001 win 65535
+0     >  . 30001:31001(1000) ack 1  // we "drop" the next 30 packets
+0     >  . 31001:32001(1000) ack 1
+0     >  . 32001:33001(1000) ack 1
+0     >  . 33001:34001(1000) ack 1
+0     >  . 34001:35001(1000) ack 1
+0     >  . 35001:36001(1000) ack 1
+0     >  . 36001:37001(1000) ack 1
+0     >  . 37001:38001(1000) ack 1
+0     >  . 38001:39001(1000) ack 1
+0     >  . 39001:40001(1000) ack 1

+0     >  . 40001:41001(1000) ack 1
+0     >  . 41001:42001(1000) ack 1
+0     >  . 42001:43001(1000) ack 1
+0     >  . 43001:44001(1000) ack 1
+0     >  . 44001:45001(1000) ack 1
+0     >  . 45001:46001(1000) ack 1
+0     >  . 46001:47001(1000) ack 1
+0     >  . 47001:48001(1000) ack 1
+0     >  . 48001:49001(1000) ack 1
+0     >  . 49001:50001(1000) ack 1

+0     >  . 50001:51001(1000) ack 1
+0     >  . 51001:52001(1000) ack 1
+0     >  . 52001:53001(1000) ack 1
+0     >  . 53001:54001(1000) ack 1
+0     >  . 54001:55001(1000) ack 1
+0     >  . 55001:56001(1000) ack 1
+0     >  . 56001:57001(1000) ack 1
+0     >  . 57001:58001(1000) ack 1
+0     >  . 58001:59001(1000) ack 1
+0     >  . 59001:60001(1000) ack 1

+0     >  . 60001:61001(1000) ack 1
+0     >  . 61001:62001(1000) ack 1
+0     >  . 62001:63001(1000) ack 1
+0     >  . 63001:64001(1000) ack 1
+0     >  . 64001:65001(1000) ack 1
+0     >  . 65001:66001(1000) ack 1
+0     >  . 66001:67001(1000) ack 1
+0     >  . 67001:68001(1000) ack 1
+0     >  . 68001:69001(1000) ack 1
+0     >  . 69001:70001(1000) ack 1

+0.001   <   . 1:1(0) ack 30001 win 65535 <sack 60001:61001, nop, nop>
+0     >  . 70001:71001(1000) ack 1

+0.001   <   . 1:1(0) ack 30001 win 65535 <sack 60001:62001, nop, nop>
+0     >  . 71001:72001(1000) ack 1

+0.001   <   . 1:1(0) ack 30001 win 65535 <sack 60001:63001, nop, nop>
+0     >  . 30001:31001(1000) ack 1   // we expect the retransmission around here

+0.001   <   . 1:1(0) ack 30001 win 65535 <sack 60001:64001, nop, nop>
+0     >  . 31001:32001(1000) ack 1

+0.20~+0.30  >  . 30001:31001(1000) ack 1
+0.20~+0.60  >  . 30001:31001(1000) ack 1

+0.001   <   . 1:1(0) ack 31001 win 65535 <sack 60001:65001, nop, nop>

+0     >  . 31001:32001(1000) ack 1
+0     >  . 32001:33001(1000) ack 1

+0.001   <   . 1:1(0) ack 32001 win 65535 <sack 60001:66001, nop, nop>
+0     >  . 32001:33001(1000) ack 1
+0     >  . 33001:34001(1000) ack 1

+0.001   <   . 1:1(0) ack 33001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 34001:35001(1000) ack 1

+0.001   <   . 1:1(0) ack 34001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 34001:35001(1000) ack 1
+0     >  . 35001:36001(1000) ack 1

+0.001   <   . 1:1(0) ack 35001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 36001:37001(1000) ack 1

+0.001   <   . 1:1(0) ack 36001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 36001:37001(1000) ack 1
+0     >  . 37001:38001(1000) ack 1

+0.001   <   . 1:1(0) ack 37001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 38001:39001(1000) ack 1

+0.001   <   . 1:1(0) ack 38001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 38001:39001(1000) ack 1
+0     > .  39001:40001(1000) ack 1

+0.001   <   . 1:1(0) ack 39001 win 65535 <sack 60001:67001, nop, nop>
+0     > .  40001:41001(1000) ack 1

+0.001   <   . 1:1(0) ack 40001 win 65535 <sack 60001:67001, nop, nop>
+0     > .  40001:41001(1000) ack 1
+0     > .  41001:42001(1000) ack 1

+0.001   <   . 1:1(0) ack 41001 win 65535 <sack 60001:67001, nop, nop>
+0     > .  42001:43001(1000) ack 1

+0.001   <   . 1:1(0) ack 42001 win 65535 <sack 60001:67001, nop, nop>
+0     > .  42001:43001(1000) ack 1
+0     > .  43001:44001(1000) ack 1

+0.001   <   . 1:1(0) ack 43001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 44001:45001(1000) ack 1

+0.001   <   . 1:1(0) ack 44001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 44001:45001(1000) ack 1
+0     >  . 45001:46001(1000) ack 1

+0.001   <   . 1:1(0) ack 45001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 46001:47001(1000) ack 1

+0.001   <   . 1:1(0) ack 46001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 46001:47001(1000) ack 1
+0     >  . 47001:48001(1000) ack 1

+0.001   <   . 1:1(0) ack 47001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 48001:49001(1000) ack 1

+0.001   <   . 1:1(0) ack 48001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 48001:49001(1000) ack 1
+0     >  . 49001:50001(1000) ack 1

+0.001   <   . 1:1(0) ack 49001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 50001:51001(1000) ack 1

+0.001   <   . 1:1(0) ack 50001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 50001:51001(1000) ack 1
+0     >  . 51001:52001(1000) ack 1

+0.001   <   . 1:1(0) ack 51001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 52001:53001(1000) ack 1

+0.001   <   . 1:1(0) ack 52001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 52001:53001(1000) ack 1
+0     >  . 53001:54001(1000) ack 1

+0.001   <   . 1:1(0) ack 53001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 54001:55001(1000) ack 1

+0.001   <   . 1:1(0) ack 54001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 54001:55001(1000) ack 1
+0     >  . 55001:56001(1000) ack 1

+0.001   <   . 1:1(0) ack 55001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 56001:57001(1000) ack 1

+0.001   <   . 1:1(0) ack 56001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 56001:57001(1000) ack 1
+0     >  . 57001:58001(1000) ack 1

+0.001   <   . 1:1(0) ack 57001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 58001:59001(1000) ack 1

+0.001   <   . 1:1(0) ack 58001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 58001:59001(1000) ack 1
+0     >  . 59001:60001(1000) ack 1

+0.001   <   . 1:1(0) ack 59001 win 65535 <sack 60001:67001, nop, nop>

+0.001   <   . 1:1(0) ack 67001 win 65535
+0     >  . 67001:68001(1000) ack 1
+0     >  . 68001:69001(1000) ack 1
+0     >  . 69001:70001(1000) ack 1

+0.001   <   . 1:1(0) ack 69001 win 65535
+0     >  . 70001:71001(1000) ack 1
+0     >  . 71001:72001(1000) ack 1
+0     >  . 72001:73001(1000) ack 1

+0.001   <   . 1:1(0) ack 71001 win 65535
+0     >  . 73001:74001(1000) ack 1
+0     >  . 74001:75001(1000) ack 1
+0     >  . 75001:76001(1000) ack 1
+0     >  . 76001:77001(1000) ack 1

+0.001   <   . 1:1(0) ack 73001 win 65535
+0     >  . 77001:78001(1000) ack 1
+0     >  . 78001:79001(1000) ack 1
+0     >  . 79001:80001(1000) ack 1
+0     >  . 80001:81001(1000) ack 1

+0.001   <   . 1:1(0) ack 75001 win 65535
+0     >  . 81001:82001(1000) ack 1
+0     >  . 82001:83001(1000) ack 1
+0     >  . 83001:84001(1000) ack 1
+0     >  . 84001:85001(1000) ack 1

+0.001   <   . 1:1(0) ack 77001 win 65535
+0     >  . 85001:86001(1000) ack 1
+0     >  . 86001:87001(1000) ack 1
+0     >  . 87001:88001(1000) ack 1
+0     >  . 88001:89001(1000) ack 1

+0.001   <   . 1:1(0) ack 79001 win 65535
+0     >  . 89001:90001(1000) ack 1
+0     >  . 90001:91001(1000) ack 1
+0     >  . 91001:92001(1000) ack 1
+0     >  . 92001:93001(1000) ack 1

+0.001   <   . 1:1(0) ack 81001 win 65535
+0     >  . 93001:94001(1000) ack 1
+0     >  . 94001:95001(1000) ack 1
+0     >  . 95001:96001(1000) ack 1
+0     >  . 96001:97001(1000) ack 1

+0.001   <   . 1:1(0) ack 83001 win 65535
+0     >  . 97001:98001(1000) ack 1
+0     >  . 98001:99001(1000) ack 1

+0     > P. 99001:100001(1000) ack 1

+0.001   <   . 1:1(0) ack 85001 win 65535

+0.001   <   . 1:1(0) ack 87001 win 65535

+0.001   <   . 1:1(0) ack 89001 win 65535

+0.001   <   . 1:1(0) ack 91001 win 65535

+0.001   <   . 1:1(0) ack 93001 win 65535

+0.001   <   . 1:1(0) ack 94001 win 65535

+0.001   <   . 1:1(0) ack 96001 win 65535

+0.001   <   . 1:1(0) ack 98001 win 65535

+0.001   <   . 1:1(0) ack 100001 win 65535
+0.010 close(4) = 0
+0 > F. 100001:100001(0) ack 1
+0.1 < F. 1:1(0) ack 100002 win 65535

`
`

With the fix, the exchange then looks like this - not that there are no duplicate packets any more, and a 2nd round of SACKs sind the trailing packets are simulated to be lost:

//
// A simple test to verify behavior when
// during SACK LR, an RTO happens - and packets
// are sent twice - once from the sackhole, once
// from the pulled back snd_nxt - D43355
//
// Runs clean against FreeBSD 15.0-CURRENT #16 D43355_sack_after_rto-n272715-073145f14da4
//
--tolerance_usecs=25000

0.0 `
ifconfig tun0 tso
sysctl net.inet.tcp.sack.lrd=0
sysctl net.inet.tcp.do_prr=0
sysctl net.inet.tcp.sack.revised=1
sysctl net.inet.tcp.sack.enable=1
sysctl net.inet.tcp.cc.algorithm=cubic
sysctl net.inet.tcp.hostcache.purgenow=1
`
// Create a listening TCP socket.

0.1 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0.001 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0.001 setsockopt(3, SOL_SOCKET, SO_SNDBUF, [1048576], 4) = 0
+0.001 setsockopt(3, SOL_SOCKET, SO_DEBUG, [1], 4) = 0
+0.005 bind(3, ..., ...) = 0
+0.005 listen(3, 1) = 0

// Establish a TCP connection.
+0.035 < S  0:0(0) win 65535 <mss 1000, sackOK, wscale 9, nop, nop, nop >
+0.000 > S. 0:0(0) ack 1 win 65535 <...>
+0.000 <  . 1:1(0) ack 1 win 65535
+0.000 accept(3, ..., ...) = 4

+0.01 setsockopt(4, SOL_SOCKET, SO_DEBUG, [1], 4) = 0
0.2  write(4, ..., 100000) = 100000

+0.0     >  .     1:1001(1000) ack 1
+0.0     >  .  1001:2001(1000) ack 1
+0.0     >  .  2001:3001(1000) ack 1
+0.0     >  .  3001:4001(1000) ack 1
+0.0     >  .  4001:5001(1000) ack 1
+0.0     >  .  5001:6001(1000) ack 1
+0.0     >  .  6001:7001(1000) ack 1
+0.0     >  .  7001:8001(1000) ack 1
+0.0     >  .  8001:9001(1000) ack 1
+0.0     >  .  9001:10001(1000) ack 1
+0.001     <  .     1:1(0) ack 1001 win 65535  // ACKing every packet
+0.001     <  .     1:1(0) ack 2001 win 65535  // to built up cwnd fast
+0.001     <  .     1:1(0) ack 3001 win 65535
+0.001     <  .     1:1(0) ack 4001 win 65535
+0.001     <  .     1:1(0) ack 5001 win 65535
+0.001     <  .     1:1(0) ack 6001 win 65535
+0.001     <  .     1:1(0) ack 7001 win 65535
+0.001     <  .     1:1(0) ack 8001 win 65535
+0.001     <  .     1:1(0) ack 9001 win 65535
+0.001     <  .     1:1(0) ack 10001 win 65535
+0      >  . 10001:11001(1000) ack 1
+0      >  . 11001:12001(1000) ack 1
+0      >  . 12001:13001(1000) ack 1
+0      >  . 13001:14001(1000) ack 1
+0      >  . 14001:15001(1000) ack 1
+0      >  . 15001:16001(1000) ack 1
+0      >  . 16001:17001(1000) ack 1
+0      >  . 17001:18001(1000) ack 1
+0      >  . 18001:19001(1000) ack 1
+0      >  . 19001:20001(1000) ack 1
+0.001       <  .  1:1(0) ack 11001 win 65535
+0.001       <  .  1:1(0) ack 12001 win 65535
+0.001       <  .  1:1(0) ack 13001 win 65535
+0.001       <  .  1:1(0) ack 14001 win 65535
+0.001       <  .  1:1(0) ack 15001 win 65535
+0.001       <  .  1:1(0) ack 16001 win 65535
+0.001       <  .  1:1(0) ack 17001 win 65535
+0.001       <  .  1:1(0) ack 18001 win 65535
+0.001       <  .  1:1(0) ack 19001 win 65535
+0.001       <  .  1:1(0) ack 20001 win 65535
+0     >  . 20001:21001(1000) ack 1
+0     >  . 21001:22001(1000) ack 1
+0     >  . 22001:23001(1000) ack 1
+0     >  . 23001:24001(1000) ack 1
+0     >  . 24001:25001(1000) ack 1
+0     >  . 25001:26001(1000) ack 1
+0     >  . 26001:27001(1000) ack 1
+0     >  . 27001:28001(1000) ack 1
+0     >  . 28001:29001(1000) ack 1
+0     >  . 29001:30001(1000) ack 1
+0.001       <  .  1:1(0) ack 21001 win 65535
+0.001       <  .  1:1(0) ack 22001 win 65535
+0.001       <  .  1:1(0) ack 23001 win 65535
+0.001       <  .  1:1(0) ack 24001 win 65535
+0.001       <  .  1:1(0) ack 25001 win 65535
+0.001       <  .  1:1(0) ack 26001 win 65535
+0.001       <  .  1:1(0) ack 27001 win 65535
+0.001       <  .  1:1(0) ack 28001 win 65535
+0.001       <  .  1:1(0) ack 29001 win 65535
+0.001       <  .  1:1(0) ack 30001 win 65535
+0     >  . 30001:31001(1000) ack 1  // we "drop" the next 30 packets
+0     >  . 31001:32001(1000) ack 1
+0     >  . 32001:33001(1000) ack 1
+0     >  . 33001:34001(1000) ack 1
+0     >  . 34001:35001(1000) ack 1
+0     >  . 35001:36001(1000) ack 1
+0     >  . 36001:37001(1000) ack 1
+0     >  . 37001:38001(1000) ack 1
+0     >  . 38001:39001(1000) ack 1
+0     >  . 39001:40001(1000) ack 1

+0     >  . 40001:41001(1000) ack 1
+0     >  . 41001:42001(1000) ack 1
+0     >  . 42001:43001(1000) ack 1
+0     >  . 43001:44001(1000) ack 1
+0     >  . 44001:45001(1000) ack 1
+0     >  . 45001:46001(1000) ack 1
+0     >  . 46001:47001(1000) ack 1
+0     >  . 47001:48001(1000) ack 1
+0     >  . 48001:49001(1000) ack 1
+0     >  . 49001:50001(1000) ack 1

+0     >  . 50001:51001(1000) ack 1
+0     >  . 51001:52001(1000) ack 1
+0     >  . 52001:53001(1000) ack 1
+0     >  . 53001:54001(1000) ack 1
+0     >  . 54001:55001(1000) ack 1
+0     >  . 55001:56001(1000) ack 1
+0     >  . 56001:57001(1000) ack 1
+0     >  . 57001:58001(1000) ack 1
+0     >  . 58001:59001(1000) ack 1
+0     >  . 59001:60001(1000) ack 1

+0     >  . 60001:61001(1000) ack 1
+0     >  . 61001:62001(1000) ack 1
+0     >  . 62001:63001(1000) ack 1
+0     >  . 63001:64001(1000) ack 1
+0     >  . 64001:65001(1000) ack 1
+0     >  . 65001:66001(1000) ack 1
+0     >  . 66001:67001(1000) ack 1
+0     >  . 67001:68001(1000) ack 1
+0     >  . 68001:69001(1000) ack 1
+0     >  . 69001:70001(1000) ack 1

+0.001   <   . 1:1(0) ack 30001 win 65535 <sack 60001:61001, nop, nop>
+0     >  . 70001:71001(1000) ack 1

+0.001   <   . 1:1(0) ack 30001 win 65535 <sack 60001:62001, nop, nop>
+0     >  . 71001:72001(1000) ack 1

+0.001   <   . 1:1(0) ack 30001 win 65535 <sack 60001:63001, nop, nop>
+0     >  . 30001:31001(1000) ack 1   // we expect the retransmission around here

+0.001   <   . 1:1(0) ack 30001 win 65535 <sack 60001:64001, nop, nop>
+0     >  . 31001:32001(1000) ack 1

+0.20~+0.30  >  . 30001:31001(1000) ack 1
+0.20~+0.60  >  . 30001:31001(1000) ack 1

+0.001   <   . 1:1(0) ack 31001 win 65535 <sack 60001:65001, nop, nop>

+0     >  . 31001:32001(1000) ack 1
+0     >  . 32001:33001(1000) ack 1

+0.001   <   . 1:1(0) ack 32001 win 65535 <sack 60001:66001, nop, nop>
+0     >  . 33001:34001(1000) ack 1
+0     >  . 34001:35001(1000) ack 1

+0.001   <   . 1:1(0) ack 33001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 35001:36001(1000) ack 1
+0     >  . 36001:37001(1000) ack 1

+0.001   <   . 1:1(0) ack 34001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 37001:38001(1000) ack 1
+0     >  . 38001:39001(1000) ack 1

+0.001   <   . 1:1(0) ack 35001 win 65535 <sack 60001:67001, nop, nop>
+0     > .  39001:40001(1000) ack 1
+0     > .  40001:41001(1000) ack 1

+0.001   <   . 1:1(0) ack 36001 win 65535 <sack 60001:67001, nop, nop>
+0     > .  41001:42001(1000) ack 1
+0     > .  42001:43001(1000) ack 1

+0.001   <   . 1:1(0) ack 37001 win 65535 <sack 60001:67001, nop, nop>
+0     > .  43001:44001(1000) ack 1
+0     >  . 44001:45001(1000) ack 1

+0.001   <   . 1:1(0) ack 38001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 45001:46001(1000) ack 1
+0     >  . 46001:47001(1000) ack 1

+0.001   <   . 1:1(0) ack 39001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 47001:48001(1000) ack 1
+0     >  . 48001:49001(1000) ack 1

+0.001   <   . 1:1(0) ack 40001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 49001:50001(1000) ack 1
+0     >  . 50001:51001(1000) ack 1

+0.001   <   . 1:1(0) ack 41001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 51001:52001(1000) ack 1
+0     >  . 52001:53001(1000) ack 1

+0.001   <   . 1:1(0) ack 42001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 53001:54001(1000) ack 1
+0     >  . 54001:55001(1000) ack 1

+0.001   <   . 1:1(0) ack 43001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 55001:56001(1000) ack 1
+0     >  . 56001:57001(1000) ack 1

+0.001   <   . 1:1(0) ack 44001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 57001:58001(1000) ack 1
+0     >  . 58001:59001(1000) ack 1

+0.001   <   . 1:1(0) ack 45001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 59001:60001(1000) ack 1
+0     >  . 67001:68001(1000) ack 1

+0.001   <   . 1:1(0) ack 46001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 68001:69001(1000) ack 1
+0     >  . 69001:70001(1000) ack 1

+0.001   <   . 1:1(0) ack 47001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 70001:71001(1000) ack 1
+0     >  . 71001:72001(1000) ack 1

+0.001   <   . 1:1(0) ack 48001 win 65535 <sack 60001:67001, nop, nop>

+0.001   <   . 1:1(0) ack 49001 win 65535 <sack 60001:67001, nop, nop>

+0.001   <   . 1:1(0) ack 50001 win 65535 <sack 60001:67001, nop, nop>

+0.001   <   . 1:1(0) ack 51001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 72001:73001(1000) ack 1

+0.001   <   . 1:1(0) ack 52001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 73001:74001(1000) ack 1
+0     >  . 74001:75001(1000) ack 1

+0.001   <   . 1:1(0) ack 53001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 75001:76001(1000) ack 1
+0     >  . 76001:77001(1000) ack 1

+0.001   <   . 1:1(0) ack 54001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 77001:78001(1000) ack 1
+0     >  . 78001:79001(1000) ack 1

+0.001   <   . 1:1(0) ack 55001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 79001:80001(1000) ack 1
+0     >  . 80001:81001(1000) ack 1

+0.001   <   . 1:1(0) ack 56001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 81001:82001(1000) ack 1
+0     >  . 82001:83001(1000) ack 1

+0.001   <   . 1:1(0) ack 57001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 83001:84001(1000) ack 1
+0     >  . 84001:85001(1000) ack 1

+0.001   <   . 1:1(0) ack 58001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 85001:86001(1000) ack 1
+0     >  . 86001:87001(1000) ack 1

+0.001   <   . 1:1(0) ack 59001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 87001:88001(1000) ack 1
+0     >  . 88001:89001(1000) ack 1

+0.001   <   . 1:1(0) ack 67001 win 65535
+0     >  . 89001:90001(1000) ack 1
+0     >  . 90001:91001(1000) ack 1
+0     >  . 91001:92001(1000) ack 1
+0     >  . 92001:93001(1000) ack 1
+0     >  . 93001:94001(1000) ack 1
+0     >  . 94001:95001(1000) ack 1
+0     >  . 95001:96001(1000) ack 1
+0     >  . 96001:97001(1000) ack 1
+0     >  . 97001:98001(1000) ack 1
+0     >  . 98001:99001(1000) ack 1

+0.001   <   . 1:1(0) ack 69001 win 65535
+0     > P. 99001:100001(1000) ack 1

+0.001   <   . 1:1(0) ack 71001 win 65535

+0.001   <   . 1:1(0) ack 73001 win 65535

+0.001   <   . 1:1(0) ack 75001 win 65535

+0.001   <   . 1:1(0) ack 77001 win 65535

+0.001   <   . 1:1(0) ack 79001 win 65535

+0.001   <   . 1:1(0) ack 81001 win 65535

+0.001   <   . 1:1(0) ack 83001 win 65535

+0.001   <   . 1:1(0) ack 85001 win 65535

+0.001   <   . 1:1(0) ack 87001 win 65535

+0.001   <   . 1:1(0) ack 89001 win 65535

+0.001   <   . 1:1(0) ack 91001 win 65535

+0.001   <   . 1:1(0) ack 93001 win 65535



+0.001   <   . 1:1(0) ack 94001 win 65535

+0.001   <   . 1:1(0) ack 96001 win 65535

+0.001   <   . 1:1(0) ack 98001 win 65535

+0.001   <   . 1:1(0) ack 100001 win 65535
+0.010 close(4) = 0
+0 > F. 100001:100001(0) ack 1
+0.1 < F. 1:1(0) ack 100002 win 65535

`
`

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 55361
Build 52250: arc lint + arc unit

Event Timeline

  • set snd_recover to snd_fack on RTO
  • pull snd_nxt right during SACK after RTO
  • init recover to fack on RTO
  • pull snd_nxt right during SACK after RTO
  • only enter FastRecovery per RFC6675 when on the right edge
  • prevent retransmitting of covered SACK holes when FR activates as snd_nxt closes with snd_max
  • prevent sack-resending at nxt==max. Will still enter FR (and CC reaction)
  • Revert previous commit, as it can lead to SACK accounting panics
  • pull snd_nxt forward when transmitting from SACK hole
  • only enter FR by 6675 when at the right edge, and when after RTO, only when new SACK holes show up
  • update sack_adjust comments

With this patch, the classic (pre-SACK) loss-recovery "indication" of snd_nxt < snd_max as well as the IN_FASTRECOVERY() flags will follow the SACK transmission selection path, if appropriate. The difference being that during IN_FASTRECOVERY, the per-ACK CC reaction (cwnd adjustment) is disabled - except for PRR - while in the case of (snd_nxt < snd_max), the CC controls the congestion window (e.g. uses slow start after the RTO until reaching ssthresh, then using congestion avoidance).

tcp_sack_adjust will also inflate cwnd (what normally - IN_FASTRECOVERY - happens when calculating the pipe).

In addition, the RFC6675 trigger to enter Fast Recovery (an ACK with a SACK option covering at least 2*MSS+1 bytes) is only active while snd_nxt is at the right edge, and requires additional indications of new losses right after a Retransmission Timeout.

In order for normal SACK processing to work correctly, after successfully transmitting a packet, snd_nxt has to be dragged along - eventually becoming equal to snd_max again, thus the additional checks in the RFC6675 FR trigger.

  • when retransmitting after RTO, don't inflate cnwd to nxt-una+mss, but only cwnd+mss
  • when retransmitting from (nxt<max), use cwnd only
  • simplify 6675 FR trigger condition
This revision is now accepted and ready to land.Apr 4 2024, 12:45 PM

Your summary section claims "In addition, cwnd used to be 1 MSS right after RTO, increasing to 2 MSS more recently." But I could not find the code change where cwnd is changed to 2MSS after RTO. Please elaborate if the summary needs to be revised or I am missing the point.

This revision now requires review to proceed.Sep 18 2024, 8:08 AM
sys/netinet/tcp_input.c
2732–2733

For readability, it's better to remove () between relational operators:

					if (tp->snd_nxt == tp->snd_max &&
					    tp->t_rxtshift == 0)
2777–2780

For readability, it's better to remove () between relational operators and re-align:

	if (tcp_is_sack_recovery(tp, &to) && tp->snd_nxt == tp->snd_max &&
	    ((tp->t_rxtshift == 0 && sack_changed != SACK_NOCHANGE) ||
	     (tp->t_rxtshift > 0 && sack_changed == SACK_NEWLOSS))) {
		...
	}
sys/netinet/tcp_output.c
399

Same here for readability, it's better to remove () between relational operators:

	if (sack_bytes_rxmt == 0 || SEQ_LT(tp->snd_nxt, tp->snd_max)) {
1631–1637

Same here for readability, it's better to remove () between relational operators and re-align:

	if (error == 0 && tp->rcv_numsacks > 0 &&
	    TCPS_HAVEESTABLISHED(tp->t_state) &&
	    tp->t_flags & TF_SACK_PERMIT) {
1642–1645

Same here for readability, it's better to remove () between relational operators and re-align:

	if (error == 0 && sack_rxmit &&
	    SEQ_LT(tp->snd_nxt, SEQ_MIN(p->rxmit, p->end))) {
sys/netinet/tcp_sack.c
1090–1091

no need ()

1090–1091

no need ()

1101–1102

no need ()

1118

no need ()

1122

no need ()

1124

no need ()

sys/netinet/tcp_sack.c
1090–1091

Isn't style.9 saying:

Values in return statements should be enclosed in parentheses.
sys/netinet/tcp_sack.c
1090–1091

You are right, other tcp files also has the () in return values. I must have kept some old memory about the style.

With the provided packetdrill scripts before/after the fix, my test result is in my wiki: testD43355.

Besides the removing of dup retrans, also looks the cwnd and ssthresh both have improvement.

Other than the readability that I suggested, I have no problem with this patch. Thanks for the fix.

This revision is now accepted and ready to land.Oct 10 2024, 3:18 PM

By the way based on my test, I didn't find this statement In addition, cwnd used to be 1 MSS right after RTO, increasing to 2 MSS more recently. to be true in your SUMMARY section. Also Address this by setting up snd_recover just in cc_cong_signal. needs to be revised.