
tcp: revert rxtshift too on a spurious timeout (RTO)
Needs ReviewPublic

Authored by rscheff on Fri, Jan 24, 12:07 PM.
Tags
None

Details

Reviewers
glebius
cc
tuexen
Group Reviewers
transport
Summary

When the TCP stack detects that a retransmission timeout (RTO) was
spurious, many state variables are reverted back to their
previous values. However, the counter tracking how many consecutive
retransmission timeouts have occurred (t_rxtshift) was missed and
retains its inflated value.

This can lead to an incorrect calculation of the congestion
window during Limited Transmit, as it is not recomputed
but only extended - possibly allowing more than one segment
to be transmitted for each limited transmit iteration.

Clearing the counter tracking the number of consecutive RTOs
will ensure that the calculation of the congestion window
during limited transmit is correct.

PR: 282605
MFC after: 3 days

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 61974
Build 58858: arc lint + arc unit

Event Timeline

Wouldn't the RTO after an erroneous RTO be computed incorrectly? Let me write a test for that...

Well, if the RTO was indeed spurious, then the subsequent RTO timeout should be the base RTO timeout (again). So only a false detection of a spurious timeout would (incorrectly) set the cadence to a slightly faster retransmission schedule in subsequent rounds... But since a spurious RTO by definition shouldn't have been triggered anyway, I don't believe this causes problematic behavior.

I am just saying: After a spurious RTO, the next RTO should be the same, not doubled. This can easily be tested. Such a test would identify the problem your fix addresses... Much more directly than testing limited transmit...

With this patch I caught this:

20250127 16:13:45 all (2/4): tcp2.sh
Expensive callout(9) function: 0xffffffff80d71390(0xfffff80329f81000) 0.010966104 s
Expensive callout(9) function: 0xffffffff80d71390(0xfffff805e03af540) 0.038594659 s
panic: sacked_bytes < 0
cpuid = 7
time = 1737991404
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01081cc920
vpanic() at vpanic+0x136/frame 0xfffffe01081cca50
panic() at panic+0x43/frame 0xfffffe01081ccab0
tcp_sack_doack() at tcp_sack_doack+0x7e4/frame 0xfffffe01081ccb40
tcp_do_segment() at tcp_do_segment+0x2242/frame 0xfffffe01081ccc20
tcp_input_with_port() at tcp_input_with_port+0x10f8/frame 0xfffffe01081ccd70
tcp_input() at tcp_input+0xb/frame 0xfffffe01081ccd80
ip_input() at ip_input+0x28d/frame 0xfffffe01081ccde0
swi_net() at swi_net+0x19b/frame 0xfffffe01081cce60
ithread_loop() at ithread_loop+0x266/frame 0xfffffe01081ccef0
fork_exit() at fork_exit+0x82/frame 0xfffffe01081ccf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe01081ccf30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 12 tid 100091 ]
Stopped at      kdb_enter+0x33: movq    $0,0x10503c2(%rip)
db>

Hi Peter,
thanks for letting us know that one can also trigger these issues with the TCP stress tester. Will try to reproduce it with and without fixing the root cause. There might be more than one problem and I prefer to fix them individually instead of avoiding them altogether...