Page MenuHomeFreeBSD

Only rewind erraneous RTO when previous values are truly valid
ClosedPublic

Authored by rscheff on Jan 21 2022, 6:30 AM.
Tags
None
Referenced Files
F108805852: D33979.diff
Tue, Jan 28, 3:24 AM
F108691679: D33979.id.diff
Mon, Jan 27, 8:34 AM
Unknown Object (File)
Sat, Jan 25, 1:34 AM
Unknown Object (File)
Fri, Jan 24, 11:52 AM
Unknown Object (File)
Fri, Jan 24, 10:44 AM
Unknown Object (File)
Sun, Jan 19, 2:37 PM
Unknown Object (File)
Fri, Jan 10, 5:10 PM
Unknown Object (File)
Dec 27 2024, 12:59 PM

Details

Summary

Under rare circumstances, a spurious retranmission is
incorrectly detected and rewound, messing up various tcpcb values,
which can lead to a panic when SACK is in use.

This was tracked down to happen in this section. This happens
when there has been a retransmission with the previous state
recorded, and on subsequent regular transmissions, TF_PREVVALID
is never cleared.

In the transport call we discussed this, and while this should be
considered as a workaround, the longer term plan is to find in which
codepath the PREVVALID flag is not cleared, and ultimately rely solely
on this flag. Note that this will be a minuscle change in semantics of this
flag, and therefore, it should also be invalideated whenever the TCP
stack changes on a session.
The alternate approach would be to reset TF_PREVVALID whenever
tp->t_rxtshift is updated, such that TF_PREVVALID is only ever
set, when rxtshift is also 1. This would reduce the currently
necessary check for both states.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 44089
Build 40977: arc lint + arc unit

Event Timeline

Looks good to me. Thanks.

sys/netinet/tcp_input.c
1651–1654

As long as you are fixing things, to_tsecr and t_badrxtwin are timestamps not sequence numbers
so though its the same we really should be using

TSTMP_LT

not

SEQ_LT

  • tidying up all the conditionals, where the base stack unwinds RTO use TSTMP_LT, after checking that t_badrxtwin holds a valid value throughout.
  • check for bad retransmits only when TSopt is present in ACK
sys/netinet/tcp_input.c
2892

Is this CC_RTO_ERR handling a DUP of that in line# 1655, given a fact that both looks to be under the same conditions in this change?

sys/netinet/tcp_input.c
2892

I believe the section around #1655 is only relevant for pure ACKs, while here we also process piggy-backed ACKs with data. As calling CC_RTO_ERR resets the PREVVALID flag, is it safe to run over this twice.

If the question was around a more streamlined approach, rather than checking to unwind erraneous RTOs in different places, that would be a change of much larger scope.

sys/netinet/tcp_input.c
2892

On the contrary, the comment just on top of line# 1655 shows "Parse options on any incoming segment.", and my read on line# 1801 shows "This is a pure ack for outstanding data." on top of the non-timestamp case.

If the "Parse options on any incoming segment." place still stands true, and D8556 adds this CC_RTO_ERR handling on any segment with timestamp just below it for years, we may merge all the three places of CC_RTO_ERR handling into a single place as a more streamlined approach.

Thoughts?

  • exclude tsecr == 0, since that likely flags an invalid ts echo return
  • adding Note around the (current) use of the PREVVALID flag
This revision is now accepted and ready to land.Jan 27 2022, 5:11 PM