Once a certain period has passed since a TCP sender last
received network feedback on its half-connection, the
congestion window is supposed to be reset to the initial
window.
Until now, t_rcvtime, which is updated for every incoming
segment (pure ACKs as well as data), was used as a proxy
for when the last (data) transmission was performed. This
works fine for sessions doing mostly bulk transfers in a
single direction. However, the approach fails for
transactional IO, where the server repeatedly transmits
large chunks of data in response to client requests that
arrive with variable pauses in between.
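
For reference, the idle detection in tcp_output() has
roughly taken the following shape (a simplified sketch,
not the verbatim code; names such as t_rxtcur and
cc_after_idle() are how I understand the current stack):

    /*
     * Sketch of the previous idle check: t_rcvtime is
     * refreshed by any incoming segment, so a pure-ACK
     * request also counts as "recent activity" and
     * suppresses the cwnd reset.
     */
    idle = (tp->t_flags & TF_LASTIDLE) ||
        (tp->snd_max == tp->snd_una);
    if (idle && (ticks - tp->t_rcvtime) >= tp->t_rxtcur)
        cc_after_idle(tp);  /* restart from initial cwnd */
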
In that case, the incoming request would effectively reset
t_rcvtime, and the sender would retain the last value of
its congestion window, however large that may have been.
Ultimately, this results in a large burst of data being
transmitted blindly into the network at wirespeed, without
regard for potentially changed network conditions, which
can significantly exacerbate any induced packet losses.
In this Diff, the existing RTT sampling mechanism is used
to obtain more appropriate timestamps of when the last
data segment was sent, and the check whether an RTT sample
is currently running is moved from looking at t_rtttime to
t_rtseq.
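
In sketch form (illustrative only; the actual Diff may
differ in detail), the idle check keys off the timestamp
of the last timed data transmission, while the "sample in
progress" test keys off the recorded sequence number:

    /* Idle check against the last data transmission
     * rather than the last received segment (sketch). */
    if (idle && (ticks - tp->t_rtttime) >= tp->t_rxtcur)
        cc_after_idle(tp);

    /* Sample completion gated on t_rtseq, so t_rtttime
     * can keep holding the send timestamp (sketch). */
    if (tp->t_rtseq != 0 && SEQ_GT(th->th_ack, tp->t_rtseq)) {
        tcp_xmit_timer(tp, ticks - tp->t_rtttime);
        tp->t_rtseq = 0;
    }
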
Further, we also slightly adjust these variables in case
they happen to be zero when a new sample is started.
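
For example (a hypothetical sketch; "startseq" stands in
for whatever sequence number the new sample is armed
with):

    /* Arming a new RTT sample (sketch): avoid leaving
     * either field at 0, since 0 is overloaded to mean
     * "no timestamp" / "no sample running". */
    tp->t_rtttime = (ticks != 0) ? ticks : 1;
    tp->t_rtseq = (startseq != 0) ? startseq : 1;
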
There is a minuscule chance that a dramatically delayed
RTT sample is collected when a data segment happens to end
with an absolute sequence number of zero (as that would
not stop the RTT sample immediately) and, at that very
moment, no further data is exchanged until a much later
time. However, this would always be a transient effect, as
sRTT and RTTvar will quickly converge to appropriate
values again, and the excessive timeout value may never be
used at all.
Reported-by: rrs