Page MenuHomeFreeBSD

tcp: make inflight data (pipe) calculation consistent
ClosedPublic

Authored by cc on Feb 18 2025, 6:13 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Mar 22, 10:53 AM
Unknown Object (File)
Tue, Mar 18, 2:27 PM
Unknown Object (File)
Sun, Mar 16, 2:09 AM
Unknown Object (File)
Thu, Mar 13, 7:28 PM
Unknown Object (File)
Wed, Mar 12, 1:20 PM
Unknown Object (File)
Wed, Mar 12, 5:02 AM
Unknown Object (File)
Wed, Mar 12, 3:37 AM
Unknown Object (File)
Tue, Mar 11, 9:28 AM

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

cc requested review of this revision.Feb 18 2025, 6:13 PM
This revision is now accepted and ready to land.Feb 18 2025, 6:47 PM

Question: why does this function returns signed value? And why some of the sackhint counters are signed?

There are several calls to tcp_compute_pipe() that did not check the V_tcp_do_newsack: 2 in tcp_output and 1 in tcp_input. Their behavior will change with revised SACK turned off. Commit message should explain why this is okay.

Question: why does this function returns signed value? And why some of the sackhint counters are signed?

There are several calls to tcp_compute_pipe() that did not check the V_tcp_do_newsack: 2 in tcp_output and 1 in tcp_input. Their behavior will change with revised SACK turned off. Commit message should explain why this is okay.

Way back to the original tcp_compute_pipe() in commit 12eeb81fc1ce, it was returning signed value, which I would think that fits into many places that require signed comparison or signed operation.
for example, in tcp_default_output()

...
	/*
	 * If we are doing retransmissions, then snd_nxt will
	 * not reflect the first unsent octet.  For ACK only
	 * packets, we do not want the sequence number of the
	 * retransmitted packet, we want the sequence number
	 * of the next unsent octet.  So, if there is no data
	 * (and no SYN or FIN), use snd_max instead of snd_nxt
	 * when filling in ti_seq.  But if we are in persist
	 * state, snd_max might reflect one byte beyond the
	 * right edge of the window, so use snd_nxt in that
	 * case, since we know we aren't doing a retransmission.
	 * (retransmit and persist are mutually exclusive...)
	 */
	if (!sack_rxmit) {
		if (len || (flags & (TH_SYN|TH_FIN)) ||
		    tcp_timer_active(tp, TT_PERSIST))
			th->th_seq = htonl(tp->snd_nxt);
		else
			th->th_seq = htonl(tp->snd_max);
	} else {
		th->th_seq = htonl(p->rxmit);
		p->rxmit += len;
		/*
		 * Lost Retransmission Detection
		 * trigger resending of a (then
		 * still existing) hole, when
		 * fack acks recoverypoint.
		 */
		if ((tp->t_flags & TF_LRD) && SEQ_GEQ(p->rxmit, p->end))
			p->rxmit = tp->snd_recover;
		tp->sackhint.sack_bytes_rexmit += len;                    <<<
	}
...

And given that struct sackhint is hard coded into tcpcb without a toggle, so I think it does no harm or no bother at all with revised SACK turned off. And also I think the toggle V_tcp_do_rfc6675_pipe or V_tcp_do_newsack should be included in tcp_compute_pipe() from the original commit to avoid such later discrepancy.

For these newly added two calls of tcp_compute_pipe() in tcp_output from commit 7dc78150c730, we may double confirm in meeting.

sys/netinet/cc/cc.c
405–410

Missing this part's update.

re-base, add a missing part and update based on Richard's comment in the meeting

This revision now requires review to proceed.Fri, Feb 21, 4:01 PM
sys/netinet/cc/cc.c
410

This old inflight data (pipe) calculation logic could be traced back to the
commit 46f584823798 nearly 25 years ago.

So now we will replace it with a better logic in tcp_compute_pipe().

commit 46f5848237983d7c89140373b8350ec08573b29d
Author: Jonathan Lemon <jlemon@FreeBSD.org>
AuthorDate: Sat May 6 03:31:09 2000 +0000
Commit: Jonathan Lemon <jlemon@FreeBSD.org>
CommitDate: Sat May 6 03:31:09 2000 +0000

Implement TCP NewReno, as documented in RFC 2582.  This allows
better recovery for multiple packet losses in a single window.
The algorithm can be toggled via the sysctl net.inet.tcp.newreno,
which defaults to "on".

Submitted by:  Jayanth Vijayaraghavan <jayanth@yahoo-inc.com>

... in tcp_input.c:

+                } else if (tp->t_dupacks >= tcprexmtthresh &&
+		    !tcp_newreno(tp, th)) {
+                        /*
+                         * Window inflation should have left us with approx.
+                         * snd_ssthresh outstanding data.  But in case we
+                         * would be inclined to send a burst, better to do
+                         * it via the slow start mechanism.
+                         */
+			if (SEQ_GT(th->th_ack + tp->snd_ssthresh, tp->snd_max))
+                                tp->snd_cwnd =
+				    tp->snd_max - th->th_ack + tp->t_maxseg;    <<<<
+			else
+                        	tp->snd_cwnd = tp->snd_ssthresh;
+                        tp->t_dupacks = 0;
+                }

I would suggest to leave the old calculation in place which is only active in the non-default (IIRC) newsack=0 case, but collect all the other instances in one place.

Investigating the impact of changing these 3 locations may warrant dedicated Diffs and validation.

sys/netinet/cc/cc.c
410

Thanks.
Since this was special, let's leave it withoug newsack sysctl in the old stlye; a behavor change (without newsack) of this codepath should be looked at in a dedicated Diff IMHO.

sys/netinet/cc/cc_cubic.c
535–539

Same here..

sys/netinet/cc/cc_htcp.c
385

And here.

code update based on Richard's comments

Didn't see any surprising regression from my test result for this patch:
testD49047

This revision was not accepted when it landed; it landed in state Needs Review.Fri, Feb 28, 8:56 PM
This revision was automatically updated to reflect the committed changes.