Page MenuHomeFreeBSD

tcp: fix a bug where unshifting should be put last in tcp_get_srtt()
ClosedPublic

Authored by cc on May 25 2023, 5:19 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Jan 9, 5:53 PM
Unknown Object (File)
Sun, Dec 29, 4:33 AM
Unknown Object (File)
Nov 25 2024, 7:31 AM
Unknown Object (File)
Nov 24 2024, 12:59 PM
Unknown Object (File)
Nov 23 2024, 6:06 AM
Unknown Object (File)
Nov 21 2024, 8:19 AM
Unknown Object (File)
Nov 20 2024, 6:34 AM
Unknown Object (File)
Nov 19 2024, 11:01 AM

Details

Summary

This is reported by Bhaskar Pardeshi from VMware.

Test Plan

Verified with siftr

analysis for example:
old minimal srtt_usecs = (1 >> TCP_RTT_SHIFT) * tick = (1 >> 5) * 1000 = 0
new minimal srtt_usecs = (1 * tick) >> TCP_RTT_SHIFT = (1 * (1000)) >> 5 = 31

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

cc requested review of this revision.May 25 2023, 5:19 PM
sys/netinet/tcp_subr.c
4665

I think you should add a comment that you use srtt = TICKS_2_USEC(tp->t_srtt) >> TCP_RTT_SHIFT instead of srtt = TICKS_2_USEC(tp->t_srtt >> TCP_RTT_SHIFT) to increase the precision. While doing this, you might want to replace the double space after = by a single one.

Add comment above this fix to explain why it can improve precision.

cc marked an inline comment as done.May 25 2023, 10:27 PM
This revision is now accepted and ready to land.May 26 2023, 7:42 AM

For what its worth...

While this does indeed fix a bug where we lose precision, it also reduces the maximum SRTT value which can be stored in ticks without risking an overflow during the TICKS_2_USEC conversion.

Previously, the value only overflowed when (srtt >> 5) exceeded (UINT32_MAX / hz). Now, the value will overflow if the unshifted srtt exceeds (UINT32_MAX / hz).

Where hz == 1000, this means we will now overflow if srtt > 4,294,967, which translates to roughly 134.217 seconds. Previously, with hz == 1000, we would overflow if srtt > 137,438,975, which translates to roughly 4,294.967 seconds.

It seems unlikely we would see an srtt of over two minutes; however, it is more likely we would see that than an srtt of over 1 hour. Also, using hz values over 1000 (does anyone do this?) will cause the code to be more susceptible to an overflow.

I'm not suggesting a change at this point. Rather, I'm making an observation in case something strange pops up in the future or someone knows of a use case where this will be a problem.

(If necessary, the fix would be to calculate the srtt using a uint64_t and truncate it to uint32_t when returning it.)