This is reported by Bhaskar Pardeshi from VMware.
Details
- Reviewers
rscheff tuexen - Group Reviewers
transport - Commits
- rGf5abdb03119a: tcp: fix a bug where unshifting should be put last in tcp_get_srtt()
Verified with siftr
analysis for example:
old minimal srtt_usecs = (1 >> TCP_RTT_SHIFT) * tick = (1 >> 5) * 1000 = 0
new minimal srtt_usecs = (1 * tick) >> TCP_RTT_SHIFT = (1 * (1000)) >> 5 = 31
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Not Applicable - Unit
Tests Not Applicable
Event Timeline
sys/netinet/tcp_subr.c | ||
---|---|---|
4665 | I think you should add a comment that you use srtt = TICKS_2_USEC(tp->t_srtt) >> TCP_RTT_SHIFT instead of srtt = TICKS_2_USEC(tp->t_srtt >> TCP_RTT_SHIFT) to increase the precision. While doing this, you might want to replace the double space after = by a single one. |
For what its worth...
While this does indeed fix a bug where we lose precision, it also reduces the maximum SRTT value which can be stored in ticks without risking an overflow during the TICKS_2_USEC conversion.
Previously, the value only overflowed when (srtt >> 5) exceeded (UINT32_MAX / hz). Now, the value will overflow if the unshifted srtt exceeds (UINT32_MAX / hz).
Where hz == 1000, this means we will now overflow if srtt > 4,294,967, which translates to roughly 134.217 seconds. Previously, with hz == 1000, we would overflow if srtt > 137,438,975, which translates to roughly 4,294.967 seconds.
It seems unlikely we would see an srtt of over two minutes; however, it is more likely we would see that than an srtt of over 1 hour. Also, using hz values over 1000 (does anyone do this?) will cause the code to be more susceptible to an overflow.
I'm not suggesting a change at this point. Rather, I'm making an observation in case something strange pops up in the future or someone knows of a use case where this will be a problem.
(If necessary, the fix would be to calculate the srtt using a uint64_t and truncate it to uint32_t when returning it.)