Page MenuHomeFreeBSD

tcp: Add hash histogram output and validate bucket length accounting
ClosedPublic

Authored by rscheff on Mar 31 2021, 6:41 AM.
Tags
None
Referenced Files
F108518777: D29506.id86682.diff
Sat, Jan 25, 8:14 PM
Unknown Object (File)
Sat, Jan 18, 5:52 PM
Unknown Object (File)
Sun, Jan 12, 12:59 PM
Unknown Object (File)
Sat, Jan 11, 9:10 AM
Unknown Object (File)
Sat, Jan 11, 9:06 AM
Unknown Object (File)
Sat, Jan 11, 8:55 AM
Unknown Object (File)
Sat, Jan 11, 8:52 AM
Unknown Object (File)
Sat, Jan 11, 6:19 AM
Subscribers

Details

Summary

The current hostcache hashing mechanism apparently
ends up with an very uneven distribution of the
entries into the hostcache hash array.

This was observed by adding a histogram output
function.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 38207
Build 35096: arc lint + arc unit

Event Timeline

  • only the histogram output
rscheff retitled this revision from tcp: Add hostcache hashbucket histogram, and sanitize counters when purging all entries to tcp: Add hostcache hashbucket histogram.Mar 31 2021, 6:31 PM
rscheff edited the summary of this revision. (Show Details)
rscheff added reviewers: thj, rrs, kbowling, jtl, jhb.

I'm contemplating if it makes sense to just provide the V_tcp_hostcache.hashbase[i].hch_length for each bucket and allow to user to do whatever analysis is wanted. I think the focus should be an admin using the system, not a developer debugging it. So we don't need to to count the entries and provide both information. I guess that is only interesting for us right now...

  • remove validating hch_length vs. TALQ members

I'm contemplating if it makes sense to just provide the V_tcp_hostcache.hashbase[i].hch_length for each bucket and allow to user to do whatever analysis is wanted. I think the focus should be an admin using the system, not a developer debugging it. So we don't need to to count the entries and provide both information. I guess that is only interesting for us right now...

Indeed, fair point. removing the (costly) validation against the actual bucket (tailq) length, since the lock around that counter is highly unlikely to have contributed to an issue for hostcache.count.

sys/netinet/tcp_hostcache.c
722

I think you need V_tcp_hostcache.bucket_limit + 1 here.

732

The check needs to be <= instead of <. For consistency you might want to get the mutex.

733

What using a KASSERT V_tcp_hostcache.hashbase[i].hch_length <= V_tcp_hostcache.bucket_limit and then just update the array.

741

You need <= instead of <.

rscheff marked 4 inline comments as done.
  • missed and off-by-one for the histogram size
sys/netinet/tcp_hostcache.c
722

Indeed.

732

I want to avoid the cost of the LOCK/UNLOCK, at the cost of absolute accuracy. Similar to the hostcache.list - which will dump the contents of each hashindex at different times... The idea for this histogram is more to give a sense of the hash usage distribution (which doesn't need to be perfectly accurate). E.g. the jenkins_hash32 offers the possibility to change a salt. Or at some future point, a dynamic resizing (hash array width vs. depth) could be thought about.

733

Since this is an advisory output only, I rather hint at a non-catastrophic state (the bucket should no longer grow - but that could be monitored over time, if detected), than stopping the system. If such a invalid state needs to be frozen and analyzed, there are other ways to achieve this already.

sys/netinet/tcp_hostcache.c
733

My point is that I consider the occurrence of hch_length > V_tcp_hostcache.bucket_limit a bug in the kernel. So I would put a message on the console, not include it in the sysctl output...

sys/netinet/tcp_hostcache.c
717

Is this necessary if you would move the sbuf_new_for_sysctl() call down to be just before the first sbuf_printf() call?

729

Can't this be moved down after the computation of the histogram?

rscheff marked 3 inline comments as done.
  • wiring the sbuf is not needed when no lock is held
sys/netinet/tcp_hostcache.c
712

I guess this needs to be unsigned int, since hch_length is unsigned.

788

This part of the condition is always true, since hch_length is an unsigned int.

788–790

I would move the KASSERT up, just before it is decremented and check that it is positive and not exceeding the limit.

Please note that you are not only adding the histogram features anymore, but also added verification code that hch_length is sane. You might to to reflect that in the commit message.

  • do more complete validation of hch_length
rscheff retitled this revision from tcp: Add hostcache hashbucket histogram to tcp: Add hash histogram output and validate bucket length accounting.Apr 1 2021, 12:41 PM
This revision is now accepted and ready to land.Apr 1 2021, 12:42 PM