o Retire the global, periodically running ipreass_slowtimo().
o Instead use one callout entry per hash slot. The per-slot callout
  is scheduled only when a slot has entries, and is driven
  by the TTL of the slot's last entry.
o Make net.inet.ip.fragttl read/write.
o Retire IPFRAGTTL, which was meaningful only with PR_SLOWTIMO.
Details
- Reviewers: markj, tuexen
- Group Reviewers: network
- Commits: rGa30cb3158900: ip_reass: retire ipreass_slowtimo() in favor of per-slot callout

Diff Detail
- Repository: rS FreeBSD src repository - subversion
- Lint: Lint Passed
- Unit: No Test Coverage
- Build Status: Buildable 47037 Build 43926: arc lint + arc unit

Event Timeline
This passes tests/netinet/ip_reass_test, and I also smashed it with a small program @jtl wrote for testing the FreeBSD-SA-18:10 fixes.
sys/netinet/ip_reass.c
- Line 182: Thanks! I forgot to set the flag; I thought it was already the default. IMHO, for this kind of sysctl we don't need any protection, since a non-atomic store to an int can't produce an invalid value on any platform: the result is either the value set by thread A or the value set by thread B. However, I added an atomic store in the function anyway.
- Line 618: Oops, my bad, thanks for noticing!
The diff seems ok to me, but I don't really understand why we need one callout per hash bucket. It's an extra 64KB per VNET, which isn't exactly cheap. I can see why it's conceptually simpler to implement this way, since you have a per-bucket mutex to synchronize the callout, but it still feels like overkill. I don't have any other comments on the patch itself.
IMHO, the hash size is an overcompensation: after a very relaxed implementation that led to security vulnerabilities, we took an overly conservative approach when fixing them. See a967df1c8ff457891d85a5c6ca6f9c3c861a85eb. It looks like we need more opinions here.
For what it is worth, here are my 2c:
@glebius and I discussed this earlier this week. His motivation for making this change is to reduce the CPU overhead of walking all the buckets on the system when most are usually empty. (If there is even one fragment queued anywhere, the current code walks all buckets each time the callout runs.) On the whole, that seems like a win as long as the CPU overhead of dealing with a DoS attack is not appreciably higher; he ran a test to confirm this code handles a DoS attack well.
We talked about some of the alternatives for reducing memory usage, but they have tradeoffs:
- You could decrease the size of the hash table, but that makes you more susceptible to a DoS attack.
- You could achieve this same goal without a callout per bucket (and @glebius and I discussed at least one option for that), but that has a tradeoff in terms of increased code complexity.
At this point, I'm inclined to say that this memory usage is a reasonable tradeoff for decreasing code complexity. However, I will readily admit that I am making assumptions based on the common use cases with which I am most familiar. At some point, it might be worth trying to come to an opinion as a project about how we think about systems that have constrained memory, CPU, etc. and what sorts of tradeoffs we are willing to make.
Again, this is just my opinion as a FreeBSD developer. I'm happy to hear alternate opinions.