
ip_reass: retire ipreass_slowtimo() in favor of per-slot callout
Closed, Public

Authored by glebius on Aug 21 2022, 2:48 PM.

Details

Summary

o Retire the global, periodically running ipreass_slowtimo().
o Instead, use one callout per hash slot. The per-slot callout is
scheduled only while its slot has entries, and is driven by the
TTL of the very last entry (see the sketch after this list).
o Make net.inet.ip.fragttl read/write.
o Retire IPFRAGTTL, which was meaningful only with PR_SLOWTIMO.
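
For illustration, a minimal sketch of how such a per-slot callout could look. All identifiers here (ipqbucket, ipq_expire, ipreass_reschedule, ipreass_callout, ipq_free) are assumptions for the sketch, not necessarily the committed code:

```
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/callout.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/queue.h>
#include <sys/time.h>

/* Minimal stand-in for the reassembly queue entry (hypothetical). */
struct ipq {
	TAILQ_ENTRY(ipq) ipq_list;
	time_t		ipq_expire;	/* absolute deadline, time_uptime based */
};

/* One callout per hash slot, synchronized by the per-slot mutex. */
struct ipqbucket {
	TAILQ_HEAD(ipqhead, ipq) head;	/* newest first, oldest last */
	struct mtx	lock;		/* passed to callout_init_mtx() */
	struct callout	timer;
};

static void	ipreass_callout(void *);
static void	ipq_free(struct ipqbucket *, struct ipq *);	/* hypothetical */

/*
 * Re-arm or stop the slot's callout; called with the slot lock held
 * whenever an entry is inserted or removed.
 */
static void
ipreass_reschedule(struct ipqbucket *bucket)
{
	struct ipq *fp;
	time_t delta;

	mtx_assert(&bucket->lock, MA_OWNED);
	if ((fp = TAILQ_LAST(&bucket->head, ipqhead)) == NULL) {
		/* Slot is empty: no timer needs to run at all. */
		callout_stop(&bucket->timer);
		return;
	}
	delta = fp->ipq_expire - time_uptime;
	if (delta < 1)
		delta = 1;		/* already expired; fire ASAP */
	callout_reset_sbt(&bucket->timer, SBT_1S * delta, SBT_1S,
	    ipreass_callout, bucket, 0);
}

/* Expire timed-out entries, then re-arm for the new oldest entry. */
static void
ipreass_callout(void *arg)
{
	struct ipqbucket *bucket = arg;
	struct ipq *fp;

	/* The bucket lock is held here because of callout_init_mtx(). */
	mtx_assert(&bucket->lock, MA_OWNED);
	while ((fp = TAILQ_LAST(&bucket->head, ipqhead)) != NULL &&
	    fp->ipq_expire <= time_uptime)
		ipq_free(bucket, fp);
	ipreass_reschedule(bucket);
}
```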

Diff Detail

Repository: rS FreeBSD src repository - subversion
Lint: Lint Passed
Unit: No Test Coverage
Build Status: Buildable 47037, Build 43926: arc lint + arc unit

Event Timeline

This passes tests/netinet/ip_reass_test, and I also smashed it with a small program @jtl wrote for testing the FreeBSD-SA-18:10 fixes.

markj added inline comments.

sys/netinet/ip_reass.c:182

Or else add some synchronization mechanism and set CTLFLAG_MPSAFE.

sys/netinet/ip_reass.c:618

t is unused.

sys/netinet/ip_var.h:64

"ttl" is a misleading name; this field really stores the expiry time.

glebius added inline comments.

sys/netinet/ip_reass.c:182

Thanks! I forgot to set the flag; I was thinking it was already the default. IMHO, for this kind of sysctl we don't need any protection, since a non-atomic store to an int can't produce an invalid value on any platform: the result is either the value set by thread A or the value set by thread B. However, I added an atomic store in the function anyway.
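
As a rough sketch of what an MPSAFE read/write net.inet.ip.fragttl handler with an atomic store could look like (the handler name, the V_ipfragttl variable, the default, and the [1, MAXTTL] bounds are illustrative assumptions, not necessarily the committed code):

```
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/sysctl.h>
#include <machine/atomic.h>
#include <net/vnet.h>
#include <netinet/ip.h>			/* MAXTTL */

VNET_DEFINE(u_int, ipfragttl) = 60;	/* illustrative default, seconds */
#define	V_ipfragttl	VNET(ipfragttl)

static int
sysctl_fragttl(SYSCTL_HANDLER_ARGS)
{
	u_int ttl;
	int error;

	ttl = V_ipfragttl;
	error = sysctl_handle_int(oidp, &ttl, 0, req);
	if (error != 0 || req->newptr == NULL)
		return (error);
	if (ttl < 1 || ttl > MAXTTL)
		return (EINVAL);
	/* A single atomic store publishes a consistent value lock-free. */
	atomic_store_int(&V_ipfragttl, ttl);
	return (0);
}
SYSCTL_PROC(_net_inet_ip, OID_AUTO, fragttl,
    CTLTYPE_UINT | CTLFLAG_VNET | CTLFLAG_RW | CTLFLAG_MPSAFE,
    NULL, 0, sysctl_fragttl, "IU",
    "IP fragment lifetime on the reassembly queue, seconds");
```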

sys/netinet/ip_reass.c:618

Oops, my bad, thanks for noticing!

glebius marked 2 inline comments as done.

Address Mark's comments.

The diff seems ok to me, but I don't really understand why we need one callout per hash bucket. It's an extra 64KB per VNET, which isn't exactly cheap. I can see why it's conceptually simpler to implement this way, since you have a per-bucket mutex to synchronize the callout, but it still feels like overkill. I don't have any other comments on the patch itself.
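
(For scale: assuming 1024 hash buckets — IPREASS_NHASH — and a struct callout of roughly 64 bytes on 64-bit platforms, 1024 × 64 B = 64 KB per VNET. The exact figure depends on the hash size and the platform ABI.)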


IMHO, the hash size is an overcompensation for a very relaxed implementation that led to security vulnerabilities. After fixing those, we took an overly conservative approach. See a967df1c8ff457891d85a5c6ca6f9c3c861a85eb. Looks like we need more opinions here.

This revision was not accepted when it landed; it landed in state Needs Review. Sep 8 2022, 8:51 PM
This revision was automatically updated to reflect the committed changes.


For what it is worth, here are my 2c:

@glebius and I discussed this earlier this week. His motivation for making this change is to reduce the CPU overhead of walking all the buckets on the system, even though most are usually empty. (If there is even one fragment queued anywhere, the old code walks all buckets each time the callout runs.) On the whole, that seems like a win as long as the CPU overhead of dealing with a DoS attack is not appreciably higher. He ran a test to confirm that this code handles a DoS attack well.

We talked about some of the alternatives for reducing memory usage, but they have tradeoffs:

  • You could decrease the size of the hash table, but that makes you more susceptible to a DoS attack.
  • You could achieve this same goal without a callout per bucket (and @glebius and I discussed at least one option for that), but that has a tradeoff in terms of increased code complexity.

At this point, I'm inclined to say that this memory usage is a reasonable tradeoff for decreasing code complexity. However, I will readily admit that I am making assumptions based on the common use cases with which I am most familiar. At some point, it might be worth trying to come to an opinion as a project about how we think about systems that have constrained memory, CPU, etc. and what sorts of tradeoffs we are willing to make.

Again, this is just my opinion as a FreeBSD developer. I'm happy to hear alternate opinions.