o Retire the global, periodically running ipreass_slowtimo().
o Instead use one callout entry per hash slot. The per-slot callout
  is scheduled only when a slot has entries, and is driven
  by the TTL of the slot's last entry.
o Make net.inet.ip.fragttl read/write.
o Retire IPFRAGTTL, which was meaningful only with PR_SLOWTIMO.
Details
- Reviewers: markj, tuexen
- Group Reviewers: network
- Commits: rGa30cb3158900: ip_reass: retire ipreass_slowtimo() in favor of per-slot callout

Diff Detail
- Repository: rS FreeBSD src repository - subversion
- Lint: Lint Passed
- Unit: No Test Coverage
- Build Status: Buildable 47037 Build 43926: arc lint + arc unit

Event Timeline
This passes tests/netinet/ip_reass_test, and I also smashed it with a small program @jtl wrote for testing the FreeBSD-SA-18:10 fixes.
sys/netinet/ip_reass.c
- Line 182: Thanks! I forgot to set the flag; I thought it was already the default. IMHO, for this kind of sysctl we don't need any protection, since a non-atomic store to an int can't produce an invalid value on any platform: the result is either the value set by thread A or the value set by thread B. However, I added an atomic store in the function anyway.
- Line 618: Oops, my bad, thanks for noticing!
The diff seems ok to me, but I don't really understand why we need one callout per hash bucket. It's an extra 64KB per VNET, which isn't exactly cheap. I can see why it's conceptually simpler to implement this way, since you have a per-bucket mutex to synchronize the callout, but it still feels like overkill. I don't have any other comments on the patch itself.
IMHO, the hash size is an overcompensation: after a very relaxed implementation that led to security vulnerabilities, we took an overly conservative approach when fixing them. See a967df1c8ff457891d85a5c6ca6f9c3c861a85eb. It looks like we need more opinions here.
For what it is worth, here are my 2c:
@glebius and I discussed this earlier this week. His motivation for making this change is to reduce the CPU overhead of walking all the buckets on the system when most are usually empty. (If there is even one fragment queued anywhere, the current code walks all buckets each time the callout runs.) On the whole, that seems like a win as long as the CPU overhead of dealing with a DoS attack is not appreciably higher; he ran a test to confirm this code handles a DoS attack well.
We talked about some of the alternatives for reducing memory usage, but they have tradeoffs:
- You could decrease the size of the hash table, but that makes you more susceptible to a DoS attack.
- You could achieve this same goal without a callout per bucket (and @glebius and I discussed at least one option for that), but that has a tradeoff in terms of increased code complexity.
At this point, I'm inclined to say that this memory usage is a reasonable tradeoff for decreasing code complexity. However, I will readily admit that I am making assumptions based on the common use cases with which I am most familiar. At some point, it might be worth trying to come to an opinion as a project about how we think about systems that have constrained memory, CPU, etc. and what sorts of tradeoffs we are willing to make.
Again, this is just my opinion as a FreeBSD developer. I'm happy to hear alternate opinions.