
netgraph(4): Don't process NGQF_MESG items in NET_EPOCH context.
Closed, Public

Authored by afedorov on Sep 8 2022, 4:38 PM.
Tags
None

Details

Summary

Netgraph has two main types of message items:

  1. NGQF_DATA items are used for data processing. This is a hot path that should be called from a NET_EPOCH context.
  2. NGQF_MESG items are used for node configuration. There are many places in netgraph(4) where processing an NGQF_MESG item can call code that is forbidden in the NET_EPOCH context.

All item types can be queued and then processed by ngthread(). However, ngthread() unconditionally enters the NET_EPOCH section for all item types. This causes panics and deadlocks when processing NGQF_MESG items:

panic: epoch_wait_preempt() called in the middle of an epoch section of the same epoch
cpuid = 0
time = 1600268186
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0154b9e880
vpanic() at vpanic+0x182/frame 0xfffffe0154b9e8d0
panic() at panic+0x43/frame 0xfffffe0154b9e930
epoch_wait_preempt() at epoch_wait_preempt+0x293/frame 0xfffffe0154b9e980
if_detach_internal() at if_detach_internal+0x1ca/frame 0xfffffe0154b9e9e0
if_detach() at if_detach+0x3d/frame 0xfffffe0154b9ea00
ng_eiface_rmnode() at ng_eiface_rmnode+0x55/frame 0xfffffe0154b9ea40
ng_rmnode() at ng_rmnode+0x188/frame 0xfffffe0154b9ea70
ng_mkpeer() at ng_mkpeer+0x7b/frame 0xfffffe0154b9eac0
ng_apply_item() at ng_apply_item+0x547/frame 0xfffffe0154b9eb40
ngthread() at ngthread+0x26e/frame 0xfffffe0154b9ebb0
fork_exit() at fork_exit+0x80/frame 0xfffffe0154b9ebf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0154b9ebf0

malloc_dbg() at malloc_dbg+0xd4/frame 0xfffffe0046ffed10
malloc() at malloc+0x2d/frame 0xfffffe0046ffed50
in_getsockaddr() at in_getsockaddr+0x6a/frame 0xfffffe0046ffed80
ng_ksocket_rcvmsg() at ng_ksocket_rcvmsg+0x345/frame 0xfffffe0046ffedf0
ng_apply_item() at ng_apply_item+0x3be/frame 0xfffffe0046ffee80
ngthread() at ngthread+0x200/frame 0xfffffe0046ffeef0
fork_exit() at fork_exit+0x80/frame 0xfffffe0046ffef30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0046ffef30

So the "good enough" solution for now is to temporarily leave the NET_EPOCH section for NGQF_MESG elements.
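
To illustrate the idea, here is a minimal userland sketch, not the actual diff. It models a worker loop that stays inside an epoch section for data items and steps out of it only while a message item is applied. All names in it (epoch_model, model_epoch_enter(), model_apply_item(), model_worker_run(), ITEM_DATA, ITEM_MESG) are hypothetical stand-ins and are not the kernel API; the real ngthread() loop may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userland model only: stand-ins for NGQF_DATA / NGQF_MESG. */
enum item_type { ITEM_DATA, ITEM_MESG };

/* Hypothetical model of a single epoch section. */
struct epoch_model {
	bool in_epoch;     /* true while inside the modelled epoch section */
	int  transitions;  /* total number of enter and exit calls */
};

static void
model_epoch_enter(struct epoch_model *e)
{
	assert(!e->in_epoch);  /* no nested entry in this model */
	e->in_epoch = true;
	e->transitions++;
}

static void
model_epoch_exit(struct epoch_model *e)
{
	assert(e->in_epoch);
	e->in_epoch = false;
	e->transitions++;
}

/*
 * Returns true if the item is applied in the context it needs:
 * data items inside the epoch, message items outside of it.
 */
static bool
model_apply_item(const struct epoch_model *e, enum item_type type)
{
	return (type == ITEM_DATA) ? e->in_epoch : !e->in_epoch;
}

/*
 * Worker loop: enter the epoch once, leave it only around message
 * items. Returns the number of items applied in the wrong context.
 */
static int
model_worker_run(struct epoch_model *e, const enum item_type *items, size_t n)
{
	int bad = 0;

	model_epoch_enter(e);
	for (size_t i = 0; i < n; i++) {
		if (items[i] == ITEM_MESG) {
			model_epoch_exit(e);    /* message handlers may sleep or wait */
			bad += !model_apply_item(e, items[i]);
			model_epoch_enter(e);   /* back in for the data hot path */
		} else {
			bad += !model_apply_item(e, items[i]);
		}
	}
	model_epoch_exit(e);
	return bad;
}
```

Calling model_worker_run() on a queue that mixes both item types should return 0, with the data path paying for extra enter/exit pairs only when a message item shows up.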

P.S.: See https://reviews.freebsd.org/D36451

Diff Detail

Lint: Skipped
Unit Tests: Skipped

Event Timeline

This should fix more problems than it creates :)

This revision is now accepted and ready to land. Sep 9 2022, 4:53 AM

Not familiar with the code.
If it's safe to temporarily leave the epoch without worrying about things changing while we are outside it, then I'm OK with this.
The other option would be to enter/exit the epoch conditionally from within ng_apply_item(), but that would require entering and exiting the epoch for every data message.
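
A rough way to compare the two options is to count epoch enter/exit calls for a given queue. The sketch below is a simplified userland model under assumed loop structures, not kernel code. The names alt_item_type, transitions_leave_for_mesg() and transitions_enter_for_data() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Userland model only: stand-ins for NGQF_DATA / NGQF_MESG. */
enum alt_item_type { ALT_DATA, ALT_MESG };

/*
 * Option A (assumed shape of this revision): the worker enters the
 * epoch once and leaves it only around each message item.
 */
static int
transitions_leave_for_mesg(const enum alt_item_type *items, size_t n)
{
	int t = 1;                      /* initial enter */

	for (size_t i = 0; i < n; i++)
		if (items[i] == ALT_MESG)
			t += 2;         /* exit before, enter after */
	return t + 1;                   /* final exit */
}

/*
 * Option B (the alternative from the comment): the apply routine
 * enters and exits the epoch itself, for data items only.
 */
static int
transitions_enter_for_data(const enum alt_item_type *items, size_t n)
{
	int t = 0;

	for (size_t i = 0; i < n; i++)
		if (items[i] == ALT_DATA)
			t += 2;         /* enter and exit per data item */
	return t;
}
```

In this model, option B grows with the number of data items, while option A grows only with the number of message items, which is why keeping the epoch in the worker loop looks cheaper when data traffic dominates.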

just get it in please so that i can close the ticket internally :) thanks