ifnet: Fix the teardown process of an interface
Needs Review · Public

Authored by zlei on Mar 14 2025, 12:01 PM.

Details

Reviewers

markj
glebius
melifaro
franco_opnsense.org

Group Reviewers

network

Summary

The interface should be brought down before it is detached, but many
drivers do not do so. Ideally this should be done on the driver side, but
that requires many modifications, so let's do the job in if_detach_internal().

PR: 279653
PR: 285129
MFC after: 2 weeks

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

zlei created this revision. Mar 14 2025, 12:01 PM

Herald added subscribers: ae, imp. Mar 14 2025, 12:01 PM

zlei requested review of this revision. Mar 14 2025, 12:01 PM

This is only a partial fix. I'm cleaning up the NET_EPOCH_ENTER / NET_EPOCH_EXIT usage and trying to find all the possible races.

Typically the input / output paths should have

NET_EPOCH_ENTER
...
NET_EPOCH_EXIT
...

NET_EPOCH_ENTER
if ((ifp->if_flags & IFF_UP) == 0)
    abort processing

NET_EPOCH_EXIT
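
As a rough illustration of the second pattern above, here is a userland sketch of an input routine that re-checks IFF_UP inside the epoch section and bails out if the interface is down. It is not kernel code: NET_EPOCH_ENTER / NET_EPOCH_EXIT are stubbed as no-ops, and `struct ifnet_model` and `model_input()` are hypothetical names used only for this example.

```c
#include <assert.h>

#define IFF_UP 0x1	/* same bit value as the real flag */

/* Stubs: in the kernel these enter and leave the network epoch. */
#define NET_EPOCH_ENTER(et)	((void)(et))
#define NET_EPOCH_EXIT(et)	((void)(et))

/* Hypothetical stand-in for struct ifnet. */
struct ifnet_model {
	int		if_flags;
	unsigned long	processed;	/* packets accepted so far */
};

/* Returns 0 if the packet was processed, -1 if it was dropped. */
static int
model_input(struct ifnet_model *ifp)
{
	int et = 0;	/* placeholder for struct epoch_tracker */
	int error = 0;

	NET_EPOCH_ENTER(et);
	if ((ifp->if_flags & IFF_UP) == 0) {
		error = -1;	/* abort processing */
		goto out;
	}
	ifp->processed++;
out:
	NET_EPOCH_EXIT(et);
	return (error);
}
```

The point of the sketch is only that the flag test sits inside the protected section, so a packet that arrives after the flag has been cleared is dropped instead of touching per-interface state.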

Do you have a stress test suite you use for testing these changes?

I'll do a CFT on this change with OPNsense users next week. A few can trigger the if_afdata panics with PPPoE.

Why exactly does this fix the problem? As I understand, the specific bug is (probably) that NET_EPOCH_WAIT() is used too early. At that point, the ifnet is still visible from several global data structures, and in particular the routing tables, since rt_flushifroutes() is not called yet. From what I can see, clearing IFF_UP does not seem to be sufficient. Maybe some drivers set their link state to down when stopped, but is it guaranteed?

sys/net/if.c
1137	I'd explain a bit further that this ensures that the driver sees that IFF_UP is clear.
1139	Should we zero the structure here, just in case drivers peek at other fields besides ifr.ifr_flags and ifr.ifr_flagshigh?
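
To make the zeroing suggestion concrete, here is a small userland sketch of building a flags request from a fully cleared structure. `struct ifreq_model` is a hypothetical, trimmed-down stand-in for struct ifreq, and `model_fill_flags_request()` is an invented helper; the split of the flags into low and high 16-bit halves is an assumption about how the request is populated, not a copy of the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct ifreq. */
struct ifreq_model {
	char	ifr_name[16];
	short	ifr_flags;
	short	ifr_flagshigh;
	int	ifr_other;	/* stands in for fields a driver might peek at */
};

/* Fill a flags request, leaving every other field zeroed. */
static void
model_fill_flags_request(struct ifreq_model *ifr, int if_flags)
{
	/* Clear everything first so no stack garbage reaches the driver. */
	memset(ifr, 0, sizeof(*ifr));
	ifr->ifr_flags = (short)(if_flags & 0xffff);
	ifr->ifr_flagshigh = (short)((unsigned int)if_flags >> 16);
}
```

With the memset in place, a driver that reads a field other than the two flag members sees zero rather than whatever happened to be on the stack.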

In D49359#1125635, @glebius wrote:

Do you have a stress test suite you use for testing these changes?

I have one DTrace script and a script for the input path, ether_input_internal() / ip6_input(), and can reliably crash the kernel.

There are other crashing paths. I'm still trying to figure them all out.

In D49359#1126171, @markj wrote:

Why exactly does this fix the problem? As I understand, the specific bug is (probably) that NET_EPOCH_WAIT() is used too early.

Yes. This is only part of the fix. I'm still working on it and have found other paths that crash the kernel.

At that point, the ifnet is still visible from several global data structures, and in particular the routing tables, since rt_flushifroutes() is not called yet. From what I can see, clearing IFF_UP does not seem to be sufficient. Maybe some drivers set their link state to down when stopped, but is it guaranteed?

Yes, you're right. After reviewing the teardown logic and the netisr part, I can conclude that merely clearing IFF_UP is not sufficient. This only fixes one path, namely ether_input_internal() / ip6_input().

Well, I thought the system would at least panic when hitting this bug, but I was wrong.

Testing shows it is even possible to write to freed memory. That is,
thread A runs

			(*dp->dom_ifdetach)(ifp,
			    ifp->if_afdata[dp->dom_family]);
			ifp->if_afdata[dp->dom_family] = NULL;

but thread B sees a stale reference, i.e. ifp->if_afdata[dp->dom_family] != NULL.
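
To illustrate the race described above, here is a userland model of one ordering that avoids the stale reference: clear the published pointer first, wait for in-flight readers, and only then free the data. This is a sketch under stated assumptions, not the fix proposed in this revision. `struct ifnet_model`, `model_readers`, and all `model_*()` functions are hypothetical, and the reader counter is only a crude stand-in for the network epoch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define AF_MODEL_MAX 4

/* Hypothetical stand-in for the if_afdata[] slots of struct ifnet. */
struct ifnet_model {
	_Atomic(void *) if_afdata[AF_MODEL_MAX];
};

static atomic_int model_readers;	/* crude stand-in for epoch readers */

static void
model_ifnet_init(struct ifnet_model *ifp)
{
	for (int i = 0; i < AF_MODEL_MAX; i++)
		atomic_init(&ifp->if_afdata[i], NULL);
}

/* Reader side: register as a reader, then load the pointer once. */
static void *
model_afdata_acquire(struct ifnet_model *ifp, int af)
{
	atomic_fetch_add(&model_readers, 1);
	return (atomic_load(&ifp->if_afdata[af]));
}

static void
model_afdata_release(void)
{
	atomic_fetch_sub(&model_readers, 1);
}

/* Writer side: unpublish, wait for readers, then free. */
static void
model_afdata_detach(struct ifnet_model *ifp, int af)
{
	void *data;

	/* 1. Unpublish, so new readers can only observe NULL. */
	data = atomic_exchange(&ifp->if_afdata[af], (void *)NULL);
	/* 2. Wait until readers that may still hold the old pointer leave. */
	while (atomic_load(&model_readers) != 0)
		;
	/* 3. Only now is it safe to release the memory. */
	free(data);
}
```

In the snippet quoted above, the data is torn down by dom_ifdetach before the slot is set to NULL, so a concurrent reader can load a non-NULL pointer to memory that is already gone; the model reverses that order.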

For 285129 this still crashes in the same place: https://github.com/opnsense/src/issues/207#issuecomment-2733080313

In D49359#1126467, @franco_opnsense.org wrote:

For 285129 this still crashes in the same place: https://github.com/opnsense/src/issues/207#issuecomment-2733080313

Is that based on stable/14?

In D49359#1126495, @zlei wrote:

In D49359#1126467, @franco_opnsense.org wrote:

For 285129 this still crashes in the same place: https://github.com/opnsense/src/issues/207#issuecomment-2733080313

Is that based on stable/14?

Correct. Am I missing patches for this to make more sense?

In D49359#1126497, @franco_opnsense.org wrote:

In D49359#1126495, @zlei wrote:

In D49359#1126467, @franco_opnsense.org wrote:

For 285129 this still crashes in the same place: https://github.com/opnsense/src/issues/207#issuecomment-2733080313

Is that based on stable/14?

Correct. Am I missing patches for this to make more sense?

No, this is not complete, but your feedback is useful.

The if_vmove() logic in stable/13 is a little different from stable/14, and I have not tested on stable/13 yet. I just asked to confirm.

In D49359#1126467, @franco_opnsense.org wrote:

For 285129 this still crashes in the same place: https://github.com/opnsense/src/issues/207#issuecomment-2733080313

I tested the patch for days and it appears quite stable so far (for the input path only, of course).

Can you share the steps to reproduce GitHub issue 207? I do not understand the original reporter's steps.

In D49359#1127313, @zlei wrote:

In D49359#1126467, @franco_opnsense.org wrote:

For 285129 this still crashes in the same place: https://github.com/opnsense/src/issues/207#issuecomment-2733080313

I tested the patch for days and it appears quite stable so far (for the input path only, of course).

Can you share the steps to reproduce GitHub issue 207? I do not understand the original reporter's steps.

I assume you are not testing with a netgraph/pppoe device? The condition doesn't appear to trigger otherwise.

In D49359#1127318, @franco_opnsense.org wrote:

In D49359#1127313, @zlei wrote:

In D49359#1126467, @franco_opnsense.org wrote:

For 285129 this still crashes in the same place: https://github.com/opnsense/src/issues/207#issuecomment-2733080313

I tested the patch for days and it appears quite stable so far (for the input path only, of course).

Can you share the steps to reproduce GitHub issue 207? I do not understand the original reporter's steps.

I assume you are not testing with a netgraph/pppoe device? The condition doesn't appear to trigger otherwise.

I tested only with if_epair(4) and ure(4): the former for if_vmove() and the latter for hot plug / unplug. @bz reported a similar bug with a wireless interface, but I do not have one, so I used ure(4) instead.

For the netgraph/pppoe case, can I have a simple script or steps to reproduce?

In D49359#1126431, @zlei wrote:
Well, I thought the system would at least panic when hitting this bug, but I was wrong.

Testing shows it is even possible to write to freed memory. That is,
thread A runs
			(*dp->dom_ifdetach)(ifp,
			    ifp->if_afdata[dp->dom_family]);
			ifp->if_afdata[dp->dom_family] = NULL;
but thread B sees a stale reference, i.e. ifp->if_afdata[dp->dom_family] != NULL.