Page MenuHomeFreeBSD

WIP ifnet: Introduce and use ifnet_byindex_attached() for outpath
Needs ReviewPublic

Authored by zlei on Fri, Mar 21, 5:24 PM.

Details

Reviewers
glebius
markj
melifaro
Group Reviewers
network
transport
Summary

An interface will be removed from the global network interface list when
is requested to be detached, either by user or by a hardware event. Once
detached, network applications or sysadmin tools, e.g. ifmcstat(8) and
ifconfig(8), shall make reference to it.

As for moving interface between vnet jails, the interface will be
detached from one jail and later be attached to another. An interface
belongs only to one vnet jail at the same time, so the above statement
still applies.

There're reports that show races between the teardown process and output
path. That, a thread on output path is entering net epoch and is
referencing a detached interface, while the detaching thread is
deallocating resources binded to the interface, notably the address
family dependent data, e.g., if_afdata[AF_INET6]. That will lead to
either NULL pointer derefence or accessing to freed memory.

There're still cases to make references to an interface, so leave
ifnet_byindex() unchanged but introduce a _attached variant. It filter
only attached ifnet. The _ref variant is redireced to use the _attached
variant. The outpaths are all converted to use the _attached or the _ref
variant.

PR: 279653
PR: 285129
MFC after: 1 month

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

zlei requested review of this revision.Fri, Mar 21, 5:24 PM

This is still WIP, but I want it open for discussing. ifnet_byindex() is renamed to ifnet_byindex_ori() only to show clearly all its usage. Will not be in the final revision.

zlei retitled this revision from ifnet: Introduce and use ifnet_byindex_attached() for outpath to WIP ifnet: Introduce and use ifnet_byindex_attached() for outpath.Fri, Mar 21, 5:27 PM

So instead of a lockless check for IFF_DYING you use lockless check for presence in STAILQ. I don't see a principal change here, but it could make the races less probable.

Do I understand it right that the goal is for the output path to stop "seeing" an interface that is being detached earlier?

So instead of a lockless check for IFF_DYING you use lockless check for presence in STAILQ. I don't see a principal change here, but it could make the races less probable.

IFF_DYING is not set on an interface that is moving between vnet jails, but only for those being clone destroying or being detached via hardware event ( unplug etc. ) . So when vmoing an interface it is still possible to have races. I have a test can re-produce the races stably.

Do I understand it right that the goal is for the output path to stop "seeing" an interface that is being detached earlier?

Yes. In D49359 I commented

Ideally the interface has been in down state before it been detached

I have plan to move the if_down() logic prior to if_detach_internal(). So when either input / output / fwd path not see an interface ( or being detached ), it means they will also see the interface is down already.

@glebius

So instead of a lockless check for IFF_DYING

I think we eventually do not need IFF_DYING.

you use lockless check for presence in STAILQ.

This can be optimized. I'm considering reusing ife_gencnt to speed up the checking of the attached status of an interface.

Do I understand it right that the goal is for the output path to stop "seeing" an interface that is being detached earlier?

Yes. In D49359 I commented

Any delayed free synchronization mechanism, be it RCU in Linux, or epoch(9) in the network stack of FreeBSD, or FreeBSD smr(9), doesn't prevent from "seeing" a piece of memory that is being destroyed right now. Trying to achieve that is basically working against the design! The design is that you can obtain the pointer locklessly, and the obtained pointer indeed can point to a memory that is already detached from the lookup structure and is already on the free list. Once you got this pointer, in general you have two options: 1) use it while you stay in the epoch only 2) transfer from the lockless access to a locked and holding the lock re-check the status of the memory. If you want to exit the epoch but store the pointer somewhere, you can bump refcount on it. However, if the refcount is stored in the piece of memory itself, then the delayed free procedure, executed after epoch end, shall consult this refcount and possibly delay the free further. Things get complicated, when our piece of memory on the free list has its own pointers, and some of them may also be freed together with the main piece. In that case, freeing of them shall also be delayed with epoch.

Our current state of the ifnet synchronization is the following. The epoch is already cast upon the entire network stack and used to for several ways to obtain the ifnet pointer. However, not all rules of the delayed free are followed. First, the piece of memory called af_data hanging off ifnet is not delayed on free and that's why your other review adds NET_EPOCH_WAIT(). If af_data was freed following the rules, you won't needed it. Sidenote: I would actually dare to say that with proper design NET_EPOCH_WAIT() should not be used by modules at all, only by epoch(9) itself. The af_data is not the only violator. Second, the delayed free in if_free_deferred() doesn't consult the refcount and mechanism of further delay is not implemented. Also, a lot of remnants of old strong locking model (sidenote: its implementation was not flawless and also had races) are still remaining around, for example the IFF_DYING flag. Many of these remnants are misleading.

The complexity of the problem you are on is greatly increased by the fact that ifnet(9) interacts with so many other subsystems.

I would suggest to not focus on particular panics (af_data as the most annoying one), but focus on overall design. Let's discuss how it should look. Here is how I see it:

  1. ifnet pointer obtained in epoch shall have any hanging off pointer follow at least one of the rules:
    1. have a lock in the struct ifnet to protect the pointer and [possibly] the memory it points to
    2. shall be delayed free together with the ifnet itself
  2. On the input path obtained ifnet pointer with all its dependencies can be used to deliver a packet all the way to the top of the stack.
  3. On the output path obtained ifnet pointer can be used to deliver a packet all the way to the driver. This has implications:
    1. Since driver instance can be detached at any time, the driver instance destruction shall be either synchronized with the epoch, or the driver should use a lock in ifnet itself to check if it can follow the if_softc pointer.
    2. Driver unload needs to be also epoch synchronized.
  4. Storing ifnet pointers outside the epoch.
    1. Either all places of epoch to reference transition should use if_try_ref() or we need a mechanism of "further delay".
    2. Use pointer serialized into index+gencnt.