Page MenuHomeFreeBSD

new-bus: Fix some shortcomings in disabling devices via hints
ClosedPublic

Authored by jhb on Nov 20 2024, 10:23 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Jan 22, 7:07 AM
Unknown Object (File)
Dec 1 2024, 10:56 AM
Unknown Object (File)
Dec 1 2024, 7:50 AM
Unknown Object (File)
Dec 1 2024, 3:06 AM
Unknown Object (File)
Nov 28 2024, 5:23 PM
Unknown Object (File)
Nov 23 2024, 5:13 PM
Unknown Object (File)
Nov 23 2024, 11:56 AM
Unknown Object (File)
Nov 21 2024, 7:11 PM
Subscribers

Details

Summary

A device can be disabled via a hint after it is probed (but before it
is attached). The initial version of this marked the device disabled,
but left the device "alive" meaning that dev->driver and dev->desc
were untouched and still pointed into the driver that probed the
device. If that driver lives in a kernel module that is later
unloaded, device_detach() called from devclass_delete_driver() doesn't
do anything (the device's state is DS_ALIVE). In particular, it
doesn't call device_set_driver(dev, NULL) to disassociate the device
from the driver that is being unloaded.

There are several places where these stale pointers can be tripped
over. After kldunload, invoking the sysctl to fetch device info can
dereference dev->desc and dev->driver causing panics. Even without
kldunload, a system suspend request will call the device_suspend and
device_resume DEVMETHODs of the driver in question even though the
device is not attached which can cause some excitement.

To clean this up, more fully detach a device that is disabled by a
hint by clearing the driver and setting the state to DS_NOTPRESENT.
However, to keep the device name+unit combination reserved, leave the
device attached to its devclass.

This requires a change to 'devctl enable' handling to deal with this
updated state. It now checks for a non-NULL devclass to determine if
a disabled device is in this state and if so it clears the hint.
However, it also now clears the devclass before attaching the device.
This gives all drivers an opportunity to attach to the now-enabled
device.

Reported by: adrian
Discussed with: imp

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

jhb requested review of this revision.Nov 20 2024, 10:23 PM
This revision is now accepted and ready to land.Nov 20 2024, 10:24 PM

I love "... which can cause some excitement".

I'm tempted to ask about getting this into 14.2 given how easy it is for users to trip over it (and how likely they are to just say "oh well, I guess suspend/resume doesn't work on this system") but I'm afraid it really is too late. :-(

In D47691#1087646, @imp wrote:

This is super low risk

Is that "super low" as in "a driver would have to be very broken for this to cause problems", or "super low" as in "any place where this changes behaviour was guaranteed to be a kernel panic"?

I would accept a patch of the form "if a function pointer is NULL then don't call it" but anything more than that is just too much for a patch landing the day before RC1. If we need to do RC2 for some reason we can probably include this.

Yes. This turns guaranteed crashes into proper behavior only. Any system with a disabled device will crash on suspend

This bug has been around for over a decade (introduced in commit rG8dbce2a343d264c6d9a9fa23ed1949ca64d3d4ba. Reports of crashes first came in this week from @adrian. While it does fix crashes, I think those crashes are probably quite rare in practice and it is not worth rushing this into 14.2.