Page MenuHomeFreeBSD

sleepqueue: Address a lock order reversal
ClosedPublic

Authored by markj on Feb 8 2022, 3:41 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Oct 19, 8:04 PM
Unknown Object (File)
Wed, Oct 16, 4:15 PM
Unknown Object (File)
Sep 28 2024, 9:32 PM
Unknown Object (File)
Sep 24 2024, 9:01 AM
Unknown Object (File)
Sep 9 2024, 12:01 PM
Unknown Object (File)
Sep 8 2024, 3:04 PM
Unknown Object (File)
Sep 8 2024, 10:36 AM
Unknown Object (File)
Sep 4 2024, 10:12 PM
Subscribers

Details

Summary

After commit 74cf7cae4d22 ("softclock: Use dedicated ithreads for
running callouts."), there is a lock order reversal between the per-CPU
callout lock and the scheduler lock. softclock_thread() locks callout
lock then the scheduler lock, when preparing to switch off-CPU, and
sleepq_remove_thread() stops the timed sleep callout while potentially
holding a scheduler lock. In the latter case, it's the thread itself
that's locked, and if the thread is sleeping then its lock will be a
sleepqueue lock, but if it's still in the process of going to sleep
it'll be a scheduler lock.

We could perhaps change softclock_thread() to try to acquire locks in
the opposite order, but that'd require dropping and re-acquiring the
callout lock, which seems expensive for an operation that will happen
quite frequently. We can instead perhaps avoid stopping the
td_slpcallout callout if the thread is still going to sleep, which is
what this patch does. This will result in a spurious call to
sleepq_timeout(), but some counters suggest that this is very rare.

PR: 261198
Fixes: 74cf7cae4d22 ("softclock: Use dedicated ithreads for running callouts.")

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 44338
Build 41226: arc lint + arc unit

Event Timeline

markj requested review of this revision.Feb 8 2022, 3:41 PM
markj created this revision.

For reference, I'm running 16 busy bhyve VMs and a -j32 buildworld on a 32-cpu system. sleepq_timeout() gets called 1000-2000 times per second and after 90 minutes debug.sleepq.spurious_timeouts is 368.

The sleepq+timeout() was specifically changed to ignore spurious wakeups, esp. from the non-stopped callout.

This revision is now accepted and ready to land.Feb 8 2022, 4:44 PM
sys/kern/subr_sleepqueue.c
153

I suppose you will remove this counter for final commit, or at least move it under SLEEPQUEUE_PROFILING.

sys/kern/subr_sleepqueue.c
153

Indeed, I won't commit this, I just tried to verify my intuition about the frequency of spurious callouts.

This seems to fix the panic I was hitting. Before I could reliably panic by doing buildworld with -j 16 in a bhyve guest with 16 cores.

This revision was automatically updated to reflect the committed changes.