Page MenuHomeFreeBSD

sched_ule: Fix racy loads of pc_curthread
ClosedPublic

Authored by markj on Jul 6 2022, 6:38 PM.
Tags
None
Referenced Files
F102598208: D35736.diff
Thu, Nov 14, 2:35 PM
Unknown Object (File)
Sun, Oct 20, 12:41 AM
Unknown Object (File)
Sat, Oct 19, 12:04 PM
Unknown Object (File)
Sat, Oct 19, 12:03 PM
Unknown Object (File)
Sat, Oct 19, 12:03 PM
Unknown Object (File)
Sat, Oct 19, 11:48 AM
Unknown Object (File)
Oct 1 2024, 3:41 PM
Unknown Object (File)
Oct 1 2024, 4:25 AM
Subscribers

Details

Summary

Thread switching used to be atomic with respect to the current CPU's
tdq lock. Since commit 686bcb5c14ab that is no longer the case. Now
sched_switch() does this:

  1. lock tdq (might already be locked)
  2. maybe put the current thread in the tdq, choose a new thread to run

2a. update tdq_lowpri

  1. unlock tdq
  2. switch CPU context, update curthread

Some code paths in ULE will load pc_curthread from a remote CPU with
that CPU's tdq lock held, usually to inspect its priority. But, as of
the aforementioned commit this is racy.

The problem I noticed is in tdq_notify(), which optionally sends an IPI
to a remote CPU when a new thread is added to its runqueue. If the new
thread's priority is higher (lower) than the currently running thread's
priority, then we deliver an IPI. But inspecting
pc_curthread->td_priority doesn't work, since pc_curthread might be
between steps 3 and 4 above. If pc_curthread's priority is higher than
that of the newly added thread, but pc_curthread is switching to a
lower-priority thread, then tdq_notify() might fail to deliever an IPI,
leaving a high priority thread stuck on the runqueue for longer than it
should. This can cause multi-millisecond stalls in
interactive/ithread/realtime threads.

Fix this problem by modifying tdq_add() and tdq_move() to return the
value of tdq_lowpri before the addition of the new thread. This ensures
that tdq_notify() has the correct priority value to compare against.

The other two uses of pc_curthread are susceptible to the same race. To
fix the one in sched_rem()->tdq_setlowpri() we need to have an exact
value for curthread. Thus, introduce a new tdq_curthread field to the
tdq which gets updated any time a new thread is selected to run on the
CPU. Because this field is synchronized by the thread lock, its
priority reflects the correct lowpri value for the tdq.

PR: 264867

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 46293
Build 43182: arc lint + arc unit

Event Timeline

markj requested review of this revision.Jul 6 2022, 6:38 PM

General idea seems sane. Just a few nits I noticed.

sys/kern/sched_ule.c
327

s/int/u_char/? Priorities are generally stored as u_char in the schedulers?

329

This one would stay as int due to the -1 return value.

331

s/int lowpri/u_char lowpri/ here as well?

856

Why use atomic_load_ptr here and above? The store is plain and given TDQ_LOCK is supposed to synchronize this field, can't you just use 'tdq->tdq_curthread' directly?

This revision is now accepted and ready to land.Jul 6 2022, 6:49 PM
markj marked 2 inline comments as done.Jul 6 2022, 7:01 PM
markj added inline comments.
sys/kern/sched_ule.c
327

It's a mix in practice. tdq_lowpri is an int, as are the priority parameters to sched_shouldpreempt(). Because I wanted to use -1 as a sentinel, it seemed a bit saner to keep them all as ints. Also, at no point do we have to convert these ints to u_chars.

I can change it to a u_char if you still prefer that, I don't have a strong preference.

856

You're right, I had changed it to atomic_load_ptr(&pcpu_find(high)->pc_curthread) first and failed to remove the atomic load later.

mav added inline comments.
sys/kern/sched_ule.c
2088

I'd probably use < 0 here, but as you wish.

So basically, you add the tdq curthread because pcpu curthread is updated in the MD asm context switch code?

Sorry, forgot to click "submit."

In D35736#810796, @kib wrote:

So basically, you add the tdq curthread because pcpu curthread is updated in the MD asm context switch code?

Right. Even if the tdq is locked, pc_curthread can be stale: it may point to a thread T1 that is currently switching to a different thread T2. In this scenario, tdq_curthread will point to T2.

sys/kern/sched_ule.c
327

Humm, ok. I do think in general in the kernel we use u_char (e.g. in struct thread and turnstile code, etc.). It's just cosmetic though and not important.

This revision was automatically updated to reflect the committed changes.