Thread switching used to be atomic with respect to the current CPU's
tdq lock. Since commit 686bcb5c14ab, that is no longer the case. Now
sched_switch() does this:
1. lock tdq (might already be locked)
2. maybe put the current thread in the tdq, choose a new thread to run
   2a. update tdq_lowpri
3. unlock tdq
4. switch CPU context, update curthread
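In rough C, that ordering looks like the sketch below. This is a
simplified illustration of the steps above, not the actual sched_ule.c
code: sched_switch_sketch(), choose_next_thread() and switch_context()
are placeholder names, and the real function handles many more cases.

/*
 * Simplified sketch of the sched_switch() ordering described above.
 * choose_next_thread() and switch_context() are placeholders, not real
 * kernel interfaces.
 */
static void
sched_switch_sketch(struct tdq *tdq, struct thread *td, int flags)
{
        struct thread *newtd;

        TDQ_LOCK(tdq);                          /* 1. lock tdq (may already be held) */
        tdq_runq_add(tdq, td, flags);           /* 2. maybe requeue the current thread */
        newtd = choose_next_thread(tdq);        /* 2. choose a new thread to run */
        tdq->tdq_lowpri = newtd->td_priority;   /* 2a. update tdq_lowpri (simplified) */
        TDQ_UNLOCK(tdq);                        /* 3. unlock tdq */
        /*
         * Window: the tdq lock is dropped, but pc_curthread still
         * points at td until the context switch below completes.
         */
        switch_context(td, newtd);              /* 4. switch context, update curthread */
}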
Some code paths in ULE load pc_curthread from a remote CPU with that
CPU's tdq lock held, usually to inspect its priority. But as of the
aforementioned commit, this is racy.
The problem I noticed is in tdq_notify(), which optionally sends an IPI
to a remote CPU when a new thread is added to its runqueue. If the new
thread's priority is higher (i.e., numerically lower) than the currently
running thread's priority, then we deliver an IPI. But inspecting
pc_curthread->td_priority doesn't work, since pc_curthread might be
between steps 3 and 4 above. If pc_curthread's priority is higher than
that of the newly added thread, but pc_curthread is switching to a
lower-priority thread, then tdq_notify() might fail to deliver an IPI,
leaving a high-priority thread stuck on the runqueue for longer than it
should. This can cause multi-millisecond stalls in
interactive/ithread/realtime threads.
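To make the race concrete, the pre-fix check has roughly the shape of
the sketch below. This is a simplified, hypothetical rendering:
tdq_notify_racy() is not the real function name, and the actual
tdq_notify() also considers the remote CPU's idle state and coalesces
IPIs.

/*
 * Racy shape of the pre-fix priority test (simplified).  The remote
 * pc_curthread may be a thread that is already between steps 3 and 4
 * above, i.e. about to be switched out.
 */
static void
tdq_notify_racy(struct tdq *tdq, struct thread *newtd)
{
        struct thread *ctd;
        int cpu;

        cpu = TDQ_ID(tdq);
        ctd = pcpu_find(cpu)->pc_curthread;     /* racy remote load */
        if (newtd->td_priority < ctd->td_priority)  /* lower value = higher priority */
                ipi_cpu(cpu, IPI_PREEMPT);
        /*
         * If ctd has a high priority but is about to switch to a
         * low-priority thread, the IPI is skipped and newtd sits on
         * the runqueue until the next scheduling event.
         */
}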
Fix this problem by modifying tdq_add() and tdq_move() to return the
value of tdq_lowpri before the addition of the new thread. This ensures
that tdq_notify() has the correct priority value to compare against.
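Schematically, the fixed flow looks something like the sketch below;
the names tdq_add_sketch()/tdq_notify_sketch() and their signatures are
illustrative only, and the preemption test is reduced to a bare
comparison.

/* Return tdq_lowpri as it was before td was added (simplified). */
static int
tdq_add_sketch(struct tdq *tdq, struct thread *td, int flags)
{
        int lowpri;

        TDQ_LOCK_ASSERT(tdq, MA_OWNED);
        lowpri = tdq->tdq_lowpri;               /* value before the addition */
        if (td->td_priority < lowpri)
                tdq->tdq_lowpri = td->td_priority;
        tdq_runq_add(tdq, td, flags);
        return (lowpri);
}

/* Compare against the saved pre-addition lowpri, not pc_curthread. */
static void
tdq_notify_sketch(struct tdq *tdq, int lowpri)
{
        if (tdq->tdq_lowpri < lowpri)
                ipi_cpu(TDQ_ID(tdq), IPI_PREEMPT);
}

The key point is that lowpri is sampled while the tdq lock is held, so
it is consistent with whatever step 2a last wrote, regardless of whether
the remote CPU has finished step 4.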
The other two uses of pc_curthread are susceptible to the same race. To
fix the one in sched_rem()->tdq_setlowpri(), we need an exact value for
curthread. Thus, introduce a new tdq_curthread field in the tdq, updated
any time a new thread is selected to run on the CPU. Because this field
is synchronized by the thread lock, the priority of the thread it points
to reflects the correct lowpri value for the tdq.
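For illustration only, the consumer in tdq_setlowpri() then becomes
something like the sketch below; tdq_setlowpri_sketch() is a
placeholder name, and the details of where tdq_curthread is assigned
are omitted (it is set in the switch path while the chosen thread's
lock, i.e. the tdq lock, is still held).

/*
 * Sketch: with a tdq_curthread field kept up to date under the tdq
 * lock, recomputing tdq_lowpri no longer needs the racy pc_curthread.
 */
static void
tdq_setlowpri_sketch(struct tdq *tdq, struct thread *ctd)
{
        struct thread *td;

        TDQ_LOCK_ASSERT(tdq, MA_OWNED);
        if (ctd == NULL)
                ctd = tdq->tdq_curthread;       /* instead of pcpu_find()->pc_curthread */
        td = tdq_choose(tdq);                   /* highest-priority queued thread */
        if (td == NULL || td->td_priority > ctd->td_priority)
                tdq->tdq_lowpri = ctd->td_priority;
        else
                tdq->tdq_lowpri = td->td_priority;
}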
PR: 264867