sched_ule: Use a single runqueue per CPU
Needs ReviewPublic
Actions

Authored by olce on May 27 2024, 10:27 PM.

Details

Reviewers

jhb
markj
mav
jeff

Summary

Please see overview of project at D45393.

Previously, ULE would use 3 separate runqueues per CPU to store threads,
one for each of its selection policies, which are realtime, timesharing
and idle. They would be examined in this order, and the first thread
found would be the one selected.

This choice indeed appears as the easiest evolution from the single
runqueue used by sched_4bsd (4BSD): It allows sharing most of the same
runqueue code, which currently defines 64 levels per runqueue, while
multiplying the number of levels (by 3). However, it has several
important drawbacks:

The number of levels is the same for each selection policy. 64 is

unnecessarily large for the idle policy (only 32 distinct levels would
be necessary, given the 32 levels of our RTP_PRIO_IDLE and their future
aliases in the to-be-introduced SCHED_IDLE POSIX scheduling policy) and
unnecessary restrictive both for the realtime policy (which should
include 32 distinct levels for PRI_REALTIME, given our implementation of
SCHED_RR/SCHED_FIFO, leaving at most 32 levels for ULE's interactive
processes where the current implementation provisions 48 (perhaps taking
into account the spreading problem, see next point)) and the timesharing
one (88 distinct levels currently provisioned).

A runqueue has only 64 distinct levels, and maps priorities in the

range [0;255] to a queue index by just performing a division by 4.
Priorities mapped to the same level are treated exactly the same from
a scheduling perspective, which is generally both unexpected and
incorrect. ULE's code tries to compensate for this aliasing in the
timesharing selection policy, by spreading the 88 levels into 256,
knowing the latter amount in the end to only 64 distinct ones. This
scaling is unfortunately not performed for the other policies, breaking
the expectations mentioned in the previous point about distinct priority
levels.

With this change, only a single runqueue is now used to store all
threads, regardless of the scheduling policy ULE applies to them (going
back to what 4BSD has always been doing). ULE's 3 selection policies
are assigned non-overlapping ranges of levels, and helper functions have
been created to select or steal a thread in these distinct ranges,
preserving the "circular" queue mechanism for the timesharing selection
policy that (tries to) ensure fair execution despite dynamic priority
adjustments.

This change allows to choose any arbitrary repartition of runqueue
levels between selection policies. It is a prerequisite to the increase
to 256 levels per runqueue, which will allow to dispense with all the
drawbacks listed above.