Queue all XPT_ASYNC ccb's and run those in a new cam async thread. This thread
is allowed to have bouneded sleeps. This should allow us to make all the
registration routines for cam periph driverrs simpler since they can assume they
can always allocate memory. This is a separate thread so that any I/O that's
completed in xpt_done_td isn't held up.
Details
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Not Applicable - Unit
Tests Not Applicable
Event Timeline
sys/cam/nvme/nvme_xpt.c | ||
---|---|---|
765 ↗ | (On Diff #85533) | Oh, this is wrong, or at least not relevant for this patch. |
My "build machine" runs a plain GENERIC kernel, so when it's running head, the INVARIANTS option is enabled. Its normal daily update got to main-n245338-221622ec0c8e, but after updating sources to main-n245363-b3dac3913dc9, it panicked (after successful build/install/reboot) before entering multi-user mode. Likewise after updating to main-n245372-d1cbe7908986.
I applied the diff from this review to the (clean) sources at main-n245372-d1cbe7908986; rebuilt/installed; the following reboot did make it to multi-user mode, and now reports:
freebeast(14.0-C)[3] uname -aUK FreeBSD freebeast.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #1209 main-n245372-d1cbe7908986-dirty: Wed Mar 10 15:23:53 PST 2021 root@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC amd64 1400005 1400005
which seems rather like a "win" to me.
sys/cam/cam_xpt.c | ||
---|---|---|
3169 | The wakeup() should be done before you unlock the lock, otherwise the worker thread can miss the wakeup. (xpt_done() has the same problem.) |
sys/cam/cam_xpt.c | ||
---|---|---|
3169 | I suspect that if we move that, we can get rid of the run interlock too... At best it saves extra wakeups in the rare case where races are lost if we move this... I'll do a separate commit for the done part. |
I re-tested after imp@'s update of Thu, Mar 11, 09:24; no issues:
freebeast(14.0-C)[1] uname -aUK FreeBSD freebeast.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #1211 main-n245387-913e7dc3e0eb-dirty: Thu Mar 11 10:38:49 PST 2021 root@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC amd64 1400005 1400005
sys/cam/cam_xpt.c | ||
---|---|---|
3169 |
| |
3169 | As discussed in the separate review, the wakeup can be outside of the lock safely in this case. The only consequence of losing the "race" here is that the other thread might grab the lock as soon as this one releases it and handle the work request and go to sleep before wakeup() runs (assume a preemption just before wakeup() as the worst case to make the race longer). All that happens then is that you wakeup the thread when there is nothing to do so it wakes up and goes right back to sleep again. However, by moving the wakeup() out from under the lock, you reduce lock contention since if the wakeup() resumes the thread on another CPU, it will then immediately contest on this lock and possibly block (again, assume a worst case of a preemption just _after_ wakeup() to exacerbate this race). For this reason, it's actually a fairly common pattern to do wakeup's after dropping the lock when it is feasible. You could perhaps do the 'run' optimization from xpt_done() here as well where you only do the wakeup if the queue was empty before you inserted the new ccb into it. Note that this can in theory delay the wakeup() (e.g. if you get preempted before the wakeup in the original thread and meanwhile another thread enqueue's a second ccb and wouldn't do the wakeup, so both requests sit on the queue until the first thread resumes). The tradeoff there is potential latency vs the overhead of calling wakeup() which still has to go lock a sleepq chain spin lock to look for the right sleep queue to see if there is a thread to wake up or not. It's not clear to me what the right tradeoff there is, and it might be fine to just do the wakeup() unconditionally instead. |
Move the wakeup outside the lock.
While classically there's a race if thread S is about to sleep and thread W
wants to wake it in general, in this case all 'lost races' are harmless.
We won't sleep with work in the work queue.
sys/cam/cam_xpt.c | ||
---|---|---|
3169 | Since async events are relatively rare, I suspect the other optimizations for more contended paths won't matter (even though it will add a tiny amount to the sleepq lock contention). I may try to get xpt_done_td to just handle both and allow sleeping for the async queue (they will be separate threads, so the sleeping won't delay other transactions). That would allow us to keep the same optimizations for both cases. But there's already a couple of bugzilla bugs on this so I'd like to get it in as it is. |
Tested again after Warner's last change; no issues:
freebeast(14.0-C)[1] uname -aUK FreeBSD freebeast.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #1213 main-n245402-6385cabd5be6-dirty: Fri Mar 12 10:01:51 PST 2021 root@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC amd64 1400005 1400005