Convert the bufobj tries to an SMR zone/PCTRIE and add a gbincore_unlocked()
API wrapping this functionality. Use it for a fast path in getblkx(),
falling back to locked lookup if we raced a thread changing the buf's
identity.
Details
Quite surprisingly to me, it doesn't seem to immediately panic. So I'm probably missing something. ;-)
In our getblk-heavy workloads (which have other scaling limitations beyond those in FreeBSD), on 2x48 CPU (2P) metal, we see significantly better scaling with this change at 6+ threads (4.3x single-thread vs 3.7x single-thread), and the gap gets wider at higher thread counts. I don't have any great benchmarks on vanilla FreeBSD.
Diff Detail
- Lint
Lint Passed - Unit
No Test Coverage - Build Status
Buildable 32528 Build 30000: arc lint + arc unit
Event Timeline
sys/kern/vfs_bio.c | ||
---|---|---|
3867 | The blocking there is not needed. | |
3879 | I think the rule should be that any failure on lockless path for buffer lookup results in locked retry. Imagine that buffer was reclaimed meantime, then you would not reach the check for bufobj/blkno at all. | |
3884 | Why do you think it should be handled there ? | |
sys/kern/vfs_subr.c | ||
2370 | return () |
Thanks!
sys/kern/vfs_bio.c | ||
---|---|---|
3867 | will fix | |
3879 | We can abort without retry for EINTR/ERESTART, right? Retrying in all other cases makes sense to me. | |
3884 | #define BUF_UNLOCK(bp) do { \ KASSERT(((bp)->b_flags & B_REMFREE) == 0, \ ("BUF_UNLOCK %p while B_REMFREE is still set.", (bp))); \ \ (void)_lockmgr_args(&(bp)->b_lock, LK_RELEASE, NULL, \ LK_WMESG_DEFAULT, LK_PRIO_DEFAULT, LK_TIMO_DEFAULT, \ LOCK_FILE, LOCK_LINE); \ | |
sys/kern/vfs_subr.c | ||
2370 | will fix |
- Fix style issues
- Fallback to locked getblk path on all BUF_TIMELOCK errors except EINTR/ERESTART.
sys/kern/vfs_bio.c | ||
---|---|---|
3884 | Huh. I think you should use lockmgr() directly there, to not touch buffer we do not intend to operate on, at all. |
Add BUF_UNLOCK_NOASSERT macro for raw lockmgr unlock operation, and use it in unlocked identity-race path.
sys/kern/vfs_bio.c | ||
---|---|---|
3882 | Sure, will do. |
head/sys/kern/vfs_subr.c | ||
---|---|---|
2348–2352 ↗ | (On Diff #74887) | Suppose a buf moved from dirty to clean here, after checking the clean queue, before checking the dirty queue. No buf is found. Is SMR intended to avoid that, or is it only consistent per queue? We found, presumably, that getblk(GB_NOCREAT) gets the wrong result because of this. |
head/sys/kern/vfs_subr.c | ||
---|---|---|
2348–2352 ↗ | (On Diff #74887) | SMR only helps you on a per-queue basis. I think maybe ideally there would be a single pctrie here and some other mechanism for efficiently tracking dirty bufs, but I don't know how realistic that is. Probably the right fix for getblk GB_NOCREAT is to re-check with the lock if unlocked lookup returned NULL. Something like: bp = gbincore_unlocked(bo, blkno); if (bp == NULL) { if (flags & GB_NOCREAT) goto loop; else goto newbuf_unlocked; } |
head/sys/kern/vfs_subr.c | ||
---|---|---|
2348–2352 ↗ | (On Diff #74887) | (In the !GB_NOCREAT case, we might see the same race from gbincore_unlocked, but it is harmless: we'd create a useless buf at the same lbn, re-lock to attempt to insert it, and discover we'd lost the race. That race can happen even without SMR-gbincore.) |
head/sys/kern/vfs_subr.c | ||
---|---|---|
2348–2352 ↗ | (On Diff #74887) | Suraj came up with the same patch idea. I'm wondering if it could be more efficient to just try gbincore_unlocked a second time rather than wait for the bo lock, in the GB_NOCREAT case. |
head/sys/kern/vfs_subr.c | ||
---|---|---|
2348–2352 ↗ | (On Diff #74887) | Ah, cool. I hope Suraj is doing well. Spinning locklessly again (GB_NOCREAT case) might be worth doing, but you'll still need the same locked fallback if you continue to get NULL (for correctness). If you should spin, and for how long, probably depends on how many GB_NOCREAT getblk requests will already exist in the bufq and be transitioning queues, vs true-positive misses in your workload. (And relative costs of a few lockless rechecks vs taking the bufobj lock.) |
head/sys/kern/vfs_subr.c | ||
---|---|---|
2348–2352 ↗ | (On Diff #74887) |