Page MenuHomeFreeBSD

getblk: track "non-sterile" bufobj to avoid bo lock on miss if sterile
ClosedPublic

Authored by rlibby on Jun 12 2024, 7:28 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Jan 13, 11:28 PM
Unknown Object (File)
Dec 2 2024, 4:36 AM
Unknown Object (File)
Dec 2 2024, 4:34 AM
Unknown Object (File)
Nov 22 2024, 9:37 AM
Unknown Object (File)
Oct 29 2024, 9:38 PM
Unknown Object (File)
Oct 24 2024, 5:09 PM
Unknown Object (File)
Sep 28 2024, 9:08 AM
Unknown Object (File)
Sep 28 2024, 9:07 AM
Subscribers

Details

Summary

This is a scheme to avoid taking the bufobj lock and doing a second
lookup in the case where in getblk we do an unlocked lookup and find no
buf. Was there really no buf, or were we in the middle of a reassignbuf
race? By tracking any use of reassignbuf with a flag, we can know if
there can't have been a race because there has been no reassignbuf.
Because this scheme is spoiled on the first use of reassignbuf, it is
mostly only beneficial for cases where a certain vnode is never expected
to use dirty bufs at all.

Sponsored by: Dell EMC Isilon


I hesitated on submitting this patch because it appears that in-tree we
only have a couple uses of GB_NOCREAT, and they probably won't benefit
from this, and we may slightly penalize everyone else with the branch
and atomic in reassignbuf. So I understand if we don't see a benefit to
having this in-tree. We have an out-of-tree use of getblk that involves
GB_NOCREAT on vnodes that don't use dirty bufs.

Likewise, there are more sophisticated ways to resolve the unlocked
lookup race, like an atomic reassignbuf counter or a scheme like
vm_object_mightbedirty_ uses. I avoided those because neither the
in-tree uses nor our out-of-tree use would benefit.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 58160
Build 55048: arc lint + arc unit

Event Timeline

Taking advice on style and proper atomic conventions.

And on whether to take this at all.

sys/kern/vfs_bio.c
4010–4011

Unsure if the atomic_load is required after the fence.

sys/kern/vfs_subr.c
3217

This might be more clear with atomic_thread_fence_rel. In that case, as above, would atomic_set_int be required still, or would just a plain bo->bo_flag |= BO_NONSTERILE followed by the fence suffice?

sys/sys/bufobj.h
119

With BO_NONSTERILE, we avoid setting up flags in bufobj_init. Reversing the sense to BO_STERILE might read more clearly though.

IMO this is somewhat strange change in isolation. What is the action of your caller if GB_NOCREAT does not return a buffer?

I am asking this because if you consider the buffer cache autonomously, there is a big race there, since another thread might create the buffer right after we did not found it, and the answer we get is already outdated. But, buffer cache is not used alone, the vnode must be locked when doing the lookup (except for devvp, but devvp code does not use GB_NOCREAT), and it must be locked exclusively when creating dirty buffers. So really even fences are not strictly needed for correctness, but their cost is zero on x86 and the code is more correct alone with them.

sys/kern/vfs_subr.c
3217

You need the _rel fence to complement the _acq in getblkx() for this scheme to work. But then, there is no reason to use atomics to set the NONSTERILE flag, since other flags manipulations are not atomic (and must be protected by bo_lock).

In D45571#1039766, @kib wrote:

IMO this is somewhat strange change in isolation. What is the action of your caller if GB_NOCREAT does not return a buffer?

I am asking this because if you consider the buffer cache autonomously, there is a big race there, since another thread might create the buffer right after we did not found it, and the answer we get is already outdated.

I agree it's a little strange on its own.

We have a few kinds of uses of GB_NOCREAT out of tree. One of them is certain kinds of file prefetch logic. They are logically okay with the buffer existence races, it only affects performance by influence prefetch. They may not use dirty bufs at all, so they see a benefit from being able to avoid the bufobj lock on a "sterile" bufobj. That benefit can be significant with a lot of concurrent activity on a file.

But, buffer cache is not used alone, the vnode must be locked when doing the lookup (except for devvp, but devvp code does not use GB_NOCREAT), and it must be locked exclusively when creating dirty buffers.

Right, so, we also have uses of GB_NOCREAT on devvps that are not okay with a false negative in the lookup. Generally there would be some kind of synchronization in a higher layer that prevents or renders harmless what you mention about the result being immediately outdated.

An example use is a cache invalidation operation that wants to invalidate a buf if it exists, or an operation to shootdown an IO if there's a buf, or read or write around the buffer cache but only if there's not a buf. Also the motivation for the 2021 patch c926114f2ff20d7c3e5a157f1618b6228f391bf7 (D28375) is related.

Anyway, this is all just to paint a picture about why it makes some sense in our out of tree use.

So really even fences are not strictly needed for correctness, but their cost is zero on x86 and the code is more correct alone with them.

I'll revise this according to what I've understood. In short it would be in getblkx without the bo lock, acq fence; plain read of bo_flag, and in reassignbuf with the bo lock, plain write of bo_flag; rel fence.

However, as I said, I understand if there's just no benefit to taking it in-tree.

kib (IIUC):

  • reassignbuf, given fence and bo lock, don't need atomic_set_acq_int
  • getblkx, given fence, don't need atomic_load_int
This revision is now accepted and ready to land.Jun 13 2024, 6:04 PM