getblk: fail faster with GB_LOCK_NOWAIT
Closed, Public

Authored by rlibby on May 19 2024, 1:44 AM.

Details

Reviewers

markj
mckusick
kib

Commits

rGa332ba32d4a2: getblk: fail faster with GB_LOCK_NOWAIT

Summary

If we asked not to wait on a lock, and then we failed to get a buf lock
because we would have had to wait, then just return the error. This
avoids taking the bufobj lock and a second trip to lockmgr.
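
The effect described above can be sketched in a small userland model. This is not the code from sys/kern/vfs_bio.c: every name here (sk_buf, sk_stats, sk_trylock, sk_getblk_nowait, and the fail_fast switch) is hypothetical, and the model only covers a caller that asked not to wait. It counts how often the modeled bufobj lock and lock manager are visited when the buf lock is already held.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's types. */
struct sk_buf {
	bool locked;			/* buf lock is held by someone else */
};

struct sk_stats {
	int interlock_acquires;		/* models taking the bufobj lock */
	int lock_attempts;		/* models trips to lockmgr */
};

/* Models a non-sleeping lock attempt: EBUSY if we would have to wait. */
static int
sk_trylock(struct sk_buf *bp, struct sk_stats *st)
{
	st->lock_attempts++;
	if (bp->locked)
		return (EBUSY);
	bp->locked = true;
	return (0);
}

/*
 * Models a lookup by a caller that asked not to wait.  With fail_fast
 * set, a failed try-lock returns at once; without it, the lookup takes
 * the modeled bufobj lock and tries the buf lock a second time.
 */
static int
sk_getblk_nowait(struct sk_buf *bp, bool fail_fast, struct sk_stats *st)
{
	int error;

	error = sk_trylock(bp, st);
	if (error == 0)
		return (0);
	assert(error == EBUSY);		/* the only failure this model has */
	if (fail_fast)
		return (error);
	st->interlock_acquires++;	/* the trip the change aims to avoid */
	return (sk_trylock(bp, st));
}
```

In this model both variants return EBUSY for a buf that stays locked; the fail-fast variant simply gets there with one lock attempt and no bufobj lock acquisition.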

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 57802
Build 54690: arc lint + arc unit

Event Timeline

rlibby created this revision. May 19 2024, 1:44 AM

Herald added subscribers: olce, imp. May 19 2024, 1:44 AM

rlibby requested review of this revision. May 19 2024, 1:44 AM

Harbormaster completed remote builds in B57756: Diff 138725. May 19 2024, 1:45 AM

I think that this is technically correct but unfair. The intent was that a failed unlocked buffer lookup should not change the behavior, in particular that it should still allow the normal lookup quirks to proceed.

I do not have rational arguments against the change.

This revision is now accepted and ready to land. May 20 2024, 2:55 AM

In D45245#1032333, @kib wrote:

I think that this is technically correct but unfair. The intent was that a failed unlocked buffer lookup should not change the behavior, in particular that it should still allow the normal lookup quirks to proceed.

Can you say more about the behavior that might need to be preserved, or about what might be unfair?

To me, this doesn't seem to change anything logically. Here are two cases I thought about where we would now simply exit after getting an error from BUF_TIMELOCK:

1. The buf lock is held by someone else, and it still has the same identity. Without the patch, we goto loop where we acquire the bufobj lock, and then try again. Most likely we then find the buf again and fail to lock it again. But maybe we find it unlocked or non-existent. It doesn't really matter because either could have happened anyway.
2. The buf lock is held by someone else, and its identity has changed (b_bufobj or b_lblkno). Changing the buf's identity itself requires the buf lock, and we could have contended against that. So even if we might now exit after having contended against the "wrong" buf lock, it isn't wrong to return EBUSY because the only way this could happen is when we might have also contended against the "right" buf lock.

My motivation here is to avoid the bo lock more and reduce time under it. I have a few more ideas and in-progress patches along these lines, so I appreciate any discussion of what I might be missing or might need to consider here. Thanks.

I accepted the review; as I said, I do not have rational arguments against it.

My only comment is that, reading the code, it's not immediately obvious why we give up right away when failing to acquire the buf lock, but not when we successfully acquire the buf lock and discover that the buffer identity has changed. A comment explaining that we want to avoid touching the interlock when the "locked lock" is probably going to fail anyway would help, I think.
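
The asymmetry raised here can be made concrete with a simplified model of the unlocked fast path. This is a sketch under stated assumptions, not the kernel code: id_buf, id_outcome, and id_fast_path are hypothetical names, the buf's identity is reduced to a single block number (the discussion above names b_bufobj and b_lblkno), and the try-lock is modeled as a plain flag.

```c
#include <stdbool.h>

/* Hypothetical model of a buf; identity is reduced to one field. */
struct id_buf {
	bool locked;	/* buf lock currently held */
	long lblkno;	/* stands in for the buf's identity */
};

enum id_outcome {
	ID_GOT_BUF,		/* locked the buf and identity matched */
	ID_FAIL_EBUSY,		/* try-lock failed: give up right away */
	ID_RETRY_LOCKED		/* identity changed: retry under the bufobj lock */
};

/*
 * Models the fast path for a caller that asked not to wait.  A failed
 * try-lock ends the lookup, while a successful try-lock on a buf whose
 * identity no longer matches drops the lock and falls back to the
 * locked lookup.
 */
static enum id_outcome
id_fast_path(struct id_buf *bp, long want_lblkno)
{
	if (bp->locked)
		return (ID_FAIL_EBUSY);
	bp->locked = true;		/* try-lock succeeded */
	if (bp->lblkno != want_lblkno) {
		bp->locked = false;	/* not the buf we wanted */
		return (ID_RETRY_LOCKED);
	}
	return (ID_GOT_BUF);
}
```

The two non-success outcomes are the ones being contrasted: the first never touches the bufobj lock, and the second still does.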

Note that GB_LOCK_NOWAIT has a somewhat subtle meaning: it does not request that the call never sleep, nor even that it never sleep on a buffer lock. It only signals that the call should avoid sleeping on the lock of the buffer with the specified identity.

In D45245#1032681, @markj wrote:

My only comment is that, reading the code, it's not immediately obvious why we give up right away when failing to acquire the buf lock, but not when we successfully acquire the buf lock and discover that the buffer identity has changed.

There's no reason not to, it just didn't seem likely enough to worry about. My main issue comes from encountering locked bufs with the expected identity. But maybe it's better to be consistent so that the intent is easier to understand. I'll add this to the next diff but I'll take guidance on whether we actually do this or not.

A comment explaining that we want to avoid touching the interlock when the "locked lock" is probably going to fail anyway would help, I think.

I'll work something up to address your and @kib's comments and try not to be too wordy.

In D45245#1032860, @rlibby wrote:

In D45245#1032681, @markj wrote:

My only comment is that, reading the code, it's not immediately obvious why we give up right away when failing to acquire the buf lock, but not when we successfully acquire the buf lock and discover that the buffer identity has changed.

There's no reason not to, it just didn't seem likely enough to worry about. My main issue comes from encountering locked bufs with the expected identity. But maybe it's better to be consistent so that the intent is easier to understand. I'll add this to the next diff but I'll take guidance on whether we actually do this or not.

I don't have strong feelings either way. I agree that it's probably not worth worrying about that case; a comment would just make it clear why we don't (in the current version of the patch, anyway).

getblk: fail faster with GB_LOCK_NOWAIT
GB_LOCK_NOWAIT markj feedback

This revision now requires review to proceed. May 20 2024, 7:34 PM

Harbormaster completed remote builds in B57802: Diff 138814. May 20 2024, 7:34 PM

kib accepted this revision. May 20 2024, 9:08 PM

This revision is now accepted and ready to land. May 20 2024, 9:08 PM

rlibby added inline comments. May 20 2024, 9:09 PM

sys/kern/vfs_bio.c
4011–4015	Maybe this is too aggressive? lockmgr can also return EDEADLK with LK_NOWAIT, when we already hold an exclusive lock and then request a shared lock. We don't have shared buf locks in tree though.

kib added inline comments. May 21 2024, 4:17 AM

sys/kern/vfs_bio.c
4011–4015	We only use exclusive mode for buffer locks. Even if we started using shared locks for bread()/brelse(), the situation you mention would be invalid. I vaguely recall that recursion on the buffer lock was once needed for xfs, but that is long gone.