Page MenuHomeFreeBSD

vn_copy_file_range(): busy both in and out mp around call to VOP_COPY_FILE_RANGE()
ClosedPublic

Authored by kib on Nov 12 2023, 6:43 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Nov 5, 9:46 PM
Unknown Object (File)
Tue, Nov 5, 9:46 PM
Unknown Object (File)
Tue, Nov 5, 7:44 PM
Unknown Object (File)
Mon, Oct 21, 2:45 AM
Unknown Object (File)
Mon, Oct 21, 2:44 AM
Unknown Object (File)
Mon, Oct 21, 2:44 AM
Unknown Object (File)
Mon, Oct 21, 2:44 AM
Unknown Object (File)
Mon, Oct 21, 2:44 AM

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kib requested review of this revision.Nov 12 2023, 6:43 PM
kib updated this revision to Diff 129983.

Avoid double-busying.

The logic looks fine to me, although I'll
admit I do not understand why the mounts
need to be busied? But I am sure Kostik
understands this.

This revision is now accepted and ready to land.Nov 13 2023, 3:19 AM
olce added a subscriber: olce.
olce added inline comments.
sys/kern/vfs_vnops.c
3094–3097

This sequence is enough, but not "optimal" because vfs_busy()'s return value doesn't allow to distinguish between failures caused by either an unmount still pending (which can only occur with MBF_NOWAIT) or an effectively completed unmount. This could be fixed here, but in any case I'll submit a(n independent) general patch later for that.

The logic looks fine to me, although I'll
admit I do not understand why the mounts
need to be busied? But I am sure Kostik
understands this.

They need to be busied at least to access mnt_vfc in a safe manner. But kib@'s change is more than that, it also takes care of keeping a copy of the v_mount fields after having checked they are not null, instead of reloading them from the vnodes in the name comparison. Once the mount is busied, mnt_vfc is stable and can never be wrong. The only thing that still can happen after this is that the in and out vnodes can get doomed (which causes the crash seen by pho@ in the name comparison, because a valid v_mount was not cached), and hopefully this is taken care of in the implementations of VOP_COPY_FILE_RANGE() (I didn't check) and of vn_generic_copy_file_range() (I've checked).

The logic looks fine to me, although I'll
admit I do not understand why the mounts
need to be busied? But I am sure Kostik
understands this.

nullfs bypass needs to safely convert nullfs vnode into lower vnode. It is fine for vnode to be reclaimed meantime (i.e. not be locked), but the mnt_data for nullfs must be stable. Usually this is ensured by the dispatch vnode lock, since fs cannot be unmounted until vnode is reclaimed, that is blocked by lock. But VOP_COPY_FILE_RANGE() is special, it does not lock vnode(s).

sys/kern/vfs_vnops.c
3094–3097

The concurrent-unmount case is not the common case here, and I would be surprised if we care at all about performance in that case. Given that, what would be the point of "optimizing" for the MNTK_REFEXPIRE case?

In fact, I would ask why we even need to check for ENOENT here?

sys/kern/vfs_vnops.c
3094–3097

In fact, I would ask why we even need to check for ENOENT here?

Failed unmount should be transparent, it must not cause transient errors.

sys/kern/vfs_vnops.c
3094–3097

Right, so if a concurrent attempted (but ultimately failed) unmount caused the vfs_busy(..., MBF_NOWAIT) call at line 3092 to transiently fail, would we not end up retrying the sequence even without the error == ENOENT check at line 3095? In other words, since we already know the vfs_busy(..., MBF_NOWAIT) call at line 3092 failed, we should just unconditionally unbusy inmp, busy (without MBF_NOWAIT) and unbusy outmp, and then repeat the loop.

vfs_busy() doesn't appear to return any other error besides ENOENT, and I don't see any point in making it more complex to distinguish between the MBF_NOWAIT+pending-umount and MNTK_REFEXPIRE cases just to serve this path.

Do not handle ENOENT specially.

This revision now requires review to proceed.Nov 13 2023, 3:35 PM

This version look fine to me (I was wondering why you
checked specifically for ENOENT).

And thanks for the comments. I was thinking of how
the vnode would be busted (assuming checks for VI_DOOMED),
but forgot about the mount data.

This revision is now accepted and ready to land.Nov 13 2023, 4:08 PM
olce added inline comments.
sys/kern/vfs_vnops.c
3094–3097

@jah: I had forgotten to point out that vfs_busy() can only return 0 or ENOENT, thanks for doing it.

There's is in the end not much performance benefit in distinguishing the pending and finished unmount cases, the only real advantage is code hygiene, which helps understanding what is written quickly. I think that the fact that this piece of code was not immediately clear and prompted comments/changes kind of proves that point. There would have been no need (speaking for me, at least) to look at vfs_busy()'s implementation if that function returned, e.g., EAGAIN for the pending unmount case. Additionally, the impact of this change is very minor for callers (it's just an additional if). The only real drawback is that all call sites must be inspected and adapted (once).

sys/kern/vfs_vnops.c
3094–3097

I think in some cases it's true that distinguishing between different (non-success) return codes aids code readability, but I'm skeptical that this is one of those cases. For me at least, the only thing that wasn't immediately clear was why there was an explicit check against ENOENT. IMO the way the code is now (with the only distinction being success vs. failure) is clearer (and probably easier to maintain in the long run) than having to distinguish between success, EAGAIN, and some other kind of failure.