sys/kern/vfs_vnops.c
3094–3097	This sequence is enough, but not "optimal" because `vfs_busy()`'s return value doesn't allow to distinguish between failures caused by either an unmount still pending (which can only occur with `MBF_NOWAIT`) or an effectively completed unmount. This could be fixed here, but in any case I'll submit a(n independent) general patch later for that.

In D42554#971019, @rmacklem wrote:

The logic looks fine to me, although I'll
admit I do not understand why the mounts
need to be busied? But I am sure Kostik
understands this.

They need to be busied at least to access mnt_vfc in a safe manner. But kib@'s change is more than that, it also takes care of keeping a copy of the v_mount fields after having checked they are not null, instead of reloading them from the vnodes in the name comparison. Once the mount is busied, mnt_vfc is stable and can never be wrong. The only thing that still can happen after this is that the in and out vnodes can get doomed (which causes the crash seen by pho@ in the name comparison, because a valid v_mount was not cached), and hopefully this is taken care of in the implementations of VOP_COPY_FILE_RANGE() (I didn't check) and of vn_generic_copy_file_range() (I've checked).

In D42554#971019, @rmacklem wrote:

The logic looks fine to me, although I'll
admit I do not understand why the mounts
need to be busied? But I am sure Kostik
understands this.

nullfs bypass needs to safely convert nullfs vnode into lower vnode. It is fine for vnode to be reclaimed meantime (i.e. not be locked), but the mnt_data for nullfs must be stable. Usually this is ensured by the dispatch vnode lock, since fs cannot be unmounted until vnode is reclaimed, that is blocked by lock. But VOP_COPY_FILE_RANGE() is special, it does not lock vnode(s).

jah added inline comments.Nov 13 2023, 2:37 PM

sys/kern/vfs_vnops.c
3094–3097	The concurrent-unmount case is not the common case here, and I would be surprised if we care at all about performance in that case. Given that, what would be the point of "optimizing" for the MNTK_REFEXPIRE case? In fact, I would ask why we even need to check for ENOENT here?

kib added inline comments.Nov 13 2023, 2:58 PM

sys/kern/vfs_vnops.c
3094–3097	In fact, I would ask why we even need to check for ENOENT here? Failed unmount should be transparent, it must not cause transient errors.

jah added inline comments.Nov 13 2023, 3:17 PM

sys/kern/vfs_vnops.c
3094–3097	Right, so if a concurrent attempted (but ultimately failed) unmount caused the vfs_busy(..., MBF_NOWAIT) call at line 3092 to transiently fail, would we not end up retrying the sequence even without the error == ENOENT check at line 3095? In other words, since we already know the vfs_busy(..., MBF_NOWAIT) call at line 3092 failed, we should just unconditionally unbusy inmp, busy (without MBF_NOWAIT) and unbusy outmp, and then repeat the loop. vfs_busy() doesn't appear to return any other error besides ENOENT, and I don't see any point in making it more complex to distinguish between the MBF_NOWAIT+pending-umount and MNTK_REFEXPIRE cases just to serve this path.

Do not handle ENOENT specially.

This revision now requires review to proceed.Nov 13 2023, 3:35 PM

This version look fine to me (I was wondering why you
checked specifically for ENOENT).

And thanks for the comments. I was thinking of how
the vnode would be busted (assuming checks for VI_DOOMED),
but forgot about the mount data.

This revision is now accepted and ready to land.Nov 13 2023, 4:08 PM

jah accepted this revision.Nov 13 2023, 4:18 PM

olce accepted this revision.Nov 13 2023, 4:42 PM

olce added inline comments.

sys/kern/vfs_vnops.c
3094–3097	@jah: I had forgotten to point out that `vfs_busy()` can only return `0` or `ENOENT`, thanks for doing it. There's is in the end not much performance benefit in distinguishing the pending and finished unmount cases, the only real advantage is code hygiene, which helps understanding what is written quickly. I think that the fact that this piece of code was not immediately clear and prompted comments/changes kind of proves that point. There would have been no need (speaking for me, at least) to look at `vfs_busy()`'s implementation if that function returned, e.g., `EAGAIN` for the pending unmount case. Additionally, the impact of this change is very minor for callers (it's just an additional `if`). The only real drawback is that all call sites must be inspected and adapted (once).

jah added inline comments.Nov 13 2023, 4:57 PM

sys/kern/vfs_vnops.c
3094–3097	I think in some cases it's true that distinguishing between different (non-success) return codes aids code readability, but I'm skeptical that this is one of those cases. For me at least, the only thing that wasn't immediately clear was why there was an explicit check against ENOENT. IMO the way the code is now (with the only distinction being success vs. failure) is clearer (and probably easier to maintain in the long run) than having to distinguish between success, EAGAIN, and some other kind of failure.