This is required e.g. for nullfs to ensure liveness of the lower mount points.
Details
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
The logic looks fine to me, although I'll
admit I do not understand why the mounts
need to be busied? But I am sure Kostik
understands this.
This fixes the page fault I got in vn_copy_file_range()
https://people.freebsd.org/~pho/stress/log/log0497.txt
sys/kern/vfs_vnops.c | ||
---|---|---|
3094–3097 | This sequence is enough, but not "optimal" because vfs_busy()'s return value doesn't allow to distinguish between failures caused by either an unmount still pending (which can only occur with MBF_NOWAIT) or an effectively completed unmount. This could be fixed here, but in any case I'll submit a(n independent) general patch later for that. |
They need to be busied at least to access mnt_vfc in a safe manner. But kib@'s change is more than that, it also takes care of keeping a copy of the v_mount fields after having checked they are not null, instead of reloading them from the vnodes in the name comparison. Once the mount is busied, mnt_vfc is stable and can never be wrong. The only thing that still can happen after this is that the in and out vnodes can get doomed (which causes the crash seen by pho@ in the name comparison, because a valid v_mount was not cached), and hopefully this is taken care of in the implementations of VOP_COPY_FILE_RANGE() (I didn't check) and of vn_generic_copy_file_range() (I've checked).
nullfs bypass needs to safely convert nullfs vnode into lower vnode. It is fine for vnode to be reclaimed meantime (i.e. not be locked), but the mnt_data for nullfs must be stable. Usually this is ensured by the dispatch vnode lock, since fs cannot be unmounted until vnode is reclaimed, that is blocked by lock. But VOP_COPY_FILE_RANGE() is special, it does not lock vnode(s).
sys/kern/vfs_vnops.c | ||
---|---|---|
3094–3097 | The concurrent-unmount case is not the common case here, and I would be surprised if we care at all about performance in that case. Given that, what would be the point of "optimizing" for the MNTK_REFEXPIRE case? In fact, I would ask why we even need to check for ENOENT here? |
sys/kern/vfs_vnops.c | ||
---|---|---|
3094–3097 |
Failed unmount should be transparent, it must not cause transient errors. |
sys/kern/vfs_vnops.c | ||
---|---|---|
3094–3097 | Right, so if a concurrent attempted (but ultimately failed) unmount caused the vfs_busy(..., MBF_NOWAIT) call at line 3092 to transiently fail, would we not end up retrying the sequence even without the error == ENOENT check at line 3095? In other words, since we already know the vfs_busy(..., MBF_NOWAIT) call at line 3092 failed, we should just unconditionally unbusy inmp, busy (without MBF_NOWAIT) and unbusy outmp, and then repeat the loop. vfs_busy() doesn't appear to return any other error besides ENOENT, and I don't see any point in making it more complex to distinguish between the MBF_NOWAIT+pending-umount and MNTK_REFEXPIRE cases just to serve this path. |
This version look fine to me (I was wondering why you
checked specifically for ENOENT).
And thanks for the comments. I was thinking of how
the vnode would be busted (assuming checks for VI_DOOMED),
but forgot about the mount data.
sys/kern/vfs_vnops.c | ||
---|---|---|
3094–3097 | @jah: I had forgotten to point out that vfs_busy() can only return 0 or ENOENT, thanks for doing it. There's is in the end not much performance benefit in distinguishing the pending and finished unmount cases, the only real advantage is code hygiene, which helps understanding what is written quickly. I think that the fact that this piece of code was not immediately clear and prompted comments/changes kind of proves that point. There would have been no need (speaking for me, at least) to look at vfs_busy()'s implementation if that function returned, e.g., EAGAIN for the pending unmount case. Additionally, the impact of this change is very minor for callers (it's just an additional if). The only real drawback is that all call sites must be inspected and adapted (once). |
sys/kern/vfs_vnops.c | ||
---|---|---|
3094–3097 | I think in some cases it's true that distinguishing between different (non-success) return codes aids code readability, but I'm skeptical that this is one of those cases. For me at least, the only thing that wasn't immediately clear was why there was an explicit check against ENOENT. IMO the way the code is now (with the only distinction being success vs. failure) is clearer (and probably easier to maintain in the long run) than having to distinguish between success, EAGAIN, and some other kind of failure. |