Page MenuHomeFreeBSD

unionfs: use VV_ROOT to check for root vnode in unionfs_lock()
ClosedPublic

Authored by jah on Jan 17 2022, 1:01 AM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Jan 15, 4:38 AM
Unknown Object (File)
Dec 12 2024, 11:52 AM
Unknown Object (File)
Nov 26 2024, 7:46 AM
Unknown Object (File)
Nov 17 2024, 6:43 PM
Unknown Object (File)
Nov 17 2024, 6:37 PM
Unknown Object (File)
Nov 17 2024, 6:31 PM
Unknown Object (File)
Nov 17 2024, 4:21 PM
Unknown Object (File)
Nov 5 2024, 7:27 AM
Subscribers

Details

Summary

Also, apply VV_ROOT to the root vnode directly in unionfs_domount()
instead of indirectly through unionfs_nodeget(). This allows both
functions to be simplified, and eliminates potentially wild access
to unionfs mount private data in unionfs_lock().

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

jah requested review of this revision.Jan 17 2022, 1:01 AM

What if um_rootvp is reclaimed?

In D33914#766950, @kib wrote:

What if um_rootvp is reclaimed?

Would that matter as far as the check for VV_ROOT is concerned? It looks as though v_vflag isn't cleared until the vnode is freed, and we always keep a reference to um_rootvp.
Also, unionfs_root() directly returns um_rootvp instead of calling unionfs_nodeget().

In D33914#766951, @jah wrote:
In D33914#766950, @kib wrote:

What if um_rootvp is reclaimed?

Would that matter as far as the check for VV_ROOT is concerned? It looks as though v_vflag isn't cleared until the vnode is freed, and we always keep a reference to um_rootvp.
Also, unionfs_root() directly returns um_rootvp instead of calling unionfs_nodeget().

Suppose that um_rootvp is reclaimed, and another unionfs vnode is instantiated over corresponding upper or lower vnode. In this case that new vnode does not get VV_ROOT set. This breaks dotdot lookups at least.

In D33914#766955, @kib wrote:
In D33914#766951, @jah wrote:
In D33914#766950, @kib wrote:

What if um_rootvp is reclaimed?

Would that matter as far as the check for VV_ROOT is concerned? It looks as though v_vflag isn't cleared until the vnode is freed, and we always keep a reference to um_rootvp.
Also, unionfs_root() directly returns um_rootvp instead of calling unionfs_nodeget().

Suppose that um_rootvp is reclaimed, and another unionfs vnode is instantiated over corresponding upper or lower vnode. In this case that new vnode does not get VV_ROOT set. This breaks dotdot lookups at least.

What exactly would this breakage look like? Would the reclamation only happen in the case of forced unmount of unionfs (and if so, would we care?), or could it happen for some other reason such as a debug sysctl?

You can see yourself what would be the breakage for dotdots: in vfs_lookup.c, the block starting at line 1038 if (cnp->cn_flags & ISDOTDOT) { checks for VV_ROOT to detect the case when it should walk over the mount point instead of calling VOP_LOOKUP(). Most likely, it would result in either upper or lower filesystem call to VOP_LOOKUP(), which again would either break VFS invariant that dotdot lookup is never done on VV_ROOT, or (if unionfs allows sub-directory covering, nullfs does allow it) causes exposition of the vnodes not supposed to be accessed through this mount.

Right now I think that indeed, there are only forced unmount and debugging sysctls that result in the referenced vnode reclamation. But VFS does not guarantee that, and we might grow additional cases for reclaim without breaking the promises.

In D33914#766971, @kib wrote:

You can see yourself what would be the breakage for dotdots: in vfs_lookup.c, the block starting at line 1038 if (cnp->cn_flags & ISDOTDOT) { checks for VV_ROOT to detect the case when it should walk over the mount point instead of calling VOP_LOOKUP(). Most likely, it would result in either upper or lower filesystem call to VOP_LOOKUP(), which again would either break VFS invariant that dotdot lookup is never done on VV_ROOT, or (if unionfs allows sub-directory covering, nullfs does allow it) causes exposition of the vnodes not supposed to be accessed through this mount.

Right now I think that indeed, there are only forced unmount and debugging sysctls that result in the referenced vnode reclamation. But VFS does not guarantee that, and we might grow additional cases for reclaim without breaking the promises.

The other reason I think this *might* not be an issue in practice is that the unionfs node keeps a reference to the parent directory vnode, which is returned for ISDOTDOT lookups instead of calling unionfs_nodeget().
But it also seems as though there are enough corner cases and things that might change in the future here, that it's probably better to restore the VV_ROOT logic in unionfs_nodeget().

In D33914#766976, @jah wrote:
In D33914#766971, @kib wrote:

You can see yourself what would be the breakage for dotdots: in vfs_lookup.c, the block starting at line 1038 if (cnp->cn_flags & ISDOTDOT) { checks for VV_ROOT to detect the case when it should walk over the mount point instead of calling VOP_LOOKUP(). Most likely, it would result in either upper or lower filesystem call to VOP_LOOKUP(), which again would either break VFS invariant that dotdot lookup is never done on VV_ROOT, or (if unionfs allows sub-directory covering, nullfs does allow it) causes exposition of the vnodes not supposed to be accessed through this mount.

Right now I think that indeed, there are only forced unmount and debugging sysctls that result in the referenced vnode reclamation. But VFS does not guarantee that, and we might grow additional cases for reclaim without breaking the promises.

The other reason I think this *might* not be an issue in practice is that the unionfs node keeps a reference to the parent directory vnode, which is returned for ISDOTDOT lookups instead of calling unionfs_nodeget().
But it also seems as though there are enough corner cases and things that might change in the future here, that it's probably better to restore the VV_ROOT logic in unionfs_nodeget().

On a related note, one reason I wanted to see if I could get away with removing the VV_ROOT detection from unionfs_nodeget() is that the logic there seemed fragile. Both unionfs and nullfs rely on simple pointer comparison between the vnode returned from lower FS lookup and the stored lower root vp to detect when VV_ROOT should be set. Could there conceivably be some case in which lookup returns a different vnode (but representing the same directory), such that the comparison would fail and VV_ROOT would not be set when it should be? It seems that forced unmount of the lower FS could do this, but that forced unmount would also recursively unmount the unionfs/nullfs.

In D33914#767090, @jah wrote:

On a related note, one reason I wanted to see if I could get away with removing the VV_ROOT detection from unionfs_nodeget() is that the logic there seemed fragile. Both unionfs and nullfs rely on simple pointer comparison between the vnode returned from lower FS lookup and the stored lower root vp to detect when VV_ROOT should be set. Could there conceivably be some case in which lookup returns a different vnode (but representing the same directory), such that the comparison would fail and VV_ROOT would not be set when it should be? It seems that forced unmount of the lower FS could do this, but that forced unmount would also recursively unmount the unionfs/nullfs.

This is true, of course. Comparing with lowervp root to detect root of the upper fs has the same concerns as comparing the own vp with cached root. The only relief is that the lower mp is protected against forced unmount (but not debug reclaim). Unfortunately, there is no alternative right now.

Restore VV_ROOT management to unionfs_nodeget()

sys/fs/unionfs/union_vfsops.c
245

Why is upperrootvp must be non-NULL?
But if it should, why nodeget() checks lowervp for VV_ROOT?

sys/fs/unionfs/union_vfsops.c
245

It doesn't make much sense to have a unionfs mount with a non-existent upper FS. Also, we lookup upperrootvp right before this and exit if namei() returns an error. Is there ever a situation in which namei would return a NULL vp on a lookup without also returning an error such as ENOENT?

The check in nodeget is a minor defensive measure against upper/lower root vnode reclamation. It's also worth noting that with unionfs, either the upper or the lower root will be the covered vp for the unionfs mount, although from what I can tell this doesn't make any difference today in whether that vnode will be reclaimed.

kib added inline comments.
sys/fs/unionfs/union_vfsops.c
245

I mean, at line 379, what is the point of checking lowervp if uppervp is guaranteed to be non-NULL?

This revision is now accepted and ready to land.Jan 24 2022, 11:52 PM
sys/fs/unionfs/union_vfsops.c
245

Yes, that's what I'm referring to. Earlier I brought up the concern with the lower/upper root vnodes possibly being reclaimed and thus breaking the pointer comparison. Checking them both seems like a way to reduce the likelihood of this happening, but perhaps this is not worth doing?

sys/fs/unionfs/union_vfsops.c
245

Hm, ok, I see the point. Perhaps add a comment explaining the rationale.

In fact, you might check for the VN_IS_DOOMED() on lower/upperrootvp as well, and do something in this case.

sys/fs/unionfs/union_vfsops.c
245

I like the VN_IS_DOOMED() idea.

Actually, what would you think of returning this patch to the way I originally had it, with VV_ROOT set directly from unionfs_domount(), but with unionfs_nodeget() changed to reject new node creation if ump->um_rootvp is doomed?

Does that seem reasonable, or could it be too restrictive in the face of future VFS changes?

sys/fs/unionfs/union_vfsops.c
245

You mean, refuse to create unionfs vnode is upper/lower is doomed? I do not think it is correct.

Consider vnodes as the memory cache for on-disk inodes. From this PoV, vnodes lifetime is transient and does not reflect inode lifetime. The fact that we expose doomed state to other observers except the thread which does vgone() is just a matter of implementation.

sys/fs/unionfs/union_vfsops.c
245

I understand the idea of vnodes as a transient software cache, but as a practical concern I don't think a unionfs mount is very useful if its root vnodes have been doomed.

Above, I was actually referring to the unionfs root vnode (um_rootvp), which is backed by the upper/lower root vnodes on the underlying FSes (um_lowervp, um_uppervp). The unionfs mount holds a reference to all 3 vnodes, so if these vnodes are reclaimed while the unionfs mount is active then it must be through some explicit action. If such an action takes place, it doesn't seem unreasonable to prevent unionfs node creation in order to avoid bigger problems.

um_rootvp is directly returned by unionfs_root, so if it becomes doomed, then cross-mount lookups that enter the unionfs mount won't produce a useful result. If um_lowervp and um_uppervp are doomed, the same will happen, plus if they are replaced by new vnodes that represent the same underlying inodes, unionfs will no longer be able to accurately detect a request to create a new unionfs root vnode.

sys/fs/unionfs/union_vfsops.c
245

Yes, but unionfs can detect that cached lower and upper root vps are reclaimed and try to do something. For instance, it might re-lookup them. Or we can declare our fs dead and start unmounting it. Or may be we can do something different and better.

sys/fs/unionfs/union_vfsops.c
245

That's entirely true. I have another related question as well:

Even if the lower and upper root vps aren't reclaimed, is there any guarantee that lookup againt the underlying FSes will for the lower/upper inodes will always produce the same cached vnode pointers? And if not, is this another thing that could break e.g. setting of VV_ROOT on dot-dot lookup, even without some unusual case like forced unmount?

Regardless, I have more questions than answers about how things should ultimately be done for root vnodes. So for now I'll just follow your suggestion to add a comment explaining why both lower and upper are checked in unionfs_nodeget().

Add comment explaining VV_ROOT check

This revision now requires review to proceed.Jan 30 2022, 12:27 AM
sys/fs/unionfs/union_vfsops.c
245

No, there is no guarantee. As a real-world example, it might be interesting to look at 76b1b5ce6d81f66b09be8 / PR 253593

This revision is now accepted and ready to land.Jan 30 2022, 2:43 AM