Page MenuHomeFreeBSD

unionfs: rework unionfs_getwritemount()
ClosedPublic

Authored by jah on Feb 15 2022, 3:44 AM.
Tags
None
Referenced Files
F102602536: D34282.diff
Thu, Nov 14, 4:00 PM
Unknown Object (File)
Mon, Nov 11, 4:34 AM
Unknown Object (File)
Sun, Nov 10, 4:54 AM
Unknown Object (File)
Wed, Nov 6, 10:45 AM
Unknown Object (File)
Mon, Nov 4, 4:33 AM
Unknown Object (File)
Sat, Nov 2, 9:55 PM
Unknown Object (File)
Sat, Nov 2, 9:28 PM
Unknown Object (File)
Wed, Oct 23, 6:19 AM
Subscribers

Details

Summary

VOP_GETWRITEMOUNT() is called on the vn_start_write() path without any
vnode locks guaranteed to be held. It's therefore unsafe to blindly access per-mount
and per-vnode data. Instead, follow the approach taken by nullfs and
use the vnode interlock coupled with the hold count to ensure the
mount and the vnode won't be recycled while they are being accessed.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

jah requested review of this revision.Feb 15 2022, 3:44 AM
sys/fs/unionfs/union_vnops.c
1782

I removed the check for VREG from the updated code. Wouldn't we also want to check the upper vnode when beginning a directory write operation that will ultimately create a unionfs shadow dir?

sys/fs/unionfs/union_vnops.c
1777–1778

Since this is a frequently executed path, would it make sense to instead use vfs_smr_enter()/vfs_smr_exit() here, and an SMR pool for unionfs nodes? Presumably that would be coupled with atomic loads of vnode and mount fields.

mjg added inline comments.
sys/fs/unionfs/union_vnops.c
1777–1778

imo this is yet another argument for reworking how forced unmount works and most notably leaving both the mount point and v_data intact as long as the vnode is alive.

in the meantime I would suggest benchmarking e.g., buildkernel over unionfs. chances are decent there is so much slowness in there already that the interlock here does not make any difference.

sys/fs/unionfs/union_vnops.c
1777–1778

imo this is yet another argument for reworking how forced unmount works and most notably leaving both the mount point and v_data intact as long as the vnode is alive.

That's also a thought I've had quite a few times since I've started working on unionfs. It seems like each filesystem has to jump through quite a few hoops to accommodate forced unmount, which is both not the normal use case and somewhat hard to test.

What I'll probably do for the time being is commit this change as-is (modulo review feedback). I'll need to get a bigger machine before I can do meaningful scalability testing.

sys/fs/unionfs/union_vnops.c
1777–1778

So I ran a quick test of -j 104 buildkernel (of stable/12 on main), along with mount -t unionfs -o noatime -o above /mnt/tmp /mnt/tmp2

tmpfs: 2566.10s user 322.43s system 8669% cpu 33.319 total
unionfs: 1785.18s user 14403.61s system 8877% cpu 3:02.37 total

Looking at the profile reveals the issue is exclusive vnode locking for path lookup.

sys/fs/unionfs/union_vnops.c
1777–1778

Thanks for testing that out, and also thanks for reminding me: I've been meaning to see if I can get away with setting the shared-lookup flag on the unionfs mount, but there are at least 3 bugs (including this getwritemount) issue I need to fix before I can try that.

sys/fs/unionfs/union_vnops.c
1777–1778

That of course would also be contingent on both underlying FSes supporting shared lookups, but I think most of them do these days.

sys/fs/unionfs/union_vnops.c
1804

Are we effectively assuming that uvp isn't NULL here? I would guess that a preceding VOP_LOOKUP should provide that guarantee. If uvp can legitimately be NULL, meaning that a shadow directory for the parent doesn't exist, then I can't see why it's ok for this VOP to return 0.

sys/fs/unionfs/union_vnops.c
1804

Yes, I believe the prior lookup should prevent this. But it's a good question: should we instead return EACCES below this for the case in which uvp is NULL? This was based on the similar logic in nullfs, so if the answer is "yes" it seems like we should change nullfs too. The nullfs implementation also doesn't pass up the error returned from VOP_GETWRITEMOUNT() on the lower vnode, which seemed strange to me so I did decide to propagate the error here.

sys/fs/unionfs/union_vnops.c
1804

What is the similar logic in nullfs? For nullfs it is simpler, if we have nullfs vnode reclaimed, we cannot provide an answer for VOP_GETWRITEMOUNT(), otherwise we can. For unionfs it is more complicated, but perhaps the fall-back should be um_uppervp->v_mount or such.

sys/fs/unionfs/union_vnops.c
1804

I note that no implementations of vop_getwritemount return anything other than 0, including dead_getwritemount(), but passing up the error from VOP_GETWRITEMOUNT does seem like the right thing to do both here and in nullfs.

With respect to unp->un_uppervp being non-NULL, where unp corresponds to the parent vnode, if we think that a prior lookup guarantees the presence of an upper vnode, can't we add an assertion to that effect?

sys/fs/unionfs/union_vnops.c
1804

I'd somewhat prefer to return EACCES if the parent's uppervp is NULL, as that seems less coupled to current unionfs and VFS lookup behavior.

w.r.t. falling back to um_uppervp->v_mount, I'd rather not do that for the same reason: if at some point there is a situation where the parent uppervnode is NULL, we can't realistically complete whatever write operation is being attempted, so we should reject it.

sys/fs/unionfs/union_vnops.c
1804

VOP_GETWRITEMOUNT() is not the write, it is about getting a point to the mount point that would be the target of a write, if write happens. Basically it is a mount point that gets vn_start_write().

sys/fs/unionfs/union_vnops.c
1804

I understand that, it just seems not quite right to me to return success here in a case when a potential write using the specified vnode can't succeed, or to directly return um_uppervp->v_mount without querying the underlying FS.

Return EACCES if we can't find a suitable upper vnode; add comments and assertion

kib added inline comments.
sys/fs/unionfs/union_vnops.c
1777–1778

I do not think it is in the specification of this VOP to check for MNT_RDONLY, in fact. And I do not see much use for this check.

This revision is now accepted and ready to land.Feb 18 2022, 2:10 AM

Remove unnecessary MNT_RDONLY check

This revision now requires review to proceed.Feb 18 2022, 4:51 AM

I ran tests with D34282.102922.patch for 5 hours. No problems seen.

This revision is now accepted and ready to land.Feb 19 2022, 2:08 AM
This revision was automatically updated to reflect the committed changes.