"rm-style" system calls such as kern_frmdirat() and kern_funlinkat() don't supply SAVENAME to preserve the pathname buffer for subsequent vnode ops. For unionfs this poses an issue because the pathname may be needed for a relookup operation in unionfs_remove()/unionfs_rmdir(). Currently unionfs doesn't check for this case, leading to a panic on DIAGNOSTIC kernels and use-after-free of cn_nameptr otherwise. The unionfs node's stored buffer would suffice as a replacement for cnp->cn_nameptr in some (but not all) cases, but it's cleaner to just ensure that unionfs vnode ops always have a valid cn_nameptr by setting SAVENAME in unionfs_lookup(). While here, do some light cleanup in unionfs_lookup() and assert that HASBUF is always present in the relevant relookup calls.
Details
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
- Lint
Lint Passed - Unit
No Test Coverage - Build Status
Buildable 41769 Build 38658: arc lint + arc unit
Event Timeline
I do have a question here:
Instead of taking this approach, would it be acceptable to simply change kern_frmdirat()/kern_funlinkat() to pass SAVENAME to namei()?
unionfs might of course be the only FS that would make use of the name buffer in those cases, but it seems like a straightforward change.
If that's not acceptable, then I have a similar (but less likely to be encountered) issue to fix in unionfs_remove(), where we try to resolve the upper/lower vnode for VSOCK vnodes.
That will be harder to fix because unionfs doesn't actually create unionfs_nodes for VSOCK vnodes, so there won't be a path readily available. But I also don't immediately see why unionfs needs to make that special case for VSOCK; we probably don't want unionfs copy-on-write behavior for sockets, but I don't see why we need to completely avoid allocating a unionfs_node for them, so maybe with some other refactoring that code can just go away.
I do not see why SAVENAME would be not acceptable? Why did you think that it could be?
I thought adding SAVENAME in unionfs_lookup() might be frowned upon, based on the comment for NDVALIDATE():
/* * Validate the final state of ndp after the lookup. * * Historically filesystems were allowed to modify cn_flags. Most notably they * can add SAVENAME to the request, resulting in HASBUF and pushing subsequent * clean up to the consumer. In practice this seems to only concern != LOOKUP * operations. * * As a step towards stricter API contract this routine validates the state to * clean up. Note validation is a work in progress with the intent of becoming * stricter over time. */ #define NDMODIFYINGFLAGS (LOCKLEAF | LOCKPARENT | WANTPARENT | SAVENAME | SAVESTART | HASBUF)
It looks like I can get away with it for any nameiop besides LOOKUP, so if this is still OK to do then I'm fine doing that instead.
That might be more foolproof for unionfs since it would ensure a valid pathname buffer regardless of what the namei() caller supplied.
Just do it. The idea here is to not put unionfs-specific stuff outside of it if it can be helped.
I got this panic while testing with D32148.95756.diff:
panic: vn_lock: error 16 incompatible with flags 0x82400 cpuid = 1 time = 1632729527 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0141a13740 vpanic() at vpanic+0x187/frame 0xfffffe0141a137a0 panic() at panic+0x43/frame 0xfffffe0141a13800 _vn_lock_fallback() at _vn_lock_fallback+0xd6/frame 0xfffffe0141a13860 _vn_lock() at _vn_lock+0x86/frame 0xfffffe0141a138c0 unionfs_nodeget() at unionfs_nodeget+0x719/frame 0xfffffe0141a13960 unionfs_lookup() at unionfs_lookup+0x63b/frame 0xfffffe0141a13ab0 VOP_CACHEDLOOKUP_APV() at VOP_CACHEDLOOKUP_APV+0x5a/frame 0xfffffe0141a13ad0 vfs_cache_lookup() at vfs_cache_lookup+0xa6/frame 0xfffffe0141a13b20 VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x5a/frame 0xfffffe0141a13b40 lookup() at lookup+0x4d1/frame 0xfffffe0141a13be0 namei() at namei+0x379/frame 0xfffffe0141a13c90 kern_frmdirat() at kern_frmdirat+0x15e/frame 0xfffffe0141a13e00 amd64_syscall() at amd64_syscall+0x147/frame 0xfffffe0141a13f30 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0141a13f30
That's an unrelated issue; I have a local fix for it (along with the ENOENT-on-mkdir issue) that I'll post after this one.