This can happen if VOP_READLINK returns such a path. So far I've seen
it in fusefs, but it can probably happen in other file systems, too.
PR: 274268
MFC after: 1 week
Sponsored by: Axcient
Differential D42081
Don't panic in vfs_lookup due to paths with embedded NULs asomers on Oct 4 2023, 6:54 PM. Authored by Tags None Referenced Files
Details
This can happen if VOP_READLINK returns such a path. So far I've seen PR: 274268 Test case added to the fusefs test suite, though the bug isn't fusefs specific.
Diff Detail
Event TimelineComment Actions by "here" I meant the vfs layer ;) i don't think the patch is legitimate in that length indicated by VOP_READLINK does not line up with the real length -- it is that discrepancy which needs to be avoided instead. i would make it an invariant that VOP_READLINK returns correct size or an error like in the case above. then a debug_pre func for VOP_READLINK could validate it. i also note the proposed patch pessimizes all lookups (not only those which encounter a symlink) by adding a func call and no longer avoiding an extra branch per character (currently achieved by slapping / at the end) with this i'm buggering off the area Comment Actions I thought of that. But I think it's better this way for two reasons:
I can of course inline strchrnul if you think it would be worthwhile. Comment Actions Why do other filesystems need to be modified if we check the invariant in a vop_readlink_post()?
Comment Actions Because I didn't know about the vop _post hooks ;). But now that I look at it, it isn't possible. vop_readlink_post only has access to the struct uio _after_ the copy operation. So it doesn't know where the path actually starts, and it can't check it. Can you think of any other common point in the call stack? Comment Actions You could add something to namei_follow_link(), where VOP_READLINK is actually called during name resolution. But I also want to ask, why is not sufficient to modify fuse alone? How would other filesystems allow this situation to occur, barring corruption of on-disk data? Comment Actions
Exactly, corruption of on-disk data. From inspection, I think that ext2fs is vulnerable. I haven't checked others. But I think we should be resilient against corrupt disks. Comment Actions But your approach to resiliency should depend on the filesystem. FUSE explicitly needs to guard against this particular possibility, so it should check for nul bytes in symlink targets. OpenZFS probably does not, assuming that it always writes valid symlinks to disk and can catch corruption using checksums. UFS and ext2 provide no mechanism to detect data corruption, so we cannot be resilient to corruption in any general sense. Certainly they could explicitly check for this case and panic, but then we should presumably also make sure that directory entries don't contain nul bytes (ext2_readdir() and ufs_readdir() don't appear to verify this), and the rabbit hole doesn't end there. Comment Actions ZFS checksums only guard against hardware-caused corruption. Corruption caused by software bugs is not prevented. For example, I've dealt with a software bug that caused ZFS to record compressed records but indicate in the block pointer that they were uncompressed. If any of those records were for symlink targets (does ZFS ever allocate a whole record for a really long symlink?) then it would've contained NULs. Comment Actions The way we find such bugs is with assertions. This diff removes an assertion which could have caught the bug you described. In any case, I'm opposed to the vfs_lookup.c portion of the diff. If we have a symlink pointing to foo/bar and some kind of corruption turns that into foo\0bar, an INVARIANTS kernel would now treat that as "foo" when before it would have failed closed. FUSE is special here since its VOP_READLINK returns a target provided by an untrusted source. FUSE should be verifying that its inputs to the VFS don't violate VFS invariants. Other filesystems could likely do more to avoid trusting on-disk data more than is strictly necessary without overly hurting performance, but that seems like a separate concern. Comment Actions @markj I've moved the check into namei_follow_link . Do you like this version better? I could also have it print some kind of warning to dmesg, subject to INVARIANTS, if you would prefer. Or I could remove it entirely and restore the panic, if you prefer. The problem with doing the check in namei_follow_link is that it causes an extra scan through the string, rather than piggy backing on the scan that already takes place in vfs_lookup. Comment Actions BTW is there a check in fusefs that lookup never returns vp == dvp (of non-dot cnp, but this should never reach VOP)? IMO it is even more fundamental invariant. Comment Actions Very close. There is such a check at the interface to the fuse server. So it checks that FUSE_LOOKUP doesn't return the same inode number as the parent directory. And similar for FUSE_CREATE, FUSE_MKDIR, etc. Comment Actions Sorry, but I still don't like it. :) fuse_vnop_readlink() should be sanitizing its inputs to the VFS rather than expecting the VFS to handle this situation in some arbitrary way. With this patch, we are now hiding what could be bugs in all VOP_READLINK implementations. If it were ZFS returning corrupted symlink targets, then I'd want the system to panic (at least on INVARIANTS systems). If this situation can legitimately arise in FUSE filesystems, then fuse_vnop_readlink() needs to decide how to handle it; in this case it seems like you want to simply return the string up to the first nul terminator. Comment Actions Well, it can't _legitimately_ occur in fusefs. The best behavior I think would be to return EIO if a fuse server does this. The only reason I chose to swallow the NUL in vfs_lookup was because it was easier to do that than to return an error. What I really want is for the system to not panic, at least on non-debug kernels, due to a naughty fuse server or corrupted disk or something. Do you really prefer a panic over returning an error and printing something to dmesg? Comment Actions In the patch you're changing the VFS to treat the first nul as the end of the symlink value. You can change fuse_vnop_readlink() to do that instead.
In an INVARIANTS kernel, of course I prefer to panic. The panic you reported in PR 274268 is from a KASSERT, so I'm not sure what you mean about non-debug kernels. As far as I can see, an embedded nul won't trigger a panic directly in such kernels, but it's hard to reason about what will happen with or without your patch. In any case, I think FUSE needs to handle the possibility of a naughty fuse server without weakening VFS invariants. |