Page MenuHomeFreeBSD

Add support for managing UFS/FFS snapshots to fsck_ffs(8)
ClosedPublic

Authored by mckusick on Sep 8 2022, 12:54 AM.
Tags
None
Referenced Files
F102811634: D36491.id112753.diff
Sun, Nov 17, 12:16 PM
F102805175: D36491.diff
Sun, Nov 17, 10:11 AM
Unknown Object (File)
Wed, Nov 13, 10:12 PM
Unknown Object (File)
Tue, Nov 12, 2:33 PM
Unknown Object (File)
Mon, Nov 11, 4:40 AM
Unknown Object (File)
Tue, Nov 5, 10:34 AM
Unknown Object (File)
Tue, Oct 29, 9:29 PM
Unknown Object (File)
Thu, Oct 24, 11:16 AM
Subscribers

Details

Summary

The kernel handles the managment of UFS/FFS snapshots. Since UFS/FFS updates filesystem data (rather than always writing changes to new locations like ZFS), the kernel must check every filesystem write to see if the block being written is part of a snapshot. If it is part of a snapshot, then the kernel must make a copy of the old block value into a newly allocated block for the snapshot before allowing the write to be done. Similarly, if a block is being freed, the kernel must check to see if it is part of a snapshot and let the snapshot claim the block rather than freeing it for future use. When a snapshot is freed, its blocks need to be offered to older snapshots and freed only if no older snapshots wish to claim them.

When snapshots were added to UFS/FFS they were integrated into soft updates and just a small part of the management of snapshots needed to be added to fsck_ffs(8) as soft updates minimized the set of snapshot changes that might need correction. When journaling was added to soft updates a much more complete knowledge of snapshots needed to be added to fsck_ffs(8) for it to be able to properly handle the filesystem changes that a journal rollback needs to do (specifically the freeing and allocation of blocks). Since this functionality was unavailable, the use of snapshots was disabled when running with journaled soft updates.

This set of changes imports the kernel code for the management of snapshots to fsck_ffs(8). With this code in place it will become possible to enable snapshots when running with journalled soft updates. The most immediate benefit will be the ability to use snapshots to take consistent filesystem dumps on live filesystems. Future work will be done to update fsck_ffs(8) to be able to use snapshots to run in background on live filesystems running with journaled soft updates.

Test Plan

Working with Peter Holm to run his existing filesystem snapshot tests and to add some new tests.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Could you, please, upload the diff with full context? git diff -U 9999999 ... should do it

Is there any sanity checking for the snapshot state in the existing fsck code? I believe there is a lot that can be broken with snapshots, and we must ensure that after fsck pass, kernel is protected against the corruption.

sbin/fsck_ffs/fsutil.c
324
471

Can those assumptions be broken in the corrupted filesystem?

479

Same

522

Same

sbin/fsck_ffs/inode.c
820
mckusick added inline comments.
sbin/fsck_ffs/fsutil.c
471

The superblock and all cylinder groups are copied as part of the creation of the snapshot, and it will not be marked as a snapshot until all the copies have been done and committed. We create a list of all the blocks that were copied as part of the snapshot creation which we check at the top of ffs_copyonwrite() to avoid needing to look them up in the snapshot vnode. The table is never modified after the creation, so absent disk corruption it cannot get broken.

I could get fsck to recreate the list and make sure that it matches if you think that is useful or necessary. I could also check to see that the blocks have actually been copied in the snapshot (that is they point to a disk block rather than being 0 (uncopied), BLK_SNAP (owned by other snapshot), or BLK_UNCOPIED (unallocated when we were created). But like the above mentioned list of copied blocks, they are set up as part of creating the snapshot so will only get corrupted by disk corruption.

479

See above comment on line 458 about blocks copied during snapshot creation.

522

See above comment on line 458 about blocks copied during snapshot creation.

sbin/fsck_ffs/fsutil.c
471

As I understand, users' expectations (myself included) is that after fsck reported that the filesystem is clean, it is usable for safe mounting. Failures that fsck is supposed to fix include both kernel and storage corruption.

For instance, once I saw the problem where hardware RAID overwrite enough blocks with other data from the same disk, as if write request incorrectly calculated started block. As result, once cg block was destroyed. I expect that fsck would be able to do something that would give me access to the data from other cgs.

So I think that fsck_ffs must check the consistency there, and either try to correct, or try to safely remove the corrupted snapshot.

sbin/fsck_ffs/fsutil.c
471

How about I add a check-hash to the list that tracks what has been pre-copied and if the check-hash fails offer to remove the offending snapshot file?

sbin/fsck_ffs/fsutil.c
471

I would say that this is some step in the right direction, but not the ultimate solution.

Also, I suppose by removal you mean zeroing out the snapshot inode, instead of proper removal. Then, the unreferenced blocks from the snapshot are garbage-collected by fsck.

Also, I think that removing the snapshot, we need to remove all later snapshots as well, since AFAIR, blocks are not CoW if they are already owned by an earlier snapshot.

Also, please note that removing snapshot causes user data loss, which sometimes can be avoided. User might rely on the data caught by the snapshot, which is later modified (which is the reason to take the snapshot, after all).

But regardless, even with all limitations, this would be a step into better fsck handling of the corrupted snapshots.

Add verification of snapshot having made copies of its superblock and cylinder groups (new code near the end of setup.c). Checking the correctness of the existing list is less disruptive than changing the on-disk format to allow the addition of a check-hash.

Some other fixes while running various test scenarios.

Thanks for your continued feedback.

Also, I suppose by removal you mean zeroing out the snapshot inode, instead of proper removal. Then, the unreferenced blocks from the snapshot are garbage-collected by fsck.

Yes, the offending snapshot will be freed.

Also, I think that removing the snapshot, we need to remove all later snapshots as well, since AFAIR, blocks are not CoW if they are already owned by an earlier snapshot.

No other snapshots will be affected. When a snapshot is removed, every other snapshot is given the opportunity to claim a block that it is no longer claiming. Only when no snapshot wants a block is it returned to the free list.

Also, please note that removing snapshot causes user data loss, which sometimes can be avoided. User might rely on the data caught by the snapshot, which is later modified (which is the reason to take the snapshot, after all).

The only way that a snapshot is lost is when its metadata suffers disk corruption. Similar data loss occurs when any other file suffers data corruption (such as an indirect block gets smashed). Unlike ZFS which has metadata duplication, UFS loses files when their metadata is corrupted. Hence the need for backups.

But regardless, even with all limitations, this would be a step into better fsck handling of the corrupted snapshots.

At least for now fsck_ffs will prevent corrupted snapshots from bringing down a system. Practically speaking I have never seen the copy lists get corrupted though the indirect blocks can get corrupted and are handled no differently than any other file type with corrupt indirect blocks.

sbin/fsck_ffs/setup.c
252

Could this be a function? You would need to pass several parameters, including manually constructed string fragment for pwarn, and a return value to pass control to fail label outside, but IMO it is much cleaner than such unwieldy macro.

279

Could you, please, add more comments to the code.

It is not completely clear to me what is checked there. In fact, the line 239

if (*blkp++ != (chkblk))

looks like we just check that the superblock/summary/cgs lists fit into the snapshot file size.

Am I missing something?

Add comments to checksnapinfo() and its macro CHKBLKINLIST().

Document checksnapinfo().

sbin/fsck_ffs/setup.c
252

I started with this as a function and found that it was even less clear than the macro. It ends up looking a lot like the macro except that it is more convoluted. One of its parameters is daddr_t **blkp so every use of blkp has an extra level of indirection. At least in the macro it is clear when you are incrementing the pointer versus using the pointer.

279

Hopefully the added comments make it clearer what is being checked.

This revision is now accepted and ready to land.Oct 13 2022, 5:34 AM

I have not observed any new problems while testing D36491.id111589.diff

In D36491#839750, @pho wrote:

I have not observed any new problems while testing D36491.id111589.diff

Well, ... I just got this looping problem with a new test scenario and the patched fsck_ffs + this disk image: https://people.freebsd.org/~pho/diskimage.20221013.xz

Well, ... I just got this looping problem with a new test scenario and the patched fsck_ffs + this disk image: https://people.freebsd.org/~pho/diskimage.20221013.xz

It was a nasty race condition that ended up trying to double remove an entry from a hash list. Ended up with a rather scrambled list that pointed back at itself and got into an infinite loop trying to traverse the list. Now fixed.

Fixes for bugs found by Peter Holm.

Clarification of control flow in the checksnapinfo() routine.

This revision now requires review to proceed.Nov 7 2022, 11:59 PM
This revision is now accepted and ready to land.Nov 8 2022, 11:53 PM