growfs: avoid deadlock between stdout/stderr and ufssuspend
Needs ReviewPublic
Actions

Authored by cgull_glup.org on Dec 30 2022, 8:52 AM.

Details

Reviewers

mckusick
kib
jkim

Summary

Code to mostly prevent this deadlock in kernel has already been
committed as 701b3696; this userland code provides earlier, better
error messages for this problem.

Do not keep UFS suspended while writing new cylinder groups.

Since the new cylinder groups are outside the existing filesystem, UFS
does not need to be suspended while writing them. We can write the
new cylinder groups first, before editing the existing filesystem.

THis has two benefits: Most of the writes (and potential I/O errors)
are done before editing the existing filesystem, and it limits the
maximum suspended time to one cylinder group's worth of I/O. This is
quite significant for filesystems that have a lot of new cylinder
groups to write.

Unfortunately, to get write access to the filesystem's volume, we
still need to suspend UFS via the /dev/ufssuspend interface, even for
blocks beyond the existing filesystem. Do it repeatedly for each
cylinder group. This means the accumulated suspended time remains
the same, or maybe even grows, if filesystem reloads are costly.
Kernel changes in ufssuspend will be needed to remove this lmitations.

Inmprove growfs/ufssuspend to write new blocks without suspending UFS

Like the old code, this prevents unmounts while growfs is running;
this is now technically only required at the very end when growfs
rewrites what used to be the last cg and the superblock, but it
provides a better UX if unmounts are attempted during growfs.

This work is forward and backward compatible: a new growfs will work
with an unchanged kernel, and vice versa.

Test Plan

Actual growfs functionality has been tested by stubbing out
non-deterministic functions and comparing filesystems created by old
vs new growfs, iterating over a range of filesystem sizes that spans
several cylinder groups.

New growfs + unchanged kernel, new kernel + unchanged growfs

Not much testing of error handling or concurrency problems yet

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 48911
Build 45800: arc lint + arc unit

Event Timeline

cgull_glup.org created this revision.Dec 30 2022, 8:52 AM

Herald added a subscriber: imp. · View Herald TranscriptDec 30 2022, 8:52 AM

cgull_glup.org requested review of this revision.Dec 30 2022, 8:52 AM

Harbormaster completed remote builds in B48911: Diff 114649.Dec 30 2022, 8:52 AM

This code supposedly further avoids growfs deadlocks, and may occasionally be able to grow a UFS filesystem mostly without blocking writes to the affected filesystem.

This code is lightly tested, and I've not yet fully run through style(9) changes. It's also my first time using Phabricator.

Comments welcome!

cgull_glup.org added inline comments.Dec 30 2022, 9:24 AM

sbin/growfs/growfs.c
88–90	Oops-- forgotten devel code, will be deleted
177–198	This is moved to run after writing new cylinder groups
1595–1602	Check for deadlock on stdout/stderr by checking filesystem ids
sys/ufs/ffs/ffs_suspend.c
254	`ffs_susp_resume()` is a refactoring to allow an ioctl to downgrade from a suspended state to a writable state without closing /dev/ufssuspend completely
287	The VFS lock now follows the presence/absence of the cdev privdata, rather than its previous implicit association with the filesystem being suspended.
sys/ufs/ffs/fs.h
904	Naming is Hard. This started as `UFSAPPEND`, `UFSEXTEND` is better, but better names will be grafefully adopted

cgull_glup.org added inline comments.Dec 30 2022, 9:34 AM

sys/ufs/ffs/ffs_suspend.c
326	Misuse of an unrelated variable-- fortunately I noticed it before having to debug it

What if the filesystem is already suspended at the moment when the UFSEXTEND call is done?

I suggest changing the API. Add some kind of UFSOPEN (or might be UFSSUSPBIND ?) which associates mp with the ufs/suspend file. It would always allocate cdevpriv data for file, and busy the mp. Successful bind allows writes to the unused part of the volume, through the file. Then you can do UFSSUSPEND/UFSRESUME, which are tracked in the cdevpriv, so you do not rely on the ffs_susp_suspended() alone.

Might be add UFSSUSPUNBIND for completeness, which resumes if needed, and unbusies the mp.

sbin/growfs/growfs.c
1816	Why this check is needed?
1826	Same question.
sys/ufs/ffs/ffs_suspend.c
118	Why do you need this local var?
326	mp is NULL there, no?
383	This is the only place in the switch control flow that started do return. As result, it complicates (unneeded) lock scope analysis for ffs_susp_lock, and liveness for mp.