ZFS has a per-mountpoint read-mostly "teardown lock" which is acquired
in read mode by most VOPs. It is used to suspend filesystem operations
in preparation for some dataset-level operation like a rollback
(reverting a filesystem to an earlier snapshot). In particular, the ZFS
VOP_GETPAGES implementation acquires this lock.
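
To make the pattern concrete, here is a minimal sketch of the
read-side acquisition; the ZFS_ENTER()/ZFS_EXIT() macro names and the
zfsvfs_t lookup follow FreeBSD's ZFS port but should be read as
illustrative, not verbatim:

    /*
     * Sketch: a ZFS VOP holds the teardown lock shared for its
     * duration, so a dataset-level operation (e.g. rollback) can
     * suspend all new VOPs by taking the lock exclusively.
     */
    static int
    zfs_freebsd_getpages(struct vop_getpages_args *ap)
    {
            zfsvfs_t *zfsvfs = ap->a_vp->v_mount->mnt_data;

            ZFS_ENTER(zfsvfs);      /* read-lock z_teardown_lock */
            /* ... busy-page handling and DMU reads happen here ... */
            ZFS_EXIT(zfsvfs);       /* drop the read lock */
            return (0);
    }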
When rolling back a dataset, ZFS invalidates all file data resident in
the page cache, as a rollback can cause a file's contents to change and
we don't want to let stale data linger. To do this, it calls
vn_pages_remove() on each vnode associated with the mountpoint. This
introduces a lock order reversal: to handle a page fault we busy vnode
pages before calling VOP_GETPAGES, and during rollback we busy vnode
pages in vm_object_page_remove() with the teardown lock held.
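
Schematically (condensed; not the literal call sites), the two paths
take the same pair of locks in opposite order:

    /*
     * Page-fault path (vm_fault -> vm_pager_get_pages):
     *   1. busy the vnode's pages
     *   2. VOP_GETPAGES -> read-lock the teardown lock
     *
     * Rollback path:
     *   1. write-lock the teardown lock
     *   2. vn_pages_remove -> vm_object_page_remove -> busy each page
     *
     * One thread holds a busy lock and sleeps on the teardown lock
     * while the other holds the teardown lock and sleeps on the busy
     * lock: a classic lock order reversal.
     */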
Resolve the deadlock by exploiting the fact that a rollback only needs
to purge valid pages: invalid pages need not be purged by definition,
so a thread holding a busy lock on an invalid ZFS vnode page can
safely block on the teardown lock. This assumes that valid pages are
never passed to VOP_GETPAGES via vm_pager_get_pages(). Since ZFS is
solely responsible for marking its vnode pages valid, I believe this
is a safe assumption.
Add a new mode to vm_object_page_remove() to skip over invalid pages.
Use it when rolling back a dataset.
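
A condensed sketch of how such a mode could look; the OBJPR_VALIDONLY
flag name follows the existing OBJPR_* convention but is an assumption
here, as is the exact shape of the rollback-side call:

    /* In vm_object_page_remove()'s page scan: */
    if ((options & OBJPR_VALIDONLY) != 0 && vm_page_none_valid(m)) {
            /*
             * The page carries no stale data, so it need not be
             * purged; skipping it means we never wait on a busy
             * holder that may itself be blocked on the teardown
             * lock.
             */
            continue;
    }
    /* ... otherwise busy the page and free it as before ... */

    /* Rollback side, with the teardown lock held exclusively: */
    VM_OBJECT_WLOCK(obj);
    vm_object_page_remove(obj, 0, 0, OBJPR_VALIDONLY);
    VM_OBJECT_WUNLOCK(obj);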
PR: 258208