deadlock between spa_errlog_lock and dp_config_rwlock

Description

There is a lock order inversion deadlock between spa_errlog_lock and
dp_config_rwlock:

A thread in spa_delete_dataset_errlog() is running from a sync task.
It is holding the dp_config_rwlock for writer (see
dsl_sync_task_sync()), and waiting for the spa_errlog_lock.

A thread in dsl_pool_config_enter() is holding the spa_errlog_lock
(see spa_get_errlog_size()) and waiting for the dp_config_rwlock (as
reader).

Note that this was introduced by #12812.

This commit addresses this by defining the lock ordering to be
dp_config_rwlock first, then spa_errlog_lock / spa_errlist_lock.
spa_get_errlog() and spa_get_errlog_size() can acquire the locks in this
order, and then process_error_block() and get_head_and_birth_txg() can
verify that the dp_config_rwlock is already held.

Additionally, a buffer overrun in spa_get_errlog() is corrected. Many
code paths didn't check if *count got to zero, instead continuing to
overwrite past the beginning of the userspace buffer at uaddr.
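
A minimal sketch of the kind of check that was missing, assuming the
buffer is filled from its last slot toward its first while *count tracks
the free slots. The record_t type, both function names, and the choice of
ENOMEM as the error value are assumptions made for illustration; they are
not taken from the ZFS source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical record type standing in for an error-log bookmark. */
typedef struct {
	unsigned long long objset;
	unsigned long long object;
} record_t;

/*
 * Copy one record into the last free slot of buf and decrement *count.
 * When no slots remain, return ENOMEM instead of writing before buf.
 */
static int
copy_out_record(record_t *buf, size_t *count, const record_t *rec)
{
	assert(buf != NULL && count != NULL && rec != NULL);

	if (*count == 0)
		return (ENOMEM);

	buf[*count - 1] = *rec;
	(*count)--;
	return (0);
}

/* Copy up to n records, stopping at the first error. */
static int
copy_out_all(record_t *buf, size_t *count, const record_t *recs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int err = copy_out_record(buf, count, &recs[i]);
		if (err != 0)
			return (err);
	}
	return (0);
}
```

Without the *count == 0 test, an unsigned count would wrap around on the
next decrement and later writes would land outside the buffer.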

Tested by having some errors in the pool (via `zinject -t data
/path/to/file`), with one thread running `zpool iostat 0.001` and another
thread running `zfs destroy` (in a loop, although it hits the first time).
This reproduces the problem easily without the fix; with the fix applied,
the deadlock no longer occurs.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #14239
Closes #14289

Details

Provenance
mahrens authored on Dec 22 2022, 7:48 PM
GitHub <noreply@github.com> committed on Dec 22 2022, 7:48 PM
Parents
rG29e1b089c14b: Documentation corrections