deadlock between spa_errlog_lock and dp_config_rwlock
There is a lock order inversion deadlock between spa_errlog_lock and
dp_config_rwlock:
1. A thread in spa_delete_dataset_errlog() is running from a sync task.
   It is holding the dp_config_rwlock for writer (see
   dsl_sync_task_sync()), and waiting for the spa_errlog_lock.
2. A thread in dsl_pool_config_enter() is holding the spa_errlog_lock
   (see spa_get_errlog_size()) and waiting for the dp_config_rwlock (as
   reader).
Note that this was introduced by #12812.
This commit addresses the deadlock by defining the lock ordering to be
dp_config_rwlock first, then spa_errlog_lock / spa_errlist_lock.
spa_get_errlog() and spa_get_errlog_size() can acquire the locks in this
order, and then process_error_block() and get_head_and_birth_txg() can
verify that the dp_config_rwlock is already held.
Additionally, a buffer overrun in spa_get_errlog() is corrected. Many
code paths did not check whether *count had reached zero, and instead
continued writing past the beginning of the userspace buffer at uaddr.
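
The missing check can be sketched as follows. This is a userland
illustration under stated assumptions, not the actual fix: bookmark_t
and copy_entry() are hypothetical names, memcpy() stands in for the
copy to userspace, and the buffer is assumed to be filled from the end
toward the start, with *count tracking the free slots that remain.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an error log bookmark record. */
typedef struct bookmark {
	uint64_t objset;
	uint64_t object;
} bookmark_t;

/*
 * Copy one entry into buf, filling from the last slot toward the first.
 * *count is the number of free slots remaining. Returns 0 on success,
 * or ENOMEM when the buffer is full, in which case nothing is written.
 */
int
copy_entry(const bookmark_t *bm, bookmark_t *buf, uint64_t *count)
{
	assert(bm != NULL && buf != NULL && count != NULL);

	/* Without this check, *count would wrap and index before buf[0]. */
	if (*count == 0)
		return (ENOMEM);

	*count -= 1;
	memcpy(&buf[*count], bm, sizeof (*bm));
	return (0);
}
```

Routing every write through a single checked helper like this means no
individual code path can forget the test on *count.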
Tested by having some errors in the pool (via
`zinject -t data /path/to/file`), with one thread running
`zpool iostat 0.001` and another thread running `zfs destroy` (in a
loop, although it hits the first time). This reproduces the deadlock
easily without the fix, and the deadlock no longer occurs with the fix.
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #14239
Closes #14289