
Fix concurrent resilvers initiated at same time

Description

For dRAID vdevs, it was possible to initiate both a sequential
and a healing resilver at the same time.

This fixes the following two scenarios.

  1. There's a window where a sequential rebuild can be started via ZED
     even if a healing resilver has been scheduled.

     • This is fixed by adding an additional check in spa_vdev_attach()
       for any scheduled resilver and returning an appropriate error
       code when a resilver is already in progress.

  2. It was possible for zpool clear to start a healing resilver when
     it wasn't needed at all. This occurs because during vdev_open()
     the device is presumed to be healthy; only once it has been
     validated by vdev_validate() is it set unavailable. However, by
     this point an async resilver will have already been requested if
     the DTL isn't empty.

     • This is fixed by cancelling the SPA_ASYNC_RESILVER request
       immediately at the end of vdev_reopen() when a resilver is
       unneeded.
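
The two fixes can be pictured with a small standalone model. This is only a sketch under stated assumptions, not the OpenZFS change itself: `toy_pool_t`, `toy_attach()`, `toy_reopen()`, `TOY_ASYNC_RESILVER` and `TOY_ERR_RESILVER_IN_PROGRESS` are hypothetical names invented for illustration, and the real `spa_vdev_attach()` and `vdev_reopen()` operate on far richer state and take locks that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in OpenZFS. */
#define TOY_ASYNC_RESILVER            0x1u
#define TOY_ERR_RESILVER_IN_PROGRESS  1

typedef struct toy_pool {
	bool     resilver_running;  /* a healing resilver is in progress */
	unsigned async_tasks;       /* pending async requests (bit flags) */
	bool     dtl_empty;         /* no missing data recorded for the device */
	bool     device_available;  /* outcome of validating the device */
} toy_pool_t;

/*
 * Scenario 1: refuse a sequential rebuild when a healing resilver is
 * either running or merely scheduled.
 */
int
toy_attach(const toy_pool_t *pool, bool sequential_rebuild)
{
	assert(pool != NULL);
	if (sequential_rebuild && (pool->resilver_running ||
	    (pool->async_tasks & TOY_ASYNC_RESILVER) != 0))
		return (TOY_ERR_RESILVER_IN_PROGRESS);
	return (0);
}

/* A resilver only makes sense for an available device with a non-empty DTL. */
bool
toy_resilver_needed(const toy_pool_t *pool)
{
	return (pool->device_available && !pool->dtl_empty);
}

/*
 * Scenario 2: the open step presumes the device is healthy and may
 * request a resilver; validation can then mark it unavailable. The
 * recheck at the end cancels a request that is no longer needed.
 */
void
toy_reopen(toy_pool_t *pool, bool validated_ok)
{
	assert(pool != NULL);

	/* Open: device presumed healthy, so a non-empty DTL queues a resilver. */
	pool->device_available = true;
	if (!pool->dtl_empty)
		pool->async_tasks |= TOY_ASYNC_RESILVER;

	/* Validation may find the device unusable after all. */
	pool->device_available = validated_ok;

	/* The fix: drop the stale request if a resilver is unneeded. */
	if (!toy_resilver_needed(pool))
		pool->async_tasks &= ~TOY_ASYNC_RESILVER;
}
```

In this model, reopening a device that fails validation leaves no resilver request pending, so a later `toy_attach()` with a sequential rebuild succeeds. If validation passes and the DTL is non-empty, the request stays queued and the sequential rebuild is rejected.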

Finally, add a test case to ZTS for verification.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #14881
Closes #14892

Details

Provenance
Akash B <akash-b@hpe.com> Authored on May 24 2023, 7:28 PM
GitHub <noreply@github.com> Committed on May 24 2023, 7:28 PM
Parents
rGf8447cf22ec3: Linux 6.4 compat: reclaimed_slab renamed to reclaimed

Event Timeline

GitHub <noreply@github.com> committed rG9d618615d1ed: Fix concurrent resilvers initiated at same time (authored by Akash B <akash-b@hpe.com>). May 24 2023, 7:28 PM