
Improve resilver ETAs



When resilvering, the estimated time remaining is calculated using
the average issue rate over the current pass. The current pass
begins when the scan is started, or when it is restarted after the
pool is exported and imported.

For dRAID pools in particular this can result in wildly optimistic
estimates, since the issue rate is very high while non-degraded
regions of the pool are being scanned. Once repair I/O starts being
issued, performance drops to a realistic rate, but the estimate
remains significantly skewed by the earlier high rate.

To address this, we redefine a pass so that it starts after a
scanning phase completes, making the issue rate more reflective of
recent performance. Additionally, the zfs_scan_report_txgs
module option can be set to reset the pass statistics more often.
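
As a rough illustration of why the pass boundary matters, the sketch
below compares an estimate averaged over the whole pass with one
averaged only since the scanning phase ended. It is a simplified
model, not the ZFS implementation: the eta_seconds helper and all of
the byte counts and durations are hypothetical and chosen only to
make the skew visible.

```python
def eta_seconds(remaining_bytes, issued_bytes, elapsed_seconds):
    """Estimate time remaining from an average issue rate.

    Returns None when no rate can be computed yet.
    """
    if issued_bytes <= 0 or elapsed_seconds <= 0:
        return None
    rate = issued_bytes / elapsed_seconds  # bytes per second
    return remaining_bytes / rate


GIB = 1 << 30

# Hypothetical figures: a fast stretch over non-degraded regions,
# followed by much slower repair I/O.
scan_issued, scan_secs = 900 * GIB, 300
repair_issued, repair_secs = 100 * GIB, 1000
remaining = 1000 * GIB

# Pass starts at scan start: the fast stretch inflates the average rate.
old_eta = eta_seconds(remaining,
                      scan_issued + repair_issued,
                      scan_secs + repair_secs)

# Pass starts after the scanning phase: only recent repair I/O counts.
new_eta = eta_seconds(remaining, repair_issued, repair_secs)

print(f"whole-pass estimate:   {old_eta / 60:.0f} min")
print(f"post-scan estimate:    {new_eta / 60:.0f} min")
```

With these made-up numbers the whole-pass average gives an estimate
several times shorter than the one based only on repair I/O, which is
the kind of optimism described above.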

Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14410


Brian Behlendorf <behlendorf1@llnl.gov> Authored on Jan 25 2023, 7:28 PM
rGa68dfdb88c88: Fix "Detach spare vdev in case if resilvering does not happen"

Event Timeline

Brian Behlendorf <behlendorf1@llnl.gov> committed rG9fe3da9364fe: Improve resilver ETAs (authored by Brian Behlendorf <behlendorf1@llnl.gov>). Apr 24 2023, 7:55 PM