zed: Take no action on scrub/resilver checksum errors

Description

When scrubbing/resilvering a pool it can be counterproductive to
cancel the scan and kick off a replace operation to a hot spare
when encountering checksum errors. In this case, the best course
of action is to allow the scrub/resilver to complete as quickly
as possible and to keep the vdevs fully online if possible.

Realistically, this is less of an issue for RAIDZ, since a
traditional resilver must be used and checksums will be verified.
However, this is not the case for a mirror or dRAID pool, which is
sequentially resilvered, with checksum verification deferred
until after the replace operation completes.

Regardless, we apply this policy to all pool types since it's
a good idea for all vdevs. Degrading additional vdevs has the
potential to make a bad situation worse. Note that checksum
errors will still be reported both as an event and by
zpool status. This change only prevents the ZED from
proactively taking any action.
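
The policy above can be illustrated with a small decision function.
This is only a toy model, not the ZED source: the names ScanState,
Action, Fault and decide are hypothetical, and it assumes that faults
other than checksum errors keep their previous handling.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ScanState(Enum):
    """Hypothetical pool scan states, for illustration only."""
    IDLE = auto()
    SCANNING = auto()  # a scrub or resilver is in progress


class Action(Enum):
    """Hypothetical responses the daemon could take."""
    REPORT_ONLY = auto()        # error stays visible, no proactive step
    DEGRADE_AND_SPARE = auto()  # degrade the vdev, try a hot spare


@dataclass
class Fault:
    kind: str  # "checksum" or "io" (labels assumed for this sketch)
    vdev: str  # vdev name; the vdev type does not affect the decision


def decide(fault: Fault, scan: ScanState) -> Action:
    """Return the action for a diagnosed fault.

    Checksum faults seen while a scrub/resilver is running are only
    reported, for every pool type. Everything else is assumed to be
    handled as before.
    """
    if fault.kind == "checksum" and scan is ScanState.SCANNING:
        return Action.REPORT_ONLY
    return Action.DEGRADE_AND_SPARE


if __name__ == "__main__":
    print(decide(Fault("checksum", "sda"), ScanState.SCANNING).name)
    print(decide(Fault("checksum", "sda"), ScanState.IDLE).name)
```

The check does not look at whether the vdev is a mirror, RAIDZ or
dRAID, which mirrors the decision to apply the policy to all pool
types.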

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13499

Details

Provenance
Brian Behlendorf <behlendorf1@llnl.gov> Authored on May 24 2022, 4:36 PM
GitHub <noreply@github.com> Committed on May 24 2022, 4:36 PM
Parents
rG2cd0f98f4aae: Verify BPs in spa_load_verify_cb() and dsl_scan_visitbp()

Event Timeline

GitHub <noreply@github.com> committed rGcf70c0f8ae01: zed: Take no action on scrub/resilver checksum errors (authored by Brian Behlendorf <behlendorf1@llnl.gov>). May 24 2022, 4:36 PM