
mpr/mps: Fix a race in diagnostic reset
ClosedPublic

Authored by imp on Jan 24 2022, 8:39 PM.

Details

Summary

There's a small race in freezing the simq when performing a diagnostic
reset. During this time, a transaction can slip through and encounter
the target id of 0. If we're still in diagnostic reset when we detect
this, don't say the device isn't there. Instead, freeze the queue and
return a requeue status, similar to what we do when we're resetting
a target and a transaction gets here.
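
The decision described above can be sketched in a small userspace model. This is only an illustration of the control flow, not the driver code from the commit: `struct target`, `struct io_request`, `start_io`, the `in_diag_reset` flag, and the `IO_*` status values are all hypothetical stand-ins for the driver's target state and the CAM status codes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for CAM status codes; not the kernel's values. */
enum io_status { IO_SUBMITTED, IO_DEV_NOT_THERE, IO_REQUEUE };

/* Hypothetical model of per-target state. */
struct target {
	unsigned handle;	/* 0 means no valid handle at the moment */
	bool in_diag_reset;	/* true while a diagnostic reset is in progress */
};

/* Hypothetical model of one transaction. */
struct io_request {
	enum io_status status;
	int devq_freeze_count;	/* models freezing the device queue */
};

/*
 * Decide what to do with a transaction that reached the driver.
 * A zero handle normally means the device is gone, but during a
 * diagnostic reset it only means the transaction raced the queue
 * freeze, so freeze the queue and ask for a requeue instead.
 */
static enum io_status
start_io(const struct target *targ, struct io_request *req)
{
	assert(targ != NULL && req != NULL);

	if (targ->handle == 0) {
		if (targ->in_diag_reset) {
			req->devq_freeze_count++;
			req->status = IO_REQUEUE;
		} else {
			req->status = IO_DEV_NOT_THERE;
		}
		return (req->status);
	}

	req->status = IO_SUBMITTED;
	return (req->status);
}
```

In this model, only the combination of a zero handle and an in-progress reset produces the requeue path; a zero handle outside of a reset still reports the device as missing.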

Sponsored by: Netflix

Test Plan

This race would be hit in about 1-2% of diagnostic resets, though some scenarios are more likely than others.
A heavily loaded system with an lsiutil-induced diag reset would see it, but at a much lower rate.
Some IOC Fault-caused resets would see a hit rate closer to 10%.

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

imp requested review of this revision. Jan 24 2022, 8:39 PM
imp added reviewers: scottl, ken, mav.

Yes, unfortunately separation of SIM and queue locks created this race window.

This revision is now accepted and ready to land. Jan 25 2022, 2:10 AM
This revision was automatically updated to reflect the committed changes.