nvme: Only reset once on attach.
ClosedPublic
Actions

Authored by imp on Sep 30 2021, 10:17 PM.

Details

Reviewers

cperciva
mav
chuck

Group Reviewers

manpages

Commits

rG50b3d57e7161: nvme: Only reset once on attach.
rG4b3da659bf62: nvme: Only reset once on attach.

Summary

The FreeBSD nvme driver has reset the nvme controller twice on attach to
address a theoretical issue assuring the hardware is in a known
state. However, exierence has shown the second reset is unnecessary and
increases the time to boot. Eliminate the second reset. Should there be
a situation when you need a second reset (for buggy or at least somewhat
out of the mainstream hardware), the hardware option NVME_2X_RESET will
restore the old behavior. Document this in nvme(4).

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

imp created this revision.Sep 30 2021, 10:17 PM

Herald added a reviewer: manpages. · View Herald TranscriptSep 30 2021, 10:17 PM

Herald added a subscriber: dab. · View Herald Transcript

imp requested review of this revision.Sep 30 2021, 10:17 PM

Harbormaster completed remote builds in B41869: Diff 96031.Sep 30 2021, 10:17 PM

imp added reviewers: cperciva, mav, chuck.Sep 30 2021, 10:18 PM

Confirmed that this boots and speeds up the boot process in EC2, by ~ 130 ms. (Looks like it's 100 ms + 15 ms + 15 ms, but I haven't checked very carefully.)

This revision is now accepted and ready to land.Sep 30 2021, 11:14 PM

I am slightly worried about the original motivation, but have no real objections.

Thinking about reducing boot time, can we also remove pause() from nvme_ctrlr_hw_reset() at least if !ctrlr->is_initialized? We know that we haven't sent anything to the controller yet, just set up queue memory and allocated the interrupt, so what are we waiting for there? If the idea is to let stray interrupt complete, then may be we could use bus_suspend_intr?

In D32241#727562, @mav wrote:

I am slightly worried about the original motivation, but have no real objections.

The reason we don't need a second reset is that the card has recently had a slot level reset done to it (or it just powered up for hot plug). So while one could have the situation in theory, in practice we'll never encounter it, even if we have a load/unload of the driver because we do a controlled shutdown.

Thinking about reducing boot time, can we also remove pause() from nvme_ctrlr_hw_reset() at least if !ctrlr->is_initialized? We know that we haven't sent anything to the controller yet, just set up queue memory and allocated the interrupt, so what are we waiting for there? If the idea is to let stray interrupt complete, then may be we could use bus_suspend_intr?

No. It's more complicated than that. some cards absolutely require a 'hands off' period there. A few require an extreme hands off period. Most won't care. Jim gave me clues years ago about the bits of the standard that had the delay values. I'm going through the standard now trying to line up delay values with advice and/or requirements in the standard so I can document the ones that are needed and move away from the ones that aren't. I have a few of the old, fussy cards as well that I'll be testing with to ensure that we don't break them (since a certain large video streaming platform has those same cards in their fleet). cperciva and I have already been talking about when and/or if we can avoid it. I'm also looking to make this state-machine driven so that we can do whatever delays we might need in parallel (if it turns out we absolutely need them in some cases).

This commit is just low hanging fruit that's easy. I suspect that much of the remaining delays can be eliminated with careful study to understand why they were put there in the first place.

After some study of the code and the standard, I think we can just drop the pause(), unconditionally.
If we're not initialized, then there's nothing to wait for.
If we have initialized, then there might be outstanding I/O. If so, then the qpair 'recovery state' will transition to WAITING when we disable, which will ignore any interrupts
(though I've not thought through the legacy case: we may need to set the interrupt mask for it).

If we go on to fail the controller, we'll cancel the outstanding I/O transactions.
If we reset the controller, the I/O transactions are cancelled and we cancel all the I/O transactions.

The standard imposes no wait times here, so it isn't needed from that perspective.

I also likely need to look at suspend/resume since that code is similar, but not quite the same.

chuck accepted this revision.Oct 1 2021, 2:38 PM

chuck added inline comments.

sys/dev/nvme/nvme_ctrlr.c
1678–1679	This comment should go as a) `nvme_attach` doesn't explain this and b) the code isn't (really) doing this.
1687	Instead of the comment above, maybe document the legacy behavior here.