Page MenuHomeFreeBSD

Delay GEOM disk_create() until CAM periph probe complete
ClosedPublic

Authored by mav on Jul 12 2022, 3:26 PM.
Tags
Referenced Files
F108469826: D35784.id108191.diff
Sat, Jan 25, 5:07 AM
Unknown Object (File)
Thu, Jan 23, 6:59 PM
Unknown Object (File)
Thu, Jan 23, 8:06 AM
Unknown Object (File)
Wed, Jan 22, 2:54 AM
Unknown Object (File)
Tue, Jan 21, 1:51 PM
Unknown Object (File)
Mon, Jan 20, 3:28 AM
Unknown Object (File)
Mon, Jan 13, 3:22 PM
Unknown Object (File)
Fri, Jan 10, 11:53 PM
Subscribers

Details

Summary

Before this patch CAM periph drivers called both disk_alloc() and disk_create() same time on periph creation. But they prevented disks from opening until the periph probe completion with cam_periph_hold(). As result, especially if disk misbehaves during the probe, GEOM event thread triggered to taste the disk is blocked on open attempt, potentially for a long time, and is unable to process other events.

This patch moves disk_create() call from periph creation to the end of the probe. To allow disk_create() calls from non-sleepable CAM contexts some of its duties requiring memory allocations are moved either back to disk_alloc() or forward to g_disk_create(), so now disk_alloc() and disk_add_alias() are the only disk methods that require sleeping. If disk fail during the probe disk_create() may just be skipped, going directly to disk_destroy(). Other method calls during that time are just ignored. Since GEOM may now see the disks after CAM bus scan is already completed, introduce per-periph boot hold functions. Enclosure driver already had such mechanism, so just generalize it.

Test Plan

Without this change HDD failing to spin-up and constantly attaching, detaching and causing timeouts blocks GEOM event thread for 30-60 seconds at a time. With this change probe still delays boot, but after that or in case of hot-plug GEOM is not affected.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

mav requested review of this revision.Jul 12 2022, 3:26 PM
mav edited the summary of this revision. (Show Details)

Remove unneeded now disk_attr_changed() call.

I generally like this, even though I had a bunch of nit-picky comments and questions.
I only see one real issue for sure, but maybe a second race (but maybe not, I'm asking to make sure it can't happen).

sys/cam/ata/ata_da.c
1849

The reference is actually dropped in adaprobedone which we always call to terminate the state machine, even when a peripheral is invalidated during probe (the peripheral won't go away because of this reference). It's that reference that causes us to always call adaprobedone and that's why the root hold below will always have a root release if the probe finishes.... but that may take a few minutes if the SET FEATUREs for read ahead and write cache repeatedly timeout. That's likely OK since it is bounded.

1967
  • HERE *** we call adaprobedone() w/o calling cam_periph_hold_boot() which will then call cam_periph_release_boot()
1971

This worries me a little... If a drive misbehaves at boot, what bounds this hold? The answer is the state machine.
It might make sense to address my comment above with a comment 'dropped in adaprobedone()' or similar here.
That's enough of a bread-crumb comment to let people figure it out w/o it being too verbose.

2702

Cut and paste w/o adjustment ehre? I think we want 'adadiskgonecb()'

2707

In the case we never probe, we never acquire the boot hold, so we'd release something we didn't acquire. (see *HERE* comment above). Is this OK?

sys/cam/nvme/nvme_da.c
889

So no root hold/release here because we don't have a state machine to wait for? Correct?

sys/cam/scsi/scsi_cd.c
737
/* Released on transition to CD_STATE_NORMAL */

maybe?

sys/cam/scsi/scsi_da.c
2997
/* Released in daprobedone() */
sys/geom/geom_disk.c
905

Is it possible for cevent and devent to race? Or are geom events guaranteed to be in order?

sys/cam/ata/ata_da.c
1967

Good catch. Thanks. I'll move cam_periph_hold_boot() before this.

1971

I was thinking about it too. Console will print periodically "Root mount waiting for: ada" until state machine resolve it. I think it should be sufficient pointer. It would be nicer to add unit number there, but that needs some buffer for that string, which we don't have right now. Same problem we have about bus names. We could fix that separately.

2702

Yea. From da obviously. Fixed.

2707

I think it is not fatal, but I've made it match.

sys/cam/nvme/nvme_da.c
889

Yep.

sys/cam/scsi/scsi_cd.c
737

Added different comment: "Released after probe when disk_create() call pass it to GEOM."

sys/geom/geom_disk.c
905

There is only one event thread in GEOM, so always in order.

Update after Warner's comments.

mav edited the summary of this revision. (Show Details)

Move softc->state = NDA_STATE_NORMAL; before disk_create().

sys/cam/scsi/scsi_da.c
2874–2875

This can only happen if the periph has been marked invalid. Is that possible at this point, so early in register?

sys/geom/geom_disk.c
905

OK. Then I think this is likely safe. I don't see any way it can be weird, but geom has occasionally surprised me...

sys/cam/scsi/scsi_da.c
2874–2875

I don't believe it is. The periph was just created and the periph lock was never dropped yet. cam_periph_invalidate() also require the periph lock. Though it is your comment. ;) I'll remove it.

Remove incorrect comment.

sys/cam/scsi/scsi_da.c
2874–2875

Indeed. I'd had a good reason for adding it, but can't think of what it was now... most likely I was confused and just learning CAM and jumped to an unwarranted conclusion...

At this point, I'm only worried about da's state machine holding up boot now where it didn't used to do so... And I'm not sure, exactly, how to measure that worry ...

sys/cam/ata/ata_da.c
1971

For ada it will be fine since the state machine is simple and the bound is around 2 minutes if I'm reading everything correctly (2 tries 30 seconds 2 different commands). For da's state machine it may be longer, though since it's more complex. I'm sure we'll see it if there are any problems when this lands as we have a lot of disks that are at best described as pathological...

Without this it does not explicitly hold boot, but it blocks GEOM event thread on taste attempt, which also blocks boot, but also blocks most of GEOM commands.

In D35784#812281, @mav wrote:

Without this it does not explicitly hold boot, but it blocks GEOM event thread on taste attempt, which also blocks boot, but also blocks most of GEOM commands.

Oh yea. I meant to tick 'approved' box since I know it makes things better enough that it's worth the risk

This revision is now accepted and ready to land.Jul 12 2022, 11:09 PM