nvme: avoid callout_reset_on in early boot
Abandoned (Public)

Authored by kevans on Oct 29 2021, 9:21 PM.

Details

Summary

On NUMA systems built without EARLY_AP_STARTUP, callout_reset_on may
schedule a callout on a non-boot CPU before we're prepared for it.

This fixes my two-domain setup on arm64.
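
The kind of guard the summary describes can be modeled in userland C. This is only a minimal sketch, not the diff under review: `pick_callout_cpu`, `BOOT_CPU`, and the `aps_started` flag are hypothetical names standing in for whatever check the kernel would make before calling callout_reset_on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: index of the boot processor in this model. */
#define BOOT_CPU 0

/*
 * Hypothetical helper: pick the CPU a timeout should be scheduled on.
 * Until the APs are running, fall back to the boot CPU instead of the
 * preferred (e.g. domain-local) CPU.
 */
static int
pick_callout_cpu(bool aps_started, int preferred_cpu)
{
	assert(preferred_cpu >= 0);
	return (aps_started ? preferred_cpu : BOOT_CPU);
}
```

In this model, a caller that wants CPU 3 gets the boot CPU while `aps_started` is false and CPU 3 once it is true.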

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 42444
Build 39332: arc lint + arc unit

Event Timeline

So I still don't understand exactly why it's problematic to schedule a callout on an AP before SI_SUB_SMP. Is that not supposed to work? I wonder if PSR_DAIF_DEFAULT should include PSR_I? It's like interrupts are being enabled during cpu_throw.

My follow-up question is whether it's generally OK for the callout to be delayed all the way until SI_SUB_SMP, or whether there's a more appropriate fix.

As noted, this change is wrong.

sys/dev/nvme/nvme_qpair.c
1072

It's OK, I guess, as a short-term hack, but we should be able to schedule a timeout for any time. It doesn't matter, in this case, that the timeout may be delayed, so long as the CPU starts. This timeout is a backstop against commands on NVMe cards never completing.

What's the traceback for this call?

I'll force a panic here later tonight and check.

nvme_qpair_submit_tracker() at nvme_qpair_submit_tracker+0x19c
bus_dmamap_load() at bus_dmamap_load+0xf8
_nvme_qpair_submit_request() at _nvme_qpair_submit_request+0x178
nvme_qpair_submit_request() at nvme_qpair_submit_request+0x40
nvme_ctrlr_identify() at nvme_ctrlr_identify+0x3c
nvme_ctrlr_start_config_hook() at nvme_ctrlr_start_config_hook+0x50
run_interrupt_driven_config_hooks() at run_interrupt_driven_config_hooks+0x90
boot_run_interrupt_driven_config_hooks() at boot_run_interrupt_driven_config_hooks+0x2c
mi_startup() at mi_startup+0x12c
virtdone() at virtdone+0x6c

So the APs aren't started by the time we run intrhooks? That seems wrong to me...

And thanks...

Yeah -- for !x86:

        SI_SUB_INT_CONFIG_HOOKS = 0xa800000,    /* Interrupts enabled config */
...
#ifndef EARLY_AP_STARTUP
        SI_SUB_SMP              = 0xf000000,    /* start the APs*/
#endif
...
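
To make the ordering concrete, here is a small userland sketch using only the two values from the excerpt above. Subsystems with lower IDs are run earlier by mi_startup(), so in the !EARLY_AP_STARTUP case the interrupt-driven config hooks run before the APs are started. The `sysinit_runs_before` helper is a hypothetical name for illustration, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the sys/kernel.h excerpt quoted above. */
enum sysinit_sub_id {
	SI_SUB_INT_CONFIG_HOOKS	= 0xa800000,	/* Interrupts enabled config */
	SI_SUB_SMP		= 0xf000000	/* start the APs */
};

/* Hypothetical helper: lower subsystem IDs run earlier in mi_startup(). */
static bool
sysinit_runs_before(enum sysinit_sub_id a, enum sysinit_sub_id b)
{
	return (a < b);
}

/* Compile-time check of the ordering discussed in this thread. */
static_assert(SI_SUB_INT_CONFIG_HOOKS < SI_SUB_SMP,
    "config hooks run before the APs are started");
```

This matches the traceback: nvme_ctrlr_start_config_hook() is reached from the config hook stage, which this ordering places ahead of SI_SUB_SMP.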

I managed to get this root-caused tonight: D32797 -- abandon ship!