Memory and PCI resources are freed with no particular order. As a result, later code access pointers/resources that are already freed.
For instance, iflib_tx_structures_free() frees ctx->ifc_txqs[] but iflib_tqg_detach() attempts to access this array. Similarly, adapter queues gets freed by IFDI_QUEUES_FREE() but IFDI_DETACH() attempts to access adapter queues to free PCI resources.
Details
- Reviewers
markj gallatin shurd erj olivier krzysztof.galazka_intel.com - Group Reviewers
iflib NetApp - Commits
- rG5b4cfd3f1315: iflib: Free resources in a consistent order during detach
rG859d72fa6538: iflib: Free resources in a consistent order during detach
rG38bfc6dee33b: iflib: Free resources in a consistent order during detach
Tested Intel 1G/10G/40G on NetApp platforms in error path of iflib_device_register() via failpoints.
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
- Lint
Lint Passed - Unit
No Test Coverage - Build Status
Buildable 35581 Build 32481: arc lint + arc unit
Event Timeline
FYI, I recently committed rS368281 (also obtained from NetApp), which added a safeguard against NULL adapter queues in em_free_pci_resources(). Your patch seems to solve the problem more completely.
sys/net/iflib.c | ||
---|---|---|
5154 | You are right. We can absorb IFDI_QUEUES_FREE() into IFDI_DETACH(). May be the real intention to keep separate is to handle iflib_queues_alloc() failure. But iflib_queues_alloc() failure trickle down and eventually becomes iflib_device_register() failure, where we do IFDI_DETACH() anyway. |
sys/net/iflib.c | ||
---|---|---|
5144 | Hmm, sorry for not noticing sooner, but there is an ordering issue here. IFDI_DETACH releases the irq resources for the driver, but iflib_free_intr_mem() is responsible for releasing MSI-X vectors back to the bus. This is happening in the wrong order now, so I believe the pci_release_msi() call will fail. |
- Ensure to release MSIX after freeing IRQ resources
sys/net/iflib.c | ||
---|---|---|
5144 | That is very true. Thanks for noticing this. pci_release_msi() was returning 16 i.,e EBUSY prior to the fix. |