Page MenuHomeFreeBSD

AMD-vi: Fix IOMMU device interrupts being overridden
ClosedPublic

Authored by khng on Feb 28 2021, 10:23 AM.
Tags
None
Referenced Files
F108518791: D28984.id84840.diff
Sat, Jan 25, 8:14 PM
F108517381: D28984.id85045.diff
Sat, Jan 25, 8:00 PM
F108514787: D28984.id85044.diff
Sat, Jan 25, 7:40 PM
F108514620: D28984.id84838.diff
Sat, Jan 25, 7:38 PM
F108513764: D28984.id86126.diff
Sat, Jan 25, 7:30 PM
Unknown Object (File)
Fri, Jan 24, 6:54 PM
Unknown Object (File)
Wed, Jan 22, 10:35 PM
Unknown Object (File)
Wed, Jan 22, 9:39 PM

Details

Summary

Currently, AMD-vi PCI-e passthrough will lead to the following lines in
dmesg:
"kernel: CPU0: local APIC error 0x40
ivhd0: Error: completion failed tail:0x720, head:0x0."

After some tracing, the problem is due to the interaction with
amdvi_alloc_intr_resources() and pci_driver_added(). In ivrs_drv, the
identification of AMD-vi IVHD is done by walking over the ACPI IVRS
table and ivhdX device_ts are added under the acpi bus, while there are
no driver handling the corresponding IOMMU PCI function. In
amdvi_alloc_intr_resources(), the MSI intr are allocated with the ivhdX
device_t instead of the IOMMU PCI function device_t. bus_setup_intr() is
called on ivhdX. the IOMMU pci function device_t is only used for
pci_enable_msi(). Since bus_setup_intr() is not called on IOMMU pci
function, the IOMMU PCI function device_t's dinfo->cfg.msi is never
updated to reflect the supposed msi_data and msi_addr. So the msi_data
and msi_addr stay in the value 0. When pci_driver_added() tried to loop
over the children of a pci bus, and do pci_cfg_restore() on each of
them, msi_addr and msi_data with value 0 will be written to the MSI
capability of the IOMMU pci function, thus explaining the errors in
dmesg.

This change includes an amdiommu driver which currently does attaching,
detaching and providing DEVMETHODs for setting up and tearing down
interrupt. The purpose of the driver is to prevent pci_driver_added()
from calling pci_cfg_restore() on the IOMMU PCI function device_t.
The introduction of the amdiommu driver handles allocation of an IRQ
resource within the IOMMU PCI function, so that the dinfo->cfg.msi is
populated.

This has been tested on EPYC Rome 7282 with Radeon 5700XT GPU.

Submitted by: Ka Ho Ng <khng@freebsdfoundation.org>
Sponsored by: The FreeBSD Foundation
MFC after: 7 days

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 37528
Build 34417: arc lint + arc unit

Event Timeline

khng requested review of this revision.Feb 28 2021, 10:23 AM

Remove "All rights reserved."

There are some other directions we could take:

  • Expanding the bus KPI to skip probing the marked devices
  • pci_cfg_save_msi/msix remembering the msi/msix.
  • Adding a new KPI manipulating the pci_devinfo's msi_data and msi_addr.
In D28984#648829, @khng300_gmail.com wrote:

There are some other directions we could take:

  • Expanding the bus KPI to skip probing the marked devices
  • pci_cfg_save_msi/msix remembering the msi/msix.
  • Adding a new KPI manipulating the pci_devinfo's msi_data and msi_addr.

and also:

  • pci_enable_msi remembering the address and data argument if msi_addr is missing (the value 0).

Remove the device_printf in amdiommu

So it is a "feature" that the save/restore only saves/restores the MSI bits that the PCI bus layer knows about rather than interrupts allocated behind its back. A stub driver is probably ok for now. Longer term I think what we want is @kib's work to integrate bhyve with the generic IOMMU layer (kib@'s patch does this for DMAR for Intel), and for the ivhd bits to instead become a proper IOMMU driver for AMD which would be useful both for bhyve as well as for other use cases on AMD.

sys/amd64/vmm/amd/amdvi_hw.c
795

I'm not a big fan of this. I think what I'd rather you is instead define a new kobj class that you can call and ask the PCI device to register/unregister the interrupt. All of the MSI and bus_alloc_resouce/bus_setup_intr handling would be in the stub driver instead in the implementations of the new kobj methods. You could call it ivhd_if.m and have 'ivhd_setup_intr(device_t dev, driver_intr_t handler, void *arg, const char *desc)' and 'ivhd_teardown_intr(device_t dev)'. Here you would just call IVHD_SETUP_INTR()' and fail if it fails.

  • Moved interrupt setup code to separate kobj methods in AMD IOMMU pci stub driver
  • AMD IOMMU pci stub driver will be loaded earlier than ivrs_drv
khng marked an inline comment as done.Mar 3 2021, 9:03 AM
  • Adjust the amdiommu to be the first rank of drivers to be loaded in the subsystem
  • Okay lets make amdiommu loading at default order.
  • Add base class and sub-class checking
In D28984#649317, @jhb wrote:

So it is a "feature" that the save/restore only saves/restores the MSI bits that the PCI bus layer knows about rather than interrupts allocated behind its back. A stub driver is probably ok for now. Longer term I think what we want is @kib's work to integrate bhyve with the generic IOMMU layer (kib@'s patch does this for DMAR for Intel), and for the ivhd bits to instead become a proper IOMMU driver for AMD which would be useful both for bhyve as well as for other use cases on AMD.

For now I simply spin a driver that do 1:1 matching with the IVHD and a IOMMU capability. But in future if MSI-x support become a must we need something new at the same time. I didn't do this at this time because the official spec is confusing in this regards: They said the IOMMU PCI function comes without BAR but I failed to see how MSI-x could be implemented without a BAR holding MSI-x table and PBA.

Thanks!

sys/amd64/vmm/amd/ivrs_drv.c
48–49

Sorting nit

This revision is now accepted and ready to land.Mar 3 2021, 10:36 PM

Nit: alphabetical order

This revision now requires review to proceed.Mar 4 2021, 6:18 AM

Nit: alphabetical order

A couple of very minor style nits left. Functionally looks good.

sys/amd64/vmm/amd/amdiommu.c
45

style nit: alphabetise these headers too.

108

style nit: blank line after {

115

style nit: blank line after {

sys/amd64/vmm/amd/amdiommu.c
108

This requirement seems optional in style(9) now:

static void
usage(void)
{
        /* Optional blank line goes here. */

Optionally, insert a blank line at the beginning of functions with no
local variables.  Older versions of this style document required the
blank line convention, so it is widely used in existing code.
115

The same as above.

khng marked 3 inline comments as done.Mar 22 2021, 7:34 AM

Approved by: philip (mentor)

(Sorry, I hadn't realised that space was now optional. Good catch!)

This revision is now accepted and ready to land.Mar 22 2021, 8:14 AM