Page MenuHomeFreeBSD

ppt: Fix panic when configuring unavailable MSI-X vector
AcceptedPublic

Authored by krzysztof.galazka_intel.com on Feb 3 2025, 2:27 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Mar 5, 4:40 AM
Unknown Object (File)
Sat, Mar 1, 8:07 PM
Unknown Object (File)
Tue, Feb 25, 7:59 AM
Unknown Object (File)
Mon, Feb 17, 8:24 PM
Unknown Object (File)
Feb 9 2025, 8:26 PM
Unknown Object (File)
Feb 8 2025, 7:03 PM

Details

Reviewers
jhb
Group Reviewers
bhyve
Summary

In some cases VM may have different idea about number
of available MSI-X vectors then PPT driver. Return
an error when VM requests setup for more vectors
than expected.

It was observed while using SR-IOV on an Intel E810 Ethernet adapter.
VF driver in a VM sees a correct number of available MSI-X vectors,
which depends on num-queues assigned in iovctl.conf, while
pci_msix_count in the PPT driver always returns 1.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 62218
Build 59102: arc lint + arc unit

Event Timeline

I'm not sure why pci_msix_count returns incorrect number. pciconf output looks fine:

ppt0@pci0:216:0:8:      class=0x020000 rev=0x02 hdr=0x00 vendor=0x8086 device=0x1889 subvendor=0x8086 subdevice=0x0000
    cap 11[70] = MSI-X supports 5 messages
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
jhb added a subscriber: jhb.

I believe pci_msix_count() is returning a cached value from when the VF was first created in pci_read_cap() called from pci_fill_devinfo() called from pci_add_iov_child() called from PCI_CREATE_IOV_CHILD. If the MSI-X count is then adjusted by the driver in the PCI_IOV_ADD_VF callback, the cached value will be wrong.

I think the fix for that is probably we should not be caching the msi_ctrl and msix_ctrl registers except across suspend/resume and should otherwise be reading the values from the registers each time pci_count_* is called.

The bounds check is a good fix regardless.

This revision is now accepted and ready to land.Feb 4 2025, 3:48 PM
In D48812#1113788, @jhb wrote:

I believe pci_msix_count() is returning a cached value from when the VF was first created in pci_read_cap() called from pci_fill_devinfo() called from pci_add_iov_child() called from PCI_CREATE_IOV_CHILD. If the MSI-X count is then adjusted by the driver in the PCI_IOV_ADD_VF callback, the cached value will be wrong.

Yup, that's exactly the case. MSI-X count for VFs of E800 adapters is set in PCI_IOV_ADD_VF callback according to number of assigned queues.

I think the fix for that is probably we should not be caching the msi_ctrl and msix_ctrl registers except across suspend/resume and should otherwise be reading the values from the registers each time pci_count_* is called.

I tried something simpler and added:

pci_read_cap(device_get_parent(bus), &vfinfo->cfg);

call in pci_iov.c after PCI_IOV_ADD_VF, and it seems to work. Do you think it's worth to use that as a temporary solution? I can submit a patch.

In D48812#1113788, @jhb wrote:

I believe pci_msix_count() is returning a cached value from when the VF was first created in pci_read_cap() called from pci_fill_devinfo() called from pci_add_iov_child() called from PCI_CREATE_IOV_CHILD. If the MSI-X count is then adjusted by the driver in the PCI_IOV_ADD_VF callback, the cached value will be wrong.

Yup, that's exactly the case. MSI-X count for VFs of E800 adapters is set in PCI_IOV_ADD_VF callback according to number of assigned queues.

I think the fix for that is probably we should not be caching the msi_ctrl and msix_ctrl registers except across suspend/resume and should otherwise be reading the values from the registers each time pci_count_* is called.

I tried something simpler and added:

pci_read_cap(device_get_parent(bus), &vfinfo->cfg);

call in pci_iov.c after PCI_IOV_ADD_VF, and it seems to work. Do you think it's worth to use that as a temporary solution? I can submit a patch.

I'm worried that this might have other side effects for things like EA caps. However, I think it's also true that caching the MSI counts is probably wrong in general. It's not just VFs, in theory writing to some other register in a PF can change the value of the MSI control registers and we should just always read them.

In D48812#1114102, @jhb wrote:

I'm worried that this might have other side effects for things like EA caps. However, I think it's also true that caching the MSI counts is probably wrong in general. It's not just VFs, in theory writing to some other register in a PF can change the value of the MSI control registers and we should just always read them.

I put a draft on GIthub: https://github.com/freebsd/freebsd-src/pull/1593
I left msix_msgnum in the struct to let pci_alloc_msix_method use it. I'm not sure if it is safe to assume that the count won't decrease between pci_msix_count and pci_alloc_msix calls, but current implementation also does not care about such case.