Page MenuHomeFreeBSD

bhyve: Avoid holding /dev/pci open unnecessarily
AcceptedPublic

Authored by markj on Mon, Feb 10, 3:18 AM.

Details

Reviewers
corvink
jhb
Group Reviewers
bhyve
Summary

Some device models will call pci_host_read_config() when probing for
devices. Currently this results in pcifd_init() opening /dev/pci, and
thus bhyve holds the fd open even when it's not needed.

Modify pci_host_{read,write}_config() to use the global pcifd only if
it's already open, otherwise just open the device and close it
immediately after we're done. This works fine so long as bhyve is still
initializing and isn't yet in capability mode, which appears to be the
case for existing calls that aren't in the passthru code.

Fixes: 563fd2240e13 ("bhyve: export funcs for read/write pci config")

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 62370
Build 59254: arc lint + arc unit

Event Timeline

markj requested review of this revision.Mon, Feb 10, 3:18 AM

Would it be simpler to just avoid using pcifd at all?

In D48908#1115607, @jhb wrote:

Would it be simpler to just avoid using pcifd at all?

It would be simpler, but would give these interfaces a weird quirk where they don't work after bhyve enters capability mode. I'm not sure whether that would matter today, but in general it seems like a surprising property.

In D48908#1115607, @jhb wrote:

Would it be simpler to just avoid using pcifd at all?

It would be simpler, but would give these interfaces a weird quirk where they don't work after bhyve enters capability mode. I'm not sure whether that would matter today, but in general it seems like a surprising property.

I guess I'm not quite understanding why we want this change then? That is, what are the use cases where this leaks (I guess a device model that uses this but there isn't a pptX device for the guest) and what is the case that benefits from using pcifd (a non-passthrough device model that uses these routines outside of initialization in which case this change will break that device model if a pptX device isn't also active in the guest?) It's more surprising to me that whether or not this works post-initialization depends on whether or not a passthrough device is also present in the guest. A device model that uses this either only needs it during initialization, or it needs it post-initialization always regardless of if a passthrough device is present?

In D48908#1115686, @jhb wrote:
In D48908#1115607, @jhb wrote:

Would it be simpler to just avoid using pcifd at all?

It would be simpler, but would give these interfaces a weird quirk where they don't work after bhyve enters capability mode. I'm not sure whether that would matter today, but in general it seems like a surprising property.

I guess I'm not quite understanding why we want this change then? That is, what are the use cases where this leaks (I guess a device model that uses this but there isn't a pptX device for the guest) and what is the case that benefits from using pcifd (a non-passthrough device model that uses these routines outside of initialization in which case this change will break that device model if a pptX device isn't also active in the guest?) It's more surprising to me that whether or not this works post-initialization depends on whether or not a passthrough device is also present in the guest. A device model that uses this either only needs it during initialization, or it needs it post-initialization always regardless of if a passthrough device is present?

In D48908#1115686, @jhb wrote:
In D48908#1115607, @jhb wrote:

Would it be simpler to just avoid using pcifd at all?

It would be simpler, but would give these interfaces a weird quirk where they don't work after bhyve enters capability mode. I'm not sure whether that would matter today, but in general it seems like a surprising property.

I guess I'm not quite understanding why we want this change then? That is, what are the use cases where this leaks (I guess a device model that uses this but there isn't a pptX device for the guest) and what is the case that benefits from using pcifd (a non-passthrough device model that uses these routines outside of initialization in which case this change will break that device model if a pptX device isn't also active in the guest?) It's more surprising to me that whether or not this works post-initialization depends on whether or not a passthrough device is also present in the guest. A device model that uses this either only needs it during initialization, or it needs it post-initialization always regardless of if a passthrough device is present?

The problem is that there are some non-passthrough device models which use these routines during initialization (but not after). Specifically, pci_lpc_init() via pci_lpc_get_sel(). It does that since commit f4ceaff56ddaa, since we apparently need the LPC device IDs when passing through an intel GPU. However, the implementation means that we always leave /dev/pci open, including when there's no passthrough device, and I'd like to fix that.

I had thought there were other device models which do this, but it's just LPC. We could instead modify it to use the host IDs only when a GPU is passed through, but that's kind of fragile: any call to pci_host_config_read() means that the pcifd is leaked. This way, we keep the global pcifd open only so long as passthru_init() is called.

In D48908#1115686, @jhb wrote:
In D48908#1115607, @jhb wrote:

Would it be simpler to just avoid using pcifd at all?

It would be simpler, but would give these interfaces a weird quirk where they don't work after bhyve enters capability mode. I'm not sure whether that would matter today, but in general it seems like a surprising property.

I guess I'm not quite understanding why we want this change then? That is, what are the use cases where this leaks (I guess a device model that uses this but there isn't a pptX device for the guest) and what is the case that benefits from using pcifd (a non-passthrough device model that uses these routines outside of initialization in which case this change will break that device model if a pptX device isn't also active in the guest?) It's more surprising to me that whether or not this works post-initialization depends on whether or not a passthrough device is also present in the guest. A device model that uses this either only needs it during initialization, or it needs it post-initialization always regardless of if a passthrough device is present?

The problem is that there are some non-passthrough device models which use these routines during initialization (but not after). Specifically, pci_lpc_init() via pci_lpc_get_sel(). It does that since commit f4ceaff56ddaa, since we apparently need the LPC device IDs when passing through an intel GPU. However, the implementation means that we always leave /dev/pci open, including when there's no passthrough device, and I'd like to fix that.

I had thought there were other device models which do this, but it's just LPC. We could instead modify it to use the host IDs only when a GPU is passed through, but that's kind of fragile: any call to pci_host_config_read() means that the pcifd is leaked. This way, we keep the global pcifd open only so long as passthru_init() is called.

That is the case I was aware of (LPC), but it is why I wonder why we can't just always open /dev/pci in these routines and never use pcifd? The idea would be that they only ever work during initialization. If we ever needed them to work post-init, then we would revert that change so that they always use the global pcifd. Mixing the two is what I find a bit odd in that it provides inconsistent semantics.

Provide two wrappers for pci_host_{read,write}_config(): one which
is externally visible and always opens the pci device node, intended
for use only during initialization, and one for internal use in the
passthru code which uses the global pcifd.

In D48908#1116148, @jhb wrote:
In D48908#1115686, @jhb wrote:
In D48908#1115607, @jhb wrote:

Would it be simpler to just avoid using pcifd at all?

It would be simpler, but would give these interfaces a weird quirk where they don't work after bhyve enters capability mode. I'm not sure whether that would matter today, but in general it seems like a surprising property.

I guess I'm not quite understanding why we want this change then? That is, what are the use cases where this leaks (I guess a device model that uses this but there isn't a pptX device for the guest) and what is the case that benefits from using pcifd (a non-passthrough device model that uses these routines outside of initialization in which case this change will break that device model if a pptX device isn't also active in the guest?) It's more surprising to me that whether or not this works post-initialization depends on whether or not a passthrough device is also present in the guest. A device model that uses this either only needs it during initialization, or it needs it post-initialization always regardless of if a passthrough device is present?

The problem is that there are some non-passthrough device models which use these routines during initialization (but not after). Specifically, pci_lpc_init() via pci_lpc_get_sel(). It does that since commit f4ceaff56ddaa, since we apparently need the LPC device IDs when passing through an intel GPU. However, the implementation means that we always leave /dev/pci open, including when there's no passthrough device, and I'd like to fix that.

I had thought there were other device models which do this, but it's just LPC. We could instead modify it to use the host IDs only when a GPU is passed through, but that's kind of fragile: any call to pci_host_config_read() means that the pcifd is leaked. This way, we keep the global pcifd open only so long as passthru_init() is called.

That is the case I was aware of (LPC), but it is why I wonder why we can't just always open /dev/pci in these routines and never use pcifd? The idea would be that they only ever work during initialization. If we ever needed them to work post-init, then we would revert that change so that they always use the global pcifd. Mixing the two is what I find a bit odd in that it provides inconsistent semantics.

I did it that way since the passthru code also uses these routines internally. My latest patch introduces separate helpers for the two cases, so we get consistent semantics at the expense of a bit more complexity.

Ah, ok, I guess I missed the detail about the dual use of the API.

This revision is now accepted and ready to land.Tue, Feb 11, 4:28 PM