Page MenuHomeFreeBSD

bhyve: Do not enable PCI BAR decoding if a boot ROM is present
ClosedPublic

Authored by markj on May 1 2024, 4:36 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Oct 30, 1:08 PM
Unknown Object (File)
Sun, Oct 27, 8:20 AM
Unknown Object (File)
Sep 29 2024, 9:31 PM
Unknown Object (File)
Sep 26 2024, 9:07 AM
Unknown Object (File)
Sep 23 2024, 10:28 AM
Unknown Object (File)
Sep 21 2024, 9:49 PM
Unknown Object (File)
Sep 21 2024, 4:26 PM
Unknown Object (File)
Sep 21 2024, 4:25 PM

Details

Summary

Let the boot ROM handle BAR initialization. This fixes a problem where
u-boot's BAR remapping conflicts with some limitations in bhyve. See
https://lists.freebsd.org/archives/freebsd-virtualization/2024-April/002103.html
for a description of what goes wrong.

The old behaviour can be restored by setting the pci.enable_bars
configuration variable.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

markj requested review of this revision.May 1 2024, 4:36 PM

I believe it's also legal to go and map two BARs to overlap so long as you don't try to access the overlapping range when that's the case? QEMU just maintains a list of PCI BARs and goes for the first one in the list that matches. Allocating the big chunk of MMIO memory statically at the start and just maintaining the list layered on top in PCI code as you mess with BARs seems like it would be a simple, more general fix, and should perform just fine? (Though passthrough may well be "fun" as you mention)

I believe it's also legal to go and map two BARs to overlap so long as you don't try to access the overlapping range when that's the case? QEMU just maintains a list of PCI BARs and goes for the first one in the list that matches. Allocating the big chunk of MMIO memory statically at the start and just maintaining the list layered on top in PCI code as you mess with BARs seems like it would be a simple, more general fix, and should perform just fine? (Though passthrough may well be "fun" as you mention)

Yes, we could also permit overlapping BARs since u-boot (empirically) won't try to access them in the window where there's some ambiguity. It's not too hard to implement and I thought about that, but this approach is simpler and is closer to what u-boot actually expects, which is that it needs to program BARs itself. It feels a bit questionable to make bhyve more complicated to support what u-boot's doing when we can instead make bhyve do less. If there are further reasons to support overlapping MMIO regions then it's easy to revert this patch.

My mental model of what is safe is that it's allowed to move BARs around so long as you disable decoding while you do so, and I'm pretty sure we already do that now to handle FreeBSD kernels (and other OS kernels) that rewrite BARs to size them during boot. We have to avoid trying to register/unregister them while rewriting, and I thought that was driven by if the I/O space was enabled. (See how update_bar_address makes the register_bar call conditional on encoding being enabled.) It sounds like u-boot is just buggy here in that it isn't disabling decoding while it messes with the BARs. This idea is ok though. I wonder if FreeBSD/amd64 boots with this set to true. :)

usr.sbin/bhyve/pci_emul.c
881

Hmm, I would expand this if further perhaps to not bother with the pci_get_cfgdata and pci_set_cfgdata. Maybe just return early in this case?

In D45049#1027191, @jhb wrote:

My mental model of what is safe is that it's allowed to move BARs around so long as you disable decoding while you do so, and I'm pretty sure we already do that now to handle FreeBSD kernels (and other OS kernels) that rewrite BARs to size them during boot. We have to avoid trying to register/unregister them while rewriting, and I thought that was driven by if the I/O space was enabled. (See how update_bar_address makes the register_bar call conditional on encoding being enabled.) It sounds like u-boot is just buggy here in that it isn't disabling decoding while it messes with the BARs. This idea is ok though. I wonder if FreeBSD/amd64 boots with this set to true. :)

Aha. So yes, U-Boot indeed isn't disabling decoding whilst it probes and allocates the BARs. It *is* only enabling decoding for the types of BARs it allocates once it's done, but it doesn't write the initial value back first. https://source.denx.de/u-boot/u-boot/-/blob/ff0de1f0557ed7d2dab47ba976a37347a1fdc432/drivers/pci/pci_auto.c#L42

But that still doesn't help you if you move the BAR for A to overlap one of B's before you've probed it? bhyve will still get in a mess in that scenario.

So for "bare metal" cases like booting with a bootrom (e.g. UEFI) we should arguably leave bars unregistered since the raw firmware probably does that. Only the bhyveload case probably wants to enable BARs on startup. I would be fine with not using a user-facing config option but basically making the test in this patch be some kind of "is there a bootrom or not" check. Would need to test it on amd64 with UEFI.

IIRC, the default value of the busmasteren, porten and memen bits is 0 according to PCI specification. So, as jhb already mentioned, it should be valid to unset these values when booting with a bootrom.

usr.sbin/bhyve/pci_emul.c
1163–1164

We shouldn't set BUSMASTEREN as well.

In D45049#1027224, @jhb wrote:

So for "bare metal" cases like booting with a bootrom (e.g. UEFI) we should arguably leave bars unregistered since the raw firmware probably does that. Only the bhyveload case probably wants to enable BARs on startup. I would be fine with not using a user-facing config option but basically making the test in this patch be some kind of "is there a bootrom or not" check. Would need to test it on amd64 with UEFI.

I implemented this and EDK2 fails to load our loader. I get the following output before dropped into a shell:

Launching virtual machine "test" ...                                                                                                                          
fbuf frame buffer base: 0x32b086e00000 [sz 33554432]                                                                                                          
BdsDxe: loading Boot0001 "EFI Internal Shell" from Fv(7CB8BDC9-F8EB-4F34-AAEA-3EE4AF6516A1)/FvFile(7C04A583-9E3E-4F1C-AD65-E05268D0B4D1)                      
BdsDxe: starting Boot0001 "EFI Internal Shell" from Fv(7CB8BDC9-F8EB-4F34-AAEA-3EE4AF6516A1)/FvFile(7C04A583-9E3E-4F1C-AD65-E05268D0B4D1)                     
UEFI Interactive Shell v2.2                                                                                                                                   
EDK II                                                                                                                                                        
UEFI v2.70 (BHYVE, 0x00010000)                                                                                                                                
map: No mapping found.

Presumably EDK2 does rely on us having registered BARs. After staring at the EDK2 bhyve platform code for a while, I'm not sure how this is encoded.

In D45049#1027224, @jhb wrote:

So for "bare metal" cases like booting with a bootrom (e.g. UEFI) we should arguably leave bars unregistered since the raw firmware probably does that. Only the bhyveload case probably wants to enable BARs on startup. I would be fine with not using a user-facing config option but basically making the test in this patch be some kind of "is there a bootrom or not" check. Would need to test it on amd64 with UEFI.

I implemented this and EDK2 fails to load our loader. I get the following output before dropped into a shell:

Launching virtual machine "test" ...                                                                                                                          
fbuf frame buffer base: 0x32b086e00000 [sz 33554432]                                                                                                          
BdsDxe: loading Boot0001 "EFI Internal Shell" from Fv(7CB8BDC9-F8EB-4F34-AAEA-3EE4AF6516A1)/FvFile(7C04A583-9E3E-4F1C-AD65-E05268D0B4D1)                      
BdsDxe: starting Boot0001 "EFI Internal Shell" from Fv(7CB8BDC9-F8EB-4F34-AAEA-3EE4AF6516A1)/FvFile(7C04A583-9E3E-4F1C-AD65-E05268D0B4D1)                     
UEFI Interactive Shell v2.2                                                                                                                                   
EDK II                                                                                                                                                        
UEFI v2.70 (BHYVE, 0x00010000)                                                                                                                                
map: No mapping found.

Presumably EDK2 does rely on us having registered BARs. After staring at the EDK2 bhyve platform code for a while, I'm not sure how this is encoded.

It's encoded by PcdPciDisableBusEnumeration. If set to TRUE, EDK2 rely on us. If set to FALSE or unset, EDK2 will register BARs itself. For the X64 platform it was removed some time ago: https://github.com/tianocore/edk2/commit/70f3e62dc73d28962b833373246ef25c865c575e.

You could try building a debug version of EDK2 to get a more detailed log. Therefore, pass -b DEBUG and -DDEBUG_ON_SERIAL_PORT=TRUE to the build command.

In D45049#1027224, @jhb wrote:

So for "bare metal" cases like booting with a bootrom (e.g. UEFI) we should arguably leave bars unregistered since the raw firmware probably does that. Only the bhyveload case probably wants to enable BARs on startup. I would be fine with not using a user-facing config option but basically making the test in this patch be some kind of "is there a bootrom or not" check. Would need to test it on amd64 with UEFI.

I implemented this and EDK2 fails to load our loader. I get the following output before dropped into a shell:

Launching virtual machine "test" ...                                                                                                                          
fbuf frame buffer base: 0x32b086e00000 [sz 33554432]                                                                                                          
BdsDxe: loading Boot0001 "EFI Internal Shell" from Fv(7CB8BDC9-F8EB-4F34-AAEA-3EE4AF6516A1)/FvFile(7C04A583-9E3E-4F1C-AD65-E05268D0B4D1)                      
BdsDxe: starting Boot0001 "EFI Internal Shell" from Fv(7CB8BDC9-F8EB-4F34-AAEA-3EE4AF6516A1)/FvFile(7C04A583-9E3E-4F1C-AD65-E05268D0B4D1)                     
UEFI Interactive Shell v2.2                                                                                                                                   
EDK II                                                                                                                                                        
UEFI v2.70 (BHYVE, 0x00010000)                                                                                                                                
map: No mapping found.

Presumably EDK2 does rely on us having registered BARs. After staring at the EDK2 bhyve platform code for a while, I'm not sure how this is encoded.

It's encoded by PcdPciDisableBusEnumeration. If set to TRUE, EDK2 rely on us. If set to FALSE or unset, EDK2 will register BARs itself. For the X64 platform it was removed some time ago: https://github.com/tianocore/edk2/commit/70f3e62dc73d28962b833373246ef25c865c575e.

You could try building a debug version of EDK2 to get a more detailed log. Therefore, pass -b DEBUG and -DDEBUG_ON_SERIAL_PORT=TRUE to the build command.

Thank you! I think this means though that @jhb 's suggestion isn't viable, as we generally expect newer bhyve to work properly with older EDK2. In other words, we need some config flag which lets a particular platform say, "I want to enable BARs before executing any guest code". Or is there something more clever we can do?

Thank you! I think this means though that @jhb 's suggestion isn't viable, as we generally expect newer bhyve to work properly with older EDK2. In other words, we need some config flag which lets a particular platform say, "I want to enable BARs before executing any guest code". Or is there something more clever we can do?

Commonly a bootrom enumerates BARs on it's own and we can skip it in bhyve if a bootrom is given. Unfortunately, that's not working for older EDK2. I'm not sure if it's possible to read a PCD value from a given EDK2 image. Nevertheless, it wouldn't be trivial to detect an EDK2 image and scan it for a specific PCD value.

Btw. is this an issue on arm64? We don't have an bhyve EDK2 for arm64 yet. So, we could use latest EDK2 for our first arm64 image and assume that EDK2 on arm64 always supports BARs enumeration.

Thank you! I think this means though that @jhb 's suggestion isn't viable, as we generally expect newer bhyve to work properly with older EDK2. In other words, we need some config flag which lets a particular platform say, "I want to enable BARs before executing any guest code". Or is there something more clever we can do?

Commonly a bootrom enumerates BARs on it's own and we can skip it in bhyve if a bootrom is given. Unfortunately, that's not working for older EDK2. I'm not sure if it's possible to read a PCD value from a given EDK2 image. Nevertheless, it wouldn't be trivial to detect an EDK2 image and scan it for a specific PCD value.

Btw. is this an issue on arm64? We don't have an bhyve EDK2 for arm64 yet. So, we could use latest EDK2 for our first arm64 image and assume that EDK2 on arm64 always supports BARs enumeration.

Right now there is only one usable bootrom for arm64 that I'm aware of, u-boot-bhyve-arm64. So if someone works on EDK2 for arm64, I think we'd prefer for it to have the same behaviour as u-boot, i.e., it should have PcdPciDisableBusEnumeration=TRUE.

On amd64 I think it's more important to preserve compatibility with existing EDK2. With this patch, if someone wants to use an alternate bootrom, they have the option to set pci.enable_bars=false as well. The only other alternative I can imagine is to have some kind of "bootrom profile", wherein one can specify a collection of variables associated with a particular bootrom, not just the path to the image.

usr.sbin/bhyve/pci_emul.c
881

Heh, in the version I redid from scratch today I did indeed just return early.

1163–1164

This is less of an issue, but indeed, we should leave this clear for the case we aren't enabling decoding.

Thinking about this more, I wonder if we want to just leave BARs all set to 0 in this case and not even bother reserving resource ranges? I'll have to see how EDK2 copes with that I guess.

In D45049#1052826, @jhb wrote:

Thinking about this more, I wonder if we want to just leave BARs all set to 0 in this case and not even bother reserving resource ranges? I'll have to see how EDK2 copes with that I guess.

Devices don't need host resources backing them. Writing all 1s with the io and mem bits disable lets firmware proble the size. The original value on x86 is the space the BIOS assigned. On arm it's been 0 for the ancient uboot i messed with. Not sure about EDK2 but seems like a good place to start.

markj marked 2 inline comments as done.

Apply the more general approach of not programming BARs if a bootrom
was specified. At some point I was hitting problems with EDK2 with this
patch but can no longer reproduce them. Let's just change the behaviour
for both amd64 (with bootrom) and arm64 together.

Do you plan to do this as one commit? I'd be tempted to suggest doing all the bootrom reorganization as a separate commit before doing the PCI changes.

usr.sbin/bhyve/amd64/pci_lpc.c
107 ↗(On Diff #141592)

Do we need some sort of fallback for old config files that sets bootrom if lpc.bootrom is set?

usr.sbin/bhyve/bhyve_config.5
162
In D45049#1053202, @jhb wrote:

Do you plan to do this as one commit? I'd be tempted to suggest doing all the bootrom reorganization as a separate commit before doing the PCI changes.

Yes, I should split it.

usr.sbin/bhyve/amd64/pci_lpc.c
107 ↗(On Diff #141592)

Sorry, I don't quite follow - this code is setting bootrom using the value of lpc.bootrom, so if an old config file sets lpc.bootrom, then that should already be handled.

usr.sbin/bhyve/bhyve_config.5
162

The latest revision removes this config variable entirely. I should document bootrom though.

usr.sbin/bhyve/amd64/pci_lpc.c
107 ↗(On Diff #141592)

This code parses the lpc command line argument like -s 31,lpc,bootrom=abc. It doesn't parse -o lpc.bootrom=abc.

usr.sbin/bhyve/amd64/pci_lpc.c
107 ↗(On Diff #141592)

The -o variant parses just fine and works and is even documented as such in bhyve_config(5).

To be clear, I think having top-level '-o bootrom' is cleaner and have found it a bit odd of tying that into lpc in the first place. I just think somewhere we might need to handle old config files.

Mark: this code is parsing the '-s NN,lpc' option and translating the 'bootrom' argument in that command line into a variable setting.

usr.sbin/bhyve/bhyve_config.5
162

Oh, I was thinking it would be useful to keep the knob so that users could force the behavior either way (e.g. this could be the workaround for users with an old EDK2 ROM).

For bootrom and bootvars they need to be moved out of the 'lpc' section and into a their own top-level section.

usr.sbin/bhyve/bootrom.c
207 ↗(On Diff #141592)

Ideally what you'd have here is something like this:

    bootrom = get_config_value("bootrom");
#ifdef __amd64__
    /* Backwards compatibility. */
    if (bootrom == NULL)
        bootrom = get_config_value("lpc.bootrom");
#endif

Here and elsewhere (so perhaps you want a helper function?) The same fallback would be needed for "bootvars".

usr.sbin/bhyve/pci_emul.c
857

Here if you wanted to keep the knob it would be something like:

if (get_config_value(..., !bootrom_boot())
usr.sbin/bhyve/bootrom.c
207 ↗(On Diff #141592)

Ideally the fallback would give a deprecation warning (which presumably means storing the value somewhere rather than recomputing it every time lest you get multiple warnings)

usr.sbin/bhyve/pci_emul.c
947

I wonder if we should skip this as well if decoding isn't enabled? That is, don't bother reserving address space since the bootrom is going to allocate new space anyway?

markj marked 10 inline comments as done.Aug 13 2024, 5:18 PM
markj added inline comments.
usr.sbin/bhyve/amd64/pci_lpc.c
107 ↗(On Diff #141592)

I see now, thank you. I updated the amd64 option parsing function to handle these compatibility aliases.

usr.sbin/bhyve/bhyve_config.5
162

I moved them into the "GLOBAL SETTINGS" section.

usr.sbin/bhyve/bootrom.c
207 ↗(On Diff #141592)

I handle this in the amd64 bhyve_optparse(). Rather than caching a value, I just update the config tree directly (and print a warning).

usr.sbin/bhyve/pci_emul.c
947

I suspect we can, but it's also mostly harmless.

markj retitled this revision from bhyve: Let the guest enable PCI BARs on arm64 to bhyve: Do not enable PCI BAR decoding if a boot ROM is present.Aug 13 2024, 5:19 PM
markj edited the summary of this revision. (Show Details)
markj marked 3 inline comments as done.
markj edited the summary of this revision. (Show Details)
  • Restore the pci.enable_bars variable
  • Split out bootrom handling changes into a separate diff
This revision is now accepted and ready to land.Aug 14 2024, 6:15 AM
andy_omniosce.org added inline comments.
usr.sbin/bhyve/pci_emul.c
1163

Isn't this condition backwards?

usr.sbin/bhyve/pci_emul.c
1163

Yes, you are right. I've pushed a commit to fix that, thank you for the report.