Page MenuHomeFreeBSD

vmm: Fix AP startup with old userspace binaries.
ClosedPublic

Authored by jhb on Oct 22 2022, 9:48 PM.
Tags
None
Referenced Files
F108311695: D37093.id112234.diff
Thu, Jan 23, 6:49 PM
Unknown Object (File)
Thu, Jan 16, 6:40 AM
Unknown Object (File)
Thu, Jan 16, 3:29 AM
Unknown Object (File)
Thu, Jan 9, 7:42 AM
Unknown Object (File)
Tue, Jan 7, 10:15 PM
Unknown Object (File)
Dec 21 2024, 10:58 PM
Unknown Object (File)
Nov 17 2024, 7:30 PM
Unknown Object (File)
Nov 16 2024, 6:23 AM

Details

Summary

Older binaries that do not request IPI exits to userspace do not
start user threads for other vCPUs until a STARTUP IPI triggers a
VM_EXITCODE_SPINUP_AP exit to userland. This means that those vcpus
are not yet active (in terms of vm_active_cpus) when the INIT and
STARTUP IPIs are delivered to the vCPUs.

The changes in commit 0bda8d3e9f7a changed the INIT and STARTUP IPIs
to reuse the existing vlapic_calcdest() function. This function
silently ignores IPIs sent to inactive vCPUs. As a result, when using
an old bhyve binary, the INIT and STARTUP IPIs sent to wakeup APs were
ignored.

To fix, restructure the compat code for the INIT and STARTUP IPIs to
ignore the results of vlapic_calcdest() and manually parse the APIC ID
and resulting vcpuid. As part of this, make the compat code always
conditonal on the ipi_exit capability being disabled.

Test Plan

Tested both old and new bhyve binaries with a FreeBSD guest booted via bhyveload.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

jhb requested review of this revision.Oct 22 2022, 9:48 PM
jhb added a reviewer: corvink.
markj added a subscriber: markj.

I hit the bug a couple of days ago and can confirm that this patch fixes it.

sys/amd64/vmm/io/vlapic.c
1146

Leftover debug print?

1186

Should this be gated on COMPAT_FREEBSD13? Same for the INIT case above.

This revision is now accepted and ready to land.Oct 23 2022, 3:00 PM
corvink added inline comments.
sys/amd64/vmm/io/vlapic.c
1138

This doesn't exactly matches the old behaviour. The old vmm checks the dest field and ignores INIT's where the dest field is 0. So, it supports no broadcast and wouldn't set all vlapics to BS_SIPI. Broadcast on old vmm are broken so I think this change is fine.

If you like to exactly match the old vmm behaviour, I would restructure it to:

if (!vlapic->ipi_exit) {
  if (dest != 0) {
    vlapic2 = vm_lapic(vlapic->vm, dest);
    vlapic2->boot_state = BS_SIPI;
  }
} else {
  CPU_COPY(&dmask, &ipimask);
}
1188

See my comment on INIT. This doesn't exactly matches old vmm behaviour.

Just a suggestion: The break on first match might be confusing and it's clearer if we use:

if (!vlapic->ipi_exit) {
  ...
} else {
  CPU_FOREACH_ISSET(i, &dmask)
  ...
}
jhb marked an inline comment as done.Oct 24 2022, 5:28 PM
jhb added inline comments.
sys/amd64/vmm/io/vlapic.c
1138

I did something like this, but used break statements to avoid lots of nested indentation.

1186

That's more complicated. We might should move all of the !ipi_exit code under that, but we'd need to also force the capability on by default and reject attempts to clear it in the setcap routines, etc.

  • Ignore calcdest results entirely for legacy.
This revision now requires review to proceed.Oct 24 2022, 5:28 PM
sys/amd64/vmm/io/vlapic.c
1141

I still think here we should ignore INIT IPIs sent to the current vCPU. Can be a simple:

CPU_CLR(vlapic->vcpuid, &imask);

There's no way that sending yourself an INIT IPI can make sense.

1163

BTW, for my work to allocate vCPUs on demand, I plan to replace the boot_state field with a couple of VM-wide cpusets since legacy binaries against the new kernel on that branch may not have instantiated the vlapics yet when these are sent. One set for vCPUs in the BS_INIT state, and a second set for vCPUs in the BS_SIPI state. It also means that we can replace the loop in the ipi_exit case below with a CPU_AND for example.

sys/amd64/vmm/io/vlapic.c
1141

This should only happen on broken guests. Especially due to vlapic_is_icr_valid the only way to send an INIT to yourself is by sending an INIT with DESTFLD set to yourself which is strange.

Sending an INIT to yourself shouldn't crash vmm or bhyve. So, I'm not sure if it's worth to add some quirks without any use case yet. Nevertheless, the change would be ok from my side.

corvink added inline comments.
sys/amd64/vmm/io/vlapic.c
1145

The check for SIPI to yourself is done some lines lower after vm_apicid2vcpuid. Or are there any other reasons to do it early on the dest field?

This revision is now accepted and ready to land.Oct 25 2022, 5:38 AM
sys/amd64/vmm/io/vlapic.c
1141

Yes, I'm specifically worried about either broken or malicious guests possibly triggering a local DOS on the host if that causes the vCPU thread to just spin forever.

1145

Oh yes, that's a leftover. I had moved the check below while working on this and had not removed it from here.

This revision was automatically updated to reflect the committed changes.