Page MenuHomeFreeBSD

Bhyve cpu topology control
ClosedPublic

Authored by rgrimes on Mar 9 2017, 8:55 AM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Jan 3, 7:31 AM
Unknown Object (File)
Thu, Jan 2, 8:02 PM
Unknown Object (File)
Tue, Dec 31, 3:06 AM
Unknown Object (File)
Dec 9 2024, 12:12 AM
Unknown Object (File)
Dec 8 2024, 11:59 PM
Unknown Object (File)
Dec 6 2024, 4:02 PM
Unknown Object (File)
Dec 6 2024, 4:02 PM
Unknown Object (File)
Dec 6 2024, 4:02 PM

Details

Summary

This adds the ability to control the CPU topology of created VMs from userland
without the need to tweak sysctls, it allows the old sysctls to continue to function.

The command line of bhyve is maintained in a backwards compatible way.
The API of libvmmapi is maintained in a backwards compatible way.
The sysctl's are maintained in a backwards compatible way.

Added command option looks like:
bhyve -c [cpus=]n[,sockets=n][,cores=n][,threads=n][,maxcpus=n]
The optional parts can be specified in any order, but only a single integer
invokes the backwards compatible parse.

bhyvectl --get-cpu-topology option added.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
sys/amd64/vmm/vmm.c
469 ↗(On Diff #26094)

minor - style(9) wants spaces either side of the '='.

479 ↗(On Diff #26094)

As above, style(9) on the '=', but don't need the indirect reference here.

Also there should be some error checking on the params, with an error propagated back up to the ioctl caller.

sys/amd64/vmm/vmm_dev.c
646 ↗(On Diff #26094)

As above, no need for pointers here.

sys/amd64/vmm/x86.c
64 ↗(On Diff #26094)

I think this can be deleted. There weren't large numbers of users of this due to it being global, and the bhyve parameters are much better to use.

usr.sbin/bhyve/bhyverun.c
187 ↗(On Diff #26094)

maxcpus is currently a no-op so it doesn't need to be exposed. No harm in having it in the internal api though.

828 ↗(On Diff #26094)

Should check an error return here.

@rgrimes what is the state of this review?

I have all the other comments corrected in my copy, seeking clarification on error check value for vm_set_topology before I correct that. Plan to delay depreciation of old sysctl method to allow simpler merge to stable/11. That is going to require a direct commit to stable/11, and a different commit to ^head

sys/amd64/vmm/vmm.c
479 ↗(On Diff #26094)

From reading manual page and code we currently have a limit of 16 vCPU, so should I just simply check for sockets*cores*threads <=16, and if not through EINVAL?

sys/amd64/vmm/x86.c
64 ↗(On Diff #26094)

I am leaving this here for now to simplify merging to stable/11.
I'll write another differential to mark the depreciation in stable/11 and delete from ^head.

rgrimes marked an inline comment as done.

Correct issues and add man page changes

Thanks for getting back to this Rod.

On the validation issue, AMD and Intel have their own limits on how these numbers are exposed, and even that can be model-specific. While they can be verified in the kernel, the error return is somewhat limited - it may be worth doing the check in user-space where a more useful error can be returned.

Thanks for getting back to this Rod.

On the validation issue, AMD and Intel have their own limits on how these numbers are exposed, and even that can be model-specific. While they can be verified in the kernel, the error return is somewhat limited - it may be worth doing the check in user-space where a more useful error can be returned.

I basically came to that same conclusion already and if you look at usr.sbin/bhyve/bhyverun.c line 210 I validate that the vCPU count and sockets * core * threads are self consistent, elsewhere this script checks that the vCPU count is supported by the kernel.
I did add a minimal in kernel check in case someone is using something other than bhyve(8) at sys/amd64/vmm/vmm.c line 485 for sockets*cores*threads > MAX_VCPU, and for any use of the unimplemented maxcpus != 0, there is also an XXX comment here asking how do I check this against the vm's vCPU count, which I am unsure how to do, do I have to iterate over the vcpuid from 0 to MAX_VCPU to figure out how many threads are created? Is this valid at the time we set topology?

I'd like to see some guest test results (FreeBSD, Linux, Windows) for varying numbers of these, in particular, with sockets > 1, and comparing what Qemu shows for the same configuration.

lib/libvmmapi/vmmapi.c
1425 ↗(On Diff #39405)

uint64_t -> uint16_t

lib/libvmmapi/vmmapi.h
218 ↗(On Diff #39405)

The type for these should be uin16_t - there is no need for it to be a 64-bit type.

sys/amd64/include/vmm.h
185 ↗(On Diff #39405)

uint64_t -> uint16_t

sys/amd64/vmm/vmm.c
169 ↗(On Diff #39405)

uint16_t -> uint16_t

431 ↗(On Diff #39405)

u_int -> uint16_t

459 ↗(On Diff #39405)

Minor style(9), space around =

472 ↗(On Diff #39405)

uint64_t -> uint16_t

482 ↗(On Diff #39405)

uint64_t -> uint16_t

484 ↗(On Diff #39405)

minor style(9), return statement should be on newline.

sys/amd64/vmm/vmm_dev.c
651 ↗(On Diff #39405)

If GET_TOPOLOGY isn't going to be implemented it should have an EINVAL put in.

Seems like it should be in bhyvectl.

sys/amd64/vmm/x86.c
94 ↗(On Diff #39405)

cores/cpus/sockets/threads should be uint16_t

usr.sbin/bhyve/bhyverun.c
96 ↗(On Diff #39405)

uint16_t -> uint16_t

I'd like to see some guest test results (FreeBSD, Linux, Windows) for varying numbers of these, in particular, with sockets > 1, and comparing what Qemu shows for the same configuration.

This code makes no change in what a guest sees using the old sysctl or the new options, if you want to see this you could of seen this anytime over the last N years, this is a non-request to this change as far as I am concerned.

lib/libvmmapi/vmmapi.h
218 ↗(On Diff #39405)

When I started on this task you told me to be QEMU compatible, so I did that, I looked at QEMU both for syntax and data types:
This is from qemu/vl.c
Static void smp_parse(QemuOpts *opts)
{

if (opts) {
    unsigned cpus    = qemu_opt_get_number(opts, "cpus", 0);
    unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
    unsigned cores   = qemu_opt_get_number(opts, "cores", 0);
    unsigned threads = qemu_opt_get_number(opts, "threads", 0);

We usually express "unsigned" as uint64_t in the kernel, are
you sure you want me to change these all to uint16_t?.

sys/amd64/vmm/vmm.c
459 ↗(On Diff #39405)

Fixed in my local copy

484 ↗(On Diff #39405)

Done in local copy, also next line has same problem. Fixed.

sys/amd64/vmm/vmm_dev.c
651 ↗(On Diff #39405)

I am investigating this issue.

You might have to explain further why you think you are exempt from having to demonstrate testing with this.

lib/libvmmapi/vmmapi.h
218 ↗(On Diff #39405)

"unsigned" would correspond to uint32_t, but there is no point in even using something that size so I prefer uint16_t.

You might have to explain further why you think you are exempt from having to demonstrate testing with this.

Let me try again. This differential in no way effects what a guest sees when the new method to specify
topology is used. Another words running guests with "sockets > 1" would yield no difference
what so ever in running it with vCPU > 1. Further more there is no change in what a guests sees
if you use hw.vmm.topology.cores_per_package or hw.vmm.topology.threads_per_core to effect
threads or cores, so I do not see why running a bunch of tests would yield.
I have run FreeBSD as a guest to make sure that the reported topology matches what was set
using the command line, and also tested that the results are correct if the boot time
sysctl tunables are used to ensure backwards compatibility.
Asking me to setup and run tests in KVM/Qemu is overkill, IMHO.
These tests could of been done using the already existing tunables any time over the
life time of the existing hw.vmm.topology.*

lib/libvmmapi/vmmapi.h
218 ↗(On Diff #39405)

My mistake, I was thinking the sizeof(int) on amd64 would be 8 not 4. I'll change all the types to uint16_t.
It would of been more productive to have this request in the first review.

rgrimes marked 16 inline comments as done.
rgrimes edited the summary of this revision. (Show Details)

Update per Peter Grehan

sys/amd64/vmm/vmm.c
431 ↗(On Diff #39405)

This is an existing interface with an existing data type, I wish to maintain the old type for compatibility with stable/11, once this change is merged back there shall be a review for depreciating these 2 sysctl's with notification in stable/11 and removal in ^head.

sys/amd64/vmm/vmm_dev.c
651 ↗(On Diff #39405)

I have implemented GET_TOPOLOGY and taught bhyvectl how to print it.

usr.sbin/bhyve/bhyve.8
36 ↗(On Diff #39405)

Fix long line, I have tested change from Warren Block, done in my tree.

80 ↗(On Diff #39405)

As above, fix long line. Waiting on input from Warren Block on how to fix this one, the same fix used for the other does not work here.

so I do not see why running a bunch of tests would yield.

This change only makes sense for Windows desktop guests where sockets are artificially limited to 1 or 2. No other guest cares.

How about you at least test with a version of that and give the results ?

I can cover the rest of the testing.

This revision is now accepted and ready to land.Mar 7 2018, 5:52 AM

Found a small whitespace nit in the man page. Also, you need to bump the .Dd to the date of the commit (when it is ready).

usr.sbin/bhyve/bhyve.8
106 ↗(On Diff #40026)

There is a trailing whitespace here, detected by textproc/igor.

In D9930#306720, @bcr wrote:

Found a small whitespace nit in the man page. Also, you need to bump the .Dd to the date of the commit (when it is ready).

I have fixed both of those in my local copy, must of snuck in while I was playing with long line problems, as I ran igor on an earlier version.
Igor is now run as part of my build script.
Thanks for the catch.

usr.sbin/bhyvectl/bhyvectl.c
192 ↗(On Diff #40026)

The coma at the end of this line should of been removed, this has been done in my local copy, about to push that.

Fix white space nit in usr.sbin/bhyve/bhyve.8
Remove extra comma in usr.sbin/bhyvectl/bhyvectl.c causing build failure

This revision now requires review to proceed.Mar 7 2018, 3:23 PM
This revision is now accepted and ready to land.Mar 7 2018, 10:42 PM

Update bhyve -h usage output to include new cpu topology options.
Pointed out by Roman Bogorodskiy (novel@).

This revision now requires review to proceed.Mar 10 2018, 4:39 PM
rgrimes added a reviewer: bhyve.
rgrimes edited the summary of this revision. (Show Details)

The issue causing the cpu topology to come out wrong on Intel cpus has been found, it is a misplaced call to vm_get_topology that is only executed for the cpuid 0xb, ecx=0 case, causing the ecx=1 case to use uninitialized values.

sys/amd64/vmm/x86.c
400 ↗(On Diff #40131)

This call needs to be moved before the if, otherwise sockets, cores and threads are used unitialized in the other cases of *ecx.

rgrimes edited the summary of this revision. (Show Details)
rgrimes marked an inline comment as done.

Make cpu topology sysctl's conditional on FreeBSD_version so that this patch can be merged to stable/11 and make these sysctl obsolete in >120057

Clean up the topology_parse() routing to catch more errors, and to properly match the documented syntax, update the bhyve.8 manual page to reflect the actual syntax implemented.

I think we're still good on the man page!

In D9930#315522, @bcr wrote:

I think we're still good on the man page!

Can you smash the accepted (manpages) button for me :-)

This revision is now accepted and ready to land.Apr 6 2018, 9:13 PM
This revision was automatically updated to reflect the committed changes.