This revision adds a new device file at /dev/vmmctl which can be used to create VMs / device files using new ioctl commands VMMCTL_CREATE and VMMCTL_DESTROY. When a file descriptor to /dev/vmmctl is closed, all VMs associated with the file descriptor are automatically destroyed.
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
lib/libvmmapi/vmmapi.c | ||
---|---|---|
109 | ||
lib/libvmmapi/vmmapi.h | ||
117 | ||
sys/amd64/vmm/vmm_dev.c | ||
1095 | I think *name can be const? | |
1351 | Per style(9), there should be no extra indentation for case statements. | |
1353 | op->name is a character buffer supplied by userland. Nothing guarantees that it is nul-terminated, so we should check that here, e.g., with strnlen(), and fail with EINVAL if not. | |
1368 | Same here regarding nul termination. | |
1398 | Perhaps loop over all VMs and fail if any are non-persistent? | |
1411 | I don't think there's any need to define a new module here? Can't we hook into vmmdev_init() and vmmdev_cleanup()? |
Fix some style issues, fix bug where string from userspace was not strnlen'd, move device file setup, cleanup code into vmmdev_init and vmmdev_cleanup (which causes some code to be shifted around in the file).
Before looking deeper into the code I picked out these first.
lib/libvmmapi/vmmapi.c | ||
---|---|---|
115 | the last 0 is not necessary if you are not creating a file. | |
123 | We allows vmname with the length of VM_MAX_NAMELEN. | |
137 | We allows vmname with the length of VM_MAX_NAMELEN. | |
sys/amd64/include/vmm_dev.h | ||
351 | VM_MAX_NAMELEN + 1 | |
sys/amd64/vmm/vmm_dev.c | ||
1142 | style-nit: there is no need to make one-line statement-true a compound statement. | |
1292 | ||
1302 | The namespace is still shared between vmmctl-created vm and sysctl-created vm. | |
1373 | VM_MAX_NAMELEN + 1 |
Fix some style issues, and update max length of VM name accordingly due to changes in commit df95cc76affb.
I don't understand the motivation for this change. What kind of VM objects (vm_object_t) are being created and for what use? Do you mean instead that you want to create Virtual Machines (in which case, please do not say VM objects, that means vm_object_t from <vm/vm_object.h>)? I think in general we'd like to move to using a cdev for this and retiring the sysctl entirely. However, a couple of thoughts:
- The create sysctl should take a structure that we can extend over time to add new fields (e.g. eventually a "max_vcpus" field is one use case folks already want). I think I would also make the 'persistent' flag be a flag passed to create rather than implicit (this would let us get rid of the sysctl entirely if bhyveload were able to create persistent VMs via /dev/vmmctl). The contract should be that userland has to 'bzero' (or memset(0)) the structure passed to create before setting specific fields. This would let us handle backward compat ABIs via the ioctl size field and knowing that unknown fields are set to 0.
- Somewhat orthogonal, but we probably want a way to restrict namespaces so that you can create jails that can't "see" the VMs in other jails. Perhaps in the outer jail you could see child VMs using some sort of "jail/VM" syntax for the purposes of bhyvectl. One way of doing that might be to have a VMM_OPEN that takes a pathname and allow bhyvectl to use that instead of /dev/vmm/<foo> devices directly.
I know when I've talked to @grehan in the past about using FD's to represent VMs there has been a desire for at least the non-bhyveload case to have something like VMM_CREATE return a new fd that the ioctl are then performed on without creating a /dev/vmm/<foo> entry at all, just an anon FD that destroys the VM on close.
Right, VM here means "virtual machine." The ambiguous "VM object" terminology is my fault, we should just say "VMs" generally. In some cases one wants to refer to a struct vm specifically, which is where "VM object" came from.
This change adds a cdev that can be used to create VMs, instead of the sysctl interface. I tend to think that the sysctl interface needs to stick around for a while, as it's useful to be able to run bhyve(8) from older releases with a new kernel.
However, a couple of thoughts:
- The create sysctl should take a structure that we can extend over time to add new fields (e.g. eventually a "max_vcpus" field is one use case folks already want). I think I would also make the 'persistent' flag be a flag passed to create rather than implicit (this would let us get rid of the sysctl entirely if bhyveload were able to create persistent VMs via /dev/vmmctl). The contract should be that userland has to 'bzero' (or memset(0)) the structure passed to create before setting specific fields. This would let us handle backward compat ABIs via the ioctl size field and knowing that unknown fields are set to 0.
- Somewhat orthogonal, but we probably want a way to restrict namespaces so that you can create jails that can't "see" the VMs in other jails. Perhaps in the outer jail you could see child VMs using some sort of "jail/VM" syntax for the purposes of bhyvectl. One way of doing that might be to have a VMM_OPEN that takes a pathname and allow bhyvectl to use that instead of /dev/vmm/<foo> devices directly.
https://reviews.freebsd.org/D31156 is a step towards this. With that change, only a parent jail can see VMs in a child jail. The namespace is still global for now, so two sibling jails cannot create VMs with the same name.
I know when I've talked to @grehan in the past about using FD's to represent VMs there has been a desire for at least the non-bhyveload case to have something like VMM_CREATE return a new fd that the ioctl are then performed on without creating a /dev/vmm/<foo> entry at all, just an anon FD that destroys the VM on close.
This change is intended to be a step towards that. For now the new device file only handles creation and destruction, with destruction happening automatically when the fd is closed.
While I think we might need to leave it around under COMPAT_FREEBSD<N> for some time, I'd like for the new interface to also support bhyveload if possible (e.g. by having persistent be a flag and then not destroying persistent VMs on close). That would let us stop using the sysctl entirely on HEAD for example.
Add a "persistent" flag to allow some vmmctl-created VMs to not be automatically destroyed when the file descriptor is closed. This allows /dev/vmmctl to be used with bhyveload as well. I also added some padding to the ioctl structs for future-proofing.