Page MenuHomeFreeBSD

bhyve - Snapshot Save and Restore multiple devices
Needs ReviewPublic

Authored by ionut.mihalache1506_gmail.com on Sep 10 2020, 10:21 AM.
Tags
Referenced Files
Unknown Object (File)
Fri, Jan 3, 6:15 AM
Unknown Object (File)
Fri, Jan 3, 6:11 AM
Unknown Object (File)
Thu, Dec 26, 11:16 PM
Unknown Object (File)
Dec 4 2024, 4:21 AM
Unknown Object (File)
Dec 4 2024, 4:21 AM
Unknown Object (File)
Dec 4 2024, 4:21 AM
Unknown Object (File)
Dec 4 2024, 4:20 AM
Unknown Object (File)
Dec 4 2024, 4:20 AM

Details

Reviewers
jhb
Group Reviewers
bhyve
Summary

This feature implements save-restore for multiple devices of the same type.

Initially the restore feature only restored the first device of a certain type.

The feature was rebased over commit 4a16bd8bb4cf1217720c21ae363626df51474edd.

Test Plan

For a guest that has 2 ahci disks and 2 virtio-net network adapters the teste cases were:

  • create a big file on one of the disk
  • start a copy process of the large file from one disk to the other
  • suspend the guest
  • restore the guest and the copy process should finish and the files are the same in data and size
  • initiate a ping process one virtio-net adapter at a time and suspend the guest
  • on restore the ping process should continue for the used adapter

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

vm_snapshot_kern_structs was called two times and probably caused a problem with save restore for multiple devices

vm_snapshot_kern_structs was called two times and probably caused a problem with save restore for multiple devices

Besides my another comments and questions, what is advantage of changing order of the vm_snapshot_kern_structs and vm_snapshot_user_devs (if look at the diff) ?

usr.sbin/bhyve/atkbdc.c
570

calloc(3) initialises all memory with zero, so you don't need explicitly assign was_restored to 0.

Also this is related to all below "was_restored = 0" changes.

usr.sbin/bhyve/pci_ahci.c
2505

Instead allocating dev_info, I would like to suggest to use one of the following:

  1. Use "sc" that already allocated in the line 2345, i.e. put some fields to sc's struct for every PCI type.
  2. Introduce a functions like:

    snapshot_pci_device_register(pi)"; for pci devices snapshot_device_register(cb, data, name); for non-pci devices

And replace all init code for all devices with those functions.

This make code simpler and incapsulate all allocations, etc. Thus each decl "struct vm_snapshot_dev_info *dev_info" and code below will be removed in xxx_init() functions.

usr.sbin/bhyve/pci_emul.c
2062

If meta_data always should be not NULL, just place assert() here.

2068

I would suggest to call pci_pause()/pci_resume() unconditionally for each PCI device and if pe_pause is NULL do not do anything. The same for pci_resume().
That will simplify initialisation and if PCI device has implemented pe_pause/pe_resume, a handler will be called during suspend/resume. If PCI device does not implement pe_pause/pe_resume, just ignore it.

usr.sbin/bhyve/snapshot.c
503

Probably you should correct a code to conform TODO's requirements and then remove TODO comment.

950–951

Is this comment still required?

952

What is this function currently used for? It only calls lookup_devs().

Probably lookup_devs() should be renamed with walk_and_restore_devices()

1191

commented code - bad style ?

1253–1256

Put fprintf under ifdef DEBUG or even replace with dprintf?

usr.sbin/bhyve/pci_ahci.c
2505

Some of the information will need to be used outside the context of the device itself - the static array of user devices has been removed, so the snapshot component must keep track of the functions that need to be called when performing a snapshot or restore. Also, for devices like atkbdc, the snapshot_pci_device_register will likely not be compatible.

Before having support for atkbdc, the static userspace device array was also not required, since you could discover most devices through the PCI bus; however, if other device types are to be supported, a more generic approach is required.

However, I agree that some of the logic, like allocating the structure and performing a part of the initialization could be done separately.

usr.sbin/bhyve/snapshot.c
1436

Changing an order of vm_snapshot_kern_struct() and vm_snapsshot_user_devs() makes kernel structure corruption on resume and host panic. For example, I got panic at restore_guest_fpustate() when vcpu->guest_xcr0 became corrupted:

http://bxr.su/FreeBSD/sys/amd64/vmm/vmm.c#1134

This is due to vm_snapshot_kern_struct() resets "offset" to 0 while vm_snapsshot_user_devs() uses lseek(SEEK_CUR).

Removed redundant lines; applied most of the recommended changes.

! In D26387#593374, @ionut.mihalache1506_gmail.com wrote:
Removed redundant lines; applied most of the recommended changes.

So I am adding rework of this review multiple-devices-v1.patch. Please look at this. Probably, if this patch is considered as good and better, I can
upload here to the review or can create another review.

Idea is: using unique name as identifier. For PCI devices it would be, for example, virtio-blk-pci-$bus-$slot-$func

Changes:

  • Use unique name to identify device.
  • Add simple register_snapshot_dev() function, for any type of device.
  • Changed some function names and defs related to "user devices" and renamed to "devices"
  • Renamed "struct" section to "kern_structs"
  • Removed pci_find_slotted_dev function

For example: "devices" section in $dumpfile.meta would be:

"devices": [
    {
      "device": "lpc-pci-0-31-0",
      "size": 520,
      "file_offset": 80418
    },
    {
      "device": "virtio-net-pci-0-4-0",
      "size": 66180,
      "file_offset": 80938
    },
    {
      "device": "virtio-blk-pci-0-3-0",
      "size": 8819,
      "file_offset": 147118
    },
    {
      "device": "hostbridge-pci-0-0-0",
      "size": 436,
      "file_offset": 155937
    },
    {
      "device": "atkbdc",
      "size": 141,
      "file_offset": 156373
    }
  ]

Rebased with a newer upstream and applied the changes from the review https://reviews.freebsd.org/D29538.