rescue: Implement a direct dumper for arm64 and amd64
Needs Revision · Public

Authored by jhibbits on Oct 31 2024, 3:52 PM.

Details

Reviewers

andrew
kib
jhb
imp
manu
ehem_freebsd_m5p.com

Group Reviewers

arm64

Summary

See the rescue.4 manual page for details on configuration.

amd64 and arm64 support is implemented. On arm64, this should work
regardless of whether the host uses an FDT or ACPI root bus; on amd64
the feature should work whether booted via EFI or legacy BIOS.

There are several independent pieces of the implementation:

Build-time configuration. There are two new kernel configuration options, RESCUE and RESCUE_SUPPORT. Compile rescue kernels with "options RESCUE" and compile kernels with "options RESCUE_SUPPORT" to enable use of a rescue kernel. Set the RESCUE_EMBED make option to embed a rescue kernel into a host kernel.
Enable rescue-kernel-on-panic by setting the debug.rescue_minidump tunable to 1 in the host kernel. When configured, rescue_kernel_init() allocates a physically contiguous chunk of memory for use by the rescue kernel. The reservation is populated with an aligned copy of the kernel, the host kernel's environment, and metadata (such as a DTB or an EFI memory map).
When rescue_minidump is configured, an attempt to dump will call rescue_kernel_exec(), which does some setup and jumps to the rescue kernel's entry point. initarm() and hammer_time() have some special hooks to pull metadata out of the reservation. In general I have tried to avoid modifying locore. This and the previous item are implemented in machine/rescue_machdep.c.
Once the rescue kernel has booted it behaves just like a regular kernel, i.e., there is no logic specific to rescue kernels. The one difference is that rescue kernels have a /dev/dumper, which can be used to read a minidump out of the host kernel's RAM. This is implemented in machine/rescue_dumper.c.
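
As a rough illustration of the reservation described above (not code from the patch), the sketch below lays out an aligned kernel copy, the environment, and the metadata back to back inside one physically contiguous region. The struct, the function names, and the choice to page-align the environment and metadata are all assumptions made for the example; the real logic lives in machine/rescue_machdep.c and may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of where each piece lands (physical addresses). */
struct rescue_layout {
	uint64_t kernel_pa;	/* aligned copy of the rescue kernel */
	uint64_t env_pa;	/* copy of the host kernel's environment */
	uint64_t meta_pa;	/* metadata, e.g. a DTB or an EFI memory map */
	uint64_t end_pa;	/* first byte past the last piece */
};

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t
round_up(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((x + align - 1) & ~(align - 1));
}

/*
 * Place the pieces inside the reservation [base, base + size).
 * Returns 0 on success, -1 if they do not fit.
 */
static int
rescue_layout_compute(struct rescue_layout *l, uint64_t base, uint64_t size,
    uint64_t kernel_size, uint64_t kernel_align, uint64_t env_size,
    uint64_t meta_size, uint64_t page_size)
{
	l->kernel_pa = round_up(base, kernel_align);
	/* Assumption: environment and metadata each start on a page boundary. */
	l->env_pa = round_up(l->kernel_pa + kernel_size, page_size);
	l->meta_pa = round_up(l->env_pa + env_size, page_size);
	l->end_pa = round_up(l->meta_pa + meta_size, page_size);
	return (l->end_pa <= base + size ? 0 : -1);
}
```

For instance, a 64MB reservation at 0x10100000 with a 2MB kernel alignment would place the kernel copy at 0x10200000; the call fails if the reservation is too small to hold all three pieces.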

Original patch by Mark Johnston.

Obtained from: Juniper Networks, Inc.
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Skipped

Unit

Tests Skipped

Build Status

Buildable 60282
Build 57166: arc lint + arc unit

Event Timeline

jhibbits created this revision. Oct 31 2024, 3:52 PM

Herald added a reviewer: manu. Oct 31 2024, 3:52 PM

Herald added subscribers: olce, ehem_freebsd_m5p.com, emaste.

jhibbits requested review of this revision. Oct 31 2024, 3:52 PM

Harbormaster completed remote builds in B60282: Diff 145740. Oct 31 2024, 3:53 PM

Really cool work. I think this could use a better description though (and possibly a rename) - in current usage the rescue prefix suggests rescue(8) and it's not clear to me what a "direct dumper" is. The summary message should include a brief description of what this actually is.

In D47358#1080093, @emaste wrote:

Really cool work. I think this could use a better description though (and possibly a rename) - in current usage the rescue prefix suggests rescue(8) and it's not clear to me what a "direct dumper" is. The summary message should include a brief description of what this actually is.

We call it a 'rescue kernel' at Juniper because it "rescues" the core; a small kernel, embedded in the main kernel, whose entire purpose is to save off the core of the panicked kernel, directly to disk. We use it because some of our devices don't have any swap at all, or don't have enough swap to cover even a minidump worth of pages. So we need something that can take a dump of the panicked system, and dump it to a file directly.

ehem_freebsd_m5p.com requested changes to this revision. Oct 31 2024, 5:15 PM

ehem_freebsd_m5p.com added inline comments.

sys/dev/xen/bus/xen_intr.c
55	This is wrong. `sys/dev/xen/bus/xen_intr.c` is pure-MI. This could go in `sys/x86/include/xen/arch-intr.h` though.

This revision now requires changes to proceed. Oct 31 2024, 5:15 PM

jhibbits added a subscriber: markj. Oct 31 2024, 8:51 PM

arrowd added a subscriber: arrowd. Nov 1 2024, 5:09 AM

cy added a subscriber: cy. Nov 1 2024, 2:16 PM

kevans added a subscriber: kevans. Nov 1 2024, 2:46 PM

How do you plan to handle the GICv3 ITS? It needs to use the same physical pages in both kernels. We already handle this when kexecing from Linux, but will need to pass the info needed to the rescue kernel.

markj added inline comments. Nov 1 2024, 3:36 PM

sys/amd64/amd64/machdep.c
799	This portion (together with the definition of efi_physmem_type()) should be in its own patch.

@jhibbits I don't have any concern about the functionality or the value it would bring. It would be great to have this capability. My comment was entirely about the potential confusion with rescue(8). I don't have a good alternate suggestion though, possibly something like failover, fallback, crashkern, dumpkern.

share/man/man4/rescue.4
6	Update prior to commit
57	How difficult is a non-embedded root fs? In particular, I am interested in initrd-style booting to support iSCSI root and similar cases, and I wonder if we could fit enough in to support both use cases in one root fs image/tarball.
117	but not arm64?
sys/amd64/conf/GENERIC
108–110	Having commented-out entries here seems unusual. Should just be in NOTES. (Or eventually here, not commented.)
sys/amd64/conf/RESCUE
12	It would be good if there was a common file we could include to set these, options.small or whatever.

How do you handle DMA operations possibly still occurring at panic or reboot time? For panic, the panicked kernel definitely does not do anything to stop them. For reboot, I am aware that, e.g., the mlx5 driver does nothing to stop the PCIe interface from executing commands and writing updates to several rings; also, the UMA memory is kept owned by the card until reset is done. Similarly, GPUs that have host memory allocated for them might do DMA ops for a long time after the host started smelling funny.

I mean, from the above, that the rescue kernel could see random memory corruption if it does not limit itself to accessing only memory marked as free in the previous kernel.

sys/amd64/amd64/locore.S
110	I do not see any use for this 'int3' instruction. The instruction above it is jmp, not call, so there is no way for control to return there, except by random corruption.

So I don't have time to go over this in detail.
I don't like the name 'rescue' enough to complain.

Also, I think that this duplicates a lot of the work that loader.kboot could easily do to load this kernel with a Linux-like kexec_load that could (eventually) generalize to allowing FreeBSD to replace FreeBSD, rather than just allowing FreeBSD to dump FreeBSD or FreeBSD to replace Linux.

I also think it's duplicative of the work I'll be doing the rest of the year to finish up amd64 LinuxBoot support, so some care needs to be taken there.

So I really like the functionality, but it's colliding with other work in a similar area I've been doing to make us bootable from Linux, and I don't want us to step on each other's toes. I also think a loader.kexec (a FreeBSD-native version of loader.kboot) could obviate the need for embedding and be more general.

share/man/man4/rescue.4
7	Like ed, I have a problem with the name 'rescue'. I'd strongly prefer the name 'postmortem' because that reflects what it's doing: it's there to allow for a better post-mortem.
57	Since this kernel is compiled into the other kernel, I imagine hard. However, this kernel shouldn't be compiled into the other kernel, at least long term. It should be loaded via a kexec_load() operation. Ideally, one could convert loader.kboot to be a FreeBSD binary again and use that to load this kernel (though there'd need to be some way to pass machine state (GIC) and used memory, so it knows what to dump and also what memory to avoid). loader.kboot already supports initrd-style booting (though with a UFS image, not a tarball or CPIO, since we don't currently have the ability to create a RAM disk from those, nor to use the existing tarfs as a root).
sys/amd64/amd64/locore.S
111	Just FYI, loader.kboot does almost exactly the same thing when booting from Linux :) Except it has to copy the trampoline in and tell kexec_load the start address...
sys/amd64/amd64/machdep.c
1374	If you had loader.kboot load things, you wouldn't need this mini-loader in the kernel (well, I imagine it would be much smaller).

Also... I'm happy to write the loader.kexec parts. Work could use that and I think the replacement kernel case isn't a hugely different problem...

In D47358#1081496, @kib wrote:

How do you handle DMA operations possibly still occurring at panic or reboot time? For panic, the panicked kernel definitely does not do anything to stop them. For reboot, I am aware that, e.g., the mlx5 driver does nothing to stop the PCIe interface from executing commands and writing updates to several rings; also, the UMA memory is kept owned by the card until reset is done. Similarly, GPUs that have host memory allocated for them might do DMA ops for a long time after the host started smelling funny.

I mean, from the above, that the rescue kernel could see random memory corruption if it does not limit itself to accessing only memory marked as free in the previous kernel.

The rescue kernel runs out of a physically contiguous region of RAM reserved by the "host" kernel during boot, typically 64MB or 96MB. It reads some memory owned by the host kernel (e.g., vm page bitmap), but since its goal is to dump the host kernel's memory without modification, concurrent DMA access should not cause any problems as compared with traditional minidumps.
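
To make the read-only nature of that access concrete, here is a minimal sketch (not taken from the patch) of walking a host page bitmap and counting the pages a dumper would write. The function name, the 4 KiB page size, the bitmap encoding, and the decision to skip the rescue kernel's own reservation are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	SKETCH_PAGE_SHIFT	12	/* assumption: 4 KiB pages */

/*
 * Count the pages that would be dumped: every page whose bit is set in the
 * host's bitmap, except pages inside the rescue reservation
 * [resv_start, resv_end).  Bit i of the bitmap is assumed to describe the
 * page at physical address i << SKETCH_PAGE_SHIFT.  The bitmap is only read.
 */
static size_t
count_dump_pages(const uint64_t *bitmap, size_t nwords, uint64_t resv_start,
    uint64_t resv_end)
{
	size_t count = 0;

	assert(resv_start <= resv_end);
	for (size_t w = 0; w < nwords; w++) {
		for (unsigned b = 0; b < 64; b++) {
			uint64_t pa;

			if (((bitmap[w] >> b) & 1) == 0)
				continue;
			pa = ((uint64_t)w * 64 + b) << SKETCH_PAGE_SHIFT;
			/* Assumption: the rescue kernel's own memory is skipped. */
			if (pa >= resv_start && pa < resv_end)
				continue;
			count++;
		}
	}
	return (count);
}
```

Because the walk never writes to host memory, a device still doing DMA can at worst change the contents of a page being copied, which is the same exposure a traditional minidump has.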

share/man/man4/rescue.4
57	Yes, I agree that being able to dynamically load the rescue kernel is required for this feature to be generally useful. It shouldn't be hard to implement.

gbe added a subscriber: gbe. Nov 4 2024, 6:14 AM

jhibbits added inline comments. Nov 5 2024, 6:51 PM

share/man/man4/rescue.4
57	I agree. However, I don't think a kexec_load() mechanism for this is the correct approach, but a loadable module could be. kexec_load() I presume requires user space to load the next kernel, but if the kernel crashes before user space is even available, this functionality would be unavailable. Making it a KLD allows the kernel to be preloaded from loader, and all the current machinery to Just Work (TM), as the rescue kernel is initialized currently at SI_SUB_VM_CONF, which is still pretty early in the boot phase. This could allow for easier debugging of problematic drivers that for $REASONS can't be loaded later, so are compiled into the kernel. Just looking through the code, it appears pulling the rescue bits out should be pretty easy to do, while keeping the RESCUE_SUPPORT in the kernel proper, just to get the KPIs needed by the rescue setup.

imp added inline comments. Nov 5 2024, 7:37 PM

share/man/man4/rescue.4
57	I think having the boot loader pre-load the crash dump kernel makes sense. I also think kexec_load() makes sense. It's the same code either way (though we'd need to find the pre-loaded special thing too vs. having a system call: the bulk of the code is really in the thing, not setting it up). I don't think it should be a normal .ko though, and I don't think kldload crash-kernel is the right user interface. Again, I'm happy to help make this happen since I've just re-entered the LinuxBoot stuff.

ehem_freebsd_m5p.com added inline comments. Fri, Dec 13, 4:01 AM

sys/dev/xen/bus/xen_intr.c
55	Actually when checking later I noticed `sys/x86/include/xen/arch-intr.h` already includes `x86/apicvar.h` so this is redundant and breaking the machine-independence of this file.