- Allow the userland hypervisor to intercept breakpoint exceptions (BP#) in the guest. A new capability (VM_CAP_BPT_EXIT) is used to enable this feature. These exceptions are reported to userland via a new VM_EXITCODE_BPT that includes the length of the original breakpoint instruction. If userland wishes to pass the exception through to the guest, it must be explicitly re-injected via vm_inject_exception().
- Export VMCS_ENTRY_INST_LENGTH as a VM_REG_GUEST_ENTRY_INST_LENGTH pseudo-register. Injecting a BP# on Intel requires setting this to the length of the breakpoint instruction. AMD SVM currently ignores writes to this register (but reports success) and fails to read it.
- Rework the per-vCPU state tracked by the debug server. Rather than a single 'stepping_vcpu' global, add a structure for each vCPU that tracks state about that vCPU ('stepping', 'stepped', and 'hit_swbreak'). A global 'stopped_vcpu' tracks which vCPU is currently reporting an event. Event handlers for MTRAP and breakpoint exits loop until the associated event is reported to the debugger.
Breakpoint events are discarded if the breakpoint is not present when a vCPU resumes in the breakpoint handler to retry submitting the breakpoint event. - Maintain a linked-list of active breakpoints in response to the GDB 'Z0' and 'z0' commands.
Details
- tested breakpoints while booting a freebsd kernel. Ran ptrace_test via kyua which used software breakpoints in the guest while kernel breakpoints were active to test BP# injection.
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
- Lint
Lint Not Applicable - Unit
Tests Not Applicable
Event Timeline
Still working on fully reviewing this, but I noticed this one thing in testing.
usr.sbin/bhyve/gdb.c | ||
---|---|---|
708 ↗ | (On Diff #57537) | gdb_cpu_add is called whether gdb is in use or not, but vcpu_state is only initialized when gdb is in use. So this will segfault when not using gdb. |
usr.sbin/bhyve/gdb.c | ||
---|---|---|
708 ↗ | (On Diff #57537) | Huh, that is actually an old bug. PTHREAD_MUTEX_INITIALIZER just happens to be all zeros such that the first call to pthread_mutex_lock initialized it implicitly, but I should really be bailing from several of these gdb_cpu_ hooks if the stub isn't initialized. |
usr.sbin/bhyve/bhyverun.c | ||
---|---|---|
170 ↗ | (On Diff #58097) | The changes in this file to avoid calling into the debug server I will commit first as a separate change before the actual breakpoint change, but I've uploaded to them this review for now. |
usr.sbin/bhyve/gdb.c | ||
---|---|---|
1196 ↗ | (On Diff #58097) | Extra space after ==, ditto below. |
1215 ↗ | (On Diff #58097) | The protocol specification says that a 'Z0' packet may include conditional expressions, delimited by a semi-colon. Shouldn't we search for that too? |
1229 ↗ | (On Diff #58097) | If we're not honouring the full request, I'd think that we should return an empty string. |
usr.sbin/bhyve/gdb.c | ||
---|---|---|
1215 ↗ | (On Diff #58097) | This is supposed to be a ';' rather than a ':' so we can detect the expressions. It then follows into the else block below. Note that the debugger should only send those to us if we advertise support for ConditionalBreakpoints and/or BreakpointCommands in qSupported (which we don't). |
usr.sbin/bhyve/gdb.c | ||
---|---|---|
1236 ↗ | (On Diff #58232) | Should this be a return? |
usr.sbin/bhyve/gdb.c | ||
---|---|---|
102 ↗ | (On Diff #58232) | Could you add a comment explaining the state machine? It's not very easy to reconstruct it from reading the code. |
usr.sbin/bhyve/gdb.c | ||
---|---|---|
1773 ↗ | (On Diff #58252) | Oops, sorry, that's not true. The problem is that new_connection() may zero the array after the fact. |
- Simplify parsing of breakpoint kind.
- Don't trash vcpu member of vcpu state.
- Style fixes.
usr.sbin/bhyve/gdb.c | ||
---|---|---|
672 ↗ | (On Diff #58252) | Or remove them for stepped (I removed another line in that block in the previous revision). |
1254 ↗ | (On Diff #58252) | I'd kind of left it as a placeholder in case we were going to support it in the future, but I can fix that. In fact, I'll probably restructure it a bit so we just fail for the cp != NULL case first and then fall through to the valid case. |
1773 ↗ | (On Diff #58252) | Oh, that is a bug. I had this in C++ where that was a vector and the vcpu was an index at one point, but went back to C and the bug originates there. I will probably fix new_connection to clear vCPU state explicitly instead. |
usr.sbin/bhyve/gdb.c | ||
---|---|---|
664 ↗ | (On Diff #58280) | I think we want to update cur_vcpu here too. |
I need to rework this a bit more. If the debugger removes the breakpoint, then we shouldn't report any pending events for that breakpoint. I had this happen during testing where I set a breakpoint on Xinvlpg and during boot the stub did the nice thing of reporting a breakpoint for each vCPU thread in turn. However, when I disabled the breakpoint so I could make forward progress, I got two more spurious SIGTRAP events in the debugger due to breakpoint events that were queued but no longer valid.
The simple implementation would be to dispense with the queue of events entirely and just report the most recent event each time. The vCPUs would hit the breakpoint again when they all resumed in that case. I think what I might do though which is slightly more elegant will be to have the case where 'continue' wants to report an already-queued event is to discard swbreak events if the associated breakpoint is no longer active. OTOH, I still need to think a bit about how to handle threads that have stepped and how to report a step. I think the vCPU needs to stay in the "stepped" state perhaps until it's step is actually reported. I can maybe handle these things by having the vCPU threads keep reporting the event if it wasn't ACKed and the event is still true perhaps? Maybe that is the better way to do this.
- Rework event reporting.
- Fix test to report stepped event.
- Make breakpoint packets idempotent.
- Remove any active breakpoints when the debugger goes away.
usr.sbin/bhyve/gdb.c | ||
---|---|---|
919 ↗ | (On Diff #58418) | MTRAP exits are an Intel-only feature. AMD CPUs don't support them. That said, what I probably should do is invoke this once during init_gdb to set a global 'mtrap_supported' bool that I then check here instead of calling it each time. I might do that as a followup change. |
I've been testing this. I observe the following problem when I put a breakpoint on pmap_pinit(). The breakpoint fires, I continue, and gdb/bhyve single steps the trapping CPU by enabling monitor traps. We get a monitor trap with $rip = Xtimerint_pti, gdb re-enables the breakpoint, and in the guest we re-execute the breakpoint when returning from the timer interrupt, so we're basically stuck in a loop.
Weirdly, I was testing an earlier version of the diff last week, and I didn't see this problem. I've tried going back to earlier versions of the diff and earlier kernels, but I keep seeing the same problem.
Hummmm. I haven't seen this behavior, but I can try it. I wonder what qemu does in this case. On a purely software model it probably pauses all the timers, etc. while stopped in the debugger, but I'm not sure if it does that if you are running on top of KVM?
The situation is akin to if the breakpoint in a userland program causes a signal handler to be invoked. On Linux single stepping will enter the signal handler. On FreeBSD we disable TF while entering the signal handler and restore it on exit so that we end up stepping over the entire signal. Probably on GDB this is handled by having the signal be reported to the debugger first.
It might be possible to reproduce this by injecting an NMI while the vCPUs are suspended after a swbreak. I haven't tried it, though.
The situation is akin to if the breakpoint in a userland program causes a signal handler to be invoked. On Linux single stepping will enter the signal handler. On FreeBSD we disable TF while entering the signal handler and restore it on exit so that we end up stepping over the entire signal. Probably on GDB this is handled by having the signal be reported to the debugger first.
So I've thought some more and for one-off events like NMIs which should be rare I think the current behavior we are just stuck with. The same for synchronous exceptions or faults if the instruction being stepped raises an exception. However, for device interrupts like timers, it might be sensible to block those while stepping. I need to look to see if I can "cheat" by setting some state in the VMCS that lets me avoid altering IF in %rflags. If I have to mess with IF then it gets messier as I will need to manually emulate pushf to avoid leaking IF into the guest-visible state.
Do you think it's worth while to add the IF-frobbing for stepping as a separate commit btw?
It's up to you. For the purpose of review I'm fine having it in this diff since I've read what's already here several times.