Details

Reviewers

kib
br
markj
avg

Commits

rG459dc427873c: x86: Refactor kernel-mode NMI handling

Summary

This refactor aims to add the ability to share performance counter interrupts by refactoring the kernel-mode NMI handler.
The handler now allows multiple drivers to service the same interrupt (e.g. hwpmc(4) and hwt(4)'s Intel Processor Trace backend).

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

bnovkov created this revision.Aug 23 2024, 4:34 PM

Herald added a subscriber: imp. · View Herald TranscriptAug 23 2024, 4:34 PM

bnovkov requested review of this revision.Aug 23 2024, 4:34 PM

bnovkov added a parent revision: D46420: x86: Allow sharing of perfomance counter interrupts.

bnovkov added a child revision: D46397: amd64: Intel Processor Trace support.

markj added inline comments.Aug 26 2024, 1:54 PM

sys/amd64/amd64/trap.c
216	Should we return immediately if `handled = true` here?
223	The "we can return immediately" fragment doesn't really make sense anymore.

Address @markj 's comments.

bnovkov marked an inline comment as done.Aug 26 2024, 4:29 PM

bnovkov added inline comments.

sys/amd64/amd64/trap.c
216	I avoided this since it could be possible that a single PC interrupt gets consumed by multiple handlers, at least in theory. I'm not sure if this situation possible in practice though.

I do not quite like the patch. May I suggest to have a list of nmi handlers, registered dynamically?

I think registration for single-linked list can be done without sync with nmi processing, and de-registration can simply NULL the function pointer in the list element. This way you would avoid a need to sync with (AKA disable) NMI for list manipulation. But then we remove the need to patch trap.c for each NMI consumer.

sys/amd64/amd64/trap.c
214	This and the next line can be joined I believe. Is HWT option already defined for this patch to land?
225	Again, join the lines.

In D46421#1058638, @kib wrote:

I do not quite like the patch. May I suggest to have a list of nmi handlers, registered dynamically?

I think registration for single-linked list can be done without sync with nmi processing, and de-registration can simply NULL the function pointer in the list element. This way you would avoid a need to sync with (AKA disable) NMI for list manipulation. But then we remove the need to patch trap.c for each NMI consumer.

I like the approach, but after implementing it and thinking about the matter some more I'm not sure how to proceed.
I'd like to hear what you think before updating the diff.

You can find the dynamic registration patch here. It implements the approach you described above, with the addition of using a UMA_ZONE_NOFREE zone for allocating handler entries and masking the PCINT LVT before invoking the handlers.
However, I'm not sure that adding this additional complexity is entirely justified.
Only the PCINT LVT is currently configured to use an NMI, and according to the AMD and Intel manuals the only PCINT consumers (so far) are performance counters and Intel PT.
Adding dynamic kernel-mode NMI handler registration seems a bit overkill when we're dealing with a maximum of two handlers.

I intended this approach to be applied to all nmi handlers, including the nmi_handle_intr(). [This in particular means that toggling of pcint should be inside the handlers]. Also, with some work, parts of the asm code for nmi from exception.S can be moved to C.

Why do you need zone for the handlers list elements? Due to NOFREE? I do not think it is justifiable.

Why do you check for existing handler registration when registering? If anything, the action should be a panic IMO.

With these changes, it is quite small addition, and most important, it makes code much more modular, which I wanted: remove the knowledge of hwpmc and now pt from trap.c.

Address @kib 's comments:

NMI handlers are now dynamically registered
Added same changes to i386/trap.c

I've updated the patch and tested this approach by having pmcstat and the PT driver run at the same time. I have not encountered any crashes while testing.

There is, however, a peculiar issue I've encountered while working on this.
If I dynamically registers the ipi_nmi_handler, I get dropped into ddb when running the test described above.
However, as you can see in the patch, everything works fine if ipi_nmi_handler is invoked before the dynamic handlers.
I'm not sure what's causing this issue.

In D46421#1059782, @kib wrote:

I intended this approach to be applied to all nmi handlers, including the nmi_handle_intr(). [This in particular means that toggling of pcint should be inside the handlers]. Also, with some work, parts of the asm code for nmi from exception.S can be moved to C.

With these changes, it is quite small addition, and most important, it makes code much more modular, which I wanted: remove the knowledge of hwpmc and now pt from trap.c.

That makes sense, thanks for clearing things up!

Why do you check for existing handler registration when registering? If anything, the action should be a panic IMO.

I was on the fence about that one, I've changed it to a panic.

Why do you need zone for the handlers list elements? Due to NOFREE? I do not think it is justifiable.

Right, the handler list elements are never deallocated which is why I've created a NOFREE zone.

Please upload diffs with full context, otherwise the phab feature to suggest diff does not work.

I still suggest to not use zone, NOFREE for several elements is hardly a justification for the whole zone. Plain malloc would be fine there.

sys/x86/x86/cpu_machdep.c
896	This could be `*nmi_handlers_head`. The head pointer does not need to be the struct, and you do not use it as full struct.
908	Indent is wrong. It is worth to check for NULL func using plain atomic_load_ptr() before issuing CAS.
917	If CAS failed, I do not think it would succeed for any later try. You need to re-read nmi_handlers_head.next on failure. See how atomic_fcmpset_ptr() is used to reuse of the load done as part of CAS.
927	Wrong indent. Why using CAS there and not just plain store if hp->func is equal to handler?

Address @kib 's comments.

In D46421#1060536, @kib wrote:

Please upload diffs with full context, otherwise the phab feature to suggest diff does not work.

Sorry, don't know how that slipped through. The updated diff has full context.

I still suggest to not use zone, NOFREE for several elements is hardly a justification for the whole zone. Plain malloc would be fine there.

No problem, I've switched to plain malloc.

sys/x86/x86/cpu_machdep.c
917	You are right, `atomic_fcmpset_ptr` was what I needed, thanks

kib added inline comments.Sep 5 2024, 12:18 AM

sys/x86/x86/cpu_machdep.c
905	If you use atomic_load() for accessing hp->func, it should be done consistently I think, i.e. for each load and store.
917	Indent should be +4 spaces You do not need the loop body if you use atomic_fcmpset(). nmi_handlers_head should be loaded into hp->next once, and then you just do fcmpset() in a while() loop.

Address @kib 's comments.

bnovkov marked 2 inline comments as done.Sep 7 2024, 12:37 PM

Since the patch is almost finished, I looked at the overall structure and I do not see a reason to use atomics for accesses to ->func. Instead, note that CAS is used against ->next, and there we do need atomics, and perhaps store release on assignment, as well as load acquire on traversing.

sys/x86/x86/cpu_machdep.c
907	hpf = atomic_load_ptr(&hp->func); MPASS(hpf != handler); if (hpf == NULL &&

Address @kib 's comments.

avg added a subscriber: avg.Sep 19 2024, 12:43 PM

My only concern with the current deregistration approach is a hypothetical situation where a handler resides in a loadable module and, thus, can be unloaded after deregistration.
I like the clever simplicity of the current code where list elements are never removed and only the head pointer can ever be modified.
But maybe it would be good for nmi_remove_handler to wait for a completion of the handler if it's running.
E.g., we could add an atomic counter to struct nmi_handler which would be incremented when the handler is started and decremented when the handler is done.
Then, nmi_remove_handler could spin-wait on that counter if it's not zero (after having cleared func field to prevent future calls).

avg added inline comments.Sep 19 2024, 12:54 PM

sys/x86/x86/cpu_machdep.c
892–895	Style-wise I would prefer if `nmi_handler` structure declaration and definition of `nmi_handlers_head` variable were split.

kib added inline comments.Sep 19 2024, 4:10 PM

sys/x86/x86/cpu_machdep.c
907	hp->next load is not different than nmi_handlers_head load, both must have acquire semantic
931	again, loading next should have acq semantic, so that it sync/with rel in store of links, and we see func store if we see next element
938	Same acq

bnovkov added inline comments.Sep 21 2024, 1:25 PM

sys/x86/x86/cpu_machdep.c
907	Please correct me if I'm interpreting something wrong, but would that really be necessary? We only ever use atomics on `nmi_handlers_head` and not on the `->next` links, shouldn't a plain load be sufficient for traversal after we atomically load the list head?

kib added inline comments.Sep 21 2024, 2:56 PM

sys/x86/x86/cpu_machdep.c
907	When you store head you link the new element into the list. What makes a guarantee that if we found the element on the list, we read the func value written there before the linkage?

Address @kib 's and @avg 's comments:

Use atomic loads for links
nmi_remove_handler will now wait for a handler to finish before returning

In D46421#1064708, @avg wrote:

My only concern with the current deregistration approach is a hypothetical situation where a handler resides in a loadable module and, thus, can be unloaded after deregistration.

Thank you, I hadn't thought about this type of situation, the code should properly handle this now.

sys/x86/x86/cpu_machdep.c
907	Ah, you are right, I've switched to atomic_load for the links.

Another general suggestion is that using CK_SLIST from ConcurrencyKit (included into FreeBSD) might help to make the code more compact and expressive.
But I haven't thought through details, so please feel free to ignore this suggestion if CK does not provide something that could be readily used.
The proposed code is completely good too.

sys/x86/x86/cpu_machdep.c
942	IMO, this can / should be `cpu_spinwait`.

This revision is now accepted and ready to land.Sep 25 2024, 8:14 PM

In D46421#1066706, @avg wrote:

Another general suggestion is that using CK_SLIST from ConcurrencyKit (included into FreeBSD) might help to make the code more compact and expressive.
But I haven't thought through details, so please feel free to ignore this suggestion if CK does not provide something that could be readily used.
The proposed code is completely good too.

Yeah, please disregard this.
CK lists / queues are many readers / single writer.
So, unless you'd want to add a lock serializing registrations and deregitrations, those CK containers are not suitable.

At the same time, it might not be such a bad idea :-)

Since registration already uses malloc, it is not wait-free and it does not need to be wait-free in general.
Only nmi_handle_intr has to be wait-free.
So, adding the lock and reducing atomic requirements to handle interactions between a single writer (a single thread (de-)registering a handler) and a single reader (or maybe multiple if we can have concurrent NMI-s) may simplify atomic operations a little bit.
And CK could be used too.

Oh, thinking about this triggered some old memories and I've dug up my old musings on this topic.
The code was never really tested and I cannot guarantee that it's correct, but it seemed correct to the younger me and it still seems correct the now me.
https://people.freebsd.org/~avg/wfhandler.v8.h

Please feel free to ignore it or to pick any ideas, pieces of code, etc.

kib added inline comments.Sep 25 2024, 11:59 PM

sys/x86/x86/cpu_machdep.c
910	IMO load of ->func does not need to have atomic semantic, but it is fine to have it.
914	The indent should be same as on the previous line
926	The indent is wrong, it should be same as on the previous line
941	I do not see a need for this load to have acquire semantic. Can you explain why?
971	Similarly, why this add needs to be rel?

Address @kib 's and @avg 's comments.

This revision now requires review to proceed.Oct 2 2024, 4:52 PM

bnovkov added inline comments.Oct 2 2024, 4:55 PM

sys/x86/x86/cpu_machdep.c
941	right, this was unnecessary since reordering isn't an issue here, I've removed the acq/rel semantics from the running counter. thanks for catching this!

kib accepted this revision.Oct 2 2024, 9:27 PM

kib added inline comments.

sys/x86/x86/cpu_machdep.c
910	No need for this load to be atomic, you do not use atomic store for NULLing func.
969	Same.

This revision is now accepted and ready to land.Oct 2 2024, 9:27 PM

avg accepted this revision.Oct 5 2024, 6:44 PM

bnovkov mentioned this in D47989: hwpmc_x86: Register interrupt handler using the dynamic NMI registration interface.Dec 8 2024, 5:14 PM

Closed by commit rG459dc427873c: x86: Refactor kernel-mode NMI handling (authored by bnovkov). · Explain WhySun, Dec 15, 3:40 PM

This revision was automatically updated to reflect the committed changes.

bnovkov added a commit: rG459dc427873c: x86: Refactor kernel-mode NMI handling.

bnovkov mentioned this in rGd5ce54dddf49: hwpmc_x86: Register interrupt handler using the dynamic NMI registration….

x86: Refactor kernel-mode NMI handling
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 144133

sys/amd64/amd64/trap.c

sys/i386/i386/trap.c

sys/x86/include/x86_var.h

sys/x86/x86/cpu_machdep.c

x86: Refactor kernel-mode NMI handling ClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 144133

sys/amd64/amd64/trap.c

sys/i386/i386/trap.c

sys/x86/include/x86_var.h

sys/x86/x86/cpu_machdep.c

x86: Refactor kernel-mode NMI handling
ClosedPublic
Actions

Revision Contents
Changeset List