Page MenuHomeFreeBSD

kern/intr: remove "irq" from kernel event API
Needs ReviewPublic

Authored by ehem_freebsd_m5p.com on Oct 15 2021, 6:35 AM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Nov 1, 2:24 PM
Unknown Object (File)
Fri, Nov 1, 2:24 PM
Unknown Object (File)
Fri, Nov 1, 2:24 PM
Unknown Object (File)
Fri, Nov 1, 2:23 PM
Unknown Object (File)
Fri, Nov 1, 2:23 PM
Unknown Object (File)
Fri, Nov 1, 2:22 PM
Unknown Object (File)
Fri, Nov 1, 2:22 PM
Unknown Object (File)
Fri, Nov 1, 2:12 PM

Details

Summary

Partially reverting 9b33b154b53. D39178 now has a proposed common interface for "irq" to interrupt structure resolution, Remaining portion is to switch intr_{get|set}affinity() and _intr_drain() to the interface.

There is a question of whether the numeric value to struct intr_event * should be internal, versus external. Receiving a struct intr_event * seems more consistent with the rest of kern_intr.c, though all present users have the number handy.

Squash review: (the original has been broken into pieces)
switch intr_{get|set}affinity() to internally use intr2event()/intrtab_lookup()
switch intr_{get|set}affinity() to expect intr_event * as argument
remove "irq" argument from intr_event_create()
remove ->ie_irq from struct intr_event

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 50490
Build 47381: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Pulling in the entire intended chunk.

Consider this as a WIP suggestion. I fully expect changes.

The observation seems fairly straightforward. Having the irq in intr_event is just wrong. The kernel interrupt/event code doesn't have a table of interrupts and thus cannot lookup interrupts efficiently.

I came to this conclusion when thinking about how the software corresponds to hardware. struct intr_event roughly represents the interrupt source and the device which needs attention from the main processor. struct intr_irqsrc roughly represents a pin on an interrupt controller. This changes with MSIs, but most sources have only a single pin and have no concept of multiple interrupts; it is the controller which defines the interrupt numbering.

As such I believe it is the controller side (struct intr_irqsrc) which should care about irq numbering and the event core shouldn't touch them at all.

This is particularly important with hot-plug controllers, where a given device might get randomly swapped between interrupts. A real example of this is the Xen event channel where on suspend the entire interrupt mapping is lost and has to be recreated.

My apologies to @ae and linuxkpi for the difficult implementation, but upon examining the situation it just looked really wrong.

I should also note having the interrupt number in struct intr_event is low value since it simply duplicates other locations. Since kern_intr.c doesn't have a table, it cannot do lookups quickly unlike all other locations.

mhorne added subscribers: jhb, mhorne.

I think I agree with you that the concept of an IRQ does not really belong to this layer, so tracking it in struct intr_event is 'wrong' in an academic sense. That said, there is clearly some practical benefit to doing so, as this patch complicates several callers which are currently quite simple. Part of the problem is the fact that we use different interrupt frameworks on different architectures, but this is the current state of things.

Let me ask this: is it important that intr_lookup() be fast? If so, why?

It is also worth noting that - despite the comment in sys/interrupt.h - ie_irq does not always refer to a physical IRQ line. For INTRNG platforms, logical/virtual IRQs are handed out by the framework and they don't necessarily correspond to any real IRQ number as visible to the PIC(s) in the system.

Maybe @jhb can give some informed input.

sys/kern/kern_cpuset.c
2194

It is not acceptable to remove this functionality. This (and the hunk above) is what enables userspace to get/set interrupt affinity using cpuset(2).

Updating to hopefully improve things...

I think I agree with you that the concept of an IRQ does not really belong to this layer, so tracking it in struct intr_event is 'wrong' in an academic sense. That said, there is clearly some practical benefit to doing so, as this patch complicates several callers which are currently quite simple. Part of the problem is the fact that we use different interrupt frameworks on different architectures, but this is the current state of things.

Question is how best to address this situation. Perhaps the updated approach might be acceptable?

Let me ask this: is it important that intr_lookup() be fast? If so, why?

I guess I don't actually know. Both x86 and INTRNG use implementation which allow the operation to be fast. Whereas PowerPC is using an approach which will be slower.

Hopefully the update is a notable improvement. Perhaps adding compatibility will push the merging process along. Problem is it looks like not all MIPS devices use INTRNG and I'm unsure how to properly address those. The pointer to pointer return was chosen since that should be of value to hot-plug interrupt controllers (physical ones might be rare, but software/VM ones are common).

sys/kern/kern_cpuset.c
2194

I was worried about this. Upon examination though I realized I was mixing things up and hopefully the update addresses this in an acceptable way.

I'm removing mips in a few weeks. If there is a hassle due to mips, it won't be a hassle for long. Unless I've misread this, I don't see this change being MFCd.

Okay, that will simplify things. I don't see this being MFC'd either. I dislike breaking MIPS before then though.

hselasky added inline comments.
sys/dev/mlx5/mlx5_en/mlx5_en_main.c
42

These headers should be included by the LinuxKPI.

sys/sys/intr_compat.h
37 ↗(On Diff #97030)

Should these two functions be put into a C-file, because not all kmods compiled outside the kernel tree carry the kernel options, thinking about INTRNG.

sys/dev/mlx5/mlx5_en/mlx5_en_main.c
42

Nevermind.

ehem_freebsd_m5p.com marked 2 inline comments as done.

Update to match comment.

My one concern is what effect will this have on attempting to unify interrupt systems on FreeBSD? Will this bring the goal closer (by starting on a hopefully acceptable API)? Will this push the goal away (by working around the immediate issue and relieving pressure)?

I hope it is the former, but it could be the latter. This should be adequate for the wacky idea which came to mind.

sys/sys/intr_compat.h
37 ↗(On Diff #97030)

Good point.

The one which was changing has now been split into appropriate locations. The other is simply a macro which doesn't change. Hopefully this adjustment is acceptable.

ehem_freebsd_m5p.com marked an inline comment as done.

Fix style quirk.

The term IRQ is already in various other places (e.g. SYS_RES_IRQ) so it's not clear to me how this is a gain? It might be helpful to understand the larger context of why you want to do this? FWIW, I had wanted the cpuset for interrupts to take a string, not an integer ID and to do a substring match of the interrupt name.

sys/sys/intr_compat.h
36 ↗(On Diff #97090)

Why does this return a struct intr_event ** instead of struct intr_event *? It seems like all the MD routines could just be a direct implementation of intr_get_event instead?

sys/sys/intr_compat.h
36 ↗(On Diff #97090)

All implementations of the interrupt code already allow this level of access. So far I don't believe anyone has made use of it, but I see a potential use. Notably a dynamic interrupt controller might be implemented by swapping intr_event structures around. Right now this is merely a thought, but this actually seems like an elegant approach.

In D32504#734992, @jhb wrote:

The term IRQ is already in various other places (e.g. SYS_RES_IRQ) so it's not clear to me how this is a gain? It might be helpful to understand the larger context of why you want to do this? FWIW, I had wanted the cpuset for interrupts to take a string, not an integer ID and to do a substring match of the interrupt name.

I had hoped the summary covered that. Having ie_irq in struct intr_event is wrong. For all non-MIPS devices, the interrupt number is elsewhere and ie_irq is simply a redundant copy. For MIPS devices using INTRNG, INTRNG has the value and ie_irq is a redundant copy. Some MIPS devices have interrupt code which is sufficiently primitive it doesn't duplicate anything, but if MIPS is going away ie_irq is simply duplicating data held elsewhere.

A driver for a true dynamic/hotplug interrupt controller could implement interrupt changes by swapping struct intr_event pointers around. If such a controller ended up with the interrupt range 3971-8066 and during a suspend, controller interrupts #3 and #4 got swapped, this could be handled by simply swapping the intr_events for interrupts #3974 and #3975. Having ie_irq on the event makes this harder.

I'm not sure all of the MIPS code has been removed, but a substantial chunk has. Time to ponder this?

I'm not sure all of the MIPS code has been removed, but a substantial chunk has. Time to ponder this?

The end result still seems messier to me, mainly due to the introduction of the new header. Without a clearer motivation for this change, I am more inclined to leave things as they are.

sys/sys/intr_compat.h
36 ↗(On Diff #97090)

Until this becomes more than an idea, please just go with jhb's suggestion. It is quite strange as-is.

Alternative approach. Create a standard typedef for the potential temporary, plus a convert function. I've observed several places which could benefit from being able to look up by interrupt number. (indeed, I see a way to remove the large array from the Xen event channel code if the lookup function is available)

Note, this is a squash of 6 closely related commits in my tree. I was thinking it might be better to review as a group, but if these are better reviewed individually, they can be split apart.

The number of reviewers and subscribers to D32504 keeps increasing, but the review activity is stale. Since the previous approach was refused, there is now a new approach.

Since adding a new header was refused, D35559 is in which renames to a common name. Then the common bits get added to each.

While removing the irq from struct intr_event is the goal, cleaning up portions of the interrupt frameworks has ended up as collateral. I like having more common functionality on the interrupt frameworks, so having lookup seems natural. Having the typedef is valuable for PICs which exist on multiple architectures (there are at least 3).

I could see the common function being intr_lookup_source() instead of intr_lookup(). This would reduce the delta, though end up with a worse name.

sys/arm64/xen/xen_arch_intr.c
92 ↗(On Diff #110666)

Note this is a WIP presently in my tree. If D32504 goes in sooner this might not ever be visible. There is also a decent chance this approach will fail review and never be visible anyway.

New reviews for the updated D32504? A month seems enough time for someone to comment.

sys/sys/intr_compat.h
36 ↗(On Diff #97090)

Marking done since this approach has been abandoned.

I'll try to look at the mlx5 / LinuxKPI bits tomorrow.

Having thought about this, I kind of wonder if intrtab_lookup() might be better than intr_lookup(). Handling the table of interrupts could nominally be separate from the interrupt handling. For instance if one tried to port PowerPC onto INTRNG, the first step would be to add a replacement for the present INTRNG table. I kind of like this change, but someone needs to type something first.

At this point my tree has this replaced with intrtab_lookup() since the potential for breaking the interrupt tables off the architecture interrupt cores seems valuable. Idea being to slowly pull pieces off until rather more is common.

If anyone is curious, ignoring community contributions is certainly helpful for getting a reputation of only accepting corporate paid modifications.

Updating to current tree status. Nothing particularly different, but now using intrtab_lookup().

I've been waiting for a review of the present status for months yet the silence is stunning. This now also depends on D35559 since there were objections to adding a header, so things needed to go somewhere.

I'm really impressed at just how slow reviews for things meant to be simple are.

Fundamentally the issue is rG:9b33b154b53 was a mistake. That merely copied the data and didn't have even the vaguest idea of how to remove the data from the original sources. One issue is the approach used simply cannot meet x86's need for a fast lookup function.

Instead the value should have moved in the opposite direction. There seems some potential for a common lookup table implementation, but that is tricky to handle all architectures (PowerPC and sparse numbering makes it interesting).

I can test this today, and report back. I think your change is OK. There may be different ways to enumerate interrupts, and passing a pointer may make more sense, for example software interrupts not connected to any specific hardware - right?

Hi,

Can you rebase this patch?

And there are some compile issues (AMD64):

/usr/img/freebsd/sys/kern/kern_cpuset.c:78:10: fatal error: 'machine/intr.h' file not found
#include <machine/intr.h>
         ^~~~~~~~~~~~~~~~

Adding Drew, due to changing set affinity for IRQ vectors. Any comments?

Adding Drew, due to changing set affinity for IRQ vectors. Any comments?

I'm not sure what the point of this is. To simplify use cases like hotplug by giving intr API consumers a way to call intr_foo() functions without keeping track of the vector?

The proposed API seems unnecessarily complex for existing callers that know the vector, since they now need to do "intr2event(intrtab_lookup(vector));". I understand that you want to be able to pass an *ie rather than a vector number in some cases, but perhaps there should be new intr_foo_ie() for all intr_foo() API entry points where the intr_foo() API remains the same and a vector is passed. Intr_foo() is just a wrapper for intr_foo_ie() and does the conversion of vector to ie, and calls the new intr_foo_ie(). Callers that want to call directly using a *ie because they don't have a vector can use the new intr_foo_ie().

But again, I may be missing the point.

Can you rebase this patch?

And there are some compile issues (AMD64):

This is based on D35559 and D39178. Previous version the addition of a header was rejected, so D35559 picks one of the two names and declares that the standard name. Notice under Revision Contents, Stack (2 open)?

I'm not sure what the point of this is. To simplify use cases like hotplug by giving intr API consumers a way to call intr_foo() functions without keeping track of the vector?

The real goal is the architecture interrupt cores already have the functionality of resolving an interrupt number to an architecture structure. The architecture structure then resolves to the struct intr_event which is what is actually needed by these calls. That lookup functionality is inherently present and cannot be removed. As a result having the exact same data on struct intr_event is 100% redundant for no gain (and most architecture lookup functions are much faster).

The proposed API seems unnecessarily complex for existing callers that know the vector, since they now need to do "intr2event(intrtab_lookup(vector));". I understand that you want to be able to pass an *ie rather than a vector number in some cases, but perhaps there should be new intr_foo_ie() for all intr_foo() API entry points where the intr_foo() API remains the same and a vector is passed. Intr_foo() is just a wrapper for intr_foo_ie() and does the conversion of vector to ie, and calls the new intr_foo_ie(). Callers that want to call directly using a *ie because they don't have a vector can use the new intr_foo_ie().

sys/kern/kern_intr.c should never touch the interrupt numbers. That is completely outside its remit. I suppose a macro for intr2event(intrtab_lookup(intr)) could be added (that would be D39178), but that seems rather underwhelming savings for a macro.

When it comes down to it, 9b33b154b53 was pretty much a simple hack to allow architectures without a proper interrupt table implementation to do simple interrupt number lookup. Thing is, MIPS was the last architecture lacking such a table, therefore this is now fully redundant with architecture functionality. As such this is effectively cleaning out yet another MIPS remainder.

I've got a vague plan for making the interrupt tables architecture-independent, but that isn't yet ready for consideration.

I'm not sure what the point of this is. To simplify use cases like hotplug by giving intr API consumers a way to call intr_foo() functions without keeping track of the vector?

As already stated, the goal is purely trying to get rid of redundancy. Looking again though, there are actually two pieces of redundancy and perhaps they can be addressed in distinct ways.

The first piece is storing the interrupt number on struct intr_event. I had been opting towards the approach of throwing that back to the architecture. When it comes down to it, this is mostly due to INTRNG doing lazy allocation of events. If INTRNG did event allocation during interrupt creation, then every interrupt could be relied upon to have an event and instead the architecture variables could be deleted. If someone approves D40166 this becomes possible.

The second piece is the interrupt lookup. Issue here is the kernel event implementation cannot substitute for the architecture implementations. In particular x86's lookup function must be fast (and O(1) is fast) and the linked list here simply doesn't work. The other issue is the architecture implementations must retrieve the architecture interrupt structure which the event implementation cannot do.

Something could be done, but it is a multi-step process. First creating a common table implementation is doable (PowerPC would require a hash table), the common typedef created in D39178 is crucial to this. Second, intr_event would need to add a pointer to the architecture interrupt type. Then the table could point to struct intr_event and substitute for all the architecture interrupt tables.

Unfortunately this is a multi-step process and if no one wants to touch FreeBSD interrupts, then it will simply slowly rot.

D40166 heads in the opposite direction with regards to the interrupt number. Instead of removing it from intr_event and pushing it back to the architectures, D40166 allows removing it from the architecture and only having it on intr_event.

D32504 will still be left with the issue of the architectures having interrupt number resolution functionality which is superior to what has been implemented in sys/kern/kern_intr.c. Notably all of those are faster and provide access to the architecture structure (which then universally includes a pointer to the event).

When it comes down to it, since I'm more concerned with getting rid of sys/kern/kern_intr.c:intr_lookup() it is acceptable to continue having intr_{get|set}affinity() using numeric interrupt numbers. This will require sys/kern/kern_intr.c to #include the architecture interrupt headers. I was reluctant to propose that due to how distinct they are, but this doesn't actually contradict my goals and is nominally acceptable to me.

This still leaves D35559, and D39178 unresolved.

The other issue is what to do about -ie_irq? This review includes removing it from the kernel events, but one could go the opposite direction and remove the corresponding value from the architectures. For ARM/INTRNG this is ->isrc_irq, for x86 this is ->is_pic->pic_vector() and for PowerPC this is either ->irq or ->vector.