Page MenuHomeFreeBSD

kern/intr: add flags for multi-processor interrupts
Needs ReviewPublic

Authored by ehem_freebsd_m5p.com on Feb 8 2023, 9:03 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Oct 25, 12:10 PM
Unknown Object (File)
Oct 5 2024, 12:48 AM
Unknown Object (File)
Oct 4 2024, 11:00 AM
Unknown Object (File)
Oct 4 2024, 4:13 AM
Unknown Object (File)
Oct 1 2024, 9:11 PM
Unknown Object (File)
Sep 27 2024, 2:04 PM
Unknown Object (File)
Sep 27 2024, 11:26 AM
Unknown Object (File)
Sep 27 2024, 12:53 AM
Subscribers

Details

Reviewers
mjg
markj
Group Reviewers
Core Team
Summary

Presently only layers above intr_event consider multi-processor interrupts. Yet these do effect the interrupt core. Notably, if interrupt counters were moved to intr_event then it would need to use atomics, per-processor counters, or something else.

Additionally there is a concern if a device expects a PPI ("Private Peripheral Interrupt" in ARM terms, but per-processor interrupt), yet receives a SPI ("Shared Peripheral Interrupt" in ARM terms, but opposite of PPI) there will likely be trouble.

I believe x86 isn't sending PPI interrupts to intr_event. Instead the equivalents are using intrcnt_add() and never letting the interrupts reach the core.

This is a squash review of 3 commits. One commit for each flag, plus a third for type checking and marking clk_intr_event as being multi-processor.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 53053
Build 49944: arc lint + arc unit

Event Timeline

mjg added inline comments.
sys/kern/kern_intr.c
1396

atomic ops like this are too expensive to introduce

the thing to do is to have per-cpu counters

sys/kern/kern_intr.c
1396

The expectation is 99.44% of the time, this will only be accessed by a single processor (the one handling the particular interrupt). Due to being well-separated from other structures, this is unlikely to share a cache line with anything.

The two instances a second processor is likely to touch this are:

  1. a PPI interrupt
  2. userspace is retrieving the interrupt counters

Having per-processor interrupt counters for anything other than a PPI interrupt seems strange.

sys/kern/kern_intr.c
1396

atomics are dog slow on amd64, whether they bounce cachelines or not.

if there is only one cpu ever modifying then there is no reason for it to use atomics as is

ehem_freebsd_m5p.com added a subscriber: jhibbits.

Was I reading the discussion on IRC correctly that padding struct intr_event to cache-line size was being suggested?

sys/kern/kern_intr.c
1396

ARM64 does have PPI interrupts which require atomics. I suppose the use of atomics could be removed on x86 (x86 memory consistency being much easier). Anyone else care to comment since this is sounding odd.

The alternative I have is to introduce a new flag for ->ie_flags, tentatively named IE_MULTIPROC which indicates the event is triggered on multiple processors (PPI interrupt, though nominally IPI interrupts would qualify). Then turn ->ie_intrcnt into an array and allocate per-processor counters.

One quirk of this, what should be done about the stray counter?

In particular PPI (or IPI) interrupt handlers never signal stray interrupts, since masking these interrupts would be Bad (PPI interrupts are generally some flavor of secondary interrupt controller). In such case should the stray counter be omitted from PPI interrupt events?

sys/kern/kern_intr.c
1396

if there is only one cpu ever modifying then there is no reason for it to use atomics as is

For ARM(64) PPI interrupts, there can be multiple processors trying to update the value at the same time.

Having looked at the situation and thought about things, I do have an alternative. AMD64 also has PPI interrupts, just no one uses the term. Notably all the hypervisors at using intrcnt_add() to create arrays of per-processor counters. Having multiple counters for PPI interrupts would deviate from INTRNG's current implementation, but I hope that is acceptable.

Candidate implementation of what @mjg requested. Adding a flag to mark interrupts which can occur on multiple processors and therefore need an array of counters.

Note this has been successfully compiled, but not tested anywhere yet. Adding the flag is a separate commit, but I hope this can be reviewed as part of D38448. It can be reviewed separately if needed though.

ehem_freebsd_m5p.com marked an inline comment as done.

Since D38448 presently has no reviewers, I'm nomination people who might be appropriate to review this.

This does not solve the problem.

ie = malloc(sizeof(struct intr_event) + (flags & IE_MULTIPROC ?
    sizeof(u_long) * (mp_ncpus - 1) : 0), M_ITHREAD, M_WAITOK | M_ZERO);

Say sizeof(u_long) == 8 and there are 8 cpus. Then this adds 64 bytes shared by all of them, where each only writes to an 8 byte portion -- meaning they all keep bouncing the same area, largely defeating the point.

This needs a per-cpu alloc.

More importantly, currently the kernel lacks infrastructure to sensibly allocate per-cpu counters and the current counter(9) does NOT solve the problem, albeit it can work around it for some cases.

All this is with the assumption that it makes sense to share one intr_event struct between different cpus.

all that said, I would have to dig into what the code is doing and what makes sense to do, which I can't at the moment due to already big backlog.

In D38448#886009, @mjg wrote:

Say sizeof(u_long) == 8 and there are 8 cpus. Then this adds 64 bytes shared by all of them, where each only writes to an 8 byte portion -- meaning they all keep bouncing the same area, largely defeating the point.

You were objecting to using atomics, this is a solution which results. This is mostly equivalent to the current multi-processor interrupt counter implementations for PowerPC and x86. Both of have an intrcnt_add() function which ends up allocating a contiguous range of counters (unless callers manage to interleave, but this isn't too likely). This might not have the best performance characteristics, but it should match the existing characteristics.

In D38448#886009, @mjg wrote:

All this is with the assumption that it makes sense to share one intr_event struct between different cpus.

all that said, I would have to dig into what the code is doing and what makes sense to do, which I can't at the moment due to already big backlog.

Having looked a bit more at this, that is a rather interesting question. Issue is INTRNG pushes both normal device interrupts and PPI interrupts through intr_event_handle() since both get pushed through similar paths in the bus infrastructure. I now wonder if that is a design mistake. Both x86 and PowerPC allocate separate interrupt vectors for each individual processor (and then use intrcnt_add() for the relevant counters).

If instead PPI interrupts were handled as being more similar to IPI interrupts, then INTRNG wouldn't need to push multi-processor interrupts through this path. This seems a reasonable approach, but I've no idea whether that would be accepted.

Appears INTRNG isn't the only thing tending to push multi-processor interrupts onto single intr_event structures. At D25754 a multi-processor soft interrupt was created. As such it would need to share this.

Updating, spreading the proposed flag further. Still need an answer for whether multi-processor interrupts should share an intr_event structure though.

Updating to re-add a commit which got dropped.

Note: This is now a *sqash* review. The commits to create the flags on intr_event and the bus code are separate from the adding the counters commit.

As such the flag could be added without addressing the counters issue.

I'm starting to think the flag addition to he lower layers should be broken into a separate review. Issue is that has some effects all over the place.

sys/kern/subr_intr.c
757–762

This is something I had been concerned with. Drivers which *require* PPI interrupts will almost certainly *malfunction* if they get a non-PPI interrupt. As such there needs to be some way for a driver to force a PPI interrupt. I'm unsure of the severity of the inverse situation.

sys/xen/intrng/xen_arch_intr.c
182 ↗(On Diff #125747)

Presently this hasn't yet been brought into FreeBSD, so this can be ignored for now.

Updating D38448 to split off the counter addition. At this point I'm under the suspicion the core team needs to consider this, and say "yay" or "nay". Whether interrupts which occur on multiple processors should share an intr_event structure is a real question. So far several places appear to be saying "yes", but I would like a core team review.

ehem_freebsd_m5p.com retitled this revision from kern/intr: add interrupt counters to intr_isrc to kern/intr: add flags for multi-processor interrupts.Aug 11 2023, 8:15 PM
ehem_freebsd_m5p.com edited the summary of this revision. (Show Details)
ehem_freebsd_m5p.com added a reviewer: Core Team.
sys/kern/kern_intr.c
1645–1646

This should be where clk_intr_event gets created. If apei_attach() gets called first, I would worry.

1648–1649

I've confirmed clk_intr_event gets triggered/handled on multiple processors on an ARM device. I'm concerned SWIs being uni-processor may be more notable than multi-processor SWIs.

I hadn't realized there was an actual sys/arm/arm/generic_timer.c which appears to be used for ARM64. As such I've identified 3 places on ARM where a single intr_event structure is used for interrupts which occur on multiple processors. I now fear this could be all over ARM and this may already have been endorsed by fiat.

I suspect the ARM generic_timer works with either multi-processor or single-processor interrupts. Mostly an issue of whether it is on an older device with a single processor or no PPIs, versus a newer device with multiple processors and PPIs. At the same time I am looking at an in-progress driver which won't function if it gets a single-processor interrupt. This should be very unlikely, but the result could be very strange if it somehow got the wrong type.

sys/kern/kern_intr.c
611–622

I've never seen the first happen in real life. The second likely indicates the flag was omitted. Problem is if the first did occur in, I would expect there to be problems. Hence this test.

1648–1649

(perhaps _IE_SOFT should take the value of IE_SOFT, and the new IE_SOFT should equal IE_SOFT | IE_MULTIPROC)

sys/sys/bus.h
285

I wonder if there should also be INTR_UNIPROC to indicate PPI/IE_MULTIPROC interrupts are not acceptable.

Correspondingly there might be a INTR_ANYPROC = INTR_MULTIPROC | INTR_UNIPROC to indicate either is acceptable. In the future setting neither could become a warning.