Page MenuHomeFreeBSD

acpi_ged: Handle events directly
ClosedPublic

Authored by gallatin on Oct 11 2023, 3:10 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Jan 16, 3:47 AM
Unknown Object (File)
Dec 14 2024, 5:57 PM
Unknown Object (File)
Dec 3 2024, 7:16 PM
Unknown Object (File)
Oct 30 2024, 4:23 PM
Unknown Object (File)
Oct 28 2024, 11:04 PM
Unknown Object (File)
Oct 2 2024, 8:17 PM
Unknown Object (File)
Oct 2 2024, 8:13 PM
Unknown Object (File)
Oct 2 2024, 8:13 PM
Subscribers

Details

Summary

We have an arm64 server which encounters an acpi_ged event (APEI Corrected Error) on boot, as soon as the driver loads and the interrupt is unmasked by virtue of it being registered.

With the current code, on a platform like arm64 where EARLY_AP_STARTUP is not in use, this means we encounter the interrupt when running on just the base CPU. The interrupt fires, AcpiOsExecute queues the event, and then the isr exits and the interrupt is unmasked. This leads to another interrupt immediately, which causes the same event to be read and queued again, and we eventually livelock with the console spewing "AcpiOsExecute: failed to enqueue task, consider increasing the debug.acpi.max_tasks tunable".

I first attempted to fix this by deferring acpi_ged interrupt setup until after the APs were running (https://reviews.freebsd.org/D42141). This "works", but has several problems. First, several duplicate events are queued on the CPU handling the interrupt before another core can parse the event and eventually ack the interrupt. Second, this does not help in the case of a single CPU.

On that review, @jhb pointed out that according to the spec (5.6.4: General Purpose Event Handling), the OS must leave the interrupt source disabled until EOI. This is not a good fit for our method of queuing the work (including the EOI ack of the interrupt), since when the ithread is done, the interrupt will be unmasked.

The only solution I can see is to simply handle the event directly, which is what this patch does. With it, I've seen no complaints about sleeping in the wrong context, etc, and I see just a single interrupt on acpi_ged (rather than the 3-5 I'd see with the previous attempt in https://reviews.freebsd.org/D42141)

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

This revision is now accepted and ready to land.Oct 11 2023, 4:04 PM

Would it be useful to add a tunable to revert to the current behaviour if we do find a machine that can't run the ACPI method from an ithread?

I don't suppose that there's a way to know if the GPE handler sleeps so we can warn / avoid it?

Update to add a tunable to run ged events in a deferred context, as suggested by @andrew

This revision now requires review to proceed.Oct 11 2023, 5:15 PM

Would it be useful to add a tunable to revert to the current behaviour if we do find a machine that can't run the ACPI method from an ithread?

Yes, you're probably right. I'd hate to be the reason somebody's system panics or deadlocks at boot, so I've updated the patch to add a tunable.

In D42158#961862, @imp wrote:

I don't suppose that there's a way to know if the GPE handler sleeps so we can warn / avoid it?

Not as far as I know. But I'm not super familiar with ACPI.

This revision is now accepted and ready to land.Oct 11 2023, 6:36 PM
This revision was automatically updated to reflect the committed changes.
In D42158#961862, @imp wrote:

I don't suppose that there's a way to know if the GPE handler sleeps so we can warn / avoid it?

There is not. All you get is an event number and associated IRQ, and an _EVT method you're supposed to call with the sole argument being the event number.