cam/iosched: Log outlier latency events
ClosedPublic

Authored by imp on Jul 18 2024, 10:02 PM.
Tags
None
Subscribers
None

Details

Summary

Log outlier latency events to devd. In addition to counting, this will
allow analysis of whether the problem is confined to a specific block
range, or if it's a more general problem.
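A minimal userland sketch of the idea, not the code from this revision: compare a completed request's latency with a threshold and, for outliers, format a key=value payload that carries the block range so a listener can later check whether slow requests cluster. The struct, the function names, the key names, and the `>=` comparison are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record of one completed I/O; not the structure used in the commit. */
struct io_sample {
	uint64_t lba;        /* starting block of the request */
	uint32_t blocks;     /* request length in blocks */
	uint32_t latency_ms; /* measured completion latency */
};

/* True when the latency is at or above the outlier threshold. */
static bool
is_outlier(const struct io_sample *s, uint32_t threshold_ms)
{
	return (s->latency_ms >= threshold_ms);
}

/*
 * Format a key=value payload describing the outlier.
 * The key names are illustrative assumptions.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int
format_outlier(const struct io_sample *s, char *buf, size_t len)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "lba=%ju blocks=%u latency_ms=%u",
	    (uintmax_t)s->lba, (unsigned)s->blocks, (unsigned)s->latency_ms);
	return ((n < 0 || (size_t)n >= len) ? -1 : n);
}
```

Including the starting block and length in each event is what would let a consumer tell a localized bad region from a device that is slow everywhere.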

Sponsored by: Netflix

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 58719
Build 55607: arc lint + arc unit

Event Timeline

imp requested review of this revision. Jul 18 2024, 10:02 PM

Misc updates with testing

Do you want a global knob to control if devd reports are enabled rather than just the counts?

This revision is now accepted and ready to land. Jul 19 2024, 3:32 PM
In D46036#1049518, @jhb wrote:

Do you want a global knob to control if devd reports are enabled rather than just the counts?

devd will handle them fairly efficiently, so my first inclination is to always report. Other events from the kernel don't have this kind of throttling, though if you're getting a lot of these reported, they are at least forwarded to other listeners, which might be a problem for some people (I learned yesterday that Firefox is one such listener). The default is 500ms in the device, which even in congested spinning-disk land is kinda horrible, and is semi-naturally rate limiting: in steady state that's 2/s. Due to head-of-queue blocking effects, there can be big bursts once in a while (more likely on devices with big queue depths like NVMe, though). We have a ~1k event buffer in the kernel, so they'll be generally rare and generally singletons, but on sick hardware can deviate from that. For that case, I'm not sure if you'd want to just increase this limit (because you know you have sick hardware and better on the way) or if you'd want a way to not get them at all.

So I'm on the fence about this...
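The rate argument in the reply above can be sketched as simple arithmetic. This is one reading of the "2/s" figure: a single request in flight, each taking at least the 500ms threshold. The function names and the model itself are illustrative assumptions, not anything taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper bound on outlier events per second if every one of in_flight
 * concurrent requests takes at least threshold_ms to complete.
 */
static uint32_t
max_outlier_rate(uint32_t threshold_ms, uint32_t in_flight)
{
	assert(threshold_ms > 0);
	return ((uint32_t)(((uint64_t)in_flight * 1000) / threshold_ms));
}

/*
 * Worst-case burst when a stalled head-of-queue request releases
 * everything queued behind it at once, capped by the event buffer size.
 */
static uint32_t
max_burst(uint32_t queue_depth, uint32_t event_buf)
{
	return (queue_depth < event_buf ? queue_depth : event_buf);
}
```

Under this model, `max_outlier_rate(500, 1)` gives 2 events per second, while a deep queue (as on NVMe) raises the possible burst size until the roughly 1k event buffer becomes the limit.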

This revision was automatically updated to reflect the committed changes.