Page MenuHomeFreeBSD

hwpmc: Fix amd/arm64/armv7/uncore sampling overflow race
ClosedPublic

Authored by jrtc27 on Dec 25 2021, 1:15 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Nov 12, 1:51 AM
Unknown Object (File)
Fri, Nov 1, 2:59 PM
Unknown Object (File)
Oct 3 2024, 7:01 AM
Unknown Object (File)
Sep 23 2024, 8:37 AM
Unknown Object (File)
Sep 21 2024, 1:01 PM
Unknown Object (File)
Sep 21 2024, 1:01 PM
Unknown Object (File)
Sep 21 2024, 1:00 PM
Unknown Object (File)
Sep 21 2024, 1:00 PM
Subscribers

Details

Summary

If a counter more than overflows just as we read it on switch out then,
if using sampling mode, we will negate this small value to give a huge
reload count, and if we later switch back in that context we will
validate that value against pm_reloadcount and panic an INVARIANTS
kernel with:

panic: [pmc,1470] pmcval outside of expected range cpu=2 ri=16 pmcval=fffff292 pm_reloadcount=10000

or similar. Presumably in a non-INVARIANTS kernel we will instead just
use the provided value as the reload count, which would lead to the
overflow not happing for a very long time (e.g. 78 minutes for a 48-bit
counter incrementing at an averate rate of 1GHz).

Instead, clamp the reload count to 0 (which corresponds precisely to the
value we would compute if it had just overflowed and no more), which
will result in hwpmc using the full original reload count again. This is
the approach used by core for Intel (for both fixed and programmable
counters).

As part of this, armv7 and arm64 are made conceptually simpler; rather
than skipping modifying the overflow count for sampling mode counters so
it's always kept as ~0, those special cases are removed so it's always
applicable and the concatentation of it and the hardware counter can
always be viewed as a 64-bit counter, which also makes them look more
like other architectures.

Whilst here, fix an instance of UB (shifting a 1 into the sign bit) for
amd in its sign-extension code.

Test Plan

make kernels builds, run-time tested on arm64 (would reliably panic for
frequent counters, no longer does so far)

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

sys/dev/hwpmc/hwpmc_arm64.c
226

This isn't quite right either since, when adding 0x100000000llu * pm->pm_pcpu_state[cpu].pps_overflowcnt which, for sampling counters, will be 0xffffffff00000000llu (32 1's then 32 0's), we'll get a huge number again after negation (2^32). Can fix this either by checking for overflow after adding on the high 32 bits, or always incrementing pps_overflowcnt to make it look more like a real 64-bit counter.

I'll go with the latter, it makes armv7/arm64 look less weird and has fewer special cases.

I guess I just got lucky when testing this before...

sys/dev/hwpmc/hwpmc_amd.c
439

Should 47 spelled as (pcd_width - 1)?

You are testing for the PMC value being positive 48-bit integer, am I right? I do not quite understand the 'just' word in the comment.

443

This is simultaneously UB (for <<) and IDB (for >>).

At least mips and uncore have the same issue.

sys/dev/hwpmc/hwpmc_amd.c
439

Well the hard-coded 16 just below is 64 - 48; everywhere in this file just hard-codes the width.

And yes; the "just" is to indicate that this is rare, for when it overflows during switch out so no interrupt is taken for it.

443

The UB is a good point, I'll fix amd and the copy I made in mips. The IDB is fine, we can safely assume that, things would be on fire if you didn't have two's complement...

I don't see any similar code in uncore?

sys/dev/hwpmc/hwpmc_amd.c
439

yes, 16 == NBBY * sizeof(int64) - pdc_width, and ought to be fixed, but you did not modified the next block.

Overall it might be useful to do the pass to use pdc_width. On the other hand, AMD seems to go a different route, instead of extending PMC width, they do something called 'mergeevent' which uses two PMC registers for one counter.

443

I do not see that there is a spec that states the result of the shift for two-complement. It can do something silly on 2-c machine anyway.

Yes, uncore uses 1ULL as the shifted value, so all is unsigned and my point does not apply.

sys/dev/hwpmc/hwpmc_amd.c
439

Using pcd_width would diverge from the style of every MD file in hwpmc. Every file either hard-codes the single width everywhere (which may be implicit in the types used) or has its own internal variable tracking the width. The only consumer of pcd_width itself currently is hwpmc_mod, which uses it for two purposes: the initial printf at load time of all available classes, and for PMC_OP_GETCPUINFO. It has no behavioural meaning, solely informational.

443

https://gcc.gnu.org/onlinedocs/gcc-11.2.0/gcc/Integers-implementation.html

Clang doesn't document its implementation-defined behaviour as it should (which technically makes it non-conforming) but heavily assumes the same implementation as GCC in its own code base so can't be compiled with any compiler that doesn't follow that, so it's safe to say that we can assume it too, and already do throughout base.

  • Farewell mips
  • Fix existing UB in amd code (no longer copied to mips after the above...)
  • Rework armv7/arm64
jrtc27 retitled this revision from hwpmc: Fix amd/arm64/armv7/mips/uncore sampling overflow race to hwpmc: Fix amd/arm64/armv7/uncore sampling overflow race.Jan 2 2022, 9:55 AM
jrtc27 edited the summary of this revision. (Show Details)
jrtc27 edited the test plan for this revision. (Show Details)
kib added inline comments.
sys/dev/hwpmc/hwpmc_amd.c
443

There should be no space in ) (

This revision is now accepted and ready to land.Jan 2 2022, 5:47 PM
sys/dev/hwpmc/hwpmc_amd.c
443

Ah yes, missed that one when cleaning up the int64_t cast

It is a silly scenario, but what would happen if one requested a reload count exceeding (1 << $pmc_width-1)? Then the negation of the reload count might appear as an overflow.

Yeah, that would break, so "don't do that" I guess. hwpmc_mod.c enforces a lower bound of 1000 for the reload count but doesn't currently enforce a reasonable upper bound; it probably should cap it at 1ULL << (pcd-> pcd_width - 1).