There is no universal way to find the TSC frequency. Newer Intel CPUs
may report it via CPUID leaves 0x15 and 0x16. Sometimes it can be
obtained from the PLATFORM_INFO MSR as well, though we never use that.
On older platforms we derive the frequency using a DELAY(1000000) call,
which uses the 8254 PIT. On some newer platforms the 8254 is apparently
non-functional, leading to bogus calibration results. On such platforms
the TSC frequency must be available from CPUID. It is also possible to
disable calibration with a tunable, in which case we try to parse the
brand string if the TSC frequency is not available from CPUID.
CPUID 0x15 provides an authoritative TSC frequency value, but even that
is not always available on new Intel platforms. CPUID 0x16 provides the
specified processor base frequency, which is not the same as the TSC
frequency. Empirically, it is close enough for early boot, but too far
off for timekeeping: on a Comet Lake NUC, CPUID 0x16 yields 1600MHz but
the TSC frequency is roughly 1608MHz, leading to frequent clock stepping.
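To put the Comet Lake numbers in perspective, here is a small sketch
(not part of this change; freq_error_ppm is a name made up for
illustration) that expresses the mismatch in parts per million:

```c
#include <stdint.h>

/*
 * Illustrative helper, not kernel code: relative error of an assumed
 * frequency versus the actual one, in parts per million.
 */
static int64_t
freq_error_ppm(uint64_t assumed_hz, uint64_t actual_hz)
{
	return (((int64_t)actual_hz - (int64_t)assumed_hz) * 1000000 /
	    (int64_t)assumed_hz);
}
```

With 1600MHz assumed and 1608MHz actual this gives 5000 ppm, i.e. a
clock that gains about 432 seconds per day. As far as I know ntpd only
slews up to about 500 ppm, which would be consistent with the stepping
described above.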
Thus we have a situation where we cannot calibrate using the PIT and
cannot obtain a precise frequency from CPUID (or MSRs). This change
seeks to address that by using the CPUID 0x16 value during early boot
and refining the calibration later once ACPI-based timecounters are
available. TSC frequency detection is thus split into two phases:
Early phase:
- On Intel platforms, query CPUID 0x15 and 0x16 and use that value
initially if available.
- Otherwise, get an estimate using the PIT, reducing the delay loop to
100ms from 1s. I can't see any reason to have such a long loop and it
does not significantly change the calculated frequency on systems that
I have access to.
- Continue to register the TSC as the CPU ticks provider early, even
though the frequency may be off. Otherwise any code executed during
boot that uses cpu_ticks() (e.g., context switching) gets tripped up
when the ticks provider changes.
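The early-phase selection order can be sketched roughly as follows.
This is a standalone illustration rather than the code in this change:
struct cpuid_regs, the function names and the pit_estimate_hz parameter
are all hypothetical, and the register values are passed in instead of
being read with the CPUID instruction so that the logic can be tested
anywhere.

```c
#include <stdint.h>

/* Raw CPUID output registers; hypothetical container for this sketch. */
struct cpuid_regs {
	uint32_t eax, ebx, ecx, edx;
};

/*
 * Leaf 0x15: EBX/EAX is the TSC-to-crystal clock ratio and ECX is the
 * crystal frequency in Hz.  Any of them may be reported as 0, in which
 * case the TSC frequency cannot be derived from this leaf.
 */
static uint64_t
tsc_freq_cpuid15(const struct cpuid_regs *r)
{
	if (r->eax == 0 || r->ebx == 0 || r->ecx == 0)
		return (0);
	return ((uint64_t)r->ecx * r->ebx / r->eax);
}

/*
 * Leaf 0x16: EAX[15:0] is the processor base frequency in MHz.  This
 * is only an approximation of the TSC frequency.
 */
static uint64_t
tsc_freq_cpuid16(const struct cpuid_regs *r)
{
	return ((uint64_t)(r->eax & 0xffff) * 1000000);
}

/* Prefer leaf 0x15, then leaf 0x16, then fall back to the PIT estimate. */
static uint64_t
early_tsc_freq(const struct cpuid_regs *l15, const struct cpuid_regs *l16,
    uint64_t pit_estimate_hz)
{
	uint64_t freq;

	if ((freq = tsc_freq_cpuid15(l15)) != 0)
		return (freq);
	if ((freq = tsc_freq_cpuid16(l16)) != 0)
		return (freq);
	return (pit_estimate_hz);
}
```

For example, a leaf 0x15 result with a 24MHz crystal and a 134/2 ratio
(values chosen for illustration) yields 1608000000 Hz, whereas the same
leaf with ECX == 0 falls through to the leaf 0x16 value.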
Later phase:
- In SI_SUB_CLOCKS, once the timehands are initialized, sample the
current timecounter and TSC values, and sample them again later using
a callout. Use the frequency of the selected timecounter (which should
be the HPET or ACPI PM timer) to derive the TSC frequency.
- Update the TSC timecounter, global tsc_freq and CPU ticker with the
new frequency and finally register the TSC as a timecounter.
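The arithmetic behind the late phase amounts to scaling the TSC delta
by the reference counter's frequency. A minimal sketch, assuming a
reference counter that wraps at tc_mask (tsc_freq_from_samples and its
parameters are hypothetical names, not the functions added here):

```c
#include <stdint.h>

/*
 * Derive the TSC frequency from two (TSC, reference counter) sample
 * pairs.  tc_mask handles wraparound of a narrow counter such as the
 * 24-bit ACPI PM timer, provided the counter wraps at most once
 * between samples.  Returns 0 if the reference counter did not
 * advance.  The multiplication can overflow for very long sampling
 * intervals; intervals on the order of a second are fine.
 */
static uint64_t
tsc_freq_from_samples(uint64_t tsc1, uint64_t tsc2, uint32_t tc1,
    uint32_t tc2, uint32_t tc_mask, uint64_t tc_freq)
{
	uint64_t tsc_delta;
	uint32_t tc_delta;

	tsc_delta = tsc2 - tsc1;
	tc_delta = (tc2 - tc1) & tc_mask;
	if (tc_delta == 0)
		return (0);
	return (tsc_delta * tc_freq / tc_delta);
}
```

With a 3579545 Hz ACPI PM timer sampled one second apart, a TSC delta
of 1608000000 ticks gives back 1608000000 Hz, even if the 24-bit
counter wrapped in between.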
TODO:
- kib suggested extending the core timecounter code to kick off deferred
calibration as soon as a high-quality source is registered, rather
than waiting until SI_SUB_CLOCKS. One complication is that we also
perform multi-core TSC synchronization at some point, and we must
ensure that it does not happen during calibration.
- I removed parsing of the CPUID brand string. Perhaps it should be
restored and enabled by a tunable.
- Some kind of sanity check on the early and late calibration results
should be added, but I'm not certain how to do it.