Page MenuHomeFreeBSD

x86: Speed up clock calibration
ClosedPublic

Authored by cperciva on Jan 10 2022, 1:29 AM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Jan 8, 4:05 PM
Unknown Object (File)
Wed, Jan 8, 9:14 AM
Unknown Object (File)
Mon, Jan 6, 6:20 AM
Unknown Object (File)
Fri, Dec 27, 4:52 PM
Unknown Object (File)
Wed, Dec 25, 5:46 PM
Unknown Object (File)
Wed, Dec 25, 2:14 PM
Unknown Object (File)
Dec 19 2024, 2:38 PM
Unknown Object (File)
Dec 15 2024, 12:19 AM

Details

Summary

Prior to this commit, the TSC and local APIC frequencies were calibrated
at boot time by measuring the clocks before and after a one-second sleep.
This was simple and effective, but had the disadvantage of *requiring a
one-second sleep*.

Rather than making two clock measurements (before and after sleeping) we
now perform many measurements; and rather than simply subtracting the
starting count from the ending count, we calculate a best-fit regression
between the target clock and the reference clock (for which the current
best available timecounter is used). While we do this, we keep track
of an estimate of the uncertainty in the regression slope (aka. the ratio
of clock speeds), and stop measuring when we believe the uncertainty is
less than 1 PPM.

In order to avoid the risk of aliasing resulting from the data-gathering
loop synchronizing with (a multiple of) the frequency of the reference
clock, we add some additional spinning depending upon the iteration number.

For numerical stability and simplicity of implementation, we make use of
floating-point arithmetic for the statistical calculations.

This reduces the FreeBSD kernel boot time on x86 systems by between 1900
and 2000 ms.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

sys/kern/subr_clockcalib.c
43

char *clkname

45

*tc

109

It seems you are using 64bit reads for TSC, which cannot wrap. But LAPIC timer is 32bit and can wrap.

180

Perhaps use ia32_pause() there instead of empty loop?

sys/sys/clockcalib.h
37 ↗(On Diff #101173)

I do not see a need of a new header for the whole single prototype. timetc.h IMO is the right place for clockcalib() proto. May be also add 'tc' somewhere in the name, to indicate that the function works with the timecounter.

sys/x86/x86/tsc.c
714–716

Can't we enter the FPU-enabled section in clockcalib(), rather than forcing all callers to do it?

sys/x86/x86/tsc.c
714–716

This was discussed to the death in some previous review. There is no compiler barrier that prevents it from reordering FPU instructions out of the region.

The separate compilation unit is used as a kind of barrier, and could break with LTO turned on, I suspect.

val_packett.cool added inline comments.
sys/sys/clockcalib.h
37 ↗(On Diff #101173)

In case the header stays, it should at least not break buildworld :) add <sys/types.h> to avoid:

/usr/obj/usr/src/amd64.amd64/tmp/usr/include/sys/clockcalib.h:37:1: error: unknown type name 'uint64_t'
[…]
make[3]: stopped in /usr/src/tools/build/test-includes
sys/kern/subr_clockcalib.c
180

Maybe spelled as cpu_spinwait since this file is not otherwise x86-specific?

sys/x86/x86/local_apic.c
1008

New functions don't have to have the gratuitous blank line here, but if you are matching the style of existing functions in this file, it is ok to stay.

sys/kern/subr_clockcalib.c
45

Thanks, fixed.

109

True, but if we were going to handle this is should be done in local_apic.c. For our purposes it shouldn't matter -- we're counting down from APIC_TIMER_MAX_COUNT and should have finished calibrating long before we hit zero.

180

Probably doesn't matter at the moment considering that it's unlikely we'll have any other non-idle thread running at this point in the boot process -- but sure, might as well include it.

sys/sys/clockcalib.h
37 ↗(On Diff #101173)

Ok, merging into timetc.h. I don't think it makes sense to have tc in the function name though, since it's calibrating results returned by a function pointer, not a timecounter.

kib added inline comments.
sys/kern/subr_clockcalib.c
180

It is not about scheduling, you are in critical section anyway, so context switches are disabled. cpu_spinwait() reduces power consumption of this core.

This revision is now accepted and ready to land.Jan 12 2022, 5:08 AM
sys/kern/subr_clockcalib.c
180

Oh, I was thinking about the effect of PAUSE on hyperthreading, that it reduces the performance impact on the sibling thread. I didn't realize it also reduced power consumption.

I doubt anyone cares about a one-time reduction in power consumption lasting a few microseconds, but there's no reason to not save a few mJ I guess...

This revision was automatically updated to reflect the committed changes.