Page MenuHomeFreeBSD

arm64: use FEAT_WFxT for DELAY() when available
Needs ReviewPublic

Authored by harry.moulton_arm.com on Tue, Jan 21, 3:21 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Jan 31, 7:36 AM
Unknown Object (File)
Sun, Jan 26, 4:47 PM
Subscribers

Details

Reviewers
andrew
Group Reviewers
arm64
Summary

Use a wfet, rather than a busy wait, in DELAY() when the FEAT_WFxT
extension is available.

Sponsored by: Arm Ltd
Signed-off-by: Harry Moulton <harry.moulton@arm.com>

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 62089
Build 58973: arc lint + arc unit

Event Timeline

sys/arm/arm/generic_timer.c
877–878

This isn't safe when different CPUs have different levels of support. You can use get_kernel_reg to get a common view of the CPUs, however the overhead may be large for a short delay, and the value could change when secondary CPUs are enabled.

You could use the CPU feature framework in D47812 to handle this by adding a CPU_FEAT_SYSTEM check that uses get_kernel_reg to see if all CPUs can use the instructions.

879

What if the CPU receives an exception before the timeout? Won't that mean DELAY would return early?

sys/arm/arm/generic_timer.c
879

The timeout value passed to the wfet and wfit instructions is the value the virtual timer should be when a local wakeup event is generated. This means we'll need to calculate this value. The counts calculation in arm_tmr_do_delay appears to be what we need to find how far into the future to wake up.

If you extract that to a new function you could do something like the following:

counts = <calculate counts>;
start = sc->get_cntxct(false);
end = start + count;
while (counts > 0) {
	wfet(end);
	last = sc->get_cntxct(sc->physical_sys);
	counts -= (int32_t)(last - first);
	first = last;
}
sys/arm/arm/generic_timer.c
877–878

We'd need to add the wfxt_check and wfxt_enable functions. I've done this in machdep.c.
I wasn't sure how to use the cpu_feat framework to check if a feature is enabled on the fly, so I looked at what feat_pan does, added a bool in machdep.c and an is_wfxt_en() function that can be used elsewhere in the kernel. This appears to work okay when testing.

879

What is the reason for putting the wfet() inside the loop, as opposed to just calculating the CNTVCT_EL0 value for the timeout? To avoid the previous comment about an exception accidentally ending the delay early?

sys/arm/arm/generic_timer.c
877–878

You can put them in this file so we can hide the details from the rest of the kernel that doesn't need to know about them. Because it uses a DATA_SET the functions and struct can be static. The DATA_SET macro creates an array in a linker section we can find and iterate over.

You can also look at ptrauth.c. It sets a variable, but most of the details are hidden in a single file (other than elf64_addr_mask).

If you use CPU_FEAT_AFTER_DEV | CPU_FEAT_SYSTEM for the flags the check and enable functions will onlt run on a single CPU, and late enough that other CPUs are online. You can then use get_kernel_reg to read the common view of the ID register. It will only report wfxt support if all CPUs support it making the instruction safe to use when the function may be scheduled on different CPUs.

879

wfet can finish early if the CPU receives an event, e.g. another CPU executes an sev instruction.

Use the CPU feature framework to determine whether the WFxT instructions are available on all CPUs, and set enable_wfxt accordingly. When the timer is used for DELAY(), it'll check whether it can use wfet().

sys/arm/arm/generic_timer.c
841–850

last the last value read in the loop. The above change should be ok to cause wfet to wait with the correct timeout.

885

We need to use get_kernel_reg to get the common view of the ID registers. If one CPU has the extension and another doesn't then we shouldn't enable it as DELAY may be scheduled on either.

Amended arm_tmr_do_delay() to use end for wfet(), used get_kernel_reg() rather than READ_SPECIALREG(), and added CPU_FEAT_AFTER_DEV flag, so support is checked after secondary cpus are brought up.