Page MenuHomeFreeBSD

ofw_cpu: check for "disabled" status
ClosedPublic

Authored by mhorne on Dec 17 2024, 8:06 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Jan 10, 9:09 PM
Unknown Object (File)
Fri, Jan 10, 9:00 PM
Unknown Object (File)
Tue, Jan 7, 7:28 AM
Unknown Object (File)
Wed, Jan 1, 3:53 AM
Unknown Object (File)
Wed, Jan 1, 3:52 AM
Unknown Object (File)
Wed, Jan 1, 3:50 AM
Unknown Object (File)
Dec 19 2024, 6:08 PM

Details

Summary

If a CPU is marked disabled, we should not create a CPU device for it.

This is present in some RISC-V device trees which contain a "monitor
core" with limited functionality (no MMU). We don't run the kernel
on these, and in early CPU start-up code we skip them, and they have no
impact on mp_ncpu. Why this is included in the device-tree at all is
beyond me. It seems the new trend is to mark these as "disabled", so
let's respect this in the ofw_cpu probe routine.

This is generally harmless, but I noticed an impact when attempting to
attach the cpufreq_dt driver. It counted more OFW CPU devices than
logical CPUs (mp_ncpus), and therefore failed to attach for logical CPU
3 (RISC-V hart 4). With this change, all is well, and we don't get a
bogus device.

No functional change when the property is not present at all.

Test Plan

Verbose boot before:

cpulist0: <Open Firmware CPU Group> on ofwbus0
cpu0: <Open Firmware CPU> on cpulist0
cpu0: missing 'clock-frequency' property
cpu1: <Open Firmware CPU> on cpulist0
cpu1: Nominal frequency 999Mhz
cpufreq_dt1: <Generic cpufreq driver> on cpu1
cpufreq_dt1: 375.000 Mhz (0 uV)
cpufreq_dt1: 500.000 Mhz (0 uV)
cpufreq_dt1: 750.000 Mhz (0 uV)
cpufreq_dt1: 1500.000 Mhz (0 uV)
cpufreq1: <CPU frequency control> on cpu1
cpu2: <Open Firmware CPU> on cpulist0
cpu2: Nominal frequency 999Mhz
cpufreq_dt2: <Generic cpufreq driver> on cpu2
cpufreq_dt2: 375.000 Mhz (0 uV)
cpufreq_dt2: 500.000 Mhz (0 uV)
cpufreq_dt2: 750.000 Mhz (0 uV)
cpufreq_dt2: 1500.000 Mhz (0 uV)
cpufreq2: <CPU frequency control> on cpu2
cpu3: <Open Firmware CPU> on cpulist0
cpu3: Nominal frequency 999Mhz
cpufreq_dt3: <Generic cpufreq driver> on cpu3
cpufreq_dt3: 375.000 Mhz (0 uV)
cpufreq_dt3: 500.000 Mhz (0 uV)
cpufreq_dt3: 750.000 Mhz (0 uV)
cpufreq_dt3: 1500.000 Mhz (0 uV)
cpufreq3: <CPU frequency control> on cpu3
cpu4: Nominal frequency 999Mhz
cpufreq_dt4: <Generic cpufreq driver> on cpu4
cpufreq_dt4: Not attaching as cpu is not present
device_attach: cpufreq_dt4 attach returned 6

After:

cpulist0: <Open Firmware CPU Group> on ofwbus0
cpu0: <Open Firmware CPU> on cpulist0
cpu0: Nominal frequency 999Mhz
cpufreq_dt0: <Generic cpufreq driver> on cpu0
cpufreq_dt0: 375.000 Mhz (0 uV)
cpufreq_dt0: 500.000 Mhz (0 uV)
cpufreq_dt0: 750.000 Mhz (0 uV)
cpufreq_dt0: 1500.000 Mhz (0 uV)
cpufreq0: <CPU frequency control> on cpu0
cpu1: <Open Firmware CPU> on cpulist0
cpu1: Nominal frequency 999Mhz
cpufreq_dt1: <Generic cpufreq driver> on cpu1
cpufreq_dt1: 375.000 Mhz (0 uV)
cpufreq_dt1: 500.000 Mhz (0 uV)
cpufreq_dt1: 750.000 Mhz (0 uV)
cpufreq_dt1: 1500.000 Mhz (0 uV)
cpufreq1: <CPU frequency control> on cpu1
cpu2: <Open Firmware CPU> on cpulist0
cpu2: Nominal frequency 999Mhz
cpufreq_dt2: <Generic cpufreq driver> on cpu2
cpufreq_dt2: 375.000 Mhz (0 uV)
cpufreq_dt2: 500.000 Mhz (0 uV)
cpufreq_dt2: 750.000 Mhz (0 uV)
cpufreq_dt2: 1500.000 Mhz (0 uV)
cpufreq2: <CPU frequency control> on cpu2
cpu3: <Open Firmware CPU> on cpulist0
cpu3: Nominal frequency 999Mhz
cpufreq_dt3: <Generic cpufreq driver> on cpu3
cpufreq_dt3: 375.000 Mhz (0 uV)
cpufreq_dt3: 500.000 Mhz (0 uV)
cpufreq_dt3: 750.000 Mhz (0 uV)
cpufreq_dt3: 1500.000 Mhz (0 uV)
cpufreq3: <CPU frequency control> on cpu3

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

jrtc27 added inline comments.
sys/dev/ofw/ofw_cpu.c
211

ofw_bus_status_okay(dev)?

221

Hm, wouldn't this make setting hw.ncpu=1 to disable SMP (or running a UP kernel) rather noisy?

sys/dev/ofw/ofw_cpu.c
211

status is not a standard property for CPUs, unlike other devices. So in the relevant cases the "normal" CPUs have nothing.

221

Indeed. I'll just drop it.

sys/dev/ofw/ofw_cpu.c
211

Oh hang on, I misread the implementation of ofw_bus_status_okay() the first time. Indeed this can be used.

Use ofw_bus_status_okay().

sys/dev/ofw/ofw_cpu.c
211

Fwiw, I've wanted to move this check up into the bus for years

sys/dev/ofw/ofw_cpu.c
211

Yeah, lacking status should give 1 not 0. Though the current implementation is buggy for the len == 3 case, doesn't do == 0 on the bcmp so it's inverted (i.e. status = "no" / "aa" / ... looks like it reports enabled, and status = "ok" reports disabled).

sys/dev/ofw/ofw_cpu.c
211

Oh, just ofw_bus_node_status_okay is buggy, ofw_bus_status_okay is fine.

I don't think we can do this on a CPU node. I think @cognet had some hardware where non-boot CPUs had status = "disabled" in the dtb so committed rG21ce594e7aa396a9befb6f0a2b25aab263023011.

Add "enable-methods" property to the check. According to my reading of the
spec, and our code, this should satisfy all cases.

Merge the status and enable-method checks into a helper function.

This revision is now accepted and ready to land.Fri, Jan 10, 6:33 PM

I was going to suggest using the device_t-taking APIs instead, but then realised that doesn't work for the early case... bit sad but oh well.