Page MenuHomeFreeBSD

LinuxKPI: reduce impact of large MAXCPU
ClosedPublic

Authored by bz on Oct 23 2023, 11:23 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Nov 12, 2:20 PM
Unknown Object (File)
Mon, Nov 4, 8:14 PM
Unknown Object (File)
Mon, Nov 4, 8:14 PM
Unknown Object (File)
Mon, Nov 4, 8:14 PM
Unknown Object (File)
Mon, Nov 4, 8:14 PM
Unknown Object (File)
Mon, Nov 4, 8:13 PM
Unknown Object (File)
Mon, Nov 4, 8:11 PM
Unknown Object (File)
Mon, Nov 4, 8:11 PM

Details

Summary

Start scaling arrays dynamically by (mp_maxid + 1) instead of MAXCPU,
resulting in an extra single allocation on startup.

For the static_single_cpu_mask[] additionally allocate the cpuset_t
as well so we can better scale those up too and to allow us to keep
a direct map for low CPU counts while for higher counts switch to sets
of static masks and shift them by _BITSET_BITS accordingly.
This should lower memory usage for MAXCPU=65536 roughly from
512M to 1.5M.

PR: 274316
Sponsored by: The FreeBSD Foundation

Test Plan

Not tested yet. main currently does not compile. Still want to put
it out before calling it a day. Someone review my thinking and my
maths. Can we do it even more simple?

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 54691
Build 51580: arc lint + arc unit

Event Timeline

bz requested review of this revision.Oct 23 2023, 11:23 PM
sys/compat/linuxkpi/common/include/asm/processor.h
45

This should really be an accessor function (but that's unrelated to this change).

sys/compat/linuxkpi/common/src/linux_compat.c
2850

cpuset here should probebly be cpuset_t->__bits[]

Commit message should probably also be more clear that we go from static to dynamically allocated memory, so impact on size if also going.

@emaste @andrew anyone any comments? Anyone else who should review this?

Improve comment as I had pointed out myself.
Would still love someone to review this.

jhb added inline comments.
sys/compat/linuxkpi/common/src/linux_compat.c
2793

Maybe use CPU_FOREACH? That would leave __cpu_data[i] all zeroes for non-existent CPUs (probably more correct)

2840

This works if you use CPU_FOREACH

2842

CPU_FOREACH

2867

This probably can't use CPU_FOREACH due to the packing trick

bz marked 2 inline comments as done.Nov 29 2023, 6:31 PM
bz added a subscriber: rwatson.
bz added inline comments.
sys/compat/linuxkpi/common/src/linux_compat.c
2840

This works if you use CPU_FOREACH

But won't necessarily do the right thing for "ID gaps"?
Or are we no longer doing them (as in we skip)?
We used to have a machine in the netperf cluster (if my memory is not wrong from when @rwatson did netisr2) which had a gap (back then). Simplified example: 0,1,2,3,4,6,7. ncpu=7.
If this all sounds gibberish, I will happily switch to mp_ncpus :)

Address some comments from @jhb; wait for clarification on the last one.

bz marked an inline comment as done.Nov 29 2023, 6:36 PM
sys/compat/linuxkpi/common/src/linux_compat.c
2840

In this case since you only need mp_ncpus masks as you will leave static_single_cpu_mask[x] NULL for absent CPUs and only use one of lcs for each active CPU. One issue is that if CPU 0 is absent you wouldn't free lcs below though. I think it would be more obvious if you just save lcs directly in a global that you free below instead of abusing static_single_cpu_mask[0].

sys/compat/linuxkpi/common/src/linux_compat.c
2840

Doh. Of course. CPU_FOREACH does a CPU_ABSENT chekc. Just have to guard against NULL a bit better then. Ok. Will go and make the changes.

As suggested by @jhb:

  • use mp_ncpus and CPU_FOREACH which will account for absent CPUs.
  • use global static rather than using [0] to free the 2nd allocation

Further:

  • Add KASSERT to guard for absent CPU requests (while this should not happen I would rather have a controlled panic than way further down)
  • switch the last magic loop to CPU_FOREACH(i) as well given all computations for the shift are done based on i and i will be incremented propoerly. So this should also just give us uninitialized static_single_cpu_mask[i] antries like it would for gaps with the mp_ncpus case above now. CPU not there, bitset not set.

Please review carefully again.

bz marked 2 inline comments as done.Nov 29 2023, 11:09 PM

I've pointed out a number of problems.

That said, I've also gone out of my way and drafted a version that allocates only the exact number of necessary bytes. Computations are a bit more involved, but IMHO what really matters are the comments before each one. Here's the patch, to apply on top of your changes in this revision (up to Diff 4):

Even if you choose not to take it in its entirety, I'd recommend that you use the rewritten comments there (with adaptations) and you may also want to use the fix for static_single_cpu_mask_lcs changing in the first branch of the if (regular, non-overlapping case).

sys/compat/linuxkpi/common/src/linux_compat.c
2842

static_single_cpu_mask_lcs must not be changed for the allocated memory to be freed correctly in linux_compat_uninit().

2850

I agree with your theory. :-) But to be more precise, this means _BITSET_BITS times a long too much, i.e., n/8 bytes with n being MAXCPU, so quite negligible.

2854–2862

The body of this loop prepares the long having bit i set in the span of longs that will serve for cpumask i, i+_BITSET_BITS, etc. It should thus be the first in the second full cpumask_t, hence my fix.

2867

This probably can't use CPU_FOREACH due to the packing trick

No problem here, because the pointer is recomputed entirely from static_single_cpu_mask_lcs at each iteration, not relying on any computation performed in the previous iteration.

2867

Same problem as above.

olce removed a subscriber: olce.

Apply the diff as submitted by Olivier Certner.

I made minor adjustments to the updated comments only.

@olce congrats on the commit bit. Is this version (mostly yours now) okay?

In D42345#983201, @bz wrote:

@olce congrats on the commit bit. Is this version (mostly yours now) okay?

Thanks! I'm still struggling from the amount of received mail, so your question was lost for a while (I need to better organize myself for the new situation).

Yes, re-reading it, I think this version's OK, barring some minor nit in a comment (which I introduced, sorry).

But I would also signal that I didn't test this code. It seems that ultimately only some code in sys/contrib/dev/athk/ath11k/pci.c actually uses the array via cpumask_of() and I don't have such hardware. So I'm afraid I'll have to defer to you for the test part, unless there is another tractable way for me?

sys/compat/linuxkpi/common/src/linux_compat.c
2871
In D42345#983608, @olce wrote:
In D42345#983201, @bz wrote:

@olce congrats on the commit bit. Is this version (mostly yours now) okay?

Thanks! I'm still struggling from the amount of received mail, so your question was lost for a while (I need to better organize myself for the new situation).

Yes, re-reading it, I think this version's OK, barring some minor nit in a comment (which I introduced, sorry).

I'll fix that in a minute. Thanks.

But I would also signal that I didn't test this code. It seems that ultimately only some code in sys/contrib/dev/athk/ath11k/pci.c actually uses the array via cpumask_of() and I don't have such hardware. So I'm afraid I'll have to defer to you for the test part, unless there is another tractable way for me?

I wondered if the slight complication for saving the one bitset word was worth it?

I used to use a small C program to print the bitsets out for CPUs 0..n and checking both cases duplicating that code in; I can probably replace that code quickly and see.

In D42345#983610, @bz wrote:

I wondered if the slight complication for saving the one bitset word was worth it?

Not that much, I mainly wanted to see if the code would be much more complex, and it's not (but still a bit more). I find reasoning in bitset's words more natural, YMMV. And now that the code is there...

I used to use a small C program to print the bitsets out for CPUs 0..n and checking both cases duplicating that code in; I can probably replace that code quickly and see.

Would be great. Up to you. Thanks in any case.

Insert "them" as submitted by @olce

bz marked 6 inline comments as done.Dec 21 2023, 10:58 PM

I believe all suggested changes went in with diff from @olce and the last update

This revision was not accepted when it landed; it landed in state Needs Review.Dec 22 2023, 12:24 AM
This revision was automatically updated to reflect the committed changes.