Page MenuHomeFreeBSD

(umaperf 2/7) Store the zone flags and size in the uma per-cpu cache area.
ClosedPublic

Authored by jeff on Dec 15 2019, 11:36 PM.
Tags
None
Referenced Files
F107628280: D22826.diff
Thu, Jan 16, 9:48 PM
Unknown Object (File)
Tue, Jan 14, 8:31 AM
Unknown Object (File)
Sat, Jan 11, 7:35 AM
Unknown Object (File)
Thu, Dec 19, 12:22 AM
Unknown Object (File)
Thu, Dec 19, 12:20 AM
Unknown Object (File)
Dec 11 2024, 4:21 AM
Unknown Object (File)
Dec 10 2024, 2:03 PM
Unknown Object (File)
Nov 5 2024, 2:44 PM
Subscribers

Details

Summary

This is part of a series of patches intended to enable first-touch numa policies for UMA by default. It also reduces the cost of uma_zalloc/zfree by approximately 30% each in my tests.

This patch caches the zone size and flags in the per-cpu caches and adds a few flags so features can be tested without touching uma_zone. In a normal fast path allocation we will now only touch the cacheline for the per-cpu area and the bucket we're popping as a result.

The per-cpu cache buckets have 32bit of padding in them. It is slightly gross to use this but if I don't the per-cpu cache size will exceed 64 bytes which I would very much like to avoid. The limit code could be improved by setting and clearing the flag when we reach and drop below the limit, probably with some timeout to clear to limit hysteresis.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 28284
Build 26394: arc lint + arc unit

Event Timeline

jeff added reviewers: mav, markj, rlibby, glebius, gallatin.
jeff set the repository for this revision to rS FreeBSD src repository - subversion.
markj added inline comments.
sys/vm/uma_core.c
2186

Maybe zone_update_cached_attrs() or something like that instead?

2190

Why not CPU_FOREACH()?

2906

I don't see why the added ifdefs are needed?

3539

The two if-statements can be flattened.

sys/vm/uma_int.h
224

You might consider adding a

struct uma_cached_attrs {
    uint32_t flags;
    uint32_t size;
};

and write your accessors in terms of that. Then you would need fewer accessors and the naming would be clearer. It would also be easier to extend this to make use of the third spare word in the cross bucket, if you ever want to do that. I see that in uma_zfree_arg() we always load the flags but only load the size in a rare branch; given that both words belong to the same cache line, I am not sure that it would be harmful to always load both into the attr struct above. And, in other places we always load both words anyway.

253

Extra newline.

This revision is now accepted and ready to land.Dec 16 2019, 4:37 PM
sys/vm/uma_int.h
479

PRINT_UMA_FLAGS below should be updated.

sys/vm/uma_core.c
2190

At one point there were some startup issues that meant cpus were marked absent until later in the boot. I'm not sure what the situation is now. However, this construction lends itself to hotplug by initializing everything.

We should possibly make a CPU_FOREACH_ALL() or something.

sys/vm/uma_core.c
2906

In some cases if you if (condition) KASSERT the compiler has generated some extra branch etc. It should not here.

Partially it just gives me visual clarity that nothing in this huge top block is doing anything on release builds. I can look at the assembly and see if it's required but I like it.

Looks good.

An alternative to relying on structure padding would be to use macro name pasting magic for your accessors. More explicit about layout, but less readable code. I think using the structure padding is okay.

Nits and a question inline.

sys/vm/uma_core.c
2851–2852

Long line...

2947–2953

Long lines...

3669

We don't care about this one? Because we're about to call bucket_alloc(zone,...) anyway? And similar below?

This revision now requires review to proceed.Dec 21 2019, 8:46 PM
sys/vm/uma_core.c
3539

Flattening them would tell the compiler to test both conditions but place the jmp in such a way that the processor won't static predict it true and keep it away from the main function text. This means we'll potentially speculatively touch both lines.

Ordering it this way means that uz_sleepers is only accessed if the unlikely branch tests true.

I have not verified the assembly but I believe this should produce better results.

3669

Yes we are basically into the slow layer so we're touching a bunch of zone fields anyway.

This revision is now accepted and ready to land.Dec 22 2019, 9:57 PM
markj added inline comments.
sys/vm/uma_core.c
3539

I'd be surprised if the compiler treated the above any differently from:

if (__predict_false(uz_flags & UMA_ZFLAG_LIMIT) && zone->uz_sleepers > 0)
    ...

but stranger things have happened.