FreeBSD

vm_page: iterators in alloc_contig_domain
Closed, Public

Authored by dougm on Oct 10 2024, 9:05 AM.

Details

Summary

Restructure a bit of code to allow vm_page_alloc_contig_domain to use pctrie iterators for lookup and insertion into the object radix tree.
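As a rough illustration of why iterators help here: a lookup that leaves an iterator positioned at the insertion point lets the following insertion reuse that position instead of searching again from the root. The sketch below is a hypothetical model that assumes a sorted array in place of the object's radix tree. `toy_index`, `toy_iter`, `toy_iter_lookup` and `toy_iter_insert` are invented for illustration and are not the pctrie or vm_radix API; real pctrie iterators remember a path through trie nodes, not an array slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical toy index: a sorted array of pindex values standing in
 * for the object's radix tree. NOT the pctrie/vm_radix API.
 */
#define TOY_CAP 64

struct toy_index {
	uint64_t keys[TOY_CAP];
	size_t len;
	size_t searches;	/* number of full searches performed */
};

struct toy_iter {
	struct toy_index *idx;
	size_t pos;		/* first slot whose key is >= the last key looked up */
};

/* Binary search for the first slot whose key is >= key; counts as one search. */
static size_t
toy_search(struct toy_index *idx, uint64_t key)
{
	size_t lo = 0, hi = idx->len;

	idx->searches++;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (idx->keys[mid] < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (lo);
}

/* Look up key and remember where the search ended. Returns 1 if present. */
static int
toy_iter_lookup(struct toy_iter *it, uint64_t key)
{
	it->pos = toy_search(it->idx, key);
	return (it->pos < it->idx->len && it->idx->keys[it->pos] == key);
}

/*
 * Insert key at the remembered position without searching again.
 * The caller must have just called toy_iter_lookup() for the same key
 * and found it absent. Returns 0 on success, -1 if the index is full.
 */
static int
toy_iter_insert(struct toy_iter *it, uint64_t key)
{
	struct toy_index *idx = it->idx;

	assert(it->pos <= idx->len);
	if (idx->len == TOY_CAP)
		return (-1);
	for (size_t i = idx->len; i > it->pos; i--)
		idx->keys[i] = idx->keys[i - 1];
	idx->keys[it->pos] = key;
	idx->len++;
	return (0);
}
```

Looking up 30 in {10, 20, 40} and then inserting it costs a single search; without the iterator, the insert would have to search a second time.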

This change depends on D47021.

Test Plan

Instrumented the code to count calls and cycles, and found that all calls to this function appear to happen at boot time. Both versions made 19440 calls, and total cycles dropped from 8529425 to 7962437, a 6.65% reduction.
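The quoted percentage follows directly from the two cycle totals. A trivial helper, `pct_reduction` (a hypothetical name, not part of the patch), makes the arithmetic explicit:

```c
#include <assert.h>

/* Percentage reduction from a baseline measurement to a new one. */
static double
pct_reduction(double before, double after)
{
	assert(before > 0.0);
	return (100.0 * (before - after) / before);
}
```

`pct_reduction(8529425, 7962437)` is about 6.647, which rounds to the quoted 6.65%. Dividing each total by the 19440 calls gives roughly 439 and 410 cycles per call.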

Diff Detail

Lint: Lint Skipped
Unit: Tests Skipped

Event Timeline

dougm requested review of this revision. Oct 10 2024, 9:05 AM
dougm created this revision.

Add a definition of vm_radix_iter_lookup_le. I thought I had added it to the vm_radix patch I just committed, but apparently I had not.

Set page fields before insertion, and clear them when insertion fails.

Define functions to bundle prepare, insert, cleanup and finish.
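The two notes above describe one pattern: fill in the page's identity fields before inserting it, undo them if insertion fails, and wrap prepare, insert, cleanup and finish in a single helper. This is a minimal sketch under simplifying assumptions. `toy_page`, `toy_object`, `toy_index_insert` and `toy_page_insert` are hypothetical stand-ins (a fixed slot array plays the role of the radix tree), not the functions this revision defines in vm_page.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified types; not struct vm_page / vm_object. */
struct toy_page;

#define TOY_SLOTS 8

struct toy_object {
	struct toy_page *slots[TOY_SLOTS];	/* stand-in index, keyed by pindex */
	size_t resident_count;
};

struct toy_page {
	struct toy_object *object;
	uint64_t pindex;
};

/* Stand-in for an index insert that can fail: out of range or slot taken. */
static int
toy_index_insert(struct toy_object *obj, struct toy_page *m)
{
	if (m->pindex >= TOY_SLOTS || obj->slots[m->pindex] != NULL)
		return (-1);
	obj->slots[m->pindex] = m;
	return (0);
}

/*
 * One helper bundling the steps:
 *   prepare - set the page's identity fields before insertion,
 *   insert  - add the page to the object's index,
 *   cleanup - on failure, clear the fields so the page does not claim
 *             an object that does not actually hold it,
 *   finish  - bookkeeping done only on success.
 * Returns 0 on success, -1 on failure.
 */
static int
toy_page_insert(struct toy_page *m, struct toy_object *obj, uint64_t pindex)
{
	assert(m->object == NULL);
	m->object = obj;			/* prepare */
	m->pindex = pindex;
	if (toy_index_insert(obj, m) != 0) {	/* insert */
		m->object = NULL;		/* cleanup */
		m->pindex = 0;
		return (-1);
	}
	obj->resident_count++;			/* finish */
	return (0);
}
```

Setting the fields first means the insert step sees a fully described page; the cleanup path keeps a failed insertion from leaving a half-initialized page behind.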

sys/vm/vm_page.c, lines 1530–1532

With this version, I see a 5.3% reduction in alloc_contig cycles. If you restore passing mpred, rather than recomputing it, I see a 7.7% reduction.

dougm marked an inline comment as done.

Restore passing mpred to vm_page_iter_insert.
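Here mpred is taken to be the resident page that immediately precedes the new page in pindex order. Assuming a contiguous run is inserted one page after another, each newly inserted page can serve as the predecessor of the next, so passing it along avoids recomputing it with a fresh search. The sketch models this with a sorted, circular, doubly linked list with a sentinel; `toy_list`, `toy_find_pred` and `toy_insert_after` are hypothetical and only illustrate the idea, not the actual vm_page_iter_insert().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page node in a list ordered by pindex. */
struct toy_page {
	uint64_t pindex;
	struct toy_page *prev, *next;
};

/* Circular list with a sentinel head. */
struct toy_list {
	struct toy_page head;
	size_t steps;	/* nodes walked while recomputing a predecessor */
};

static void
toy_list_init(struct toy_list *l)
{
	l->head.prev = l->head.next = &l->head;
	l->steps = 0;
}

/*
 * Recompute the predecessor of pindex by walking the list.
 * Returns &l->head when no page precedes pindex.
 */
static struct toy_page *
toy_find_pred(struct toy_list *l, uint64_t pindex)
{
	struct toy_page *p = &l->head;

	while (p->next != &l->head && p->next->pindex < pindex) {
		p = p->next;
		l->steps++;
	}
	return (p);
}

/* O(1) insertion when the caller already knows the predecessor. */
static void
toy_insert_after(struct toy_page *mpred, struct toy_page *m)
{
	m->prev = mpred;
	m->next = mpred->next;
	mpred->next->prev = m;
	mpred->next = m;
}
```

Only the first page of the run needs a predecessor search; after that, `mpred = m` hands each page to the next insertion at no search cost.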

A bigger patch that uses iterators for vm_page_alloc_after and the functions it calls. Changes to vm_page_alloc are outside the scope of this patch.

Add and use lookup_iter_lt. Finding the predecessor by starting the search at the position where the new item will be inserted is wasteful. Instead, start looking just before the insertion point; there's a chance you'll find something there.
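For a key that is not yet present, which is always the case when looking for an insertion predecessor, a less-than query and a less-than-or-equal query return the same answer, and the less-than query is just a less-than-or-equal query on key - 1, i.e. a search that starts just before the insertion point. The sketch shows these semantics on a sorted array of distinct keys. `lookup_le` and `lookup_lt` here are hypothetical helpers, not the pctrie functions, and the sketch says nothing about relative cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Predecessor queries on a sorted array of distinct keys.
 * Each returns the position of the answer, or -1 if there is none.
 */
static long
lookup_le(const uint64_t *keys, size_t n, uint64_t key)
{
	size_t lo = 0, hi = n;

	/* Find the first slot whose key is > key; the answer is just before it. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (keys[mid] <= key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return ((long)lo - 1);
}

static long
lookup_lt(const uint64_t *keys, size_t n, uint64_t key)
{
	/* Largest key < key is the largest key <= key - 1; 0 has none. */
	if (key == 0)
		return (-1);
	return (lookup_le(keys, n, key - 1));
}
```

With keys {4, 9, 15}, both queries for the absent key 10 return position 1 (key 9); they differ only when the key itself is present, e.g. 9.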

Can't switch from _le to _lt in grab_pages.

Do a bit of work around vm_page_grab.

D47207 has reduced the average number of cycles to perform vm_page_alloc() from 1295 to 1265.

Switching from ..._lookup_le() to ..._lookup_lt() increased the average number of cycles in vm_page_alloc().

Drop ..._iter_lookup_lt().

Tweak the first loop in pctrie_iter_lookup_le to make it 2 whole bytes smaller, and the corresponding loop in the _ge version one byte smaller.

Incorporating D47277 yielded the lowest average cycles in vm_page_alloc_contig() that I've seen.

Incorporating D47277 has increased the average number of cycles to perform vm_page_alloc() to 1325.

That change should have had no cost, so the real takeaway is that small perturbations to the code layout can affect the average by at least 60 cycles.

alc accepted this revision. Nov 16 2024, 7:05 PM

Cycles to perform a 2MB-aligned vm_page_alloc_contig() for shm_create_largepage() on a Ryzen 5900X:

x base
+ iter
+------------------------------------------------------------------------------+
|  +                                                             x             |
|  ++                                                            x             |
|  ++                                                           xx             |
| ++++        +                                                xxx xx         x|
||_MA___|                                                     |__MA____|       |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  10         16813         17866       16989.5       17074.6     294.94715
+  10         12691         13496         12759       12833.1     239.58643
Difference at 95.0% confidence
        -4241.5 +/- 252.466
        -24.841% +/- 1.2701%
        (Student's t, pooled s = 268.696)
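The absolute interval in the ministat output can be checked by hand. With two samples of equal size n, the pooled standard deviation is the square root of the mean of the two variances, and the half-width is t * s_p * sqrt(2/n). The sketch reproduces the figures above from the printed means and standard deviations, assuming t = 2.101 (the two-sided 95% Student's t value for 18 degrees of freedom, which matches the printed half-width). `diff_ci`, `diff_at_95` and `newton_sqrt` are hypothetical names. The sketch does not reproduce the +/- 1.2701% relative interval, which ministat computes with a different formula.

```c
#include <assert.h>

struct diff_ci {
	double diff;		/* mean_new - mean_base */
	double halfwidth;	/* +/- half-width of the 95% interval */
	double pooled_s;	/* pooled standard deviation */
	double pct;		/* diff as a percentage of mean_base */
};

/* Square root by Newton's method, so the sketch needs no libm. */
static double
newton_sqrt(double x)
{
	double r = x > 1.0 ? x : 1.0;

	assert(x >= 0.0);
	for (int i = 0; i < 100; i++)
		r = 0.5 * (r + x / r);
	return (r);
}

static struct diff_ci
diff_at_95(double mean_base, double sd_base, double mean_new, double sd_new,
    int n)
{
	/* Two-sided 95% t value for 2n - 2 = 18 degrees of freedom. */
	const double t_18 = 2.101;
	struct diff_ci r;

	assert(n == 10);	/* t_18 is only valid for n = 10 */
	/* Equal sample sizes: pooled variance is the mean of the variances. */
	r.pooled_s = newton_sqrt((sd_base * sd_base + sd_new * sd_new) / 2.0);
	r.diff = mean_new - mean_base;
	r.halfwidth = t_18 * r.pooled_s * newton_sqrt(2.0 / n);
	r.pct = 100.0 * r.diff / mean_base;
	return (r);
}
```

`diff_at_95(17074.6, 294.94715, 12833.1, 239.58643, 10)` yields a difference of -4241.5, a pooled s of about 268.696, a half-width of about 252.466 and a point estimate of about -24.841%, matching the output above.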
This revision is now accepted and ready to land.Nov 16 2024, 7:05 PM
This revision was automatically updated to reflect the committed changes.