Differential D47036
vm_page: iterators in alloc_contig_domain
Authored by dougm on Oct 10 2024, 9:05 AM.
Tags: None
Details
Restructure a bit of code to allow vm_page_alloc_contig_domain to use pctrie iterators for lookup and insertion into the object radix tree. This change depends on D47021.

Instrumenting the code to count calls and cycles showed that all the calls to this function appear to happen at boot time. Both the original and the modified code made 19440 calls, and total cycles dropped from 8529425 to 7962437, a 6.65% reduction.
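As a rough check on the figures in the summary, the arithmetic can be sketched as below. The helper names `percent_reduction` and `cycles_per_call` are hypothetical, chosen only for illustration; they are not part of the patch or of the instrumentation that was used.

```c
#include <stdint.h>

/*
 * Illustrative helpers only (hypothetical names, not from the patch).
 * Percentage by which `after` is smaller than `before`.
 */
static double
percent_reduction(uint64_t before, uint64_t after)
{
	return (100.0 * ((double)before - (double)after) / (double)before);
}

/* Average cycles spent per call, given total cycles and a call count. */
static double
cycles_per_call(uint64_t cycles, uint64_t calls)
{
	return ((double)cycles / (double)calls);
}
```

Calling `percent_reduction(8529425, 7962437)` reproduces the reported reduction of about 6.65%, and dividing each total by the 19440 calls gives roughly 439 cycles per call before the change and roughly 410 after.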
Event Timeline

Add definition of vm_radix_iter_lookup_le. I thought I had added it to a vm_radix patch just committed, but apparently I had not.

A bigger patch that uses iterators for vm_page_alloc_after and the functions it calls. Changes to vm_page_alloc are outside the range of this patch.

Add and use lookup_iter_lt. Looking for the predecessor by first looking for the place where something will be inserted is a waste. Instead, start looking just before where something will be inserted; there's a chance you'll find something there.

D47207 has reduced the average number of cycles to perform vm_page_alloc() from 1295 to 1265.

Switching from ..._lookup_le() to ..._lookup_lt() increased the average number of cycles in vm_page_alloc().

Tweak the first loop in pctrie_iter_lookup_le to make it 2 whole bytes smaller. And, for the _ge version, one byte smaller.

Incorporating D47277 yielded the lowest average cycles in vm_page_alloc_contig() that I've seen.

Incorporating D47277 has increased the average number of cycles to perform vm_page_alloc() to 1325. That change should have had no cost, so the real takeaway is that small perturbations to the code layout can affect the average by at least 60 cycles.

Cycles to perform a 2MB aligned vm_page_alloc_contig() for shm_create_largepage() on a Ryzen 5900X:

x base
+ iter
(ministat histogram omitted: every + sample lies below every x sample)
    N           Min           Max        Median           Avg        Stddev
x  10         16813         17866       16989.5       17074.6     294.94715
+  10         12691         13496         12759       12833.1     239.58643
Difference at 95.0% confidence
        -4241.5 +/- 252.466
        -24.841% +/- 1.2701%
        (Student's t, pooled s = 268.696)
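
For readers who want to check the ministat summary above, here is a minimal sketch of the pooled-variance Student's t arithmetic behind the "pooled s" and "+/-" figures. It is an illustration under stated assumptions, not ministat's source: the struct and function names are hypothetical, the critical value 2.101 (two-sided 95%, 18 degrees of freedom) has to be supplied by the caller, and a small Newton iteration stands in for sqrt() so the fragment builds without libm.

```c
/* Summary statistics for one sample set (hypothetical type for illustration). */
struct sample_stats {
	int	n;
	double	avg;
	double	stddev;
};

/*
 * Square root by Newton's method, for x >= 0.  sqrt() from <math.h> would
 * be the usual choice; this only avoids a link-time dependency on libm.
 */
static double
newton_sqrt(double x)
{
	double r;
	int i;

	if (x <= 0.0)
		return (0.0);
	r = x;
	for (i = 0; i < 60; i++)
		r = 0.5 * (r + x / r);
	return (r);
}

/* Pooled standard deviation, assuming both samples share one variance. */
static double
pooled_stddev(struct sample_stats a, struct sample_stats b)
{
	double va = (a.n - 1) * a.stddev * a.stddev;
	double vb = (b.n - 1) * b.stddev * b.stddev;

	return (newton_sqrt((va + vb) / (a.n + b.n - 2)));
}

/*
 * Half-width of the confidence interval for the difference of the means.
 * t_crit is the Student's t critical value for n_a + n_b - 2 degrees of
 * freedom, looked up by the caller (an assumption of this sketch).
 */
static double
diff_half_width(struct sample_stats a, struct sample_stats b, double t_crit)
{
	return (t_crit * pooled_stddev(a, b) *
	    newton_sqrt(1.0 / a.n + 1.0 / b.n));
}
```

With base = {10, 17074.6, 294.94715} and iter = {10, 12833.1, 239.58643}, the pooled standard deviation comes out near 268.696 and the half-width near 252.47 for t_crit = 2.101, consistent with the difference of -4241.5 +/- 252.466 reported above.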