Page MenuHomeFreeBSD

vm_grab_pages_unlocked: read all the pages at once
ClosedPublic

Authored by dougm on Oct 15 2024, 8:19 AM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Apr 29, 6:46 AM
Unknown Object (File)
Tue, Apr 29, 1:12 AM
Unknown Object (File)
Sat, Apr 26, 7:50 PM
Unknown Object (File)
Fri, Apr 25, 6:22 PM
Unknown Object (File)
Mon, Apr 14, 5:21 AM
Unknown Object (File)
Mar 23 2025, 6:23 PM
Unknown Object (File)
Mar 18 2025, 8:25 PM
Unknown Object (File)
Mar 1 2025, 6:48 PM
Subscribers

Details

Summary

Define a function pctrie_lookup_range_unlocked, which looks up
consecutive elements in the pctrie, until count limit is reached
or an item is discovered missing, and fills them in the first elements
of an array, returning the count of items read. It does not require a lock.
It uses an iterator, strictly between smr_enter and smr_exit, when some of
the nodes in the pctrie on entry may come to have only one child, but none
of them can be freed before exit.

Define a vm_radix interface to it for reading page ranges.

Change vm_page_grab_pages_unlocked to read all requested pages at
once, without relying on the existence of a linked list of pages.

Drop smr arguments from pctrie_iter_stride, since there's no smr version in use.

Test Plan

A kernel was modified to keep stats on the function vm_page_grab_pages_unlocked: number calls, number of cycles spent, and number of pages returned.

Here are stats from running the sendfile17.sh stress test once, without and with this change in place.
Without:

vm.stats.grab_count: 96475		1.2148055807393976 count/call
vm.stats.grab_cycles: 755609782		9514.578699506397  cycles/call
vm.stats.grab_calls: 79416

With:

vm.stats.grab_count: 1026495		1.0169458782036676 count/call
vm.stats.grab_cycles: 4618535619	4575.571007241998  cycles/call
vm.stats.grab_calls: 1009390

Numbers for running multiple consecutive buildworlds:

original                                modified
Thu Oct 17 15:05:38 CDT 2024            Thu Oct 17 21:48:24 CDT 2024
56807.687u 1420.590s 1:03:23.32         56790.190u 1411.922s 1:03:22.04
vm.stats.grab_count: 4876020            vm.stats.grab_count: 4837437
vm.stats.grab_cycles: 4143819181        vm.stats.grab_cycles: 3659615758
vm.stats.grab_calls: 818112             vm.stats.grab_calls: 812842
Thu Oct 17 16:09:03 CDT 2024            Thu Oct 17 22:51:48 CDT 2024
56862.789u 1462.502s 1:03:25.71         56794.394u 1451.880s 1:03:20.05
vm.stats.grab_count: 4645138            vm.stats.grab_count: 4656055
vm.stats.grab_cycles: 4468458476        vm.stats.grab_cycles: 4032884301
vm.stats.grab_calls: 763546             vm.stats.grab_calls: 764337
Thu Oct 17 17:12:31 CDT 2024            Thu Oct 17 23:55:10 CDT 2024
56975.406u 1496.538s 1:03:30.53         56888.537u 1483.686s 1:03:28.61
vm.stats.grab_count: 4614480            vm.stats.grab_count: 4682859
vm.stats.grab_cycles: 3330967328        vm.stats.grab_cycles: 4104409778
vm.stats.grab_calls: 759291             vm.stats.grab_calls: 767468
Thu Oct 17 18:16:03 CDT 2024            Fri Oct 18 00:58:41 CDT 2024
56924.365u 1493.537s 1:03:30.28         56833.735u 1484.556s 1:03:27.67
vm.stats.grab_count: 4608861            vm.stats.grab_count: 4617088
vm.stats.grab_cycles: 3619200880        vm.stats.grab_cycles: 4221455097
vm.stats.grab_calls: 758430             vm.stats.grab_calls: 758624
Thu Oct 17 19:19:35 CDT 2024            Fri Oct 18 02:02:10 CDT 2024
56959.930u 1549.681s 1:03:30.11         56917.831u 1547.128s 1:03:32.56
vm.stats.grab_count: 4684897            vm.stats.grab_count: 4831921
vm.stats.grab_cycles: 3699821068        vm.stats.grab_cycles: 4117332546
vm.stats.grab_calls: 768036             vm.stats.grab_calls: 866469
Thu Oct 17 20:23:07 CDT 2024
57043.872u 1542.574s 1:03:38.57
vm.stats.grab_count: 4652010
vm.stats.grab_cycles: 5045539558
vm.stats.grab_calls: 764080

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

dougm requested review of this revision.Oct 15 2024, 8:19 AM
dougm created this revision.
sys/vm/vm_page.c
5300–5301

Is this assignment needed?

sys/vm/vm_page.c
5300–5301

vm_page_acquire_unlocked might reread the page and get a different value. So, yes.

sys/vm/vm_page.c
5300–5301

Then a comment would be useful there IMO, literally what you wrote above.

Add a comment in grab_pages_unlocked.

kib added inline comments.
sys/vm/vm_page.c
5299
This revision is now accepted and ready to land.Oct 16 2024, 10:38 PM
dougm added a subscriber: pho.

Apply style fixes. Add comment about the semantics of unlocked fetching of multiple pages.

Request testing from @pho

This revision now requires review to proceed.Sat, Apr 26, 6:19 PM

The changes to vm_page.c somehow disappear from the latest revision. Restore them.

markj added inline comments.
sys/kern/subr_pctrie.c
626

I believe that when we find the leaf node which contains the item at index, that leaf node potentially contains items from index to roundup2(index + 1, PCTRIE_WIDTH) - 1. If that's true, then we could have an inner loop here which just copies entries from the current leaf node, rather than repeating the checks in _pctrie_iter_lookup() for each entry. Might that be a worthwhile optimization?

This revision is now accepted and ready to land.Sun, Apr 27, 2:07 PM
sys/kern/subr_pctrie.c
626

To be precise, a leaf node has no children, so pages are leaf nodes. But a level 0 internal node has the properties you describe.

We could inline _pctrie_iter_lookup, to avoid updating it->index with every iteration.

If, after calling _pctrie_lookup_node, it->node->pn_clev == 0, then we could just walk through the elements of it->node->pn_child until we reached the end of the array, or the size limit imposed by the caller, or encountered a null leaf.

Eliminate _pctrie_lookup(). Let lookup_range_unlocked go straight to _pctrie_lookup_node. Don't bother creating an iterator; just use a parent node pointer.

This revision now requires review to proceed.Sun, Apr 27, 8:14 PM

Handle bottom-level nodes in lookup_range_unlocked without using lookup_node and the checking that comes with, in hopes of improving performance.

This revision is now accepted and ready to land.Mon, Apr 28, 12:32 AM
sys/sys/pctrie.h
91

Why do you need this array?

sys/sys/pctrie.h
91

I need a pointer of this type to pass to pctrie_lookup_range_unlocked().

Do you recommend that, instead, I define

uint64_t *data = (uint64_t *)value;

?

sys/sys/pctrie.h
91

Probably **, not *.

This revision now requires review to proceed.Mon, Apr 28, 7:20 PM

Avoid null checks in transforming pointers.

This revision is now accepted and ready to land.Tue, Apr 29, 6:16 AM

I ran tests with D47114.154425.patch for 5 hours without seeing any issues.