
vm_object: drop memq field
Needs Review · Public

Authored by dougm on Dec 9 2024, 8:06 PM.

Details

Summary

The memq field in vm_object is used to maintain a list of resident pages, sorted by pindex. Remove that field, and stop maintaining the list.
Where a predecessor page is passed to a function solely to help with insertion into the memq list, drop that parameter. Rename the functions that lose those arguments from *_after() to *_iter().
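As a hedged illustration of the renaming pattern (the signatures below are schematic, not necessarily the exact ones in the diff):

```
/*
 * Before: the caller supplies the predecessor page "mpred" so that
 * the new page can be linked into the object's memq list after it.
 */
int vm_page_insert_after(vm_page_t m, vm_object_t object,
    vm_pindex_t pindex, vm_page_t mpred);

/*
 * After: with memq gone, the predecessor is no longer needed; the
 * function takes the radix-tree iterator instead.
 */
int vm_page_iter_insert(vm_page_t m, vm_object_t object,
    vm_pindex_t pindex, struct pctrie_iter *pages);
```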

Diff Detail

Lint: Skipped
Unit Tests: Skipped

Event Timeline

dougm requested review of this revision. Dec 9 2024, 8:06 PM

A code reformat to (slightly) reduce the diff size.

Define and use VM_RADIX_FOREACH macros. Initialize a couple of iterators before acquiring read locks.

sys/amd64/sgx/sgx.c
396

Rather than lifting out the vm_page_remove() calls, perhaps pass the iterator to sgx_page_remove() as well, and have it decide whether to call vm_page_remove() or vm_page_iter_remove() based on whether the iterator pointer is NULL.
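A minimal sketch of that suggestion (the surrounding EPC teardown is elided, and vm_page_iter_remove()'s exact signature is an assumption):

```
static void
sgx_page_remove(struct sgx_softc *sc, vm_page_t p,
    struct pctrie_iter *pages)
{
	/*
	 * Let one helper serve both kinds of call site: remove via
	 * the iterator when one is supplied, otherwise fall back to
	 * the plain removal path.
	 */
	if (pages != NULL)
		vm_page_iter_remove(pages, p);
	else
		vm_page_remove(p);
	/* ... existing EPC-page teardown continues here ... */
}
```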

sys/vm/vm_glue.c
625

Do you have some other uses for NOCREAT_LT planned? This function is called rarely (kernel stacks are cached by UMA and reused many times without going through this function), so the extra complexity feels a bit unnecessary.

sys/vm/vm_page.h
224

Not quite related to this change, but once it lands, I'd like to add one 8-byte field to plinks. I believe that'd be enough to let us start embedding UMA slab headers (i.e. uma_slab_t) within the vm_page structure. This is an alternative approach to D22768.

sys/vm/vm_radix.h
272

Bikeshedding, but VM_RADIX_FOREACH_FROM reads a bit more naturally to me, and we already have TAILQ_FOREACH_FROM etc.

sys/vm/vm_reserv.c
528

This is more expensive, but at some point I presume it should be possible to pass an iterator down from the page allocator and use that instead of doing a full lookup here.

dougm marked 2 inline comments as done. Jan 6 2025, 7:26 PM
dougm added inline comments.
sys/vm/vm_glue.c
625

I have no other uses planned. I was just looking for a way to avoid a redundant lookup.

sys/vm/vm_radix.h
272

I don't object to using FROM, but the reason I didn't is that FOREACH_FROM takes two arguments, and this macro takes three. FOREACH_FROM assumes that the lookup has already happened and FOREACH_START has to do the lookup. Those differences are the reason that I chose to use a distinct name. But it doesn't matter much to me.
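For concreteness, the two macro shapes under discussion might look roughly like this (a sketch; the real macros in the diff may differ):

```
/* _START style: the macro performs the initial lookup itself. */
#define	VM_RADIX_FOREACH_START(m, pages, start)			\
	for ((m) = vm_radix_iter_lookup_ge((pages), (start));	\
	    (m) != NULL; (m) = vm_radix_iter_step((pages)))

/* TAILQ_FOREACH_FROM style: resume from an element already in hand. */
#define	VM_RADIX_FOREACH_FROM(m, pages)				\
	for (; (m) != NULL; (m) = vm_radix_iter_step((pages)))
```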

sys/vm/vm_reserv.c
528

I agree that it would be possible.

dougm marked an inline comment as done.

Doug, how would you like to proceed with the patch? Since quite a few pieces of it are independent, I imagine they can be reviewed and peeled off one by one, especially in areas that aren't performance-critical.

Are there still some concerns about the overhead of switching to iterators in some of the already-committed changes? I've lost track a bit, I'm sorry.

sys/kern/kern_kcov.c
411

Can't we use vm_radix_iter_next() here?

sys/kern/subr_pctrie.c
550

This can presumably be committed on its own. It looks reasonable to me.

sys/x86/iommu/amd_idpgtbl.c
123

The assertion can come first, as in dmar_domain_free_pgtbl().

sys/x86/iommu/intel_idpgtbl.c
266–268

Elsewhere you are careful to initialize the iterator outside the scope of the object lock, but not here.

268

Does it make sense to have a macro for this usage too? Perhaps we could use VM_RADIX_FOREACH and assert that pindices are consecutive?

dougm marked 3 inline comments as done.

> Doug, how would you like to proceed with the patch? Since quite a few pieces of it are independent, I imagine they can be reviewed and peeled off one by one, especially in areas that aren't performance-critical.

Some of them have been reviewed separately, and approved by @kib, but blocked by comments from @alc. His comment on D47718 is why I stopped creeping along a bit at a time and threw it all out for consideration at once. I expect that his approval depends on some kind of performance testing. He's been drawn away from BSD considerations by other life events lately.

> Are there still some concerns about the overhead of switching to iterators in some of the already-committed changes? I've lost track a bit, I'm sorry.

@alc has concerns. He has reported that performance worsened a bit with some of the page-management changes already made, and with some other changes that have been under review for a while.

sys/x86/iommu/intel_idpgtbl.c
268

I don't know. I'll add a VM_RADIX_FORALL macro for this purpose, and you can decide how you feel about it.
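A sketch of how such a walk could assert density (the macro shape, the KASSERT, and the free_pgtbl_page() helper are assumptions, not the committed code):

```
struct pctrie_iter pages;
vm_page_t m;
vm_pindex_t expect = 0;

vm_page_iter_init(&pages, obj);
VM_RADIX_FORALL(m, &pages) {
	/* The object is expected to be fully populated: no holes. */
	KASSERT(m->pindex == expect,
	    ("missing page at pindex %ju", (uintmax_t)expect));
	expect++;
	free_pgtbl_page(m);	/* hypothetical per-page work */
}
```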

Use VM_RADIX_FORALL_FROM in one more place.

Use VM_RADIX_FORALL in kcov_free.

Abandon VM_ALLOC_NOCREAT_LT.

>> Are there still some concerns about the overhead of switching to iterators in some of the already-committed changes? I've lost track a bit, I'm sorry.

> @alc has concerns. He has reported that performance worsened a bit with some of the page-management changes already made, and with some other changes that have been under review for a while.

Do we have a benchmark to experiment with, either a workload to test the kernel or a standalone userspace program embedding subr_pctrie.o? I wrote a patch which tries to add a fast path to pctrie_iter_step/next, wherein pctrie.h inlines iteration when the top-most node in the iterator contains a run of successive leaf values, and falls back to pctrie_iter_lookup_ge() to look up new nodes or handle holes. So far I've only tested it to check correctness, but I wonder if it would help performance in the case where a VM object is densely populated.
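Schematically, the fast path described above might look like this (the helper and the iterator fields are hypothetical placeholders, not the real subr_pctrie.c internals; only pctrie_iter_lookup_ge() is named in the discussion above):

```
static __inline uint64_t *
pctrie_iter_next_fast(struct pctrie_iter *it)
{
	uint64_t *val;

	/*
	 * Fast path: if the cached bottom-level node holds the next
	 * consecutive leaf, return it without walking the trie.
	 */
	val = iter_node_next_leaf(it);		/* hypothetical helper */
	if (__predict_true(val != NULL))
		return (val);
	/* Slow path: a new node is needed, or there is a hole. */
	return (pctrie_iter_lookup_ge(it, it->index + 1));
}
```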

One other performance-related detail I'm wondering about is that we check the iterator limit twice when stepping: once before looking up a leaf, and once after finding one. I believe the latter check is sufficient for correctness. If so, are we making the right tradeoff in checking twice instead of once?
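In pseudocode, the two checks in question (simplified; field and helper names are placeholders):

```
/* Stepping logic, schematically: */
if (it->index + 1 > it->limit)		/* check #1: before the lookup */
	return (NULL);
val = next_leaf(it);			/* hypothetical lookup */
if (val == NULL || leaf_index(val) > it->limit)
	return (NULL);			/* check #2: after finding a leaf */
return (val);
```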

sys/amd64/amd64/pmap.c
7710

I don't have a concrete suggestion to offer at the moment, but here we're doing some unnecessary work: vm_map_pmap_enter(), the sole caller of this function, uses an iterator to find a contiguous run of pindices that correspond to a run of valid pages; then, it passes the bounds of this run to pmap_enter_object(), which uses a separate iterator to re-lookup each page in the run. That seems redundant, and we should perhaps rework the pmap_enter_object() interface to avoid it.

One possibility would be modifying vm_map_pmap_enter() to store a fixed-size array of vm_page pointers on the stack, which gets passed to pmap_enter_object() once full or the end of a run is detected or a superpage is detected.
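A rough sketch of the array idea, as it might appear inside vm_map_pmap_enter() (pmap_enter_pages() is a hypothetical batched interface, not existing code, and the local variables are assumed from context):

```
#define	PMAP_ENTER_BATCH	16

vm_page_t batch[PMAP_ENTER_BATCH];
int n = 0;

VM_RADIX_FOREACH_START(m, &pages, pindex) {
	if (m->pindex != pindex + n || !vm_page_all_valid(m))
		break;		/* end of the contiguous valid run */
	batch[n++] = m;
	if (n == PMAP_ENTER_BATCH) {
		pmap_enter_pages(pmap, addr, batch, n, prot);
		addr += ptoa(n);
		pindex += n;
		n = 0;
	}
}
if (n > 0)
	pmap_enter_pages(pmap, addr, batch, n, prot);
```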

In vm_map.c, have the page index jump ahead by the superpage size when a superpage is encountered. Initialize the cursor before acquiring a read lock.

sys/amd64/amd64/pmap.c
7710

How about if vm_map_pmap_enter stops trying to find a maximal page range between calls to pmap_enter_object, and instead just calls it every time vm_page_all_valid is true, passing along the start and psind values? The pmap lock would be acquired once, before the loop in vm_map_pmap_enter, and released once, and pmap_enter_object wouldn't have a loop at all. Since I don't know what the rwlock is about, I don't know how the potential extra locking/unlocking of the rwlock would affect things.
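Roughly, the restructuring described above (schematic only; the pmap_enter_object() signature shown is the proposed, hypothetical one, not the current interface):

```
PMAP_LOCK(pmap);		/* taken once, outside the loop */
VM_RADIX_FOREACH_START(m, &pages, first_pindex) {
	if (!vm_page_all_valid(m))
		continue;
	/* One call per (super)page; no loop inside the pmap layer. */
	pmap_enter_object(pmap, start + ptoa(m->pindex - first_pindex),
	    m, m->psind, prot);
}
PMAP_UNLOCK(pmap);
```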

Undo the parts of the last change that moved the iterator initialization.

>>> Are there still some concerns about the overhead of switching to iterators in some of the already-committed changes? I've lost track a bit, I'm sorry.

>> @alc has concerns. He has reported that performance worsened a bit with some of the page-management changes already made, and with some other changes that have been under review for a while.

> Do we have a benchmark to experiment with, either a workload to test the kernel or a standalone userspace program embedding subr_pctrie.o? I wrote a patch which tries to add a fast path to pctrie_iter_step/next, wherein pctrie.h inlines iteration when the top-most node in the iterator contains a run of successive leaf values, and falls back to pctrie_iter_lookup_ge() to look up new nodes or handle holes. So far I've only tested it to check correctness, but I wonder if it would help performance in the case where a VM object is densely populated.

I've just been running buildworld tests. I wrote a patch (a part of D47680) that tries to have the next lookup execute straightline code in the common case.

> One other performance-related detail I'm wondering about is that we check the iterator limit twice when stepping: once before looking up a leaf, and once after finding one. I believe the latter check is sufficient for correctness. If so, are we making the right tradeoff in checking twice instead of once?

@alc has suggested that we could modify pctrie nodes to maintain a parent pointer and avoid the iterator array.

> One other performance-related detail I'm wondering about is that we check the iterator limit twice when stepping: once before looking up a leaf, and once after finding one. I believe the latter check is sufficient for correctness. If so, are we making the right tradeoff in checking twice instead of once?

I haven't decoded this yet. Could you provide a bit more detail?

sys/amd64/amd64/pmap.c
7710

Cancel the previous idea.

Can we use p->plinks.s to build a list of the valid pages, and pass that list to pmap_enter_object to avoid the second iterator? Using an array raises the question of how big to make it, whereas using plinks.s avoids that - assuming that someone else isn't using plinks for something while vm_map_pmap_enter is happening. I don't know whether I can safely assume that.

sys/amd64/amd64/pmap.c
7710

At the moment, we cannot use plinks.s: it's overlaid with plinks.q, the fields used to link pages together in the global page queues. The pages mapped here will in general be managed (which among other things means that they are reclaimable and thus are kept in these page queues), so plinks is already in use.

That said, in this diff you remove two pointer-sized fields from struct vm_page, and I would later like to add one back for use in UMA (the kernel memory allocator). We could make use of such a field here: that would not overlap with UMA's use of it.
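For reference, the overlap in question (abridged from sys/vm/vm_page.h; the exact layout varies by version):

```
union {
	TAILQ_ENTRY(vm_page) q;		/* page queue or free list (Q) */
	struct {
		SLIST_ENTRY(vm_page) ss; /* private slists */
	} s;
	struct {
		u_long p;
		u_long v;
	} memguard;
	struct {
		void *slab;
		void *zone;
	} uma;
} plinks;
```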

dougm edited the summary of this revision.
sys/vm/vm_object.h
101

We'll need a __FreeBSD_version bump with this change, to force out-of-tree kernel modules to be recompiled. It's not too unusual for them to access vm_object fields directly. None of the usual suspects access memq though.

sys/vm/vm_page.h
224

Why would that require the current review to land first?

238–239

Is listq still needed?

sys/vm/vm_page.h
224

In order to avoid making struct vm_page larger. My intent would be to remove all uses of listq. Though, this is still a bit tricky in vm_phys.c because of the way we lazily remove pages from the page queues.

sys/vm/vm_page.h
224

But you said that the bytes are added to plinks, which should overlay other members of the union? But never mind.

sys/vm/vm_page.h
238–239

listq is still needed in uma_core.c and vm_phys.c, where pages are not part of objects.

The first version of this patch shrank vm_page by using a union to store listq and (object, pindex) in the same space, and that used to work. It no longer does, and I plan to make it work again after this gets committed.
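A hedged sketch of that space-saving union (hypothetical layout; the original patch's exact arrangement isn't shown here). The idea is that listq is only used while a page is not owned by an object, so the two can share storage:

```
union {
	TAILQ_ENTRY(vm_page) listq;	/* when not owned by an object */
	struct {
		vm_object_t object;	/* owning object */
		vm_pindex_t pindex;	/* index within that object */
	};
};
```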

Always make the iter pointer the last argument to any function. Suggested offline by @alc.

In page.c, reorder more param lists to make the iterator appear last.