Calling vm_phys_scan_contig in a loop in vm_page_reclaim_contig_domain_ext duplicates the work of searching for the appropriate segment needlessly. Break up vm_phys_scan_contig into pieces and invoke them with an awareness of the march through segments, to avoid that extra computation.
Details
- Reviewers
alc markj manu - Commits
- rG9e8174289236: vm_phys: add binary segment search
Can build and run kernel.
Diff Detail
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
sys/vm/vm_page.c | ||
---|---|---|
3050 | I assume this refers to vm_page_found_range. Perhaps. But this vm_page.c already includes several references to vm_phys segments, so I assumed that using it here was okay. I will consider moving the vm_phys segments code to vm_phys.c. |
Move some phys stuff back into vm_phys.c. Make the binary search for the segment containing an address more generally used.
Have vm_phys_found_range return -1 for failure, so that vm_page_reclaim_contig_domain_ext need not access vm_phys_nsegs.
sys/vm/vm_page.c | ||
---|---|---|
3104 | A couple of these lines appear to be longer than 80 columns. | |
sys/vm/vm_phys.c | ||
903 | I believe this code is assuming that pa is either within the bounds of a segment, or past the end of the highest segment, but I don't see why that's true. In general the vm_phys segments of a given system will have holes in between each other. | |
sys/vm/vm_phys.h | ||
45 | Style: please include a newline after includes. |
Fix style problems. Resolve a merge conflict.
sys/vm/vm_phys.c | ||
---|---|---|
903 | If pa is before the start of the first segment, then segind is 0 and pa < seg->start, so we return NULL. If pa is between the first and second segments, then segind is 1, and pa < seg->start, so we return NULL. And so on. |
Define vm_phys_paddr_lookup to do the binary search based on 'end', without promising to check 'start', and vm_phys_paddr_to_seg, which checks 'start' too and returns a vm_phys_seg*. Use the latter in vm_phys_paddr_to_vm_page, and also in _pa_to_pmdp in arm64/pmap.c.
The arm64 code for the binary search is
dd4: 90000008 adrp x8, 0x0 <vm_phys_paddr_to_vm_page+0x8> dd8: 2a1f03e9 mov w9, wzr ddc: b940010a ldr w10, [x8] de0: 90000008 adrp x8, 0x0 <vm_phys_paddr_to_vm_page+0x14> de4: 91000108 add x8, x8, #0 de8: 3400018a cbz w10, 0xe18 <vm_phys_paddr_to_vm_page+0x4c> dec: 5280070b mov w11, #56 df0: 2a0a03ec mov w12, w10 df4: 4b09018d sub w13, w12, w9 df8: 0b4d052d add w13, w9, w13, lsr #1 dfc: 9bab21ae umaddl x14, w13, w11, x8 e00: f94005ce ldr x14, [x14, #8] e04: eb0001df cmp x14, x0 e08: 1a8d8529 csinc w9, w9, w13, hi e0c: 1a8c81ac csel w12, w13, w12, hi e10: 6b09019f cmp w12, w9 e14: 54ffff01 b.ne 0xdf4 <vm_phys_paddr_to_vm_page+0x28>
The inner loop for the current code also consists of 9 instructions
df0: 9100e129 add x9, x9, #56 df4: f1000508 subs x8, x8, #1 df8: 540001c0 b.eq 0xe30 <vm_phys_paddr_to_vm_page+0x64> dfc: f85f012a ldur x10, [x9, #-16] e00: eb00015f cmp x10, x0 e04: 54ffff68 b.hi 0xdf0 <vm_phys_paddr_to_vm_page+0x24> e08: f85f812b ldur x11, [x9, #-8] e0c: eb00017f cmp x11, x0 e10: 54ffff09 b.ls 0xdf0 <vm_phys_paddr_to_vm_page+0x24>
sys/vm/vm_phys.h | ||
---|---|---|
82–83 | Move this up, so as not to introduce another unsorted declaration. | |
115 | I would suggest vm_phys_lookup_segind. | |
117 | Use u_int. | |
121 | The compiler generates the same code for this as it would for lo != hi. |
Change from "== NULL" to "!= NULL" checks in a couple of functions to match what's already there and optimize the common case.
The change looks correct to me.
sys/vm/vm_phys.c | ||
---|---|---|
1248–1250 | I would either permit domain == -1 to mean that the caller doesn't care to specify a particular NUMA domain, or assert that domain >= 0 && domain < vm_ndomain. Probably the latter is fine for now. | |
sys/vm/vm_phys.h | ||
133 | I have access to an Ampere Altra with 256GB of RAM. Its vm_phys_seg array looks like this: https://reviews.freebsd.org/P578 In particular, almost all of the physical memory belongs to segment 4. I wonder if it would be possible and profitable to precompute an initial "mid"-point closer to the likely result, so as to reduce the average number of iterations. That is, start by looking at the segment in the middle of the physical address space as determined by the number of pages rather than the number of segments. |