Details

Reviewers

alc
markj

Commits

rG1c69690319c5: Unmap shared page manually before doing vm_map_remove() on exit or exec
rG0b3bc7288984: amd64 pmap: adjust the empty pmap optimization in pmap_remove()
rGe93b5adb6bb8: amd64 pmap: account for the top-level pages

Summary

both for kernel and user page tables, the later exist in the PTI case.

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

kib created this revision.Oct 20 2021, 1:06 AM

Herald added a subscriber: imp. · View Herald TranscriptOct 20 2021, 1:06 AM

kib requested review of this revision.Oct 20 2021, 1:06 AM

Rebase

kib added reviewers: alc, markj.Oct 20 2021, 1:30 AM

Upload the right patch

I don't see any problem with this, but I'm not sure why it's preferable either. Now amd64 is inconsistent with other pmaps.

sys/amd64/amd64/pmap.c
1505	pmap_alloc_pt_page() updates user_pt_page_count even when pmap == NULL, so I think this is double-counting.
4369	Some tracing[] suggests that it is still useful, I presume because some applications do not fault in a mapping of the shared page. For instance, during a buildkernel, ~15-20% of pmap_remove() calls during an exit terminate immediately because of the optimization. A large majority of pmap_remove() calls during execve also benefit from the early termination. I have no idea what the practical impact is, though, since the pmap lock will be uncontended in these paths. But this diff completely nullifies the pmap_remove() optimization. Could we instead consider explicitly unmapping the shared page during execve and exit? \ something like `dtrace -n 'fbt::pmap_remove:entry {@[stack(), args[0]->pm_stats.resident_count == 0] = count()}'`

Fix double-accounting
Improve pmap_remove() check to allow the top level page table pages be accounted
Remove shared page(s) from vm_map manually before destroying the vmspace, on exec and exit

markj added inline comments.Oct 21 2021, 12:45 PM

sys/amd64/amd64/pmap.c
6281
6282
6283	Is the point that kernel_pmap->pm_stats.resident_count > 2 always?
sys/kern/kern_exit.c
424 ↗	(On Diff #97187)	If multiple processes are sharing the vmspace, vmspace_exit() will only drop a reference. So I think this operation needs to happen in vmspace_exit() after the last ref is dropped. Then, we have the following code in both vmspace_dofree() and exec_new_vmspace(): if (p->p_sysent->sv_shared_page_obj != NULL) vm_map_remove(map, p->p_sysent->sv_shared_page_base, p->p_sysent->sv_shared_page_len); shmexit(vmspace); vm_map_remove(map, vm_map_min(map), vm_map_max(map)); Should it be lifted into a new vmspace_clear() helper function? I do not think the ordering of the pmap_remove_pages() call is important so long as it happens before the final vm_map_remove() call. Finally, instead of using vm_map_remove() to unmap the shared page, can we simply use pmap_remove()? That will be marginally cheaper since it avoids locking the map or performing a splay lookup, but it should still be sufficient.

I am wondering if it makes sense to add pmap flag to indicate that it is dying and disable invalidation IPIs until either pmap_release() or specific call to re-enable them.

sys/kern/kern_exit.c
424 ↗	(On Diff #97187)	I structured it differently, mostly I want the shared page remap before the shuffle of the pmap activations and reference handling.

exec_free_abi_mappings()
update comment in pmap_remove() about optimization

cy added a subscriber: cy.Oct 21 2021, 4:19 PM

I wrote a little will-it-scale benchmark to try and evaluate this. The loop forks(), and the child touches the shared page and then exits. The parent waits for the child before creating another child.

With a single instance of the loop, the loop rate drops from 5750Hz to 5430Hz when I disable the optimization. The parent process has 60 map entries, almost all of which are backed by VM objects.

I also tried changing the test to not touch the shared page, and measured the rate with and without exec_free_abi_mappings(), and I don't see a significant difference.

If I run NCPU instances of the loop, the benchmark is bottlenecked by pv list lock contention in pmap_remove_pages(), and the optimization has a smaller but still measurable effect on throughput.

In D32569#735609, @kib wrote:

I am wondering if it makes sense to add pmap flag to indicate that it is dying and disable invalidation IPIs until either pmap_release() or specific call to re-enable them.

I was wondering that too. But, why would we raise IPIs when the exiting/execing process is guaranteed to be single-threaded?

sys/amd64/amd64/pmap.c
6284	I suspect that `1 + (pmap->pm_pmltopu != NULL ? 1 : 0)` is a bit more idiomatic in the kernel, but this is just a suggestion.
sys/kern/kern_exec.c
1074 ↗	(On Diff #97210)
1075 ↗	(On Diff #97210)
1077 ↗	(On Diff #97210)	Maybe "no work will be left for pmap_remove(min, max)."
1087 ↗	(On Diff #97210)	I believe this isn't required for correctness now that we are removing the mapping with pmap_remove() instead of vm_map_remove(). Other processes sharing the pmap will incur a soft fault the next time they access the shared page. Just an observation.
1094 ↗	(On Diff #97210)	If a process has not mapped the shared page, then this becomes a pessimization. As noted elsewhere, this does happen for some commonly used applications. I think one pmap_remove() call is not likely to be significant.

This revision is now accepted and ready to land.Oct 26 2021, 5:01 PM

kib marked 7 inline comments as done.Oct 26 2021, 9:02 PM

kib added inline comments.

sys/kern/kern_exec.c
1087 ↗	(On Diff #97210)	It should be still a minor optimization. In particular, avoiding the fault.
1094 ↗	(On Diff #97210)	You mean, not 'not mapped', but 'not faulted'. I am not sure that there is anything to do, at least I cannot think about fast way to check for the presence of the installed mapping that would not incur the cost of pmap lock.