Page MenuHomeFreeBSD

amd64 pmap: account for the top-level pages
ClosedPublic

Authored by kib on Oct 20 2021, 1:06 AM.
Tags
None
Referenced Files
F108429426: D32569.id.diff
Fri, Jan 24, 5:18 PM
Unknown Object (File)
Wed, Jan 22, 5:21 PM
Unknown Object (File)
Wed, Jan 22, 5:20 PM
Unknown Object (File)
Sat, Jan 18, 10:08 PM
Unknown Object (File)
Sat, Jan 18, 9:58 PM
Unknown Object (File)
Sat, Jan 18, 5:47 PM
Unknown Object (File)
Tue, Jan 14, 7:26 AM
Unknown Object (File)
Mon, Jan 13, 8:04 PM
Subscribers

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kib requested review of this revision.Oct 20 2021, 1:06 AM

I don't see any problem with this, but I'm not sure why it's preferable either. Now amd64 is inconsistent with other pmaps.

sys/amd64/amd64/pmap.c
1505

pmap_alloc_pt_page() updates user_pt_page_count even when pmap == NULL, so I think this is double-counting.

4369

Some tracing[*] suggests that it is still useful, I presume because some applications do not fault in a mapping of the shared page. For instance, during a buildkernel, ~15-20% of pmap_remove() calls during an exit terminate immediately because of the optimization. A large majority of pmap_remove() calls during execve also benefit from the early termination. I have no idea what the practical impact is, though, since the pmap lock will be uncontended in these paths. But this diff completely nullifies the pmap_remove() optimization.

Could we instead consider explicitly unmapping the shared page during execve and exit?

\* something like dtrace -n 'fbt::pmap_remove:entry {@[stack(), args[0]->pm_stats.resident_count == 0] = count()}'

Fix double-accounting
Improve pmap_remove() check to allow the top level page table pages be accounted
Remove shared page(s) from vm_map manually before destroying the vmspace, on exec and exit

sys/amd64/amd64/pmap.c
6281
6282
6283

Is the point that kernel_pmap->pm_stats.resident_count > 2 always?

sys/kern/kern_exit.c
422

If multiple processes are sharing the vmspace, vmspace_exit() will only drop a reference. So I think this operation needs to happen in vmspace_exit() after the last ref is dropped.

Then, we have the following code in both vmspace_dofree() and exec_new_vmspace():

if (p->p_sysent->sv_shared_page_obj != NULL)
    vm_map_remove(map, p->p_sysent->sv_shared_page_base, p->p_sysent->sv_shared_page_len);
shmexit(vmspace);
vm_map_remove(map, vm_map_min(map), vm_map_max(map));

Should it be lifted into a new vmspace_clear() helper function? I do not think the ordering of the pmap_remove_pages() call is important so long as it happens before the final vm_map_remove() call.

Finally, instead of using vm_map_remove() to unmap the shared page, can we simply use pmap_remove()? That will be marginally cheaper since it avoids locking the map or performing a splay lookup, but it should still be sufficient.

kib marked 5 inline comments as done.Oct 21 2021, 2:42 PM

I am wondering if it makes sense to add pmap flag to indicate that it is dying and disable invalidation IPIs until either pmap_release() or specific call to re-enable them.

sys/kern/kern_exit.c
422

I structured it differently, mostly I want the shared page remap before the shuffle of the pmap activations and reference handling.

exec_free_abi_mappings()
update comment in pmap_remove() about optimization

I wrote a little will-it-scale benchmark to try and evaluate this. The loop forks(), and the child touches the shared page and then exits. The parent waits for the child before creating another child.

With a single instance of the loop, the loop rate drops from 5750Hz to 5430Hz when I disable the optimization. The parent process has 60 map entries, almost all of which are backed by VM objects.

I also tried changing the test to not touch the shared page, and measured the rate with and without exec_free_abi_mappings(), and I don't see a significant difference.

If I run NCPU instances of the loop, the benchmark is bottlenecked by pv list lock contention in pmap_remove_pages(), and the optimization has a smaller but still measurable effect on throughput.

In D32569#735609, @kib wrote:

I am wondering if it makes sense to add pmap flag to indicate that it is dying and disable invalidation IPIs until either pmap_release() or specific call to re-enable them.

I was wondering that too. But, why would we raise IPIs when the exiting/execing process is guaranteed to be single-threaded?

sys/amd64/amd64/pmap.c
6288

I suspect that 1 + (pmap->pm_pmltopu != NULL ? 1 : 0) is a bit more idiomatic in the kernel, but this is just a suggestion.

sys/kern/kern_exec.c
1074
1075
1077

Maybe "no work will be left for pmap_remove(min, max)."

1087

I believe this isn't required for correctness now that we are removing the mapping with pmap_remove() instead of vm_map_remove(). Other processes sharing the pmap will incur a soft fault the next time they access the shared page. Just an observation.

1094

If a process has not mapped the shared page, then this becomes a pessimization. As noted elsewhere, this does happen for some commonly used applications. I think one pmap_remove() call is not likely to be significant.

This revision is now accepted and ready to land.Oct 26 2021, 5:01 PM
kib marked 7 inline comments as done.Oct 26 2021, 9:02 PM
kib added inline comments.
sys/kern/kern_exec.c
1087

It should be still a minor optimization. In particular, avoiding the fault.

1094

You mean, not 'not mapped', but 'not faulted'. I am not sure that there is anything to do, at least I cannot think about fast way to check for the presence of the installed mapping that would not incur the cost of pmap lock.

kib marked 2 inline comments as done.

Use ? 1 : 0 instead of casting bool to int.
Adjust comment.

This revision now requires review to proceed.Oct 26 2021, 9:06 PM
This revision was not accepted when it landed; it landed in state Needs Review.Oct 28 2021, 7:02 PM
This revision was automatically updated to reflect the committed changes.