amd64 pmap: implement per-superpage locks
The current lock array, fixed at 256 entries, is a problem in the following ways:
- it's way too small
- there are 2 locks per cacheline
- it is not NUMA-aware
Solve these issues by introducing per-superpage locks backed by pages
allocated from respective domains.
This significantly reduces contention, e.g. during poudriere -j 104.
See the review for results.
Reviewed by: kib
Discussed with: jeff
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21833