The overall goal of the change is to reduce the amount of work done in
locore assembly, and defer as much as possible until pmap_bootstrap().
Currently, half the setup is done in assembly, and then we pass the l1pt
address to pmap_bootstrap() where it is amended with other mappings.
Inspiration and understanding has been taken from amd64's
create_pagetables() routine, and I try to present the page table
construction in the same way: a linear procedure with commentary
explaining what we are doing and why. Thus the core of the new
implementation is contained in pmap_create_pagetables().
Once pmap_create_pagetables() has finished, we switch to the new
pagetable root and leave the bootstrap ones created by locore behind,
resulting in a minimal 8kB of wasted space.
Having the whole procedure in one place, in C code, allows it to be more
easily understood, while also making it more amenable to future changes
which may depend on CPU feature/errata detection.
Note that with this change the size of the early devmap is bumped up
from one to four L2 pages (8MB).