In cpu_fork, we allocate sizeof(struct pcb) + sizeof(struct trapframe)
bytes on the stack, then round the total for stack alignment. This not
only omits the space needed for the saved TP, but also fails to round
the trapframe allocation itself up to the stack alignment, whereas
TF_SIZE does, and TF_SIZE is what fork_trampoline and
cpu_exception_handler expect. Since sizeof(struct pcb) +
sizeof(struct trapframe) is a multiple of 16, the rounding adds
nothing, so the saved TP ends up in the PCB's pcb_sp (with the intended
trapframe padding aliasing pcb_ra). This is probably harmless in
practice, as the PCB isn't expected to be current, but it is definitely
not intended.

In cpu_thread_alloc, we do include the 8 bytes for TP and then align
the total to the stack alignment. This again omits the trapframe
padding that TF_SIZE includes, but sizeof(struct pcb) +
sizeof(struct trapframe) happens to be a multiple of 16, as above, so
adding 8 and then rounding to the stack alignment (16) adds a further 8
bytes of padding. That gives the right result for the wrong reason, and
could be broken by future struct growth.
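
The arithmetic in the two cases above can be checked with a small
standalone sketch. The sizes used here (664 bytes for struct pcb, 280
for struct trapframe) are assumptions chosen for illustration so that
their sum is a multiple of 16 while the trapframe alone is not; they
are not read from the headers, and the helper names are hypothetical.
The sketch also assumes the top of the kernel stack is 16-byte aligned
and that the PCB sits at the very top, with pcb_ra at offset 0 and
pcb_sp at offset 8.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_ALIGN	16u	/* stack alignment stated in the text */
#define ROUNDUP2(x, a)	(((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Assumed sizes, for illustration only. */
#define PCB_SIZE	664u
#define TRAPFRAME_SIZE	280u
#define TP_SLOT_SIZE	8u

/* The trapframe size rounded up to the stack alignment. */
#define TF_SIZE		ROUNDUP2(TRAPFRAME_SIZE, STACK_ALIGN)

static_assert((PCB_SIZE + TRAPFRAME_SIZE) % STACK_ALIGN == 0,
    "the example relies on the sum being a multiple of 16");

/* Bytes reserved below the stack top by the old cpu_fork calculation. */
static size_t
fork_reserved(void)
{
	return (ROUNDUP2(PCB_SIZE + TRAPFRAME_SIZE, STACK_ALIGN));
}

/* Bytes reserved by the old cpu_thread_alloc calculation. */
static size_t
thread_alloc_reserved(void)
{
	return (ROUNDUP2(PCB_SIZE + TP_SLOT_SIZE + TRAPFRAME_SIZE,
	    STACK_ALIGN));
}

/*
 * Offset of the saved TP slot (trapframe address + TF_SIZE) relative to
 * the start of the PCB. A non-negative value lands inside the PCB; a
 * negative value is below it.
 */
static long
tp_slot_offset_from_pcb(size_t reserved)
{
	return ((long)TF_SIZE + (long)PCB_SIZE - (long)reserved);
}
```

Under these assumptions the cpu_fork layout puts the TP slot 8 bytes
into the PCB (the assumed position of pcb_sp), while the
cpu_thread_alloc layout puts it 8 bytes below the PCB.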

Extract the calculation into a shared function that rounds correctly
regardless of the struct layouts. Also introduce a new struct kernframe
to encapsulate and clearly document this shared state rather than
relying on magic constants, and to make it easy to extend in future, as
we intend to do downstream in CheriBSD.
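
One way such a layout-independent calculation could look is sketched
below as a standalone program fragment, not the kernel code itself. The
single kf_tp member, the struct layout type and the kstack_layout helper
are assumptions made for illustration; only the name struct kernframe
and the 16-byte alignment come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_ALIGN	((uintptr_t)16)

/* Assumed contents: just the saved TP. */
struct kernframe {
	uint64_t	kf_tp;
};

/* Hypothetical result type: addresses of the three regions. */
struct layout {
	uintptr_t	pcb;
	uintptr_t	kernframe;
	uintptr_t	trapframe;
};

static uintptr_t
align_down(uintptr_t addr)
{
	return (addr & ~(STACK_ALIGN - 1));
}

static size_t
round_up(size_t size)
{
	return ((size + (STACK_ALIGN - 1)) & ~(size_t)(STACK_ALIGN - 1));
}

/*
 * Place the PCB at the top of the stack, the kernframe at the highest
 * aligned address that fits below the PCB, and the trapframe a whole
 * aligned trapframe size below that. Each piece is rounded on its own,
 * so the result does not depend on the sizes happening to add up.
 */
static struct layout
kstack_layout(uintptr_t stack_top, size_t pcb_size, size_t trapframe_size)
{
	struct layout l;

	l.pcb = stack_top - pcb_size;
	l.kernframe = align_down(l.pcb - sizeof(struct kernframe));
	l.trapframe = l.kernframe - round_up(trapframe_size);

	/* The kernframe must never overlap the PCB. */
	assert(l.kernframe + sizeof(struct kernframe) <= l.pcb);
	return (l);
}
```

Because the kernframe address is aligned before the rounded trapframe
size is subtracted, both addresses stay 16-byte aligned even if either
struct later grows by 8 bytes.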