amd64: stop using top of the thread's kernel stack for FPU user save area
MFC note: this commit changes the layout of td_md on amd64, which causes
the static checks for the struct thread ABI in kern_thread.c to fail.
The next two commits restore the layout, so I decided not to
overcomplicate the merge with work that would be overwritten immediately.
(cherry picked from commit df8dd6025af88a99d34f549fa9591a9b8f9b75b1)