Instead do one more allocation at the thread creation time. This frees a lot of space on the stack.
Also do not use alloca() for temporal storage in signal delivery sendsig() function and signal return syscall sys_sigreturn(). This saves equal amount of space, again by the cost of one more allocation at the thread creation time.
amd64: eliminate td_md.md_fpu_scratch
For signal send, copyout from the user FPU save area directly.
For sigreturn, we are in sleepable context and can do temporal allocation of the transient save area. We cannot copying from userspace directly to user save area because XSAVE state needs to be validated, also partial copyins can corrupt it.
amd64: move signal handling and register structures manipulations into sig_machdep.c from machdep.c which is too large pile of unrelated things. Some ptrace functions are moved from machdep.c to ptrace_machdep.c.
linux32: add a hack to avoid redefining the type of the savefpu tag when compiling in amd64 kernel environment with -m32. This is a temporal workaround for some future proper (but unclear) fix.
Per-commit view is available at https://kib.kiev.ua/git/gitweb.cgi?p=deviant3.git;a=shortlog;h=refs/heads/alloca