Again, a naive attempt. This would be broken into three commits, and it seems to work in userland, at least.
Provide *cmpset_{8,16} that proxy through to atomic_fcmpset_32. Initial users will be mips and sparc64, with hopes to wean mips off of it later.
I initially tried to provide these in atomic_common behind an #ifdef PLATFORM_NEEDS_SUBWORD_OPS, but the current approach ends up being cleaner with only two archs using it. It also leaves open the possibility of a KASSERT if we want one, since systm.h needs bits from machine/atomic.h.
Most of the diff in machine/atomic.h is refactoring of the definitions of the _acq/_rel versions.
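For reviewers who want the general shape of the proxying without reading the whole diff, here is a rough userland sketch of an 8/16-bit cmpset emulated with a 32-bit compare-and-swap loop. It is not the code in this diff: it uses C11 <stdatomic.h> in place of atomic_fcmpset_32, the sketch_* names are hypothetical, the assert() only stands in for the kind of check a KASSERT could do, and the byte-order test relies on the GCC/Clang __BYTE_ORDER__ macro (little-endian is assumed when it is absent).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not from the diff: compare-and-set a 1- or 2-byte
 * field by looping a 32-bit CAS on the aligned word that contains it.
 */
static bool
sketch_cmpset_subword(volatile void *p, unsigned size, uint32_t cmpval,
    uint32_t newval)
{
	uintptr_t addr = (uintptr_t)p;
	unsigned off = (unsigned)(addr & 3);
	_Atomic uint32_t *wordp = (_Atomic uint32_t *)(addr - off);
	uint32_t fieldmask = (size == 1) ? 0xffu : 0xffffu;
	uint32_t mask, old, new;
	unsigned shift;

	/* The field must be naturally aligned so it cannot straddle words. */
	assert((size == 1 || size == 2) && off % size == 0);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	shift = (4 - size - off) * 8;
#else
	shift = off * 8;	/* assumes little-endian otherwise */
#endif
	mask = fieldmask << shift;

	old = atomic_load(wordp);
	do {
		/* Fail if the subword itself does not match. */
		if (((old & mask) >> shift) != (cmpval & fieldmask))
			return (false);
		/* Splice the new subword into the untouched neighbours. */
		new = (old & ~mask) | ((newval & fieldmask) << shift);
		/* On failure, old is refreshed (fcmpset-style), so retry. */
	} while (!atomic_compare_exchange_weak(wordp, &old, new));
	return (true);
}

static bool
sketch_cmpset_8(volatile uint8_t *p, uint8_t cmpval, uint8_t newval)
{
	return (sketch_cmpset_subword(p, 1, cmpval, newval));
}

static bool
sketch_cmpset_16(volatile uint16_t *p, uint16_t cmpval, uint16_t newval)
{
	return (sketch_cmpset_subword(p, 2, cmpval, newval));
}
```

The retry loop matters because a neighbouring byte in the same word may change between the load and the CAS; in that case the 32-bit CAS fails even though the target subword still matches, and the operation has to be retried instead of reported as a failure.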