The td_ast member is an int so only 4 bytes, yet we were using an 8 byte
load and thus also got td_inhibitors in the upper bits. The code prior
to the commit that introduced td_ast did also do a bogus 8 byte load of
td_flags but masked the flags so arguably was correct, if dodgy. Now
that we're using the right width for the load we can also fold the
immediate offset back into the load; because td_ast is at an odd
multiple of 4 bytes from the start of struct thread the normal scaled
load couldn't be used with such an immediate offset when doing an 8 byte
load due to its limited immediate range, but we can use a scaled load
once more now that the offset is a multiple of the load width.
Fixes: c6d31b8306eb ("AST: rework")