for those which definitely use the membar to sync with interrupt handlers.
The libc and rtld uses of __compiler_membar() seem to want genuine compiler barriers.
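To illustrate the distinction: syncing with a handler that interrupts the same CPU only needs the compiler to keep accesses in order, which standard C11 spells atomic_signal_fence(). The sketch below is a user-space analogue, assuming a signal handler delivered on the same thread as a stand-in for an interrupt handler. All names (shared_data, ready, seen, handler, publish, run_demo) are hypothetical and are not the identifiers used in the patch.

```c
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>

/* Hypothetical names: user-space stand-ins for kernel state and an
 * interrupt handler running on the same CPU (here: the same thread). */
static atomic_int shared_data;   /* payload published to the handler */
static atomic_int ready;         /* flag: payload is valid */
static atomic_int seen = -1;     /* value observed by the handler */

static void handler(int sig)
{
	(void)sig;
	if (atomic_load_explicit(&ready, memory_order_relaxed)) {
		/* Pairs with the release fence in publish(). */
		atomic_signal_fence(memory_order_acquire);
		atomic_store_explicit(&seen,
		    atomic_load_explicit(&shared_data, memory_order_relaxed),
		    memory_order_relaxed);
	}
}

static void publish(int value)
{
	atomic_store_explicit(&shared_data, value, memory_order_relaxed);
	/* Orders the payload store before the flag store as observed by a
	 * handler on this thread; in practice no CPU fence is emitted. */
	atomic_signal_fence(memory_order_release);
	atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

static int run_demo(int value)
{
	atomic_store(&ready, 0);
	atomic_store(&seen, -1);
	signal(SIGUSR1, handler);
	publish(value);
	raise(SIGUSR1);   /* the handler runs before raise() returns */
	return atomic_load(&seen);
}
```

A plain compiler barrier (GCC/Clang inline asm with a "memory" clobber) would give the same code generation here, but atomic_signal_fence() states the intent, ordering against a same-thread handler, in a form the C memory model defines.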
There are two special cases:
- the kpilite sched_unpin_lite() fence after the td_pinned decrement; I am not sure whether we need it at all.
- x86/include/bus.h (not handled in the patch), where I am not sure why we need a seq_cst fence at all for bus_space_barrier().