Rather than maintain our own implementations for each architecture, we can rely
on the compiler builtins to emit appropriate instructions or optimized libcalls
as available on each platform.
This conversion was done recently for x86/endian.h in e6ff6154d203.
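
For illustration, a minimal sketch of the approach, not the actual header
contents. The example_* names are hypothetical, and it assumes a GCC- or
Clang-compatible compiler that provides the __builtin_bswap16/32/64 builtins:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical wrappers: the compiler picks the best byte-swap sequence
 * for the target (for example a single rev on armv6+), so no per-arch
 * inline assembly is needed.
 */
static inline uint16_t
example_bswap16(uint16_t x)
{
	return (__builtin_bswap16(x));
}

static inline uint32_t
example_bswap32(uint32_t x)
{
	return (__builtin_bswap32(x));
}

static inline uint64_t
example_bswap64(uint64_t x)
{
	return (__builtin_bswap64(x));
}

/* Small self-check showing the expected byte order reversal. */
static inline void
example_bswap_selfcheck(void)
{
	assert(example_bswap16(0x1122) == 0x2211);
	assert(example_bswap32(0x11223344U) == 0x44332211U);
	assert(example_bswap64(0x1122334455667788ULL) ==
	    0x8877665544332211ULL);
}
```

With constant arguments the compiler can typically fold these calls at
compile time, which hand-written inline assembly generally prevents.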
Overall this should result in similar if not identical code generation. The
notable exception is arm, where we can now take advantage of the rev
instruction, available on armv6+. On arm64, we end up saving one instruction,
due to PR 236920.
While here, add the comments present in other headers where they were
missing, and drop the armeb remnants.
Presented as one review, although I expect to commit each file separately.