The compiler builtin ffs/ctz functions expand to optimized instruction
sequences when possible, and otherwise fall back to calling a function
provided by the compiler runtime library. We have slowly shifted our
platforms to take advantage of these builtins in 60645781d613 (arm64),
1c76d3a9fbef (arm), 9e319462a03a (powerpc, partial), D40594 (riscv,
proposed).

Some platforms still rely on the libkern implementations of these
functions, namely riscv, powerpc (ffs*, flsll), and i386 (ffsll and
flsll). These routines are slow, as they perform a linear search for
the bit in question. Even on platforms lacking dedicated bit-search
instructions, such as riscv, the compiler library will provide
better-optimized routines, e.g. by using binary search.
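
To illustrate the difference, here is a minimal sketch comparing a
linear search, a binary search, and the builtin. The names ffs_linear,
ffs_binary and ffs_builtin are hypothetical and are not the libkern or
compiler runtime sources; the binary-search variant assumes a 32-bit
int.

```c
/*
 * Hypothetical linear-search ffs(): walk up from the least significant
 * bit. Returns a 1-based bit index, or 0 if no bit is set.
 */
static int
ffs_linear(int mask)
{
	unsigned int m = (unsigned int)mask;
	int bit;

	if (m == 0)
		return (0);
	for (bit = 1; (m & 1) == 0; bit++)
		m >>= 1;
	return (bit);
}

/*
 * Hypothetical binary-search ffs(): halve the candidate range at each
 * step, so at most five tests are needed. Assumes a 32-bit int.
 */
static int
ffs_binary(int mask)
{
	unsigned int m = (unsigned int)mask;
	int bit = 1;

	if (m == 0)
		return (0);
	if ((m & 0xffff) == 0) { m >>= 16; bit += 16; }
	if ((m & 0xff) == 0) { m >>= 8; bit += 8; }
	if ((m & 0xf) == 0) { m >>= 4; bit += 4; }
	if ((m & 0x3) == 0) { m >>= 2; bit += 2; }
	if ((m & 0x1) == 0) { bit += 1; }
	return (bit);
}

/*
 * Builtin version: the compiler emits a bit-search instruction where
 * one exists, or a call into its runtime library otherwise.
 */
static int
ffs_builtin(int mask)
{
	return (__builtin_ffs(mask));
}
```

All three return the same values; only the number of steps differs.
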

Consolidate the existing builtin implementations in sys/libkern.h, while
allowing machine/cpufunc.h to provide a machine-dependent implementation
of any individual function. amd64 and i386 make use of this to provide
fls* using the bsrl/bsrq instructions.
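
A rough sketch of how such an override arrangement could look follows.
The example_* names and the EXAMPLE_HAVE_MD_FLS flag are hypothetical
and chosen for illustration; the mechanism actually used by
sys/libkern.h and machine/cpufunc.h may differ.

```c
/* Stand-in for machine/cpufunc.h; all names here are hypothetical. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define	EXAMPLE_HAVE_MD_FLS	/* hypothetical "MD version present" flag */
static inline int
example_fls(int mask)
{
	unsigned int m = (unsigned int)mask;
	int result;

	/* bsrl leaves its destination undefined for a zero source. */
	if (m == 0)
		return (0);
	__asm__("bsrl %1,%0" : "=r" (result) : "rm" (m) : "cc");
	return (result + 1);
}
#endif

/* Stand-in for sys/libkern.h: generic builtin-based defaults. */
static inline int
example_ffs(int mask)
{
	return (__builtin_ffs(mask));
}

#ifndef EXAMPLE_HAVE_MD_FLS
static inline int
example_fls(int mask)
{
	/* __builtin_clz(0) is undefined, so handle zero explicitly. */
	if (mask == 0)
		return (0);
	return ((int)(8 * sizeof(mask)) -
	    __builtin_clz((unsigned int)mask));
}
#endif
```

Callers see one example_fls() regardless of which branch supplied it.
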

One wart in all of this is the set of existing HAVE_INLINE_F*** macros,
which we use in a few places to conditionally avoid the slow libkern
routines. These aren't easily removed in one commit. For now, provide
these defines unconditionally, but mark them for removal after
subsequent cleanup.

Removal of the now-unused libkern routines will follow in a separate
commit.