The *_keydiff implementations iterate to find the level at which two keys differ. By using flsl(), they can compute that level directly, without a loop.
If flsl() is itself implemented as a loop, though, this is a loss. Of particular concern is the behavior on riscv, where the standard loop implementations of ffs, fls, etc. appear to be in place. The change proposed at https://reviews.freebsd.org/D40594 seems like it ought to replace the loops with __builtin_ffs and the like, but after building a riscv kernel with that change I don't see the obvious benefit (ctz and clz opcodes in the objdump output).
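For illustration, the difference between the iterative and flsl()-based forms can be sketched roughly as follows. This is a minimal sketch, not the code under review: SKETCH_WIDTH, SKETCH_LIMIT, sketch_flsl(), keydiff_loop() and keydiff_fls() are hypothetical names, the level width is an arbitrary choice, and sketch_flsl() stands in for flsl() by way of the compiler's __builtin_clzl so that the example builds outside the kernel.

```c
#include <limits.h>

/* Hypothetical parameters, not the real trie constants. */
#define SKETCH_WIDTH	4	/* key bits consumed per trie level */
#define SKETCH_LIMIT	\
	((int)(sizeof(unsigned long) * CHAR_BIT / SKETCH_WIDTH) - 1)

/*
 * Stand-in for flsl(): 1-based index of the most significant set bit,
 * or 0 when no bit is set.
 */
static inline int
sketch_flsl(unsigned long x)
{
	return (x == 0 ? 0 : (int)(sizeof(x) * CHAR_BIT) - __builtin_clzl(x));
}

/* Iterative form: scan levels from the top until a nonzero slot appears. */
static inline int
keydiff_loop(unsigned long a, unsigned long b)
{
	unsigned long diff = a ^ b;	/* caller guarantees a != b */
	unsigned long mask = (1UL << SKETCH_WIDTH) - 1;
	int lev;

	for (lev = SKETCH_LIMIT;; lev--)
		if (((diff >> (lev * SKETCH_WIDTH)) & mask) != 0)
			return (lev);
}

/* Loop-free form: the highest differing bit gives the level directly. */
static inline int
keydiff_fls(unsigned long a, unsigned long b)
{
	return ((sketch_flsl(a ^ b) - 1) / SKETCH_WIDTH);
}
```

Both functions return the same level for any pair of distinct keys. Whether the second form is actually cheaper depends on whether the bit-scan builtin lowers to a single instruction on the target or to a library loop, which is the concern raised above for riscv.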
Mark, do you have any suggestions on how to (or who could) get ffs and fls inlined on riscv?