Implement the ffs and fls functions, and their long and long long counterparts, in cpufunc for arm64, in terms of gcc builtins such as __builtin_ffs. Use these implementations, rather than the simple libkern ones, when building arm64 kernels.
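
For illustration only, a minimal sketch of how such wrappers could look. The sketch_ names are hypothetical and are not the names used in the change. The use of __builtin_clz for fls is an assumption, since only __builtin_ffs is named above. Because __builtin_clz is undefined for a zero argument, the fls variants check for zero explicitly.

```c
#include <limits.h>

/*
 * Hypothetical wrappers; the sketch_ prefix avoids clashing with any
 * ffs()/fls() already declared by the C library.
 */

/* Index (1-based) of the least significant set bit, or 0 if mask is 0. */
static inline int
sketch_ffs(int mask)
{
	return (__builtin_ffs(mask));
}

static inline int
sketch_ffsl(long mask)
{
	return (__builtin_ffsl(mask));
}

static inline int
sketch_ffsll(long long mask)
{
	return (__builtin_ffsll(mask));
}

/*
 * Index (1-based) of the most significant set bit, or 0 if mask is 0.
 * Assumption: built on __builtin_clz, which is undefined for 0, hence
 * the explicit zero check.
 */
static inline int
sketch_fls(int mask)
{
	return (mask == 0 ? 0 :
	    (int)(sizeof(mask) * CHAR_BIT) - __builtin_clz((unsigned int)mask));
}

static inline int
sketch_flsl(long mask)
{
	return (mask == 0 ? 0 :
	    (int)(sizeof(mask) * CHAR_BIT) - __builtin_clzl((unsigned long)mask));
}

static inline int
sketch_flsll(long long mask)
{
	return (mask == 0 ? 0 :
	    (int)(sizeof(mask) * CHAR_BIT) -
	    __builtin_clzll((unsigned long long)mask));
}
```

With builtins of this kind the compiler is free to emit whatever short instruction sequence the target offers, instead of calling a generic out-of-line routine.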
Tested by: greg_unrelenting.technology