Details

Reviewers

kib
markj
alc
manu

Commits

rS347927: Implement the ffs and fls functions, and their longer counterparts, in

Summary

Implement the ffs and fls instructions, and their longer counterparts, in cpufunc, in terms of gcc extensions like __builtin_ffs, for arm64 architectures, and use those, rather than simple libkern implementations, in building arm64 kernels.

Tested by: greg_unrelenting.technology

Test Plan

I can complete an arm64 build, and I can examine an object file like pmap.o or subr_blist.o for signs that the inlining has happened, as instruction sequences like

244: dac001c9 rbit x9, x14
248: dac01129 clz x9, x9

appear there to indicate a fast implementation of ffs. My reference on smart bit ops for arm64 and how to recognize them is https://fgiesen.wordpress.com/2013/10/18/bit-scanning-equivalencies/.

Without the patch, the arm64 build does not have these sequences in these object files and has lines like this instead:

af34:       94000000        bl      0 <ffsl>

which do not appear with the patch in place.

None of this suggests that this patch is correct, but only that it does something. I don't have the means to test an arm64 kernel at this time. If Peter Holm can help, that would be great. If not, I'll have to wait for the freebsd arm64 test boxes to come back online.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

dougm created this revision.May 12 2019, 9:16 PM

Herald added subscribers: emaste, andrew. · View Herald TranscriptMay 12 2019, 9:16 PM

Perhaps @andrew, @manu, @greg_unrelenting.technology or one of the other ARM folks would be willing to test?

arm64/include/cpufunc.h
56 ↗	(On Diff #57343)	The indentation in this function and the one above looks off.

Address indentation issues.

Can somebody explain to me why don't we use __builtin_ffs() and other functions everywhere ?

arm64/include/cpufunc.h
47 ↗	(On Diff #57349)	Why do you need the cast of mask to u_int ? Prototype in the gcc reference shows `int __builtin_ctz(unsigned)`. I also find it strange to write `mask == 0 ? mask : ...`, I would certainly express it as `mask == 0 ? 0 : ...`.
85 ↗	(On Diff #57349)	Why casting the result of __builtin_clzl() to int ? It is tautology.

Reduce strangeness.

arm64/include/cpufunc.h
47 ↗	(On Diff #57349)	In imitating the code in amd64/include/cpufunc.h, I unnecessarily copied a cast-to-int. I will remove it. I also copied the use of 'mask' that seems strange. I will remove it too.
85 ↗	(On Diff #57349)	Agreed.

In D20250#436408, @kib wrote:

Can somebody explain to me why don't we use __builtin_ffs() and other functions everywhere ?

In case you are asking why I wrote code using builtin_clz and not builtin_ffs, it is because I believe that many call to ffs and fls always subtract one from the result, and always know that the argument is nonzero, and I think that the way I wrote things might let the compiler optimize out the zero-check and/or the plus-one sometimes, and maybe with __builtin_ffs that wouldn't happen. But that is speculation.

I case you are asking about the whole freebsd codebase in general, I can't say.

Kernel with this patch boots fine on Amazon EC2 (which btw you can use too, freebsd-snapshots posts have aarch64 AMIs)

dougm edited the summary of this revision. (Show Details)May 14 2019, 4:27 PM

dougm added a reviewer: alc.May 16 2019, 7:41 AM

The only difference in the pmap.o code on arm64 between using builtin_ffsl and _builtin_ctzl (with a zero-test and an add) is that the version with builtin_ffsl has an 'orr' instruction where the other version has an 'add'.

So this version uses builtin_ffs and builtin_ffsl.

alc added inline comments.May 16 2019, 5:24 PM

arm64/include/cpufunc.h
94 ↗	(On Diff #57437)	This assumes that long and long long will always be the same size. If a tool chain came along that defined long long as 128 bits, this would compile without an error or warning and produce broken results. I would either add a CTASSERT() that long and long long are the same size or implement this using __builtin_clzll.

Handle the possibility that sizeof(long long) != sizeof(long).

alc accepted this revision.May 17 2019, 4:36 AM

This revision is now accepted and ready to land.May 17 2019, 4:36 AM

kib accepted this revision.May 17 2019, 7:58 AM

Closed by commit rS347927: Implement the ffs and fls functions, and their longer counterparts, in (authored by dougm). · Explain WhyMay 17 2019, 3:52 PM

This revision was automatically updated to reflect the committed changes.

Herald added a reviewer: manu. · View Herald TranscriptMay 17 2019, 3:52 PM

Herald added a subscriber: imp. · View Herald Transcript

dougm mentioned this in D40594: Build riscv kernel with inline ffs, fls.Jun 18 2023, 4:11 AM

Build arm64 kernel with inline ffs, fls
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 57485

head/sys/arm64/include/cpufunc.h

head/sys/conf/files.arm64

Build arm64 kernel with inline ffs, flsClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 57485

head/sys/arm64/include/cpufunc.h

head/sys/conf/files.arm64

Build arm64 kernel with inline ffs, fls
ClosedPublic
Actions

Revision Contents
Changeset List