Provide an implementation of fpu_kern_enter/fpu_kern_leave for PPC to
enable FPU, VSX, and Altivec usage in-kernel. The functions currently
only support FPU_KERN_NOCTX, but this is sufficient for ossl(1) and many
other users of the API.
Details
This patchset has been tested on powerpc64le using a modified version of the in-tree tools/tools/crypto/cryptocheck.c tool to check for FPU/Vec register clobbering along with a follow-up patch to enable ossl(4) on powerpc64*.
The (work-in-progress) cryptocheck patch that implements FPU clobber testing is available here: https://gitlab.raptorengineering.com/sanastasio/freebsd-patches/-/blob/master/0003-WIP-Add-FPU-clobber-check-for-ppc64-to-cryptocheck.patch
The follow-up (work-in-progress) ossl(4) patch for powerpc64* support is available here: https://gitlab.raptorengineering.com/sanastasio/freebsd-patches/-/blob/master/0002-ossl-Add-support-for-powerpc64-powerpc64le.patch
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
Awesome, thanks for this patch! This might pave the way to fixing GPU drivers as well. Could you bump the version in https://cgit.freebsd.org/src/tree/sys/sys/param.h#n78? This will make it possible to add version checks in https://github.com/freebsd/drm-kmod/blob/5.15-lts/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c#L97 to support FreeBSD/powerpc64* as well.
As for the contents of the current patch, I will leave it to people who are able to correctly review it. But I think you should be able to switch https://cgit.freebsd.org/src/tree/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_powerpc.h#n53 to 1 for testing - if you can load ZFS module with FPU and do some basic operations, this patch should be ok.
Also, regarding your commit description for ossl: "The required assembly files for ppc were already present, but never used before now.". That's actually wrong. I added those files to be used by userspace OpenSSL libraries and they in fact allow up to 20x speedup over the previous portable C code. Back then, I tried adding ossl(4) support as well but didn't have enough knowledge to add kfpu support.
One tip for future uploads - please generate patches with full context (e.g. git show -U999999 <hash>) so that Phabricator will be able to expand the rest of the file. See https://wiki.freebsd.org/Phabricator for details.
Testing with ZFS would be a plus. It also only uses FPU_KERN_NOCTX on aarch64 and x86. There is a fpu_kern.9 manual page you will also want to update.
Thank you all for the comments. I'll coalesce my replies into a single message.
@pkubaj - Would it be most appropriate to perform that version bump in this patch or in a separate follow-up patch? Also thank you for pointing out the issue with the ossl patch description -- I'll correct that before submission here.
@emaste - Noted -- I'll fix that going forward.
@jhb - Thanks for pointing out the missing man page change, I'll update it and upload a new patch here. I'll also look into doing some basic tests with ZFS.
If possible please!make sure this doesn't break the boot loader. I don't see how it could, but ZFS and float has broken the loader before
I've just done some preliminary tests with ZFS kernel FPU enabled on power and it seems to be working, though I'm not sure if I've fully exercised the relevant code paths.
For reference, the patch enabling ZFS kfpu support I wrote is here: https://gitlab.raptorengineering.com/sanastasio/freebsd-patches/-/blob/master/0004-zfs-powerpc-enable-kernel-floating-point.patch
The commands I ran to test basic functionality were:
- dd if=/dev/zero of=/zfs.img bs=1K count=1 seek=1M
- zpool create zvol0 /zfs.img
- dd if=/dev/urandom of=/zvol0/junk bs=1K count=200K
- zpool scrub zvol0
I also added a printf inside kfpu_begin() to ensure that ZFS was indeed using the kernel FPU facilities.
Overall this looks good, just the one comment on SPE. Can you also test with powerpc64 BE to avoid surprises?
sys/powerpc/powerpc/fpu.c | ||
---|---|---|
332 | I think this whole block is not correct for SPE. SPE uses PSL_VEC and enable_vec(). This file, fpu.c, is used only for FP emulation, which would be a waste for fpu_kern with SPE, quite likely a severe performance penalty. Since I'm pretty sure there are very few users of SPE (quite possibly just me), I recommend just blocking this whole thing from SPE, and if someone is adventurous enough they can try making it work. |
sys/powerpc/powerpc/fpu.c | ||
---|---|---|
332 | Sounds good -- I'll wrap the entire function bodies in an ifndef __SPE__. |
- Guard out entire implementation on SPE
Additionally, I have tested the patch along with my aforementioned ossl patches on powerpc64 big endian and confirmed that the fpu_kern* facilities work as expected.
Alarmingly, though, independent of any of my patches, I discovered that the cryptocheck tool doesn't fully pass on big endian with the cryptosoft backend. The ossl backend enabled by my patches works fine though.
Can you share which tests fail on big-endian? Is it specific to certain ciphers or hash algorithms?
It seems like aes-xts and all its variants are the only ones failing. See below.
# ./cryptocheck -a all -d soft -v ripemd160 (16) matched (cryptodev device cryptosoft0) sha1 (16) matched (cryptodev device ossl0) sha224 (16) matched (cryptodev device ossl0) sha256 (16) matched (cryptodev device ossl0) sha384 (16) matched (cryptodev device ossl0) sha512 (16) matched (cryptodev device ossl0) blake2b (16) matched (cryptodev device cryptosoft0) blake2s (16) matched (cryptodev device cryptosoft0) ripemd160hmac (16) matched (cryptodev device cryptosoft0) sha1hmac (16) matched (cryptodev device ossl0) sha224hmac (16) matched (cryptodev device ossl0) sha256hmac (16) matched (cryptodev device ossl0) sha384hmac (16) matched (cryptodev device ossl0) sha512hmac (16) matched (cryptodev device ossl0) gmac128 (16) matched (cryptodev device cryptosoft0) gmac192 (16) matched (cryptodev device cryptosoft0) gmac256 (16) matched (cryptodev device cryptosoft0) poly1305 (16) matched (cryptodev device ossl0) aes-cbc128 (16) matched (cryptodev device ossl0) aes-cbc192 (16) matched (cryptodev device ossl0) aes-cbc256 (16) matched (cryptodev device ossl0) aes-ctr128 (16) matched (cryptodev device cryptosoft0) aes-ctr192 (16) matched (cryptodev device cryptosoft0) aes-ctr256 (16) matched (cryptodev device cryptosoft0) aes-xts128 (16) encryption mismatch: control: 0000 f5 dc 96 14 60 43 95 99 e0 cf fc 7f d8 89 08 38 |....`C.........8| test (cryptodev device cryptosoft0): 0000 7d 28 37 e0 95 f7 4d 97 34 5b 02 ad 1a ed 51 e6 |}(7...M.4[....Q.| aes-xts256 (16) encryption mismatch: control: 0000 d1 c2 99 6d 13 f7 dc 12 49 e2 43 49 ae d4 2b 61 |...m....I.CI..+a| test (cryptodev device cryptosoft0): 0000 27 cc f5 72 13 6a d6 05 8a 49 cd 47 98 ae bc c2 |'..r.j...I.G....| camellia-cbc128 (16) matched (cryptodev device cryptosoft0) camellia-cbc192 (16) matched (cryptodev device cryptosoft0) camellia-cbc256 (16) matched (cryptodev device cryptosoft0) chacha20 (16) matched (cryptodev device ossl0) aes-cbc128+ripemd160hmac (0, 16) matched (cryptodev device cryptosoft0) aes-cbc128+sha1hmac (0, 16) matched (cryptodev device ossl0) aes-cbc128+sha224hmac (0, 16) matched (cryptodev device ossl0) aes-cbc128+sha256hmac (0, 16) matched (cryptodev device ossl0) aes-cbc128+sha384hmac (0, 16) matched (cryptodev device ossl0) aes-cbc128+sha512hmac (0, 16) matched (cryptodev device ossl0) aes-cbc192+ripemd160hmac (0, 16) matched (cryptodev device cryptosoft0) aes-cbc192+sha1hmac (0, 16) matched (cryptodev device ossl0) aes-cbc192+sha224hmac (0, 16) matched (cryptodev device ossl0) aes-cbc192+sha256hmac (0, 16) matched (cryptodev device ossl0) aes-cbc192+sha384hmac (0, 16) matched (cryptodev device ossl0) aes-cbc192+sha512hmac (0, 16) matched (cryptodev device ossl0) aes-cbc256+ripemd160hmac (0, 16) matched (cryptodev device cryptosoft0) aes-cbc256+sha1hmac (0, 16) matched (cryptodev device ossl0) aes-cbc256+sha224hmac (0, 16) matched (cryptodev device ossl0) aes-cbc256+sha256hmac (0, 16) matched (cryptodev device ossl0) aes-cbc256+sha384hmac (0, 16) matched (cryptodev device ossl0) aes-cbc256+sha512hmac (0, 16) matched (cryptodev device ossl0) aes-ctr128+ripemd160hmac (0, 16) matched (cryptodev device cryptosoft0) aes-ctr128+sha1hmac (0, 16) matched (cryptodev device cryptosoft0) aes-ctr128+sha224hmac (0, 16) matched (cryptodev device cryptosoft0) aes-ctr128+sha256hmac (0, 16) matched (cryptodev device cryptosoft0) aes-ctr128+sha384hmac (0, 16) matched (cryptodev device cryptosoft0) aes-ctr128+sha512hmac (0, 16) matched (cryptodev device cryptosoft0) aes-ctr192+ripemd160hmac (0, 16) matched (cryptodev device cryptosoft0) aes-ctr192+sha1hmac (0, 16) matched (cryptodev device cryptosoft0) aes-ctr192+sha224hmac (0, 16) matched (cryptodev device cryptosoft0) aes-ctr192+sha256hmac (0, 16) matched (cryptodev device cryptosoft0) aes-ctr192+sha384hmac (0, 16) matched (cryptodev device cryptosoft0) aes-ctr192+sha512hmac (0, 16) matched (cryptodev device cryptosoft0) aes-ctr256+ripemd160hmac (0, 16) matched (cryptodev device cryptosoft0) aes-ctr256+sha1hmac (0, 16) matched (cryptodev device cryptosoft0) aes-ctr256+sha224hmac (0, 16) matched (cryptodev device cryptosoft0) aes-ctr256+sha256hmac (0, 16) matched (cryptodev device cryptosoft0) aes-ctr256+sha384hmac (0, 16) matched (cryptodev device cryptosoft0) aes-ctr256+sha512hmac (0, 16) matched (cryptodev device cryptosoft0) aes-xts128+ripemd160hmac (0, 16) encryption mismatch: control: 0000 61 08 43 05 51 e1 7e 7c ce b6 25 7d 97 ff 74 f4 |a.C.Q.~|..%}..t.| test (cryptodev device cryptosoft0): 0000 c8 91 66 30 83 ef 3f 98 8f e4 a0 f7 52 58 cf 7e |..f0..?.....RX.~| aes-xts128+sha1hmac (0, 16) encryption mismatch: control: 0000 f1 a4 90 dd 88 61 00 1c 97 ae ac f4 e8 7e 5d 73 |.....a.......~]s| test (cryptodev device cryptosoft0): 0000 4b 40 61 77 2a f4 96 6b 59 a7 92 4c 45 1a 38 c8 |K@aw*..kY..LE.8.| aes-xts128+sha224hmac (0, 16) encryption mismatch: control: 0000 48 8c 9a cb 5d 82 59 ae 6a db 13 b9 76 29 67 fb |H...].Y.j...v)g.| test (cryptodev device cryptosoft0): 0000 5c 50 f8 99 c4 5d 7b 5c 00 34 51 27 23 89 87 5f |\P...]{\.4Q'#.._| aes-xts128+sha256hmac (0, 16) encryption mismatch: control: 0000 fb 6f 7e 08 09 4a 44 71 3a 9a fe 91 a2 c8 f0 2d |.o~..JDq:......-| test (cryptodev device cryptosoft0): 0000 ba 65 d9 1e 14 f0 bb 6f b4 24 16 8a e6 95 24 0d |.e.....o.$....$.| aes-xts128+sha384hmac (0, 16) encryption mismatch: control: 0000 ec c9 51 67 10 44 7d 2f 3c 97 34 53 62 f0 62 89 |..Qg.D}/<.4Sb.b.| test (cryptodev device cryptosoft0): 0000 4d f5 e9 91 c3 27 cd cb ba 2f 5e c8 4e 0d af 7a |M....'.../^.N..z| aes-xts128+sha512hmac (0, 16) encryption mismatch: control: 0000 64 5a 65 bb f9 a7 9d 22 49 44 e9 dc b9 fa 75 fc |dZe...."ID....u.| test (cryptodev device cryptosoft0): 0000 1c fc 16 fa 6c 24 11 e8 39 ac 63 b6 73 a0 1b f0 |....l$..9.c.s...| aes-xts256+ripemd160hmac (0, 16) encryption mismatch: control: 0000 86 b3 7d 97 4f ad 72 41 e7 73 2a d0 75 92 7a c9 |..}.O.rA.s*.u.z.| test (cryptodev device cryptosoft0): 0000 5e 74 e1 59 79 93 0c 63 57 7a 6f 87 ae 6a 4c 66 |^t.Yy..cWzo..jLf| aes-xts256+sha1hmac (0, 16) encryption mismatch: control: 0000 4a 8d 97 6c 6c 96 d7 9c c0 8e e1 57 14 4a 79 be |J..ll......W.Jy.| test (cryptodev device cryptosoft0): 0000 a4 ca f7 29 cc 9b c2 07 4a a6 36 7f c6 88 f2 68 |...)....J.6....h| aes-xts256+sha224hmac (0, 16) encryption mismatch: control: 0000 8e c7 c1 f0 91 aa cb 58 3c 59 98 e5 11 a3 3b 7c |.......X<Y....;|| test (cryptodev device cryptosoft0): 0000 7b 1e 06 5e 63 b2 33 2b 59 e2 51 e0 7e 49 50 cf |{..^c.3+Y.Q.~IP.| aes-xts256+sha256hmac (0, 16) encryption mismatch: control: 0000 6c 26 fa 56 ec 86 74 c9 7b c9 2b 11 6a b4 eb 60 |l&.V..t.{.+.j..`| test (cryptodev device cryptosoft0): 0000 23 56 50 1b 35 48 d2 b7 81 07 dc aa a6 b4 f9 ae |#VP.5H..........| aes-xts256+sha384hmac (0, 16) encryption mismatch: control: 0000 7b 88 d0 ec 4b e5 37 3c d3 76 3d 42 ee 29 1a 82 |{...K.7<.v=B.)..| test (cryptodev device cryptosoft0): 0000 6c 7c b7 55 a8 8c fc ab f9 e2 cb 6a 40 0c 61 b4 |l|.U.......j@.a.| aes-xts256+sha512hmac (0, 16) encryption mismatch: control: 0000 1c e3 85 e3 d0 a7 8f 0d 1a 00 aa 9c 55 8e 3f 96 |............U.?.| test (cryptodev device cryptosoft0): 0000 2d 65 e0 82 c6 02 9b 04 76 3b 8f ce 7f 7e 9f c5 |-e......v;...~..| camellia-cbc128+ripemd160hmac (0, 16) matched (cryptodev device cryptosoft0) camellia-cbc128+sha1hmac (0, 16) matched (cryptodev device cryptosoft0) camellia-cbc128+sha224hmac (0, 16) matched (cryptodev device cryptosoft0) camellia-cbc128+sha256hmac (0, 16) matched (cryptodev device cryptosoft0) camellia-cbc128+sha384hmac (0, 16) matched (cryptodev device cryptosoft0) camellia-cbc128+sha512hmac (0, 16) matched (cryptodev device cryptosoft0) camellia-cbc192+ripemd160hmac (0, 16) matched (cryptodev device cryptosoft0) camellia-cbc192+sha1hmac (0, 16) matched (cryptodev device cryptosoft0) camellia-cbc192+sha224hmac (0, 16) matched (cryptodev device cryptosoft0) camellia-cbc192+sha256hmac (0, 16) matched (cryptodev device cryptosoft0) camellia-cbc192+sha384hmac (0, 16) matched (cryptodev device cryptosoft0) camellia-cbc192+sha512hmac (0, 16) matched (cryptodev device cryptosoft0) camellia-cbc256+ripemd160hmac (0, 16) matched (cryptodev device cryptosoft0) camellia-cbc256+sha1hmac (0, 16) matched (cryptodev device cryptosoft0) camellia-cbc256+sha224hmac (0, 16) matched (cryptodev device cryptosoft0) camellia-cbc256+sha256hmac (0, 16) matched (cryptodev device cryptosoft0) camellia-cbc256+sha384hmac (0, 16) matched (cryptodev device cryptosoft0) camellia-cbc256+sha512hmac (0, 16) matched (cryptodev device cryptosoft0) chacha20+ripemd160hmac (0, 16) matched (cryptodev device cryptosoft0) chacha20+sha1hmac (0, 16) matched (cryptodev device cryptosoft0) chacha20+sha224hmac (0, 16) matched (cryptodev device cryptosoft0) chacha20+sha256hmac (0, 16) matched (cryptodev device cryptosoft0) chacha20+sha384hmac (0, 16) matched (cryptodev device cryptosoft0) chacha20+sha512hmac (0, 16) matched (cryptodev device cryptosoft0) aes-gcm128/12 (0, 16) matched (cryptodev device ossl0) aes-gcm192/12 (0, 16) matched (cryptodev device ossl0) aes-gcm256/12 (0, 16) matched (cryptodev device ossl0) aes-ccm128/12 (0, 16) matched (cryptodev device cryptosoft0) aes-ccm192/12 (0, 16) matched (cryptodev device cryptosoft0) aes-ccm256/12 (0, 16) matched (cryptodev device cryptosoft0) chacha20-poly1305/12 (0, 16) matched (cryptodev device ossl0)
Hi all,
Just a quick ping to see what the status was on this patch and whether or not it's in a state that it could be committed.
Though I haven't tested it on that hardware, I did take care to check for VSX support before setting the corresponding MSR bit in enable_fpu_kern, so it should be good.
Also, thank you @jhibbits for approving the patch. I'm not super familiar with the FreeBSD contribution work flow yet, so excuse the question, but is there anything that I need to do at this point to have this patch committed? I presume the patch must now be merged by someone other than me with commit privileges.
@sanastasio_raptorengineering.com Yes, a committer (probably me) needs to push this change. I'll try to find the time in the next few days to merge it. Thanks for your work on this!
Assuming everything works fine after a MFC period, could this change be also MFS'd to releng/14.0?