Page MenuHomeFreeBSD

amd64: Use __seg_gs to implement per-CPU data accesses.
ClosedPublic

Authored by jhb on Jun 20 2023, 2:54 PM.
Tags
None
Referenced Files
F102806803: D40647.diff
Sun, Nov 17, 10:42 AM
Unknown Object (File)
Sep 23 2024, 9:48 AM
Unknown Object (File)
Sep 22 2024, 4:57 PM
Unknown Object (File)
Sep 21 2024, 3:54 PM
Unknown Object (File)
Sep 18 2024, 7:03 AM
Unknown Object (File)
Sep 17 2024, 11:45 AM
Unknown Object (File)
Sep 16 2024, 8:16 PM
Unknown Object (File)
Sep 11 2024, 12:42 PM
Subscribers

Details

Summary

This makes use of the alternate address space support in both GCC and
clang to access per-CPU data as accesses relative to GS:. The
original motivation for this is that it quiets verbose warnings from
GCC 12. However, this version is also much easier to read and
allows the compiler to generate better code (e.g. the compiler can
use a GS: memory operand directly in other instructions such as IMUL
and CMP rather than always MOVing to a temporary register).

The one caveat is that the current approach is very inefficient at -O0
since the compiler expects to load the 0 base offset from a global
variable instead of assuming it is 0 (even with the const).

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

My original motivation for this is that PCPU_GET raises a -Warray-bounds warning with GCC 12 which makes the kernel build very noisy. (GCC gets confused and unhappy about the struct s type punning.)

However, this also seems to result in better code generation in that now the compiler is free to use %gs-relative memory operands in other instructions, e.g. using cmp with a %gs operand rather than having to always do a mov to a temporary register.

Another change I noticed though is that the compiler will also now sometimes cache the value of a per-CPU variable that is accessed multiple times in a function. Any places that would be bad though I think are already broken if they count on treating the PCPU_GET as implicitly volatile. If you are using critical_enter/exit those already contain compiler barriers so should force the values to be reloaded.

One headache though is that with -O0 (which some people do use) the compiler degrades to reading the 0 offset from a global via a PC-relative access. Not sure if there is a way to get rid of that even in -O0.

Yes, the think that could break is the accesses like

  • read through __seg_gs
  • write through pcpu linear pointer
  • read through __seg_gs again

I cannot think about an obvious example.

sys/amd64/include/pcpu.h
122

And __SEG_GS is defined by both modern gcc and clang?

sys/amd64/include/pcpu.h
122

Yes. For GCC it needs the preceding change in this stack to add -std=gnu99 (clang enables it even for -std=c99 which might be a bug in clang).

kib added inline comments.
sys/amd64/include/pcpu.h
122

It is an extension in the implementation namespace, I do not see why it is wrong to have it enabled just by c99 mode.

This revision is now accepted and ready to land.Jun 21 2023, 3:22 AM
sys/amd64/include/pcpu.h
157

Why do we still need this? It was only there before to restrict when we tried to use a memory operand for inline assembly in order to avoid nonsense widths, AFAICT.

172

Ditto

sys/amd64/include/pcpu.h
157

Hmm, I was somewhat paranoid that the compiler might not do exactly what we want here. My assumption is that the second version generates a pointer from pc_prvspace and then uses an ADD with that.

what is this waiting for?

In D40647#930340, @mjg wrote:

what is this waiting for?

I need to test it some more.

sys/amd64/include/pcpu.h
157

That commit just moved the bits around I think, the dedicated single-instruction versions were first added back in https://github.com/freebsd/freebsd-src/commit/03cd4c2029d2f6444ea162462db3157fe21a3a67

BTW, tested an -O0 kernel and it does boot ok.