
Add atomic_testandset_acq_* on arm64
Closed, Public

Authored by andrew on Dec 20 2021, 2:04 PM.

Details

Summary

We only need to include sys/_atomic_subword.h on arm64 to provide
atomic_testandset_acq_long. Add an implementation in the arm64 atomic.h
based on the existing atomic_testandset macro.
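
As a rough illustration for readers of this review, the LL/SC form of an acquire test-and-set on arm64 has roughly the following shape. This is a simplified, hypothetical sketch (the function name is made up, and it is not the committed macro expansion, which also selects an LSE path at run time):

#include <stdint.h>

/*
 * Illustrative LL/SC sketch only; the committed change builds
 * atomic_testandset_acq_long from the existing atomic_testandset macro.
 */
static inline int
testandset_acq_long_sketch(volatile uint64_t *p, unsigned int bit)
{
    uint64_t mask, old, tmp;
    int res;

    mask = (uint64_t)1 << (bit & 0x3f);     /* bit index wraps, as in atomic(9) */
    __asm__ __volatile__(
        "1: ldaxr   %0, [%3]      \n"       /* load-exclusive with acquire */
        "   orr     %1, %0, %4    \n"       /* set the requested bit */
        "   stxr    %w2, %1, [%3] \n"       /* store-exclusive, no release */
        "   cbnz    %w2, 1b       \n"       /* retry if the store failed */
        : "=&r" (old), "=&r" (tmp), "=&r" (res)
        : "r" (p), "r" (mask)
        : "memory");
    return ((old & mask) != 0);             /* previous value of the bit */
}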

Diff Detail

Repository
rG FreeBSD src repository
Lint Not Applicable
Unit Tests Not Applicable

Event Timeline

It is strange; does anybody need atomic_testand{set,clear}_rel_<type>? It seems that the LSE instructions LDCLRA and LDEORA have both acquire and release semantics; am I right that we end up using them?

The only user of testand{set,clear} with a barrier in the kernel is sys/dev/random/fenestrasX/fx_pool.c.

Based on the description of the LDCLRA instruction, LD*A and LD*AL will have acquire semantics, and LD*L and LD*AL will have release semantics, so only LD*AL will have both. With this change the kernel can use LD{SET,CLR}A, so these operations will have acquire semantics.
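
For what it's worth, the C11 analogue makes the same distinction visible: with LSE available (e.g. -march=armv8.1-a) compilers typically emit LDSETA/LDCLRA for acquire-only fetch-or/fetch-and, and the *AL forms only for acq_rel or seq_cst. A small hypothetical sketch (helper names invented, not kernel code):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static inline bool
c11_testandset_acq(_Atomic uint64_t *p, unsigned int bit)
{
    uint64_t mask = (uint64_t)1 << (bit & 0x3f);

    /* Typically LDSETA with LSE: acquire applies to the load half only. */
    return ((atomic_fetch_or_explicit(p, mask,
        memory_order_acquire) & mask) != 0);
}

static inline bool
c11_testandclear_acq(_Atomic uint64_t *p, unsigned int bit)
{
    uint64_t mask = (uint64_t)1 << (bit & 0x3f);

    /* Typically LDCLRA with LSE. */
    return ((atomic_fetch_and_explicit(p, ~mask,
        memory_order_acquire) & mask) != 0);
}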

This revision is now accepted and ready to land. Dec 21 2021, 12:02 PM
This revision was automatically updated to reflect the committed changes.

Just a general comment about atomic_testandset_acq_* and similar RMW operations - I'm fairly sure that operations like this do not behave as described in atomic(9):

For example, to add two long integers ensuring that all prior loads and
stores are completed before the addition is performed, use
atomic_add_rel_long().

In particular, for both the LL/SC and LSE variants of the arm64 implementation, the acquire semantics apply only to the load portion of the operation. In other words, there is nothing preventing a subsequent load from being reordered with the store. But that example seems to suggest that the store will be visible before subsequent loads.
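
To make the allowed outcome concrete, here is a hypothetical C11 litmus-style sketch (variable and function names invented) of the reordering described above:

#include <stdatomic.h>

/* Invented variables, for illustration only. */
static _Atomic long word;
static _Atomic int other;

int
rmw_then_load(void)
{
    /*
     * Acquire RMW: the acquire applies to the load half, so later
     * accesses cannot move before the load of "word".  Nothing,
     * however, forbids the load of "other" below from being satisfied
     * before the store half of the RMW becomes visible to other CPUs,
     * which is the reordering described above.
     */
    atomic_fetch_or_explicit(&word, 1, memory_order_acquire);
    return (atomic_load_explicit(&other, memory_order_relaxed));
}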

Just a general comment about atomic_testandset_acq_* and similar RMW operations - I'm fairly sure that operations like this do not behave as described in atomic(9):

For example, to add two long integers ensuring that all prior loads and
stores are completed before the addition is performed, use
atomic_add_rel_long().

Oops, I meant to quote this section instead:

For example, to subtract two integers ensuring that the subtraction is
completed before any subsequent loads and stores are performed, use
atomic_subtract_acq_int().

The C17 standard defines _acq as

For memory_order_acquire, memory_order_acq_rel, and memory_order_seq_cst, a load operation performs an acquire operation on the affected memory location.

For me this sounds exactly as implemented, i.e., for acquire, the load portion of the RMW is acquire. Symmetrically, for release, the store portion of the RMW is release.

There is no reordering terminology there. Probably our man page requires some tweaking.

There was a discussion on the arch list about this a few years ago: https://lists.freebsd.org/pipermail/freebsd-arch/2014-October/016160.html.

With LL/SC atomics we may move a later load before the store-exclusive, but only before the successful store-exclusive, as any failed store-exclusive reruns the load-exclusive, which has acquire semantics.

The one place we need to be careful is if we have both acquire and release semantics: accesses from before and after may be reordered into the read-modify-write sequence, and so be reordered with each other. However, as we don't have these in atomic(9), it shouldn't be an issue.

There was a discussion on the arch list about this a few years ago: https://lists.freebsd.org/pipermail/freebsd-arch/2014-October/016160.html.

Was there a discussion about the specific issue Mark noted? The points from the mail you reference have been gone for a long time: atomic_load_acq and atomic_store_rel on x86 fully utilize the TSO model provided by the hardware, and we also have the atomic_thread_fence_XXX() primitives, which provide the desired memop-less fences without breaking the C11 memory model.

With LL/SC atomics we may move a later load before the store-exclusive, but only before the successful store-exclusive, as any failed store-exclusive reruns the load-exclusive, which has acquire semantics.

The one place we need to be careful is if we have both acquire and release semantics: accesses from before and after may be reordered into the read-modify-write sequence, and so be reordered with each other. However, as we don't have these in atomic(9), it shouldn't be an issue.

This is the common (allowed) issue with acq_rel operations; basically, this is one of the aspects in which they differ from a true sequentially consistent op (but there are more).

In D33587#761528, @kib wrote:

The C17 standard defines _acq as

For memory_order_acquire, memory_order_acq_rel, and memory_order_seq_cst, a load operation performs an acquire operation on the affected memory location.

For me this sounds exactly as implemented, i.e., for acquire, the load portion of the RMW is acquire. Symmetrically, for release, the store portion of the RMW is release.

There is no reordering terminology there. Probably our man page requires some tweaking.

Yes, I think atomic(9) needs to be updated to reflect that acquire semantics for atomic_<op>_acq_<type>() apply specifically to the load portion of the operation. At least one consumer, smr_enter(), gets this wrong. There, the store must complete before any subsequent loads. On amd64 this is true since a load will not be reordered before a locked instruction (i.e., atomic_add_acq_int() behaves like __storeload_barrier()), but on arm64 there is no guarantee.
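
As a closing illustration (hypothetical names, C11 fences rather than the kernel primitives, and not the SMR code itself): when an algorithm such as the one in smr_enter() needs the store half of the RMW to be visible before a subsequent load, an explicit full fence is the portable way to express that; the acquire form alone does not:

#include <stdatomic.h>

/* Invented stand-ins for illustration only. */
static _Atomic int slot;
static _Atomic int target;

int
enter_then_observe(void)
{
    /*
     * An acquire RMW alone would not give store->load ordering here
     * (see the sketch earlier in the thread).  A full fence, i.e. the
     * equivalent of the atomic_thread_fence_seq_cst() mentioned above,
     * does order the store before the following load.
     */
    atomic_fetch_add_explicit(&slot, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return (atomic_load_explicit(&target, memory_order_relaxed));
}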