When enqueueing on an architecture with a weak memory model, ensure
the loads of br->br_prod_head and br->br_cons_tail are ordered
correctly. If br_cons_tail is loaded first, other threads may perform
a dequeue and enqueue before br_prod_head is loaded. The loaded tail
would then be stale, one less than it should be, and the code under
the prod_next == cons_tail check could incorrectly be skipped.
buf_ring_dequeue_mc has the same issue with br->br_prod_tail and
br->br_cons_head, so it needs the same fix.
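
The required load ordering can be sketched with C11 atomics. This is a
simplified illustration, not the buf_ring code itself: struct
sketch_ring, sketch_enqueue, sketch_dequeue, RING_SIZE and the -1
"full" return value are hypothetical names chosen for the example, and
the acquire loads stand in for whatever primitive the real change uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u	/* hypothetical capacity, must be a power of two */
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "power of two");

/* Hypothetical, simplified ring; field names only echo buf_ring. */
struct sketch_ring {
	_Atomic uint32_t prod_head, prod_tail;
	_Atomic uint32_t cons_head, cons_tail;
	void *slots[RING_SIZE];
};

/* Returns 0 on success, -1 if the ring is full. */
static int
sketch_enqueue(struct sketch_ring *r, void *buf)
{
	uint32_t prod_head, prod_next, cons_tail;

	do {
		/*
		 * Acquire load of prod_head: the cons_tail load below
		 * cannot be reordered before it, so cons_tail is never
		 * older than the prod_head it is compared against.
		 */
		prod_head = atomic_load_explicit(&r->prod_head,
		    memory_order_acquire);
		prod_next = (prod_head + 1) & (RING_SIZE - 1);
		cons_tail = atomic_load_explicit(&r->cons_tail,
		    memory_order_acquire);
		if (prod_next == cons_tail)
			return (-1);
	} while (!atomic_compare_exchange_weak_explicit(&r->prod_head,
	    &prod_head, prod_next, memory_order_acquire,
	    memory_order_relaxed));

	r->slots[prod_head] = buf;
	/* Wait for earlier producers to publish, then publish this slot. */
	while (atomic_load_explicit(&r->prod_tail,
	    memory_order_acquire) != prod_head)
		;
	atomic_store_explicit(&r->prod_tail, prod_next, memory_order_release);
	return (0);
}

/* Multi-consumer dequeue; returns NULL if the ring is empty. */
static void *
sketch_dequeue(struct sketch_ring *r)
{
	uint32_t cons_head, cons_next, prod_tail;
	void *buf;

	do {
		/* Same ordering, mirrored: cons_head first, then prod_tail. */
		cons_head = atomic_load_explicit(&r->cons_head,
		    memory_order_acquire);
		cons_next = (cons_head + 1) & (RING_SIZE - 1);
		prod_tail = atomic_load_explicit(&r->prod_tail,
		    memory_order_acquire);
		if (cons_head == prod_tail)
			return (NULL);
	} while (!atomic_compare_exchange_weak_explicit(&r->cons_head,
	    &cons_head, cons_next, memory_order_acquire,
	    memory_order_relaxed));

	buf = r->slots[cons_head];
	/* Wait for earlier consumers, then release the slot. */
	while (atomic_load_explicit(&r->cons_tail,
	    memory_order_acquire) != cons_head)
		;
	atomic_store_explicit(&r->cons_tail, cons_next, memory_order_release);
	return (buf);
}
```

In both functions the thread's own head index is loaded with acquire
semantics before the opposite tail index, so the tail value can never
be older than the head it is compared against. The sketch keeps one
slot unused to tell a full ring from an empty one, so a ring of
RING_SIZE slots holds at most RING_SIZE - 1 entries.
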
Reported by: Ali Saidi <alisaidi@amazon.com>
Co-developed by: Ali Saidi <alisaidi@amazon.com>
Sponsored by: Arm Ltd