As with br_cons_tail use an atomic load acquire to read br_prod_tail
in buf_ring_dequeue_mc and buf_ring_peek*.
On dequeue we need to ensure we don't read the entry from the buf_ring
until it is available and prod_tail has updated. There is already an
appropriate store in the enqueue path and an appropriate load in the
single consumer dequeue, we just need one in the other functions that
read from the buf_ring.
Sponsored by: Arm Ltd