- In nvme_qpair_process_completions(), do the DMA sync before the completion buffer is used.
- In nvme_qpair_submit_tracker(), don't issue an explicit wmb() on arm and arm64 either. Executing bus_dmamap_sync() is (and must be) sufficient to ensure that all CPU stores are visible to external (including DMA) observers.
- Allocate the completion buffer as BUS_DMA_COHERENT. On non-DMA-coherent systems, buffers that are continuously owned (and accessed) by DMA must be allocated with this flag. Note that the BUS_DMA_COHERENT flag is a no-op on DMA-coherent systems (or on coherent buses in mixed systems).
MFC after: 3 weeks
I think that the wmb() in nvme_qpair_submit_tracker() is a relic from an
early implementation (without bus_dmamap_sync()). It is the job of
bus_dmamap_sync() to ensure visibility of all CPU stores committed before
it, so the write barrier is superfluous.
Unfortunately, I don't have the right hardware to test this on amd64/i386.
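
The ordering argued for above can be sketched as a small userland model. This is not the driver code: mock_dmamap_sync(), mock_ring_doorbell(), submit_tracker() and process_completion() are hypothetical stand-ins that only record the order of events, under the assumption that the sync call alone provides the required store visibility.

```c
#include <stdint.h>

/* Events recorded by the model, in the order they happen. */
enum event {
    EV_STORE_CMD,      /* CPU writes the command into the submission slot */
    EV_SYNC_PREWRITE,  /* stand-in for bus_dmamap_sync(..., PREWRITE) */
    EV_DOORBELL,       /* device is told that a new command is ready */
    EV_SYNC_POSTREAD,  /* stand-in for bus_dmamap_sync(..., POSTREAD) */
    EV_LOAD_CPL        /* CPU reads the completion entry */
};

#define LOG_MAX 8
static enum event event_log[LOG_MAX];
static int event_count;

static void log_event(enum event ev)
{
    if (event_count < LOG_MAX)
        event_log[event_count++] = ev;
}

/* Hypothetical mock: a real sync would make CPU stores visible to DMA. */
static void mock_dmamap_sync(enum event which)
{
    log_event(which);
}

/* Hypothetical mock of the doorbell register write. */
static void mock_ring_doorbell(void)
{
    log_event(EV_DOORBELL);
}

/* Submission path: store, then sync, then doorbell. No separate wmb(). */
static void submit_tracker(uint32_t *sq_slot, uint32_t cmd)
{
    *sq_slot = cmd;
    log_event(EV_STORE_CMD);
    mock_dmamap_sync(EV_SYNC_PREWRITE);
    mock_ring_doorbell();
}

/* Completion path: sync first, only then read the completion entry. */
static uint32_t process_completion(const uint32_t *cq_slot)
{
    mock_dmamap_sync(EV_SYNC_POSTREAD);
    log_event(EV_LOAD_CPL);
    return *cq_slot;
}
```

In this model the sync sits between the command store and the doorbell on submission, and ahead of the first load on completion, which are the two orderings the change relies on.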