NVME: Multiple busdma-related fixes.
- In nvme_qpair_process_completions(), do the DMA sync before the completion buffer is used.
- In nvme_qpair_submit_tracker(), skip the explicit wmb() on arm and arm64 as well. On these architectures, bus_dmamap_sync() is sufficient to ensure that all CPU stores are visible to external (including DMA) observers.
- Allocate the completion buffer as BUS_DMA_COHERENT. On non-DMA-coherent systems, buffers continuously owned (and accessed) by DMA must be allocated with this flag. Note that the BUS_DMA_COHERENT flag is a no-op on DMA-coherent systems (or on coherent buses in mixed systems).
MFC after: 4 weeks
Reviewed by: mav, imp
Differential Revision: https://reviews.freebsd.org/D27446
(cherry picked from commit 8f9d5a8dbf4ea69c5f9a1e3a36e23732ffaa5c75)