Fix regression issues after r346645 in the LinuxKPI
Allow loading the same DMA address multiple times without any prior unload.
Sponsored by: Mellanox Technologies
Differential D20097
Authored by hselasky on Apr 29 2019, 4:16 PM.

Event Timeline (older changes are hidden)
Rebase patch and fix multiple issues.
Ideally #2 and #3 would be discrete commits, but I see the value in having a single place to point folks to.

I've been running this for about 8 hours on a mostly idle laptop with Firefox running; so far so good. I would like to give the reporters on the mailing list some time to test as well, since the hang is very random (I might have been lucky). I'll keep running the patch though.

Tested with an rping loop:

	for i in $(seq 1 100); do for j in $(seq 23 65535); do rping -s -C 1 -S $j; done; done

This patch together with r346984M and drm-current-kmod-4.16.g20190430 works fine, with Firefox and a 3D game under heavy load.

Would it be prudent to decouple #2 and #3? The fix for #1 raises some interesting implementation questions and may ultimately be fixed better in the driver anyway.

I've had further success stories graphics-wise, and my laptop is still running, so from a graphics perspective this is good to go.

@kib: I was thinking it would be better, in a followup commit, to add a debug knob to print the backtrace when double mappings happen or when the unmap cannot find the DMA address, and to trace this down in the ibcore code. It is apparently a bug. Right now DRM-next and IBCORE work with this patch, and I think the current behaviour of killing old mappings on the same address is fine.

I disagree completely. Suppose that two RDMA clients mmap the same shared memory, and then use the same buffer for transfers.

@kib: Using the same memory location like this is not allowed, or we should not allow it. Because scatter/gather lists are involved, we cannot assume that two mappings starting at the same physical address are identical just by comparing the size of the object (!) Technically the right approach is to fail double DMA maps by returning 0 on the second attempt. Right now that would break ibcore ....

Why?
Then we should provide a method to identify and distinguish them.
No, that is the wrong approach, and you already know this.

For sg lists the tracking is clearly wrong right now: LinuxKPI should store each segment's mapping in the trie, instead of doing that only for the first one. Hm, in fact I backtrack and wonder why you need to store the scatterlist mapping in the trie at all. The trie is used to look up a mapping by its bus address. For a scatterlist, unmap is passed the original scatterlist, which contains all the information, so there is no need to store any side-band data at all. Or, if needed, additional fields can be added to the scatterlist (I realized that you want to store the dma_map there).
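A minimal sketch of that idea, assuming a new scatterlist member named sg_dma_map; the struct layout and names below are made up for illustration and are not the actual patch:

	/*
	 * Hypothetical sketch (kernel context, relies on the linux_dma_priv
	 * definitions in linux_pci.c): give the LinuxKPI scatterlist a place
	 * to remember the busdma map used to load it, so that unmap can be
	 * driven entirely from the scatterlist and no pctrie entry is needed.
	 */
	struct lkpi_sg_sketch {
		vm_paddr_t	phys;		/* physical address of the segment */
		dma_addr_t	dma_address;	/* bus address from the load */
		size_t		length;		/* segment length */
		bus_dmamap_t	sg_dma_map;	/* assumed new member: the busdma map */
	};

	static void
	lkpi_sg_unmap_sketch(struct device *dev, struct lkpi_sg_sketch *sg)
	{
		struct linux_dma_priv *priv = dev->dma_priv;

		/* Everything unmap needs travels with the scatterlist entry. */
		bus_dmamap_unload(priv->dmat, sg->sg_dma_map);
		bus_dmamap_destroy(priv->dmat, sg->sg_dma_map);
	}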
You are right! I can store the DMA map pointer in the scatterlist itself. I'll update the patch.

I don't see how multiple mappings aren't a bug. The Linux API doesn't do any ref-counting; there, the first unmap would wipe out both. If there is sharing of an underlying resource, that needs to be coordinated at a higher level; this isn't the place. Tolerating it to provide some cover is one thing, and I'm not sure I even agree that is the right approach. IMHO fixing the driver is.

Double mappings should not be bugs, the same as they are fine for our native busdma KPI. Consider:
One of my points is that the physical address is user-controllable in some situations, so the KPI must handle duplicates.
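To make the duplicate case concrete, a hypothetical driver-level sequence (Linux DMA API calls as exposed by LinuxKPI; the function itself is illustrative only):

	/*
	 * Illustrative only (kernel context): two independent consumers map
	 * the same page for DMA, e.g. two RDMA registrations of one shared
	 * buffer.  Both loads hit the same physical address, and tearing
	 * down the first mapping must not invalidate tracking of the second.
	 */
	static void
	double_map_illustration(struct device *dev, struct page *page)
	{
		dma_addr_t a, b;

		a = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_TO_DEVICE);
		b = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_TO_DEVICE);

		/* Unmapping 'a' must leave 'b' usable. */
		dma_unmap_page(dev, a, PAGE_SIZE, DMA_TO_DEVICE);
		dma_unmap_page(dev, b, PAGE_SIZE, DMA_TO_DEVICE);
	}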
Overall it looks fine now; please fix the small nits I pointed out, mostly the natural assertions that can improve the code.

I can see how you may end up calling map a few times with the same address: e.g. a NIC sending the same packet to multiple hosts, or something like that, and you basically end up calling dma_map a few times with the same physical address. For bounce-with-bounce and DMAR the DMA address will be different and you are fine. It's the bounce-without-bounce case that appears to have gotten us into trouble. Too bad you need to call _bus_dmamap_load_phys() to find out you are in this case and need to ref-count to keep your dmamap allocations consistent. In the case where the physical address equals the DMA address and the pctrie lookup succeeds, the complexity of checking lengths and swapping a new mapping for the old isn't needed. If it were needed, you'd have a nasty race where the old mapping was invalidated with something mid-flight. With that in mind, this should be sufficient:

	old = LINUX_DMA_PCTRIE_LOOKUP(&priv->ptree, obj->dma_addr);
	if (unlikely(old != NULL)) {
		old->dma_refcount++;
		bus_dmamap_unload(priv->dmat, obj->dma_map);
		bus_dmamap_destroy(priv->dmat, obj->dma_map);
		DMA_PRIV_UNLOCK(priv);
		uma_zfree(linux_dma_obj_zone, obj);
		return (old->dma_addr);
	}

and it avoids a cycle of LINUX_DMA_PCTRIE_REMOVE() and LINUX_DMA_PCTRIE_INSERT(), along with a call to uma_zfree() while holding the DMA lock.

Why do you suggest that the length check is not needed? Putting the discussion of a possible race aside, why must the lengths of two loads be the same?
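For reference, the unmap side implied by the refcount suggestion above would presumably decrement before tearing anything down. A hypothetical sketch, assuming the proposed dma_refcount member and the existing obj/pctrie naming; not code from the patch:

	/*
	 * Hypothetical counterpart to the refcount suggestion above: only the
	 * last unmap for a given bus address removes the pctrie entry and
	 * destroys the map.  Kernel context; not part of the actual patch.
	 */
	static void
	linux_dma_unmap_refcounted_sketch(struct device *dev, dma_addr_t dma_addr)
	{
		struct linux_dma_priv *priv = dev->dma_priv;
		struct linux_dma_obj *obj;

		DMA_PRIV_LOCK(priv);
		obj = LINUX_DMA_PCTRIE_LOOKUP(&priv->ptree, dma_addr);
		if (obj == NULL) {
			DMA_PRIV_UNLOCK(priv);
			return;
		}
		if (obj->dma_refcount > 0) {
			/* Another outstanding load still uses this bus address. */
			obj->dma_refcount--;
			DMA_PRIV_UNLOCK(priv);
			return;
		}
		LINUX_DMA_PCTRIE_REMOVE(&priv->ptree, dma_addr);
		bus_dmamap_unload(priv->dmat, obj->dma_map);
		bus_dmamap_destroy(priv->dmat, obj->dma_map);
		DMA_PRIV_UNLOCK(priv);
		uma_zfree(linux_dma_obj_zone, obj);
	}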
It's a bit of an exaggeration to say the length doesn't matter, and multiple mappings can indeed be of different lengths. What I should say is that if the length actually did have any impact, then this code wouldn't work, or at least would have race conditions. In thinking about this further, the overhead of the added LINUX_DMA_PCTRIE_LOOKUP() imposed on DMAR and bounce-with-bounce, and the overhead of bus_dmamap_create(), _bus_dmamap_load_phys(), bus_dmamap_unload() and bus_dmamap_destroy() imposed on bounce-without-bounce, aren't necessary. If you can agree with the addition of:

	int
	bus_dma_could_bounce(bus_dma_tag_t dmat)
	{
		return (dmat->common.impl != &bus_dma_bounce_impl ||
		    dmat->bounce_flags & BUS_DMA_COULD_BOUNCE);
	}

or something like that, then the remainder can be greatly simplified to something like:

	linux_dma_map_phys()
	{
		...
		if (!bus_dma_could_bounce(priv->dmat)) {
			return (phys);
		}
		...
	}

	linux_dma_unmap()
	{
		...
		if (!bus_dma_could_bounce(priv->dmat)) {
			return;
		}
		...
	}

and the point about lengths, races and maintaining reference counters goes away entirely. What I'm not a fan of is the current approach: it does more than necessary, yet leaves open the possibility of doing not enough should that actually become necessary, so why have a false sense of security.

I've been running the new version of the patch for about 2 hours without any graphics issues. I'll let the computer run through the night.

Even though addresses might not bounce, they might be translated. I'm not sure how common that is. --HPS

This is fine in principle, but its usage below is not correct.
No, it is not, because the presence of BUS_DMA_COULD_BOUNCE does not mean that busdma always bounces. For instance, if the flag was set because the mask is limited, but the allocated pages all fall below the maxaddr, busdma_bounce.c is optimized enough to note that and avoid copying. In other words, even if bus_dma_could_bounce() returned true, that does not imply that we cannot get two loads of the same physical address with identical bus addresses.
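As an illustration of that point, a tag created with a restricted lowaddr is flagged as possibly bouncing on a machine with memory above 4 GB, yet a buffer that already sits below 4 GB still loads 1:1. The tag parameters below are hypothetical values chosen for the example:

	/*
	 * Illustration only (kernel context): the tag may bounce in general,
	 * but any particular load whose pages satisfy the 32-bit restriction
	 * is mapped 1:1 and never copied through a bounce page.
	 */
	static int
	make_restricted_tag(device_t dev, bus_dma_tag_t *tag)
	{
		return (bus_dma_tag_create(bus_get_dma_tag(dev),
		    1, 0,			/* alignment, boundary */
		    BUS_SPACE_MAXADDR_32BIT,	/* lowaddr: may force bouncing */
		    BUS_SPACE_MAXADDR,		/* highaddr */
		    NULL, NULL,			/* filter, filterarg */
		    PAGE_SIZE, 1, PAGE_SIZE,	/* maxsize, nsegments, maxsegsz */
		    0, NULL, NULL, tag));
	}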
What do you mean by 'not enough'? Are you able to provide a (hypothetical) situation where Hans' current patch would break?

This is still running fine from a graphics perspective.

Decouple #2 and #3 from #1. They are ready. #1 is still being discussed. It seems possible to simplify it. kib@ raised some valid points, and I'd like to respond. I need a little more time to think about it.

Joining a few threads on this back together, below is a fleshed-out version (minus arm64) of what I am thinking. It removes some penalty (the additional LOOKUP) when DMAR is enabled and significantly reduces overhead in the bounce-without-bounce case.

diff --git a/sys/compat/linuxkpi/common/src/linux_pci.c b/sys/compat/linuxkpi/common/src/linux_pci.c
index e135298b0b9..8877d1a6956 100644
--- a/sys/compat/linuxkpi/common/src/linux_pci.c
+++ b/sys/compat/linuxkpi/common/src/linux_pci.c
@@ -502,6 +502,15 @@ linux_dma_map_phys(struct device *dev, vm_paddr_t phys, size_t len)
 
 	priv = dev->dma_priv;
 
+	/*
+	 * If the resultant mapping will be entirely 1:1 with the
+	 * physical address, short-circuit the remainder of the
+	 * bus_dma API. This avoids tracking collisions in the pctrie
+	 * with the additional benefit of reducing overhead.
+	 */
+	if (bus_dma_id_mapped(priv->dmat, phys, len))
+		return (phys);
+
 	obj = uma_zalloc(linux_dma_obj_zone, 0);
 
 	DMA_PRIV_LOCK(priv);
diff --git a/sys/x86/include/bus_dma.h b/sys/x86/include/bus_dma.h
index f8a75682c6b..b4da787e9cf 100644
--- a/sys/x86/include/bus_dma.h
+++ b/sys/x86/include/bus_dma.h
@@ -35,6 +35,18 @@
 
 #include <x86/busdma_impl.h>
 
+/*
+ * Is DMA address 1:1 mapping of physical address
+ */
+static inline bool
+bus_dma_id_mapped(bus_dma_tag_t dmat, vm_paddr_t buf, bus_size_t buflen)
+{
+	struct bus_dma_tag_common *tc;
+
+	tc = (struct bus_dma_tag_common *)dmat;
+	return (tc->impl->id_mapped(dmat, buf, buflen));
+}
+
 /*
  * Allocate a handle for mapping from kva/uva/physical
  * address space into bus device space.
diff --git a/sys/x86/include/busdma_impl.h b/sys/x86/include/busdma_impl.h
index 7b1dbeb0341..97c335f9a79 100644
--- a/sys/x86/include/busdma_impl.h
+++ b/sys/x86/include/busdma_impl.h
@@ -62,6 +62,7 @@ struct bus_dma_impl {
 	    void *lockfuncarg, bus_dma_tag_t *dmat);
 	int (*tag_destroy)(bus_dma_tag_t dmat);
 	int (*tag_set_domain)(bus_dma_tag_t);
+	bool (*id_mapped)(bus_dma_tag_t, vm_paddr_t, bus_size_t);
 	int (*map_create)(bus_dma_tag_t dmat, int flags, bus_dmamap_t *mapp);
 	int (*map_destroy)(bus_dma_tag_t dmat, bus_dmamap_t map);
 	int (*mem_alloc)(bus_dma_tag_t dmat, void** vaddr, int flags,
diff --git a/sys/x86/iommu/busdma_dmar.c b/sys/x86/iommu/busdma_dmar.c
index 913f5b2af39..7cd68c5d26f 100644
--- a/sys/x86/iommu/busdma_dmar.c
+++ b/sys/x86/iommu/busdma_dmar.c
@@ -365,6 +365,13 @@ dmar_bus_dma_tag_destroy(bus_dma_tag_t dmat1)
 	return (error);
 }
 
+static bool
+dmar_bus_dma_id_mapped(bus_dma_tag_t dmat, vm_paddr_t buf, bus_size_t buflen)
+{
+
+	return (false);
+}
+
 static int
 dmar_bus_dmamap_create(bus_dma_tag_t dmat, int flags, bus_dmamap_t *mapp)
 {
@@ -857,6 +864,7 @@ struct bus_dma_impl bus_dma_dmar_impl = {
 	.tag_create = dmar_bus_dma_tag_create,
 	.tag_destroy = dmar_bus_dma_tag_destroy,
 	.tag_set_domain = dmar_bus_dma_tag_set_domain,
+	.id_mapped = dmar_bus_dma_id_mapped,
 	.map_create = dmar_bus_dmamap_create,
 	.map_destroy = dmar_bus_dmamap_destroy,
 	.mem_alloc = dmar_bus_dmamem_alloc,
diff --git a/sys/x86/x86/busdma_bounce.c b/sys/x86/x86/busdma_bounce.c
index edc90621edf..3d7dc7fad79 100644
--- a/sys/x86/x86/busdma_bounce.c
+++ b/sys/x86/x86/busdma_bounce.c
@@ -141,6 +141,8 @@ static int reserve_bounce_pages(bus_dma_tag_t dmat, bus_dmamap_t map,
 static bus_addr_t add_bounce_page(bus_dma_tag_t dmat, bus_dmamap_t map,
     vm_offset_t vaddr, vm_paddr_t addr1, vm_paddr_t addr2, bus_size_t size);
 static void free_bounce_page(bus_dma_tag_t dmat, struct bounce_page *bpage);
+static int _bus_dmamap_pagesneeded(bus_dma_tag_t dmat, vm_paddr_t buf,
+    bus_size_t buflen);
 static void _bus_dmamap_count_pages(bus_dma_tag_t dmat, bus_dmamap_t map,
     pmap_t pmap, void *buf, bus_size_t buflen, int flags);
 static void _bus_dmamap_count_phys(bus_dma_tag_t dmat, bus_dmamap_t map,
@@ -223,6 +225,19 @@ bounce_bus_dma_tag_create(bus_dma_tag_t parent, bus_size_t alignment,
 	return (error);
 }
 
+static bool
+bounce_bus_dma_id_mapped(bus_dma_tag_t dmat, vm_paddr_t buf, bus_size_t buflen)
+{
+	int pagesneeded;
+
+	if ((dmat->bounce_flags & BUS_DMA_COULD_BOUNCE) == 0)
+		return (true);
+
+	pagesneeded = _bus_dmamap_pagesneeded(dmat, buf, buflen);
+
+	return (pagesneeded == 0);
+}
+
 /*
  * Update the domain for the tag. We may need to reallocate the zone and
  * bounce pages.
@@ -501,29 +516,38 @@ bounce_bus_dmamem_free(bus_dma_tag_t dmat, void *vaddr, bus_dmamap_t map)
 	    dmat->bounce_flags);
 }
 
-static void
-_bus_dmamap_count_phys(bus_dma_tag_t dmat, bus_dmamap_t map, vm_paddr_t buf,
-    bus_size_t buflen, int flags)
+static int
+_bus_dmamap_pagesneeded(bus_dma_tag_t dmat, vm_paddr_t buf, bus_size_t buflen)
 {
 	vm_paddr_t curaddr;
 	bus_size_t sgsize;
+	int pagesneeded = 0;
 
-	if (map != &nobounce_dmamap && map->pagesneeded == 0) {
-		/*
-		 * Count the number of bounce pages
-		 * needed in order to complete this transfer
-		 */
-		curaddr = buf;
-		while (buflen != 0) {
-			sgsize = MIN(buflen, dmat->common.maxsegsz);
-			if (bus_dma_run_filter(&dmat->common, curaddr)) {
-				sgsize = MIN(sgsize,
-				    PAGE_SIZE - (curaddr & PAGE_MASK));
-				map->pagesneeded++;
-			}
-			curaddr += sgsize;
-			buflen -= sgsize;
+	/*
+	 * Count the number of bounce pages needed in order to
+	 * complete this transfer
+	 */
+	curaddr = buf;
+	while (buflen != 0) {
+		sgsize = MIN(buflen, dmat->common.maxsegsz);
+		if (bus_dma_run_filter(&dmat->common, curaddr)) {
+			sgsize = MIN(sgsize,
+			    PAGE_SIZE - (curaddr & PAGE_MASK));
+			pagesneeded++;
 		}
+		curaddr += sgsize;
+		buflen -= sgsize;
+	}
+
+	return (pagesneeded);
+}
+
+static void
+_bus_dmamap_count_phys(bus_dma_tag_t dmat, bus_dmamap_t map, vm_paddr_t buf,
+    bus_size_t buflen, int flags)
+{
+	if (map != &nobounce_dmamap && map->pagesneeded == 0) {
+		map->pagesneeded = _bus_dmamap_pagesneeded(dmat, buf, buflen);
 		CTR1(KTR_BUSDMA, "pagesneeded= %d\n", map->pagesneeded);
 	}
 }
@@ -1305,6 +1329,7 @@ struct bus_dma_impl bus_dma_bounce_impl = {
 	.tag_create = bounce_bus_dma_tag_create,
 	.tag_destroy = bounce_bus_dma_tag_destroy,
 	.tag_set_domain = bounce_bus_dma_tag_set_domain,
+	.id_mapped = bounce_bus_dma_id_mapped,
 	.map_create = bounce_bus_dmamap_create,
 	.map_destroy = bounce_bus_dmamap_destroy,
 	.mem_alloc = bounce_bus_dmamem_alloc,
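The diff above only touches the map side. A matching unmap presumably needs no bus_dma_id_mapped() check at all, because a 1:1 mapping never gets a pctrie entry in the first place. A hypothetical sketch of that unmap path, using the obj/pctrie naming from the patch (not part of the quoted diff):

	/*
	 * Hypothetical sketch (kernel context, not part of the quoted diff):
	 * because a 1:1 mapping is never inserted into the pctrie, the unmap
	 * side simply finds no entry for such an address and returns early.
	 * Only bus addresses that really went through bus_dmamap_load_phys()
	 * are tracked and torn down here.
	 */
	static void
	linux_dma_unmap_sketch(struct device *dev, dma_addr_t dma_addr)
	{
		struct linux_dma_priv *priv = dev->dma_priv;
		struct linux_dma_obj *obj;

		DMA_PRIV_LOCK(priv);
		obj = LINUX_DMA_PCTRIE_LOOKUP(&priv->ptree, dma_addr);
		if (obj == NULL) {
			/* The address was identity-mapped and never tracked. */
			DMA_PRIV_UNLOCK(priv);
			return;
		}
		LINUX_DMA_PCTRIE_REMOVE(&priv->ptree, dma_addr);
		bus_dmamap_unload(priv->dmat, obj->dma_map);
		bus_dmamap_destroy(priv->dmat, obj->dma_map);
		DMA_PRIV_UNLOCK(priv);
		uma_zfree(linux_dma_obj_zone, obj);
	}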