As mentioned in D20097, it's possible to address the issue of path-compressed radix trie collisions caused by multiple DMA mappings of the same physical address, and also to make the common case of bounce-without-bounce almost as fast as it was before. This is accomplished by exposing whether or not the effective bus_dma implementation will simply be using an identity mapping.
Details
Diff Detail
- Lint: Lint Skipped
- Unit: Tests Skipped
Event Timeline
I'll give this a spin and look for regressions in graphics land, but it might take a day or two. Can you add x11 as group reviewer?
The easiest (or only?) way to make it work is to define bus_dma_id_mapped() as returning false on all unimplemented platforms and use the double-registration patch from hans.
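A minimal user-space sketch of that fallback, for illustration only. The `sketch_` names, the `SKETCH_HAVE_ID_MAPPED` macro, the stand-in types, and the argument list are all assumptions made here, not the patch's actual declarations. The idea is that a platform without an implementation always reports "not identity mapped", so callers keep taking the conservative tracking path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types; the kernel's tag and address types are different. */
typedef uint64_t sketch_paddr_t;
struct sketch_dma_tag {
	int placeholder;
};

#ifdef SKETCH_HAVE_ID_MAPPED
/* A platform that can answer the question supplies its own version. */
bool sketch_bus_dma_id_mapped(const struct sketch_dma_tag *dmat,
    sketch_paddr_t buf, size_t buflen);
#else
/*
 * Fallback for unimplemented platforms: never claim an identity
 * mapping, so behavior stays exactly as conservative as before.
 */
static inline bool
sketch_bus_dma_id_mapped(const struct sketch_dma_tag *dmat,
    sketch_paddr_t buf, size_t buflen)
{

	(void)dmat;
	(void)buf;
	(void)buflen;
	return (false);
}
#endif

/* Hypothetical caller: track the mapping unless it is known to be 1:1. */
static inline bool
sketch_needs_tracking(const struct sketch_dma_tag *dmat,
    sketch_paddr_t buf, size_t buflen)
{

	return (!sketch_bus_dma_id_mapped(dmat, buf, buflen));
}
```

With the macro left undefined, `sketch_needs_tracking()` returns true for every request, which corresponds to the pre-existing behavior on those platforms.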
sys/x86/x86/busdma_bounce.c | ||
---|---|---|
235 | This blank line is not needed. | |
237 | And this. | |
238 | So you can write return (_bus_dmamap_pagesneeded(dmat, buf, buflen) == 0); and remove the pagesneeded declaration altogether. | |
524 | Style forbids initialization in declarations of locals. | |
548 | There should be a blank line after '{' if no locals are declared. |
sys/compat/linuxkpi/common/src/linux_pci.c | ||
---|---|---|
555–560 | So, this path should now be pretty cheap, with the pctrie expected to be empty in the common case for bounce. But is it worth considering if we can avoid the mutex entirely? E.g. we could consider some kind of unlocked check of either pctrie_is_empty() or a new flag on priv indicating that we had ever inserted into the pctrie. |
Add support for arm64 plus rework _bus_dmamap_pagesneeded to short-circuit if we don't need an exact count but just need to know if *any* are needed.
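The short-circuit can be sketched roughly as follows. This is a simplified user-space model, not the code in the diff: the `sketch_` names, the fixed page size, and the "above lowaddr means bounce" predicate are assumptions for illustration. Passing a NULL counter means the caller only wants a yes/no answer, so the walk stops at the first page that would bounce.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	SKETCH_PAGE_SIZE	4096u

/* Hypothetical predicate: bounce anything above the device's limit. */
static bool
sketch_must_bounce(uint64_t paddr, uint64_t lowaddr)
{

	return (paddr > lowaddr);
}

/*
 * Returns true if any page in [buf, buf + buflen) needs bouncing.
 * If pagesneeded is non-NULL the exact count is stored there;
 * if it is NULL the loop exits on the first hit.
 */
static bool
sketch_pagesneeded(uint64_t buf, size_t buflen, uint64_t lowaddr,
    int *pagesneeded)
{
	uint64_t cur;
	size_t in_page, remaining, sgsize;
	int count;

	cur = buf;
	remaining = buflen;
	count = 0;
	while (remaining > 0) {
		/* Bytes left in the page that contains cur. */
		in_page = SKETCH_PAGE_SIZE -
		    (size_t)(cur & (SKETCH_PAGE_SIZE - 1));
		sgsize = remaining < in_page ? remaining : in_page;
		if (sketch_must_bounce(cur, lowaddr)) {
			if (pagesneeded == NULL)
				return (true);
			count++;
		}
		cur += sgsize;
		remaining -= sgsize;
	}
	if (pagesneeded != NULL)
		*pagesneeded = count;
	return (count != 0);
}
```

An identity-mapping query then reduces to `!sketch_pagesneeded(buf, buflen, lowaddr, NULL)`, while the load path can still pass a counter when it needs the exact number of bounce pages.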
Also, as suggested, I added an unlocked check for pctrie_is_empty() in linux_dma_unmap(). If this falsely indicates that the tree is empty while an insertion is actually in progress, that implies linux_dma_map_phys() and linux_dma_unmap() are racing on the same address, which is going to yield bigger issues than a missed lookup.
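The shape of that unlocked check, modeled in user space. Everything here is a stand-in chosen for illustration: a singly linked list replaces the pctrie, a C11 `atomic_flag` spin lock replaces the mutex, and the `sketch_` names and the acquisition counter are hypothetical. The point is only that an empty structure is detected before the lock is touched.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_entry {
	uint64_t		 dma_addr;
	struct sketch_entry	*next;
};

struct sketch_priv {
	atomic_flag			 lock;	/* stand-in for the mutex */
	_Atomic(struct sketch_entry *)	 head;	/* stand-in for the trie root */
	unsigned			 lock_acquisitions;
};

static void
sketch_priv_init(struct sketch_priv *priv)
{

	atomic_flag_clear(&priv->lock);
	atomic_init(&priv->head, NULL);
	priv->lock_acquisitions = 0;
}

static void
sketch_lock(struct sketch_priv *priv)
{

	while (atomic_flag_test_and_set_explicit(&priv->lock,
	    memory_order_acquire))
		;
	priv->lock_acquisitions++;
}

static void
sketch_unlock(struct sketch_priv *priv)
{

	atomic_flag_clear_explicit(&priv->lock, memory_order_release);
}

/* Map side: record a translation that is not an identity mapping. */
static void
sketch_track(struct sketch_priv *priv, struct sketch_entry *e)
{

	sketch_lock(priv);
	e->next = atomic_load_explicit(&priv->head, memory_order_relaxed);
	atomic_store_explicit(&priv->head, e, memory_order_relaxed);
	sketch_unlock(priv);
}

/* Unmap side: skip the lock entirely when nothing was ever tracked. */
static struct sketch_entry *
sketch_unmap(struct sketch_priv *priv, uint64_t dma_addr)
{
	struct sketch_entry *cur, *prev;

	/* Unlocked fast path, the analogue of checking for an empty trie. */
	if (atomic_load_explicit(&priv->head, memory_order_relaxed) == NULL)
		return (NULL);

	sketch_lock(priv);
	prev = NULL;
	cur = atomic_load_explicit(&priv->head, memory_order_relaxed);
	while (cur != NULL && cur->dma_addr != dma_addr) {
		prev = cur;
		cur = cur->next;
	}
	if (cur != NULL) {
		if (prev == NULL)
			atomic_store_explicit(&priv->head, cur->next,
			    memory_order_relaxed);
		else
			prev->next = cur->next;
	}
	sketch_unlock(priv);
	return (cur);
}
```

A stale "empty" answer can only be observed if an unmap of an address runs concurrently with the map of that same address, which would already be a caller bug; that is the argument for why the relaxed load is tolerable in this sketch.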
Running the latest version of this patch on top of r347562M.
The machine boots and graphics comes up with no problems. This is with the built-in Intel graphics in an i7-3520M.
I'll leave the machine running during the night, in case something happens.
sys/compat/linuxkpi/common/src/linux_pci.c | ||
---|---|---|
548 | This of course cannot work. Esp. on architectures where IOMMU is required. |
This has been running fine overnight. From a graphics driver perspective, no regressions are apparent from light testing.
sys/compat/linuxkpi/common/src/linux_pci.c | ||
---|---|---|
495 | __aarch64__? |
sys/compat/linuxkpi/common/src/linux_pci.c | ||
---|---|---|
548 | No, of course this doesn't work on other architectures. The behavior there is the same as it was before this effort began. With a bit of additional investigation, which I'm loath to make this change dependent on, I believe the other architectures can be enabled too. What I want to avoid is penalizing everyone with the reference counts. This approach of splitting the support -- enabling it only where it is known to work -- seemed the sanest way to move forward. |