Replace the DMAR unit's tlb_flush TAILQ by a custom list implementation that enables dmar_qi_task() to dequeue entries without holding the DMAR lock.
Details
# lockstat -C netperf -H lip3 -t TCP_MAERTS -D 1 -l 30
Before:
Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 65536 32768 32768 30.01 3026.90 Adaptive mutex spin: 7742136 events in 30.074 seconds (257438 events/sec) Count indv cuml rcnt nsec Lock Caller ------------------------------------------------------------------------------- 2758882 36% 36% 0.00 829 dmarhw iommu_domain_unload+0x1a8 2578220 33% 69% 0.00 2908 dmarhw dmar_qi_task+0x148 899545 12% 81% 0.00 4580 iodom dmar_domain_free_entry+0x52 844244 11% 91% 0.00 1543 iodom iommu_gas_map+0x370 653724 8% 100% 0.00 916 dmarhw dmar_qi_task+0x193 7498 0% 100% 0.00 494 so_rcv _sleep+0x231 8 0% 100% 0.00 660 buffer arena vmem_xfree+0x302 5 0% 100% 0.00 486 Giant usbd_do_request_flags+0x6d4 4 0% 100% 0.00 4620 Giant usbd_enum_lock+0x5e 3 0% 100% 0.00 75 IOMMU_MAP_ENTRY zone_put_bucket+0x28c 2 0% 100% 0.00 2235 ttymtx ttydev_write+0x192 1 0% 100% 0.00 3872 ttymtx ptsdev_poll+0x113 ------------------------------------------------------------------------------- Adaptive mutex block: 1 events in 30.074 seconds (0 events/sec) Count indv cuml rcnt nsec Lock Caller ------------------------------------------------------------------------------- 1 100% 100% 0.00 4622 Giant usbd_enum_lock+0x5e ------------------------------------------------------------------------------- ...
After:
Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 65536 32768 32768 30.00 3711.07 Adaptive mutex spin: 1154114 events in 30.113 seconds (38326 events/sec) Count indv cuml rcnt nsec Lock Caller ------------------------------------------------------------------------------- 609410 53% 53% 0.00 749 iodom iommu_gas_map+0x384 538057 47% 99% 0.00 1175 iodom dmar_qi_task+0xf3 6593 1% 100% 0.00 864 so_rcv _sleep+0x231 19 0% 100% 0.00 841 Giant usbd_do_request_flags+0x6d4 15 0% 100% 0.00 1910 Giant usbd_enum_lock+0x5e 10 0% 100% 0.00 1457 dmarhw iommu_domain_unload+0x175 3 0% 100% 0.00 126 global softdep check_clear_deps+0x3d5 2 0% 100% 0.00 284 ix0:TX(1):callout _task_fn_admin+0x17e 1 0% 100% 0.00 3809 ttymtx ptsdev_poll+0x113 1 0% 100% 0.00 820 ttymtx ptsdev_read+0x1a7 1 0% 100% 0.00 3819 ttymtx ttydev_write+0x192 1 0% 100% 0.00 851 ehci0 usb_process+0xc1 1 0% 100% 0.00 65 IOMMU_MAP_ENTRY zone_put_bucket+0x28c ------------------------------------------------------------------------------- Adaptive mutex block: 4 events in 30.113 seconds (0 events/sec) Count indv cuml rcnt nsec Lock Caller ------------------------------------------------------------------------------- 3 75% 75% 0.00 12516 Giant usbd_enum_lock+0x5e 1 25% 100% 0.00 7322 ehci0 usb_process+0xc1 ------------------------------------------------------------------------------- ...
Diff Detail
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
sys/x86/iommu/intel_qi.c | ||
---|---|---|
277 | So tlb_flush_tail entry cannot be freed before or while we are storing to it's tlb_flush_next, because dmar_qi_task never frees the last descriptor, and there we are guaranteed that tlb_flush_tail is the last because of the DMAR lock. Am I reading the algorithm correctly? I believe this should be explained in the comment. | |
346 | dmar_qi_invalidate_locked() ? | |
450 | Why this lock is needed? |
sys/x86/iommu/intel_qi.c | ||
---|---|---|
432–434 | Should I rename "head" to "zombie" to align with the description in intel_dmar.h? |
sys/x86/iommu/intel_qi.c | ||
---|---|---|
432–434 | IMO no, it is quite clear what it means both in the comment and in the code usage. Head indeed points to the list' head, and simultaneously this entry does not participate in the gas, so it is zombie. I believe that use of both terms there in fact makes the description cleaner. |