Page MenuHomeFreeBSD

x86/iommu: Reduce DMAR lock contention
ClosedPublic

Authored by alc on Jul 26 2022, 5:38 PM.
Tags
None
Referenced Files
F102952282: D35951.diff
Tue, Nov 19, 3:25 AM
Unknown Object (File)
Fri, Nov 8, 3:10 PM
Unknown Object (File)
Fri, Nov 1, 8:46 PM
Unknown Object (File)
Thu, Oct 31, 11:25 PM
Unknown Object (File)
Oct 9 2024, 3:03 AM
Unknown Object (File)
Oct 7 2024, 5:46 AM
Unknown Object (File)
Oct 5 2024, 10:26 PM
Unknown Object (File)
Oct 4 2024, 11:10 AM
Subscribers

Details

Summary

Replace the DMAR unit's tlb_flush TAILQ by a custom list implementation that enables dmar_qi_task() to dequeue entries without holding the DMAR lock.

Test Plan
# lockstat -C netperf -H lip3 -t TCP_MAERTS -D 1 -l 30

Before:

Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 65536  32768  32768    30.01    3026.90

Adaptive mutex spin: 7742136 events in 30.074 seconds (257438 events/sec)

Count indv cuml rcnt     nsec Lock                   Caller
-------------------------------------------------------------------------------
2758882  36%  36% 0.00      829 dmarhw                 iommu_domain_unload+0x1a8
2578220  33%  69% 0.00     2908 dmarhw                 dmar_qi_task+0x148
899545  12%  81% 0.00     4580 iodom                  dmar_domain_free_entry+0x52
844244  11%  91% 0.00     1543 iodom                  iommu_gas_map+0x370
653724   8% 100% 0.00      916 dmarhw                 dmar_qi_task+0x193
 7498   0% 100% 0.00      494 so_rcv                 _sleep+0x231
    8   0% 100% 0.00      660 buffer arena           vmem_xfree+0x302
    5   0% 100% 0.00      486 Giant                  usbd_do_request_flags+0x6d4
    4   0% 100% 0.00     4620 Giant                  usbd_enum_lock+0x5e
    3   0% 100% 0.00       75 IOMMU_MAP_ENTRY        zone_put_bucket+0x28c
    2   0% 100% 0.00     2235 ttymtx                 ttydev_write+0x192
    1   0% 100% 0.00     3872 ttymtx                 ptsdev_poll+0x113
-------------------------------------------------------------------------------

Adaptive mutex block: 1 events in 30.074 seconds (0 events/sec)

Count indv cuml rcnt     nsec Lock                   Caller
-------------------------------------------------------------------------------
    1 100% 100% 0.00     4622 Giant                  usbd_enum_lock+0x5e
-------------------------------------------------------------------------------

...

After:

Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 65536  32768  32768    30.00    3711.07

Adaptive mutex spin: 1154114 events in 30.113 seconds (38326 events/sec)

Count indv cuml rcnt     nsec Lock                   Caller
-------------------------------------------------------------------------------
609410  53%  53% 0.00      749 iodom                  iommu_gas_map+0x384
538057  47%  99% 0.00     1175 iodom                  dmar_qi_task+0xf3
 6593   1% 100% 0.00      864 so_rcv                 _sleep+0x231
   19   0% 100% 0.00      841 Giant                  usbd_do_request_flags+0x6d4
   15   0% 100% 0.00     1910 Giant                  usbd_enum_lock+0x5e
   10   0% 100% 0.00     1457 dmarhw                 iommu_domain_unload+0x175
    3   0% 100% 0.00      126 global softdep         check_clear_deps+0x3d5
    2   0% 100% 0.00      284 ix0:TX(1):callout      _task_fn_admin+0x17e
    1   0% 100% 0.00     3809 ttymtx                 ptsdev_poll+0x113
    1   0% 100% 0.00      820 ttymtx                 ptsdev_read+0x1a7
    1   0% 100% 0.00     3819 ttymtx                 ttydev_write+0x192
    1   0% 100% 0.00      851 ehci0                  usb_process+0xc1
    1   0% 100% 0.00       65 IOMMU_MAP_ENTRY        zone_put_bucket+0x28c
-------------------------------------------------------------------------------

Adaptive mutex block: 4 events in 30.113 seconds (0 events/sec)

Count indv cuml rcnt     nsec Lock                   Caller
-------------------------------------------------------------------------------
    3  75%  75% 0.00    12516 Giant                  usbd_enum_lock+0x5e
    1  25% 100% 0.00     7322 ehci0                  usb_process+0xc1
-------------------------------------------------------------------------------

...

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

alc requested review of this revision.Jul 26 2022, 5:38 PM
alc created this revision.
alc edited the test plan for this revision. (Show Details)
alc edited the test plan for this revision. (Show Details)

Add a couple comments.

sys/x86/iommu/intel_qi.c
277

So tlb_flush_tail entry cannot be freed before or while we are storing to it's tlb_flush_next, because dmar_qi_task never frees the last descriptor, and there we are guaranteed that tlb_flush_tail is the last because of the DMAR lock. Am I reading the algorithm correctly?

I believe this should be explained in the comment.

343

dmar_qi_invalidate_locked() ?

447

Why this lock is needed?

sys/x86/iommu/intel_qi.c
277

You are correct.

343

dmar_qi_invalidate_sync() is where the comment about lost wakeups resides.

447

So that we do not call wakeup before the incrementing thread has atomically slept and released the DMAR lock.

This revision is now accepted and ready to land.Jul 28 2022, 1:35 PM
sys/x86/iommu/intel_qi.c
429–431

Should I rename "head" to "zombie" to align with the description in intel_dmar.h?

sys/x86/iommu/intel_qi.c
429–431

IMO no, it is quite clear what it means both in the comment and in the code usage. Head indeed points to the list' head, and simultaneously this entry does not participate in the gas, so it is zombie. I believe that use of both terms there in fact makes the description cleaner.

This revision was automatically updated to reflect the committed changes.