Page MenuHomeFreeBSD

iommu_gas: simplify match_one, merge insert into it, drop uppermatch
ClosedPublic

Authored by dougm on Jun 8 2022, 8:08 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Jan 23, 7:58 PM
Unknown Object (File)
Wed, Jan 22, 8:59 PM
Unknown Object (File)
Sat, Jan 18, 9:35 PM
Unknown Object (File)
Fri, Jan 17, 6:17 AM
Unknown Object (File)
Thu, Jan 16, 4:14 AM
Unknown Object (File)
Tue, Jan 14, 6:15 AM
Unknown Object (File)
Sun, Jan 12, 8:47 AM
Unknown Object (File)
Fri, Jan 10, 11:24 AM
Subscribers

Details

Summary

Eliminate some redundant calculation in iommu_gas_match one, and since callers invoke iommu_gas_match_insert if and only of iommu_gas_match_one returns true, merge match_insert into match_one. Merge lowermatch and uppermatch into one.

Test Plan
In a GENERIC-NODEBUG kernel, the command netperf -H lip3 -t TCP_MAERTS -D 1 repeated 6 times on the modified code produced these lines:

Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec

65536 32768 32768 10.01 3561.18
65536 32768 32768 10.00 3294.36
65536 32768 32768 10.00 3239.29
65536 32768 32768 10.00 3494.65
65536 32768 32768 10.00 3456.08
65536 32768 32768 10.00 3454.40

and on the unmodified code produced these lines:
 65536  32768  32768    10.00    3544.47   
 65536  32768  32768    10.00    3298.67   
 65536  32768  32768    10.00    3464.01
 65536  32768  32768    10.00    3259.26   
 65536  32768  32768    10.00    3448.33   
 65536  32768  32768    10.00    3420.01

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

dougm requested review of this revision.Jun 8 2022, 8:08 PM
dougm created this revision.
dougm edited the test plan for this revision. (Show Details)

Remove a couple of changes that leaked in from another patch.

Merge with an overlapping commit.

Delete uppermatch. Rename lowermatch to match. Put the find-next-element code into its own function. Reduces size to 8100 or so.

dougm retitled this revision from iommu_gas: simplify match_one, merge insert into it to iommu_gas: simplify match_one, merge insert into it, drop uppermatch.Jul 1 2022, 5:37 PM
dougm edited the summary of this revision. (Show Details)
sys/dev/iommu/iommu_gas.c
334

This part of the comment should be either removed or updated.

357–358

This comment exactly duplicates the herald. One of them should be eliminated.

394

This comment probably needs some expansion to explain that now we look up in the upper region.

455

I believe you can remove these two lines now, and update the assert below to check that error == 0 || error == ENOMEM

472

I cannot find this check in a new code. It cannot be an assert because drivers can specify highaddr above max domain width.

Apply reviewer suggestions.

dougm added inline comments.
sys/dev/iommu/iommu_gas.c
334

I've updated it.

472

If common->highaddr >= domain->end, then the tree-climbing loop of lines 432-433 will terminate with entry == NULL and control will fall through several entry != NULL tests before reaching 'return (ENOMEM);'.

This revision is now accepted and ready to land.Jul 2 2022, 10:10 AM
dougm marked an inline comment as done.

Replace ummin/ummax with MIN/MAX.

Reading the busdma man page, I see that the addresses excluded by lowaddr and highaddr are those in (lowaddr, highaddr]. Put in '+1' in a few places to get those checks right.

This revision now requires review to proceed.Jul 4 2022, 3:13 AM

Change ">=" to ">" in a few places to account for my new understanding that lowaddr can be allocated and highaddr cannot.

Pull all the match code up into find_space, which is otherwise a useless wrapper function.

sys/dev/iommu/iommu_gas.c
326–341

The expected case is that boundary crossing is a non-issue. These conditional branches should not be repeated in that expected case. As such, this is a pessimization of the original code.

Add braces, in a case where I thought any decent compiler could optimize-out an extra test, but evidently I'm not using a decent compiler.

dougm marked an inline comment as done.Jul 9 2022, 3:31 AM
dougm added inline comments.
sys/dev/iommu/iommu_gas.c
326–341

I figured on the compiler working out that the tests would be redundant in the common case, but I was wrong.

sys/dev/iommu/iommu_gas.c
342–343

The compiler reloads a->entry over and over:

1ef2: 48 8b 47 28                   movq    40(%rdi), %rax
1ef6: 4c 89 18                      movq    %r11, (%rax)
1ef9: 4b 8d 04 08                   leaq    (%r8,%r9), %rax
1efd: 48 05 ff 0f 00 00             addq    $4095, %rax             # imm = 0xFFF
1f03: 48 25 00 f0 ff ff             andq    $-4096, %rax            # imm = 0xF000
1f09: 4c 01 d8                      addq    %r11, %rax
1f0c: 48 8b 4f 28                   movq    40(%rdi), %rcx
1f10: 48 89 41 08                   movq    %rax, 8(%rcx)
1f14: 48 8b 47 28                   movq    40(%rdi), %rax
1f18: c7 40 28 04 00 00 00          movl    $4, 40(%rax)
1f1f: 48 8b 07                      movq    (%rdi), %rax
1f22: 48 8b 77 28                   movq    40(%rdi), %rsi
1f26: 48 83 c0 68                   addq    $104, %rax
1f2a: 48 89 c7                      movq    %rax, %rdi
1f2d: e8 00 00 00 00                callq   0x1f32 <iommu_gas_match_one+0xf2>
dougm marked an inline comment as done.

Remove 'entry' from the iommu params struct and pass it directly.

Remove 'entry' from the iommu params struct and pass it directly.

This isn't the right approach. It adds register pressure on the compiler to maintain the value for entry over the entire execution of "match one". Accordingly the compiler has to use another callee saves register and actually allocate some stack space. It's better to simply define a local "entry" and initialize it from a->entry right before RB tree insertion.

Put entry back into the struct-of-all-arguments. Pull it out into a local variable just before use. Do something similar with 'domain'.

Optimize match_one start. Fix whitespace.

sys/dev/iommu/iommu_gas.c
343–352

'next' would be more descriptive.

Accept online and offline reviewer suggestions.

This revision is now accepted and ready to land.Jul 10 2022, 7:18 PM
This revision was automatically updated to reflect the committed changes.