Pre-calculate masks and page table/page directory bases for the LA48 and LA57 cases, trading a conditional for memory accesses. This shaves 672 bytes of .text at the cost of 32 bytes of .data. Note that the eliminated code contained one conditional and several costly movabs instructions, which are traded for two memory fetches in each of the vtopte()/vtopde() inlines.
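To make the shape of the change concrete, here is a minimal before/after sketch. It is abridged from the surrounding code rather than copied from the diff: identifiers such as P4Tmap/P5Tmap and the exact mask expressions are approximations, while la57, vtoptem, and PTmap do appear in the review below.

/* Before (approximate): branch on LA57, masks materialized as immediates. */
static __inline pt_entry_t *
vtopte_old(vm_offset_t va)
{
	uint64_t mask;

	if (la57) {
		mask = (1ul << (NPTEPGSHIFT + NPDEPGSHIFT + NPDPEPGSHIFT +
		    NPML4EPGSHIFT + NPML5EPGSHIFT)) - 1;
		return ((pt_entry_t *)P5Tmap + ((va >> PAGE_SHIFT) & mask));
	}
	mask = (1ul << (NPTEPGSHIFT + NPDEPGSHIFT + NPDPEPGSHIFT +
	    NPML4EPGSHIFT)) - 1;
	return ((pt_entry_t *)P4Tmap + ((va >> PAGE_SHIFT) & mask));
}

/* After: mask and base are pre-computed once, during boot. */
static __inline pt_entry_t *
vtopte_new(vm_offset_t va)
{
	return (PTmap + ((va >> PAGE_SHIFT) & vtoptem));
}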
Details
- Reviewers: alc, markj
- Commits: rG720a892ac60f: amd64 pmap: simplify vtopte() and vtopde()
Diff Detail
- Repository: rG FreeBSD src repository
Event Timeline
sys/amd64/amd64/pmap.c:1519
  I think these can be static.
sys/amd64/amd64/pmap.c:1519
  I can do this, but the patch is part of a larger series where the variables are used from cpu_switch.S. I do not mind adding 'static', but I hope to remove it again shortly.
sys/amd64/amd64/pmap.c:1521
  Do we get lucky, in the sense that the linker places both variables in the same cache line?
sys/amd64/amd64/pmap.c:1519
  Ok, no need for the churn, I guess. I think the amd64 dtrace_toxic_ranges() implementation could also be simplified in this case. Though it looks like the real intent there is to block accesses to [0, VM_MAXUSER_ADDRESS].
sys/amd64/amd64/pmap.c:1521
  In the case of my specific kernel config, I see:

  ffffffff8089f3e8 D vtoptem
  ffffffff8089f3f0 D PTmap
  ffffffff8089f3f8 D vtopdem
  ffffffff8089f400 D PDmap

  This means that vtoptem/PTmap are definitely in the same cache line, and I am sure that vtopdem/PDmap are in two different cache lines. I am not sure that we want to create holes in the read_mostly segment.
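A layout that would guarantee the pairing, shown here only as an illustrative sketch and not something settled in this review (the struct and its name are hypothetical), is to collect the four variables into one cache-line-aligned object; the alignment padding it forces is exactly the kind of hole in the read_mostly segment mentioned above.

#include <sys/param.h>		/* CACHE_LINE_SIZE */

/*
 * Hypothetical grouping: 32 bytes of data, aligned so that
 * vtoptem/PTmap and vtopdem/PDmap always share one 64-byte line.
 * The __aligned() attribute is what creates the padding hole.
 */
static struct {
	uint64_t	vtoptem;	/* PTE index mask */
	pt_entry_t	*PTmap;		/* recursive page table base */
	uint64_t	vtopdem;	/* PDE index mask */
	pd_entry_t	*PDmap;		/* recursive page directory base */
} vtop_consts __read_mostly __aligned(CACHE_LINE_SIZE);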
sys/amd64/amd64/pmap.c:1519
  Hmm, [0, VM_MAXUSER_ADDRESS] isn't right; the hole of non-canonical addresses must also be excluded. Do we have a suitable constant for this? VM_MIN_KERNEL_ADDRESS is not right either. UPT_MIN_ADDRESS has the right value, but its existing uses in the kernel don't really make sense to me, and I'm not sure what it's supposed to mean on amd64.
sys/amd64/amd64/pmap.c:1519
  This value cannot be a constant, because the definition of a canonical address depends on la57/la48 mode. In the kernel, the definition would be something like (la57 ? VM_MAXUSER_ADDRESS_LA57 : VM_MAXUSER_ADDRESS_LA48).
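Purely as an illustration of that idea, and not code from this review: an la48/la57-aware toxic range could cover user space and the non-canonical hole in one span, since the lowest canonical kernel address is the two's-complement negation of the user limit. The helper name below is hypothetical.

/*
 * Hedged sketch: report [0, first canonical kernel address) as toxic,
 * covering user space plus the non-canonical hole.  Assumes the
 * (base, limit) callback convention of dtrace_toxic_ranges().
 */
static void
toxic_user_and_hole(void (*func)(uintptr_t base, uintptr_t limit))
{
	uintptr_t maxuser;

	maxuser = la57 ? VM_MAXUSER_ADDRESS_LA57 : VM_MAXUSER_ADDRESS_LA48;
	(*func)(0, -maxuser);	/* -maxuser == 0xffff8000... under LA48 */
}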
Here is the start of pmap_qenter(). Note the right shift followed by left shift:
ffffffff810619a0 <pmap_qenter>:
ffffffff810619a0: 55                      pushq   %rbp
ffffffff810619a1: 48 89 e5                movq    %rsp, %rbp
ffffffff810619a4: 41 56                   pushq   %r14
ffffffff810619a6: 53                      pushq   %rbx
ffffffff810619a7: 49 89 f8                movq    %rdi, %r8
ffffffff810619aa: 48 c1 ef 0c             shrq    $12, %rdi
ffffffff810619ae: 48 23 3d a3 0e 7a 00    andq    7999139(%rip), %rdi     # 0xffffffff81802858 <vtoptem>
ffffffff810619b5: 48 c1 e7 03             shlq    $3, %rdi
ffffffff810619b9: 48 03 3d a0 0e 7a 00    addq    7999136(%rip), %rdi     # 0xffffffff81802860 <PTmap>
ffffffff810619c0: 48 63 c2                movslq  %edx, %rax
ffffffff810619c3: 4c 8d 0c c7             leaq    (%rdi,%rax,8), %r9
ffffffff810619c7: 4c 39 cf                cmpq    %r9, %rdi
ffffffff810619ca: 0f 83 ce 00 00 00       jae     0xffffffff81061a9e <pmap_qenter+0xfe>
It would be nice to eliminate the shlq by adjusting the mask so that we AND first and shift second. However, if you rewrite the expression in the obvious way, the compiler won't know that the mask guarantees the low-order 3 bits are zero, so it will still shift right and then left.
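One way around that, sketched here as an untested illustration rather than anything proposed in the review: pre-shift the mask left by 3 (vtoptem8 below is a hypothetical name) and do the arithmetic on a byte pointer, so the AND result is already a byte offset and the compiler has nothing left to re-scale.

/*
 * Hypothetical rewrite: vtoptem8 == vtoptem << 3, so
 * (va >> (PAGE_SHIFT - 3)) & vtoptem8 is a byte offset into the
 * recursive map; the mask also clears the three low bits shifted in
 * from va.  The char * cast avoids pt_entry_t scaling, and the code
 * becomes shrq $9 / andq / addq, with no shlq.
 */
static __inline pt_entry_t *
vtopte_sketch(vm_offset_t va)
{
	return ((pt_entry_t *)((char *)PTmap +
	    ((va >> (PAGE_SHIFT - 3)) & vtoptem8)));
}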