Page MenuHomeFreeBSD

arm64 pmap: Add ATTR_CONTIGUOUS support [Part 1]
ClosedPublic

Authored by alc on Nov 23 2023, 6:32 PM.
Tags
None
Referenced Files
F106964990: D42737.diff
Wed, Jan 8, 4:48 AM
Unknown Object (File)
Mon, Jan 6, 2:45 PM
Unknown Object (File)
Mon, Jan 6, 1:07 AM
Unknown Object (File)
Sun, Dec 29, 9:46 PM
Unknown Object (File)
Mon, Dec 23, 2:07 AM
Unknown Object (File)
Sat, Dec 21, 2:22 AM
Unknown Object (File)
Wed, Dec 18, 12:02 PM
Unknown Object (File)
Fri, Dec 13, 7:22 AM

Details

Summary

While this change only creates ATTR_CONTIGUOUS mappings in a few places, it adds all of the necessary code for handling them once they exist, including demotion, protection, and removal. Consequently, new ATTR_CONTIGUOUS usage can be added (and tested) incrementally.

The changes to copyinout.S and vmparam.h are really orthogonal and could be committed first. VM_LEVEL_0_ORDER is misconfigured when the page size is 16KB. It doesn't account for the page table page being 16KB.

Test Plan

The output from sysctl vm.pmap.kernel_maps should include:

  1. 64KB mappings on the kernel
0xffff000000000000-0xffff0000013bc000 rwx-sg     WB 0 9 27 12
  1. 64KB and/or 2MB mappings on devices
0xffff0000f8800000-0xffff0001085f4000 rw--sg    DEV 0 126 31 4

or

0xffff000108800000-0xffff000109800000 rw--sg DEV-NP 0 8 0 0

(More devices will use 64KB mappings once kern/subr_devmap.c is changed to allocate aligned virtual addresses.)

After a "buildworld" the output from sysctl vm.pmap should include non-zero counts for L2 fills and L3C demotions and removes:

vm.pmap.l3c.copies: 0
vm.pmap.l3c.protects: 0
vm.pmap.l3c.removes: 554
vm.pmap.l3c.mappings: 0
vm.pmap.l3c.demotions: 45859
vm.pmap.l2.fills: 2549

When an L2 demotion has to refill the L3 page table page, it sets ATTR_CONTIGUOUS on the L3 mappings, creating some user-space mappings with ATTR_CONTIGUOUS.

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

alc requested review of this revision.Nov 23 2023, 6:32 PM
alc edited the test plan for this revision. (Show Details)

An Ampere Altra fails to boot with this patch applied:

panic: pmap_remove_kernel_l2: found existing mapping at 0xffffa83f717e29c0: 0x642c0002000711                                                                                                                                                                                                                                  
cpuid = 0                                                                                                                                                                                                                                                                                                                     
time = 2                                                                                                                                                                                                                                                                                                                      
KDB: stack backtrace:                                                                                                                                                                                                                                                                                                         
db_trace_self() at db_trace_self                                                                                                                                                                                                                                                                                              
db_trace_self_wrapper() at db_trace_self_wrapper+0x38                                                                                                                                                                                                                                                                         
vpanic() at vpanic+0x1a0                                                                                                                                                                                                                                                                                                      
panic() at panic+0x48                                                                                                                                                                                                                                                                                                         
pmap_remove_kernel_l2() at pmap_remove_kernel_l2+0x144                                                                                                                                                                                                                                                                        
pmap_kremove_device() at pmap_kremove_device+0xcc                                                                                                                                                                                                                                                                             
pmap_unmapdev() at pmap_unmapdev+0x90                                                                                                                                                                                                                                                                                         
nexus_deactivate_resource() at nexus_deactivate_resource+0x50                                                                                                                                                                                                                                                                 
pci_deactivate_resource() at pci_deactivate_resource+0x2c                                                                                                                                                                                                                                                                     
pci_deactivate_resource() at pci_deactivate_resource+0x2c                                                                                                                                                                                                                                                                     
resource_list_release() at resource_list_release+0x1c0                                                                                                                                                                                                                                                                        
pci_release_regions() at pci_release_regions+0x48                                                                                                                                                                                                                                                                             
mlx5_pci_close() at mlx5_pci_close+0x94                                                                                                                                                                                                                                                                                       
init_one() at init_one+0x11d8                                                                                                                                                                                                                                                                                                 
linux_pci_attach_device() at linux_pci_attach_device+0x404                                                                                                                                                                                                                                                                    
device_attach() at device_attach+0x3fc                                                                                                                                                                                                                                                                                        
device_probe_and_attach() at device_probe_and_attach+0x80                                                                                                                                                                                                                                                                     
pci_driver_added() at pci_driver_added+0x110                                                                                                                                                                                                                                                                                  
devclass_driver_added() at devclass_driver_added+0x48                                                                                                                                                                                                                                                                         
devclass_add_driver() at devclass_add_driver+0x148                                                                                                                                                                                                                                                                            
_linux_pci_register_driver() at _linux_pci_register_driver+0xbc                                                                                                                                                                                                                                                               
init() at init+0x18                                                                                                                                                                                                                                                                                                           
mi_startup() at mi_startup+0x1dc                                                                                                                                                                                                                                                                                              
virtdone() at virtdone+0x68                                                                                                                                                                                                                                                                                                   
KDB: enter: panic

mlx5 fails to attach (with or without the patch) and triggers the panic while detaching itself.

pmap_kremove_device: fix 2MB mapping removal; optimize TLB invalidation

There was one function where PTE_TO_PHYS/PHYS_TO_PTE conversion hadn't been done yet.

The Altra boots with the latest revision of the patch.

Altra kernel map: https://reviews.freebsd.org/P616 stats: https://reviews.freebsd.org/P617

DevKit kernel map: https://reviews.freebsd.org/P618

Still waiting on a buildworld for the devkit, it's rather slow compared to the Altra. :)

sys/arm64/arm64/pmap.c
4663

Perhaps we should introduce ATTR_PROMOTE, similar to other platforms.

6906

"L3 superpages"? The terminology used by comments is a bit inconsistent; elsewhere I see "contiguous superpage" and "64KB page".

7052

Is it possible to encounter an L3 superpage here? We already know that the mapping is clean and writeable.

Thanks, Mark. We'd like to see the output from sysctl vm.pmap.kernel_maps too.

In D42737#975855, @alc wrote:

Thanks, Mark. We'd like to see the output from sysctl vm.pmap.kernel_maps too.

Nevermind. :-) I see it for the Altra.

In D42737#975855, @alc wrote:

Thanks, Mark. We'd like to see the output from sysctl vm.pmap.kernel_maps too.

That's what I provided in the links - did you mean something else?

In both cases, this was the output of sysctl vm.pmap.kernel_maps immediately after booting. I can grab it from after the buildworld too, if that's useful.

In D42737#975855, @alc wrote:

Thanks, Mark. We'd like to see the output from sysctl vm.pmap.kernel_maps too.

That's what I provided in the links - did you mean something else?

In both cases, this was the output of sysctl vm.pmap.kernel_maps immediately after booting. I can grab it from after the buildworld too, if that's useful.

No, I just wasn't reading carefully before responding. I saw the counters from the Altra first.

The output from sysctl vm.pmap.kernel_maps looks good. It reinforces the following notion: After this change is committed, we should update kern/subr_devmap.c so that it allocates virtual addresses that align with the given physical addresses. Then, a lot more device mappings will use 64KB page mappings.

I have some local changes to subr_devmap.c in part to move pmap_mapbios there. I can have a look at aligning va and pa while I'm there.

sys/arm64/arm64/pmap.c
4663

Yes, @ehs3_rice.edu is going to make this change.

6906

Yes, @ehs3_rice.edu is going to change this to "L3C superpages".

7052

Yes. Above we asserted that a writeable L3C mapping must be dirty, not clean. Here we would be clearing the dirty bit on just one of the constituent pages.

sys/arm64/arm64/pmap.c
1145

The parentheses around the difference are unnecessary.

2071

It feels a bit iffy to perform this load and check without holding the pmap lock. I think it's safe so long as we don't have callers which try to map the same region in parallel (which would be a bug of course), but I had to think about it.

2084

Is it possible that attr may contain ATTR_CONTIGUOUS here? I think so, if the region being mapped is of size >= 64KB+2MB, suitably aligned, and we map the first portion as a L3C superpage.

2213

Typically we'd use __assert_unreachable() here, though then you can't pass values to the panic message. Alternately, keep the KASSERT() and add __unreachable().

4226

This should be dirtying mt, not m.

7052

I see, it's a bit too easy to forget that ATTR_S1_AP_RW_BIT being set means that the mapping is clean.

7792

I'd consider adding a helper to perform this truncation.

7795

Each use of va in this function (except the KTR log) masks off L3C_OFFSET; it'd be a bit simpler to do that once at the beginning of the function.

7885

Assert that ATTR_CONTIGUOUS is set?

sys/arm64/arm64/pmap.c
2084

Yes.

4226

Yes.

alc marked 9 inline comments as done.Nov 30 2023, 7:35 PM
alc added inline comments.
sys/arm64/arm64/pmap.c
2213

At this point, I don't feel like we really need the value.

alc marked an inline comment as done.

Eliot and I have addressed most of Mark's comments.

markj added inline comments.
sys/arm64/arm64/pmap.c
3698

This pattern appears a few times: we demote a L3C superpage, which requires a break-before-make sequence, and then remove a single entry from the demoted run of L3 mappings, requiring a second TLB invalidation. I wonder if it would be profitable to combine the two operations, so that the second TLB invalidation isn't necessary.

4671

As an optimization, could we skip over a L3C superpage here? That is, if ATTR_CONTIGUOUS is set for a PTE, is it necessary to inspect the rest of the PTEs in the block?

5685

If va_next can be zero, then isn't it possible for sva + L3C_OFFSET to wrap around as well?

5731

Shouldn't this be handled with a pmap_abort_ptp() call?

6261

Is this still true? I note that pmap_unwire() may be called on mlock()ed ranges after this point, but it looks like it tolerates not finding PTEs for the wired region.

7784

Most callers can safely ignore the return value, but a few (anything that may operate on the direct map) must not. I'd consider being explicit and either add a (void) cast in the places that do not need to check, or introduce a separate function which has a return type of void and does nothing but call this function, and use that function in places that currently don't need to check the return value.

This revision is now accepted and ready to land.Dec 8 2023, 10:47 PM
alc marked 7 inline comments as done.Mar 12 2024, 5:18 PM
alc added inline comments.
sys/arm64/arm64/pmap.c
1661–1663

My intention is to keep these XXX counters in our development branch, but remove them from the committed patch.

3698

We'll think about it.

4671

Yes, we have implemented that as part of a subsequent change, specifically, when promotion from 16 4KB pages to a 64KB page is added. As part of this change, it wouldn't have much effect.

5685

No, it can't. That's why we used L3C_OFFSET (65535) here, and not L3C_SIZE.

5731

This case is different from the one below. Here, we know that the decrement won't reduce the PTP's ref count to 0.

6261

Maybe, depending on whether there are concurrency related issues. If we only consider a single thread using and tearing down the address space, I think that the answer is "no".

7784

I'm adding the (void) casts.

alc marked 6 inline comments as done.

Add (void) casts. Refill a comment whose lines were too long.

This revision now requires review to proceed.Mar 12 2024, 7:07 PM

Teach sysctl vm.pmap.kernel_maps to correctly count ATTR_CONTIGUOUS superpages when the base page size is 16KB.

sys/arm64/include/vmparam.h
109

Shouldn't this be 12? If an allocation of 32MB is permitted, then the maximum allocation order is 11, so there are 12 possible allocation orders.

alc marked an inline comment as done.Mar 18 2024, 4:58 PM
alc added a subscriber: gallatin.
alc added inline comments.
sys/arm64/include/vmparam.h
109

Yes, you are correct. The email from @gallatin confirms it.

alc marked an inline comment as done.

Correct VM_NFREEORDER for 16KB page size.

Request aligned virtual addresses in pmap_mapdev{,_attr}().

This revision was not accepted when it landed; it landed in state Needs Review.Mar 24 2024, 5:48 PM
This revision was automatically updated to reflect the committed changes.
alc updated this revision to Diff 136172.

Reopen after reservation size fix was committed.

This revision was not accepted when it landed; it landed in state Needs Review.Mar 30 2024, 6:47 PM
This revision was automatically updated to reflect the committed changes.