Page MenuHomeFreeBSD

arm64 pmap: Add ATTR_CONTIGUOUS support [Part 1]
ClosedPublic

Authored by alc on Nov 23 2023, 6:32 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Jan 6, 2:45 PM
Unknown Object (File)
Mon, Jan 6, 1:07 AM
Unknown Object (File)
Sun, Dec 29, 9:46 PM
Unknown Object (File)
Mon, Dec 23, 2:07 AM
Unknown Object (File)
Sat, Dec 21, 2:22 AM
Unknown Object (File)
Wed, Dec 18, 12:02 PM
Unknown Object (File)
Fri, Dec 13, 7:22 AM
Unknown Object (File)
Tue, Dec 10, 9:51 AM

Details

Summary

While this change only creates ATTR_CONTIGUOUS mappings in a few places, it adds all of the necessary code for handling them once they exist, including demotion, protection, and removal. Consequently, new ATTR_CONTIGUOUS usage can be added (and tested) incrementally.

The changes to copyinout.S and vmparam.h are really orthogonal and could be committed first. VM_LEVEL_0_ORDER is misconfigured when the page size is 16KB. It doesn't account for the page table page being 16KB.

Test Plan

The output from sysctl vm.pmap.kernel_maps should include:

  1. 64KB mappings on the kernel
0xffff000000000000-0xffff0000013bc000 rwx-sg     WB 0 9 27 12
  1. 64KB and/or 2MB mappings on devices
0xffff0000f8800000-0xffff0001085f4000 rw--sg    DEV 0 126 31 4

or

0xffff000108800000-0xffff000109800000 rw--sg DEV-NP 0 8 0 0

(More devices will use 64KB mappings once kern/subr_devmap.c is changed to allocate aligned virtual addresses.)

After a "buildworld" the output from sysctl vm.pmap should include non-zero counts for L2 fills and L3C demotions and removes:

vm.pmap.l3c.copies: 0
vm.pmap.l3c.protects: 0
vm.pmap.l3c.removes: 554
vm.pmap.l3c.mappings: 0
vm.pmap.l3c.demotions: 45859
vm.pmap.l2.fills: 2549

When an L2 demotion has to refill the L3 page table page, it sets ATTR_CONTIGUOUS on the L3 mappings, creating some user-space mappings with ATTR_CONTIGUOUS.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

alc requested review of this revision.Nov 23 2023, 6:32 PM
alc edited the test plan for this revision. (Show Details)

An Ampere Altra fails to boot with this patch applied:

panic: pmap_remove_kernel_l2: found existing mapping at 0xffffa83f717e29c0: 0x642c0002000711                                                                                                                                                                                                                                  
cpuid = 0                                                                                                                                                                                                                                                                                                                     
time = 2                                                                                                                                                                                                                                                                                                                      
KDB: stack backtrace:                                                                                                                                                                                                                                                                                                         
db_trace_self() at db_trace_self                                                                                                                                                                                                                                                                                              
db_trace_self_wrapper() at db_trace_self_wrapper+0x38                                                                                                                                                                                                                                                                         
vpanic() at vpanic+0x1a0                                                                                                                                                                                                                                                                                                      
panic() at panic+0x48                                                                                                                                                                                                                                                                                                         
pmap_remove_kernel_l2() at pmap_remove_kernel_l2+0x144                                                                                                                                                                                                                                                                        
pmap_kremove_device() at pmap_kremove_device+0xcc                                                                                                                                                                                                                                                                             
pmap_unmapdev() at pmap_unmapdev+0x90                                                                                                                                                                                                                                                                                         
nexus_deactivate_resource() at nexus_deactivate_resource+0x50                                                                                                                                                                                                                                                                 
pci_deactivate_resource() at pci_deactivate_resource+0x2c                                                                                                                                                                                                                                                                     
pci_deactivate_resource() at pci_deactivate_resource+0x2c                                                                                                                                                                                                                                                                     
resource_list_release() at resource_list_release+0x1c0                                                                                                                                                                                                                                                                        
pci_release_regions() at pci_release_regions+0x48                                                                                                                                                                                                                                                                             
mlx5_pci_close() at mlx5_pci_close+0x94                                                                                                                                                                                                                                                                                       
init_one() at init_one+0x11d8                                                                                                                                                                                                                                                                                                 
linux_pci_attach_device() at linux_pci_attach_device+0x404                                                                                                                                                                                                                                                                    
device_attach() at device_attach+0x3fc                                                                                                                                                                                                                                                                                        
device_probe_and_attach() at device_probe_and_attach+0x80                                                                                                                                                                                                                                                                     
pci_driver_added() at pci_driver_added+0x110                                                                                                                                                                                                                                                                                  
devclass_driver_added() at devclass_driver_added+0x48                                                                                                                                                                                                                                                                         
devclass_add_driver() at devclass_add_driver+0x148                                                                                                                                                                                                                                                                            
_linux_pci_register_driver() at _linux_pci_register_driver+0xbc                                                                                                                                                                                                                                                               
init() at init+0x18                                                                                                                                                                                                                                                                                                           
mi_startup() at mi_startup+0x1dc                                                                                                                                                                                                                                                                                              
virtdone() at virtdone+0x68                                                                                                                                                                                                                                                                                                   
KDB: enter: panic

mlx5 fails to attach (with or without the patch) and triggers the panic while detaching itself.

pmap_kremove_device: fix 2MB mapping removal; optimize TLB invalidation

There was one function where PTE_TO_PHYS/PHYS_TO_PTE conversion hadn't been done yet.

The Altra boots with the latest revision of the patch.

Altra kernel map: https://reviews.freebsd.org/P616 stats: https://reviews.freebsd.org/P617

DevKit kernel map: https://reviews.freebsd.org/P618

Still waiting on a buildworld for the devkit, it's rather slow compared to the Altra. :)

sys/arm64/arm64/pmap.c
4663 ↗(On Diff #130581)

Perhaps we should introduce ATTR_PROMOTE, similar to other platforms.

6906 ↗(On Diff #130581)

"L3 superpages"? The terminology used by comments is a bit inconsistent; elsewhere I see "contiguous superpage" and "64KB page".

7052 ↗(On Diff #130581)

Is it possible to encounter an L3 superpage here? We already know that the mapping is clean and writeable.

Thanks, Mark. We'd like to see the output from sysctl vm.pmap.kernel_maps too.

In D42737#975855, @alc wrote:

Thanks, Mark. We'd like to see the output from sysctl vm.pmap.kernel_maps too.

Nevermind. :-) I see it for the Altra.

In D42737#975855, @alc wrote:

Thanks, Mark. We'd like to see the output from sysctl vm.pmap.kernel_maps too.

That's what I provided in the links - did you mean something else?

In both cases, this was the output of sysctl vm.pmap.kernel_maps immediately after booting. I can grab it from after the buildworld too, if that's useful.

In D42737#975855, @alc wrote:

Thanks, Mark. We'd like to see the output from sysctl vm.pmap.kernel_maps too.

That's what I provided in the links - did you mean something else?

In both cases, this was the output of sysctl vm.pmap.kernel_maps immediately after booting. I can grab it from after the buildworld too, if that's useful.

No, I just wasn't reading carefully before responding. I saw the counters from the Altra first.

The output from sysctl vm.pmap.kernel_maps looks good. It reinforces the following notion: After this change is committed, we should update kern/subr_devmap.c so that it allocates virtual addresses that align with the given physical addresses. Then, a lot more device mappings will use 64KB page mappings.

I have some local changes to subr_devmap.c in part to move pmap_mapbios there. I can have a look at aligning va and pa while I'm there.

sys/arm64/arm64/pmap.c
4663 ↗(On Diff #130581)

Yes, @ehs3_rice.edu is going to make this change.

6906 ↗(On Diff #130581)

Yes, @ehs3_rice.edu is going to change this to "L3C superpages".

7052 ↗(On Diff #130581)

Yes. Above we asserted that a writeable L3C mapping must be dirty, not clean. Here we would be clearing the dirty bit on just one of the constituent pages.

sys/arm64/arm64/pmap.c
1145 ↗(On Diff #130581)

The parentheses around the difference are unnecessary.

2071 ↗(On Diff #130581)

It feels a bit iffy to perform this load and check without holding the pmap lock. I think it's safe so long as we don't have callers which try to map the same region in parallel (which would be a bug of course), but I had to think about it.

2084 ↗(On Diff #130581)

Is it possible that attr may contain ATTR_CONTIGUOUS here? I think so, if the region being mapped is of size >= 64KB+2MB, suitably aligned, and we map the first portion as a L3C superpage.

2213 ↗(On Diff #130581)

Typically we'd use __assert_unreachable() here, though then you can't pass values to the panic message. Alternately, keep the KASSERT() and add __unreachable().

4226 ↗(On Diff #130581)

This should be dirtying mt, not m.

7052 ↗(On Diff #130581)

I see, it's a bit too easy to forget that ATTR_S1_AP_RW_BIT being set means that the mapping is clean.

7792 ↗(On Diff #130581)

I'd consider adding a helper to perform this truncation.

7795 ↗(On Diff #130581)

Each use of va in this function (except the KTR log) masks off L3C_OFFSET; it'd be a bit simpler to do that once at the beginning of the function.

7885 ↗(On Diff #130581)

Assert that ATTR_CONTIGUOUS is set?

sys/arm64/arm64/pmap.c
2084 ↗(On Diff #130581)

Yes.

4226 ↗(On Diff #130581)

Yes.

alc marked 9 inline comments as done.Nov 30 2023, 7:35 PM
alc added inline comments.
sys/arm64/arm64/pmap.c
2213 ↗(On Diff #130581)

At this point, I don't feel like we really need the value.

alc marked an inline comment as done.

Eliot and I have addressed most of Mark's comments.

markj added inline comments.
sys/arm64/arm64/pmap.c
3698 ↗(On Diff #130868)

This pattern appears a few times: we demote a L3C superpage, which requires a break-before-make sequence, and then remove a single entry from the demoted run of L3 mappings, requiring a second TLB invalidation. I wonder if it would be profitable to combine the two operations, so that the second TLB invalidation isn't necessary.

4671 ↗(On Diff #130868)

As an optimization, could we skip over a L3C superpage here? That is, if ATTR_CONTIGUOUS is set for a PTE, is it necessary to inspect the rest of the PTEs in the block?

5685 ↗(On Diff #130868)

If va_next can be zero, then isn't it possible for sva + L3C_OFFSET to wrap around as well?

5731 ↗(On Diff #130868)

Shouldn't this be handled with a pmap_abort_ptp() call?

6261 ↗(On Diff #130868)

Is this still true? I note that pmap_unwire() may be called on mlock()ed ranges after this point, but it looks like it tolerates not finding PTEs for the wired region.

7784 ↗(On Diff #130868)

Most callers can safely ignore the return value, but a few (anything that may operate on the direct map) must not. I'd consider being explicit and either add a (void) cast in the places that do not need to check, or introduce a separate function which has a return type of void and does nothing but call this function, and use that function in places that currently don't need to check the return value.

This revision is now accepted and ready to land.Dec 8 2023, 10:47 PM
alc marked 7 inline comments as done.Mar 12 2024, 5:18 PM
alc added inline comments.
sys/arm64/arm64/pmap.c
1661–1663 ↗(On Diff #130868)

My intention is to keep these XXX counters in our development branch, but remove them from the committed patch.

3698 ↗(On Diff #130868)

We'll think about it.

4671 ↗(On Diff #130868)

Yes, we have implemented that as part of a subsequent change, specifically, when promotion from 16 4KB pages to a 64KB page is added. As part of this change, it wouldn't have much effect.

5685 ↗(On Diff #130868)

No, it can't. That's why we used L3C_OFFSET (65535) here, and not L3C_SIZE.

5731 ↗(On Diff #130868)

This case is different from the one below. Here, we know that the decrement won't reduce the PTP's ref count to 0.

6261 ↗(On Diff #130868)

Maybe, depending on whether there are concurrency related issues. If we only consider a single thread using and tearing down the address space, I think that the answer is "no".

7784 ↗(On Diff #130868)

I'm adding the (void) casts.

alc marked 6 inline comments as done.

Add (void) casts. Refill a comment whose lines were too long.

This revision now requires review to proceed.Mar 12 2024, 7:07 PM

Teach sysctl vm.pmap.kernel_maps to correctly count ATTR_CONTIGUOUS superpages when the base page size is 16KB.

sys/arm64/include/vmparam.h
109

Shouldn't this be 12? If an allocation of 32MB is permitted, then the maximum allocation order is 11, so there are 12 possible allocation orders.

alc marked an inline comment as done.Mar 18 2024, 4:58 PM
alc added a subscriber: gallatin.
alc added inline comments.
sys/arm64/include/vmparam.h
109

Yes, you are correct. The email from @gallatin confirms it.

alc marked an inline comment as done.

Correct VM_NFREEORDER for 16KB page size.

Request aligned virtual addresses in pmap_mapdev{,_attr}().

This revision was not accepted when it landed; it landed in state Needs Review.Mar 24 2024, 5:48 PM
This revision was automatically updated to reflect the committed changes.
alc updated this revision to Diff 136172.

Reopen after reservation size fix was committed.

This revision was not accepted when it landed; it landed in state Needs Review.Mar 30 2024, 6:47 PM
This revision was automatically updated to reflect the committed changes.