
arm64 pmap: Add ATTR_CONTIGUOUS support [Part 2]
ClosedPublic

Authored by alc on Apr 1 2024, 5:26 AM.

Details

Summary

Create ATTR_CONTIGUOUS mappings in pmap_enter_object().

On a Cortex A72-based EC2 a1.4xlarge VM, the make -j16 buildworld times for a -NODEBUG kernel and MALLOC_PRODUCTION user-space drop from 101 minutes to 94 minutes. The read-only data and text sections of large (2 MB+) executables, e.g., clang, are being mapped using 64 KB pages. I do, however, want to point out that a good portion of the reduction in buildworld time is coming from performing a smaller number of icache flushes when creating executable mappings.
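
For readers new to the contiguous hint, here is a minimal sketch, not the committed pmap_enter_l3c(), of what an L3C mapping amounts to: every L3 entry in an aligned, physically contiguous run is written with the same attributes plus ATTR_CONTIGUOUS, which permits the TLB to cache the whole run as a single entry.

	/*
	 * Minimal sketch only, assuming the L3C_ENTRIES and L3C_SIZE
	 * constants from this series: populate one aligned run of L3
	 * entries with the contiguous hint set.  pa must be
	 * L3C_SIZE-aligned.
	 */
	static void
	sketch_enter_l3c(pt_entry_t *l3p, vm_paddr_t pa, pt_entry_t attrs)
	{
		int i;

		for (i = 0; i < L3C_ENTRIES; i++)
			pmap_store(&l3p[i], PHYS_TO_PTE(pa + i * L3_SIZE) |
			    attrs | ATTR_CONTIGUOUS | L3_PAGE);
	}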

Also included is a BTI-related bug fix to pmap_enter_l2().

Test Plan

Here are some stats after make -j8 buildworld from a GENERIC kernel configured with a 16 KB base page size on an EC2 m7g.2xlarge.

23601.828u 665.253s 53:45.06 752.4%     89071+1326k 152492+35485io 133520pf+0w
Mon Apr  1 19:56:39 UTC 2024
vm.pmap.l3c.copies: 0
vm.pmap.l3c.protects: 0
vm.pmap.l3c.removes: 0
vm.pmap.l3c.mappings: 104719
vm.pmap.l3c.demotions: 20
vm.pmap.l2.fills: 3
vm.pmap.l2.promotions: 1859
vm.pmap.l2.p_failures: 1633
vm.pmap.l2.mappings: 187
vm.pmap.l2.demotions: 1083
vm.pmap.superpages_enabled: 1
vm.pmap.vmid.epoch: 0
vm.pmap.vmid.next: 0
vm.pmap.vmid.bits: 0
vm.pmap.asid.epoch: 0
vm.pmap.asid.next: 64341
vm.pmap.asid.bits: 16
vm.reserv.reclaimed: 0
vm.reserv.partpopq: 
DOMAIN    LEVEL     SIZE  NUMBER
     0,      -1, 4793472K,    151

vm.reserv.fullpop: 9
vm.reserv.freed: 378421
vm.reserv.broken: 4

The L3C size is 2 MB, and the number of L3C mappings is very close to the number of L2 mappings on a kernel configured with a 4 KB base page size, which is what I would expect.
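
For reference, the geometry behind that expectation: the contiguous-hint span at L3 is fixed by the Arm architecture at 16 entries for the 4 KB granule and 128 entries for the 16 KB granule (the macro names below are illustrative, not from the patch).

	/* Contiguous-hint span at L3 for each translation granule: */
	#define	L3C_ENTRIES_4K	 16	/*  16 *  4 KB = 64 KB L3C */
	#define	L3C_ENTRIES_16K	128	/* 128 * 16 KB =  2 MB L3C */

So a 2 MB L3C on a 16 KB kernel covers the same span as a 2 MB L2 block on a 4 KB kernel, hence the similar mapping counts.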

Diff Detail

Lint: Skipped
Unit Tests: Skipped

Event Timeline

alc requested review of this revision. Apr 1 2024, 5:26 AM
sys/arm64/arm64/pmap.c
5460–5462

As part of a future change that updates pagesizes[] to include L3C_SIZE, pmap_enter(..., psind=1) will also call this function.

5689–5690

When pagesizes[] is updated to include L3C_SIZE and vm_reserv.c is updated to set psind=1 on L3C pages, I expect this to switch to checking the page's psind field.

> I do, however, want to point out that a good portion of the reduction in buildworld time is coming from performing a smaller number of icache flushes when creating executable mappings.

Have you looked at teaching the vm code to manage the icache? We currently call cpu_icache_sync_range more than we need to, e.g., if mapping the same physical address twice we will call it twice.

sys/arm64/arm64/pmap.c
1697–1711

XXX?

It would also be good if we didn't list the size, as it will differ based on the granule size.

5429

Can we name this something that doesn't list the size, as it may be incorrect?

sys/arm64/arm64/pmap.c
1697–1711

The XXX counters are here just for testing purposes. They were in the prior part too. I will delete them before committing the changes.

Rename pmap_enter_object()'s helpers to not have 64k or 2m in their names.

>> I do, however, want to point out that a good portion of the reduction in buildworld time is coming from performing a smaller number of icache flushes when creating executable mappings.

> Have you looked at teaching the vm code to manage the icache? We currently call cpu_icache_sync_range more than we need to, e.g., if mapping the same physical address twice we will call it twice.

We discussed this a while ago. To minimize icache syncing, I believe we need to identify all of the places where a kernel might modify a user-mapped page via the direct map. I think that hooking uiomove_* would get us most of the way there, but it's hard to be confident that that's sufficient.
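
A hypothetical sketch of that direction (PGA_ICACHE_STALE is an invented page flag, and the hook points are assumptions, not existing code): a writer that dirties a page through the direct map, e.g., a hooked uiomove_*, would set the flag, and the pmap would sync and clear it only when creating an executable mapping for a flagged page.

	/*
	 * Hypothetical, for illustration only: skip the icache sync
	 * unless some direct-map writer has marked the page stale.
	 */
	if ((prot & VM_PROT_EXECUTE) != 0 &&
	    (m->a.flags & PGA_ICACHE_STALE) != 0) {
		cpu_icache_sync_range((void *)PHYS_TO_DMAP(
		    VM_PAGE_TO_PHYS(m)), PAGE_SIZE);
		vm_page_aflag_clear(m, PGA_ICACHE_STALE);
	}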

sys/arm64/arm64/pmap.c
5594

Do we want to set *ml3p = NULL in this error path as well?

sys/vm/vm_reserv.c
1072

Perhaps also assert that npages <= VM_LEVEL_0_NPAGES?
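
Presumably along these lines; the message text in this sketch is a guess, not the committed wording:

	KASSERT(npages <= VM_LEVEL_0_NPAGES,
	    ("%s: npages %d exceeds VM_LEVEL_0_NPAGES", __func__, npages));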

Add KASSERT to vm_reserv_is_populated().

alc marked 2 inline comments as done. Apr 6 2024, 5:19 AM
alc added inline comments.
sys/arm64/arm64/pmap.c
5594

We did not, because the PTP pointer will still be valid; the PTP can't have been freed. Curiously, pmap_enter_quick_locked() does, however, return NULL in this case, even though it too need not.

alc marked an inline comment as done. Apr 6 2024, 5:49 AM

>>> I do, however, want to point out that a good portion of the reduction in buildworld time is coming from performing a smaller number of icache flushes when creating executable mappings.

>> Have you looked at teaching the vm code to manage the icache? We currently call cpu_icache_sync_range more than we need to, e.g., if mapping the same physical address twice we will call it twice.

> We discussed this a while ago. To minimize icache syncing, I believe we need to identify all of the places where a kernel might modify a user-mapped page via the direct map. I think that hooking uiomove_* would get us most of the way there, but it's hard to be confident that that's sufficient.

Yes, exactly. After the next couple of ATTR_CONTIGUOUS pieces are done, or at least up for review, I'll post a draft icache management patch so that we can begin working on our confidence. :-)

sys/arm64/arm64/pmap.c
5574

The 14 lines starting here assume that *ml3p is non-NULL and do not read l3p, so they could be moved up to just after the assignment to l3p on line 5548 and limited to the !ADDR_IS_KERNEL case. You'd have to move the have_l3p target to just past line 5548 as well.

5592

The variable old_l3e is used only in this line, so I suspect that it can be dropped.

sys/arm64/arm64/pmap.c
5469

Is l2p necessary? Couldn't it be replaced by pde everywhere, since once l2p is computed from pde, that value of pde is not used again?

sys/arm64/arm64/pmap.c
5513

#define PTE_TO_VM_PAGE(pte) PHYS_TO_VM_PAGE(PTE_TO_PHYS(pte))

would be a useful macro that would save a lot of horizontal space.
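
For instance, the PTP lookup quoted in the sketches below would shrink from the first form to the second:

	*ml3p = PHYS_TO_VM_PAGE(PTE_TO_PHYS(pmap_load(pde)));	/* before */
	*ml3p = PTE_TO_VM_PAGE(pmap_load(pde));			/* after */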

sys/arm64/arm64/pmap.c
5494

How to write this section without gotos, in case you want to:

		l2pindex = pmap_l2_pindex(va);

		l3p = NULL;
		if (*ml3p != NULL && (*ml3p)->pindex != l2pindex)
			*ml3p = NULL;
		while (*ml3p == NULL) {
			/*
			 * Get the L2 entry.
			 */
			pde = pmap_pde(pmap, va, &lvl);

			/*
			 * If the L2 entry is a superpage, we either abort or
			 * demote depending on the given flags.
			 */
			if (lvl == 1) {
				pde = pmap_l1_to_l2(pde, va);
				if ((pmap_load(pde) & ATTR_DESCR_MASK) ==
				    L2_BLOCK) {
					if ((flags & PMAP_ENTER_NOREPLACE) != 0)
						return (KERN_FAILURE);
					l3p = pmap_demote_l2_locked(pmap, pde,
					    va, lockp);
					if (l3p == NULL)
						pde = NULL;
				}
			} else if (lvl == 2) {
				if (pmap_load(pde) == 0)
					pde = NULL;
			} else
				pde = NULL;

			/*
			 * If the L3 PTP is not mapped, we attempt to allocate it.
			 */
			if (pde != NULL) {
				*ml3p = PHYS_TO_VM_PAGE(PTE_TO_PHYS(
				    pmap_load(pde)));
			} else {
				*ml3p = _pmap_alloc_l3(pmap, l2pindex, (flags &
				    PMAP_ENTER_NOSLEEP) != 0 ? NULL : lockp);
				if (*ml3p != NULL) {
					--(*ml3p)->ref_count;
				} else if ((flags & PMAP_ENTER_NOSLEEP) != 0)
					return (KERN_FAILURE);
			}
		}
		(*ml3p)->ref_count += L3C_ENTRIES;

		/*
		 * If bti is not the same for the whole L3C range, return failure
		 * and let vm_fault() cope.  Check after L3 allocation, since
		 * it could sleep.
		 */
		if (!pmap_bti_same(pmap, va, va + L3C_SIZE)) {
			KASSERT(*ml3p != NULL, ("pmap_enter_l3c: missing L3 PTP"));
			(*ml3p)->ref_count -= L3C_ENTRIES - 1;
			pmap_abort_ptp(pmap, va, *ml3p);
			*ml3p = NULL;
			return (KERN_PROTECTION_FAILURE);
		}
		if (l3p == NULL)
			l3p = (pt_entry_t *)PHYS_TO_DMAP(VM_PAGE_TO_PHYS(*ml3p));


Eliminate an unnecessary variable.

alc marked 2 inline comments as done. Apr 6 2024, 10:53 PM
alc added inline comments.
sys/arm64/arm64/pmap.c
5513

I would be happy to see this added as a followup change.

alc marked 2 inline comments as done. Apr 6 2024, 10:58 PM
alc added inline comments.
sys/arm64/arm64/pmap.c
5469

It is more descriptive than pde, which can point to either an L1 or L2 entry.

alc marked an inline comment as done. Apr 6 2024, 10:59 PM
alc marked an inline comment as done. Apr 6 2024, 11:02 PM
sys/arm64/arm64/pmap.c
5494

I wish to edit this bit of code, which I can only do by pasting another slightly altered version.

		l3p = NULL;
		if (*ml3p != NULL && (*ml3p)->pindex != l2pindex)
			*ml3p = NULL;
		while (*ml3p == NULL) {
			/*
			 * Get the L2 entry.
			 */
			pde = pmap_pde(pmap, va, &lvl);
			KASSERT(lvl == 1 || lvl == 2,
			    ("pmap_enter_l3c: Invalid level %d", lvl));

			/*
			 * If the L2 entry is a superpage, we either abort or
			 * demote depending on the given flags.
			 */
			if (lvl == 1) {
				pde = pmap_l1_to_l2(pde, va);
				if ((pmap_load(pde) & ATTR_DESCR_MASK) !=
				    L2_BLOCK)
					pde = NULL;
				else if ((flags & PMAP_ENTER_NOREPLACE) != 0)
					return (KERN_FAILURE);
				else if ((l3p = pmap_demote_l2_locked(pmap,
					    pde, va, lockp)) == NULL)
					pde = NULL;
			} else if (pmap_load(pde) == 0)
				pde = NULL;

			/*
			 * If the L3 PTP is not mapped, we attempt to allocate
			 * it.
			 */
			if (pde != NULL) {
				*ml3p = PHYS_TO_VM_PAGE(PTE_TO_PHYS(
				    pmap_load(pde)));
			} else {
				*ml3p = _pmap_alloc_l3(pmap, l2pindex, (flags &
				    PMAP_ENTER_NOSLEEP) != 0 ? NULL : lockp);
				if (*ml3p != NULL) {
					--(*ml3p)->ref_count;
				} else if ((flags & PMAP_ENTER_NOSLEEP) != 0)
					return (KERN_FAILURE);
			}
		}
		(*ml3p)->ref_count += L3C_ENTRIES;

		/*
		 * If bti is not the same for the whole L3C range, return failure
		 * and let vm_fault() cope.  Check after L3 allocation, since
		 * it could sleep.
		 */
		if (!pmap_bti_same(pmap, va, va + L3C_SIZE)) {
			KASSERT(*ml3p != NULL, ("pmap_enter_l3c: missing L3 PTP"));
			(*ml3p)->ref_count -= L3C_ENTRIES - 1;
			pmap_abort_ptp(pmap, va, *ml3p);
			*ml3p = NULL;
			return (KERN_PROTECTION_FAILURE);
		}
		if (l3p == NULL)
			l3p = (pt_entry_t *)PHYS_TO_DMAP(VM_PAGE_TO_PHYS(*ml3p));

This revision was not accepted when it landed; it landed in state Needs Review. Apr 7 2024, 5:29 PM
This revision was automatically updated to reflect the committed changes.

Update to reflect committed change.

This revision was not accepted when it landed; it landed in state Needs Review. Apr 9 2024, 4:22 PM
This revision was automatically updated to reflect the committed changes.