arm64 pmap: Add ATTR_CONTIGUOUS support [Part 3]
Closed, Public

Authored by alc on Apr 27 2024, 6:54 PM.

Details

Summary

Introduce L3C promotion of base page mappings.

Given the frequency of L3C counter updates, switch to per-CPU counters to avoid cache line ping-ponging between CPUs.
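As a rough userspace illustration of why per-CPU counters help, the sketch below gives each CPU its own cache-line-aligned slot, so an update touches only the local line and a read sums all slots. The names (`sketch_counter`, `SKETCH_NCPU`, `SKETCH_CACHE_LINE`) are hypothetical, and this is not the code in the patch; in the kernel, the counter(9) facility would play this role.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64	/* assumed line size; hardware dependent */
#define SKETCH_NCPU 4		/* hypothetical CPU count for the sketch */

/* One slot per CPU, aligned (and therefore padded) to its own cache line. */
struct sketch_pcpu_slot {
	_Alignas(SKETCH_CACHE_LINE) uint64_t value;
};

struct sketch_counter {
	struct sketch_pcpu_slot slot[SKETCH_NCPU];
};

/*
 * Update only the calling CPU's slot.  A real implementation must also
 * keep the thread from migrating during the update; this sketch ignores
 * that and simply takes the CPU index as an argument.
 */
static void
sketch_counter_add(struct sketch_counter *c, unsigned cpu, uint64_t n)
{
	c->slot[cpu % SKETCH_NCPU].value += n;
}

/* Reads are rare, so summing every slot on demand is acceptable. */
static uint64_t
sketch_counter_fetch(const struct sketch_counter *c)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < SKETCH_NCPU; i++)
		sum += c->slot[i].value;
	return (sum);
}
```

The trade-off is that a read is more expensive and only approximately consistent, which is usually fine for statistics that are updated far more often than they are read.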

Revise the L3C counter descriptions to reflect the fact that the size of an L3C mapping varies depending on the base page size.
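To make the size dependence concrete, here is a small sketch of the arithmetic. It assumes the Arm architecture's contiguous-hint span of 16 adjacent L3 entries with a 4KB granule and 128 with a 16KB granule; the function names are hypothetical and do not come from pmap.c.

```c
#include <stdint.h>

/*
 * Number of adjacent L3 entries covered by one ATTR_CONTIGUOUS run,
 * for the two base page sizes discussed in this review.  Returns 0
 * for any other size.
 */
static unsigned
sketch_l3c_entries(uint64_t base_page_size)
{
	switch (base_page_size) {
	case 4096:
		return (16);
	case 16384:
		return (128);
	default:
		return (0);
	}
}

/* Size in bytes of one L3C mapping for the given base page size. */
static uint64_t
sketch_l3c_size(uint64_t base_page_size)
{
	return (base_page_size * sketch_l3c_entries(base_page_size));
}
```

Under these assumptions an L3C mapping is 64KB with 4KB base pages and 2MB with 16KB base pages, which is why a fixed size in the counter descriptions would be misleading.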

Test Plan

Eliot and I have done extensive testing of L3C promotion on a variety of workloads. This includes some testing by me on a system configured with a 16KB base page size. Happily, on that system the number of L3C promotions to 2MB mappings is very close to the number of L2 promotions on a system with a 4KB base page size. Moreover, as expected, the number of L2 promotions is unaffected by this change.

The downside to this change is the increased direct and indirect costs of madvise(MADV_FREE). In a buildworld workload, the net effect is still positive. However, for GraphChi running the PageRank algorithm on a large graph, there is a significant increase in the number of page faults. Jemalloc performs madvise(MADV_FREE) on a significant amount of memory that then gets reused, and we suffer page faults to repromote to L3C (and L2) mappings. This has always been an issue with madvise(MADV_FREE), but with this change it is somewhat worse.

Event Timeline

alc requested review of this revision. Apr 27 2024, 6:54 PM
markj added inline comments.
sys/arm64/arm64/pmap.c
4800

The second condition is wrapped by unneeded parentheses.

5310

Do you plan to modify pmap_promote_l2() to avoid scanning every PTE in an L3C superpage mapping? Is there a reason that optimization wouldn't be straightforward?

This revision is now accepted and ready to land. Apr 28 2024, 3:46 PM
alc edited the summary of this revision.

Eliminate unnecessary parentheses.

This revision now requires review to proceed. Apr 28 2024, 8:19 PM
alc marked 2 inline comments as done. Apr 29 2024, 9:41 AM
alc added inline comments.
sys/arm64/arm64/pmap.c
5310

Yes, eventually. Eliot has it in his prototype. Right now, in this patch, we don't yet perform L3C promotion from pmap_enter_quick_locked(), so we can't yet expect that all 32 (or 16) L3C-sized ranges in an L2 reservation have been promoted when we are in the L2 promotion code. The problem is that I wasn't comfortable with the low ratio of successful L3C promotions to failed attempts by pmap_enter_quick_locked(). Specifically, we don't yet have a cheap test for avoiding L3C promotion attempts that will fail, akin to mpte->ref_count == NL3PG, which pretty effectively avoids a lot of L2 promotion attempts that will fail.
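For readers following along, the two quantities in this reply can be sketched as follows. The struct and function names are hypothetical stand-ins rather than the pmap.c definitions, and the entry counts passed in the usage note (512 or 2048 L3 entries per page table page, 16 or 128 entries per L3C run) are assumptions based on 8-byte PTEs and the Arm contiguous hint for 4KB and 16KB base pages.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the page table page tracked by mpte. */
struct sketch_ptp {
	uint32_t ref_count;	/* number of valid L3 entries in the page */
};

/*
 * Cheap pre-check in the spirit of mpte->ref_count == NL3PG: an L2
 * promotion can only succeed if every L3 entry in the page table page
 * is in use, so skip the full PTE scan otherwise.
 */
static bool
sketch_l2_promotion_plausible(const struct sketch_ptp *mpte, uint32_t nl3pg)
{
	return (mpte->ref_count == nl3pg);
}

/* How many L3C-sized ranges make up one L2-sized range. */
static uint32_t
sketch_l3c_ranges_per_l2(uint32_t nl3pg, uint32_t l3c_entries)
{
	return (nl3pg / l3c_entries);
}
```

With those assumed values, `sketch_l3c_ranges_per_l2(512, 16)` gives 32 and `sketch_l3c_ranges_per_l2(2048, 128)` gives 16, matching the "32 (or 16)" above. The point of the reply is that no comparably cheap counter exists per L3C range, so a failed L3C promotion attempt costs a scan.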

This revision was not accepted when it landed; it landed in state Needs Review. May 8 2024, 2:37 AM
This revision was automatically updated to reflect the committed changes.
alc marked an inline comment as done.