Page MenuHomeFreeBSD

amd64 pmap: allocate leaf page table page for wired userspace 2M pages
ClosedPublic

Authored by bnovkov on Jul 21 2023, 11:36 AM.
Tags
None
Referenced Files
F107145432: D41132.diff
Fri, Jan 10, 7:56 PM
Unknown Object (File)
Tue, Dec 31, 4:12 PM
Unknown Object (File)
Thu, Dec 26, 12:49 AM
Unknown Object (File)
Mon, Dec 23, 9:43 PM
Unknown Object (File)
Mon, Dec 16, 3:21 PM
Unknown Object (File)
Dec 5 2024, 6:23 PM
Unknown Object (File)
Dec 1 2024, 7:43 PM
Unknown Object (File)
Nov 25 2024, 7:15 PM
Subscribers

Details

Summary

This patch reverts the changes made in D19670 and fixes the original issue by allocating and prepopulating a leaf page table page for wired userspace 2M pages.

The original issue is an edge case that creates an unmapped, wired region in userspace. Subsequent faults on this region can trigger wired superpage creation, which leads to a panic in pmap_demote_pde_locked as the pmap does not create a leaf page table page for the wired superpage. D19670 fixed this by disallowing preemptive creation of wired superpage mappings, but that fix is currently interfering with an ongoing effort of speeding up vm_map_wire for large, contiguous entries (e.g. bhyve wiring guest memory) .

Test Plan

The C repros generated by syzkaller no longer trigger this panic, so I've tested the changes using a pseudodevice module that inserts a wired 2M page mapping in the calling process's address space through an ioctl.
I've attached an archive containing the source files for the module and a test program

.
Running the test program with D19670 reverted triggers the same panic. The program exits normally when running it with this patch.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

The C repros generated by syzkaller no longer trigger this panic

hmm, that is surprising. Could you share the program and output from ktrace -tcf ./repro?

sys/amd64/amd64/pmap.c
7519

Further below we handle the case where a mapping already exists at the specified va. In that situation a leaf PTP may already be resident in the trie, for example if the existing mapping was created by a promotion. Then the new pmap_insert_pt_page() call will fail.

We should insert the leaf PTP further below, after any existing mapping has been dealt with.

7521

I suspect that VM_ALLOC_INTERRUPT is inappropriate here, since we can gracefully handle allocation failures. _INTERRUPT should generally be used to minimize the risk of failure in paths where error handling isn't possible.

7528

Local variables should be defined at the beginning of the function or block.

sys/vm/vm_fault.c
391

Just a note: this part shouldn't be committed until other pmaps which implement pmap_enter(psind = 1) (i386, arm64, riscv) have been similarly modified.

The C repros generated by syzkaller no longer trigger this panic

hmm, that is surprising. Could you share the program and output from ktrace -tcf ./repro?

Sure, here is the ktrace output


and the link to the program.
Also, for some reason the test program tarball I originally attached wasn't visible, should be accessible now.

sys/vm/vm_fault.c
391

Of course, if this approach gets accepted I'll move it into a separate patch.

sys/amd64/amd64/pmap.c
7636

It is really strange to return an error in case of page table page allocation failure, but panic on trie insert.

bnovkov added inline comments.
sys/amd64/amd64/pmap.c
7636

thank you for pointing this out, I've reworked the condition and removed the panic.

sys/amd64/amd64/pmap.c
7636

If pmap_insert_pt_page() failed, wouldn't you leak the page table page?

bnovkov marked an inline comment as done.

Address @kib 's comment, free pagetable page when insertion fails.

bnovkov added inline comments.
sys/amd64/amd64/pmap.c
7636

Yes, that would cause a leak, I've reworked the code to deal with this.

sys/amd64/amd64/pmap.c
7621

pv entry must be removed if the ptpage below cannot be mounted. Perhaps PGA_WRITEABLE setting for the constituent pages should be postponed until we committed to return success.

7643
bnovkov added inline comments.
sys/amd64/amd64/pmap.c
7621

Do you think it would be more appropriate to move this code block before PV entry creation? That way we only have worry about removing and freeing the newly allocated ptpage if things go awry.

sys/amd64/amd64/pmap.c
7621

There are two things you need to worry about allocation failure: pde pv entry and page table page. I do not see much difference if you change order of allocation, either way you need to handle both failures.

Address @kib 's comments - remove pde pv entry upon leaf ptpage insertion or allocation failure, defer setting PGA_WRITEABLE for constituent pages.

sys/amd64/amd64/pmap.c
7595

Increase the scope of this variable, and initialize it to NULL.

7601

This has three problems: It's removing from the 4KB PV list when it should be removing from the 2MB PV list. It should only attempt removal if the PDE has the managed bit set. It needs to conditionally perform pmap_abort_ptp().

7613

This does not correctly handle the PAT bit, which is in a different bit position between PDEs and PTEs. That said, there is no reason to initialize the PTEs. Instead, pass false as the last argument to pmap_insert_pt_page(), and the page table page will be initialized on demotion, if demotion ever occurs. (The comment describing the last parameter to pmap_insert_pt_page() is specific to its current usage and can be updated later.)

7621

Actually, the error handling will be simpler if you move it.

sys/amd64/amd64/pmap.c
7561

On some occasions, the page table page that you are later going to allocate is being freed here. :-)

Address @alc 's comments:

  • Defer PTE initialization until demotion
  • Rearrange code to simplify error handling

I've also excluded the changes to vm_fault.c, as per @markj 's remark.

sys/amd64/amd64/pmap.c
7581

As a micro-optimization, because wired mappings are rare and user-space mapping are common, so test the condition to most likely to be false first.

7586–7587

We need to undo the pmap_insert_pt_page().

7587

Please note that the surrounding code does not introduce blank lines except around blocks of code preceded by a multiline block comment. This is an idiosyncrasy of style(9) that admittedly isn't always adhered to.

7589

The page is not guaranteed to be zero filled.

This looks correct now. Thanks!

sys/amd64/amd64/pmap.c
7589

This should only be indented as far as the opening parentheses on the previous line.

This revision is now accepted and ready to land.Sep 9 2023, 6:19 PM
sys/amd64/amd64/pmap.c
7582

This needs to be pmap_pde_pindex(va). pmap_pde_index() gives you the PDE index for the address, which despite the similar name is completely different. I'll fix this before pushing.