Page MenuHomeFreeBSD

Stop early clipping in vm_map_protect
ClosedPublic

Authored by dougm on Jun 20 2019, 4:19 PM.
Tags
None
Referenced Files
F107100419: D20711.id58934.diff
Fri, Jan 10, 3:07 AM
F107091097: D20711.id58919.diff
Thu, Jan 9, 11:30 PM
Unknown Object (File)
Fri, Jan 3, 11:18 AM
Unknown Object (File)
Fri, Jan 3, 10:45 AM
Unknown Object (File)
Fri, Jan 3, 10:12 AM
Unknown Object (File)
Fri, Jan 3, 9:17 AM
Unknown Object (File)
Wed, Jan 1, 1:10 AM
Unknown Object (File)
Sat, Dec 28, 4:39 AM
Subscribers

Details

Summary

vm_map_protect may return an error response after clipping the first or last map entry in the region to be reserved. This creates a pair of matching entries that should have been "simplified" back into one, or never created. This change defers the clipping of those entries until most of the vm_map_protect failure cases have been ruled out.

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

vm_map.c
2535

Should we be accumulating the total swap reserved in this loop and releasing it on an error return?

2538

Should we be delaying the assignment of current->cred until the next loop, since otherwise we may cause two entries to become mergeable, and return early with KERN_RESOURCE_SHORTAGE without either restoring them to their original states or merging them?

vm_map.c
2535

Ideally, I think so. Note though that you would also need to go back and release any acquired cred refs. If you do as you suggest below and defer assignment of current->cred to the second loop, you wouldn't have to worry about this.

2576–2577

We may have reserved swap space for an earlier entry without clipping it. If the entry spanned one of start or end we will have marked the entry as charged by setting entry->cred, but the amount of reserved swap space will be less than entry->end - entry->start. So when the entry is deleted, we may release more swap space than was reserved. At least, I don't see anything preventing this scenario.

dougm retitled this revision from Stop early clipping in vm_map_reserve to Stop early clipping in vm_map_protect.Jun 22 2019, 3:55 AM
dougm edited the summary of this revision. (Show Details)

Reduce the swap_reserve calls to one, and not clip or modify map entry fields until it succeeds. This change makes an assumption about the need to check ptoa(obj->size) - that it isn't necessary when you the start and end of the map entry - that I'm believing because alc told me so.

Relative to the previous version, combine the last two loops of vm_map_protect into one.

Don't allocate swap space until the in-transition test is passed.

With D20711.58891.diff I got:

20190622 09:11:08 all (10/70): mmap18.sh
panic: no cred for entry 0xfffff8060ee67310
cpuid = 1
time = 1561187513
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00adcfe750
vpanic() at vpanic+0x19d/frame 0xfffffe00adcfe7a0
panic() at panic+0x43/frame 0xfffffe00adcfe800
vm_fault_copy_entry() at vm_fault_copy_entry+0x65a/frame 0xfffffe00adcfe8b0
vm_map_protect() at vm_map_protect+0x496/frame 0xfffffe00adcfe960
kern_mprotect() at kern_mprotect+0xc0/frame 0xfffffe00adcfe990
amd64_syscall() at amd64_syscall+0x291/frame 0xfffffe00adcfeab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00adcfeab0
--- syscall (74, FreeBSD ELF64, sys_mprotect), rip = 0x80042f00a, rsp = 0x7fffdfbfbf78, rbp = 0x7fffdfbfbfb0 ---
KDB: enter: panic
[ thread pid 60596 tid 100406 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db> run pho
db:0:pho> set $lines 20000
db:0:pho>  run pho1
db:1:pho1> bt
Tracing pid 60596 tid 100406 td 0xfffff801f87d8000
kdb_enter() at kdb_enter+0x3b/frame 0xfffffe00adcfe750
vpanic() at vpanic+0x1ba/frame 0xfffffe00adcfe7a0
panic() at panic+0x43/frame 0xfffffe00adcfe800
vm_fault_copy_entry() at vm_fault_copy_entry+0x65a/frame 0xfffffe00adcfe8b0
vm_map_protect() at vm_map_protect+0x496/frame 0xfffffe00adcfe960
kern_mprotect() at kern_mprotect+0xc0/frame 0xfffffe00adcfe990
amd64_syscall() at amd64_syscall+0x291/frame 0xfffffe00adcfeab0

https://people.freebsd.org/~pho/stress/log/dougm035.txt

Reorder events in the final loop now that the significance of 'cred' has been partially revealed to me.

I ran all of the mmap() tests I have on D20711.58892.diff for a total of 8 hours.
I also did a buildworld / installworld.
No problems seen with this partial test.

Don't account for clipping in computing the amount of swap space to reserve. Fix some style glitches.

vm_map.c
2477

{} are not needed.

2503

(...eflags & MAP_ENTRY_NEEDS_COPY) != 0

You can check that in_tran == NULL there, I am not sure.

2551

!= 0

2576–2577

This is still not done, right ?

dougm marked 3 inline comments as done.

Address easy suggestions from reviewer.

vm_map.c
2576–2577

I'm not clear about whether this comment expresses a concern about a bug that this patch would introduce, or a bug that's already there that this patch is being asked to fix.

So far, I'm not trying to fix anything except the cases where vm_map_protect clips or modifies entries and then fails without undoing those changes.

Is this about using entry->end - entry->size instead of ptoa(obj->size)? I'm happy to go back to ptoa(obj->size). Somebody told me that they were the same.

If that's not it, then I'm not sure what we're talking about.

vm_map.c
2576–2577

Look at how the clip_start entry is handled, for instance for NEED_COPY case. You reserve the amount of swap needed for non-clipped entry, then set cred for clipped. This means that the swap reserved is bigger than the size of the entry, and the difference would be never released.

vm_map.c
2508

Suppose an entry spans end. That entry will be clipped below, but we will reserve space for the new entry created by that clipping without charging that entry for the space.

2576–2577

I was pointing out a bug in the initial version of the patch, which is still present and related to my comment above. Basically, with your patch we are reserving swap space based on the size of unclipped map entries, and I believe that that is incorrect. I will try to provide more context in my comments in the future.

vm_map.c
2576–2577

That's right now. But back up to diff 58892, where the critical line was:

to_reserve += ulmin(end, current->end) -
                 ulmax(start, current->start);

so I was reserving just the amount needed for clipped entries. But Mark's first comment was made with this line in place, so I didn't understand it to be about accounting-for-clipping.

Why did I change this line 12 hours ago? Because a Peter Holm crash suggested I wasn't reserving enough swap space, and I thought this line might be the problem.

2576–2577

Thanks for all the help you can give me. But, as explained in the response to Kostik, I was accounting for clipping at the time you made your comment. I'm just not doing it now.

vm_map.c
2576–2577

The initial version had a different variant of the problem: it accounted for the clipping when reserving swap space, but the first loop could return an error before any clipping actually happened. So the amount of swap space reserved may not have matched the entry size.

vm_map.c
2576–2577

Isn't this what happens now? That is, clip_start calls vm_entry_charge_object, which leaves the two halves of the clipped entry both with the same underlying object and the same charge. Isn't that range being charged twice *now*? I don't see where clip_start is later making two objects, one for each of the two map entries, with reduced charges.

Don't assume I can ignore ptoa(obj->size). Account for clipping when there's no object.

If vm_map_entry_back is going to be invoked by clipping, then use the size of the untrimmed entry, which is what vm_map_entry_back will set as the size of the object that it creates. Otherwise, if there's no object, there won't be an object made and use the clipped map entry difference. Otherwise, use the size from the pre-existing object, the size of which won't be modified by vm_map_entry_charge_object, or by clip_start or clip_end.

Remove a redundant condition, one which is tested for at the top of the loop.

Add an assertion that the amount of swap space to reserved has been calculated correctly.

With D20711.58934.diff I get:

20190624 05:49:07 all (2/15): mmap11.sh
panic: vm_map_protect: wrong amount reserved
cpuid = 1
time = 1561348157
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00aca04800
vpanic() at vpanic+0x19d/frame 0xfffffe00aca04850
panic() at panic+0x43/frame 0xfffffe00aca048b0
vm_map_protect() at vm_map_protect+0x7b9/frame 0xfffffe00aca04960
kern_mprotect() at kern_mprotect+0xc0/frame 0xfffffe00aca04990
amd64_syscall() at amd64_syscall+0x291/frame 0xfffffe00aca04ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00aca04ab0

https://people.freebsd.org/~pho/stress/log/dougm037.txt

Move the simplification of entries to a separate, third loop, not because I want to but because I want to see if simplification is what's making my accounting assertion fail, or if the cause is some other thing I don't understand.

So everyone but Peter is free to ignore this version.

Retreat. I know I can improve something, and I know that I'm misunderstanding something else. I'll try to take the small win for now.

vm_map.c
2535

Isn't this test redundant? In other words, doesn't vm_map_clip_start() perform the same test?

This revision is now accepted and ready to land.Jun 25 2019, 2:54 AM