Page MenuHomeFreeBSD

tmpfs: recalculate OBJ_TMPFS_VREF on reinstantiating node' vnode
ClosedPublic

Authored by kib on Wed, May 8, 3:10 AM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, May 14, 6:33 PM
Unknown Object (File)
Tue, May 14, 3:07 PM
Unknown Object (File)
Tue, May 14, 12:27 AM
Unknown Object (File)
Mon, May 13, 12:10 PM
Unknown Object (File)
Fri, May 10, 4:43 PM
Unknown Object (File)
Wed, May 8, 12:13 PM
Unknown Object (File)
Wed, May 8, 12:13 PM
Unknown Object (File)
Wed, May 8, 10:06 AM
Subscribers

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kib requested review of this revision.Wed, May 8, 3:10 AM
sys/fs/tmpfs/tmpfs_subr.c
1108

Sorry, I don't understand this. Above we assert that object->un_pager.swp.writemappings == 0.

kib marked an inline comment as done.Wed, May 8, 2:02 PM
kib added inline comments.
sys/fs/tmpfs/tmpfs_subr.c
1108

That assert is simply wrong. If a vnode is reclaimed while some mapping exists, I do not quite understand why it did not triggered, though.

kib marked an inline comment as done.

Remove invalid assert about zero writemappings of the object for the new vnode (since node might be old).
Add asserts that we do not underflow writemapping counters for swap pagers.

I do not see how this solves the problem that @pho found. There, vm_map_process_deferred() was releasing writecounts, and in one case the backing vnode was doomed, so OBJ_TMPFS_VREF was clear.

I do not see how this solves the problem that @pho found. There, vm_map_process_deferred() was releasing writecounts, and in one case the backing vnode was doomed, so OBJ_TMPFS_VREF was clear.

Do you mean https://people.freebsd.org/~pho/stress/log/log0518.txt? I agree that this is different. But for me, log0518 states that clearing of OBJ_TMPFS_VREF and clearing of node->tn_vnode are not atomic (and really cannot be). So the assert "object with writable mappings does not have a reference" should be removed, because indeed vp might be in process of reclaim or the pointer to vp might be already invalid.

diff --git a/sys/fs/tmpfs/tmpfs_subr.c b/sys/fs/tmpfs/tmpfs_subr.c
index 4ddf2b4ba5ff..595c50ffc949 100644
--- a/sys/fs/tmpfs/tmpfs_subr.c
+++ b/sys/fs/tmpfs/tmpfs_subr.c
@@ -133,9 +133,12 @@ tmpfs_pager_writecount_recalc(vm_object_t object, vm_offset_t old,
 		    ("object without writable mappings has a reference"));
 		VNPASS(vp->v_usecount > 0, vp);
 	} else {
-		VNASSERT((object->flags & OBJ_TMPFS_VREF) != 0, vp,
-		    ("object with writable mappings does not "
-		    "have a reference"));
+		/*
+		 * Cannot assert that the object with writable
+		 * mappings has OBJ_TMPFS_VREF set, because vp might
+		 * be already reclaiming or freed.  We only own the
+		 * object lock there.
+		 */
 	}
 
 	if (old == new) {
In D45119#1029542, @kib wrote:

I do not see how this solves the problem that @pho found. There, vm_map_process_deferred() was releasing writecounts, and in one case the backing vnode was doomed, so OBJ_TMPFS_VREF was clear.

Do you mean https://people.freebsd.org/~pho/stress/log/log0518.txt? I agree that this is different. But for me, log0518 states that clearing of OBJ_TMPFS_VREF and clearing of node->tn_vnode are not atomic (and really cannot be). So the assert "object with writable mappings does not have a reference" should be removed, because indeed vp might be in process of reclaim or the pointer to vp might be already invalid.

Yes, but that isn't sufficient. Consider that tmpfs_pager_writecount_recalc() will vrele() the vnode if the writecount drops to 0, which is incorrect if the TMPFS_VREF flag is already clear.

In D45119#1029542, @kib wrote:

I do not see how this solves the problem that @pho found. There, vm_map_process_deferred() was releasing writecounts, and in one case the backing vnode was doomed, so OBJ_TMPFS_VREF was clear.

Do you mean https://people.freebsd.org/~pho/stress/log/log0518.txt? I agree that this is different. But for me, log0518 states that clearing of OBJ_TMPFS_VREF and clearing of node->tn_vnode are not atomic (and really cannot be). So the assert "object with writable mappings does not have a reference" should be removed, because indeed vp might be in process of reclaim or the pointer to vp might be already invalid.

Yes, but that isn't sufficient. Consider that tmpfs_pager_writecount_recalc() will vrele() the vnode if the writecount drops to 0, which is incorrect if the TMPFS_VREF flag is already clear.

So this part should be also easy, it is enough to check that TMPFS_VREF is set before clearing it and vrele-ing the vnode. This should be correct since we observe the flag while owning the object lock.
I do not remember for sure, but I believe your patches include that chunk?

diff --git a/sys/fs/tmpfs/tmpfs_subr.c b/sys/fs/tmpfs/tmpfs_subr.c
index 595c50ffc949..c4b4552b8c81 100644
--- a/sys/fs/tmpfs/tmpfs_subr.c
+++ b/sys/fs/tmpfs/tmpfs_subr.c
@@ -147,9 +147,13 @@ tmpfs_pager_writecount_recalc(vm_object_t object, vm_offset_t old,
 	}
 
 	if (new == 0) {
-		vm_object_clear_flag(object, OBJ_TMPFS_VREF);
-		VM_OBJECT_WUNLOCK(object);
-		vrele(vp);
+		if ((object->flags & OBJ_TMPFS_VREF) != 0) {
+			vm_object_clear_flag(object, OBJ_TMPFS_VREF);
+			VM_OBJECT_WUNLOCK(object);
+			vrele(vp);
+		} else {
+			VM_OBJECT_WUNLOCK(object);
+		}
 	} else {
 		if ((object->flags & OBJ_TMPFS_VREF) == 0) {
 			vref(vp);
In D45119#1029923, @kib wrote:
In D45119#1029542, @kib wrote:

I do not see how this solves the problem that @pho found. There, vm_map_process_deferred() was releasing writecounts, and in one case the backing vnode was doomed, so OBJ_TMPFS_VREF was clear.

Do you mean https://people.freebsd.org/~pho/stress/log/log0518.txt? I agree that this is different. But for me, log0518 states that clearing of OBJ_TMPFS_VREF and clearing of node->tn_vnode are not atomic (and really cannot be). So the assert "object with writable mappings does not have a reference" should be removed, because indeed vp might be in process of reclaim or the pointer to vp might be already invalid.

Yes, but that isn't sufficient. Consider that tmpfs_pager_writecount_recalc() will vrele() the vnode if the writecount drops to 0, which is incorrect if the TMPFS_VREF flag is already clear.

So this part should be also easy, it is enough to check that TMPFS_VREF is set before clearing it and vrele-ing the vnode. This should be correct since we observe the flag while owning the object lock.
I do not remember for sure, but I believe your patches include that chunk?

Right. And finally, suppose that old > new and new != 0. Then we should not re-set OBJ_TMPFS_VREF in the else-case of the hunk you posted below.

diff --git a/sys/fs/tmpfs/tmpfs_subr.c b/sys/fs/tmpfs/tmpfs_subr.c
index 595c50ffc949..c4b4552b8c81 100644
--- a/sys/fs/tmpfs/tmpfs_subr.c
+++ b/sys/fs/tmpfs/tmpfs_subr.c
@@ -147,9 +147,13 @@ tmpfs_pager_writecount_recalc(vm_object_t object, vm_offset_t old,
 	}
 
 	if (new == 0) {
-		vm_object_clear_flag(object, OBJ_TMPFS_VREF);
-		VM_OBJECT_WUNLOCK(object);
-		vrele(vp);
+		if ((object->flags & OBJ_TMPFS_VREF) != 0) {
+			vm_object_clear_flag(object, OBJ_TMPFS_VREF);
+			VM_OBJECT_WUNLOCK(object);
+			vrele(vp);
+		} else {
+			VM_OBJECT_WUNLOCK(object);
+		}
 	} else {
 		if ((object->flags & OBJ_TMPFS_VREF) == 0) {
 			vref(vp);

Right. And finally, suppose that old > new and new != 0. Then we should not re-set OBJ_TMPFS_VREF in the else-case of the hunk you posted below.

This is not enough as well. If the vnode is reclaimed, then we cannot safely access vp in any combination of new/old. But we must, eg. if old == 0, new > 0.

I now think that the proper fix is to clear vp->v_object for tmpfs under the object lock.

tmpfs_destroy_vobject(): clear v_object under the object lock

Which allows tmpfs_pager_writecount_recalc() to reliably detect
reclaimed vnode and make its accesses to object->un_pager.swp.private
(== vp) safe against reclaim.

Running tests with D45119.138427.patch I got this:

VNASSERT failed: (object->flags & OBJ_TMPFS_VREF) != 0 not true at ../../../fs/tmpfs/tmpfs_subr.c:138 (tmpfs_pager_writecount_recalc)
0xfffffe016d72bbb8: type VBAD state VSTATE_DEAD op 0xffffffff818ac700
    usecount 1, writecount 1, refcount 1 seqc users 1
    hold count flags ()
    flags (VIRF_DOOMED|VIRF_TEXT_REF)
    lock type tmpfs: UNLOCKED
#0 0xffffffff80b12778 at lockmgr_lock_flags+0x1c8
#1 0xffffffff8113288a at VOP_LOCK1_APV+0x3a
#2 0xffffffff80c583b3 at _vn_lock+0x53
#3 0xffffffff80c40150 at vflush+0xd0
#4 0xffffffff80a69d20 at tmpfs_unmount+0x70
#5 0xffffffff80c348cc at dounmount+0x7dc
#6 0xffffffff80c3408c at kern_unmount+0x2dc
#7 0xffffffff810696c8 at amd64_syscall+0x158
#8 0xffffffff8103b2cb at fast_syscall_common+0xf8

https://people.freebsd.org/~pho/stress/log/log0524.txt

Do check for vp->v_object == NULL in writecount_recalc().

This revision is now accepted and ready to land.Mon, May 13, 1:20 PM

I ran test with the D45119.138429.patch. 5 hours with the problem test scenario, followed by 24 hours af all of the tempfs tests.
No problems seen.