Page MenuHomeFreeBSD

vfs_subr: optimize inval_buf_range
ClosedPublic

Authored by dougm on Oct 5 2024, 5:28 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Nov 5, 6:39 AM
Unknown Object (File)
Mon, Nov 4, 11:00 PM
Unknown Object (File)
Sat, Nov 2, 9:34 PM
Unknown Object (File)
Sun, Oct 27, 9:00 PM
Unknown Object (File)
Wed, Oct 23, 8:50 AM
Unknown Object (File)
Tue, Oct 22, 10:49 PM
Unknown Object (File)
Sun, Oct 20, 6:47 AM
Unknown Object (File)
Sun, Oct 20, 6:47 AM
Subscribers

Details

Summary

Use a pctrie_lookup to avoid walking over low out-of-range buf list entries, and an early break to avoid the high out-of-range entries. Avoid writing almost identical loops for the dirty and clean lists.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

dougm requested review of this revision.Oct 5 2024, 5:28 PM
dougm created this revision.
sys/kern/vfs_subr.c
2655–2656

Note that B_DELWRI tests are different between loops. All buffers in the clean queue must have B_DELWRI clean, the flag set means that we lost the iterator. For the dirty queue, it is reverse.

Address B_DELWRI error pointed out by @kib

This revision is now accepted and ready to land.Tue, Oct 8, 8:29 AM

Make the code less tricky, with fewer variables, and make the resulting amd64 code a few bytes smaller than before.

This revision now requires review to proceed.Wed, Oct 9, 4:09 AM
sys/kern/vfs_subr.c
2656

This is even more tricky than before.

Both parts of the comparison operator are converted to same type, in this case int. Then clean becomes 0 or 1, while B_DELWRI bit is not at position 0, so the left hand cannot have the value 1 ever.

So this expression is apparently true for any delayed-write buffer, which is not what needed.

sys/kern/vfs_subr.c
2656

!x is 0 when x is nonzero and 1 when x is zero.
The left hand is !(nbp->b_flags & B_DELWRI), which is either 0 (when the bit is set) or 1 (when the bit is not set.

But I can rewrite as

(nbp->b_flags & B_DELWRI) == (clean ? B_DELWRI : 0)

if that would be clearer.

kib added inline comments.
sys/kern/vfs_subr.c
2656

Huh, I missed the '!' before the logical-and expression. It is correct but still tricky (and not in style).

So yes please change as you proposed.

This revision is now accepted and ready to land.Wed, Oct 9, 11:49 AM

Rewrite a tricky expression, as discussed.

This revision now requires review to proceed.Wed, Oct 9, 4:20 PM
This revision is now accepted and ready to land.Thu, Oct 10, 9:02 AM
This revision was automatically updated to reflect the committed changes.
dougm added a subscriber: pho.

Don't use pctrie lookup to find the first block to scan; just use linear search. Using pctrie lookup leads to two crashes (truncate3.sh, mmap28.sh) on stress tests, and this version passes those tests. I guess there are items in the list not in the trie?

Don't use pctrie lookup to find the first block to scan; just use linear search. Using pctrie lookup leads to two crashes (truncate3.sh, mmap28.sh) on stress tests, and this version passes those tests. I guess there are items in the list not in the trie?

All buffers must be in trie. Might be, it is due to UFS metadata buffers having negative indexes?

Not sure if the panic is related to this patch?

20241011 13:49:02 all (664/970): swapoff4.sh
Oct 11 13:50:22 mercat1 kernel: pid 90548 (swap), jid 0, uid 0, was killed: failed to reclaim memory
Oct 11 13:50:23 mercat1 kernel: pid 90555 (swap), jid 0, uid 0, was killed: failed to reclaim memory
Oct 11 13:50:25 mercat1 kernel: pid 90550 (swap), jid 0, uid 0, was killed: failed to reclaim memory
panic: Assertion (object->flags & OBJ_SWAP) != 0 failed at ../../../vm/swap_pager.c:564
cpuid = 5
time = 1728647425
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01d8174990
vpanic() at vpanic+0x13f/frame 0xfffffe01d8174ac0
panic() at panic+0x43/frame 0xfffffe01d8174b20
swapoff_one() at swapoff_one+0x8a9/frame 0xfffffe01d8174d00
kern_swapoff() at kern_swapoff+0x1ab/frame 0xfffffe01d8174e00
amd64_syscall() at amd64_syscall+0x158/frame 0xfffffe01d8174f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01d8174f30
--- syscall (582, FreeBSD ELF64, swapoff), rip = 0xdc6777123ba, rsp = 0xdc6748c87e8, rbp = 0xdc6748c8920 ---

https://people.freebsd.org/~pho/stress/log/log0553.txt

In D46963#1072654, @pho wrote:

Not sure if the panic is related to this patch?

20241011 13:49:02 all (664/970): swapoff4.sh
Oct 11 13:50:22 mercat1 kernel: pid 90548 (swap), jid 0, uid 0, was killed: failed to reclaim memory
Oct 11 13:50:23 mercat1 kernel: pid 90555 (swap), jid 0, uid 0, was killed: failed to reclaim memory
Oct 11 13:50:25 mercat1 kernel: pid 90550 (swap), jid 0, uid 0, was killed: failed to reclaim memory
panic: Assertion (object->flags & OBJ_SWAP) != 0 failed at ../../../vm/swap_pager.c:564
cpuid = 5
time = 1728647425
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01d8174990
vpanic() at vpanic+0x13f/frame 0xfffffe01d8174ac0
panic() at panic+0x43/frame 0xfffffe01d8174b20
swapoff_one() at swapoff_one+0x8a9/frame 0xfffffe01d8174d00
kern_swapoff() at kern_swapoff+0x1ab/frame 0xfffffe01d8174e00
amd64_syscall() at amd64_syscall+0x158/frame 0xfffffe01d8174f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01d8174f30
--- syscall (582, FreeBSD ELF64, swapoff), rip = 0xdc6777123ba, rsp = 0xdc6748c87e8, rbp = 0xdc6748c8920 ---

https://people.freebsd.org/~pho/stress/log/log0553.txt

Nope. It is due to the swap pager iterators code, which adds (IMO) correct assertions that the object has the OBJ_SWAP flag set. But the flag might become cleared after the object relock, if the object is dying.

May be something like this

diff --git a/sys/vm/swap_pager.c b/sys/vm/swap_pager.c
index f4db46a32dee..50c320a95002 100644
--- a/sys/vm/swap_pager.c
+++ b/sys/vm/swap_pager.c
@@ -1922,6 +1922,7 @@ swap_pager_swapoff_object(struct swdevt *sp, vm_object_t object)
 			 * Make sure that pending writes finish before
 			 * returning.
 			 */
+dead:
 			vm_object_pip_wait(object, "swpoff");
 			swp_pager_meta_free_all(object);
 			break;
@@ -1988,11 +1989,15 @@ swap_pager_swapoff_object(struct swdevt *sp, vm_object_t object)
 			 * iterator and search again for the first swblk at or
 			 * after blks.index.
 			 */
+			if ((object->flags & OBJ_DEAD) != 0)
+				goto dead;
 			pctrie_iter_reset(&pages);
 			sb = swblk_iter_init(&blks, object, blks.index);
 			continue;
 		}
 		if (sb_empty) {
+			if ((object->flags & OBJ_DEAD) != 0)
+				goto dead;
 			swblk_iter_remove(&blks);
 			uma_zfree(swblk_zone, sb);
 		}

I ran the test swapoff4.sh for 3 hours with your patch and didn't see any issues.

I ran tests with D46963.144614.patch added for 14 hours. I did not observe any issues.

Restore the pctrie lookup. Add a check for negative block number. Passes mmap28.sh and truncate3.sh tests.

This revision is now accepted and ready to land.Sun, Oct 13, 10:58 PM

Copy what bnoreuselist() does to handle the problem with LOOKUP_GE results, which is more robust than what I did before.

This revision now requires review to proceed.Sun, Oct 13, 11:51 PM
This revision is now accepted and ready to land.Mon, Oct 14, 8:19 AM

I ran tests with D46963.144796.patch added for 6 hours without seeing any issues.

This revision was automatically updated to reflect the committed changes.