Details

Reviewers

alc
asomers
kib

Commits

rG2c8caa4b3925: vfs_subr: optimize inval_buf_range
rGaf4cd5e7b5b5: vfs_subr: optimize inval_buf_range

Summary

Use a pctrie_lookup to avoid walking over low out-of-range buf list entries, and an early break to avoid the high out-of-range entries. Avoid writing almost identical loops for the dirty and clean lists.

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

dougm requested review of this revision.Oct 5 2024, 5:28 PM

dougm created this revision.

dougm added reviewers: asomers, kib.Oct 7 2024, 8:07 PM

kib added inline comments.Oct 8 2024, 7:27 AM

sys/kern/vfs_subr.c
2655–2656	Note that B_DELWRI tests are different between loops. All buffers in the clean queue must have B_DELWRI clean, the flag set means that we lost the iterator. For the dirty queue, it is reverse.

Address B_DELWRI error pointed out by @kib

kib accepted this revision.Oct 8 2024, 8:29 AM

This revision is now accepted and ready to land.Oct 8 2024, 8:29 AM

Make the code less tricky, with fewer variables, and make the resulting amd64 code a few bytes smaller than before.

This revision now requires review to proceed.Oct 9 2024, 4:09 AM

kib added inline comments.Oct 9 2024, 9:17 AM

sys/kern/vfs_subr.c
2656	This is even more tricky than before. Both parts of the comparison operator are converted to same type, in this case int. Then clean becomes 0 or 1, while B_DELWRI bit is not at position 0, so the left hand cannot have the value 1 ever. So this expression is apparently true for any delayed-write buffer, which is not what needed.

dougm added inline comments.Oct 9 2024, 10:03 AM

sys/kern/vfs_subr.c
2656	!x is 0 when x is nonzero and 1 when x is zero. The left hand is !(nbp->b_flags & B_DELWRI), which is either 0 (when the bit is set) or 1 (when the bit is not set. But I can rewrite as (nbp->b_flags & B_DELWRI) == (clean ? B_DELWRI : 0) if that would be clearer.

kib accepted this revision.Oct 9 2024, 11:49 AM

kib added inline comments.

sys/kern/vfs_subr.c
2656	Huh, I missed the '!' before the logical-and expression. It is correct but still tricky (and not in style). So yes please change as you proposed.

This revision is now accepted and ready to land.Oct 9 2024, 11:49 AM

Rewrite a tricky expression, as discussed.

This revision now requires review to proceed.Oct 9 2024, 4:20 PM

kib accepted this revision.Oct 10 2024, 9:02 AM

This revision is now accepted and ready to land.Oct 10 2024, 9:02 AM

Closed by commit rGaf4cd5e7b5b5: vfs_subr: optimize inval_buf_range (authored by dougm). · Explain WhyOct 10 2024, 9:08 AM

This revision was automatically updated to reflect the committed changes.

dougm added a commit: rGaf4cd5e7b5b5: vfs_subr: optimize inval_buf_range.

Herald added subscribers: olce, imp. · View Herald TranscriptOct 10 2024, 9:08 AM

dougm added a reverting change: rGff2263469ad0: Revert "vfs_subr: optimize inval_buf_range".Oct 10 2024, 10:46 AM

Don't use pctrie lookup to find the first block to scan; just use linear search. Using pctrie lookup leads to two crashes (truncate3.sh, mmap28.sh) on stress tests, and this version passes those tests. I guess there are items in the list not in the trie?

dougm reopened this revision.Oct 10 2024, 9:59 PM

In D46963#1072540, @dougm wrote:

Don't use pctrie lookup to find the first block to scan; just use linear search. Using pctrie lookup leads to two crashes (truncate3.sh, mmap28.sh) on stress tests, and this version passes those tests. I guess there are items in the list not in the trie?

All buffers must be in trie. Might be, it is due to UFS metadata buffers having negative indexes?

Not sure if the panic is related to this patch?

20241011 13:49:02 all (664/970): swapoff4.sh
Oct 11 13:50:22 mercat1 kernel: pid 90548 (swap), jid 0, uid 0, was killed: failed to reclaim memory
Oct 11 13:50:23 mercat1 kernel: pid 90555 (swap), jid 0, uid 0, was killed: failed to reclaim memory
Oct 11 13:50:25 mercat1 kernel: pid 90550 (swap), jid 0, uid 0, was killed: failed to reclaim memory
panic: Assertion (object->flags & OBJ_SWAP) != 0 failed at ../../../vm/swap_pager.c:564
cpuid = 5
time = 1728647425
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01d8174990
vpanic() at vpanic+0x13f/frame 0xfffffe01d8174ac0
panic() at panic+0x43/frame 0xfffffe01d8174b20
swapoff_one() at swapoff_one+0x8a9/frame 0xfffffe01d8174d00
kern_swapoff() at kern_swapoff+0x1ab/frame 0xfffffe01d8174e00
amd64_syscall() at amd64_syscall+0x158/frame 0xfffffe01d8174f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01d8174f30
--- syscall (582, FreeBSD ELF64, swapoff), rip = 0xdc6777123ba, rsp = 0xdc6748c87e8, rbp = 0xdc6748c8920 ---

https://people.freebsd.org/~pho/stress/log/log0553.txt

In D46963#1072654, @pho wrote:

Not sure if the panic is related to this patch?

20241011 13:49:02 all (664/970): swapoff4.sh
Oct 11 13:50:22 mercat1 kernel: pid 90548 (swap), jid 0, uid 0, was killed: failed to reclaim memory
Oct 11 13:50:23 mercat1 kernel: pid 90555 (swap), jid 0, uid 0, was killed: failed to reclaim memory
Oct 11 13:50:25 mercat1 kernel: pid 90550 (swap), jid 0, uid 0, was killed: failed to reclaim memory
panic: Assertion (object->flags & OBJ_SWAP) != 0 failed at ../../../vm/swap_pager.c:564
cpuid = 5
time = 1728647425
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01d8174990
vpanic() at vpanic+0x13f/frame 0xfffffe01d8174ac0
panic() at panic+0x43/frame 0xfffffe01d8174b20
swapoff_one() at swapoff_one+0x8a9/frame 0xfffffe01d8174d00
kern_swapoff() at kern_swapoff+0x1ab/frame 0xfffffe01d8174e00
amd64_syscall() at amd64_syscall+0x158/frame 0xfffffe01d8174f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01d8174f30
--- syscall (582, FreeBSD ELF64, swapoff), rip = 0xdc6777123ba, rsp = 0xdc6748c87e8, rbp = 0xdc6748c8920 ---

https://people.freebsd.org/~pho/stress/log/log0553.txt

Nope. It is due to the swap pager iterators code, which adds (IMO) correct assertions that the object has the OBJ_SWAP flag set. But the flag might become cleared after the object relock, if the object is dying.

May be something like this

diff --git a/sys/vm/swap_pager.c b/sys/vm/swap_pager.c
index f4db46a32dee..50c320a95002 100644
--- a/sys/vm/swap_pager.c
+++ b/sys/vm/swap_pager.c
@@ -1922,6 +1922,7 @@ swap_pager_swapoff_object(struct swdevt *sp, vm_object_t object)
 			 * Make sure that pending writes finish before
 			 * returning.
 			 */
+dead:
 			vm_object_pip_wait(object, "swpoff");
 			swp_pager_meta_free_all(object);
 			break;
@@ -1988,11 +1989,15 @@ swap_pager_swapoff_object(struct swdevt *sp, vm_object_t object)
 			 * iterator and search again for the first swblk at or
 			 * after blks.index.
 			 */
+			if ((object->flags & OBJ_DEAD) != 0)
+				goto dead;
 			pctrie_iter_reset(&pages);
 			sb = swblk_iter_init(&blks, object, blks.index);
 			continue;
 		}
 		if (sb_empty) {
+			if ((object->flags & OBJ_DEAD) != 0)
+				goto dead;
 			swblk_iter_remove(&blks);
 			uma_zfree(swblk_zone, sb);
 		}

I ran the test swapoff4.sh for 3 hours with your patch and didn't see any issues.

I ran tests with D46963.144614.patch added for 14 hours. I did not observe any issues.

Restore the pctrie lookup. Add a check for negative block number. Passes mmap28.sh and truncate3.sh tests.

kib accepted this revision.Oct 13 2024, 10:58 PM

This revision is now accepted and ready to land.Oct 13 2024, 10:58 PM

Copy what bnoreuselist() does to handle the problem with LOOKUP_GE results, which is more robust than what I did before.

This revision now requires review to proceed.Oct 13 2024, 11:51 PM

kib accepted this revision.Oct 14 2024, 8:19 AM

This revision is now accepted and ready to land.Oct 14 2024, 8:19 AM

I ran tests with D46963.144796.patch added for 6 hours without seeing any issues.

Closed by commit rG2c8caa4b3925: vfs_subr: optimize inval_buf_range (authored by dougm). · Explain WhyOct 14 2024, 5:12 PM

This revision was automatically updated to reflect the committed changes.

dougm added a commit: rG2c8caa4b3925: vfs_subr: optimize inval_buf_range.

vfs_subr: optimize inval_buf_range
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 144833

sys/kern/vfs_subr.c

vfs_subr: optimize inval_buf_rangeClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 144833

sys/kern/vfs_subr.c

vfs_subr: optimize inval_buf_range
ClosedPublic
Actions

Revision Contents
Changeset List