Page MenuHomeFreeBSD

swap_pager: use pctrie iterators
ClosedPublic

Authored by dougm on Sep 10 2024, 6:24 AM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Nov 19, 10:46 AM
Unknown Object (File)
Sun, Nov 17, 4:42 PM
Unknown Object (File)
Fri, Nov 15, 12:37 AM
Unknown Object (File)
Thu, Nov 14, 5:22 PM
Unknown Object (File)
Mon, Nov 11, 7:47 AM
Unknown Object (File)
Wed, Nov 6, 8:05 PM
Unknown Object (File)
Sun, Oct 27, 12:05 PM
Unknown Object (File)
Wed, Oct 23, 3:54 AM

Details

Summary

Avoid walking from the pctrie root while iterating over pctrie elements by using iterators.

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
dougm requested review of this revision.Sep 10 2024, 6:24 AM
sys/vm/swap_pager.c
1926

The control flow here would be less confusing to me if this i == 0 case handling was moved out of the loop:

restart:
if (object->flags & OBJ_DEAD) {
    ...
}
swblk_iter_init(&blks, object);
sb = swblk_start(&blks, pi);
swp_pager_init_freerange(&range);
for (i = 0;;) {
    ...
}

and anywhere the loop drops the VM object look, instead of i = 0; continue;, goto restart;.

2241

This assignment could be moved into the loop header.

2254–2255

Unrelated to this diff, but, how is it safe to dereference sb->p while srcobject is unlocked? That seems wrong.

Similarly, I don't think it's safe to access sb after reacquiring the lock. Am I missing something?

2266

Would it be possible to have an explicit iter_invalidate operation which resets the path but leaves the current index and limit alone?

sys/vm/swap_pager.c
2254–2255

Where swap_pager_copy is called in vm_object_split, all the pages are xbusied first.

Where swap_pager_copy is called in vm_object_collapse, vm_object_collapse_scan does a "check for busy page" and calls vm_object_collapse_scan_wait for non-busy objects, which does ... I have no idea.

I can change this to save the sb values before releasing the lock, but I don't know that to be correct. I can change this to do a big goto restart after reacquiring the lock. But I'd just be making wild stabs in the dark.

2266

Sure.

Apply suggestions not related to race conditions.

Incorporate meta_transfer changes. Use blk.index for sp->p in several places. Compute 'start' more simply.

I wonder if we could combine swblk_iter_init() (and swblk_iter_limit_init()) and swblk_start(). Or perhaps introduce some macros similar to VM_MAP_ENTRY_FOREACH(). I think that would be straightforward but for swap_pager_find_least(), which expects the caller to initialize the iterator state. I made some inline comments which argue for keeping swap_pager_find_least() as it was for now.

sys/fs/tmpfs/tmpfs_vnops.c
2164 ↗(On Diff #143361)

What we really want here is a swap pager operation which returns the next pindex for which there is no swap block. Absent that, I'm not sure it's really worth maintaining an iterator here. (I'm trying to argue here that swap_pager_find_least() should keep its old interface, since I think that permits some simplification of the iterator interface.)

sys/vm/swap_pager.c
1917

Can we have two nested loops instead?

1979–1980

If m == NULL here, then the object lock was dropped and the iterator is invalid.

sys/vm/swap_pager.h
43 ↗(On Diff #143361)

This should go into the #ifdef _KERNEL block, together with the functions that reference it.

sys/vm/vm_object.c
1706 ↗(On Diff #143361)

The first part of this loop is searching for the smallest pindex >= pi for which there is a resident page or an assigned swap block. I wonder if it'd make sense to have a combined interface which searches for this pindex, since the vm_page scan will eventually also make use of iterators. (And even more radically, I wonder if it'd be possible and sensible to use the same trie for both resident pages and swap blocks.)

In other words, I wonder if we should just leave this loop as it was before for now.

dougm marked an inline comment as done.

Accept as many suggestions as possible, but without defining a FOREACH macro.

sys/vm/vm_object.c
1706 ↗(On Diff #143361)

@alc was suggesting this reuse just yesterday. He observed that some nodes would have both pages and swap blocks.

Incorporate changes from comments-only commit. Fix calculation of initial start value.

sys/vm/swap_pager.c
1921

The indentation here is wrong.

1957

Is it safe to access sb here, after the object lock was dropped and reacquired? Perhaps the existence of an xbusy page in the object at that pindex is sufficient to provide that guarantee.

sys/vm/vm_object.c
1706 ↗(On Diff #143361)

Indeed, we'd need to come up with some encoding for the keys which handles this overlap. I don't have any suggestions offhand. One comment is that instead of storing vm_page pointers, we could likely compress them into a (segid, offset) tuple which should in general require less than 64 bits.

Address @markj concerns.

sys/vm/swap_pager.c
1957

If it's safe now, it might still become unsafe in the future, or if not that, then remain a puzzle that someone else will ask about someday. So I'll rewrite to avoid the use of sb after lock release.

Undo the last non-whitespace change, since it breaks swap_pager_find_least by possibly creating an empty block.

sys/vm/swap_pager.c
2288–2291

The comment does not match the code.

Update the patch to reflect the changes already checked in to main to address the last couple of comments.

This revision is now accepted and ready to land.Sep 24 2024, 3:47 AM

I don't see any correctness problems.

sys/vm/swap_pager.c
567–568

Can we group the iter functions together?

574
1942

... and if (m == NULL) need only occur after vm_page_alloc()

1945–1958

The page is not dirtied here; as stated in a now deleted comment that happens after the restart.

1961–1962

There ought to be a comment explaining that we get here because of an object lock release.

1980–1981

The correctness of updating pi is subtle; it depends on us having already blocked allocation of swap blocks on the devise being shut down. Otherwise, a restart would always need to start from pi==0. I think this deserves a comment.

2204–2205

Why was this deleted?

2278

A comment motivating the reset would be good.

dougm marked 8 inline comments as done.

Address @alc feedback.

This revision now requires review to proceed.Sep 24 2024, 7:32 PM
sys/vm/swap_pager.c
567–568

Yes.

574

And swblk_next -> swblk_iter_next too.

1980–1981

Added a KASSERT to justify the comment.

2204–2205

A clerical error. Restored.

A performance test. With this patch {F96492700}, which computes stats on calls to swp_pager_meta_free, I ran all 8 stress2 swap tests, recording the stats immediately before and aofter the tests. The stats are 'cycles', 'calls', and 'pages' (freed). The results are:

Original:
vm.stats.swap.pages: 83867620
vm.stats.swap.cycles: 11894179396
vm.stats.swap.calls: 61589970

Modified:
vm.stats.swap.pages: 84177365
vm.stats.swap.cycles: 11623436808
vm.stats.swap.calls: 62086155

Even though, for whatever reason, the test with the modified code freed more pages and made more more function calls, the cycles spent on swp_pager_meta_free fell by 2.276%.

This revision is now accepted and ready to land.Sep 26 2024, 2:37 PM

With this patch:
{F96917947}
I compared the cycles consumed freeing 1000000 pages without and with this patch. The results from ministat:

x old
+ new
+------------------------------------------------------------------------------+
|    +   ++ +++ ++   ++ +                              x x x  x x  x x x   x  +|
||_____________M____A__________________|                  |_____A_____|        |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  12      11315490      11602387      11444362      11447090     88533.923
+  12      10575194      11651487      10713880      10792243     283272.81
Difference at 95.0% confidence
        -654847 +/- 177689
        -5.72064% +/- 1.54457%
        (Student's t, pooled s = 209859)

With this patch:
{F96917947}
I compared the cycles consumed freeing 1000000 pages without and with this patch. The results from ministat:

What exactly do you mean by "freeing"? I wouldn't expect this patch on its own to affect how fast we can free pages, or is this including your other iterator patches?

x old
+ new
+------------------------------------------------------------------------------+
|    +   ++ +++ ++   ++ +                              x x x  x x  x x x   x  +|
||_____________M____A__________________|                  |_____A_____|        |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  12      11315490      11602387      11444362      11447090     88533.923
+  12      10575194      11651487      10713880      10792243     283272.81
Difference at 95.0% confidence
        -654847 +/- 177689
        -5.72064% +/- 1.54457%
        (Student's t, pooled s = 209859)

What exactly do you mean by "freeing"? I wouldn't expect this patch on its own to affect how fast we can free pages, or is this including your other iterator patches?

I mean "calling swap_pager_meta_free" after constructing an object and filling it with blocks just for the opportunity to time the freeing of it. This patch does use iterators in swap_pager_meta_free. The timing results do not reflect changes to any file but swap_pager.c.

It's not easy to get a signoff from @alc. Maybe today will be the day. But he demands performance testing.

sys/vm/swap_pager.c
1916
1980

Why is this before the call to swblk_iter_next()? Alternatively, does this variable need to exist? Couldn't restart do swblk_iter(&blks, object, blks.index);?

2279

This is not style(9).

dougm marked 3 inline comments as done.

Update to account for overlapping changes to swap_pager.c. Address comments from @alc. Drop variable 'pi'.

This revision now requires review to proceed.Sep 29 2024, 7:18 AM

With this patch:
{F96917947}
I compared the cycles consumed freeing 1000000 pages without and with this patch. The results from ministat:

x old
+ new
+------------------------------------------------------------------------------+
|    +   ++ +++ ++   ++ +                              x x x  x x  x x x   x  +|
||_____________M____A__________________|                  |_____A_____|        |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  12      11315490      11602387      11444362      11447090     88533.923
+  12      10575194      11651487      10713880      10792243     283272.81
Difference at 95.0% confidence
        -654847 +/- 177689
        -5.72064% +/- 1.54457%
        (Student's t, pooled s = 209859)

I was hoping for a greater gain, but this is perfectly okay. The way this test is constructed probably represents a best case for swap_pager in the sense of bringing out the difference between the old implementation versus the iterator. The reason being that the swap blocks are contiguous, and so there are fewer subr_blist calls and those calls are freeing large batches. So, the time spent in subr_blist is minimized.

I would use this test to try to optimize the pctrie code underlying swblk_iter_next().

sys/vm/swap_pager.c
2425

From a performance standpoint, I'm skeptical of using iterators in cases like this where there will be 0 or 1 calls to swblk_iter_next(). If there are 0 calls, the cost of setting up the iterator is wasted cycles. If there is 1 call, the savings from the old implementation, i.e., walking the trie again, which should be almost all dcache hits, will be offset by the icache misses for executing a new function, swblk_iter_next(). In contrast, the old implementation simply used the same function twice.

This revision is now accepted and ready to land.Sep 29 2024, 7:56 AM
sys/vm/swap_pager.c
2425

The two callers of swap_pager_find_least both call it from within loops, and an earlier version of this had those functions passing an iterator into this function, rather than having this function create it's own every time. But @markj rejected that usage.

sys/vm/swap_pager.c
1967

Don't you need to reset the page iterator too?

Reset the page iterator too.

This revision now requires review to proceed.Sep 29 2024, 8:08 AM
alc added inline comments.
sys/vm/swap_pager.c
1963
2281
This revision is now accepted and ready to land.Sep 29 2024, 4:27 PM
This revision was automatically updated to reflect the committed changes.
dougm marked an inline comment as done.
sys/vm/swap_pager.c
2425

To be clear, my position was that we should try to avoid exposing the iterator interface directly and instead provide a function which can be used to interleave iteration of the swap block and VM object tries, since that's what those callers are doing.