Page MenuHomeFreeBSD

vm_page: uses iterators in page allocaction
ClosedPublic

Authored by dougm on Sun, Apr 6, 6:40 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Apr 14, 7:55 PM
Unknown Object (File)
Mon, Apr 14, 7:11 AM
Unknown Object (File)
Mon, Apr 14, 5:00 AM
Unknown Object (File)
Mon, Apr 14, 12:53 AM
Unknown Object (File)
Sun, Apr 13, 11:25 PM
Unknown Object (File)
Sun, Apr 13, 11:21 PM
Unknown Object (File)
Sun, Apr 13, 9:19 PM
Unknown Object (File)
Sun, Apr 13, 9:11 PM
Subscribers

Details

Summary

Change vm_page_alloc_after() and vm_page_alloc_domain_after() to take a page iterator argument, to allow for faster insertion. Where callers of those functions don't already have page iterators to use, define them. Where vm_page_grab() is invoked in a loop, invoke a new version with an iterator argument, and pass the same iterator in each call.

Test Plan

I ran 8 buildworlds on unmodified and modified kernels (GENERIC-NODEBUG) and counted function calls and machine cycles for grab_pages, grab, alloc, and kmem_back:

                      Original                              Modified
            #calls      cycles/call              #calls       cycles/call
grab_pages: 758085      4054.885051148618        757899       3966.159947433629
grab_pages: 693497      4205.652320053295        693494       4158.973933444269
grab_pages: 693323      4275.696732980155        693157       4267.863258973075
grab_pages: 692783      4292.253825512462        693679       4304.876390953164
grab_pages: 693684      4850.906807710716        692991       4361.99599128993
grab_pages: 692779      4373.656027391131        693493       4347.772756177784
grab_pages: 692894      4396.158371987634        692740       4415.586531743511
grab_pages: 692491      4444.76656014302         692992       4440.5955956201515
grab:         1349      617.0689399555226          1358       526.9094256259204
grab:          130      1715.0923076923077          139       1682.8345323741007
grab:          145      1664.2344827586207          129       1564.8992248062016
grab:          158      1651.4177215189873          167       1462.497005988024
grab:          148      1728.75                     616       722.3409090909091
grab:          169      1931.0059171597634          208       1377.8942307692307
grab:          142      1754.6338028169014          155       1521.5354838709677
grab:          153      1709.9803921568628          183       1420.9617486338798
alloc:   382293375      1554.7286098667025    382281618       1544.4436770014927
alloc:   382013642      1552.513758136941     381996133       1548.5554854478069
alloc:   381994431      1549.8865196597592    381981646       1562.1966917489015
alloc:   381991733      1561.8816278387887    382003725       1550.2464379372218
alloc:   381991829      1549.7117339962788    382012218       1537.7313351899127
alloc:   381999275      1552.0720443147438    381991216       1558.5707305033945
alloc:   381997091      1562.2941208942348    381984455       1545.1126648098807
alloc:   381998338      1548.8421092737842    381997807       1548.8446383620208
kmem_back:     964      3784.6317427385893         1004       3391.101593625498
kmem_back:     252      4417.242063492064           249       3153.7670682730923
kmem_back:     254      4592.6614173228345          261       2927.9616858237546
kmem_back:     239      3848.928870292887           258       3419.4883720930234
kmem_back:     286      3735.7062937062938          275       4631.861818181818
kmem_back:     268      4253.3432835820895          288       3727.75
kmem_back:     249      4396.907630522089           238       3403.378151260504
kmem_back:     265      3507.879245283019           252       3459.059523809524

The small improvements are more evident for the functions rarely called, and the improvement in alloc may be imperceptible.

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

dougm requested review of this revision.Sun, Apr 6, 6:40 PM
dougm created this revision.

Other comments aside, I did some ApacheBench stress testing on a ktls-enabled nginx VM. Everything was stable during the hour-long testing run.

sys/vm/vm_object.c
2287

This pindex == 0 and vm_radix_iter_lookup_le combo appears in a couple of places in this patch, maybe this check should be a part of separate function that wraps vm_radix_iter_lookup (e.g. vm_page_iter_prev) ?
This seems like a thing one could easily forget to check for and we could avoid nesting ternary operators in a couple of places.

sys/vm/vm_page.c
5017–5018

It'd be cleaner without the comma-separated assignments IMO.

5038–5040

The if statements can be fused now that the lengthy assignment is gone.

dougm marked an inline comment as done.
dougm edited the test plan for this revision. (Show Details)
dougm added reviewers: alc, markj, kib.

Apply some suggestions from @bnovkov

Add an iterator to the vm_fault.c code that leads to page allocation, so that the failed lookup is saved in the iterator and not repeated in the allocation call. Testing suggests that this reduces cycle count for page allocation.

                        Original                              Modified
            #calls      cycles/call              #calls       cycles/call
grab_pages: 758085      4054.885051148618        758789	3971.8039718551536
grab_pages: 693497      4205.652320053295        693197	4171.511759283436
grab_pages: 693323      4275.696732980155        693141	4252.595432963856
grab_pages: 692783      4292.253825512462        692789	4262.037125300777
grab_pages: 693684      4850.906807710716        693586	4320.672734167068
grab_pages: 692779      4373.656027391131        692727	4358.834399409869
grab_pages: 692894      4396.158371987634        693103	4383.965191320771
grab_pages: 692491      4444.76656014302         692607	4411.22504104059
grab:         1349      617.0689399555226          1327	512.4235116804823
grab:          130      1715.0923076923077    	     93	1643.1182795698924
grab:          145      1664.2344827586207          166	1291.879518072289
grab:          158      1651.4177215189873          101	1898.7227722772277
grab:          148      1728.75                     113	1583.4690265486727
grab:          169      1931.0059171597634          536	644.2555970149253
grab:          142      1754.6338028169014          151	1372.9205298013244
grab:          153      1709.9803921568628    	     96	1836.8958333333333
alloc:   382293375      1554.7286098667025    382277015	1425.7204792786195
alloc:   382013642      1552.513758136941     382004216	1430.9802041765947
alloc:   381994431      1549.8865196597592    382014184	1429.184649824416
alloc:   381991733      1561.8816278387887    381999382	1431.8969729301814
alloc:   381991829      1549.7117339962788    382013913	1432.5837383807536
alloc:   381999275      1552.0720443147438    381974816	1428.0097463781487
alloc:   381997091      1562.2941208942348    381990390	1433.7703692545774
alloc:   381998338      1548.8421092737842    381989658	1428.7736867001777
kmem_back:     964      3784.6317427385893          408	5452.421568627451
kmem_back:     252      4417.242063492064           268	2730.820895522388
kmem_back:     254      4592.6614173228345          238	3665.0210084033615
kmem_back:     239      3848.928870292887           260	4146.703846153846
kmem_back:     286      3735.7062937062938          294	3151.921768707483
kmem_back:     268      4253.3432835820895          258	5104.709302325581
kmem_back:     249      4396.907630522089           272	3117.25
kmem_back:     265      3507.879245283019           242	4972.524793388429

D49741 has just the vm_fault.c changes added to the latest update here, so that it can be reviewed separately.

sys/vm/vm_page.c
5038–5040

I can rewrite a bit, but just fusing the conditions as you show would have us allocated an already allocated page in some cases.

The performance results in the TEST PLAN are much improved over what I was seeing last fall for a similar set of changes, in particular, the results for vm_page_alloc(). Clearly, the improvements to the implementation of iterators in the last few months, particularly I suspect the elimination of the rather large array on the stack, have made a difference. Modulo any low-level changes/fixes to this patch that are needed, I would be very happy to see it committed. This is really the most critical step in my mind to the elimination of the two linked list pointers from struct vm_page, which I hope will yield a small but measurable performance improvement, particularly for pages allocated from superpage reservations.

Drop a parameter and some dead code from vm_page_insert_lookup().

Update header comments for vm_page_insert_lookup.

sys/vm/vm_kern.c
565

I think you need to reinitialize the iterator here.

sys/vm/vm_page.c
2194–2195

I think this line needs to be wrapped earlier.

2265

The iterator may be invalidated here.

2266

If we slept and then are returning NULL, i.e., WAITFAIL was specified, should we reinitialize the iterator here to spare callers the pain of doing so?

dougm marked 4 inline comments as done.

Make sure that when vm_page_alloc_domain_after() returns, the iterator parameter is valid.

sys/vm/vm_page.c
5260

Isn't this reset redundant now?

dougm marked an inline comment as done.

In three places (vm_page_grab_iter, vm_page_grab_valid, vm_page_grab_pages) drop a call to pctrie_iter_reset() after a call to vm_page_alloc_after() returns NULL and the object lock was not released and reacquired, since vm_page_alloc_after() now promises to return with the iterator valid..

Drop a pctrie_iter_reset() fro vm_page_grab_zero_partial(), since vm_page_alloc_after() leaves the iterator valid.

markj added inline comments.
sys/vm/vm_fault.c
2243

The iterator needs to be reinitialized here too.

This revision is now accepted and ready to land.Mon, Apr 14, 1:16 AM
This revision was automatically updated to reflect the committed changes.
dougm marked an inline comment as done.