Change vm_page_alloc_after() and vm_page_alloc_domain_after() to take a page iterator argument, to allow for faster insertion. Where callers of those functions don't already have page iterators to use, define them. Where vm_page_grab() is invoked in a loop, invoke a new version with an iterator argument, and pass the same iterator in each call.
Details
- Reviewers
alc markj kib - Commits
- rG9cc73397534e: vm_page: use iterators in page allocaction
I ran 8 buildworlds on unmodified and modified kernels (GENERIC-NODEBUG) and counted function calls and machine cycles for grab_pages, grab, alloc, and kmem_back:
Original Modified #calls cycles/call #calls cycles/call grab_pages: 758085 4054.885051148618 757899 3966.159947433629 grab_pages: 693497 4205.652320053295 693494 4158.973933444269 grab_pages: 693323 4275.696732980155 693157 4267.863258973075 grab_pages: 692783 4292.253825512462 693679 4304.876390953164 grab_pages: 693684 4850.906807710716 692991 4361.99599128993 grab_pages: 692779 4373.656027391131 693493 4347.772756177784 grab_pages: 692894 4396.158371987634 692740 4415.586531743511 grab_pages: 692491 4444.76656014302 692992 4440.5955956201515 grab: 1349 617.0689399555226 1358 526.9094256259204 grab: 130 1715.0923076923077 139 1682.8345323741007 grab: 145 1664.2344827586207 129 1564.8992248062016 grab: 158 1651.4177215189873 167 1462.497005988024 grab: 148 1728.75 616 722.3409090909091 grab: 169 1931.0059171597634 208 1377.8942307692307 grab: 142 1754.6338028169014 155 1521.5354838709677 grab: 153 1709.9803921568628 183 1420.9617486338798 alloc: 382293375 1554.7286098667025 382281618 1544.4436770014927 alloc: 382013642 1552.513758136941 381996133 1548.5554854478069 alloc: 381994431 1549.8865196597592 381981646 1562.1966917489015 alloc: 381991733 1561.8816278387887 382003725 1550.2464379372218 alloc: 381991829 1549.7117339962788 382012218 1537.7313351899127 alloc: 381999275 1552.0720443147438 381991216 1558.5707305033945 alloc: 381997091 1562.2941208942348 381984455 1545.1126648098807 alloc: 381998338 1548.8421092737842 381997807 1548.8446383620208 kmem_back: 964 3784.6317427385893 1004 3391.101593625498 kmem_back: 252 4417.242063492064 249 3153.7670682730923 kmem_back: 254 4592.6614173228345 261 2927.9616858237546 kmem_back: 239 3848.928870292887 258 3419.4883720930234 kmem_back: 286 3735.7062937062938 275 4631.861818181818 kmem_back: 268 4253.3432835820895 288 3727.75 kmem_back: 249 4396.907630522089 238 3403.378151260504 kmem_back: 265 3507.879245283019 252 3459.059523809524
The small improvements are more evident for the functions rarely called, and the improvement in alloc may be imperceptible.
Diff Detail
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
Other comments aside, I did some ApacheBench stress testing on a ktls-enabled nginx VM. Everything was stable during the hour-long testing run.
sys/vm/vm_object.c | ||
---|---|---|
2287 | This pindex == 0 and vm_radix_iter_lookup_le combo appears in a couple of places in this patch, maybe this check should be a part of separate function that wraps vm_radix_iter_lookup (e.g. vm_page_iter_prev) ? | |
sys/vm/vm_page.c | ||
5017–5018 | It'd be cleaner without the comma-separated assignments IMO. | |
5038–5040 | The if statements can be fused now that the lengthy assignment is gone. |
Add an iterator to the vm_fault.c code that leads to page allocation, so that the failed lookup is saved in the iterator and not repeated in the allocation call. Testing suggests that this reduces cycle count for page allocation.
Original Modified #calls cycles/call #calls cycles/call grab_pages: 758085 4054.885051148618 758789 3971.8039718551536 grab_pages: 693497 4205.652320053295 693197 4171.511759283436 grab_pages: 693323 4275.696732980155 693141 4252.595432963856 grab_pages: 692783 4292.253825512462 692789 4262.037125300777 grab_pages: 693684 4850.906807710716 693586 4320.672734167068 grab_pages: 692779 4373.656027391131 692727 4358.834399409869 grab_pages: 692894 4396.158371987634 693103 4383.965191320771 grab_pages: 692491 4444.76656014302 692607 4411.22504104059 grab: 1349 617.0689399555226 1327 512.4235116804823 grab: 130 1715.0923076923077 93 1643.1182795698924 grab: 145 1664.2344827586207 166 1291.879518072289 grab: 158 1651.4177215189873 101 1898.7227722772277 grab: 148 1728.75 113 1583.4690265486727 grab: 169 1931.0059171597634 536 644.2555970149253 grab: 142 1754.6338028169014 151 1372.9205298013244 grab: 153 1709.9803921568628 96 1836.8958333333333 alloc: 382293375 1554.7286098667025 382277015 1425.7204792786195 alloc: 382013642 1552.513758136941 382004216 1430.9802041765947 alloc: 381994431 1549.8865196597592 382014184 1429.184649824416 alloc: 381991733 1561.8816278387887 381999382 1431.8969729301814 alloc: 381991829 1549.7117339962788 382013913 1432.5837383807536 alloc: 381999275 1552.0720443147438 381974816 1428.0097463781487 alloc: 381997091 1562.2941208942348 381990390 1433.7703692545774 alloc: 381998338 1548.8421092737842 381989658 1428.7736867001777 kmem_back: 964 3784.6317427385893 408 5452.421568627451 kmem_back: 252 4417.242063492064 268 2730.820895522388 kmem_back: 254 4592.6614173228345 238 3665.0210084033615 kmem_back: 239 3848.928870292887 260 4146.703846153846 kmem_back: 286 3735.7062937062938 294 3151.921768707483 kmem_back: 268 4253.3432835820895 258 5104.709302325581 kmem_back: 249 4396.907630522089 272 3117.25 kmem_back: 265 3507.879245283019 242 4972.524793388429
The performance results in the TEST PLAN are much improved over what I was seeing last fall for a similar set of changes, in particular, the results for vm_page_alloc(). Clearly, the improvements to the implementation of iterators in the last few months, particularly I suspect the elimination of the rather large array on the stack, have made a difference. Modulo any low-level changes/fixes to this patch that are needed, I would be very happy to see it committed. This is really the most critical step in my mind to the elimination of the two linked list pointers from struct vm_page, which I hope will yield a small but measurable performance improvement, particularly for pages allocated from superpage reservations.
sys/vm/vm_kern.c | ||
---|---|---|
565 | I think you need to reinitialize the iterator here. | |
sys/vm/vm_page.c | ||
2194–2195 | I think this line needs to be wrapped earlier. | |
2265 | The iterator may be invalidated here. | |
2266 | If we slept and then are returning NULL, i.e., WAITFAIL was specified, should we reinitialize the iterator here to spare callers the pain of doing so? |
Make sure that when vm_page_alloc_domain_after() returns, the iterator parameter is valid.
sys/vm/vm_page.c | ||
---|---|---|
5260 | Isn't this reset redundant now? |
In three places (vm_page_grab_iter, vm_page_grab_valid, vm_page_grab_pages) drop a call to pctrie_iter_reset() after a call to vm_page_alloc_after() returns NULL and the object lock was not released and reacquired, since vm_page_alloc_after() now promises to return with the iterator valid..
Drop a pctrie_iter_reset() fro vm_page_grab_zero_partial(), since vm_page_alloc_after() leaves the iterator valid.
sys/vm/vm_fault.c | ||
---|---|---|
2243 | The iterator needs to be reinitialized here too. |