VM batchqueues allow bulk insertion of a group of pages into a vm page queue. Doing things in bulk, with a single acquisition of the vm page queue mutex, reduces overhead. Batchqueues have helped us considerably with our workload.
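For illustration, here is a minimal userspace model of that idea. The names (struct batchqueue, bq_submit, bq_flush, BQ_SIZE) and the pthread mutex are stand-ins for the kernel's per-CPU batchqueues and the vm pagequeue mutex, not the actual implementation:

```c
#include <pthread.h>

#define BQ_SIZE 8                       /* illustrative batch size */

struct page;                            /* stand-in for vm_page_t */

/* Per-CPU staging area: pages accumulate here without any locking. */
struct batchqueue {
        int          bq_cnt;
        struct page *bq_pages[BQ_SIZE];
};

static pthread_mutex_t pagequeue_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the real insertion into the page queue (lock held). */
static void
pagequeue_insert(struct page *m)
{
        (void)m;
}

/* Flush the whole batch under a single lock acquisition. */
static void
bq_flush(struct batchqueue *bq)
{
        pthread_mutex_lock(&pagequeue_lock);
        for (int i = 0; i < bq->bq_cnt; i++)
                pagequeue_insert(bq->bq_pages[i]);
        pthread_mutex_unlock(&pagequeue_lock);
        bq->bq_cnt = 0;
}

/* Stage a page; touch the lock only once the batch is full. */
static void
bq_submit(struct batchqueue *bq, struct page *m)
{
        bq->bq_pages[bq->bq_cnt++] = m;
        if (bq->bq_cnt == BQ_SIZE)
                bq_flush(bq);
}
```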
However, on a machine with a large core count (64 or more) running a "Netflix"-style workload, where sendfile() pushes millions of pages per second, we still see lock contention on the inactive queue. According to lockstat, the vm inactive pagequeue mutex is the most contended lock in the system, with vm_page_pqbatch_submit() as the dominant caller [1].
This patch changes how batchqueues work. Rather than waiting until the batchqueue is full to acquire the lock and process the queue, we now start trying to acquire the lock with trylocks once the batchqueue is 1/2 full. This removes almost all contention on the vm pagequeue mutex for us [2].
So that the system does not lose the benefit of processing large batchqueues, I've doubled the size of the batchqueues. This way, when there is no contention, we process the same batch size as before.
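A sketch of the new submit policy, reusing the hypothetical types and helpers from the sketch above. The real change lives in vm_page_pqbatch_submit(); this only models the behavior:

```c
/*
 * As above, but imagine BQ_SIZE doubled (so the uncontended batch is
 * as large as before), and start draining with a trylock once the
 * queue is half full rather than blocking only when it is full.
 */
static void
bq_submit(struct batchqueue *bq, struct page *m)
{
        bq->bq_pages[bq->bq_cnt++] = m;
        if (bq->bq_cnt < BQ_SIZE / 2)
                return;                 /* keep batching lock-free */

        /* Half full or more: drain only if the lock is free right now. */
        if (pthread_mutex_trylock(&pagequeue_lock) == 0) {
                for (int i = 0; i < bq->bq_cnt; i++)
                        pagequeue_insert(bq->bq_pages[i]);
                pthread_mutex_unlock(&pagequeue_lock);
                bq->bq_cnt = 0;
        } else if (bq->bq_cnt == BQ_SIZE) {
                /* Completely full: no choice but to block for the lock. */
                bq_flush(bq);
        }
}
```

The effect is that under contention, drains spread out over the second half of the batch instead of every CPU blocking on the mutex at exactly the same fill level.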
[1]:
```
 Count indv cuml rcnt     nsec Lock                  Caller
-------------------------------------------------------------------------------
428303  28%  28% 0.00   179962 vm inactive pagequeue vm_page_pqbatch_submit+0x234
372617  25%  53% 0.00    17459 counter_fo            counter_fo_add+0x1f4
206186  14%  67% 0.00     1838 mlx5tx                mlx5e_xmit+0x243
157055  10%  77% 0.00     1044 tcp_hpts_lck          tcp_hpts_insert_diag+0x60e
104862   7%  84% 0.00     1143 tcp_hpts_lck          tcp_hpts_remove+0xa8
```
[2]:
```
 Count indv cuml rcnt     nsec Lock                  Caller
-------------------------------------------------------------------------------
396499  33%  33% 0.00    13957 counter_fo            counter_fo_add+0x1f4
203257  17%  50% 0.00     1343 mlx5tx                mlx5e_xmit+0x243
162038  13%  63% 0.00      860 tcp_hpts_lck          tcp_hpts_insert_diag+0x60e
108576   9%  72% 0.00      932 tcp_hpts_lck          tcp_hpts_remove+0xa8
 45724   4%  76% 0.00   165111 vm inactive pagequeue vm_page_pqbatch_submit+0x234
```