Revise ARC shrinker algorithm

Description

The ARC shrinker callback arc_shrinker_count/_scan() is invoked by the
kernel's shrinker mechanism when the system is running low on free
pages. This happens via two code paths:

  1. "direct reclaim": The system is attempting to allocate a page, but we
     are low on memory. The ARC shrinker callback is invoked from the
     page-allocation code path.

  2. "indirect reclaim": kswapd notices that there aren't many free pages,
     so it invokes the ARC shrinker callback.

In both cases, the kernel's shrinker code requests that the ARC shrinker
callback release some of its cache, and then it measures how many pages
were released. However, its measurement of released pages does not
include pages that are freed via __free_pages(), which is how the ARC
releases memory (via abd_free_chunks()). Rather, the kernel shrinker
code is looking for pages to be placed on the lists of reclaimable pages
(which is separate from actually-free pages).

Because the kernel shrinker code doesn't detect that the ARC has
released pages, it may call the ARC shrinker callback many times,
resulting in the ARC "collapsing" down to arc_c_min. This has several
negative impacts:

  1. ZFS doesn't use RAM to cache data effectively.

  2. In the direct reclaim case, a single page allocation may wait a long
     time (e.g. more than a minute) while we evict the entire ARC.

  3. Even with the improvements made in 67c0f0dedc5 ("ARC shrinking blocks
     reads/writes"), occasionally arc_size may stay above arc_c for the
     entire time of the ARC collapse, thus blocking ZFS read/write
     operations in arc_get_data_impl().
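
The collapse described above can be illustrated with a toy model. This is
only a sketch under simplifying assumptions, not the kernel's reclaim logic:
the struct, the function names (toy_shrink, toy_reclaim), and the page counts
are all hypothetical. It shows a caller that counts only the progress it can
see, paired with a callback whose freed pages are invisible to that
accounting.

```c
#include <assert.h>
#include <stdint.h>

/* Toy cache state. All values are in pages and are made up. */
struct toy_arc {
	uint64_t size;	/* current cache size */
	uint64_t c_min;	/* floor the cache never shrinks below */
};

/*
 * Hypothetical shrinker callback. It evicts up to nr_to_scan pages,
 * but because the pages are freed directly, none of them show up in
 * the caller's accounting, so it reports 0 pages of visible progress.
 */
static uint64_t
toy_shrink(struct toy_arc *arc, uint64_t nr_to_scan)
{
	assert(arc->size >= arc->c_min);
	uint64_t evictable = arc->size - arc->c_min;
	uint64_t evicted = nr_to_scan < evictable ? nr_to_scan : evictable;

	arc->size -= evicted;
	return (0);
}

/*
 * Toy reclaim loop. It keeps invoking the callback until it has seen
 * `wanted` pages of progress or has made max_calls attempts, and
 * returns the number of calls it made.
 */
static unsigned
toy_reclaim(struct toy_arc *arc, uint64_t wanted, uint64_t batch,
    unsigned max_calls)
{
	uint64_t seen = 0;
	unsigned calls = 0;

	while (seen < wanted && calls < max_calls) {
		seen += toy_shrink(arc, batch);
		calls++;
	}
	return (calls);
}
```

In this model, a request for a single page against a cache of 1000 pages with
a 100-page floor uses every allowed attempt and leaves the cache at its floor,
even though far more than one page was released along the way.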

To address these issues, this commit limits the ways that the ARC
shrinker callback can be used by the kernel shrinker code, and mitigates
the impact of arc_is_overflowing() on ZFS read/write operations.

With this commit:

  1. We limit the amount of data that can be reclaimed from the ARC via
     the "direct reclaim" shrinker. This limits the amount of time it takes
     to allocate a single page.

  2. We do not allow the ARC to shrink via kswapd (indirect reclaim).
     Instead we rely on arc_evict_zthr to monitor free memory and reduce
     the ARC target size to keep sufficient free memory in the system. Note
     that we can't simply rely on limiting the amount that we reclaim at
     once (as for the direct reclaim case), because kswapd's "boosted"
     logic can invoke the callback an unlimited number of times (see
     balance_pgdat()).

  3. When arc_is_overflowing() and we want to allocate memory,
     arc_get_data_impl() will wait only for a multiple of the requested
     amount of data to be evicted, rather than waiting for the ARC to no
     longer be overflowing. This allows ZFS reads/writes to make progress
     even while the ARC is overflowing, while also ensuring that the
     eviction thread makes progress towards reducing the total amount of
     memory used by the ARC.

  4. The amount of memory that the ARC always tries to keep free for the
     rest of the system, arc_sys_free, is increased.

  5. Now that the shrinker callback is able to provide feedback to the
     kernel's shrinker code about our progress, we can safely enable
     the kswapd hook. This will allow the ARC to receive notifications
     when memory pressure is first detected by the kernel. We also
     re-enable the appropriate kstats to track these callbacks.
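
As a rough illustration of three of the changes listed above (the
direct-reclaim limit, the restriction on shrinking via kswapd, and the
bounded wait in arc_get_data_impl()), here is a minimal sketch. It is not the
code from this commit: the function names, parameters, and the percentage
knob are hypothetical and chosen only to make the arithmetic concrete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical "count" callback: how many pages to advertise to the
 * caller as reclaimable. is_kswapd, evictable_pages and limit_pages
 * are illustrative parameters, not names from the commit.
 */
static uint64_t
sketch_shrinker_count(bool is_kswapd, uint64_t evictable_pages,
    uint64_t limit_pages)
{
	/*
	 * Indirect reclaim: advertise nothing, leaving it to the
	 * eviction thread to keep enough memory free.
	 */
	if (is_kswapd)
		return (0);

	/*
	 * Direct reclaim: cap what a single page allocation can ask
	 * the cache to give up.
	 */
	return (evictable_pages < limit_pages ?
	    evictable_pages : limit_pages);
}

/*
 * Hypothetical wait target for an allocation made while the cache is
 * overflowing: a fixed percentage (at least 100) of the requested
 * bytes, instead of "until the cache is no longer overflowing".
 */
static uint64_t
sketch_eviction_wait_bytes(uint64_t requested_bytes, uint64_t eviction_pct)
{
	assert(eviction_pct >= 100);
	return (requested_bytes * eviction_pct / 100);
}
```

With a 1000-page limit, a direct-reclaim caller facing 5000 evictable pages
is told about 1000 of them, while a kswapd caller is told about none. A
4096-byte allocation with the percentage set to 200 waits for 8192 bytes to
be evicted, so the allocation proceeds after a bounded wait while eviction
still outpaces new allocations.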

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10600

Details

Provenance
mahrens authored on Aug 1 2020, 4:10 AM
GitHub <noreply@github.com> committed on Aug 1 2020, 4:10 AM
Parents
rG18c624302d44: ZTS: zvol_misc_volmode is flaky on FreeBSD