
VM: Stabilize map->anon_loc to reuse memory region
Abandoned (Public)

Authored by austin.zhang_dell.com on Apr 21 2023, 6:12 AM.
Tags
None

Details

Summary

Introduce a vm.anon_low_pref option to optimize the behavior of anonymous-mapping clustering.

Enabling this option stabilizes map->anon_loc, the address from which the search for clustering candidates starts. This allows better reuse of memory regions that have been returned (unmapped).
The option is disabled by default.

Continuously updating map->anon_loc pushes new anonymous mappings toward ever-higher virtual addresses, which increases jemalloc's metadata consumption as it tracks the new address ranges.
Stabilizing map->anon_loc mitigates this metadata growth in jemalloc.
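
To illustrate the claim above, here is a toy C model. It is not FreeBSD code, and every name in it (toy_map, toy_run, NPAGES, update_hint) is hypothetical. The address space is an array of page slots, and the search hint loosely stands in for map->anon_loc. The loop mirrors the test plan: map two equal-sized regions, then unmap both. When the hint advances after every mapping, the highest address touched keeps climbing. When the hint stays fixed, the same holes are reused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model only, NOT the FreeBSD vm_map code. The "address space" is
 * NPAGES page slots; a mapping is a run of consecutive slots.
 */
#define NPAGES 1024

struct toy_map {
	bool used[NPAGES];
	size_t anon_loc;	/* search hint, loosely mirroring map->anon_loc */
	bool update_hint;	/* true: advance the hint after each mapping */
	size_t high_water;	/* one past the highest slot ever handed out */
};

/* First-fit search for npages free slots in [start, NPAGES). */
static long
toy_find(const struct toy_map *m, size_t start, size_t npages)
{
	size_t run = 0;

	for (size_t i = start; i < NPAGES; i++) {
		run = m->used[i] ? 0 : run + 1;
		if (run == npages)
			return ((long)(i + 1 - npages));
	}
	return (-1);
}

/* Map npages: search from the hint first, then fall back to slot 0. */
static long
toy_map_anon(struct toy_map *m, size_t npages)
{
	assert(npages > 0 && npages <= NPAGES);
	long addr = toy_find(m, m->anon_loc, npages);
	if (addr < 0)
		addr = toy_find(m, 0, npages);
	if (addr < 0)
		return (-1);
	for (size_t i = 0; i < npages; i++)
		m->used[(size_t)addr + i] = true;
	if (m->update_hint)
		m->anon_loc = (size_t)addr + npages;
	if ((size_t)addr + npages > m->high_water)
		m->high_water = (size_t)addr + npages;
	return (addr);
}

static void
toy_unmap(struct toy_map *m, long addr, size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		m->used[(size_t)addr + i] = false;
}

/* Mimic the test loop: two equal-sized mappings, then unmap both. */
static size_t
toy_run(bool update_hint, size_t npages, int iterations)
{
	struct toy_map m = { .update_hint = update_hint };

	for (int it = 0; it < iterations; it++) {
		long a = toy_map_anon(&m, npages);
		long b = toy_map_anon(&m, npages);
		assert(a >= 0 && b >= 0);
		toy_unmap(&m, a, npages);
		toy_unmap(&m, b, npages);
	}
	return (m.high_water);
}
```

For example, toy_run(true, 10, 20) returns 400 because the hint marches upward. toy_run(false, 10, 20) returns 20 because the two holes are reused. The fallback to slot 0 when the hint runs off the end is a simplification; the real kernel search path differs.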

Test Plan

POC to simulate the issue:

  • sysctl vm.cluster_anon=2
  • Disable jemalloc opt.retain: setenv MALLOC_CONF 'retain:false'
  • Run the test code below:
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>
#include <assert.h>

#define M_SIZE (100 * 1024 * 1024)

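int main(void) {
    char *buffer_ptr;
    void *mmap_ptr;

    printf("Press Enter to start the test...\n");
    getchar();

    /* Repeatedly allocate and release two 100 MiB regions; watch RSS/VSZ with ps(1). */
    while (1) {
        /* Large malloc(3) request (jemalloc on FreeBSD). */
        buffer_ptr = (char *)malloc(M_SIZE);
        assert(buffer_ptr != NULL);
        /* Direct anonymous mapping of the same size. */
        mmap_ptr = mmap(NULL, M_SIZE, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
        /* mmap(2) reports failure with MAP_FAILED, not NULL. */
        assert(mmap_ptr != MAP_FAILED);
        free(buffer_ptr);
        munmap(mmap_ptr, M_SIZE);
    }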

    return 0;
}

Test results:

  1. Before the change, the test program's virtual size (VSZ) keeps growing:
root@freebsd-zhanga28:~/workspace/freebsd-src # uname -a
FreeBSD freebsd-zhanga28 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n261323-8243e174630f: Sun Mar  5 10:07:43 CST 2023     root@freebsd-zhanga28:/usr/obj/root/workspace/freebsd-src/amd64.amd64/sys/GENERIC amd64

root@freebsd-zhanga28:~/workspace/triage/mmap-anon # sysctl vm.cluster_anon
vm.cluster_anon: 2
root@freebsd-zhanga28:~/workspace/triage/mmap-anon # setenv MALLOC_CONF "retain:false"
root@freebsd-zhanga28:~/workspace/triage/mmap-anon # ./a.out
Press any key to start test...


root@freebsd-zhanga28:~/workspace/triage/mmap-anon # date ; ps -o pid,comm,rss,vsz -p `pgrep a.out`
Fri Apr 21 09:53:51 CST 2023
  PID COMMAND   RSS    VSZ
48055 a.out   62124 141880


root@freebsd-zhanga28:~/workspace/triage/mmap-anon # date ; ps -o pid,comm,rss,vsz -p `pgrep a.out`
Fri Apr 21 09:57:24 CST 2023
  PID COMMAND   RSS    VSZ
48055 a.out   91816 694840

  2. After the change, the test program's virtual size (VSZ) stays stable:
FreeBSD freebsd-zhanga28 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n262347-5dd5c7b6cc07-dirty: Thu Apr 20 17:59:31 CST 2023     root@freebsd-zhanga28:/usr/obj/root/workspace/freebsd-src/amd64.amd64/sys/GENERIC amd64

root@freebsd-zhanga28:~ # date ; ps -o pid,comm,rss,vsz -p `pgrep a.out`
Fri Apr 21 10:06:41 CST 2023
PID COMMAND   RSS    VSZ
802 a.out   44224 127544

root@freebsd-zhanga28:~ # date ; ps -o pid,comm,rss,vsz -p `pgrep a.out`
Fri Apr 21 10:11:22 CST 2023
PID COMMAND   RSS    VSZ
802 a.out   84940 127544
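
The ps(1) samples above can be reduced to growth rates. The sketch below only does arithmetic on the reported numbers. struct ps_sample, hms and vsz_growth_kib_per_sec are hypothetical helpers. ps reports RSS and VSZ in KiB, and the timestamps come from the date output.

```c
#include <assert.h>

/* One ps(1) sample: time of day in seconds, RSS and VSZ in KiB. */
struct ps_sample {
	long tod_sec;
	long rss_kib;
	long vsz_kib;
};

/* Convert an hh:mm:ss time of day to seconds since midnight. */
static long
hms(int h, int m, int s)
{
	return (h * 3600L + m * 60L + s);
}

/* Average VSZ growth in KiB per second between two samples (truncated). */
static long
vsz_growth_kib_per_sec(struct ps_sample a, struct ps_sample b)
{
	assert(b.tod_sec > a.tod_sec);
	return ((b.vsz_kib - a.vsz_kib) / (b.tod_sec - a.tod_sec));
}
```

Before the change, VSZ goes from 141880 to 694840 KiB between 09:53:51 and 09:57:24. That is 552960 KiB in 213 seconds, or about 2596 KiB/s. After the change, VSZ is 127544 KiB at both 10:06:41 and 10:11:22, so the rate is 0. RSS still rises from 44224 to 84940 KiB over that interval.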

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

Is the patch reversed? Also, please generate full context when uploading a patch to the FreeBSD phabricator (diff -U 999999999).

If the patch is reversed, I believe that it is an optimization for very specific usage. For instance, if you change your demonstration program to increment the malloc'ed size by PAGE_SIZE on each iteration, VSS would start growing the same as without the patch. Really, it just disables anon clustering.

That said, I would not object against a knob to disable updating anon_loc, if it helps for some specific heavy loads.
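
The variant suggested above grows the malloc'ed size by one page on each iteration, and it can be sketched as follows. run_growing_variant and its iterations parameter are hypothetical. The loop is bounded so it can serve as a quick check, and the page size comes from sysconf(_SC_PAGESIZE). According to the review comment, this pattern never leaves freed holes large enough to reuse, so VSZ would be expected to grow with or without the patch.

```c
#define _DEFAULT_SOURCE	/* exposes MAP_ANON on glibc; harmless on FreeBSD */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define M_SIZE (100 * 1024 * 1024)	/* 100 MiB, as in the test plan */

/*
 * Hypothetical variant of the test loop: the malloc(3) request grows by
 * one page per iteration. Returns 0 on success, -1 on any failure.
 */
static int
run_growing_variant(int iterations)
{
	long pagesz = sysconf(_SC_PAGESIZE);

	if (pagesz <= 0)
		return (-1);
	for (int i = 0; i < iterations; i++) {
		size_t sz = (size_t)M_SIZE + (size_t)i * (size_t)pagesz;
		char *buf = malloc(sz);
		if (buf == NULL)
			return (-1);
		void *p = mmap(NULL, M_SIZE, PROT_READ | PROT_WRITE,
		    MAP_ANON | MAP_PRIVATE, -1, 0);
		if (p == MAP_FAILED) {
			free(buf);
			return (-1);
		}
		free(buf);
		munmap(p, M_SIZE);
	}
	return (0);
}
```

To observe VSZ, call it with a large iteration count and sample the process with ps -o pid,comm,rss,vsz, as in the test plan.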

In reply to @kib (D39743#905866):

Thanks for reviewing. Indeed, if the allocation size increases with each iteration, the freed memory holes become too small to be reused.

In typical workloads, allocation sizes vary, so freed holes can be reused, which benefits the userspace memory footprint. My understanding is that the change still involves anon clustering; please correct me if I misunderstand the design.
On the other hand, continually updating anon_loc can lead to user-level memory leaks. Because the search always returns the highest available address, jemalloc is forced to map new virtual address ranges and use more metadata.

Why do you have cluster_anon set to 2?

Setting vm.cluster_anon=1 prevents both clustering and updates to anon_loc during the test, so the jemalloc leak does not occur.
This means the change specifically targets vm.cluster_anon=2. Is cluster_anon=2 not recommended in practice? We tried this setting to reduce VMA fragmentation.
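
The three vm.cluster_anon settings can be summarized by the sketch below. cluster_allowed is a hypothetical, simplified paraphrase written for illustration. Setting 0 never clusters, 1 clusters only when no address hint is supplied, and 2 always clusters. The kernel's actual decision also depends on other properties of the mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical paraphrase of the vm.cluster_anon policy:
 *   0 = never cluster anonymous mappings,
 *   1 = cluster only when the request carries no address hint,
 *   2 = always cluster.
 * Other conditions checked by the kernel are deliberately omitted.
 */
static bool
cluster_allowed(int cluster_anon, uintptr_t hint)
{
	switch (cluster_anon) {
	case 0:
		return (false);
	case 1:
		return (hint == 0);
	case 2:
	default:
		return (true);
	}
}
```

Under this reading, setting 1 makes clustering depend on whether the request reaches the VM layer with a hint, while setting 2 clusters unconditionally. This would be consistent with the anon_loc updates appearing in the test only with vm.cluster_anon=2, provided the test's mmap(2) calls arrive with a hint.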

austin.zhang_dell.com edited the summary of this revision.

sys/vm/vm_map.c:1986

I think the description is not correct. To me, the option means something like "Prefer lower address over clustering for anonymous mappings".

sys/vm/vm_map.c:1986

This option doesn't change the value of cluster, so the clustering logic is still applied:

if (try == 1 && en_aslr && !cluster) {
    /* ... */
} else {
    /* still enters this branch */
}

update_anon updates map->anon_loc, which becomes curr_min_addr for the following search. Without this logic, lower addresses are preferred.
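
The distinction drawn above can be illustrated with a small model. cluster determines where the search starts, and update_anon determines whether map->anon_loc is advanced afterwards. struct toy_vmmap, search_start and after_success are hypothetical names, not kernel functions, and the logic is heavily simplified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the one vm_map field that matters here. */
struct toy_vmmap {
	uintptr_t anon_loc;	/* 0 means "no saved location yet" */
};

/*
 * Where the free-space search begins: with clustering and a saved
 * anon_loc, start there; otherwise start at min_addr.
 */
static uintptr_t
search_start(const struct toy_vmmap *map, bool cluster, uintptr_t min_addr)
{
	if (cluster && map->anon_loc != 0)
		return (map->anon_loc);
	return (min_addr);
}

/* After a successful mapping, optionally move anon_loc past it. */
static void
after_success(struct toy_vmmap *map, bool update_anon, uintptr_t addr,
    size_t length)
{
	if (update_anon)
		map->anon_loc = addr + length;
}
```

In this model, keeping cluster true while skipping the anon_loc update means the next search starts from the same address. This matches the summary's goal of stabilizing map->anon_loc.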

sys/vm/vm_map.c:1986

It does change, but I suggest something different. Please try D39845