vfs cache: describe various optimization ideas
While here, report a sample result from a run on Sapphire Rapids:
An access(2) loop slapped into will-it-scale, like so:

    while (1) {
        int error = access(tmpfile, R_OK);
        assert(error == 0);
        (*iterations)++;
    }

... operating on /usr/obj/usr/src/amd64.amd64/sys/GENERIC/vnode_if.c
In operations per second:
lockless: 3462164
locked: 1362376
While over 3.4 million may seem like a big number, a critical look shows
it should be significantly higher.
A poor man's profiler, counting how many times a given routine was sampled:
dtrace -w -n 'profile:::profile-4999 /execname == "a.out"/ {
    @[sym(arg0)] = count();
}
tick-5s {
    system("clear");
    trunc(@, 40);
    printa("%40a %@16d\n", @);
    clear(@);
}'
[snip]
               kernel`kern_accessat              231
        kernel`cpu_fetch_syscall_args              324
      kernel`cache_fplookup_cross_mount              340
                       kernel`namei              346
               kernel`amd64_syscall              352
        kernel`tmpfs_fplookup_vexec              388
                        kernel`vput              467
                 kernel`vget_finish              499
              kernel`lockmgr_unlock              529
               kernel`lockmgr_slock              558
               kernel`vget_prep_smr              571
                  kernel`vput_final              578
                      kernel`vdropl             1070
                      kernel`memcmp             1174
                kernel`0xffffffff80             2080
                                0x0             2231
              kernel`copyinstr_smap             2492
              kernel`cache_fplookup             9246