Page MenuHomeFreeBSD

cache: add an introductory comment
ClosedPublic

Authored by mjg on Feb 15 2021, 1:37 AM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Jan 24, 5:27 PM
Unknown Object (File)
Wed, Jan 22, 12:25 PM
Unknown Object (File)
Wed, Jan 22, 12:25 PM
Unknown Object (File)
Sat, Jan 18, 10:08 PM
Unknown Object (File)
Sat, Jan 18, 5:19 PM
Unknown Object (File)
Wed, Jan 8, 2:25 AM
Unknown Object (File)
Dec 22 2024, 6:51 PM
Unknown Object (File)
Dec 21 2024, 4:18 PM
Subscribers
None

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

mjg requested review of this revision.Feb 15 2021, 1:37 AM
mjg created this revision.
sys/kern/vfs_cache.c
141

Perhaps add some language that it is up to the filesystem to populate name cache as appropriate.

Also I believe it is useful to note that name cache cannot be too dynamic, which means that pseudo file systems tend to not use cache. For instance, procfs and devfs. I believe that nullfs does not use namecache at all due to the structure.

Note in the text that it might be useful to allow special filesystems to avoid namecache, nonc tmpfs option as an example.

153

Not component but a vnode, i.e. the lookup result. I think it is too confusing to leave it as is.

165

dtrace probe

199

This statement depends on fs. For instance, for UFS it is not true, i.e. there cannot exist a cached inode without corresponding vnode. Even for SU, where inode block might have dependencies attached to it (and thus cannot be reclaimed), it is still not thecase. I think this is also false for any filesystem in FreeBSD that uses persistent storage.

In other words, only tmpfs has this special feature.

203

I think this line needs some grammar fix, not sure which

sys/kern/vfs_cache.c
141

It states filesystems have to call cache_enter and even shows how ufs can end up doing it.

devfs is a bad example in that it is mostly static and at least one node (null) keeps seeing a lot of traffic

i'll see about procfs

199

In that case can something be done about:

/*
  * clustering stuff
  */
 daddr_t v_cstart;                       /* v start block of cluster */
 daddr_t v_lasta;                        /* v last allocation  */
 daddr_t v_lastw;                        /* v last write  */
 int     v_clen;                         /* v length of cur. cluster */

Removing this and moving v_hash into a 4 byte hole is almost enough to shrink the struct enough to fit 9 instead of 8 objects per page. There are few more bytes to shave elsewhere in the struct.

sys/kern/vfs_cache.c
199

none of zfs, tmpfs nor nullfs use it

sys/kern/vfs_cache.c
199

Of course zfs/tmpfs/nullfs do not use these fields, because they are for clustering support in the buffer cache.

I do not have a good idea where to move them. It would be fine to have them in inode, but then clustering code needs a way to find the place having vp as an input. Perhaps there could be a VOP like vop_clusterdata() that returns a pointer to the structure of these four fields.

sys/kern/vfs_cache.c
199

Cursory read suggests vfs_cluster.c can grow an argument which can be a pointer to these fields.

sys/kern/vfs_cache.c
199

VOP would be somewhat less clumsy, but ok. D28679

It is strange that you do not complain about v_bufobj then, on the same basis. On the other hand, moving bufobj to nodes is harder due to v_object.

sys/kern/vfs_cache.c
199

I don't have a strong opinion, just a suggestion. It does save an indirect function call.

There is a lot more to remove or shrink. Basically the vnode can be made to fit 9 per page with some headway.

Thanks for rewriting this comment.
I have provided some suggestions for cleanup and/or clarification.

sys/kern/vfs_cache.c
89

Like byte-range file locking, I originally implemented the name cache as part of UFS. Both were later extracted from UFS and provided as kernel support so that they could be used by other filesystems.

94

thorought should either be thorough or thoroughout

95

shortcommings => shortcomings

106

represented => described

111

vnode (see => vnode, see

118

hacks around the problem => minimises the number of locks

127

set => provided

145

For lockless case forward lookup avoids any writes to share areas apart from the terminal path component. =>
For lockless lookups, every path-name component, except possibly the last one, can be found and traversed without needing to make any modification to the cache data structures.

191

that => than

203

I do not understand this sentence: now hard to replace with malloc due to dependence on SMR.
What are the limits of SMR?
I am not aware of limits placed on use of malloc by SMR.
Are there limits placed on the cache entries by SMR?

This revision is now accepted and ready to land.Feb 16 2021, 1:23 AM
  • address some of the feedback
This revision now requires review to proceed.Feb 16 2021, 6:05 AM
mjg marked 4 inline comments as done.
mjg marked 3 inline comments as done.Feb 16 2021, 6:22 AM
mjg added inline comments.
sys/kern/vfs_cache.c
145

I don't think this covers it. Key here is that this scales completely and it stems from not bouncing cache lines with anyone.

203

First, consider a per-SMR kernel.

Namecache maintains its own zones but there is no good reason to do it that I know of, instead it could use malloc (with better than power-of-2 granularity for malloc zones). In general, if more of the kernel was using malloc instead of hand rolled zones, it would be easier to justify better malloc granularity and likely get better total memory usage. For example most names are < 8 characters or so, yet every single one allocates a 104 byte object with 45 bytes to hold them. But there is already some overhead for the very fact that we are dealing with a namecache entry, so the buffer which comes after it has to be worth it. Should namecache-specific zones get abolished, the problem would disappear.

Next note in the current implementation traversal has to access memory from any of 4 namecache zones, the vnode zone and pwd zone (for cwd and other pointers).

Consider a kernel with a generalized safe memory reclamation mechanism which supports all allocations. Whether the namecache is using custom zones, malloc or whatever else, there would be smr_enter or compatible call upfront to provide protection. This is roughly how it works in Linux with RCU.

SMR as implemented here is very different. "vfs_smr" object is created and installed in aforementioned zones, then smr_enter(vfs_smr) provides protection against freeing from said zones. But it does not provide any protection against freeing from malloc or whatever else not roped in with vfs_smr. Let's say someone added malloc_smr. It would have to be smr_enter-ed separately and that comes with a huge single-threaded cost (an atomic op). There was a mode to do it without said op, but it would still come with extra overhead making this a non-starter. I don't know how the current could would handle a hypothetical global smr.

That's the rough outline. As you can see the comment assumed some familiarity with the above.

This revision was not accepted when it landed; it landed in state Needs Review.Apr 2 2021, 3:15 AM
This revision was automatically updated to reflect the committed changes.