Page MenuHomeFreeBSD

stat(2): add st_filerev
AcceptedPublic

Authored by kib on Mon, Jan 13, 10:41 PM.
Tags
None
Referenced Files
F107534932: D48452.diff
Wed, Jan 15, 1:35 PM
F107480042: D48452.diff
Tue, Jan 14, 7:18 PM
F107474999: D48452.diff
Tue, Jan 14, 5:26 PM
F107466118: D48452.diff
Tue, Jan 14, 2:04 PM
Subscribers

Details

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kib requested review of this revision.Mon, Jan 13, 10:41 PM

Since st_filerev provides similar information to st_gen, should it also be guarded by PRIV_VFS_GENERATION in vop_stat_helper_post? And does it warrant mention in stat.2 ?

Since st_filerev provides similar information to st_gen, should it also be guarded by PRIV_VFS_GENERATION in vop_stat_helper_post? And does it warrant mention in stat.2 ?

I do not know why st_gen is considered privileged information, but it might have
been that knowing it is useful when trying to fake a file handle?
st__filerev is not useful that way. It is really about the same as mtime and ctime, I think?

Since st_filerev provides similar information to st_gen, should it also be guarded by PRIV_VFS_GENERATION in vop_stat_helper_post? And does it warrant mention in stat.2 ?

I do not know why st_gen is considered privileged information, but it might have
been that knowing it is useful when trying to fake a file handle?
st__filerev is not useful that way. It is really about the same as mtime and ctime, I think?

Yeah, that make sense, Rick.

This revision is now accepted and ready to land.Tue, Jan 14, 2:28 AM
sys/kern/kern_descrip.c
1623

I believe kern_statat() needs a similar initialization.

kib marked an inline comment as done.Tue, Jan 14, 5:10 PM
kib added inline comments.
sys/kern/kern_descrip.c
1623

kern_statat() calls VOP_STAT() and not fo_stat(). I believed that I converted all existing VOP_STATs (ufs, nullfs, default) but apparently missed tmpfs.

If adding to kern_statat(), perhaps vop_stat_helper_pre() is more useful place, but I think it is enough to fix tmpfs.

kib marked an inline comment as done.

Zero st_filerev in tmpfs_stat()

This revision now requires review to proceed.Tue, Jan 14, 5:11 PM
sys/kern/vfs_default.c
1591

We need to initialize va_filerev before calling VOP_GETATTR above.

sys/kern/vfs_default.c
1591

Aren't filesystems already handle va_filerev?

sys/kern/vfs_default.c
1591

Sorry, I don't follow. Suppose I fstatat() a file on a filesystem that uses vop_stdstat(), e.g., fusefs. vap->va_filerev will be left uninitialized since fuse_vnop_getattr() doesn't know about the new field, and it will be copied to st_filerev, so we copy uninitialized bytes to userspace.

See the comment "Initialize defaults for new and unusual fields..." at the beginning of this function.

Out of curiosity, what's the need behind exposing this info? FYI, it seems va_filerev has been correct on UFS, ZFS and p9fs only, and filesystems using init_va_filerev() seem to set it to a dubious value.

sys/kern/kern_descrip.c
1623

This init here is for the sake of non-vnode files, a situation that can't happen in kern_statat().

sys/kern/vfs_default.c
1591

It's true that va_filerev has existed from the start (4.4BSD import), but since it isn't well known, I'd prefer it to be initialized for sure one way or the other. Under the "Initialize defaults for new and unusual fields" comment is one possibility.

However, some FSes (fuse and ZFS through zfsctl_root_getattr()) do not set va_filerev in VOP_GETATTR().

So perhaps the best course is instead to initialize it to 0 in a pre routine for VOP_GETATTR().

sys/kern/vfs_default.c
1591

Oh, I missed that va_filerev is not new. But yes, some filesystems still do not populate it, so we should either fix them or add this default value.

Out of curiosity, what's the need behind exposing this info? FYI, it seems va_filerev has been correct on UFS, ZFS and p9fs only, and filesystems using init_va_filerev() seem to set it to a dubious value.

There are two motivations:

  • To use in regression tests, to ensure that file systems like fusefs set va_filerev correctly.
  • To use for userspace NFS servers. None currently known can make use of this information, but future ones might.
kib marked an inline comment as done.Tue, Jan 14, 7:03 PM
kib added inline comments.
sys/kern/vfs_default.c
1591

I qualify this as a bug in fusefs, in fact. All (as far as I see) other fses do set va_filerev in VOP_GETATTR().

Ok, lets add this line to make the change go in.

Zero st_filerev in vop_stat_helper_pre().
Zero va_filerev before calling VOP_GETATTR() in vfs_stdstat()

I'm OK with these changes as they are.

That said, even if I agree there is, e.g., a bug in fusefs as mentioned in inline comments, I'm still wondering how to prevent non-initialized fields bugs in the VFS in a more general fashion. It's unfortunate that C does not allow to specify a default initializer per type. Maybe a follow-up attempt creating a new pre for VOP_GETATTR() would do good, although it will show up as not optimal as the compiler can't elide useless pre-initialization in it (for fields that are effectively filled by the FS' VOP_GETATTR() implementation). We would need a slightly different implementation for it to be able to perform such elisions.

This revision is now accepted and ready to land.Tue, Jan 14, 8:13 PM

I'm OK with these changes as they are.

That said, even if I agree there is, e.g., a bug in fusefs as mentioned in inline comments, I'm still wondering how to prevent non-initialized fields bugs in the VFS in a more general fashion. It's unfortunate that C does not allow to specify a default initializer per type. Maybe a follow-up attempt creating a new pre for VOP_GETATTR() would do good, although it will show up as not optimal as the compiler can't elide useless pre-initialization in it (for fields that are effectively filled by the FS' VOP_GETATTR() implementation). We would need a slightly different implementation for it to be able to perform such elisions.

Perhaps some background w.r.t. va_filerev is in order.
I think I added it when I added NFSv4 support to FreeBSD
around FreeBSD 7, although I haven't looked in the commit logs.

It is specifically for NFSv4's change attribute.
change is defined as a value that changes whenever the file's
data or metadata changes. The RFCs do not define how it changes,
just that the value should not be the same.

To implement this correctly it needs to be in the on-disk i-node or
similar (it should persist across cases where the i-node falls out of memory
and when the server reboots).

For file systems that do not store it on-disk, it can only be approximated.
For read-only exports, it need only remain the same, so setting it to 0
is appropriate.

For read-write exports, ctime is a good approximation if that lives on-disk.
Otherwise, setting it to a value that always changes when the i-node is created
in memory is probably the best you can do. (That is what init_va_filerev() does when
it sets it to binuptime.) This makes it look like the file has changed when it might not
have, but is probably preferable to not showing a change when one occurs.

fuse should probably do the best it can, given the above.

In summary, there is no one correct initialization if it is not stored on-disk.

Perhaps some background w.r.t. va_filerev is in order.
I think I added it when I added NFSv4 support to FreeBSD
around FreeBSD 7, although I haven't looked in the commit logs.

According to git, va_filerev has been present since the import of BSD 4.4, so is much older than that. UFS has been populating it since the import too. It does increment it by 1 at each modification. Initially, it would do so only for content modifications, which you changed in 2009 (933a30b6b76f59694a52, r191564) to incrementing on metadata changes also, along with storing it in the on-disk inode.

It is specifically for NFSv4's change attribute.
change is defined as a value that changes whenever the file's
data or metadata changes. The RFCs do not define how it changes,
just that the value should not be the same.

Thanks for this info. I have a more precise proposal below.

For file systems that do not store it on-disk, it can only be approximated.
For read-only exports, it need only remain the same, so setting it to 0
is appropriate.

Sure for the "approximation", but I think it would make sense to reserve the value 0, see below.

Shouldn't the revision change even in the case of read-only exports if the file is modified by other means (e.g., locally)?

For read-write exports, ctime is a good approximation if that lives on-disk.
Otherwise, setting it to a value that always changes when the i-node is created
in memory is probably the best you can do. (That is what init_va_filerev() does when
it sets it to binuptime.) This makes it look like the file has changed when it might not
have, but is probably preferable to not showing a change when one occurs.

I have an alternate point of view and proposal. According to what you reported about the RFCs, it is in fact a refinement of what they prescribe.

If the goal of va_filerev is only to be able to notice file changes, then it seems redundant with other attributes we already have (va_mtime and va_ctime), and I don't think FSs need to actually store it (on disk or in memory) as it could be generated on-the-fly from them (taking the max).

But what UFS does (and ext2fs, and maybe some others) is to increment this value by 1 (i_modrev). Crucially, this has more value than generating a number from va_mtime and va_ctime, and I think is more true to the "revision" word implied by the field names.

Having a number that increments by 1 at each change has more value since that gives the information of the number of modifications from the creation, which cannot be inferred in any way from the recorded times of last creation/modification. I think all filesystems should be changed to behave like this. Having the number of modifications could be useful, e.g., for statistics purposes.

We should probably reserve 0 for filesystems that do not support the feature, and have those that support it have new files start with a revision of 1, as well as old files on a cold start if the revision is not stored on disk.

This gives the possibility for remote clients to actually notice whether the revision is reliable at all (by testing if it is 0 or not) and to check if for some reason the revision went backwards after having grown (in which case, revision should not be assumed to be persistent).