Page MenuHomeFreeBSD

Kernel changes needed for lsof to work using only user mode APIs
AbandonedPublic

Authored by damjan.jov_gmail.com on Jan 29 2022, 10:30 AM.
Referenced Files
Unknown Object (File)
Tue, Dec 31, 2:36 AM
Unknown Object (File)
Sun, Dec 29, 7:27 PM
Unknown Object (File)
Wed, Dec 25, 2:23 PM
Unknown Object (File)
Dec 2 2024, 11:49 AM
Unknown Object (File)
Nov 27 2024, 3:10 AM
Unknown Object (File)
Nov 18 2024, 2:17 AM
Unknown Object (File)
Nov 18 2024, 12:47 AM
Unknown Object (File)
Nov 17 2024, 2:07 PM

Details

Reviewers
None
Group Reviewers
Contributor Reviews (src)
Summary
Kernel changes needed for lsof to work using only user mode APIs,
without KVM access (https://github.com/lsof-org/lsof/pull/184).

Advisory locking:
- Populate the type of lock into struct flock l_type in the lock
  iteration functions.
- Calculate whether the process has shared or exclusive locks on
  each vnode, and return this in struct xfile.
- Add a fcntl flag O_LOCKWHOLEFILE for whether the whole file is
  locked, and returns this from the kern.file sysctl too.

struct kinfo_file:
- Add kf_pipe_buffer_[in/out/size] fields to kf_pipe, and populate them.
- Add a kf_kqueue struct to the kf_un union, to allow querying
  kqueue state, and populate it.
- The kf_sock_rcv_sb_state and kf_sock_snd_sb_state fields are
  never written to in the kernel, which is a pre-existing bug on
  its own. Populate them properly.

struct xvnode:
- Bring back the kern.vnode sysctl.
- Change struct xvnode field sizes and offsets to be the same on
  32 and 64 bit architectures, so 32 bit calls can work on 64
  bit kernels.
- Add an xv_lockf field for the advisory lock pointer.

Bump __FreeBSD_version to 1400052.
Test Plan

Build lsof from https://github.com/lsof-org/lsof/pull/184 and verify output before and after. Especially compare locked files, kqueue info, and pipes.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

damjan.jov_gmail.com edited the summary of this revision. (Show Details)

Added kf_pipe_buffer_[in/out/size] fields to kf_pipe.

Please split the patch to per-object. For instance, I think that pipe/socket/kevent changes should be relatively straighforward. While lockf/xvnode do not look so.

Also please generate diffs with full context, like git diff -U99999999 ... so that phab review tool is more useful.

sys/kern/vfs_subr.c
4589

As discussed elsewhere, this is not going to work. It requires an allocation of numvnodes * sizeof(struct xvnode) bytes of contiguous KVA, with typical value of numvnodes around several of 1e6 on modern systems.

This interface needs to be completely reworked. That said, can you please explain what useful information does the xvnode provides for lsof? I find it strange, most of the structure is packed with kernel addresses, and if the claim is that it allows to stop reading /dev/kmem, what is the point?

sys/sys/fcntl.h
149

This is not going to work. The bits in flags for open(2) are scarce, and we do not want to spend a bit just for reporting.

Typical practice is to assign some meaning for the open-only flag. For instance, we reuse O_TTY_INIT for internal kernel reporting of the open failure. You may use the same flag as O_LOCKWHOLEFILE (whatever it means: some man page should be updated perhaps).

sys/sys/vnode.h
201

For kernel addresses, we recently got kvaddr_t type. Please use it.

Kernel changes needed for lsof to work using only user mode APIs,

Can you confirm that with the set of changes here and without use of kvm, lsof's full functionality is available?

Kernel changes needed for lsof to work using only user mode APIs,

Can you confirm that with the set of changes here and without use of kvm, lsof's full functionality is available?

lsof wants to print the f_ops pointer from struct file, not sure if it's worth adding to struct xfile. It's only used to log debug info for unknown file types.
It would be nice if the kernel provided the file's link count (st_nlink) somehow in struct file or kinfo_file or xvnode, so we don't have to stat() every path.
It also wants to read nullfs nodes from the kernel to get their null_lowervp, but when we have the kinfo_file path, it should be possible to calculate the original file path from the device and mountpoint anyway.

Once this is finished, on 14-CURRENT, all of lsof's functionality should be available without kvm access. You'll need to run lsof as root to see files opened by processes running as root, but you won't need kvm access. The kvm access will remain under #if __FreeBSD_version for older versions of FreeBSD, but even they only use it optionally, for further details on specific file types, the core of lsof already doesn't use kvm. For example you can already see kqueues, but to see their count and state, you either need these patches on 14-CURRENT, or kvm access.

sys/kern/vfs_subr.c
4589

That said, can you please explain what useful information does the xvnode provides for lsof? I find it strange, most of the structure is packed with kernel addresses, and if the claim is that it allows to stop reading /dev/kmem, what is the point?

lsof needs to know the filesystem properties for an open file. When a path is set in struct kinfo_file, we can statfs() it to get those. But when a path isn't set, the v_mount / xv_mount kernel pointer in struct vnode/xvnode can be used as an opaque lookup key to a dictionary mapping to cached data previously retrieved from statfs() for other files on that filesystem which had paths in their kinfo_file (and share the v_mount / xv_mount value with this file). Still, that is a rather unusual way of doing it.

sys/sys/fcntl.h
149

Would a new sysctl which lists struct lockf data be preferable?

Kernel changes needed for lsof to work using only user mode APIs,

Can you confirm that with the set of changes here and without use of kvm, lsof's full functionality is available?

lsof wants to print the f_ops pointer from struct file, not sure if it's worth adding to struct xfile. It's only used to log debug info for unknown file types.

I strongly recommend not to dig into f_ops. For instance, some file systems overload vnops with something different, while still being normal filesystem. An example is tmpfs.

It would be nice if the kernel provided the file's link count (st_nlink) somehow in struct file or kinfo_file or xvnode, so we don't have to stat() every path.

Which link count do you want, effective or current on disk (difference is in still non-satisfied metadata writes and open references)?
This is easy anyway.

It also wants to read nullfs nodes from the kernel to get their null_lowervp, but when we have the kinfo_file path, it should be possible to calculate the original file path from the device and mountpoint anyway.

We have enough spares to arrange for this as well. We can return fileid for lower vnode both for nullfs and unionfs.

Once this is finished, on 14-CURRENT, all of lsof's functionality should be available without kvm access. You'll need to run lsof as root to see files opened by processes running as root, but you won't need kvm access. The kvm access will remain under #if __FreeBSD_version for older versions of FreeBSD, but even they only use it optionally, for further details on specific file types, the core of lsof already doesn't use kvm. For example you can already see kqueues, but to see their count and state, you either need these patches on 14-CURRENT, or kvm access.

As I said, it would be very useful if you start splitting this patch into per-object pieces, starting with parts like pipes and sockets that probably are almost ready to go in. Also, the wish list, like the items above, would be nice to have to not drop anything you need for you work.

sys/kern/vfs_subr.c
4589

But struct kinfo_file.kf_un.kf_file always contains kf_file_fsid/kf_file_rdev. Isn't this data provide the same cross-matching against mount point, and is the better way to track vnode ownership by mount points?

sys/sys/fcntl.h
149

I think that the most likely answer is yes. I find it somewhat strange that you do not want to see the exact ranges, owners, and types of the advisory locks established over vnodes.

Thank you for your feedback. New versions of this patch are at:
https://reviews.freebsd.org/D34184
https://reviews.freebsd.org/D34323

I am closing this one.

FYI D34756
I hope to work on your other patches then.