rpcsec_tls: cleanup the rpctls_syscall()
ClosedPublic
Actions

Authored by glebius on Jan 24 2025, 7:35 AM.

Details

Reviewers

rmacklem
brooks

Group Reviewers

network

Commits

rG765ad4f03937: rpcsec_tls: cleanup the rpctls_syscall()

Summary

With all the recent changes we don't need extra argument that specifies
what exactly the syscalls does, neither we need a copyout-able pointer,
just a pointer sized integer.

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Skipped

Unit

Tests Skipped

Build Status

Buildable 61971
Build 58855: arc lint + arc unit

Event Timeline

glebius created this revision.Jan 24 2025, 7:35 AM

Herald added a reviewer: brooks. · View Herald TranscriptJan 24 2025, 7:35 AM

Herald added subscribers: olce, imp. · View Herald Transcript

glebius requested review of this revision.Jan 24 2025, 7:35 AM

Harbormaster completed remote builds in B61971: Diff 149863.Jan 24 2025, 7:35 AM

glebius added a parent revision: D48570: rpc.tlsservd: provide parallelism with help of pthread(3).Jan 24 2025, 7:35 AM

markj added a subscriber: markj.Jan 24 2025, 4:35 PM

markj added inline comments.

sys/kern/syscalls.master
3252	What happens if one runs an old rpc.tlsservd with a new kernel?

glebius marked an inline comment as done.Jan 24 2025, 4:59 PM

glebius added inline comments.

sys/kern/syscalls.master
3252	The syscall will return EINVAL. I don't know if the syscall argument parsing layer will do that or it will actually enter sys_rpctls_syscall(). If the latter happens, the small value that previously meant a command would be used as socookie and looked up in the database of stored sockets, and of course that lookup would fail.

I'm having a hard time following the code in this stack of reviews. Do you have a public git branch somewhere we can look at it all in one place.

sys/kern/syscalls.master
3252	Argument "parsing" consists of casting an array of syscallarg_t's to the uap structure. There's no checking of any form including argument count. If socookie is used in e.g., a hash table looking up then the ops values won't ever resolve so that would be ok-ish. That being said it sure looks like socookie is a kernel pointer passed by userspace which is unsafe (I'm also seeing it being stored in an int64_t which should not be done).

In D48649#1109950, @brooks wrote:

I'm having a hard time following the code in this stack of reviews. Do you have a public git branch somewhere we can look at it all in one place.

https://github.com/glebius/FreeBSD/tree/gss-netlink

glebius added inline comments.Jan 24 2025, 7:20 PM

sys/kern/syscalls.master
3252	That being said it sure looks like socookie is a kernel pointer passed by userspace which is unsafe It is safe cause kernel doesn't try to dereference it. It uses it as unique key lookup value. All unsafeness here is only leaking where socket located in kernel memory, but this is already reported by e.g. netstat(1) and even without a privilege check. (I'm also seeing it being stored in an int64_t which should not be done). Can you please advice any better approach? On the RPC side it must be a fixed size variable, hence it is uint64_t. But on the syscall side I made it uintptr_t that is supposed to fit into uint64_t on any modern machine.

brooks added inline comments.Jan 24 2025, 8:00 PM

sys/kern/syscalls.master
3252	Ah, if it's just an integer key then uint64_t (or maybe kvaddr_t to make it clear an address is wanted) seems fine and it would be best to use consistently rather than uintptr_t.

glebius added inline comments.Jan 24 2025, 8:09 PM

sys/kern/syscalls.master
3252	Ah, if it's just an integer key then uint64_t (or maybe kvaddr_t to make it clear an address is wanted) seems fine and it would be best to use consistently rather than uintptr_t. kvaddr_t doesn't fit here, IMO, cause although we derive the number from a pointer, but we never plan to use it as a pointer or an address after that. For the fixed size integer, my concern is how the code in rpctls_impl.c is going to work on 32-bit platforms? arg.socookie = (uintptr_t)socookie; So if I change this uintptr_t to uint64_t, I guess on 32-bits it is going to be just zero-extended, and everything should be fine. Is that correct?

brooks added inline comments.Jan 24 2025, 8:44 PM

sys/kern/syscalls.master
3252	Ah, if it's just an integer key then uint64_t (or maybe kvaddr_t to make it clear an address is wanted) seems fine and it would be best to use consistently rather than uintptr_t. kvaddr_t doesn't fit here, IMO, cause although we derive the number from a pointer, but we never plan to use it as a pointer or an address after that. kvaddr_t is for an address which is not a pointer. (Under the hood it's __uint64_t). I don't really mind either way, I just wanted to make sure it wasn't following the u64 pattern used in place of proper types in linux. For the fixed size integer, my concern is how the code in rpctls_impl.c is going to work on 32-bit platforms? arg.socookie = (uintptr_t)socookie; So if I change this uintptr_t to uint64_t, I guess on 32-bits it is going to be just zero-extended, and everything should be fine. Is that correct? It will be zero extended. When it's a syscall argument the high bits are explicitly zeroed when the register array is populated. With a cast the integer promotion rules apply also zeroing the high bits.

When I originally did this, I thought of using the socket pointer as a "unique"
indicator for it in userpsace (for the TCP socket the client/server is using).
I chose not to do so, because I was concerned that the kernel code might...

close the socket
create a new socket, that happens to have the same kernel address.

As such, I used a 64bit reference# that was incremented by 1 for each use,
assuming it would never wrap around, given it is 64bits.

If the socket pointer is guaranteed to not get closed during the upcall,
using a 64bit variant of the socket ptr should be safe, I think?

sys/kern/syscalls.master
3252	Just fyi, KTLS (and, therefore this TLS RPC stuff) only works for 64bit arches, so I doubt it matters what a 32bit arch does with it.

Was changing the syscall and upcall arguments really necessary?
I thought it was the upcall RPC transport that needed to change.

Maybe the initial commit cycle could simply change the RPC upcall
transport to netlink and leave the rest as is. It would be a smaller change
and easier to isolate issues if only the RPC transport changes, I think?

In D48649#1110078, @rmacklem wrote:

Was changing the syscall and upcall arguments really necessary?
I thought it was the upcall RPC transport that needed to change.

Maybe the initial commit cycle could simply change the RPC upcall
transport to netlink and leave the rest as is. It would be a smaller change
and easier to isolate issues if only the RPC transport changes, I think?

No, the change of the syscall prototype isn't necessary, that's why it is at the top of the stack of reviews. Basically every step in the stack is a working revision, except those that say "this is user/kernel counterpart of previous commit". But after finishing the RPC-over-Netlink part I realized that there is so much space for refactoring created, so started doing that.

In D48649#1110072, @rmacklem wrote:

When I originally did this, I thought of using the socket pointer as a "unique"
indicator for it in userpsace (for the TCP socket the client/server is using).
I chose not to do so, because I was concerned that the kernel code might...

close the socket

create a new socket, that happens to have the same kernel address.

I was considering this situation. First important thing: it just can't happen at normal runtime. But we should be prepared for that and don't panic or leak resources. Let's look at it once more. Here is one legitimate scenario for this situation:

kernel allocates socket, calls RPC
daemon receives RPC call, but gets stuck (e.g. received SIGSTOP) or dies (SIGKILL or SIGSEGV or whatever)
kernel RPC call times out. In that case kernel just does soshutdown(). Here is the place:

} else if (stat == RPC_TIMEDOUT) {
        /*
         * Do a shutdown on the socket, since the daemon is probably
         * stuck in SSL_connect() trying to read the socket.   
         * Do not soclose() the socket, since the daemon will close()
         * the socket after SSL_connect() returns an error.
         */
        soshutdown(so, SHUT_RD);
}

So we basically leak the socket thus it can't be returned for a second time :) Good that we avoid aliasing, but bad that we are leaking socket. The above quoted code is before my changes, but my changes preserved this logic. So again, this situation can't happen because we never close a socket that is right now in the tree.

But I can imagine other scenarios, which require a malicious daemon (which is priveleged).

malicious daemon can grab the socket via syscall, but do not generate RPC reply. This case is covered to avoid panic - see that we first RB_FIND(), and only then RB_REMOVE().
or the opposite: daemon skips the syscall, but reports that RPC successful. This will result in intentional panic now - MPASS((RB_FIND(upsock_t, &upcall_sockets, &ups) == NULL)). This just can't happen due to bug, it requires intent on daemon side.

As you see, we are slightly inconsistent wrt to malicious daemon. I'm inclined not to invest too much into handling malicious daemon scenario. I'm inclined into looking at the kernel part and daemon as two parts of the same mechanism, that of course should be protected against various failure scenarios, but not malicious intent. What do you think?

I realized that failure handling for both the rpc.tlsclntd and rpc.tlsservd can be merged into one function. Created a separate review for that: https://reviews.freebsd.org/D48677 This new function rpctls_rpc_failed() should be one place to handle scenarios we are discussing, if we one to improve their handling.

Use uint64_t instead of uintptr_t for the socookie argument.

Harbormaster completed remote builds in B62023: Diff 149953.Jan 25 2025, 5:32 AM

glebius marked 3 inline comments as done.Jan 25 2025, 5:32 AM

JFYI: I got a panic on my desktop today, when rpc.tlsclntd was writing to fd that was pointing to a totally garbage socket structure. That happened during stress testing on mount/unmount, where desktop was casting stress against test virtual machine. But desktop was running gss-netlink branch, too.

So I am aware of one race here and investigating it. So letting you be aware, too.

P.S. Desktop runs pretty old version of my branch, but there were no large functional changes since.

Fix race when we access other thread stack data after dropping the lock.

Harbormaster completed remote builds in B62044: Diff 149994.Jan 26 2025, 9:07 PM

In D48649#1110487, @glebius wrote:

JFYI: I got a panic on my desktop today, when rpc.tlsclntd was writing to fd that was pointing to a totally garbage socket structure

panic fixed in the latest update. My github branch is also pushed to latest version.

rmacklem accepted this revision.Jan 29 2025, 12:36 AM

This revision is now accepted and ready to land.Jan 29 2025, 12:36 AM

Closed by commit rG765ad4f03937: rpcsec_tls: cleanup the rpctls_syscall() (authored by glebius). · Explain WhyFeb 1 2025, 9:04 AM

This revision was automatically updated to reflect the committed changes.

glebius added a commit: rG765ad4f03937: rpcsec_tls: cleanup the rpctls_syscall().