With all the recent changes we don't need extra argument that specifies
what exactly the syscalls does, neither we need a copyout-able pointer,
just a pointer sized integer.
Details
- Reviewers
rmacklem brooks - Group Reviewers
network - Commits
- rG765ad4f03937: rpcsec_tls: cleanup the rpctls_syscall()
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Skipped - Unit
Tests Skipped - Build Status
Buildable 61971 Build 58855: arc lint + arc unit
Event Timeline
sys/kern/syscalls.master | ||
---|---|---|
3252 | What happens if one runs an old rpc.tlsservd with a new kernel? |
sys/kern/syscalls.master | ||
---|---|---|
3252 | The syscall will return EINVAL. I don't know if the syscall argument parsing layer will do that or it will actually enter sys_rpctls_syscall(). If the latter happens, the small value that previously meant a command would be used as socookie and looked up in the database of stored sockets, and of course that lookup would fail. |
I'm having a hard time following the code in this stack of reviews. Do you have a public git branch somewhere we can look at it all in one place.
sys/kern/syscalls.master | ||
---|---|---|
3252 | Argument "parsing" consists of casting an array of syscallarg_t's to the uap structure. There's no checking of any form including argument count. If socookie is used in e.g., a hash table looking up then the ops values won't ever resolve so that would be ok-ish. That being said it sure looks like socookie is a kernel pointer passed by userspace which is unsafe (I'm also seeing it being stored in an int64_t which should not be done). |
sys/kern/syscalls.master | ||
---|---|---|
3252 |
It is safe cause kernel doesn't try to dereference it. It uses it as unique key lookup value. All unsafeness here is only leaking where socket located in kernel memory, but this is already reported by e.g. netstat(1) and even without a privilege check.
Can you please advice any better approach? On the RPC side it must be a fixed size variable, hence it is uint64_t. But on the syscall side I made it uintptr_t that is supposed to fit into uint64_t on any modern machine. |
sys/kern/syscalls.master | ||
---|---|---|
3252 | Ah, if it's just an integer key then uint64_t (or maybe kvaddr_t to make it clear an address is wanted) seems fine and it would be best to use consistently rather than uintptr_t. |
sys/kern/syscalls.master | ||
---|---|---|
3252 |
kvaddr_t doesn't fit here, IMO, cause although we derive the number from a pointer, but we never plan to use it as a pointer or an address after that. For the fixed size integer, my concern is how the code in rpctls_impl.c is going to work on 32-bit platforms? arg.socookie = (uintptr_t)socookie; So if I change this uintptr_t to uint64_t, I guess on 32-bits it is going to be just zero-extended, and everything should be fine. Is that correct? |
sys/kern/syscalls.master | ||
---|---|---|
3252 |
kvaddr_t is for an address which is not a pointer. (Under the hood it's __uint64_t). I don't really mind either way, I just wanted to make sure it wasn't following the u64 pattern used in place of proper types in linux.
It will be zero extended. When it's a syscall argument the high bits are explicitly zeroed when the register array is populated. With a cast the integer promotion rules apply also zeroing the high bits. |
When I originally did this, I thought of using the socket pointer as a "unique"
indicator for it in userpsace (for the TCP socket the client/server is using).
I chose not to do so, because I was concerned that the kernel code might...
- close the socket
- create a new socket, that happens to have the same kernel address.
As such, I used a 64bit reference# that was incremented by 1 for each use,
assuming it would never wrap around, given it is 64bits.
If the socket pointer is guaranteed to not get closed during the upcall,
using a 64bit variant of the socket ptr should be safe, I think?
sys/kern/syscalls.master | ||
---|---|---|
3252 | Just fyi, KTLS (and, therefore this TLS RPC stuff) only |
Was changing the syscall and upcall arguments really necessary?
I thought it was the upcall RPC transport that needed to change.
Maybe the initial commit cycle could simply change the RPC upcall
transport to netlink and leave the rest as is. It would be a smaller change
and easier to isolate issues if only the RPC transport changes, I think?
No, the change of the syscall prototype isn't necessary, that's why it is at the top of the stack of reviews. Basically every step in the stack is a working revision, except those that say "this is user/kernel counterpart of previous commit". But after finishing the RPC-over-Netlink part I realized that there is so much space for refactoring created, so started doing that.
I was considering this situation. First important thing: it just can't happen at normal runtime. But we should be prepared for that and don't panic or leak resources. Let's look at it once more. Here is one legitimate scenario for this situation:
- kernel allocates socket, calls RPC
- daemon receives RPC call, but gets stuck (e.g. received SIGSTOP) or dies (SIGKILL or SIGSEGV or whatever)
- kernel RPC call times out. In that case kernel just does soshutdown(). Here is the place:
} else if (stat == RPC_TIMEDOUT) { /* * Do a shutdown on the socket, since the daemon is probably * stuck in SSL_connect() trying to read the socket. * Do not soclose() the socket, since the daemon will close() * the socket after SSL_connect() returns an error. */ soshutdown(so, SHUT_RD); }
So we basically leak the socket thus it can't be returned for a second time :) Good that we avoid aliasing, but bad that we are leaking socket. The above quoted code is before my changes, but my changes preserved this logic. So again, this situation can't happen because we never close a socket that is right now in the tree.
But I can imagine other scenarios, which require a malicious daemon (which is priveleged).
- malicious daemon can grab the socket via syscall, but do not generate RPC reply. This case is covered to avoid panic - see that we first RB_FIND(), and only then RB_REMOVE().
- or the opposite: daemon skips the syscall, but reports that RPC successful. This will result in intentional panic now - MPASS((RB_FIND(upsock_t, &upcall_sockets, &ups) == NULL)). This just can't happen due to bug, it requires intent on daemon side.
As you see, we are slightly inconsistent wrt to malicious daemon. I'm inclined not to invest too much into handling malicious daemon scenario. I'm inclined into looking at the kernel part and daemon as two parts of the same mechanism, that of course should be protected against various failure scenarios, but not malicious intent. What do you think?
I realized that failure handling for both the rpc.tlsclntd and rpc.tlsservd can be merged into one function. Created a separate review for that: https://reviews.freebsd.org/D48677 This new function rpctls_rpc_failed() should be one place to handle scenarios we are discussing, if we one to improve their handling.
JFYI: I got a panic on my desktop today, when rpc.tlsclntd was writing to fd that was pointing to a totally garbage socket structure. That happened during stress testing on mount/unmount, where desktop was casting stress against test virtual machine. But desktop was running gss-netlink branch, too.
So I am aware of one race here and investigating it. So letting you be aware, too.
P.S. Desktop runs pretty old version of my branch, but there were no large functional changes since.