
rpcsec_tls/client: use netlink RPC client to talk to rpc.tlsclntd(8)
Closed, Public

Authored by glebius on Mon, Jan 20, 9:05 PM.
Details

Summary

In addition to using a netlink(4) socket instead of a unix(4) one to pass
rpctlscd_* RPC commands to rpc.tlsclntd(8), the logic of passing the file
descriptor is also changed. Since clnt_nl provides all the needed
parallelism and waits on individual RPC xids, we don't need to store the
socket in a global variable and serialize all communication with the daemon.
Instead, we augment the rpctlscd_connect arguments with a cookie that is
essentially a pointer to the socket we are keeping for the daemon. While
sleeping on the request, we maintain a database of all sockets associated
with rpctlscd_connect RPCs that we have sent to userland. The daemon then
sends the cookie back in the rpctls_syscall(RPCTLS_SYSC_CLSOCKET) argument,
and we find and return the socket for this upcall.

This will be optimized further in a separate commit that will also touch
clnt_vc.c and other krpc files. This commit is intentionally kept minimal,
so that it is easier to see what changes with the netlink(4) transport.
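A minimal sketch of the handoff described above. All names here
(rpctls_sockent, rpctls_socks, rpctls_lock, rpctls_socket_take) are
illustrative assumptions, not necessarily the identifiers used in the diff:

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/socketvar.h>
#include <sys/tree.h>

/* Entry lives on the stack of the thread sleeping in clnt_nl_call(). */
struct rpctls_sockent {
        RB_ENTRY(rpctls_sockent) re_link;
        struct socket   *re_so;         /* tree key; its address is the cookie */
};

static int
rpctls_sockent_cmp(const struct rpctls_sockent *a,
    const struct rpctls_sockent *b)
{
        return ((uintptr_t)a->re_so < (uintptr_t)b->re_so ? -1 :
            (uintptr_t)a->re_so > (uintptr_t)b->re_so);
}

static RB_HEAD(rpctls_socktree, rpctls_sockent) rpctls_socks =
    RB_INITIALIZER(&rpctls_socks);
RB_GENERATE_STATIC(rpctls_socktree, rpctls_sockent, re_link,
    rpctls_sockent_cmp);

static struct mtx rpctls_lock;          /* assumed initialized elsewhere */

/*
 * The sleeping side (rpctls_connect()) puts a struct rpctls_sockent on
 * its stack, inserts it under rpctls_lock, calls clnt_nl_call(), and
 * removes the entry (if still present) when the call returns.
 */

/* Syscall side: the daemon passes the cookie back; find and detach. */
static struct socket *
rpctls_socket_take(uint64_t cookie)
{
        struct rpctls_sockent key, *ent;
        struct socket *so = NULL;

        key.re_so = (struct socket *)(uintptr_t)cookie;
        mtx_lock(&rpctls_lock);
        ent = RB_FIND(rpctls_socktree, &rpctls_socks, &key);
        if (ent != NULL) {
                RB_REMOVE(rpctls_socktree, &rpctls_socks, ent);
                so = ent->re_so;
        }
        mtx_unlock(&rpctls_lock);
        return (so);
}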

Diff Detail

Repository
rG FreeBSD src repository
Lint: Skipped
Unit Tests: Skipped
Build Status
Buildable 61832
Build 58716: arc lint + arc unit

Event Timeline

As I mentioned, I am concerned that the "cookie" might end
up referring to the wrong socket: the kernel code closes the
socket (freeing the structure), another socreate() happens to
get memory at the same address, and then the daemon replies
to the upcall.

Is there a guarantee against the above happening?
(I thought about using the socket address as a "cookie",
but chose not to, due to the above concern.)

To handle parallelism, there could be a linked list of outstanding
"upcalls in progress". I did not do that, because I knew that the
daemon was a single thread in userland and, as such, I did not
see any reason to do concurrent upcalls to the daemon.
(For rpc.tlsservd there is a command line option that makes it
run multiple daemons, where each one could handle one upcall at
a time.)

sys/rpc/rpcsec_tls/rpctls_impl.c:337

This might be sufficient to guarantee the socket addr
(cookie) does not get replaced by another socket structure,
I think?

As I mentioned, I am concerned that the "cookie" might end
up referring to the wrong socket: the kernel code closes the
socket (freeing the structure), another socreate() happens to
get memory at the same address, and then the daemon replies
to the upcall.

I already did a write-up on this topic in the other review, but I'm sure the write-up was poor, since after re-reading it I wasn't sure anybody save me could understand it. But I still sent it :( So I will try to explain better this time:

The "socket handoff database" is a searchable tree, and we search by socket pointer value. When kernel hands a socket to the daemon, the structure holding the socket is allocated on the stack of the sleeping thread. The sleep happens inside clnt_nl_call() and the structure sits in the frame of rpctls_connect() or rpctls_server(). If the daemon takes the socket via syscall, it is removed from the "database", so a second syscall won't find it. If daemon doesn't take the socket and the kernel RPC call times out, then the socket will be removed from the handoff database at the end of rpctls_connect()/rpctls_server(). What happens to the socket in the latter case? The socket is leaked as kernel assumes it now belongs to the daemon. And this behavior predates my changes, I left it as is for now. Here is the place:

} else if (stat == RPC_TIMEDOUT) {
        /*
         * Do a shutdown on the socket, since the daemon is probably
         * stuck in SSL_connect() trying to read the socket.
         * Do not soclose() the socket, since the daemon will close()
         * the socket after SSL_connect() returns an error.
         */
        soshutdown(so, SHUT_RD);
}

As you can see, the socket is leaked. And the comment definitely isn't correct with my changes, and I suspect it wasn't correct before either.
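
If the leak were fixed, the timeout branch above might instead drop the kernel's reference. A hypothetical sketch, explicitly not part of this commit:

} else if (stat == RPC_TIMEDOUT) {
        /*
         * Hypothetical fix: the daemon never took the socket via
         * rpctls_syscall(), so drop the kernel's reference instead
         * of leaking it.
         */
        soclose(so);
}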

Your concern is whether there is a sequence of events that would lead to cookie reuse/aliasing: e.g., an RPC call was sent down to the daemon while holding one socket, but when the daemon used the syscall to grab the socket, it got a different socket that had gone through the whole lifecycle of sofree() and soalloc(), ending up at the same pointer.

Right now that is absolutely impossible because of the socket leak described above: if the socket was not grabbed while the kernel was sleeping in the RPC, the socket is leaked.

Is aliasing going to be possible if we fix the leak? I think theoretically yes. Here is the sequence:

  • Kernel allocates a socket and sleeps in the RPC.
  • Daemon receives the RPC call, but then is literally stopped, e.g. by SIGSTOP or by being swapped out to disk.
  • Kernel times out the RPC and frees the socket (instead of leaking it).
  • Kernel now has a different socket with the same pointer and runs an RPC call.
  • Daemon is SIGCONT-ed or swapped back into memory and does the syscall. It gets the socket associated with the RPC call that is sleeping now, but it replies success with the xid of the old RPC. So the second RPC call would also time out and be tried again; the situation should recover (see the sketch below).
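
A rough sketch of why the stale reply is harmless, assuming generic xid matching on a pending-request list. The names and structure here are illustrative, not the actual clnt_nl internals:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/queue.h>

struct nl_request {
        TAILQ_ENTRY(nl_request) nr_link;
        uint32_t        nr_xid;         /* matches request to reply */
        bool            nr_done;
};
static TAILQ_HEAD(, nl_request) nr_pending =
    TAILQ_HEAD_INITIALIZER(nr_pending);

static void
nl_reply_received(uint32_t xid)
{
        struct nl_request *nr;

        TAILQ_FOREACH(nr, &nr_pending, nr_link) {
                if (nr->nr_xid == xid) {
                        nr->nr_done = true;
                        wakeup(nr);     /* wake the thread sleeping in the call */
                        return;
                }
        }
        /* Stale xid, e.g. from the timed-out RPC: reply is dropped. */
}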

I see the probability of such a sequence as extremely low, and it seems to be a recoverable problem. And again, only after the leak is fixed.

But if you are really concerned, we can mix something extra into the cookie; that can be a 64-bit random value if you'd like.
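
For example, a sketch with made-up names (rpctls_cookie and the helpers below are not identifiers from the patch), where the cookie carries a random nonce alongside the pointer, so that a reused socket address alone can never match:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/socketvar.h>

struct rpctls_cookie {
        uint64_t        rc_nonce;       /* random, guards against pointer reuse */
        struct socket   *rc_so;
};

static void
rpctls_cookie_init(struct rpctls_cookie *rc, struct socket *so)
{
        arc4random_buf(&rc->rc_nonce, sizeof(rc->rc_nonce));
        rc->rc_so = so;
}

/* Lookup succeeds only if both the pointer and the nonce match. */
static bool
rpctls_cookie_match(const struct rpctls_cookie *a,
    const struct rpctls_cookie *b)
{
        return (a->rc_so == b->rc_so && a->rc_nonce == b->rc_nonce);
}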

To handle parallelism, there could be a linked list of outstanding
"upcalls in progress". I did not do that, because I knew that the
daemon was a single thread in userland and, as such, I did not
see any reason to do concurrent upcalls to the daemon.
(For rpc.tlsservd there is a command line option that makes it
run multiple daemons, where each one could handle one upcall at
a time.)

Now rpc.tlsservd can run as many handshakes as needed in separate threads :) It doesn't need to prefork for that, nor is any serialization required in the kernel to serve those threads. :)

sys/rpc/rpcsec_tls/rpctls_impl.c:337

This might be sufficient to guarantee the socket addr
(cookie) does not get replaced by another socket structure,
I think?

Of course! For all successful RPCs cookie aliasing just can't happen, because the socket is still allocated and a new socket with the same address cannot appear. For unsuccessful ones, see the longer write-up in my main comment.

This revision is now accepted and ready to land. (Tue, Jan 28, 11:25 PM)