Page MenuHomeFreeBSD

NFS: Request use of TCP_USE_DDP for in-kernel TCP sockets
ClosedPublic

Authored by jhb on Feb 21 2024, 12:56 AM.
Tags
None
Referenced Files
F107006547: D44002.diff
Wed, Jan 8, 9:05 PM
Unknown Object (File)
Thu, Jan 2, 3:24 AM
Unknown Object (File)
Dec 6 2024, 12:32 PM
Unknown Object (File)
Dec 6 2024, 6:51 AM
Unknown Object (File)
Dec 6 2024, 6:51 AM
Unknown Object (File)
Dec 6 2024, 6:51 AM
Unknown Object (File)
Dec 6 2024, 6:22 AM
Unknown Object (File)
Dec 2 2024, 7:54 AM
Subscribers

Details

Summary

Since this is an optimization, ignore failures to enable the option.

For the server side, defer enabling DDP until the first non-NULLPROC
RPC is received. This allows TLS handling (which uses NULLPROC RPCs)
to enable TLS offload first.

Sponsored by: Chelsio Communications

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 56139
Build 53027: arc lint + arc unit

Event Timeline

jhb requested review of this revision.Feb 21 2024, 12:56 AM

As noted in the second inline comment, the socket
opt would normally be set in clnt_vc.c and clnt_rc.c
would just do a CLSET on the client to do that.

It probably would be confusing if this layering is
not maintained, I think?

sys/fs/nfsserver/nfs_nfsdkrpc.c
558

Trivial, but would it be less confusing to have
one "if" instead of two?
(I'll leave it up to you.)

sys/rpc/clnt_rc.c
223

Normally the clnt_rc.c (reconnect layer) would
do a CLSET_TCP_DDP and the so_setsockopt()
would be done for CLSET_TCP_DDP in clnt_vc.c
and ignored in clnt_dg.c.

I've never been much of an object oriented guy,
but the Sun folk loved these layers.

Oh, and to make it work on the client side, the
CLSET_TCP_DDP needs to be set in newnfs_connect().
(In the nmp != NULL section.)

One more comment. If this socket option must/should be
set right away (before any data travels on the TCP
connection), it should be left in clnt_reconnect_coonect(),
but should be moved to before the rpctls_connect() call block.
(Right after clnt_vc_create() at line#202.)

I cannot remember why I put the rpctls_connect() in clnt_rc.c
and not in clnt_vc.c, which kinda violates the layering.
It does a Null RPC that needs to happen as soon as the TCP
connection is created.

One more "bigger picture" comment.
If setting TCP_USE_DDP is something
that the NFS client will always want to
try and do, then I would scrap the client
side changes in this patch and put the
so_setsockopt() right in clnt_vc_create().

ie. Attempting it unconditionally instead
of worrying about a CLSET_DDP, which would
be used by the NFS client to conditionally do it.
(As you might have already guessed, I know nothing
about direct data placement.)

I did miss actually setting CLSET_DDP when reworking this patch. The effect of DDP is that the NIC is able to allocate large (e.g. up to 256k) receive buffers that receive a bunch of in-order data for a TCP socket. These large buffers are enqueued as large M_EXT mbufs in the socket buffer, so e.g. soreceive() will get a single mbuf containing a big chunk of data. It is an optimization for sockets that are serviced in the kernel like NFS. It is fine to just enable it always. My original prototype for testing had just enabled the option always in clnt_rc.c. To make it cleaner for upstream I had reworked it to add CLSET_DDP modeled on CLSET_TLS since I was worried there might be some uses of clnt_vc.c that aren't in-kernel NFS sockets, but I guess any socket managed by clnt_vc.c is an in-kernel socket where data is generally consumed in the kernel and not in userspace, so it should be ok to just enable it always. DDP can be enabled anytime (doesn't matter if there is already some data received "normally" first), but TLS does need to be enabled first so that TLS offload in the NIC can have priority over this feature.

Simplify based on review feedback

Hmm. Sounds like it should only be enabled if TLS is
not enabled?
For the TLS case, the receive code in clnt_vc.c does
upcalls to userland for non-application data records
and these are handled by the OpenSSL library using
a SSL_read() call.

This patch would enable it before TLS unconditionally,
I think?

If you look at clnt_rc.c, you'll find the rpctls_connect()
call right after the clnt_vc_create() call. That is what enables
the ktls via an upcall to the daemon that does a SSL_connect().
It sounds like that is where the socket option should be set?

Hmm. Sounds like it should only be enabled if TLS is
not enabled?
For the TLS case, the receive code in clnt_vc.c does
upcalls to userland for non-application data records
and these are handled by the OpenSSL library using
a SSL_read() call.

No, it's fine to enable if TLS is enabled, but we do want to enable TLS first to give TLS offload priority over DDP.
You can still use DDP for data copied out to userspace, it's just not as useful an optimization in that case.

This patch would enable it before TLS unconditionally,
I think?

Yes.

If you look at clnt_rc.c, you'll find the rpctls_connect()
call right after the clnt_vc_create() call. That is what enables
the ktls via an upcall to the daemon that does a SSL_connect().
It sounds like that is where the socket option should be set?

Yes, for the client I can move it back to where it was before, but leave off the CLSET_DDP complexity.

However, for the server side this seems more complicated. Ideally for the server side I'd like to defer enabling DDP until after the AUTH_TLS RPC to give TLS offload a chance to happen first. I'd be fine with a "good enough" solution if it's typical for the AUTH_TLS RPC to be the first RPC, so just enable DDP once a non-AUTH RPC has been received or something like that. Would an approach like that work?

In D44002#1009718, @jhb wrote:

Hmm. Sounds like it should only be enabled if TLS is
not enabled?
For the TLS case, the receive code in clnt_vc.c does
upcalls to userland for non-application data records
and these are handled by the OpenSSL library using
a SSL_read() call.

No, it's fine to enable if TLS is enabled, but we do want to enable TLS first to give TLS offload priority over DDP.
You can still use DDP for data copied out to userspace, it's just not as useful an optimization in that case.

This patch would enable it before TLS unconditionally,
I think?

Yes.

If you look at clnt_rc.c, you'll find the rpctls_connect()
call right after the clnt_vc_create() call. That is what enables
the ktls via an upcall to the daemon that does a SSL_connect().
It sounds like that is where the socket option should be set?

Yes, for the client I can move it back to where it was before, but leave off the CLSET_DDP complexity.

Yes, that sounds fine to me.

However, for the server side this seems more complicated. Ideally for the server side I'd like to defer enabling DDP until after the AUTH_TLS RPC to give TLS offload a chance to happen first. I'd be fine with a "good enough" solution if it's typical for the AUTH_TLS RPC to be the first RPC, so just enable DDP once a non-AUTH RPC has been received or something like that. Would an approach like that work?

I believe that the only RPCs allowed before the TLS probe are Null RPCs.

If you stick something like:
if (r->rq_proc != NULLPROC && xprt->xp_socket->so_type == SOCK_STREAM &&

atomic_cmpset_int(&xprt->xp_doneddp, 0, 1) != 0)
- do it

Just before the _authenticate() call, I think that should be fine.
(I think this is less confusing than putting it in _authenticate()? but I'll leave
it up to you.)
(xp_doneddp or whatever you call it will need to be added to SVCXPRT.
xprt is always malloc'd with M_ZERO.)

sys/rpc/svc.c
998 ↗(On Diff #135975)

I tried checking xp_type to see if the socket was SOCK_STREAM instead of checking pr_protocol, but xp_type isn't initialized (just set to zero) for NFS sockets. If we think that sockets used with RPC can only be IP sockets (so only UDP or TCP), then I could probably remove the pr_protocol check as UDP sockets would just fail the TCP socket option.

This version looks fine to me. I'll let you decide whether or
not to leave the TCP check in.

sys/rpc/svc.c
998 ↗(On Diff #135975)

The only conceivable other "protocol" would be
RDMA and I have no idea how that might fit in,
if I ever do it. (I have avoided trying to do it, due
to lack of access to hardware, etc.)

So, leave it in or take it out, whichever you think makes
sense.

This revision is now accepted and ready to land.Mar 20 2024, 12:28 AM