Currently the pru_send routines for TCP and PF_LOCAL sockets handle
PRUS_NOTREADY. This flag means that there is some external reference to
the mbuf chain and so it must not be freed (or transmitted) by the
protocol. This is contrary to the usual pru_send semantics, in which
the protocol owns the mbuf chain and is responsible for freeing it upon
a failure.
PRUS_NOTREADY is used in two ways:
1. Asynchronous sendfile: if the requested data is not resident in
memory, an asynchronous read is dispatched to the filesystem; the
read will populate the mbuf data buffers with file data. In the
meantime, sendfile calls pru_send(PRUS_NOTREADY) to enqueue the chain
in the socket buffer. Once the I/O is complete, a callback invokes
pru_ready to mark the chain as ready for transmission.
2. KTLS: the output data is supposed to be placed in the send buffer
prior to encryption. The PRUS_NOTREADY flag ensures that the
not-ready data will not be transmitted until encryption is complete,
at which point KTLS calls pru_ready to mark the chain as being owned
by the protocol.
PRUS_NOTREADY is used in two places:
1. sendfile with async I/O and/or KTLS.
2. send(2)/sendto(2)/sendmsg(2) with a KTLS-enabled TCP socket, via
sosend_generic().
There are several problems with the second usage:
- In many cases tcp_usr_send() did not respect PRUS_NOTREADY in error
paths, and would free the mbuf. However, KTLS assumes that pru_send
never frees the mbuf. It is not really a problem with sendfile, as
sendfile controls the parameters to pru_send such that problematic
error paths are not hit. With send(2) et al., though, the caller can
easily trigger errors that result in a use-after-free.
- If pru_send returns an error, the mbuf may or may not have been placed
in the send buffer, and the connection may still be live. Once
encryption is complete, KTLS calls pru_ready assuming that either the
connection has been dropped or that the mbuf is in the send buffer,
and so this assumption is incorrect.
In an attempt to fix these problems, this diff makes the following
changes:
- Centralize the bit of tcp_usr_send() which frees the input mbuf
instead of duplicating it inconsistently for each error case.
- In tcp_usr_send(), if we are doing an implied open, remove the mbuf
from the socket buffer if we fail to create the connection.
- In sosend_generic(), free the mbuf if KTLS is enabled and pru_send
fails. At this point it is safe to do so, and we don't need to begin
encryption in this case.
- Similarly, in vn_sendfile(), if no I/O is required and KTLS is
enabled, free the mbuf upon a failure. There is no point in
encrypting.
- In vn_sendfile(), if pru_send fails after some async I/O is initiated,
pass the error to sendfile_iodone() so that the connection is dropped
once I/O is complete.