
sockets: limit use of watermarks to SOCK_STREAM sockets
Accepted, Public

Authored by glebius on Jun 2 2022, 6:39 PM.

Details

Summary

Make our implementation return an error when setsockopt(2) is used to
set a watermark on a non-SOCK_STREAM socket, and document that. This
frees our hands in coding protocol-independent socket buffers.

The specification [1] doesn't explicitly limit use of these options
to SOCK_STREAM. However, the wording suggests that the context is
a stream socket. For example:

"Receive calls may still return less than the low water mark if an
error occurs, a signal is caught, or the type of data next in the
receive queue is different from that returned (for example,
out-of-band data)."

For a datagram/packet socket, this can't be applied, as a read on
such a socket must return a full datagram.

Until this change, the implementation honored watermarks for
datagram sockets, resulting in quite odd behavior. You would get a
notification for a socket only if the overall length of queued
datagrams was above the watermark. Once you read enough datagrams to
drop below the watermark, notifications would stop. In practice this
results in an indefinite delay in receiving data.

The modern Linux implementation ignores watermarks for datagram sockets,
but doesn't return an error from setsockopt().

It is very unlikely that any application exists that uses
watermarks for non-stream sockets.

[1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 45820
Build 42708: arc lint + arc unit

Event Timeline

Test code that would block in select(2) indefinitely on FreeBSD, and would pass through quickly on Linux, will return with EINVAL after this change.
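
For illustration only, here is a minimal sketch of what such test code might look like. It is not the test used in this review. The names probe_dgram_rcvlowat and enum lowat_result are hypothetical, the socket pair is assumed to be AF_UNIX/SOCK_DGRAM, and a finite select(2) timeout is used instead of blocking indefinitely so the probe always returns.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for the probe below. */
enum lowat_result {
	LOWAT_REJECTED,	/* setsockopt(2) failed; errno holds the reason */
	LOWAT_READABLE,	/* select(2) reported the socket readable */
	LOWAT_TIMEOUT,	/* select(2) timed out with a datagram queued */
	LOWAT_ERROR	/* some other call failed */
};

/*
 * Set SO_RCVLOWAT to `lowat` on a datagram socket, queue one datagram of
 * `dgram_len` bytes, and report what select(2) does within `timeout_sec`.
 */
enum lowat_result
probe_dgram_rcvlowat(int lowat, size_t dgram_len, int timeout_sec)
{
	char buf[256];
	struct timeval tv;
	fd_set rfds;
	enum lowat_result res = LOWAT_ERROR;
	int sv[2], n, saved_errno;

	assert(dgram_len > 0 && dgram_len <= sizeof(buf));

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
		return (LOWAT_ERROR);

	if (setsockopt(sv[0], SOL_SOCKET, SO_RCVLOWAT, &lowat,
	    sizeof(lowat)) == -1) {
		res = LOWAT_REJECTED;
		goto out;
	}

	/* Queue a single datagram for the receiving end. */
	memset(buf, 'x', dgram_len);
	if (send(sv[1], buf, dgram_len, 0) != (ssize_t)dgram_len)
		goto out;

	FD_ZERO(&rfds);
	FD_SET(sv[0], &rfds);
	tv.tv_sec = timeout_sec;
	tv.tv_usec = 0;
	n = select(sv[0] + 1, &rfds, NULL, NULL, &tv);
	if (n > 0)
		res = LOWAT_READABLE;
	else if (n == 0)
		res = LOWAT_TIMEOUT;
out:
	/* Preserve errno across close(2) so the caller can inspect it. */
	saved_errno = errno;
	close(sv[0]);
	close(sv[1]);
	errno = saved_errno;
	return (res);
}
```

Calling probe_dgram_rcvlowat(128, 64, 1) queues one 64-byte datagram below a 128-byte watermark. Going by the description above, one would expect LOWAT_TIMEOUT on FreeBSD before this change, LOWAT_READABLE on Linux, and LOWAT_REJECTED after this change.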

Just a data point for SCTP:

  • On the receive side, messages can be partially delivered on SOCK_SEQPACKET sockets. Without this, you would not be able to receive messages which are larger than the receive buffer size and the protocol would deadlock. Support for this is required.
  • On the sending side, the explicit end-of-record mode can be used: the application explicitly marks the end of the message. This allows sending messages larger than the send buffer size. If only atomic send calls are allowed, the send buffer size limits the user message size. Support for this mode is optional, but it is supported by FreeBSD.
This revision is now accepted and ready to land. Jun 2 2022, 8:43 PM

Can you imagine a practical use of watermarks for SCTP?

pauamma_gundo.com added inline comments.
lib/libc/sys/getsockopt.2
592–595
This revision now requires review to proceed. Jun 3 2022, 3:03 AM

Not for the receiving side. But for the sending side: Assume you want to send a message of size s. Then you might want to wait until you can write a message of size s without blocking. Doesn't this make sense for all sockets with atomic send calls?
This also applies to SCTP, but a perfect solution (not available yet) would be to be able to sleep until you can write a message of size s on a particular stream. This is not possible with select()/poll(). Maybe one can do this by using kqueue(). Looking into this is on my ToDo list. But this might not use the low watermark socket option, but rather some kqueue()-specific data.
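
As a rough sketch of the "wait until a message of size s can be written" idea, the send low watermark can be combined with poll(2). The helper name wait_for_send_space is hypothetical, and the sketch assumes a socket type on which SO_SNDLOWAT is accepted and honored by poll(2). Some systems reject attempts to change SO_SNDLOWAT, in which case the helper simply reports the setsockopt(2) error.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: ask the kernel to report `fd` writable only once
 * at least `s` bytes of send buffer space are available, then wait up to
 * `timeout_ms` milliseconds for that to happen.
 *
 * Returns 1 if the socket became writable, 0 if it did not within the
 * timeout, and -1 (with errno set) if setsockopt(2) or poll(2) failed.
 */
int
wait_for_send_space(int fd, int s, int timeout_ms)
{
	struct pollfd pfd;
	int n;

	assert(s > 0);

	if (setsockopt(fd, SOL_SOCKET, SO_SNDLOWAT, &s, sizeof(s)) == -1)
		return (-1);

	pfd.fd = fd;
	pfd.events = POLLOUT;
	pfd.revents = 0;
	n = poll(&pfd, 1, timeout_ms);
	if (n == -1)
		return (-1);
	return ((n == 1 && (pfd.revents & POLLOUT) != 0) ? 1 : 0);
}
```

A caller would follow a return value of 1 with a single send(2) of at most s bytes. The watermark stays set on the socket afterwards, so code that sends messages of varying size would need to update it before each wait.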

So we are left with the above use case. Doesn't that also apply to the sending side of SOCK_DGRAM? How would you efficiently send a message of size s on a SOCK_DGRAM socket?

This revision is now accepted and ready to land. Jun 3 2022, 9:44 AM

With the current implementation, the send buffer lowat is set to net.local.dgram.maxdgram when a socket is initialized. You may modify sb_lowat with setsockopt(SO_SNDLOWAT). However, since the current implementation fully bypasses the send buffer, it stays empty for the lifetime of a socket, so the check is always true and this watermark doesn't guarantee you anything. If the receive buffer is full, you'll get ENOBUFS.

In my new implementation I ignore sb_lowat, but the select(2) call compares send buffer space directly against net.local.dgram.maxdgram. For connected sockets, the send buffer is used. This makes the return from select(2) useful: it sleeps if the send buffer has less space than net.local.dgram.maxdgram, and it returns once space is available. That guarantees that send() will succeed. Again, this is true only for connected sockets. You can imagine a very synthetic use case of setting lowat above net.local.dgram.maxdgram, so that select(2) wakes up only when several datagrams can be sent. I don't think such a practice exists at all. If I were to write such a damn sophisticated application, I'd rather use kevent(2) with EVFILT_WRITE, as you suggested above.
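
For reference, a minimal sketch of the kevent(2) approach, using EVFILT_WRITE with a per-event low watermark passed via NOTE_LOWAT. The helper name kq_wait_send_space is hypothetical, and how a given socket type accounts send space for this filter is not covered by this review. The kqueue code is compiled only on FreeBSD; elsewhere the helper is a stub that fails with ENOSYS.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__FreeBSD__)
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <unistd.h>
#define	HAVE_KQUEUE_SKETCH 1
#endif

/*
 * Hypothetical helper: wait up to `timeout_sec` seconds until `fd` has at
 * least `want` bytes of send space, without touching SO_SNDLOWAT.
 *
 * Returns 1 if the event fired, 0 on timeout, -1 (with errno set) on error.
 */
int
kq_wait_send_space(int fd, size_t want, int timeout_sec)
{
#ifdef HAVE_KQUEUE_SKETCH
	struct kevent change, event;
	struct timespec ts;
	int kq, n, saved_errno;

	assert(want > 0);

	kq = kqueue();
	if (kq == -1)
		return (-1);

	/* NOTE_LOWAT makes `data` the low watermark for this event only. */
	EV_SET(&change, fd, EVFILT_WRITE, EV_ADD | EV_ONESHOT, NOTE_LOWAT,
	    (intptr_t)want, NULL);
	ts.tv_sec = timeout_sec;
	ts.tv_nsec = 0;
	n = kevent(kq, &change, 1, &event, 1, &ts);

	saved_errno = errno;
	close(kq);
	errno = saved_errno;

	if (n == -1)
		return (-1);
	if (n == 1 && (event.flags & EV_ERROR) != 0) {
		/* Registration failed; the error code is in event.data. */
		errno = (int)event.data;
		return (-1);
	}
	return (n == 1 ? 1 : 0);
#else
	(void)fd;
	(void)want;
	(void)timeout_sec;
	errno = ENOSYS;
	return (-1);
#endif
}
```

Because the threshold travels with the event rather than with the socket, different callers can wait on different sizes without changing socket-wide state. A long-lived application would normally keep one kqueue open instead of creating one per call as this sketch does.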