Page MenuHomeFreeBSD

tcp: improve ref count handling when processing SYN
ClosedPublic

Authored by tuexen on Sep 26 2024, 6:19 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Nov 2, 9:44 PM
Unknown Object (File)
Thu, Oct 31, 5:53 PM
Unknown Object (File)
Tue, Oct 29, 3:00 PM
Unknown Object (File)
Sat, Oct 26, 1:40 AM
Unknown Object (File)
Wed, Oct 23, 5:16 PM
Unknown Object (File)
Fri, Oct 18, 11:13 PM
Unknown Object (File)
Oct 16 2024, 10:37 AM
Unknown Object (File)
Oct 16 2024, 2:27 AM

Details

Summary

Don't leak a reference count for so->so_cred when processing the incoming SYN segment with an on stack syncache entry.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

This change looks ok to me, but see the inline question.

sys/netinet/tcp_syncache.c
1395

Would it be a good idea to defer the crhold() instead, until we know we're going to keep the reference? In general, I'd expect the crhold() call here to take the slow path, so avoiding extra crhold() calls might be profitable.

Only get a reference counter to so->so_cred when it is actually needed as suggested by Mark.

tuexen added inline comments.
sys/netinet/tcp_syncache.c
1395

Sure. That is also possible. I changed the patch.

This looks right to me, but I'm not an expert on this code.

Probably the MAC label allocation can also be deferred this way?

This looks right to me, but I'm not an expert on this code.

Probably the MAC label allocation can also be deferred this way?

The story is different for the MAC label, since the MAC label allocation can result in an error and we need a MAC label to send a response. Of course, you can delay the allocation of the MAC label, but then you have to undo things you allocated before. Right now the sequence is MAC label first, ipopts next, and syncache entry last... If you prefer, we can change that. But my goal was to fix the leaks...

This looks right to me, but I'm not an expert on this code.

Probably the MAC label allocation can also be deferred this way?

The story is different for the MAC label, since the MAC label allocation can result in an error and we need a MAC label to send a response. Of course, you can delay the allocation of the MAC label, but then you have to undo things you allocated before. Right now the sequence is MAC label first, ipopts next, and syncache entry last... If you prefer, we can change that. But my goal was to fix the leaks...

Ok, fair enough. It was just a question, I already clicked the magic "accept" button.

This looks right to me, but I'm not an expert on this code.

Probably the MAC label allocation can also be deferred this way?

The story is different for the MAC label, since the MAC label allocation can result in an error and we need a MAC label to send a response. Of course, you can delay the allocation of the MAC label, but then you have to undo things you allocated before. Right now the sequence is MAC label first, ipopts next, and syncache entry last... If you prefer, we can change that. But my goal was to fix the leaks...

Ok, fair enough. It was just a question, I already clicked the magic "accept" button.

Always good to have suggestions for improvements. Really appreciated! I'll consider delaying the allocation for the ipopts, since that also has no error handling. But I want to get in this fix first.

Does the summary section need to be updated? I didn't find the mentioned leaking part in code. Or am I missing something?

sys/netinet/tcp_syncache.c
1628

If my understanding is correct, a comment would be helpful to explain why we assign the so_cred on both conditions:

  • when the sc is allocated, and
  • when the see_other sysctl is off.
1764

Here cred is always NULL, so the reference count will never be de-referenced.

If my understanding is correct, the sc->sc_cred reference count will be de-referenced by syncache_free(). So no leaking? Just dead code? Which means the summary section needs to be updated?

In D46793#1067415, @cc wrote:

Does the summary section need to be updated? I didn't find the mentioned leaking part in code. Or am I missing something?

Further read makes me thinking that once an existing syncache is found (between line#1515 and #1570), the cred does not need to be assigned. So it is possible cred is not NULL at the end of this function around the donenoprobe: label.

So I have no problem with this change.

This revision is now accepted and ready to land.Sep 27 2024, 8:50 PM
tuexen added inline comments.
sys/netinet/tcp_syncache.c
1628

The sc_cred field is only used in syncache_pcblist() for building a list of TCP endpoints in the TCPS_SYN_RECEIVED state. Therefore, we only need this if the sc is going to be queued and the see_other sysctl is off.

1764

The bug is that cred was always taking a reference count (assuming the V_tcp_syncache.see_other is false), then it was assigned to sc->sc_cred. If sc was on the stack (sc == &scs), it was never decremented, since it is only decremented in syncache_free(), which is never called for on an stack allocated sc. Since sc->sc_cred is not needed in that case, just don't take a reference count in that case.
I'll provide a description in the commit message.

tuexen added inline comments.
sys/netinet/tcp_syncache.c
1628

I'll add a comment.