Page MenuHomeFreeBSD

libalias: Restructure searching
AbandonedPublic

Authored by donner on May 15 2021, 11:46 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Jan 16, 6:44 PM
Unknown Object (File)
Wed, Jan 15, 8:02 AM
Unknown Object (File)
Tue, Jan 14, 2:20 AM
Unknown Object (File)
Tue, Jan 14, 1:57 AM
Unknown Object (File)
Mon, Jan 13, 2:23 PM
Unknown Object (File)
Sun, Jan 12, 12:44 AM
Unknown Object (File)
Sat, Jan 11, 5:44 PM
Unknown Object (File)
Sat, Jan 11, 2:44 AM

Details

Reviewers
lev
kp
Group Reviewers
network
Summary

In order to switch to a faster data structure for searching, the
current algorithm needs to be understood.

In this step the mix of full qualified entries (real data flow) and
general ones (i.e. port forwardings) is separated. This allows to
simplify the search procedures and factor out the common, relevant
search fields.

Test Plan

As long as there is no mature test suite for the library, only a test
in production could show failures. Tests must contain a huge workload
of active connections and various, unusual protocols.

No commit before test suite is complete.

1_instance:1_singleinit  ->  passed  [0.003s]
1_instance:2_destroynull  ->  expected_failure: Code expects valid pointer.  [0.013s]
1_instance:3_multiinit  ->  passed  [0.005s]
1_instance:4_multiinstance  ->  passed  [0.037s]
2_natout:1_simplemasq  ->  passed  [0.003s]
2_natout:2_unregistered  ->  passed  [0.003s]
2_natout:3_cgn  ->  passed  [0.004s]
2_natout:4_udp  ->  passed  [0.003s]
2_natout:5_sameport  ->  passed  [0.003s]
2_natout:6_cleartable  ->  passed  [0.003s]
2_natout:7_stress  ->  passed  [0.048s]
3_natin:1_portforward  ->  passed  [0.003s]
3_natin:2_portoverlap  ->  passed  [0.003s]
3_natin:3_redirectany  ->  passed  [0.003s]
3_natin:4_redirectaddr  ->  passed  [0.003s]
3_natin:5_lsnat  ->  passed  [0.003s]
3_natin:6_oneshot  ->  passed  [0.003s]

Performance before this patch:

Running perfomance test with parameters:
  Maximum Runtime (max_seconds) = 10
  Amount of valid connections (batch_size) = 2000
  Amount of random, incoming packets (batch_size) = 1000
  Repeat count of a random, incoming packet (attack_size) = 1000
  Amount of open port forwardings (redir_size) = 2000

RND SECOND newNAT RANDOM ATTACK useNAT
  1    0.0   0.59   0.18   0.06   0.10
  2    0.0   0.66   0.16   0.06   0.11
  3    0.0   0.77   0.20   0.06   0.11
  4    0.1   0.85   0.25   0.06   0.10
  5    0.1   0.91   0.23   0.06   0.12
...
135    9.4  54.76  16.33   1.24   0.10
136    9.6  54.52  16.50   0.97   0.10
137    9.7  56.18  16.97   1.11   0.10
138    9.8  58.04  16.60   1.23   0.11
139   10.0

Results
   Rounds  :       138
newNAT ok  :    276410
newNAT fail:         0
useNAT ok  :   2282413 (out)
useNAT fail:         0 (out)
useNAT ok  :  18462687 (in)
useNAT fail:         0 (in)
RANDOM ok  :      1806
RANDOM fail:    136194
ATTACK ok  :         0
ATTACK fail:    138000
             ---------
      Total:  21297510

And now:

Running perfomance test with parameters:
  Maximum Runtime (max_seconds) = 10
  Amount of valid connections (batch_size) = 2000
  Amount of random, incoming packets (batch_size) = 1000
  Repeat count of a random, incoming packet (attack_size) = 1000
  Amount of open port forwardings (redir_size) = 2000

RND SECOND newNAT RANDOM ATTACK useNAT
  1    0.0   0.59   0.14   0.06   0.10
  2    0.0   0.71   0.14   0.05   0.10
  3    0.0   0.78   0.17   0.06   0.11
  4    0.1   0.90   0.18   0.06   0.11
  5    0.1   0.95   0.18   0.06   0.10
...
145    9.6  52.49   1.80   0.11   0.17
146    9.7  51.62   1.91   0.13   0.19
147    9.8  48.77   1.87   0.13   0.17
148   10.0

Results
   Rounds  :       147
newNAT ok  :    295178
newNAT fail:         0
useNAT ok  :   2431666 (out)
useNAT fail:         0 (out)
useNAT ok  :  19669446 (in)
useNAT fail:         0 (in)
RANDOM ok  :      1940
RANDOM fail:    145060
ATTACK ok  :         0
ATTACK fail:    147000
             ---------
      Total:  22690290

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 39535
Build 36424: arc lint + arc unit

Event Timeline

  • NO_ADDR for fragments
  • Factor out common search pattens in in and out direction.
  • Reclassify SCTP links when specifying missing parts.
donner added a subscriber: lev.

Added @lev primary for testing of functionality.
There should be some performance improvements due to:

  • some early breaks in the loops with many active flows
  • separation of "complete" flows from configured "forwardings"
  • use of NO_ADDR when possible

I did run the tests from D30335 against this patch:

1_instance:1_singleinit  ->  passed  [0.004s]
1_instance:2_destroynull  ->  expected_failure: Code expects valid pointer.  [0.014s]
1_instance:3_multiinit  ->  passed  [0.004s]
1_instance:4_multiinstance  ->  passed  [0.036s]
2_natout:1_simplemasq  ->  passed  [0.003s]
2_natout:2_unregistered  ->  passed  [0.003s]
2_natout:3_cgn  ->  passed  [0.003s]
2_natout:4_udp  ->  passed  [0.003s]
2_natout:5_sameport  ->  passed  [0.003s]
2_natout:6_cleartable  ->  passed  [0.003s]
2_natout:7_stress  ->  passed  [8.880s]

Compared to the original version, a performance boost in the stress test is observed: Down from 11 seconds to 9 seconds. This is mainly due to early break of lookup loops.

Rebase

Tests show several broken issues:

2_natout:2_unregistered  ->  failed: 2 checks failed
*** Check failed: 2_natout.c:88: addr_eq(cgn, pip->ip_src) not met
*** Check failed: 2_natout.c:89: addr_eq(pub, pip->ip_src) not met
2_natout:3_cgn  ->  failed: 1 checks failed
*** Check failed: 2_natout.c:122: addr_eq(pub, pip->ip_src) not met
3_natin:4_redirectaddr  ->  failed: 3 checks failed
*** Check failed: 3_natin.c:277: addr_eq(pub, p->ip_src) not met
*** Check failed: 3_natin.c:278: addr_eq(pub, p->ip_src) not met
*** Check failed: 3_natin.c:293: addr_eq(pub, p->ip_src) not met

And the new code is terribly slow, simply becaus the 4001er hash table
is gone:

RND SECOND newNAT RANDOM ATTACK useNAT
  1    0.0  33.25  21.74  21.61   3.90
  2    0.7  34.59  22.30  21.68   3.96
  3    1.4  37.09  23.20  21.84   4.11
  4    2.1  38.59  24.23  22.12   4.25
  5    2.9  39.57  25.06  22.08   4.41
  6    3.7  42.10  26.54  22.59   4.80
  7    4.5  44.15  27.81  22.33   4.78
  8    5.4  47.03  28.96  22.68   4.93
  9    6.3  48.78  30.56  23.03   5.15
 10    7.2  52.07  31.30  22.99   5.28
 11    8.1  54.44  32.57  23.00   5.46
 12    9.1  56.62  33.90  23.26

Results
   Rounds  :        11
newNAT ok  :     24000
newNAT fail:         0
useNAT ok  :         0 (out)
useNAT fail:    194573 (out)
useNAT ok  :   1576057 (in)
useNAT fail:         0 (in)
RANDOM ok  :       157
RANDOM fail:     11843
ATTACK ok  :         0
ATTACK fail:     12000
             ---------
      Total:   1818630

I'm writing this comment to this change almost arbitrary, as it is meta-comments on your (great!) work.

(1) I think, that performance baseline with 4001-sized (default) hashtable is almost meaningless, as this size is laughable even for SOHO router (in my experience), Replacing 4001 with 65537 gives almost +50% on low-end hardware without any other changes.

(2) According to my profiling with dtrace and flamegraphs, all performance is lost in contention over single mutex, not in data structures processing, like search or update, as soon as mutex is locked.

In D30283#683929, @lev wrote:

I'm writing this comment to this change almost arbitrary, as it is meta-comments on your (great!) work.

Thanks.

(1) I think, that performance baseline with 4001-sized (default) hashtable is almost meaningless, as this size is laughable even for SOHO router (in my experience), Replacing 4001 with 65537 gives almost +50% on low-end hardware without any other changes.

They will be replaced anyway.

(2) According to my profiling with dtrace and flamegraphs, all performance is lost in contention over single mutex, not in data structures processing, like search or update, as soon as mutex is locked.

That's right, so I have 64 or 128 libalias instances in parallel. Other data structures will probably allow shared access.

Rework the whole patch to get rid of failures of the test suite and to
only improve the performance.

  • Rename NO_ADDR to ANY_ADDR
  • Factor out the common Out filter
  • Factor out the common In filter
  • Search fully specified links first.
  • Separate the partially specified links into a separate data structure.
  • Use a (smaller) hash table to speed up the partially link access.
  • Clean program flow in _FindLinkIn
  • Factor out the outgoing search function.
  • Fix small error in termination.
  • Factor out Cleanup and Use for links.
  • Cleanup the partial find procedure
  • Rework the incoming search over full know links.
  • Just another early exit of the search loop
  • Reorder incoming links by grouping of common search terms.

Major performance improvement for unmatched incoming flows (RANDOM or
ATTACK).

@lev may you consider testing this patch in your environment? It should speed up you use case.

  • Remove LSNAT from outgoing search.

Slight speedup due to less comparsions in the loop.

  • Touch links for PPTP on use, too.

@lev may you consider testing this patch in your environment? It should speed up you use case.

I'll test this at this weekend or later next week, payed job takes too much time now, sorry.

BTW, I'm testing with BSD Router Project methodology:

"beefy computer N1 with netmap+pkt-gen" - 1G Link -> DUT - 1G Link -> "beefy computer N2 with traffic mirror" - 10G Link -> "beefy computer N1 with pkt-gen as receiver", and script search for highest packet rate which has no drops. It allows to get exact maximum PPS, but can not benchmark situation when part of traffic is dropped by firewall/nat (because it is impossible to distinguish reasons of packet drop with pkt-gen and shell scripts) and it can use only UDP.

So, my testing scenario will not notice some changes, as a lot of failed lookups ("attack") or TCP-specific processing.

  • Let PPTP use their own data structure.
  • Fix "freeing wrong structure" bug
  • Factor out pptpLists
  • Factor out the search for a new port. Improves the perfomance a bit.
  • Get rid of PORT_BASE. Replace by AliasRange.
  • Rename pptpTable to pptpList

This is too complex.
I'll split it into smaller parts.