D32693.diff
diff --git a/UPDATING b/UPDATING
--- a/UPDATING
+++ b/UPDATING
@@ -27,6 +27,19 @@
world, or to merely disable the most expensive debugging functionality
at runtime, run "ln -s 'abort:false,junk:false' /etc/malloc.conf".)
+20211110:
+ Commit xxxxxx changed the TCP congestion control framework so
+ that any of the included congestion control modules can be
+ the single module built into the kernel. Previously newreno
+ was automatically built in through a direct reference. As of
+ this commit you are required to declare at least one congestion
+ control module (e.g. 'options CC_NEWRENO') and to also declare a
+ default using the CC_DEFAULT option (e.g. 'options CC_DEFAULT=\"newreno\"').
+ The GENERIC configuration includes CC_NEWRENO and defines newreno
+ as the default. If no congestion control option is built into the
+ kernel and networking is included, the kernel compile will fail;
+ likewise, if no default is declared the kernel compile will fail.
+
20211106:
Commit f0c9847a6c47 changed the arguments for VOP_ALLOCATE.
The NFS modules must be rebuilt from sources and any out
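The requirement described in the UPDATING note boils down to two config(5) lines. This fragment mirrors what the patch adds to GENERIC; any other CC module name added by this patch could be substituted for newreno:

```
options 	CC_NEWRENO		# at least one CC algorithm must be built in
options 	CC_DEFAULT=\"newreno\"	# and one of them must be named the default
```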
diff --git a/share/man/man4/cc_newreno.4 b/share/man/man4/cc_newreno.4
--- a/share/man/man4/cc_newreno.4
+++ b/share/man/man4/cc_newreno.4
@@ -75,7 +75,33 @@
.Va net.inet.tcp.cc.abe=1
per: cwnd = (cwnd * CC_NEWRENO_BETA_ECN) / 100.
Default is 80.
+.It Va CC_NEWRENO_ENABLE_HYSTART
+Enables or disables the application of Hystart++.
+The current implementation allows the values 0, 1, 2 and 3.
+A value of 0 (the default) disables the use of Hystart++.
+Setting the value to 1 enables Hystart++.
+Setting the value to 2 enables Hystart++, but on exit from Hystart++'s CSS
+the cwnd is also set to the value at which the increase in RTT first began,
+and ssthresh is set to the flight at send at the time CSS is exited.
+Setting a value of 3 keeps the cwnd setting of 2, but sets ssthresh
+to the average of the fas value at the lowest RTT (the value cwnd is
+set to) and the fas value at exit of CSS.
+.Pp
+Note that currently the only way to enable
+Hystart++ is via this socket option.
+A value of 1 enables precise internet-draft behavior
+(subject to any MIB variable settings); the other settings (2 and 3) are experimental.
.El
+.Pp
+Note that Hystart++ requires that the TCP stack be able to call into the
+congestion controller with both the
+.Va newround
+function and the
+.Va rttsample
+function.
+Currently the only TCP stack that provides this feedback to the
+congestion controller is rack.
+.Pp
.Sh MIB Variables
The algorithm exposes these variables in the
.Va net.inet.tcp.cc.newreno
@@ -94,6 +120,32 @@
.Va net.inet.tcp.cc.abe=1
per: cwnd = (cwnd * beta_ecn) / 100.
Default is 80.
+.It Va hystartplusplus.bblogs
+This boolean controls whether black box logging is done for Hystart++ events.
+If set to zero (the default) no logging is performed.
+If set to one then black box logs will be generated on all Hystart++ events.
+.It Va hystartplusplus.css_rounds
+This value controls the number of rounds that CSS runs for.
+The default value of 5 matches the current internet-draft.
+.It Va hystartplusplus.css_growth_div
+This value controls the divisor applied to the slow start increase during CSS.
+The default value of 4 matches the current internet-draft.
+.It Va hystartplusplus.n_rttsamples
+This value controls how many RTT samples must be collected in each round for
+Hystart++ to be active.
+The default value of 8 matches the current internet-draft.
+.It Va hystartplusplus.maxrtt_thresh
+This value controls the maximum RTT variance clamp when considering if CSS is needed.
+The default value of 16000 (in microseconds) matches the current internet-draft.
+For further explanation please see the internet-draft.
+.It Va hystartplusplus.minrtt_thresh
+This value controls the minimum RTT variance clamp when considering if CSS is needed.
+The default value of 4000 (in microseconds) matches the current internet-draft.
+For further explanation please see the internet-draft.
+.It Va hystartplusplus.lowcwnd
+This value controls the lowest congestion window that the TCP
+stack must reach before Hystart++ engages.
+The default value of 16 matches the current internet-draft.
.El
.Sh SEE ALSO
.Xr cc_cdg 4 ,
diff --git a/share/man/man4/mod_cc.4 b/share/man/man4/mod_cc.4
--- a/share/man/man4/mod_cc.4
+++ b/share/man/man4/mod_cc.4
@@ -67,6 +67,16 @@
for details).
Callers must pass a pointer to an algorithm specific data, and specify
its size.
+.Pp
+Unloading a congestion control module will fail if it is used as the
+default by any vnet.
+When unloading a module, the vnet default is
+used to switch each connection to an alternate congestion control.
+Note that the new congestion control module may fail to initialize its
+internal memory; if so, the module unload fails.
+If this occurs, retrying the unload will often succeed, since the
+memory shortage that prevented the switch (while the new CC module
+allocates memory) is usually transient.
.Sh MIB Variables
The framework exposes the following variables in the
.Va net.inet.tcp.cc
@@ -93,6 +103,44 @@
If non-zero, apply standard beta instead of ABE-beta during ECN-signalled
congestion recovery episodes if loss also needs to be repaired.
.El
+.Pp
+Each congestion control module may also expose other MIB variables
+to control their behaviour.
+.Sh Kernel Configuration
+All of the available congestion control modules may also be built
+into the kernel via kernel configuration options.
+A kernel configuration is required to have at least one congestion
+control algorithm built into it via a kernel option, and to specify
+a system default.
+Compilation of the kernel will fail if these two conditions are not met.
+.Sh Kernel Configuration Options
+The framework exposes the following kernel configuration options.
+.Bl -tag -width ".Va CC_NEWRENO"
+.It Va CC_NEWRENO
+This directive loads the newreno congestion control algorithm and is included
+in GENERIC by default.
+.It Va CC_CUBIC
+This directive loads the cubic congestion control algorithm.
+.It Va CC_VEGAS
+This directive loads the vegas congestion control algorithm; note that
+this algorithm also requires the TCP_HHOOK option.
+.It Va CC_CDG
+This directive loads the cdg congestion control algorithm; note that
+this algorithm also requires the TCP_HHOOK option.
+.It Va CC_DCTCP
+This directive loads the dctcp congestion control algorithm.
+.It Va CC_HD
+This directive loads the hd congestion control algorithm; note that
+this algorithm also requires the TCP_HHOOK option.
+.It Va CC_CHD
+This directive loads the chd congestion control algorithm; note that
+this algorithm also requires the TCP_HHOOK option.
+.It Va CC_HTCP
+This directive loads the htcp congestion control algorithm.
+.It Va CC_DEFAULT
+This directive specifies, as a string, the name of the system default
+algorithm; the GENERIC kernel defaults this to newreno.
+.El
.Sh SEE ALSO
.Xr cc_cdg 4 ,
.Xr cc_chd 4 ,
@@ -103,6 +151,8 @@
.Xr cc_newreno 4 ,
.Xr cc_vegas 4 ,
.Xr tcp 4 ,
+.Xr config 5 ,
+.Xr config 8 ,
.Xr mod_cc 9
.Sh ACKNOWLEDGEMENTS
Development and testing of this software were made possible in part by grants
diff --git a/share/man/man9/mod_cc.9 b/share/man/man9/mod_cc.9
--- a/share/man/man9/mod_cc.9
+++ b/share/man/man9/mod_cc.9
@@ -68,7 +68,8 @@
char name[TCP_CA_NAME_MAX];
int (*mod_init) (void);
int (*mod_destroy) (void);
- int (*cb_init) (struct cc_var *ccv);
+ size_t (*cc_data_sz)(void);
+ int (*cb_init) (struct cc_var *ccv, void *ptr);
void (*cb_destroy) (struct cc_var *ccv);
void (*conn_init) (struct cc_var *ccv);
void (*ack_received) (struct cc_var *ccv, uint16_t type);
@@ -76,6 +77,8 @@
void (*post_recovery) (struct cc_var *ccv);
void (*after_idle) (struct cc_var *ccv);
int (*ctl_output)(struct cc_var *, struct sockopt *, void *);
+ void (*rttsample)(struct cc_var *, uint32_t, uint32_t, uint32_t);
+ void (*newround)(struct cc_var *, uint32_t);
};
.Ed
.Pp
@@ -104,6 +107,17 @@
The return value is currently ignored.
.Pp
The
+.Va cc_data_sz
+function is called by the socket option code to get the size of
+data that the
+.Va cb_init
+function needs.
+The socket option code then preallocates the module's memory so that the
+.Va cb_init
+function will not fail (the socket option code uses M_WAITOK with
+no locks held to do this).
+.Pp
+The
.Va cb_init
function is called when a TCP control block
.Vt struct tcpcb
@@ -114,6 +128,9 @@
.Va cb_init
will cause the connection set up to be aborted, terminating the connection as a
result.
+Note that the ptr argument passed to the function should be checked:
+if it is non-NULL, it is preallocated memory that the cb_init function
+must use instead of calling malloc itself.
.Pp
The
.Va cb_destroy
@@ -182,6 +199,30 @@
pointer to algorithm specific argument.
.Pp
The
+.Va rttsample
+function is called to pass round trip time information to the
+congestion controller.
+The additional arguments to the function include the microsecond RTT
+that is being noted, the number of times that the data being
+acknowledged was retransmitted as well as the flightsize at send.
+For transports that do not track flightsize at send, this variable
+will be the current cwnd at the time of the call.
+.Pp
+The
+.Va newround
+function is called each time a new round trip begins.
+The monotonically increasing round number is also passed to the
+congestion controller.
+This can be used for various purposes by the congestion controller (e.g. Hystart++).
+.Pp
+Note that currently not all TCP stacks call the
+.Va rttsample
+and
+.Va newround
+functions, so dependency on these functions is also
+dependent upon which TCP stack is in use.
+.Pp
+The
.Fn DECLARE_CC_MODULE
macro provides a convenient wrapper around the
.Xr DECLARE_MODULE 9
@@ -203,8 +244,23 @@
.Vt struct cc_algo ,
but are only required to set the name field, and optionally any of the function
pointers.
+Note that if a module defines the
+.Va cb_init
+function it must also define a
+.Va cc_data_sz
+function.
+This is because when switching from one congestion control
+module to another, the socket option code will preallocate memory for the
+.Va cb_init
+function.
+If no memory is allocated by the module's
+.Va cb_init
+then the
+.Va cc_data_sz
+function should return 0.
+.Pp
The stack will skip calling any function pointer which is NULL, so there is no
-requirement to implement any of the function pointers.
+requirement to implement any of the function pointers (with the exception of
+the cb_init/cc_data_sz dependency noted above).
Using the C99 designated initialiser feature to set fields is encouraged.
.Pp
Each function pointer which deals with congestion control state is passed a
@@ -222,6 +278,8 @@
struct tcpcb *tcp;
struct sctp_nets *sctp;
} ccvc;
+ uint16_t nsegs;
+ uint8_t labc;
};
.Ed
.Pp
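The cb_init/cc_data_sz contract described above can be sketched in userspace C (the stand-in struct cc_var and the example_* names are ours, not the kernel API): a module uses the caller's preallocated buffer when one is supplied and allocates only when passed NULL.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the kernel's struct cc_var, for illustration only. */
struct cc_var {
	void *cc_data;
};

/* Hypothetical per-connection state for an example module. */
struct example_cc_data {
	unsigned rounds;
};

/* cc_data_sz(): report how much memory cb_init needs preallocated. */
static size_t
example_cc_data_sz(void)
{
	return (sizeof(struct example_cc_data));
}

/*
 * cb_init(): honor preallocated memory when ptr is non-NULL (the
 * socket-option path, where allocation cannot fail); otherwise
 * allocate the state ourselves and report ENOMEM on failure.
 */
static int
example_cb_init(struct cc_var *ccv, void *ptr)
{
	struct example_cc_data *data;

	if (ptr == NULL) {
		data = malloc(sizeof(struct example_cc_data));
		if (data == NULL)
			return (ENOMEM);
	} else
		data = ptr;
	memset(data, 0, sizeof(*data));
	ccv->cc_data = data;
	return (0);
}
```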
@@ -305,6 +363,19 @@
by the value of the congestion window.
Algorithms should use the absence of this flag being set to avoid accumulating
a large difference between the congestion window and send window.
+.Pp
+The
+.Va nsegs
+variable is used to pass in how much compression was done by the local
+LRO system.
+For example, if LRO pushed three in-order acknowledgements into
+one acknowledgement, the variable would be set to three.
+.Pp
+The
+.Va labc
+variable is used in conjunction with the CCF_USE_LOCAL_ABC flag
+to override the labc value that the congestion controller will use
+for this particular acknowledgement.
.Sh SEE ALSO
.Xr cc_cdg 4 ,
.Xr cc_chd 4 ,
diff --git a/sys/amd64/conf/GENERIC b/sys/amd64/conf/GENERIC
--- a/sys/amd64/conf/GENERIC
+++ b/sys/amd64/conf/GENERIC
@@ -30,6 +30,8 @@
options VIMAGE # Subsystem virtualization, e.g. VNET
options INET # InterNETworking
options INET6 # IPv6 communications protocols
+options CC_NEWRENO # include newreno congestion control
+options CC_DEFAULT=\"newreno\" # define the default CC module; it must be compiled in.
options IPSEC_SUPPORT # Allow kldload of ipsec and tcpmd5
options ROUTE_MPATH # Multipath routing support
options FIB_ALGO # Modular fib lookups
diff --git a/sys/arm/conf/std.armv6 b/sys/arm/conf/std.armv6
--- a/sys/arm/conf/std.armv6
+++ b/sys/arm/conf/std.armv6
@@ -8,6 +8,8 @@
options VIMAGE # Subsystem virtualization, e.g. VNET
options INET # InterNETworking
options INET6 # IPv6 communications protocols
+options CC_NEWRENO # include newreno congestion control
+options CC_DEFAULT=\"newreno\" # define the default CC module; it must be compiled in.
options TCP_HHOOK # hhook(9) framework for TCP
device crypto # core crypto support
options IPSEC_SUPPORT # Allow kldload of ipsec and tcpmd5
diff --git a/sys/arm/conf/std.armv7 b/sys/arm/conf/std.armv7
--- a/sys/arm/conf/std.armv7
+++ b/sys/arm/conf/std.armv7
@@ -8,6 +8,8 @@
options VIMAGE # Subsystem virtualization, e.g. VNET
options INET # InterNETworking
options INET6 # IPv6 communications protocols
+options CC_NEWRENO # include newreno congestion control
+options CC_DEFAULT=\"newreno\" # define the default CC module; it must be compiled in.
options TCP_HHOOK # hhook(9) framework for TCP
device crypto # core crypto support
options IPSEC_SUPPORT # Allow kldload of ipsec and tcpmd5
diff --git a/sys/arm64/conf/std.arm64 b/sys/arm64/conf/std.arm64
--- a/sys/arm64/conf/std.arm64
+++ b/sys/arm64/conf/std.arm64
@@ -11,6 +11,8 @@
options VIMAGE # Subsystem virtualization, e.g. VNET
options INET # InterNETworking
options INET6 # IPv6 communications protocols
+options CC_NEWRENO # include newreno congestion control
+options CC_DEFAULT=\"newreno\" # define the default CC module; it must be compiled in.
options IPSEC_SUPPORT # Allow kldload of ipsec and tcpmd5
options ROUTE_MPATH # Multipath routing support
options FIB_ALGO # Modular fib lookups
diff --git a/sys/conf/NOTES b/sys/conf/NOTES
--- a/sys/conf/NOTES
+++ b/sys/conf/NOTES
@@ -646,7 +646,26 @@
#
options INET #Internet communications protocols
options INET6 #IPv6 communications protocols
-
+#
+# Note that if you include the INET and/or INET6 options
+# you *must* define at least one of the congestion control
+# options or the compile will fail.  GENERIC defines
+# options CC_NEWRENO.  You will also need to specify
+# a default, or the compile of your kernel will fail
+# as well.  The string in the default is the name of the
+# CC module as it would appear in the sysctl for
+# setting the default.  GENERIC defines newreno
+# as the default, as shown below.
+#
+options CC_CDG
+options CC_CHD
+options CC_CUBIC
+options CC_DCTCP
+options CC_HD
+options CC_HTCP
+options CC_NEWRENO
+options CC_VEGAS
+options CC_DEFAULT=\"newreno\"
options RATELIMIT # TX rate limiting support
options ROUTETABLES=2 # allocated fibs up to 65536. default is 1.
diff --git a/sys/conf/files b/sys/conf/files
--- a/sys/conf/files
+++ b/sys/conf/files
@@ -4351,8 +4351,20 @@
netinet/ip_output.c optional inet
netinet/ip_reass.c optional inet
netinet/raw_ip.c optional inet | inet6
-netinet/cc/cc.c optional inet | inet6
-netinet/cc/cc_newreno.c optional inet | inet6
+netinet/cc/cc.c optional cc_newreno inet | cc_vegas inet | \
+ cc_htcp inet | cc_hd inet | cc_dctcp inet | cc_cubic inet | \
+ cc_chd inet | cc_cdg inet | cc_newreno inet6 | cc_vegas inet6 | \
+ cc_htcp inet6 | cc_hd inet6 | cc_dctcp inet6 | cc_cubic inet6 | \
+ cc_chd inet6 | cc_cdg inet6
+netinet/cc/cc_cdg.c optional inet cc_cdg tcp_hhook
+netinet/cc/cc_chd.c optional inet cc_chd tcp_hhook
+netinet/cc/cc_cubic.c optional inet cc_cubic | inet6 cc_cubic
+netinet/cc/cc_dctcp.c optional inet cc_dctcp | inet6 cc_dctcp
+netinet/cc/cc_hd.c optional inet cc_hd tcp_hhook
+netinet/cc/cc_htcp.c optional inet cc_htcp | inet6 cc_htcp
+netinet/cc/cc_newreno.c optional inet cc_newreno | inet6 cc_newreno
+netinet/cc/cc_vegas.c optional inet cc_vegas tcp_hhook
+netinet/khelp/h_ertt.c optional inet tcp_hhook
netinet/sctp_asconf.c optional inet sctp | inet6 sctp
netinet/sctp_auth.c optional inet sctp | inet6 sctp
netinet/sctp_bsd_addr.c optional inet sctp | inet6 sctp
diff --git a/sys/conf/options b/sys/conf/options
--- a/sys/conf/options
+++ b/sys/conf/options
@@ -81,6 +81,15 @@
CALLOUT_PROFILING
CAPABILITIES opt_capsicum.h
CAPABILITY_MODE opt_capsicum.h
+CC_CDG opt_global.h
+CC_CHD opt_global.h
+CC_CUBIC opt_global.h
+CC_DEFAULT opt_cc.h
+CC_DCTCP opt_global.h
+CC_HD opt_global.h
+CC_HTCP opt_global.h
+CC_NEWRENO opt_global.h
+CC_VEGAS opt_global.h
COMPAT_43 opt_global.h
COMPAT_43TTY opt_global.h
COMPAT_FREEBSD4 opt_global.h
diff --git a/sys/i386/conf/GENERIC b/sys/i386/conf/GENERIC
--- a/sys/i386/conf/GENERIC
+++ b/sys/i386/conf/GENERIC
@@ -31,6 +31,8 @@
options VIMAGE # Subsystem virtualization, e.g. VNET
options INET # InterNETworking
options INET6 # IPv6 communications protocols
+options CC_NEWRENO # include newreno congestion control
+options CC_DEFAULT=\"newreno\" # define the default CC module; it must be compiled in.
options IPSEC_SUPPORT # Allow kldload of ipsec and tcpmd5
options ROUTE_MPATH # Multipath routing support
options TCP_HHOOK # hhook(9) framework for TCP
diff --git a/sys/modules/cc/Makefile b/sys/modules/cc/Makefile
--- a/sys/modules/cc/Makefile
+++ b/sys/modules/cc/Makefile
@@ -1,6 +1,7 @@
# $FreeBSD$
-SUBDIR= cc_cubic \
+SUBDIR= cc_newreno \
+ cc_cubic \
cc_dctcp \
cc_htcp
diff --git a/sys/modules/cc/cc_newreno/Makefile b/sys/modules/cc/cc_newreno/Makefile
new file mode 100644
--- /dev/null
+++ b/sys/modules/cc/cc_newreno/Makefile
@@ -0,0 +1,7 @@
+# $FreeBSD$
+
+.PATH: ${SRCTOP}/sys/netinet/cc
+KMOD= cc_newreno
+SRCS= cc_newreno.c
+
+.include <bsd.kmod.mk>
diff --git a/sys/netinet/cc/cc.h b/sys/netinet/cc/cc.h
--- a/sys/netinet/cc/cc.h
+++ b/sys/netinet/cc/cc.h
@@ -53,10 +53,11 @@
#ifdef _KERNEL
+MALLOC_DECLARE(M_CC_MEM);
+
/* Global CC vars. */
extern STAILQ_HEAD(cc_head, cc_algo) cc_list;
extern const int tcprexmtthresh;
-extern struct cc_algo newreno_cc_algo;
/* Per-netstack bits. */
VNET_DECLARE(struct cc_algo *, default_cc_ptr);
@@ -139,8 +140,19 @@
/* Cleanup global module state on kldunload. */
int (*mod_destroy)(void);
- /* Init CC state for a new control block. */
- int (*cb_init)(struct cc_var *ccv);
+ /* Return the size of the void pointer the CC needs for state */
+ size_t (*cc_data_sz)(void);
+
+ /*
+ * Init CC state for a new control block.  The CC
+ * module may be passed a NULL ptr indicating that
+ * it must allocate the memory.  If it is passed a
+ * non-NULL pointer, that is memory pre-allocated by
+ * the caller, and cb_init is expected to use it.
+ * cb_init is not expected to fail when memory is
+ * passed in, and no currently defined module does.
+ */
+ int (*cb_init)(struct cc_var *ccv, void *ptr);
/* Cleanup CC state for a terminating control block. */
void (*cb_destroy)(struct cc_var *ccv);
@@ -176,8 +188,11 @@
int (*ctl_output)(struct cc_var *, struct sockopt *, void *);
STAILQ_ENTRY (cc_algo) entries;
+ uint8_t flags;
};
+#define CC_MODULE_BEING_REMOVED 0x01 /* The module is being removed */
+
/* Macro to obtain the CC algo's struct ptr. */
#define CC_ALGO(tp) ((tp)->cc_algo)
@@ -185,7 +200,7 @@
#define CC_DATA(tp) ((tp)->ccv->cc_data)
/* Macro to obtain the system default CC algo's struct ptr. */
-#define CC_DEFAULT() V_default_cc_ptr
+#define CC_DEFAULT_ALGO() V_default_cc_ptr
extern struct rwlock cc_list_lock;
#define CC_LIST_LOCK_INIT() rw_init(&cc_list_lock, "cc_list")
@@ -198,5 +213,16 @@
#define CC_ALGOOPT_LIMIT 2048
+/*
+ * These routines give NewReno behavior to the caller.
+ * They require no state and can be used by any other CC
+ * module that wishes to use NewReno-type behaviour (along
+ * with anything else it may add on, pre or post call).
+ */
+void newreno_cc_post_recovery(struct cc_var *);
+void newreno_cc_after_idle(struct cc_var *);
+void newreno_cc_cong_signal(struct cc_var *, uint32_t);
+void newreno_cc_ack_received(struct cc_var *, uint16_t);
+
#endif /* _KERNEL */
#endif /* _NETINET_CC_CC_H_ */
diff --git a/sys/netinet/cc/cc.c b/sys/netinet/cc/cc.c
--- a/sys/netinet/cc/cc.c
+++ b/sys/netinet/cc/cc.c
@@ -50,7 +50,7 @@
#include <sys/cdefs.h>
__FBSDID("$FreeBSD$");
-
+#include <opt_cc.h>
#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/libkern.h>
@@ -70,11 +70,15 @@
#include <netinet/in.h>
#include <netinet/in_pcb.h>
#include <netinet/tcp.h>
+#include <netinet/tcp_seq.h>
#include <netinet/tcp_var.h>
+#include <netinet/tcp_log_buf.h>
+#include <netinet/tcp_hpts.h>
#include <netinet/cc/cc.h>
-
#include <netinet/cc/cc_module.h>
+MALLOC_DEFINE(M_CC_MEM, "CC Mem", "Congestion Control State memory");
+
/*
* List of available cc algorithms on the current system. First element
* is used as the system default CC algorithm.
@@ -84,7 +88,10 @@
/* Protects the cc_list TAILQ. */
struct rwlock cc_list_lock;
-VNET_DEFINE(struct cc_algo *, default_cc_ptr) = &newreno_cc_algo;
+VNET_DEFINE(struct cc_algo *, default_cc_ptr) = NULL;
+
+VNET_DEFINE(uint32_t, newreno_beta) = 50;
+#define V_newreno_beta VNET(newreno_beta)
/*
* Sysctl handler to show and change the default CC algorithm.
@@ -98,7 +105,10 @@
/* Get the current default: */
CC_LIST_RLOCK();
- strlcpy(default_cc, CC_DEFAULT()->name, sizeof(default_cc));
+ if (CC_DEFAULT_ALGO() != NULL)
+ strlcpy(default_cc, CC_DEFAULT_ALGO()->name, sizeof(default_cc));
+ else
+ memset(default_cc, 0, TCP_CA_NAME_MAX);
CC_LIST_RUNLOCK();
error = sysctl_handle_string(oidp, default_cc, sizeof(default_cc), req);
@@ -108,7 +118,6 @@
goto done;
error = ESRCH;
-
/* Find algo with specified name and set it to default. */
CC_LIST_RLOCK();
STAILQ_FOREACH(funcs, &cc_list, entries) {
@@ -141,7 +150,9 @@
nalgos++;
}
CC_LIST_RUNLOCK();
-
+ if (nalgos == 0) {
+ return (ENOENT);
+ }
s = sbuf_new(NULL, NULL, nalgos * TCP_CA_NAME_MAX, SBUF_FIXEDLEN);
if (s == NULL)
@@ -176,12 +187,13 @@
}
/*
- * Reset the default CC algo to NewReno for any netstack which is using the algo
- * that is about to go away as its default.
+ * Return the number of vnets that are using the proposed
+ * remove_cc as their default congestion control algorithm.
*/
-static void
-cc_checkreset_default(struct cc_algo *remove_cc)
+static int
+cc_check_default(struct cc_algo *remove_cc)
{
+ int cnt = 0;
VNET_ITERATOR_DECL(vnet_iter);
CC_LIST_LOCK_ASSERT();
@@ -189,12 +201,16 @@
VNET_LIST_RLOCK_NOSLEEP();
VNET_FOREACH(vnet_iter) {
CURVNET_SET(vnet_iter);
- if (strncmp(CC_DEFAULT()->name, remove_cc->name,
- TCP_CA_NAME_MAX) == 0)
- V_default_cc_ptr = &newreno_cc_algo;
+ if ((CC_DEFAULT_ALGO() != NULL) &&
+ strncmp(CC_DEFAULT_ALGO()->name,
+ remove_cc->name,
+ TCP_CA_NAME_MAX) == 0) {
+ cnt++;
+ }
CURVNET_RESTORE();
}
VNET_LIST_RUNLOCK_NOSLEEP();
+ return (cnt);
}
/*
@@ -218,31 +234,36 @@
err = ENOENT;
- /* Never allow newreno to be deregistered. */
- if (&newreno_cc_algo == remove_cc)
- return (EPERM);
-
/* Remove algo from cc_list so that new connections can't use it. */
CC_LIST_WLOCK();
STAILQ_FOREACH_SAFE(funcs, &cc_list, entries, tmpfuncs) {
if (funcs == remove_cc) {
- cc_checkreset_default(remove_cc);
- STAILQ_REMOVE(&cc_list, funcs, cc_algo, entries);
- err = 0;
+ if (cc_check_default(remove_cc)) {
+ err = EBUSY;
+ break;
+ }
+ /* Add a temp flag to stop new adds to it */
+ funcs->flags |= CC_MODULE_BEING_REMOVED;
+ break;
+ }
+ }
+ CC_LIST_WUNLOCK();
+ err = tcp_ccalgounload(remove_cc);
+ /*
+ * Now back through and we either remove the temp flag
+ * or pull the registration.
+ */
+ CC_LIST_WLOCK();
+ STAILQ_FOREACH_SAFE(funcs, &cc_list, entries, tmpfuncs) {
+ if (funcs == remove_cc) {
+ if (err == 0)
+ STAILQ_REMOVE(&cc_list, funcs, cc_algo, entries);
+ else
+ funcs->flags &= ~CC_MODULE_BEING_REMOVED;
break;
}
}
CC_LIST_WUNLOCK();
-
- if (!err)
- /*
- * XXXLAS:
- * - We may need to handle non-zero return values in future.
- * - If we add CC framework support for protocols other than
- * TCP, we may want a more generic way to handle this step.
- */
- tcp_ccalgounload(remove_cc);
-
return (err);
}
@@ -263,19 +284,218 @@
*/
CC_LIST_WLOCK();
STAILQ_FOREACH(funcs, &cc_list, entries) {
- if (funcs == add_cc || strncmp(funcs->name, add_cc->name,
- TCP_CA_NAME_MAX) == 0)
+ if (funcs == add_cc ||
+ strncmp(funcs->name, add_cc->name,
+ TCP_CA_NAME_MAX) == 0) {
err = EEXIST;
+ break;
+ }
}
-
- if (!err)
+ /*
+ * The first loaded congestion control module will become
+ * the default until we find the "CC_DEFAULT" defined in
+ * the config (if we do).
+ */
+ if (!err) {
STAILQ_INSERT_TAIL(&cc_list, add_cc, entries);
-
+ if (strcmp(add_cc->name, CC_DEFAULT) == 0) {
+ V_default_cc_ptr = add_cc;
+ } else if (V_default_cc_ptr == NULL) {
+ V_default_cc_ptr = add_cc;
+ }
+ }
CC_LIST_WUNLOCK();
return (err);
}
+/*
+ * Perform any necessary tasks before we exit congestion recovery.
+ */
+void
+newreno_cc_post_recovery(struct cc_var *ccv)
+{
+ int pipe;
+
+ if (IN_FASTRECOVERY(CCV(ccv, t_flags))) {
+ /*
+ * Fast recovery will conclude after returning from this
+ * function. Window inflation should have left us with
+ * approximately snd_ssthresh outstanding data. But in case we
+ * would be inclined to send a burst, better to do it via the
+ * slow start mechanism.
+ *
+ * XXXLAS: Find a way to do this without needing curack
+ */
+ if (V_tcp_do_newsack)
+ pipe = tcp_compute_pipe(ccv->ccvc.tcp);
+ else
+ pipe = CCV(ccv, snd_max) - ccv->curack;
+ if (pipe < CCV(ccv, snd_ssthresh))
+ /*
+ * Ensure that cwnd does not collapse to 1 MSS under
+ * adverse conditions. Implements RFC6582
+ */
+ CCV(ccv, snd_cwnd) = max(pipe, CCV(ccv, t_maxseg)) +
+ CCV(ccv, t_maxseg);
+ else
+ CCV(ccv, snd_cwnd) = CCV(ccv, snd_ssthresh);
+ }
+}
+
+void
+newreno_cc_after_idle(struct cc_var *ccv)
+{
+ uint32_t rw;
+ /*
+ * If we've been idle for more than one retransmit timeout the old
+ * congestion window is no longer current and we have to reduce it to
+ * the restart window before we can transmit again.
+ *
+ * The restart window is the initial window or the last CWND, whichever
+ * is smaller.
+ *
+ * This is done to prevent us from flooding the path with a full CWND at
+ * wirespeed, overloading router and switch buffers along the way.
+ *
+ * See RFC5681 Section 4.1. "Restarting Idle Connections".
+ *
+ * In addition, per RFC2861 Section 2, the ssthresh is set to the
+ * maximum of the former ssthresh or 3/4 of the old cwnd, to
+ * not exit slow-start prematurely.
+ */
+ rw = tcp_compute_initwnd(tcp_maxseg(ccv->ccvc.tcp));
+
+ CCV(ccv, snd_ssthresh) = max(CCV(ccv, snd_ssthresh),
+ CCV(ccv, snd_cwnd)-(CCV(ccv, snd_cwnd)>>2));
+
+ CCV(ccv, snd_cwnd) = min(rw, CCV(ccv, snd_cwnd));
+}
+
+/*
+ * Perform any necessary tasks before we enter congestion recovery.
+ */
+void
+newreno_cc_cong_signal(struct cc_var *ccv, uint32_t type)
+{
+ uint32_t cwin, factor;
+ u_int mss;
+
+ cwin = CCV(ccv, snd_cwnd);
+ mss = tcp_fixed_maxseg(ccv->ccvc.tcp);
+ /*
+ * Other TCP congestion controls use newreno_cong_signal(), but
+ * with their own private cc_data. Make sure the cc_data is used
+ * correctly.
+ */
+ factor = V_newreno_beta;
+
+ /* Catch algos which mistakenly leak private signal types. */
+ KASSERT((type & CC_SIGPRIVMASK) == 0,
+ ("%s: congestion signal type 0x%08x is private\n", __func__, type));
+
+ cwin = max(((uint64_t)cwin * (uint64_t)factor) / (100ULL * (uint64_t)mss),
+ 2) * mss;
+
+ switch (type) {
+ case CC_NDUPACK:
+ if (!IN_FASTRECOVERY(CCV(ccv, t_flags))) {
+ if (!IN_CONGRECOVERY(CCV(ccv, t_flags)))
+ CCV(ccv, snd_ssthresh) = cwin;
+ ENTER_RECOVERY(CCV(ccv, t_flags));
+ }
+ break;
+ case CC_ECN:
+ if (!IN_CONGRECOVERY(CCV(ccv, t_flags))) {
+ CCV(ccv, snd_ssthresh) = cwin;
+ CCV(ccv, snd_cwnd) = cwin;
+ ENTER_CONGRECOVERY(CCV(ccv, t_flags));
+ }
+ break;
+ case CC_RTO:
+ CCV(ccv, snd_ssthresh) = max(min(CCV(ccv, snd_wnd),
+ CCV(ccv, snd_cwnd)) / 2 / mss,
+ 2) * mss;
+ CCV(ccv, snd_cwnd) = mss;
+ break;
+ }
+}
+
+void
+newreno_cc_ack_received(struct cc_var *ccv, uint16_t type)
+{
+ if (type == CC_ACK && !IN_RECOVERY(CCV(ccv, t_flags)) &&
+ (ccv->flags & CCF_CWND_LIMITED)) {
+ u_int cw = CCV(ccv, snd_cwnd);
+ u_int incr = CCV(ccv, t_maxseg);
+
+ /*
+ * Regular in-order ACK, open the congestion window.
+ * Method depends on which congestion control state we're
+ * in (slow start or cong avoid) and if ABC (RFC 3465) is
+ * enabled.
+ *
+ * slow start: cwnd <= ssthresh
+ * cong avoid: cwnd > ssthresh
+ *
+ * slow start and ABC (RFC 3465):
+ * Grow cwnd exponentially by the amount of data
+ * ACKed capping the max increment per ACK to
+ * (abc_l_var * maxseg) bytes.
+ *
+ * slow start without ABC (RFC 5681):
+ * Grow cwnd exponentially by maxseg per ACK.
+ *
+ * cong avoid and ABC (RFC 3465):
+ * Grow cwnd linearly by maxseg per RTT for each
+ * cwnd worth of ACKed data.
+ *
+ * cong avoid without ABC (RFC 5681):
+ * Grow cwnd linearly by approximately maxseg per RTT using
+ * maxseg^2 / cwnd per ACK as the increment.
+ * If cwnd > maxseg^2, fix the cwnd increment at 1 byte to
+ * avoid capping cwnd.
+ */
+ if (cw > CCV(ccv, snd_ssthresh)) {
+ if (V_tcp_do_rfc3465) {
+ if (ccv->flags & CCF_ABC_SENTAWND)
+ ccv->flags &= ~CCF_ABC_SENTAWND;
+ else
+ incr = 0;
+ } else
+ incr = max((incr * incr / cw), 1);
+ } else if (V_tcp_do_rfc3465) {
+ /*
+ * In slow-start with ABC enabled and no RTO in sight?
+ * (Must not use abc_l_var > 1 if slow starting after
+ * an RTO. On RTO, snd_nxt = snd_una, so the
+ * snd_nxt == snd_max check is sufficient to
+ * handle this).
+ *
+ * XXXLAS: Find a way to signal SS after RTO that
+ * doesn't rely on tcpcb vars.
+ */
+ uint16_t abc_val;
+
+ if (ccv->flags & CCF_USE_LOCAL_ABC)
+ abc_val = ccv->labc;
+ else
+ abc_val = V_tcp_abc_l_var;
+ if (CCV(ccv, snd_nxt) == CCV(ccv, snd_max))
+ incr = min(ccv->bytes_this_ack,
+ ccv->nsegs * abc_val *
+ CCV(ccv, t_maxseg));
+ else
+ incr = min(ccv->bytes_this_ack, CCV(ccv, t_maxseg));
+
+ }
+ /* ABC is on by default, so incr equals 0 frequently. */
+ if (incr > 0)
+ CCV(ccv, snd_cwnd) = min(cw + incr,
+ TCP_MAXWIN << CCV(ccv, snd_scale));
+ }
+}
+
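The congestion-avoidance arm of newreno_cc_ack_received() above can be isolated into a small pure helper for experimentation (a userspace sketch with our own naming, not kernel code): without ABC, cwnd grows by about maxseg per RTT via maxseg^2/cwnd per ACK, floored at one byte once cwnd exceeds maxseg^2.

```c
#include <stdint.h>

/*
 * RFC 5681 congestion-avoidance increment, as computed in
 * newreno_cc_ack_received() when ABC (RFC 3465) is disabled:
 * roughly maxseg bytes per RTT via maxseg^2 / cwnd per ACK,
 * with a 1-byte floor so cwnd never stops growing entirely.
 */
static uint32_t
congavoid_incr(uint32_t cwnd, uint32_t maxseg)
{
	/* maxseg * maxseg fits easily in 32 bits for MSS-sized inputs. */
	uint32_t incr = (maxseg * maxseg) / cwnd;

	return (incr > 0 ? incr : 1);
}
```

For a 10-segment window with a 1460-byte MSS this yields 146 bytes per ACK, i.e. about one MSS of growth per window of ACKs.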
/*
* Handles kld related events. Returns 0 on success, non-zero on failure.
*/
@@ -290,6 +510,15 @@
switch(event_type) {
case MOD_LOAD:
+ if ((algo->cc_data_sz == NULL) && (algo->cb_init != NULL)) {
+ /*
+ * A module must have a cc_data_sz function;
+ * even if it has no data it should return 0.
+ */
+ printf("Module load fails: it lacks a cc_data_sz() function but has a cb_init()!\n");
+ err = EINVAL;
+ break;
+ }
if (algo->mod_init != NULL)
err = algo->mod_init();
if (!err)
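The multiplicative decrease computed at the top of newreno_cc_cong_signal() in the hunk above can likewise be sketched as a pure helper (our naming, userspace sketch): the window is scaled by the beta percentage (50 by default, i.e. halved), rounded down to whole segments, and clamped to a floor of two segments.

```c
#include <stdint.h>

/*
 * The cwnd reduction from newreno_cc_cong_signal(): scale the
 * window by the beta percentage, round down to a whole number
 * of segments, and never drop below two segments.
 */
static uint32_t
newreno_reduced_cwnd(uint32_t cwin, uint32_t factor, uint32_t mss)
{
	/* 64-bit intermediate avoids overflow for large windows. */
	uint64_t segs = ((uint64_t)cwin * factor) / (100ULL * mss);

	if (segs < 2)
		segs = 2;
	return ((uint32_t)(segs * mss));
}
```

With the default beta of 50 a 20-segment window halves to 10 segments, while a 1-segment window is clamped up to the 2-segment floor.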
diff --git a/sys/netinet/cc/cc_cdg.c b/sys/netinet/cc/cc_cdg.c
--- a/sys/netinet/cc/cc_cdg.c
+++ b/sys/netinet/cc/cc_cdg.c
@@ -67,6 +67,10 @@
#include <net/vnet.h>
+#include <net/route.h>
+#include <net/route/nhop.h>
+
+#include <netinet/in_pcb.h>
#include <netinet/tcp.h>
#include <netinet/tcp_seq.h>
#include <netinet/tcp_timer.h>
@@ -197,10 +201,6 @@
32531,32533,32535,32537,32538,32540,32542,32544,32545,32547};
static uma_zone_t qdiffsample_zone;
-
-static MALLOC_DEFINE(M_CDG, "cdg data",
- "Per connection data required for the CDG congestion control algorithm");
-
static int ertt_id;
VNET_DEFINE_STATIC(uint32_t, cdg_alpha_inc);
@@ -222,10 +222,11 @@
static int cdg_mod_init(void);
static int cdg_mod_destroy(void);
static void cdg_conn_init(struct cc_var *ccv);
-static int cdg_cb_init(struct cc_var *ccv);
+static int cdg_cb_init(struct cc_var *ccv, void *ptr);
static void cdg_cb_destroy(struct cc_var *ccv);
static void cdg_cong_signal(struct cc_var *ccv, uint32_t signal_type);
static void cdg_ack_received(struct cc_var *ccv, uint16_t ack_type);
+static size_t cdg_data_sz(void);
struct cc_algo cdg_cc_algo = {
.name = "cdg",
@@ -235,7 +236,10 @@
.cb_init = cdg_cb_init,
.conn_init = cdg_conn_init,
.cong_signal = cdg_cong_signal,
- .mod_destroy = cdg_mod_destroy
+ .mod_destroy = cdg_mod_destroy,
+ .cc_data_sz = cdg_data_sz,
+ .post_recovery = newreno_cc_post_recovery,
+ .after_idle = newreno_cc_after_idle,
};
/* Vnet created and being initialised. */
@@ -271,10 +275,6 @@
CURVNET_RESTORE();
}
VNET_LIST_RUNLOCK();
-
- cdg_cc_algo.post_recovery = newreno_cc_algo.post_recovery;
- cdg_cc_algo.after_idle = newreno_cc_algo.after_idle;
-
return (0);
}
@@ -286,15 +286,25 @@
return (0);
}
+static size_t
+cdg_data_sz(void)
+{
+ return (sizeof(struct cdg));
+}
+
static int
-cdg_cb_init(struct cc_var *ccv)
+cdg_cb_init(struct cc_var *ccv, void *ptr)
{
struct cdg *cdg_data;
- cdg_data = malloc(sizeof(struct cdg), M_CDG, M_NOWAIT);
- if (cdg_data == NULL)
- return (ENOMEM);
-
+ INP_WLOCK_ASSERT(ccv->ccvc.tcp->t_inpcb);
+ if (ptr == NULL) {
+ cdg_data = malloc(sizeof(struct cdg), M_CC_MEM, M_NOWAIT);
+ if (cdg_data == NULL)
+ return (ENOMEM);
+ } else {
+ cdg_data = ptr;
+ }
cdg_data->shadow_w = 0;
cdg_data->max_qtrend = 0;
cdg_data->min_qtrend = 0;
@@ -350,7 +360,7 @@
qds = qds_n;
}
- free(ccv->cc_data, M_CDG);
+ free(ccv->cc_data, M_CC_MEM);
}
static int
@@ -484,7 +494,7 @@
ENTER_RECOVERY(CCV(ccv, t_flags));
break;
default:
- newreno_cc_algo.cong_signal(ccv, signal_type);
+ newreno_cc_cong_signal(ccv, signal_type);
break;
}
}
@@ -714,5 +724,5 @@
"the window backoff for loss based CC compatibility");
DECLARE_CC_MODULE(cdg, &cdg_cc_algo);
-MODULE_VERSION(cdg, 1);
+MODULE_VERSION(cdg, 2);
MODULE_DEPEND(cdg, ertt, 1, 1, 1);
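All of the converted modules (cdg, chd, cubic, dctcp, newreno, vegas) now follow the same two-mode `cb_init` pattern: use caller-preallocated memory when `ptr` is non-NULL, and only fall back to `malloc` when it is NULL. A simplified userspace sketch of that pattern (hypothetical names; the kernel versions also assert the inpcb write lock):

```c
#include <stdlib.h>
#include <errno.h>

/* Hypothetical per-connection state, standing in for struct cdg/chd/etc. */
struct cc_state {
	long shadow_w;
	long max_qtrend;
};

struct cc_var_sketch {
	void *cc_data;
};

/* Preferred path: the caller preallocates cc_data_sz() bytes and passes
 * them in; only when ptr is NULL do we allocate here (and may fail). */
static int
cc_cb_init(struct cc_var_sketch *ccv, void *ptr)
{
	struct cc_state *st;

	if (ptr == NULL) {
		st = malloc(sizeof(struct cc_state));
		if (st == NULL)
			return (ENOMEM);
	} else
		st = ptr;
	st->shadow_w = 0;
	st->max_qtrend = 0;
	ccv->cc_data = st;
	return (0);
}
```

The point of the preallocated path is that init cannot fail, which is what lets `tcp_congestion()` allocate with `M_WAITOK` up front and then switch algorithms without an error path.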
diff --git a/sys/netinet/cc/cc_chd.c b/sys/netinet/cc/cc_chd.c
--- a/sys/netinet/cc/cc_chd.c
+++ b/sys/netinet/cc/cc_chd.c
@@ -69,6 +69,10 @@
#include <net/vnet.h>
+#include <net/route.h>
+#include <net/route/nhop.h>
+
+#include <netinet/in_pcb.h>
#include <netinet/tcp.h>
#include <netinet/tcp_seq.h>
#include <netinet/tcp_timer.h>
@@ -89,10 +93,11 @@
static void chd_ack_received(struct cc_var *ccv, uint16_t ack_type);
static void chd_cb_destroy(struct cc_var *ccv);
-static int chd_cb_init(struct cc_var *ccv);
+static int chd_cb_init(struct cc_var *ccv, void *ptr);
static void chd_cong_signal(struct cc_var *ccv, uint32_t signal_type);
static void chd_conn_init(struct cc_var *ccv);
static int chd_mod_init(void);
+static size_t chd_data_sz(void);
struct chd {
/*
@@ -126,8 +131,6 @@
#define V_chd_loss_fair VNET(chd_loss_fair)
#define V_chd_use_max VNET(chd_use_max)
-static MALLOC_DEFINE(M_CHD, "chd data",
- "Per connection data required for the CHD congestion control algorithm");
struct cc_algo chd_cc_algo = {
.name = "chd",
@@ -136,7 +139,10 @@
.cb_init = chd_cb_init,
.cong_signal = chd_cong_signal,
.conn_init = chd_conn_init,
- .mod_init = chd_mod_init
+ .mod_init = chd_mod_init,
+ .cc_data_sz = chd_data_sz,
+ .after_idle = newreno_cc_after_idle,
+ .post_recovery = newreno_cc_post_recovery,
};
static __inline void
@@ -304,18 +310,27 @@
static void
chd_cb_destroy(struct cc_var *ccv)
{
+ free(ccv->cc_data, M_CC_MEM);
+}
- free(ccv->cc_data, M_CHD);
+static size_t
+chd_data_sz(void)
+{
+ return (sizeof(struct chd));
}
static int
-chd_cb_init(struct cc_var *ccv)
+chd_cb_init(struct cc_var *ccv, void *ptr)
{
struct chd *chd_data;
- chd_data = malloc(sizeof(struct chd), M_CHD, M_NOWAIT);
- if (chd_data == NULL)
- return (ENOMEM);
+ INP_WLOCK_ASSERT(ccv->ccvc.tcp->t_inpcb);
+ if (ptr == NULL) {
+ chd_data = malloc(sizeof(struct chd), M_CC_MEM, M_NOWAIT);
+ if (chd_data == NULL)
+ return (ENOMEM);
+ } else
+ chd_data = ptr;
chd_data->shadow_w = 0;
ccv->cc_data = chd_data;
@@ -374,7 +389,7 @@
break;
default:
- newreno_cc_algo.cong_signal(ccv, signal_type);
+ newreno_cc_cong_signal(ccv, signal_type);
}
}
@@ -403,10 +418,6 @@
printf("%s: h_ertt module not found\n", __func__);
return (ENOENT);
}
-
- chd_cc_algo.after_idle = newreno_cc_algo.after_idle;
- chd_cc_algo.post_recovery = newreno_cc_algo.post_recovery;
-
return (0);
}
@@ -493,5 +504,5 @@
"as the basic delay measurement for the algorithm.");
DECLARE_CC_MODULE(chd, &chd_cc_algo);
-MODULE_VERSION(chd, 1);
+MODULE_VERSION(chd, 2);
MODULE_DEPEND(chd, ertt, 1, 1, 1);
diff --git a/sys/netinet/cc/cc_cubic.c b/sys/netinet/cc/cc_cubic.c
--- a/sys/netinet/cc/cc_cubic.c
+++ b/sys/netinet/cc/cc_cubic.c
@@ -62,6 +62,10 @@
#include <net/vnet.h>
+#include <net/route.h>
+#include <net/route/nhop.h>
+
+#include <netinet/in_pcb.h>
#include <netinet/tcp.h>
#include <netinet/tcp_seq.h>
#include <netinet/tcp_timer.h>
@@ -72,7 +76,7 @@
static void cubic_ack_received(struct cc_var *ccv, uint16_t type);
static void cubic_cb_destroy(struct cc_var *ccv);
-static int cubic_cb_init(struct cc_var *ccv);
+static int cubic_cb_init(struct cc_var *ccv, void *ptr);
static void cubic_cong_signal(struct cc_var *ccv, uint32_t type);
static void cubic_conn_init(struct cc_var *ccv);
static int cubic_mod_init(void);
@@ -80,6 +84,7 @@
static void cubic_record_rtt(struct cc_var *ccv);
static void cubic_ssthresh_update(struct cc_var *ccv, uint32_t maxseg);
static void cubic_after_idle(struct cc_var *ccv);
+static size_t cubic_data_sz(void);
struct cubic {
/* Cubic K in fixed point form with CUBIC_SHIFT worth of precision. */
@@ -114,9 +119,6 @@
int t_last_cong_prev;
};
-static MALLOC_DEFINE(M_CUBIC, "cubic data",
- "Per connection data required for the CUBIC congestion control algorithm");
-
struct cc_algo cubic_cc_algo = {
.name = "cubic",
.ack_received = cubic_ack_received,
@@ -127,6 +129,7 @@
.mod_init = cubic_mod_init,
.post_recovery = cubic_post_recovery,
.after_idle = cubic_after_idle,
+ .cc_data_sz = cubic_data_sz
};
static void
@@ -149,7 +152,7 @@
if (CCV(ccv, snd_cwnd) <= CCV(ccv, snd_ssthresh) ||
cubic_data->min_rtt_ticks == TCPTV_SRTTBASE) {
cubic_data->flags |= CUBICFLAG_IN_SLOWSTART;
- newreno_cc_algo.ack_received(ccv, type);
+ newreno_cc_ack_received(ccv, type);
} else {
if ((cubic_data->flags & CUBICFLAG_RTO_EVENT) &&
(cubic_data->flags & CUBICFLAG_IN_SLOWSTART)) {
@@ -243,25 +246,34 @@
cubic_data->max_cwnd = ulmax(cubic_data->max_cwnd, CCV(ccv, snd_cwnd));
cubic_data->K = cubic_k(cubic_data->max_cwnd / CCV(ccv, t_maxseg));
- newreno_cc_algo.after_idle(ccv);
+ newreno_cc_after_idle(ccv);
cubic_data->t_last_cong = ticks;
}
static void
cubic_cb_destroy(struct cc_var *ccv)
{
- free(ccv->cc_data, M_CUBIC);
+ free(ccv->cc_data, M_CC_MEM);
+}
+
+static size_t
+cubic_data_sz(void)
+{
+ return (sizeof(struct cubic));
}
static int
-cubic_cb_init(struct cc_var *ccv)
+cubic_cb_init(struct cc_var *ccv, void *ptr)
{
struct cubic *cubic_data;
- cubic_data = malloc(sizeof(struct cubic), M_CUBIC, M_NOWAIT|M_ZERO);
-
- if (cubic_data == NULL)
- return (ENOMEM);
+ INP_WLOCK_ASSERT(ccv->ccvc.tcp->t_inpcb);
+ if (ptr == NULL) {
+ cubic_data = malloc(sizeof(struct cubic), M_CC_MEM, M_NOWAIT|M_ZERO);
+ if (cubic_data == NULL)
+ return (ENOMEM);
+ } else
+ cubic_data = ptr;
/* Init some key variables with sensible defaults. */
cubic_data->t_last_cong = ticks;
@@ -484,4 +496,4 @@
}
DECLARE_CC_MODULE(cubic, &cubic_cc_algo);
-MODULE_VERSION(cubic, 1);
+MODULE_VERSION(cubic, 2);
diff --git a/sys/netinet/cc/cc_dctcp.c b/sys/netinet/cc/cc_dctcp.c
--- a/sys/netinet/cc/cc_dctcp.c
+++ b/sys/netinet/cc/cc_dctcp.c
@@ -50,6 +50,10 @@
#include <net/vnet.h>
+#include <net/route.h>
+#include <net/route/nhop.h>
+
+#include <netinet/in_pcb.h>
#include <netinet/tcp.h>
#include <netinet/tcp_seq.h>
#include <netinet/tcp_var.h>
@@ -76,18 +80,16 @@
uint32_t num_cong_events; /* # of congestion events */
};
-static MALLOC_DEFINE(M_dctcp, "dctcp data",
- "Per connection data required for the dctcp algorithm");
-
static void dctcp_ack_received(struct cc_var *ccv, uint16_t type);
static void dctcp_after_idle(struct cc_var *ccv);
static void dctcp_cb_destroy(struct cc_var *ccv);
-static int dctcp_cb_init(struct cc_var *ccv);
+static int dctcp_cb_init(struct cc_var *ccv, void *ptr);
static void dctcp_cong_signal(struct cc_var *ccv, uint32_t type);
static void dctcp_conn_init(struct cc_var *ccv);
static void dctcp_post_recovery(struct cc_var *ccv);
static void dctcp_ecnpkt_handler(struct cc_var *ccv);
static void dctcp_update_alpha(struct cc_var *ccv);
+static size_t dctcp_data_sz(void);
struct cc_algo dctcp_cc_algo = {
.name = "dctcp",
@@ -99,6 +101,7 @@
.post_recovery = dctcp_post_recovery,
.ecnpkt_handler = dctcp_ecnpkt_handler,
.after_idle = dctcp_after_idle,
+ .cc_data_sz = dctcp_data_sz,
};
static void
@@ -117,10 +120,10 @@
*/
if (IN_CONGRECOVERY(CCV(ccv, t_flags))) {
EXIT_CONGRECOVERY(CCV(ccv, t_flags));
- newreno_cc_algo.ack_received(ccv, type);
+ newreno_cc_ack_received(ccv, type);
ENTER_CONGRECOVERY(CCV(ccv, t_flags));
} else
- newreno_cc_algo.ack_received(ccv, type);
+ newreno_cc_ack_received(ccv, type);
if (type == CC_DUPACK)
bytes_acked = min(ccv->bytes_this_ack, CCV(ccv, t_maxseg));
@@ -158,7 +161,13 @@
SEQ_GT(ccv->curack, dctcp_data->save_sndnxt))
dctcp_update_alpha(ccv);
} else
- newreno_cc_algo.ack_received(ccv, type);
+ newreno_cc_ack_received(ccv, type);
+}
+
+static size_t
+dctcp_data_sz(void)
+{
+ return (sizeof(struct dctcp));
}
static void
@@ -179,25 +188,27 @@
dctcp_data->num_cong_events = 0;
}
- newreno_cc_algo.after_idle(ccv);
+ newreno_cc_after_idle(ccv);
}
static void
dctcp_cb_destroy(struct cc_var *ccv)
{
- free(ccv->cc_data, M_dctcp);
+ free(ccv->cc_data, M_CC_MEM);
}
static int
-dctcp_cb_init(struct cc_var *ccv)
+dctcp_cb_init(struct cc_var *ccv, void *ptr)
{
struct dctcp *dctcp_data;
- dctcp_data = malloc(sizeof(struct dctcp), M_dctcp, M_NOWAIT|M_ZERO);
-
- if (dctcp_data == NULL)
- return (ENOMEM);
-
+ INP_WLOCK_ASSERT(ccv->ccvc.tcp->t_inpcb);
+ if (ptr == NULL) {
+ dctcp_data = malloc(sizeof(struct dctcp), M_CC_MEM, M_NOWAIT|M_ZERO);
+ if (dctcp_data == NULL)
+ return (ENOMEM);
+ } else
+ dctcp_data = ptr;
/* Initialize some key variables with sensible defaults. */
dctcp_data->bytes_ecn = 0;
dctcp_data->bytes_total = 0;
@@ -292,7 +303,7 @@
break;
}
} else
- newreno_cc_algo.cong_signal(ccv, type);
+ newreno_cc_cong_signal(ccv, type);
}
static void
@@ -312,7 +323,7 @@
static void
dctcp_post_recovery(struct cc_var *ccv)
{
- newreno_cc_algo.post_recovery(ccv);
+ newreno_cc_post_recovery(ccv);
if (CCV(ccv, t_flags2) & TF2_ECN_PERMIT)
dctcp_update_alpha(ccv);
@@ -468,4 +479,4 @@
"half CWND reduction after the first slow start");
DECLARE_CC_MODULE(dctcp, &dctcp_cc_algo);
-MODULE_VERSION(dctcp, 1);
+MODULE_VERSION(dctcp, 2);
diff --git a/sys/netinet/cc/cc_hd.c b/sys/netinet/cc/cc_hd.c
--- a/sys/netinet/cc/cc_hd.c
+++ b/sys/netinet/cc/cc_hd.c
@@ -84,6 +84,7 @@
static void hd_ack_received(struct cc_var *ccv, uint16_t ack_type);
static int hd_mod_init(void);
+static size_t hd_data_sz(void);
static int ertt_id;
@@ -97,9 +98,19 @@
struct cc_algo hd_cc_algo = {
.name = "hd",
.ack_received = hd_ack_received,
- .mod_init = hd_mod_init
+ .mod_init = hd_mod_init,
+ .cc_data_sz = hd_data_sz,
+ .after_idle = newreno_cc_after_idle,
+ .cong_signal = newreno_cc_cong_signal,
+ .post_recovery = newreno_cc_post_recovery,
};
+static size_t
+hd_data_sz(void)
+{
+ return (0);
+}
+
/*
* Hamilton backoff function. Returns 1 if we should backoff or 0 otherwise.
*/
@@ -150,14 +161,14 @@
* half cwnd and behave like an ECN (ie
* not a packet loss).
*/
- newreno_cc_algo.cong_signal(ccv,
+ newreno_cc_cong_signal(ccv,
CC_ECN);
return;
}
}
}
}
- newreno_cc_algo.ack_received(ccv, ack_type); /* As for NewReno. */
+ newreno_cc_ack_received(ccv, ack_type);
}
static int
@@ -169,11 +180,6 @@
printf("%s: h_ertt module not found\n", __func__);
return (ENOENT);
}
-
- hd_cc_algo.after_idle = newreno_cc_algo.after_idle;
- hd_cc_algo.cong_signal = newreno_cc_algo.cong_signal;
- hd_cc_algo.post_recovery = newreno_cc_algo.post_recovery;
-
return (0);
}
@@ -251,5 +257,5 @@
"minimum queueing delay threshold (qmin) in ticks");
DECLARE_CC_MODULE(hd, &hd_cc_algo);
-MODULE_VERSION(hd, 1);
+MODULE_VERSION(hd, 2);
MODULE_DEPEND(hd, ertt, 1, 1, 1);
diff --git a/sys/netinet/cc/cc_htcp.c b/sys/netinet/cc/cc_htcp.c
--- a/sys/netinet/cc/cc_htcp.c
+++ b/sys/netinet/cc/cc_htcp.c
@@ -64,6 +64,10 @@
#include <net/vnet.h>
+#include <net/route.h>
+#include <net/route/nhop.h>
+
+#include <netinet/in_pcb.h>
#include <netinet/tcp.h>
#include <netinet/tcp_seq.h>
#include <netinet/tcp_timer.h>
@@ -137,7 +141,7 @@
static void htcp_ack_received(struct cc_var *ccv, uint16_t type);
static void htcp_cb_destroy(struct cc_var *ccv);
-static int htcp_cb_init(struct cc_var *ccv);
+static int htcp_cb_init(struct cc_var *ccv, void *ptr);
static void htcp_cong_signal(struct cc_var *ccv, uint32_t type);
static int htcp_mod_init(void);
static void htcp_post_recovery(struct cc_var *ccv);
@@ -145,6 +149,7 @@
static void htcp_recalc_beta(struct cc_var *ccv);
static void htcp_record_rtt(struct cc_var *ccv);
static void htcp_ssthresh_update(struct cc_var *ccv);
+static size_t htcp_data_sz(void);
struct htcp {
/* cwnd before entering cong recovery. */
@@ -175,9 +180,6 @@
#define V_htcp_adaptive_backoff VNET(htcp_adaptive_backoff)
#define V_htcp_rtt_scaling VNET(htcp_rtt_scaling)
-static MALLOC_DEFINE(M_HTCP, "htcp data",
- "Per connection data required for the HTCP congestion control algorithm");
-
struct cc_algo htcp_cc_algo = {
.name = "htcp",
.ack_received = htcp_ack_received,
@@ -186,6 +188,8 @@
.cong_signal = htcp_cong_signal,
.mod_init = htcp_mod_init,
.post_recovery = htcp_post_recovery,
+ .cc_data_sz = htcp_data_sz,
+ .after_idle = newreno_cc_after_idle,
};
static void
@@ -214,7 +218,7 @@
*/
if (htcp_data->alpha == 1 ||
CCV(ccv, snd_cwnd) <= CCV(ccv, snd_ssthresh))
- newreno_cc_algo.ack_received(ccv, type);
+ newreno_cc_ack_received(ccv, type);
else {
if (V_tcp_do_rfc3465) {
/* Increment cwnd by alpha segments. */
@@ -238,18 +242,27 @@
static void
htcp_cb_destroy(struct cc_var *ccv)
{
- free(ccv->cc_data, M_HTCP);
+ free(ccv->cc_data, M_CC_MEM);
+}
+
+static size_t
+htcp_data_sz(void)
+{
+ return (sizeof(struct htcp));
}
static int
-htcp_cb_init(struct cc_var *ccv)
+htcp_cb_init(struct cc_var *ccv, void *ptr)
{
struct htcp *htcp_data;
- htcp_data = malloc(sizeof(struct htcp), M_HTCP, M_NOWAIT);
-
- if (htcp_data == NULL)
- return (ENOMEM);
+ INP_WLOCK_ASSERT(ccv->ccvc.tcp->t_inpcb);
+ if (ptr == NULL) {
+ htcp_data = malloc(sizeof(struct htcp), M_CC_MEM, M_NOWAIT);
+ if (htcp_data == NULL)
+ return (ENOMEM);
+ } else
+ htcp_data = ptr;
/* Init some key variables with sensible defaults. */
htcp_data->alpha = HTCP_INIT_ALPHA;
@@ -333,16 +346,12 @@
static int
htcp_mod_init(void)
{
-
- htcp_cc_algo.after_idle = newreno_cc_algo.after_idle;
-
/*
* HTCP_RTT_REF is defined in ms, and t_srtt in the tcpcb is stored in
* units of TCP_RTT_SCALE*hz. Scale HTCP_RTT_REF to be in the same units
* as t_srtt.
*/
htcp_rtt_ref = (HTCP_RTT_REF * TCP_RTT_SCALE * hz) / 1000;
-
return (0);
}
@@ -535,4 +544,4 @@
"enable H-TCP RTT scaling");
DECLARE_CC_MODULE(htcp, &htcp_cc_algo);
-MODULE_VERSION(htcp, 1);
+MODULE_VERSION(htcp, 2);
diff --git a/sys/netinet/cc/cc_newreno.c b/sys/netinet/cc/cc_newreno.c
--- a/sys/netinet/cc/cc_newreno.c
+++ b/sys/netinet/cc/cc_newreno.c
@@ -71,6 +71,10 @@
#include <net/vnet.h>
+#include <net/route.h>
+#include <net/route/nhop.h>
+
+#include <netinet/in_pcb.h>
#include <netinet/in.h>
#include <netinet/in_pcb.h>
#include <netinet/tcp.h>
@@ -82,22 +86,20 @@
#include <netinet/cc/cc_module.h>
#include <netinet/cc/cc_newreno.h>
-static MALLOC_DEFINE(M_NEWRENO, "newreno data",
- "newreno beta values");
-
static void newreno_cb_destroy(struct cc_var *ccv);
static void newreno_ack_received(struct cc_var *ccv, uint16_t type);
static void newreno_after_idle(struct cc_var *ccv);
static void newreno_cong_signal(struct cc_var *ccv, uint32_t type);
-static void newreno_post_recovery(struct cc_var *ccv);
static int newreno_ctl_output(struct cc_var *ccv, struct sockopt *sopt, void *buf);
static void newreno_newround(struct cc_var *ccv, uint32_t round_cnt);
static void newreno_rttsample(struct cc_var *ccv, uint32_t usec_rtt, uint32_t rxtcnt, uint32_t fas);
-static int newreno_cb_init(struct cc_var *ccv);
+static int newreno_cb_init(struct cc_var *ccv, void *);
+static size_t newreno_data_sz(void);
-VNET_DEFINE(uint32_t, newreno_beta) = 50;
-VNET_DEFINE(uint32_t, newreno_beta_ecn) = 80;
+
+VNET_DECLARE(uint32_t, newreno_beta);
#define V_newreno_beta VNET(newreno_beta)
+VNET_DEFINE(uint32_t, newreno_beta_ecn) = 80;
#define V_newreno_beta_ecn VNET(newreno_beta_ecn)
struct cc_algo newreno_cc_algo = {
@@ -106,11 +108,12 @@
.ack_received = newreno_ack_received,
.after_idle = newreno_after_idle,
.cong_signal = newreno_cong_signal,
- .post_recovery = newreno_post_recovery,
+ .post_recovery = newreno_cc_post_recovery,
.ctl_output = newreno_ctl_output,
.newround = newreno_newround,
.rttsample = newreno_rttsample,
.cb_init = newreno_cb_init,
+ .cc_data_sz = newreno_data_sz,
};
static uint32_t hystart_lowcwnd = 16;
@@ -167,14 +170,24 @@
}
}
+static size_t
+newreno_data_sz(void)
+{
+ return (sizeof(struct newreno));
+}
+
static int
-newreno_cb_init(struct cc_var *ccv)
+newreno_cb_init(struct cc_var *ccv, void *ptr)
{
struct newreno *nreno;
- ccv->cc_data = malloc(sizeof(struct newreno), M_NEWRENO, M_NOWAIT);
- if (ccv->cc_data == NULL)
- return (ENOMEM);
+ INP_WLOCK_ASSERT(ccv->ccvc.tcp->t_inpcb);
+ if (ptr == NULL) {
+ ccv->cc_data = malloc(sizeof(struct newreno), M_CC_MEM, M_NOWAIT);
+ if (ccv->cc_data == NULL)
+ return (ENOMEM);
+ } else
+ ccv->cc_data = ptr;
nreno = (struct newreno *)ccv->cc_data;
/* NB: nreno is not zeroed, so initialise all fields. */
nreno->beta = V_newreno_beta;
@@ -201,7 +214,7 @@
static void
newreno_cb_destroy(struct cc_var *ccv)
{
- free(ccv->cc_data, M_NEWRENO);
+ free(ccv->cc_data, M_CC_MEM);
}
static void
@@ -209,13 +222,7 @@
{
struct newreno *nreno;
- /*
- * Other TCP congestion controls use newreno_ack_received(), but
- * with their own private cc_data. Make sure the cc_data is used
- * correctly.
- */
- nreno = (CC_ALGO(ccv->ccvc.tcp) == &newreno_cc_algo) ? ccv->cc_data : NULL;
-
+ nreno = ccv->cc_data;
if (type == CC_ACK && !IN_RECOVERY(CCV(ccv, t_flags)) &&
(ccv->flags & CCF_CWND_LIMITED)) {
u_int cw = CCV(ccv, snd_cwnd);
@@ -249,8 +256,7 @@
* avoid capping cwnd.
*/
if (cw > CCV(ccv, snd_ssthresh)) {
- if ((nreno != NULL) &&
- (nreno->newreno_flags & CC_NEWRENO_HYSTART_IN_CSS)) {
+ if (nreno->newreno_flags & CC_NEWRENO_HYSTART_IN_CSS) {
/*
* We have slipped into CA with
* CSS active. Deactivate all.
@@ -284,8 +290,7 @@
abc_val = ccv->labc;
else
abc_val = V_tcp_abc_l_var;
- if ((nreno != NULL) &&
- (nreno->newreno_flags & CC_NEWRENO_HYSTART_ALLOWED) &&
+ if ((nreno->newreno_flags & CC_NEWRENO_HYSTART_ALLOWED) &&
(nreno->newreno_flags & CC_NEWRENO_HYSTART_ENABLED) &&
((nreno->newreno_flags & CC_NEWRENO_HYSTART_IN_CSS) == 0)) {
/*
@@ -323,8 +328,7 @@
incr = min(ccv->bytes_this_ack, CCV(ccv, t_maxseg));
/* Only if Hystart is enabled will the flag get set */
- if ((nreno != NULL) &&
- (nreno->newreno_flags & CC_NEWRENO_HYSTART_IN_CSS)) {
+ if (nreno->newreno_flags & CC_NEWRENO_HYSTART_IN_CSS) {
incr /= hystart_css_growth_div;
newreno_log_hystart_event(ccv, nreno, 3, incr);
}
@@ -340,39 +344,10 @@
newreno_after_idle(struct cc_var *ccv)
{
struct newreno *nreno;
- uint32_t rw;
-
- /*
- * Other TCP congestion controls use newreno_after_idle(), but
- * with their own private cc_data. Make sure the cc_data is used
- * correctly.
- */
- nreno = (CC_ALGO(ccv->ccvc.tcp) == &newreno_cc_algo) ? ccv->cc_data : NULL;
- /*
- * If we've been idle for more than one retransmit timeout the old
- * congestion window is no longer current and we have to reduce it to
- * the restart window before we can transmit again.
- *
- * The restart window is the initial window or the last CWND, whichever
- * is smaller.
- *
- * This is done to prevent us from flooding the path with a full CWND at
- * wirespeed, overloading router and switch buffers along the way.
- *
- * See RFC5681 Section 4.1. "Restarting Idle Connections".
- *
- * In addition, per RFC2861 Section 2, the ssthresh is set to the
- * maximum of the former ssthresh or 3/4 of the old cwnd, to
- * not exit slow-start prematurely.
- */
- rw = tcp_compute_initwnd(tcp_maxseg(ccv->ccvc.tcp));
-
- CCV(ccv, snd_ssthresh) = max(CCV(ccv, snd_ssthresh),
- CCV(ccv, snd_cwnd)-(CCV(ccv, snd_cwnd)>>2));
- CCV(ccv, snd_cwnd) = min(rw, CCV(ccv, snd_cwnd));
- if ((nreno != NULL) &&
- (nreno->newreno_flags & CC_NEWRENO_HYSTART_ENABLED) == 0) {
+ nreno = ccv->cc_data;
+ newreno_cc_after_idle(ccv);
+ if ((nreno->newreno_flags & CC_NEWRENO_HYSTART_ENABLED) == 0) {
if (CCV(ccv, snd_cwnd) <= (hystart_lowcwnd * tcp_fixed_maxseg(ccv->ccvc.tcp))) {
/*
* Re-enable hystart if our cwnd has fallen below
@@ -396,12 +371,7 @@
cwin = CCV(ccv, snd_cwnd);
mss = tcp_fixed_maxseg(ccv->ccvc.tcp);
- /*
- * Other TCP congestion controls use newreno_cong_signal(), but
- * with their own private cc_data. Make sure the cc_data is used
- * correctly.
- */
- nreno = (CC_ALGO(ccv->ccvc.tcp) == &newreno_cc_algo) ? ccv->cc_data : NULL;
+ nreno = ccv->cc_data;
beta = (nreno == NULL) ? V_newreno_beta : nreno->beta;
beta_ecn = (nreno == NULL) ? V_newreno_beta_ecn : nreno->beta_ecn;
/*
@@ -426,8 +396,7 @@
switch (type) {
case CC_NDUPACK:
- if ((nreno != NULL) &&
- (nreno->newreno_flags & CC_NEWRENO_HYSTART_ENABLED)) {
+ if (nreno->newreno_flags & CC_NEWRENO_HYSTART_ENABLED) {
/* Make sure the flags are all off we had a loss */
nreno->newreno_flags &= ~CC_NEWRENO_HYSTART_ENABLED;
nreno->newreno_flags &= ~CC_NEWRENO_HYSTART_IN_CSS;
@@ -445,8 +414,7 @@
}
break;
case CC_ECN:
- if ((nreno != NULL) &&
- (nreno->newreno_flags & CC_NEWRENO_HYSTART_ENABLED)) {
+ if (nreno->newreno_flags & CC_NEWRENO_HYSTART_ENABLED) {
/* Make sure the flags are all off we had a loss */
nreno->newreno_flags &= ~CC_NEWRENO_HYSTART_ENABLED;
nreno->newreno_flags &= ~CC_NEWRENO_HYSTART_IN_CSS;
@@ -466,41 +434,6 @@
}
}
-/*
- * Perform any necessary tasks before we exit congestion recovery.
- */
-static void
-newreno_post_recovery(struct cc_var *ccv)
-{
- int pipe;
-
- if (IN_FASTRECOVERY(CCV(ccv, t_flags))) {
- /*
- * Fast recovery will conclude after returning from this
- * function. Window inflation should have left us with
- * approximately snd_ssthresh outstanding data. But in case we
- * would be inclined to send a burst, better to do it via the
- * slow start mechanism.
- *
- * XXXLAS: Find a way to do this without needing curack
- */
- if (V_tcp_do_newsack)
- pipe = tcp_compute_pipe(ccv->ccvc.tcp);
- else
- pipe = CCV(ccv, snd_max) - ccv->curack;
-
- if (pipe < CCV(ccv, snd_ssthresh))
- /*
- * Ensure that cwnd does not collapse to 1 MSS under
- * adverse conditons. Implements RFC6582
- */
- CCV(ccv, snd_cwnd) = max(pipe, CCV(ccv, t_maxseg)) +
- CCV(ccv, t_maxseg);
- else
- CCV(ccv, snd_cwnd) = CCV(ccv, snd_ssthresh);
- }
-}
-
static int
newreno_ctl_output(struct cc_var *ccv, struct sockopt *sopt, void *buf)
{
@@ -723,4 +656,4 @@
DECLARE_CC_MODULE(newreno, &newreno_cc_algo);
-MODULE_VERSION(newreno, 1);
+MODULE_VERSION(newreno, 2);
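The `newreno_post_recovery()` body deleted above (presumably now provided to all modules as the shared `newreno_cc_post_recovery()`) implements the RFC 6582 exit-from-recovery window: clamp cwnd to ssthresh, but never collapse it to a single MSS when little data is in flight. A sketch of just that computation, with the tcpcb fields flattened into plain parameters for illustration:

```c
/* RFC 6582-style cwnd on fast-recovery exit: if the amount of data in
 * flight ("pipe") is below ssthresh, restart from roughly pipe plus one
 * segment (at least two segments total) via slow start; otherwise use
 * ssthresh directly. */
static unsigned long
post_recovery_cwnd(unsigned long pipe, unsigned long ssthresh,
    unsigned long maxseg)
{
	if (pipe < ssthresh) {
		/* Ensure cwnd does not collapse to 1 MSS under
		 * adverse conditions. */
		return ((pipe > maxseg ? pipe : maxseg) + maxseg);
	}
	return (ssthresh);
}
```

With `maxseg` = 1460: a fully drained pipe restarts at two segments, a half-full pipe at pipe + 1 MSS, and a pipe at or above ssthresh pins cwnd to ssthresh.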
diff --git a/sys/netinet/cc/cc_vegas.c b/sys/netinet/cc/cc_vegas.c
--- a/sys/netinet/cc/cc_vegas.c
+++ b/sys/netinet/cc/cc_vegas.c
@@ -71,6 +71,10 @@
#include <net/vnet.h>
+#include <net/route.h>
+#include <net/route/nhop.h>
+
+#include <netinet/in_pcb.h>
#include <netinet/tcp.h>
#include <netinet/tcp_timer.h>
#include <netinet/tcp_var.h>
@@ -87,10 +91,11 @@
static void vegas_ack_received(struct cc_var *ccv, uint16_t ack_type);
static void vegas_cb_destroy(struct cc_var *ccv);
-static int vegas_cb_init(struct cc_var *ccv);
+static int vegas_cb_init(struct cc_var *ccv, void *ptr);
static void vegas_cong_signal(struct cc_var *ccv, uint32_t signal_type);
static void vegas_conn_init(struct cc_var *ccv);
static int vegas_mod_init(void);
+static size_t vegas_data_sz(void);
struct vegas {
int slow_start_toggle;
@@ -103,9 +108,6 @@
#define V_vegas_alpha VNET(vegas_alpha)
#define V_vegas_beta VNET(vegas_beta)
-static MALLOC_DEFINE(M_VEGAS, "vegas data",
- "Per connection data required for the Vegas congestion control algorithm");
-
struct cc_algo vegas_cc_algo = {
.name = "vegas",
.ack_received = vegas_ack_received,
@@ -113,7 +115,10 @@
.cb_init = vegas_cb_init,
.cong_signal = vegas_cong_signal,
.conn_init = vegas_conn_init,
- .mod_init = vegas_mod_init
+ .mod_init = vegas_mod_init,
+ .cc_data_sz = vegas_data_sz,
+ .after_idle = newreno_cc_after_idle,
+ .post_recovery = newreno_cc_post_recovery,
};
/*
@@ -162,24 +167,33 @@
}
if (vegas_data->slow_start_toggle)
- newreno_cc_algo.ack_received(ccv, ack_type);
+ newreno_cc_ack_received(ccv, ack_type);
}
static void
vegas_cb_destroy(struct cc_var *ccv)
{
- free(ccv->cc_data, M_VEGAS);
+ free(ccv->cc_data, M_CC_MEM);
+}
+
+static size_t
+vegas_data_sz(void)
+{
+ return (sizeof(struct vegas));
}
static int
-vegas_cb_init(struct cc_var *ccv)
+vegas_cb_init(struct cc_var *ccv, void *ptr)
{
struct vegas *vegas_data;
- vegas_data = malloc(sizeof(struct vegas), M_VEGAS, M_NOWAIT);
-
- if (vegas_data == NULL)
- return (ENOMEM);
+ INP_WLOCK_ASSERT(ccv->ccvc.tcp->t_inpcb);
+ if (ptr == NULL) {
+ vegas_data = malloc(sizeof(struct vegas), M_CC_MEM, M_NOWAIT);
+ if (vegas_data == NULL)
+ return (ENOMEM);
+ } else
+ vegas_data = ptr;
vegas_data->slow_start_toggle = 1;
ccv->cc_data = vegas_data;
@@ -216,7 +230,7 @@
break;
default:
- newreno_cc_algo.cong_signal(ccv, signal_type);
+ newreno_cc_cong_signal(ccv, signal_type);
}
if (IN_RECOVERY(CCV(ccv, t_flags)) && !presignalrecov)
@@ -236,16 +250,11 @@
static int
vegas_mod_init(void)
{
-
ertt_id = khelp_get_id("ertt");
if (ertt_id <= 0) {
printf("%s: h_ertt module not found\n", __func__);
return (ENOENT);
}
-
- vegas_cc_algo.after_idle = newreno_cc_algo.after_idle;
- vegas_cc_algo.post_recovery = newreno_cc_algo.post_recovery;
-
return (0);
}
@@ -301,5 +310,5 @@
"vegas beta, specified as number of \"buffers\" (0 < alpha < beta)");
DECLARE_CC_MODULE(vegas, &vegas_cc_algo);
-MODULE_VERSION(vegas, 1);
+MODULE_VERSION(vegas, 2);
MODULE_DEPEND(vegas, ertt, 1, 1, 1);
diff --git a/sys/netinet/tcp_subr.c b/sys/netinet/tcp_subr.c
--- a/sys/netinet/tcp_subr.c
+++ b/sys/netinet/tcp_subr.c
@@ -2137,8 +2137,9 @@
*/
CC_LIST_RLOCK();
KASSERT(!STAILQ_EMPTY(&cc_list), ("cc_list is empty!"));
- CC_ALGO(tp) = CC_DEFAULT();
+ CC_ALGO(tp) = CC_DEFAULT_ALGO();
CC_LIST_RUNLOCK();
+
/*
* The tcpcb will hold a reference on its inpcb until tcp_discardcb()
* is called.
@@ -2147,7 +2148,7 @@
tp->t_inpcb = inp;
if (CC_ALGO(tp)->cb_init != NULL)
- if (CC_ALGO(tp)->cb_init(tp->ccv) > 0) {
+ if (CC_ALGO(tp)->cb_init(tp->ccv, NULL) > 0) {
if (tp->t_fb->tfb_tcp_fb_fini)
(*tp->t_fb->tfb_tcp_fb_fini)(tp, 1);
in_pcbrele_wlocked(inp);
@@ -2240,25 +2241,23 @@
}
/*
- * Switch the congestion control algorithm back to NewReno for any active
- * control blocks using an algorithm which is about to go away.
- * This ensures the CC framework can allow the unload to proceed without leaving
- * any dangling pointers which would trigger a panic.
- * Returning non-zero would inform the CC framework that something went wrong
- * and it would be unsafe to allow the unload to proceed. However, there is no
- * way for this to occur with this implementation so we always return zero.
+ * Switch the congestion control algorithm back to the vnet default for any
+ * active control blocks using an algorithm which is about to go away. If the
+ * new algorithm has a cb_init function and it fails (e.g. no memory), then
+ * the operation fails and the unload will not succeed.
*/
int
tcp_ccalgounload(struct cc_algo *unload_algo)
{
- struct cc_algo *tmpalgo;
+ struct cc_algo *oldalgo, *newalgo;
struct inpcb *inp;
struct tcpcb *tp;
VNET_ITERATOR_DECL(vnet_iter);
/*
* Check all active control blocks across all network stacks and change
- * any that are using "unload_algo" back to NewReno. If "unload_algo"
- * any that are using "unload_algo" back to the vnet default. If "unload_algo"
* requires cleanup code to be run, call it.
*/
VNET_LIST_RLOCK();
@@ -2272,6 +2271,7 @@
* therefore don't enter the loop below until the connection
* list has stabilised.
*/
+ newalgo = CC_DEFAULT_ALGO();
CK_LIST_FOREACH(inp, &V_tcb, inp_list) {
INP_WLOCK(inp);
/* Important to skip tcptw structs. */
@@ -2280,24 +2280,48 @@
/*
* By holding INP_WLOCK here, we are assured
* that the connection is not currently
- * executing inside the CC module's functions
- * i.e. it is safe to make the switch back to
- * NewReno.
+ * executing inside the CC module's functions.
+ * We attempt to switch to the vnet default;
+ * if its init fails then we fail the whole
+ * operation and the module unload will fail.
*/
if (CC_ALGO(tp) == unload_algo) {
- tmpalgo = CC_ALGO(tp);
- if (tmpalgo->cb_destroy != NULL)
- tmpalgo->cb_destroy(tp->ccv);
- CC_DATA(tp) = NULL;
- /*
- * NewReno may allocate memory on
- * demand for certain stateful
- * configuration as needed, but is
- * coded to never fail on memory
- * allocation failure so it is a safe
- * fallback.
- */
- CC_ALGO(tp) = &newreno_cc_algo;
+ struct cc_var cc_mem;
+ int err;
+
+ oldalgo = CC_ALGO(tp);
+ memset(&cc_mem, 0, sizeof(cc_mem));
+ cc_mem.ccvc.tcp = tp;
+ if (newalgo->cb_init == NULL) {
+ /*
+ * No cb_init, so we can skip the
+ * dance around a possible failure.
+ */
+ CC_DATA(tp) = NULL;
+ goto proceed;
+ }
+ err = (newalgo->cb_init)(&cc_mem, NULL);
+ if (err) {
+ /*
+ * Presumably no memory; the caller will
+ * need to try again.
+ */
+ INP_WUNLOCK(inp);
+ INP_INFO_WUNLOCK(&V_tcbinfo);
+ CURVNET_RESTORE();
+ VNET_LIST_RUNLOCK();
+ return (err);
+ }
+proceed:
+ if (oldalgo->cb_destroy != NULL)
+ oldalgo->cb_destroy(tp->ccv);
+ CC_ALGO(tp) = newalgo;
+ memcpy(tp->ccv, &cc_mem, sizeof(struct cc_var));
+ if (TCPS_HAVEESTABLISHED(tp->t_state) &&
+ (CC_ALGO(tp)->conn_init != NULL)) {
+ /* Yep run the connection init for the new CC */
+ CC_ALGO(tp)->conn_init(tp->ccv);
+ }
}
}
INP_WUNLOCK(inp);
@@ -2306,7 +2330,6 @@
CURVNET_RESTORE();
}
VNET_LIST_RUNLOCK();
-
return (0);
}
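The unload path above initialises the replacement module's state into a stack `cc_var` first and only commits (destroy the old module's state, then `memcpy` the stack copy in) once the init has succeeded, so a failed allocation leaves the connection untouched. A hypothetical, lock-free sketch of that commit ordering:

```c
#include <errno.h>
#include <string.h>

struct cc_var_sk {
	void *cc_data;
};

struct cc_algo_sk {
	int  (*cb_init)(struct cc_var_sk *, void *);
	void (*cb_destroy)(struct cc_var_sk *);
};

/* Tiny stand-ins for exercising cc_switch(). */
static int
init_ok(struct cc_var_sk *v, void *p)
{
	(void)p;
	v->cc_data = (void *)&v->cc_data;	/* any non-NULL marker */
	return (0);
}

static int
init_fail(struct cc_var_sk *v, void *p)
{
	(void)v;
	(void)p;
	return (ENOMEM);
}

/* Switch ccv from oldalgo to newalgo without ever leaving it in a
 * half-initialised state: init into stack memory, then commit by copy. */
static int
cc_switch(struct cc_var_sk *ccv, const struct cc_algo_sk *oldalgo,
    const struct cc_algo_sk *newalgo)
{
	struct cc_var_sk cc_mem;
	int err;

	memset(&cc_mem, 0, sizeof(cc_mem));
	if (newalgo->cb_init != NULL) {
		err = newalgo->cb_init(&cc_mem, NULL);
		if (err != 0)
			return (err);	/* old state still intact */
	}
	if (oldalgo->cb_destroy != NULL)
		oldalgo->cb_destroy(ccv);
	memcpy(ccv, &cc_mem, sizeof(cc_mem));
	return (0);
}
```

On failure the caller (here, the unload loop) backs the whole operation out, which is why `tcp_ccalgounload()` now returns the error instead of always succeeding.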
diff --git a/sys/netinet/tcp_usrreq.c b/sys/netinet/tcp_usrreq.c
--- a/sys/netinet/tcp_usrreq.c
+++ b/sys/netinet/tcp_usrreq.c
@@ -2007,6 +2007,115 @@
}
#endif
+extern struct cc_algo newreno_cc_algo;
+
+static int
+tcp_congestion(struct socket *so, struct sockopt *sopt, struct inpcb *inp, struct tcpcb *tp)
+{
+ struct cc_algo *algo;
+ void *ptr = NULL;
+ struct cc_var cc_mem;
+ char buf[TCP_CA_NAME_MAX];
+ size_t mem_sz;
+ int error;
+
+ INP_WUNLOCK(inp);
+ error = sooptcopyin(sopt, buf, TCP_CA_NAME_MAX - 1, 1);
+ if (error)
+ return (error);
+ buf[sopt->sopt_valsize] = '\0';
+ CC_LIST_RLOCK();
+ STAILQ_FOREACH(algo, &cc_list, entries)
+ if (strncmp(buf, algo->name,
+ TCP_CA_NAME_MAX) == 0) {
+ if (algo->flags & CC_MODULE_BEING_REMOVED) {
+ /* We can't "see" modules being unloaded */
+ continue;
+ }
+ break;
+ }
+ if (algo == NULL) {
+ CC_LIST_RUNLOCK();
+ return (ESRCH);
+ }
+do_over:
+ if (algo->cb_init != NULL) {
+ /* We can now pre-get the memory for the CC */
+ mem_sz = (*algo->cc_data_sz)();
+ if (mem_sz == 0) {
+ goto no_mem_needed;
+ }
+ CC_LIST_RUNLOCK();
+ ptr = malloc(mem_sz, M_CC_MEM, M_WAITOK);
+ CC_LIST_RLOCK();
+ STAILQ_FOREACH(algo, &cc_list, entries)
+ if (strncmp(buf, algo->name,
+ TCP_CA_NAME_MAX) == 0)
+ break;
+ if (algo == NULL) {
+ if (ptr)
+ free(ptr, M_CC_MEM);
+ CC_LIST_RUNLOCK();
+ return (ESRCH);
+ }
+ } else {
+no_mem_needed:
+ mem_sz = 0;
+ ptr = NULL;
+ }
+ /*
+ * Make sure it's all clean and zeroed and also get
+ * back the inp lock.
+ */
+ memset(&cc_mem, 0, sizeof(cc_mem));
+ if (mem_sz != (*algo->cc_data_sz)()) {
+ if (ptr)
+ free(ptr, M_CC_MEM);
+ goto do_over;
+ }
+ if (ptr) {
+ memset(ptr, 0, mem_sz);
+ INP_WLOCK_RECHECK_CLEANUP(inp, free(ptr, M_CC_MEM));
+ } else
+ INP_WLOCK_RECHECK(inp);
+ CC_LIST_RUNLOCK();
+ cc_mem.ccvc.tcp = tp;
+ /*
+ * We once again hold a write lock over the tcb so it's
+ * safe to do these things without ordering concerns.
+ * Note here we init into stack memory.
+ */
+ if (algo->cb_init != NULL)
+ error = algo->cb_init(&cc_mem, ptr);
+ else
+ error = 0;
+ /*
+ * The CC algorithms, when given their memory,
+ * should not fail; we could in theory have a
+ * KASSERT here.
+ */
+ if (error == 0) {
+ /*
+ * Touchdown, let's go ahead and move the
+ * connection to the new CC module by
+ * copying in the cc_mem after we call
+ * the old one's cleanup (if any).
+ */
+ if (CC_ALGO(tp)->cb_destroy != NULL)
+ CC_ALGO(tp)->cb_destroy(tp->ccv);
+ memcpy(tp->ccv, &cc_mem, sizeof(struct cc_var));
+ tp->cc_algo = algo;
+ /* Are we already past where conn_init would have run? */
+ if (TCPS_HAVEESTABLISHED(tp->t_state) && (CC_ALGO(tp)->conn_init != NULL)) {
+ /* Yep run the connection init for the new CC */
+ CC_ALGO(tp)->conn_init(tp->ccv);
+ }
+ } else if (ptr)
+ free(ptr, M_CC_MEM);
+ INP_WUNLOCK(inp);
+ return (error);
+}
+
int
tcp_default_ctloutput(struct socket *so, struct sockopt *sopt, struct inpcb *inp, struct tcpcb *tp)
{
@@ -2016,7 +2125,6 @@
#ifdef KERN_TLS
struct tls_enable tls;
#endif
- struct cc_algo *algo;
char *pbuf, buf[TCP_LOG_ID_LEN];
#ifdef STATS
struct statsblob *sbp;
@@ -2223,46 +2331,7 @@
break;
case TCP_CONGESTION:
- INP_WUNLOCK(inp);
- error = sooptcopyin(sopt, buf, TCP_CA_NAME_MAX - 1, 1);
- if (error)
- break;
- buf[sopt->sopt_valsize] = '\0';
- INP_WLOCK_RECHECK(inp);
- CC_LIST_RLOCK();
- STAILQ_FOREACH(algo, &cc_list, entries)
- if (strncmp(buf, algo->name,
- TCP_CA_NAME_MAX) == 0)
- break;
- CC_LIST_RUNLOCK();
- if (algo == NULL) {
- INP_WUNLOCK(inp);
- error = EINVAL;
- break;
- }
- /*
- * We hold a write lock over the tcb so it's safe to
- * do these things without ordering concerns.
- */
- if (CC_ALGO(tp)->cb_destroy != NULL)
- CC_ALGO(tp)->cb_destroy(tp->ccv);
- CC_DATA(tp) = NULL;
- CC_ALGO(tp) = algo;
- /*
- * If something goes pear shaped initialising the new
- * algo, fall back to newreno (which does not
- * require initialisation).
- */
- if (algo->cb_init != NULL &&
- algo->cb_init(tp->ccv) != 0) {
- CC_ALGO(tp) = &newreno_cc_algo;
- /*
- * The only reason init should fail is
- * because of malloc.
- */
- error = ENOMEM;
- }
- INP_WUNLOCK(inp);
+ error = tcp_congestion(so, sopt, inp, tp);
break;
case TCP_REUSPORT_LB_NUMA:
diff --git a/sys/powerpc/conf/GENERIC b/sys/powerpc/conf/GENERIC
--- a/sys/powerpc/conf/GENERIC
+++ b/sys/powerpc/conf/GENERIC
@@ -38,6 +38,8 @@
options VIMAGE # Subsystem virtualization, e.g. VNET
options INET #InterNETworking
options INET6 #IPv6 communications protocols
+options CC_NEWRENO # include newreno congestion control
+options CC_DEFAULT=\"newreno\" # define the default CC module; it must be compiled in.
options IPSEC_SUPPORT # Allow kldload of ipsec and tcpmd5
options TCP_HHOOK # hhook(9) framework for TCP
options TCP_RFC7413 # TCP Fast Open
diff --git a/sys/riscv/conf/GENERIC b/sys/riscv/conf/GENERIC
--- a/sys/riscv/conf/GENERIC
+++ b/sys/riscv/conf/GENERIC
@@ -29,6 +29,8 @@
options VIMAGE # Subsystem virtualization, e.g. VNET
options INET # InterNETworking
options INET6 # IPv6 communications protocols
+options CC_NEWRENO # include newreno congestion control
+options CC_DEFAULT=\"newreno\" # define the default CC module; it must be compiled in.
options TCP_HHOOK # hhook(9) framework for TCP
options IPSEC_SUPPORT # Allow kldload of ipsec and tcpmd5
options ROUTE_MPATH # Multipath routing support
