Merge pull request #887 from iqiyi/devel
merge v1.9.4 to master
ywc689 authored Apr 21, 2023
2 parents b524e36 + 2298b0c commit 30e5588
Showing 56 changed files with 4,575 additions and 4,537 deletions.
3 changes: 2 additions & 1 deletion conf/dpvs.bond.conf.sample
Original file line number Diff line number Diff line change
@@ -15,7 +15,8 @@ global_defs {
log_level WARNING
! log_file /var/log/dpvs.log
! log_async_mode off
! pdump off
! kni on
! pdump off
}

! netif config
1 change: 1 addition & 0 deletions conf/dpvs.conf.items
@@ -18,6 +18,7 @@ global_defs {
<init> log_async_mode off <off, on|off>
<init> log_async_pool_size 16383 <16383, 1023-unlimited>
<init> pdump off <off, on|off>
<init> kni on <on, on|off>
}

! netif config
3 changes: 2 additions & 1 deletion conf/dpvs.conf.sample
@@ -15,7 +15,8 @@ global_defs {
log_level WARNING
! log_file /var/log/dpvs.log
! log_async_mode on
! pdump off
! kni on
! pdump off
}

! netif config
1 change: 1 addition & 0 deletions conf/dpvs.conf.single-bond.sample
@@ -15,6 +15,7 @@ global_defs {
log_level WARNING
! log_file /var/log/dpvs.log
! log_async_mode on
! kni on
}

! netif config
1 change: 1 addition & 0 deletions conf/dpvs.conf.single-nic.sample
@@ -15,6 +15,7 @@ global_defs {
log_level WARNING
! log_file /var/log/dpvs.log
! log_async_mode on
! kni on
}

! netif config
108 changes: 107 additions & 1 deletion doc/tutorial.md
@@ -22,6 +22,7 @@ DPVS Tutorial
* [UDP Option of Address (UOA)](#uoa)
* [Launch DPVS in Virtual Machine (Ubuntu)](#Ubuntu16.04)
* [Traffic Control(TC)](#tc)
* [Multiple Instances](#multi-instance)
* [Debug DPVS](#debug)
- [Debug with Log](#debug-with-log)
- [Packet Capture and Tcpdump](#packet-capture)
@@ -1191,6 +1192,111 @@ worker_defs {
Please refer to doc [tc.md](tc.md).
<a id='multi-instance'/>
# Multiple Instances
Generally, DPVS is a network process running on a physical server, which is usually equipped with dozens of CPUs and ample memory. DPVS is CPU/memory efficient, so the CPU/memory resources of a typical physical server are usually far from fully used. We may therefore want to run multiple independent DPVS instances on one server to make the most of it. A DPVS instance may use 1~4 NIC ports, depending on whether the ports are bonded and whether the network topology is two-arm or one-arm. Extra NICs are needed to run multiple DPVS instances because a NIC port must be managed by only one DPVS instance. The details of running multiple DPVS instances are described below.
#### CPU Isolation
The CPUs used by DPVS always run in a busy loop. If a CPU is assigned to two DPVS instances simultaneously, both instances suffer from dramatic processing delay. So different instances must run on different CPUs, which is achieved by the steps below.
- Start DPVS with EAL options `-l CORELIST` or `--lcores COREMAP` or `-c COREMASK` to specify on which CPUs the instance is to run.
- Configure the corresponding CPUs in the DPVS config file (config key: worker_defs/worker */cpu_id).
On NUMA-aware platforms, it is suggested to select CPUs and NIC ports on the same NUMA node. Performance degrades if the NIC ports and CPUs of a DPVS instance are on different NUMA nodes.
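As a rough pre-flight check for the rule above, the sketch below compares the core lists passed to two instances and reports any CPU that appears in both. The helper names `expand_corelist` and `corelist_overlap` are hypothetical and not part of DPVS or DPDK, and only the simple `-l` list syntax (`0-8`, `0,2,4-6`) is handled, not `--lcores` maps or `-c` masks.

```sh
#!/bin/sh
# Hypothetical helper: expand a core list such as "0-8" or "0,2,4-6"
# into one CPU id per line.
expand_corelist() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
        # A single id like "4" has no upper bound; treat it as the range 4-4.
        [ -z "$hi" ] && hi=$lo
        seq "$lo" "$hi"
    done
}

# Hypothetical helper: print the CPU ids present in both core lists.
# Empty output means the two lists are disjoint.
corelist_overlap() {
    a=$(expand_corelist "$1" | sort -n -u)
    b=$(expand_corelist "$2" | sort -n -u)
    printf '%s\n%s\n' "$a" "$b" | sort -n | uniq -d
}

# Core lists taken from the two dpvs example commands in this section.
overlap=$(corelist_overlap "0-8" "12-20")
if [ -n "$overlap" ]; then
    echo "CPU conflict between instances:" $overlap
else
    echo "core lists are disjoint"
fi
```

The same CPU ids should also match the `cpu_id` values in each instance's config file; this sketch does not parse the config file.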
#### Memory Isolation
DPVS takes advantage of hugepage memory. The hugepage memory of different DPVS instances can be isolated by using different memory mapping files. The DPDK EAL option `--file-prefix` specifies the name prefix of the memory mapping files. Thus multiple DPVS instances can run simultaneously, provided each one is started with a unique hugepage name prefix via this EAL option.
#### Process Isolation
* DPVS Process Isolation
Every DPVS instance must have a unique PID file, config file, and IPC socket file, which are specified by the following DPVS options respectively.
-p, --pid-file FILE
-c, --conf FILE
-x, --ipc-file FILE
For example,
```sh
./bin/dpvs -c /etc/dpvs1.conf -p /var/run/dpvs1.pid -x /var/run/dpvs1.ipc -- --file-prefix=dpvs1 -a 0000:4b:00.0 -a 0000:4b:00.1 -l 0-8 --main-lcore 0
```
* Keepalived Process Isolation
One DPVS instance corresponds to one keepalived instance, and vice versa. Similarly, different keepalived processes must have unique config files and PID files. Note that, depending on the configuration, keepalived for DPVS may consist of 3 daemon processes, i.e., the main process, the health check subprocess, and the vrrp subprocess. The config files and PID files for different keepalived instances can be specified by the following options, respectively.
-f, --use-file=FILE
-p, --pid=FILE
-c, --checkers_pid=FILE
-r, --vrrp_pid=FILE
For example,
```sh
./bin/keepalived -D -f etc/keepalived/keepalived1.conf --pid=/var/run/keepalived1.pid --vrrp_pid=/var/run/vrrp1.pid --checkers_pid=/var/run/checkers1.pid
```
#### Talk to different DPVS instances with dpip/ipvsadm
`dpip` and `ipvsadm` are the utility tools used to configure DPVS. By default, they work on a single-instance server without any extra settings. On a server with multiple DPVS instances, however, the environment variable `DPVS_IPC_FILE` should be set to the IPC socket file of the DPVS instance that ipvsadm/dpip is to talk to. Refer to the previous part "DPVS Process Isolation" for how to specify different IPC socket files for multiple DPVS instances. For example,
```sh
DPVS_IPC_FILE=/var/run/dpvs1.ipc ipvsadm -ln
# or equivalently,
export DPVS_IPC_FILE=/var/run/dpvs1.ipc
ipvsadm -ln
```
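When several instances are managed from one shell, a small wrapper can make the target instance explicit on every call. The following is only a sketch: `dpvs_ipc_path` and `with_dpvs` are hypothetical helpers, not part of DPVS, and they assume each instance keeps its IPC socket at `/var/run/<name>.ipc` as in the examples above.

```sh
# Hypothetical helper: map an instance name to its IPC socket path,
# assuming the /var/run/<name>.ipc naming used in this section.
dpvs_ipc_path() {
    printf '/var/run/%s.ipc\n' "$1"
}

# Hypothetical helper: run any command with DPVS_IPC_FILE pointing at
# the named instance. The variable is set for that command only.
with_dpvs() {
    instance=$1
    shift
    DPVS_IPC_FILE=$(dpvs_ipc_path "$instance") "$@"
}

# Usage (requires the instances to be running):
#   with_dpvs dpvs1 ipvsadm -ln
#   with_dpvs dpvs2 dpip addr show
```

Setting the variable per command rather than exporting it avoids accidentally configuring the wrong instance later in the same shell session.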
#### NIC Ports, KNI and Routes
The multiple DPVS instances running on a server are independent; that is, DPVS adopts the deployment model [Running Multiple Independent DPDK Applications](https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#running-multiple-independent-dpdk-applications), which requires that the instances do not share any NIC ports. We can use the EAL options "-a, --allow" or "-b, --block" to allow or block NIC ports for a DPVS instance. However, the Linux KNI kernel module supports only one DPVS instance per network namespace (refer to [kernel/linux/kni/kni_misc.c](https://github.com/DPDK/dpdk/tree/main/kernel/linux/kni)). DPVS provides two solutions to this problem.
* Solution 1: Disable KNI on all DPVS instances except the first one. A global config item `kni` has been added to DPVS for this purpose.
```
# dpvs.conf
global_defs {
...
<init> kni on <default on, on|off>
...
}
```
* Solution 2: Run DPVS instances in different network namespaces. It also resolves the route conflicts for multiple KNI network ports of different DPVS instances. A typical procedure to run a DPVS instance in a network namespace is shown below.
Firstly, create a new network namespace, "dpvsns" for example.
```sh
/usr/sbin/ip netns add dpvsns
```
Secondly, move the NIC ports for this DPVS instance to the newly created network namespace.
```sh
/usr/sbin/ip link set eth1 netns dpvsns
/usr/sbin/ip link set eth2 netns dpvsns
/usr/sbin/ip link set eth3 netns dpvsns
```
Lastly, start DPVS and all its related processes (such as keepalived, routing daemon) in the network namespace.
```sh
/usr/sbin/ip netns exec dpvsns ./bin/dpvs -c /etc/dpvs2.conf -p /var/run/dpvs2.pid -x /var/run/dpvs2.ipc -- --file-prefix=dpvs2 -a 0000:cb:00.1 -a 0000:ca:00.0 -a 0000:ca:00.1 -l 12-20 --main-lcore 12
/usr/sbin/ip netns exec dpvsns ./bin/keepalived -D --pid=/var/run/keepalived2.pid --vrrp_pid=/var/run/vrrp2.pid --checkers_pid=/var/run/checkers2.pid -f etc/keepalived/keepalived2.conf
/usr/sbin/ip netns exec dpvsns /usr/sbin/bird -f -c /etc/bird2.conf -s /var/run/bird2/bird.ctl
...
```
For performance improvement, we can enable multiple kthread mode when multiple DPVS instances are deployed on a server. In this mode, each KNI port is processed by a dedicated kthread rather than a shared kthread.
```sh
insmod rte_kni.ko kthread_mode=multiple carrier=on
```
<a id='debug'/>
# Debug DPVS
@@ -1361,7 +1467,7 @@ $
### dpdk-pdump
The `dpdk-pdump` runs as a DPDK secondary process and is capable of enabling packet capture on dpdk ports. DPVS works as the primary process for dpdk-pdump, which shoud enable the packet capture framework by setting `global_defs/pdump` to be `on` in `/etc/dpvs.conf` when DPVS starts up.
The `dpdk-pdump` runs as a DPDK secondary process and is capable of enabling packet capture on dpdk ports. DPVS works as the primary process for dpdk-pdump, which should enable the packet capture framework by setting `global_defs/pdump` to be `on` in `/etc/dpvs.conf` when DPVS starts up.
Refer to [dpdk-pdump doc](https://doc.dpdk.org/guides/tools/pdump.html) for its usage. DPVS extends dpdk-pdump with a [DPDK patch](../patch/dpdk-stable-18.11.2/0005-enable-pdump-and-change-dpdk-pdump-tool-for-dpvs.patch) to add some packet filtering features. Run `dpdk-pdump -- --help` to find all supported pdump params.
4 changes: 2 additions & 2 deletions include/conf/blklst.h
@@ -29,7 +29,7 @@ struct dp_vs_blklst_entry {
union inet_addr addr;
};

struct dp_vs_blklst_conf {
typedef struct dp_vs_blklst_conf {
/* identify service */
int af;
uint8_t proto;
@@ -39,7 +39,7 @@ struct dp_vs_blklst_conf {

/* for set */
union inet_addr blklst;
};
} dpvs_blklst_t;

struct dp_vs_blklst_conf_array {
int naddr;
82 changes: 34 additions & 48 deletions include/conf/dest.h
@@ -19,6 +19,7 @@
#define __DPVS_DEST_CONF_H__

#include "conf/service.h"
#include "conf/match.h"
#include "conf/conn.h"

/*
@@ -40,72 +41,57 @@ enum {
DPVS_DEST_F_OVERLOAD = 0x1<<1,
};

struct dp_vs_dest_conf {
typedef struct dp_vs_dest_compat {
/* destination server address */
int af;
union inet_addr addr;
uint16_t port;
uint16_t proto;
uint32_t weight; /* destination weight */
union inet_addr addr;

unsigned conn_flags; /* connection flags */

enum dpvs_fwd_mode fwdmode;
/* real server options */
unsigned conn_flags; /* connection flags */
int weight; /* destination weight */

/* thresholds for active connections */
uint32_t max_conn; /* upper threshold */
uint32_t min_conn; /* lower threshold */
};

struct dp_vs_dest_entry {
int af;
union inet_addr addr; /* destination address */
uint16_t port;
unsigned conn_flags; /* connection flags */
int weight; /* destination weight */

uint32_t max_conn; /* upper threshold */
uint32_t min_conn; /* lower threshold */

uint32_t actconns; /* active connections */
uint32_t inactconns; /* inactive connections */
uint32_t persistconns; /* persistent connections */
uint32_t actconns; /* active connections */
uint32_t inactconns; /* inactive connections */
uint32_t persistconns; /* persistent connections */

/* statistics */
struct dp_vs_stats stats;
};

struct dp_vs_get_dests {
/* which service: user fills in these */
int af;
uint16_t proto;
union inet_addr addr; /* virtual address */
uint16_t port;
uint32_t fwmark; /* firwall mark of service */
#ifdef _HAVE_IPVS_TUN_TYPE_
int tun_type;
int tun_port;
#ifdef _HAVE_IPVS_TUN_CSUM_
int tun_flags;
#endif
#endif
} dpvs_dest_compat_t;

typedef struct dp_vs_dest_table {
int af;
uint16_t proto;
uint16_t port;
uint32_t fwmark;
union inet_addr addr;

/* number of real servers */
unsigned int num_dests;
unsigned int num_dests;

lcoreid_t cid;
struct dp_vs_match match;

char srange[256];
char drange[256];
char iifname[IFNAMSIZ];
char oifname[IFNAMSIZ];
lcoreid_t cid;
lcoreid_t index;

/* the real servers */
struct dp_vs_dest_entry entrytable[0];
};
dpvs_dest_compat_t entrytable[0];
} dpvs_dest_table_t;

struct dp_vs_dest_user {
int af;
union inet_addr addr;
uint16_t port;

unsigned conn_flags;
int weight;

uint32_t max_conn;
uint32_t min_conn;
};
#define dp_vs_get_dests dp_vs_dest_table
#define dp_vs_dest_entry dp_vs_dest_compat
#define dp_vs_dest_conf dp_vs_dest_compat

#endif /* __DPVS_DEST_CONF_H__ */
1 change: 1 addition & 0 deletions include/conf/inet.h
@@ -159,6 +159,7 @@ static inline int inet_addr_range_parse(const char *param,
port1 = port2 = NULL;
}

*af = 0;
memset(range, 0, sizeof(*range));

if (strlen(ip1) && inet_pton(AF_INET6, ip1, &range->min_addr.in6) > 0) {
12 changes: 6 additions & 6 deletions include/conf/laddr.h
@@ -24,6 +24,7 @@

#include "inet.h"
#include "net/if.h"
#include "conf/match.h"
#include "conf/sockopts.h"

struct dp_vs_laddr_entry {
@@ -33,18 +34,17 @@ struct dp_vs_laddr_entry {
uint32_t nconns;
};

struct dp_vs_laddr_conf {
typedef struct dp_vs_laddr_conf {
/* identify service */
int af_s;
uint8_t proto;
union inet_addr vaddr;
uint16_t vport;
uint32_t fwmark;
char srange[256];
char drange[256];
char iifname[IFNAMSIZ];
char oifname[IFNAMSIZ];

struct dp_vs_match match;
lcoreid_t cid;
lcoreid_t index;

/* for set */
int af_l;
@@ -54,6 +54,6 @@ struct dp_vs_laddr_conf {
/* for get */
int nladdrs;
struct dp_vs_laddr_entry laddrs[0];
};
} dpvs_laddr_table_t;

#endif /* __DPVS_LADDR_CONF_H__ */
35 changes: 35 additions & 0 deletions include/conf/match.h
@@ -42,6 +42,41 @@ static inline bool is_empty_match(const struct dp_vs_match *match)
return !memcmp(match, &zero_match, sizeof(*match));
}

static inline int dp_vs_match_parse(const char *srange, const char *drange,
const char *iifname, const char *oifname,
int af, struct dp_vs_match *match)
{
int s_af = 0, d_af = 0, err;

memset(match, 0, sizeof(*match));

if (srange && strlen(srange)) {
err = inet_addr_range_parse(srange, &match->srange, &s_af);
if (err != EDPVS_OK)
return err;
}

if (drange && strlen(drange)) {
err = inet_addr_range_parse(drange, &match->drange, &d_af);
if (err != EDPVS_OK)
return err;
}

if (s_af && d_af && s_af != d_af) {
return EDPVS_INVAL;
}
match->af = s_af | d_af;

if (af && match->af && af != match->af) {
return EDPVS_INVAL;
}

snprintf(match->iifname, IFNAMSIZ, "%s", iifname ? : "");
snprintf(match->oifname, IFNAMSIZ, "%s", oifname ? : "");

return EDPVS_OK;
}

static inline int parse_match(const char *pattern, uint8_t *proto,
struct dp_vs_match *match)
{