Setup network¶
For network workload, the test environment must be prepared in advance.
Install high-speed network NIC and link multiple workers together using either a high-speed switch or point-to-point cross-link cables. The network interface can be either a physical interface or a logical bond of multiple physical interfaces.
For example, at least two Ice Lake (or Sapphire Rapids) Linux hosts running CentOS Stream. On each Linux Host, you shall install one Intel E810-CQDA2 or E810-2CQDA2 100GE NIC adaptor (each adaptor has two 100GE ports) under CPU0 IIO root port PCIe bus (DO NOT install the 100GE NIC adaptor under PCH PCIe bus).
Network Setup¶
-
Use two 100GE copper(or optical) cables to connect those 100GE ports, connection method:
> Note: Make sure those 100GE ports is link up from NIC ports LED green light after machine power on again -
Make sure you Linux Hosts installed Linux kernel 100GE driver package "ice" for E810 adaptor:
Use Linux command to make sure ice driver/firmware version and 100GE ports link up Status/Duplex/Speed is 100GE full duplex link up.
- ethX is the interface name of each 100GE port in Linux host
ifconfigoutput. - you can use Linux kernel in-tree ice driver package. Just FYI: latest ice driver package download: https://sourceforge.net/projects/e1000/files/ice%20stable/
- ethX is the interface name of each 100GE port in Linux host
-
Use ifconfig command to config correct IP address with two different subnets(eg, 192.168.8.x and 192.168.9.x) for two 100GE connections, ping each other Linux Host to make sure those 100GE connections works fine;
For Example, you can configure IP address like this:
Host0 100GE physical port0(192.168.8.88) <-> Host1 100GE physical port0(192.168.8.99).
Host0 100GE physical port1(192.168.9.88) <-> Host1 100GE physical port1(192.168.9.99).- Host0
- Host1
- Host0
-
On each Linux host, use command to find out two E810 devices PCIe bus:device:function numbers, bind E810 PCIe devices to DPDK will use this PCIe bus:device:function number.
-
On each Linux host, install currently running kernel version kernel-devel package, then you can compile kernel modules .ko;
- make sure installed kernel-devel version equals
uname -rversion.
- make sure installed kernel-devel version equals
-
On each Linux host, we now suggest to bind NIC port to vfio-pci because igb_uio is not supported in newest OS.
To use vfio-pci we need to add intel_iommu=on iommu=pt to the Linux Boot Command line.
Also enable Intel vt-d in bios:EDKII Menu → Socket Configuration → IIO Configuration → Intel VT for directed IO (VT-d) → Intel VT for directed IO → Enable
Command to check Virtualization Technology and IOMMU enable:
Then plugin vfio-pci module:
7. On each Linux host, use command to set 100GE ports as ifconfig down state. * if E810 ports is ifconfig up state, it cannot bind to DPDK vfio-pci. -
On each Linux Host, download DPDK21.11.tar.xz, decompress it, use dpdk-devbind.py to bind 100GE ports to DPDK vfio-pci
wget http://fast.dpdk.org/rel/dpdk-21.11.tar.xz tar xf dpdk-21.11.tar.xz cd dpdk-stable-21.11/usertools ./dpdk-devbind.py --status ./dpdk-devbind.py --bind=vfio-pci bus:dev:func bus:dev:func ./dpdk-devbind.py --status-
Note: if your dpdk-devbind.py cannot running, run:
-
Note: if you want to bind 100GE port to kernel driver "ice" again run:
-
-
Use these commands to add labels to K8S nodes.
Currently workloads l3fwd/vppfib/ngfw/ovs-dpdk/pktgen need this label to schedule.
It indicates that the two 100GE ports have been bound to DPDK vfio-pci driver on the Linux host correctly.
-
On Linux grub boot up command line, you need to setup huge page numbers >= 2048.
The workload requests 2048 2MB hugepages. See Hugepage Setup for setup instructions.
After properly setup two Linux hosts 2MB hugepages, use these commands to make K8S two nodes label, currently l3fwd/pktgen k8s scheduling need this label.
-
Install K8S/Docker environment, configure one Linux host as k8s master and node, another Linux host as k8s node.
This workload can run with K8S backend; this workload cannot run with single host Docker backend;
-
Create configuration file named
network_env.confin the etc directory. This file contains 100G port or device information. The context should follow the following format:/etc/network_env.confdpdk_port1=0000:38:00.0 dpdk_port2=0000:38:00.1 dsa_dev1=0000:6a:01.0 dsa_dev2=0000:6f:01.0 dpdk_port1_srcmac_l3fwd=0xb4,0x96,0x91,0xb2,0xa6,0x48 dpdk_port2_srcmac_l3fwd=0xb4,0x96,0x91,0xb2,0xa6,0x49 dpdk_port1_destmac_l3fwd=0xb4,0x96,0x91,0xc3,0x85,0xd8 dpdk_port1_destmac_l3fwd_tx=b4:96:91:c3:85:d8 dpdk_port2_destmac_l3fwd=0xb4,0x96,0x91,0xc3,0x85,0xd9 dpdk_port2_destmac_l3fwd_tx=b4:96:91:c3:85:d9 dpdk_port_mac1=0xb4,0x96,0x91,0xb2,0xa5,0x10 dpdk_port_mac2=0xb4,0x96,0x91,0xb2,0xa5,0x11 neigh_port_mac1=0xb4,0x96,0x91,0x9b,0x79,0x38 neigh_port_mac2=0xb4,0x96,0x91,0x9b,0x79,0x39-
dpdk_port[1-9] means 100G port pci number.This port is bound to vfio_pci and used by dpdk. You can use command
lspci | grep Eth | grep E810to get pci number. -
dsa_dev[1-9] means DSA device number.For Intel® DSA devices, they are currently (at time of writing) appearing as devices with type “0b25”, due to the absence of pci-id database entries for them at this point. This device is bound to vfio_pci and used by dpdk. You can use command
lspci | grep 0b25to get dsa device number. -
dpdk_port_mac[1-9]: MAC address of NIC port with format as example 0xb4,0x96,0x91,0xb2,0xa6,0xd8.
-
neigh_port_mac[1-9]: MAC address of NIC port which connected with dpdk_port on another test machine.
-
dpdk_port[1-9]_srcmac_l3fwd: MAC address of NIC in this host port with format as example 0xb4,0x96,0x91,0xb2,0xa6,0x48.
-
dpdk_port[1-9]_destmac_l3fwd: MAC address of NIC port which connected with dpdk_port on another test machine.
-
dpdk_port[1-9]_destmac_l3fwd_tx: MAC address of NIC port which connected with dpdk_port on another test machine. Format should be b4:96:91:c3:85:d8
-
A workload can request network as follows:
- HAS-SETUP-NETWORK-SPEC-1: The worker node must have a set of network, whose specification, netwrok_spec_1, is specified in the terraform configuration files.
Node Labels:¶
Label the worker nodes with the following node labels:
- HAS-SETUP-NIC-25G=yes: Optional.
- HAS-SETUP-NIC-40G=yes: Optional.
- HAS-SETUP-NIC-100G=yes: Optional.
- HAS-SETUP-DPDK=yes: Required.
- HAS-SETUP-HUGEPAGE-2048kB-2048=yes: Required.
- HAS-SETUP-NETWORK-SPEC-1=yes: The worker node is equipped with the network described in network_spec_1.