User Guide (Enhanced Comprehensive)¶

TL;DR: Deterministic comprehensive guide assembled from all mirrored source markdown files.

Source Files¶

Collecting Results List Kpi¶

Source: doc/user-guide/collecting-results/list-kpi.md

list-kpi.sh is used to display KPI results of a workload execution by scanning ctest log files.

Execute this command in the workload folder to fetch KPIs:

./list-kpi.sh --all <testcase>

Options for `list-kpi.sh`¶

Usage: [options] [logs-directory]
--primary             List only the primary KPI.  
--all                 List all KPIs.  
--params              List workload configurations.  
--sutinfo             List sutinfo information.   
--format list|xls-list
                      Specify the output format.
--file <filename>     Specify the spreadsheet filename.
--uri                 Show the WSF portal URI if present.   
--intel_publish       Publish to the WSF dashboard. 
--export-trace        Re-run the trace export routines.  
--owner <name>        Set the publisher owner.  
--tags <tags>         Set the publisher tags.  
--recent              List KPIs for recent testcases.

Executing Workload Cmake¶

Source: doc/user-guide/executing-workload/cmake.md

CMake uses platform-independent configuration files to generate native build tool files. Run cmake inside the build directory.

Build examples¶

cd build
cmake -DREGISTRY=xxyyzz.com:1234 ..

Customize the Build Process¶

You can use the following build options to customize the build process:

PLATFORM: Specify the platform names. See platforms for the list of platforms.
REGISTRY: Must end with forward slash (/). Specify the private docker registry URL. If specified, all built images will be pushed to the given docker registry.
REGISTRY_AUTH: Specify the registry authentication method. The only supported value is docker, which uses the docker configuration file.
RELEASE: Must begin with colon (:). Specify the release version. All built images will be tagged with it. Defaults to :latest
BACKEND: Specify the validation backend: docker, kubernetes, or terraform.
TERRAFORM_OPTIONS: Specify the terraform options.
TERRAFORM_SUT: Specify the target System Under Test (SUT) list.
TIMEOUT: Specify the validation timeout, which contains the execution timeout and docker pull timeout. Defaults to 28800,300 seconds.
BENCHMARK: Specify a workload pattern. Workloads not matching the pattern will be disabled. The workload pattern is in the format of <workload-path>/<workload-sub-target>, where <workload-path> is the workload path relative to the project root, and <workload-sub-target> is any sub-component defined in the workload. For example, some workloads define multiple versions and thus have multiple build and testcase targets.
SPOT_INSTANCE: If specified, overwrite the spot_instance variable in the Cloud configuration files.

cmake -DPLATFORM=xyz -DREGISTRY=xxyyzz.com:1234 -DBACKEND=xxyzz ..

cmake -DBENCHMARK=workload/SpecCpu-2017 ..  # all build targets of SpecCpu-2017
cmake -DBENCHMARK=workload/SpecCpu-2017/speccpu_2017_v119_gcc13_ubuntu24 .. # specific build target of SpecCpu-2017
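
Because REGISTRY must end with a forward slash and RELEASE must begin with a colon, a small wrapper can normalize both values before they are passed to cmake. The following is a minimal sketch only: the helper names normalize_registry and normalize_release are hypothetical and not part of the project scripts, and v1.0 is a made-up release value.

```shell
# Hypothetical helpers (not part of the project): normalize option values
# so that REGISTRY ends with "/" and RELEASE begins with ":".
normalize_registry() {
    case "$1" in
        ""|*/) printf '%s\n' "$1" ;;     # empty, or already ends with "/"
        *)     printf '%s/\n' "$1" ;;    # append the trailing slash
    esac
}

normalize_release() {
    case "$1" in
        "")  printf ':latest\n' ;;       # documented default
        :*)  printf '%s\n' "$1" ;;       # already begins with ":"
        *)   printf ':%s\n' "$1" ;;      # prepend the colon
    esac
}

REGISTRY=$(normalize_registry "xxyyzz.com:1234")
RELEASE=$(normalize_release "v1.0")

# Print the resulting command line instead of running it.
echo cmake -DREGISTRY="$REGISTRY" -DRELEASE="$RELEASE" ..
```

The final line only prints the cmake invocation, so the normalized values can be inspected before they are used.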

Command Make Targets¶

bom: Print out the BOM list of each workload.
clean: Purge the logs.

cd build
cmake ..
make bom

Executing Workload Ctest¶

Source: doc/user-guide/executing-workload/ctest.md

Use ./ctest.sh to run a single test or batch of tests. You can do this at the top-level build directory or under each workload directory. In the latter case, only the tests of the workload will be executed.

cd build
cd workload/dummy
./ctest.sh -N

CTest Options¶

There is an extensive list of options in ./ctest.sh to control how tests are executed. The following are the most common options, which are inherited from ctest. See man ctest for all inherited ctest options. The ./ctest.sh extensions are listed below.

-R: Select tests based on a regular expression string.
-E: Exclude tests based on a regular expression string.
-V: Show test execution with details.
-N: List test vectors only.

Example: list tests with boringssl in name excluding those with _gated

./ctest.sh -R boringssl -E _gated -N

Example: run only test_static_boringssl (exact match)

./ctest.sh -R '^test_static_boringssl$'

Customize Configurations¶

It is possible to specify a test configuration file to overwrite any configuration parameter of a test case:

./ctest.sh --config=test_config.yaml -V

The configuration file uses the following format:

*_dummy_pi:
    SCALE: 3000

where *_dummy_pi specifies the test case name. You can use * to specify a wildcard match. The subsection underneath specifies the configuration variables and values. Any parameter specified in a test case's validate.sh can be overwritten.

Use with caution as overwriting configuration parameters may lead to invalid parameter combinations.
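
To check which test cases a wildcard key would affect, the matching can be approximated with a shell glob. This is a sketch under the assumption that * in a configuration key behaves like a shell glob; the function name config_key_matches and the test case names are hypothetical.

```shell
# Assumption: "*" in a test-config key behaves like a shell glob.
config_key_matches() {
    # $1 = key from the configuration file, $2 = test case name
    case "$2" in
        $1) return 0 ;;    # unquoted on purpose: $1 is used as a pattern
        *)  return 1 ;;
    esac
}

# Hypothetical test case names, used for illustration only.
for name in test_static_dummy_pi test_kvm_dummy_pi_pkm; do
    if config_key_matches '*_dummy_pi' "$name"; then
        echo "$name: SCALE would be overwritten"
    else
        echo "$name: unchanged"
    fi
done
```

Under this assumption, a key such as *_dummy_pi does not match a name that carries an extra suffix like _pkm, so the key would need to end in * as well to cover it.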

Benchmark Scripts¶

A set of utility scripts is linked under your workload build directory to simplify workload benchmark activities.

`ctest.sh`¶

ctest.sh: This is an extended ctest script that adds the following features on top of what ctest supports:

Usage: [options]
--nohup          Run the test case(s) in daemon mode for long benchmarks.
--daemon         Run the test case(s) daemonized for long benchmarks, cleaning the environment before workload execution.
--noenv          Clean any external environment variables before proceeding with the tests.    
--loop           Run the benchmark multiple times sequentially.
--run            Run the benchmark multiple times on the same SUT(s).  
--burst          Run the benchmark multiple times simultaneously.
--config         Specify the test-config file.  
--options        Specify additional validation backend options.  
--set            Set the workload parameter values during loop and burst iterations.  
--stop [prefix]  Kill all ctest sessions if no prefix is given; otherwise kill the session identified by the prefix, given as the workload benchmark namespace name.
--continue       Ignore any errors and continue the loop and burst iterations.  
--prepare-sut    Prepare cloud SUT instances for reuse.
--reuse-sut      Reuse previously prepared cloud SUT instances. 
--cleanup-sut    Cleanup cloud SUT instances. 
--dry-run        Generate the testcase configurations and then exit.  
--testcase       Specify the exact testcase name to be executed.  
--attach <file>  Specify a file to be attached under the logs directory.  
--check-docker-image             Check image availability before running the workload.  
--push-docker-image <registry>   Push the workload image(s) to the mirror registry.   
--testset                        Specify a testset yaml file.
--describe-params                Show workload parameter descriptions.

Examples¶

Run aws test cases 5 times sequentially (loop):
```
./ctest.sh -R aws --loop=5 --nohup
```
Run aws test cases 5 times simultaneously (burst):
```
./ctest.sh -R aws --burst=5 --nohup
```
Run aws test cases 4 times simultaneously with the SCALE value incremented linearly as 1000, 1300, 1600, 1900 in each iteration:

... uses three previous values to deduce the increment
```
./ctest.sh -R aws --set "SCALE=1000 1300 1600 ...2000" --burst=4 --nohup
```
Run aws test cases 4 times simultaneously with the SCALE value incremented linearly as 1000, 1600, 1000, 1600 in each iteration:

... uses three previous values to deduce the increment

|200 means the values must be divisible by 200
```
./ctest.sh -R aws --set "SCALE=1000 1300 1600 ...2000 |200" --burst=4 --nohup
```
Run aws test cases 4 times simultaneously with the SCALE value incremented linearly as 1000, 1600, 2000, 1000 in each iteration:

... uses three previous values to deduce the increment

8000| means the values must be a factor of 8000
```
./ctest.sh -R aws --set "SCALE=1000 1200 1400 ...2000 8000|" --burst=4 --nohup
```
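
The linear expansion and the two filters used in the examples above can be sketched as follows. This only illustrates the documented behavior and is not the ctest.sh implementation: the function names are hypothetical, and the upper bound given after ... is assumed to be inclusive (consistent with the 8000| example, where 2000 itself is produced).

```shell
# Expand "v1 v2 v3 ...limit": deduce the increment from the seed values.
expand_linear() {
    el_step=$(( $2 - $1 ))
    # The three seed values must share one positive increment.
    if [ "$el_step" -le 0 ] || [ $(( $3 - $2 )) -ne "$el_step" ]; then
        echo "not a linear sequence" >&2
        return 1
    fi
    el_v=$1
    el_out=""
    while [ "$el_v" -le "$4" ]; do
        el_out="${el_out:+$el_out }$el_v"
        el_v=$(( el_v + el_step ))
    done
    echo "$el_out"
}

# "|200": keep only the values divisible by the given number.
filter_multiple_of() {
    fm_div=$1; shift
    fm_out=""
    for fm_v in "$@"; do
        if [ $(( fm_v % fm_div )) -eq 0 ]; then
            fm_out="${fm_out:+$fm_out }$fm_v"
        fi
    done
    echo "$fm_out"
}

# "8000|": keep only the values that are a factor of the given number.
filter_factor_of() {
    ff_n=$1; shift
    ff_out=""
    for ff_v in "$@"; do
        if [ $(( ff_n % ff_v )) -eq 0 ]; then
            ff_out="${ff_out:+$ff_out }$ff_v"
        fi
    done
    echo "$ff_out"
}

expand_linear 1000 1300 1600 2000                             # 1000 1300 1600 1900
filter_multiple_of 200 $(expand_linear 1000 1300 1600 2000)   # 1000 1600
filter_factor_of 8000 $(expand_linear 1000 1200 1400 2000)    # 1000 1600 2000
```

With --burst=4, the filtered lists are shorter than the number of iterations, so the values repeat: 1000, 1600, 1000, 1600 in the |200 case and 1000, 1600, 2000, 1000 in the 8000| case.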
Run aws test cases 4 times simultaneously with the SCALE value incremented exponentially as 1000, 2000, 4000, 8000 in each iteration:

... uses three previous values to deduce the multiplication factor
```
./ctest.sh -R aws --set "SCALE=1000 2000 4000 ...10000" --burst=4 --nohup  
```
Run aws test cases 6 times simultaneously with the SCALE value enumerated repeatedly as 1000, 1500, 1700, 1000, 1500, 1700 in each iteration:
```
./ctest.sh -R aws --set "SCALE=1000 1500 1700" --burst=6 --nohup
```
Run aws test cases 6 times simultaneously with the SCALE and BATCH_SIZE values enumerated separately as (1000,1), (1500,2), (1700,4), (1000,8) in each iteration:

Values are repeated if needed.
```
./ctest.sh -R aws --set "SCALE=1000 1500 1700" --set BATCH_SIZE="1 2 4 8" --burst=6 --nohup
```
Run aws test cases 8 times simultaneously with the SCALE and BATCH_SIZE values permuted as (1000,1), (1000,2), (1000,4), (1000,8), (1500,1), (1500,2), (1500,4), (1500,8) in each iteration:
```
./ctest.sh -R aws --set "SCALE=1000 1500 1700/BATCH_SIZE=1 2 4 8" --burst=8 --nohup
```
For cloud instances, it is possible to test different machine types by enumerating the <CSP>_MACHINE_TYPE values (<CSP> is Cloud Service Provider's abbreviation, e.g. AWS_MACHINE_TYPE or GCP_MACHINE_TYPE):
```
./ctest.sh -R aws --set "AWS_MACHINE_TYPE=m6i.xlarge m6i.2xlarge m6i.4xlarge" --loop 3 --nohup
```

For AWS, the following disk settings can be enumerated in the same way:

type of disk

./ctest.sh -R aws --set "AWS_DISK_TYPE=io1 io2" --loop 2 --nohup

size of disk

./ctest.sh -R aws --set "AWS_DISK_SIZE=500 1000" --loop 2 --nohup

disk's IOPS

./ctest.sh -R aws --set "AWS_IOPS=16000 32000" --loop 2 --nohup

number of striped disks

./ctest.sh -R aws --set "AWS_NUM_STRIPED_DISKS=1 2" --loop 2 --nohup

Cloud SUT Reuse¶

It is possible to reuse the Cloud SUT instances during the benchmark process. This is especially useful in tuning parameters for any workload.

To reuse any SUT instances, you need to first prepare (provision) the Cloud instances, using the ctest.sh --prepare-sut command as follows:

./ctest.sh -R aws_kafka_3n_pkm -V --prepare-sut

The --prepare-sut command provisions and prepares the Cloud instances suitable for running the aws_kafka_3n_pkm test case. The preparation includes installing docker/Kubernetes and labeling the worker nodes. The SUT details are stored under the sut-logs-aws_kafka_3n_pkm directory.

Next, you can run any iterations of the test cases, reusing the prepared SUT instances with the --reuse-sut command, as follows:

./ctest.sh -R aws_kafka_3n_pkm -V --reuse-sut

If --reuse-sut is set, --burst is disabled.

Finally, to clean up the SUT instances, use the --cleanup-sut command:

./ctest.sh -R aws_kafka_3n_pkm -V --cleanup-sut

SUT reuse is subject to the following limitations:
- The SUT instances are provisioned and prepared for a specific test case. Different test cases cannot share SUT instances.
- It is possible to change workload parameters, provided that:
  - The changes do not affect the worker node numbers.
  - The changes do not affect the worker node machine types, disk storage, or network topologies.
  - The changes do not affect worker node labeling.
  - The changes do not introduce any new container images.

After using the Cloud instances, please clean them up.

Running Testcases using Testset YAML¶

You can specify a testset configuration file to sequentially run a few test cases with ctest.sh:

PLATFORM: SPR
BENCHMARK: dummy
TERRAFORM_OPTIONS: "--docker"
TERRAFORM_SUT: kvm
testcase: "test_kvm_dummy_pi_pkm"
options: "--sutinfo --intel_publish"
#test-config: "test-config.yaml"
SCALE: 2000

---

PLATFORM: ICX
BENCHMARK: dummy
TERRAFORM_OPTIONS: "--docker"
TERRAFORM_SUT: kvm
testcase: "test_kvm_dummy_pi_pkm"
options: "--sutinfo --intel_publish"
#test-config: "test-config.yaml"
SCALE: 2000

Two testcases are executed in this example. The following rules apply:
- Supported cmake options: PLATFORM, BENCHMARK, REGISTRY, REGISTRY_AUTH, TIMEOUT, SPOT_INSTANCE, TERRAFORM_OPTIONS, and TERRAFORM_SUT.
- Supported ctest options: testcase, test-config, config, loop, burst, run, and options.
  - testcase and options can be either a string or a list of strings.
  - If testcase starts with and ends with /, the testcase name is a regular expression. If testcase starts with !/ and ends with /, the testcase name is a negated regular expression. Otherwise, the testcase name is the exact name.
- Any other specified parameters are passed to ctest.sh via --set.
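
The three testcase forms (regular expression, negated regular expression, exact name) can be illustrated with a small matcher. This is a sketch, not the testset implementation: the function name testcase_matches is hypothetical, grep -E stands in for whatever regular expression engine is actually used, and a negated expression is assumed to select the names that do not match.

```shell
# testcase_matches <spec> <name>: succeed if <name> is selected by <spec>.
testcase_matches() {
    tm_spec=$1
    tm_name=$2
    case "$tm_spec" in
        '!/'*/)
            # !/regex/ : select names that do NOT match the expression.
            tm_re=${tm_spec#'!/'}
            tm_re=${tm_re%/}
            ! printf '%s\n' "$tm_name" | grep -Eq -- "$tm_re"
            ;;
        /*/)
            # /regex/ : select names that match the expression.
            tm_re=${tm_spec#/}
            tm_re=${tm_re%/}
            printf '%s\n' "$tm_name" | grep -Eq -- "$tm_re"
            ;;
        *)
            # Anything else is an exact name.
            [ "$tm_spec" = "$tm_name" ]
            ;;
    esac
}

for spec in 'test_kvm_dummy_pi_pkm' '/_pkm$/' '!/_pkm$/'; do
    if testcase_matches "$spec" test_kvm_dummy_pi_pkm; then
        echo "$spec selects test_kvm_dummy_pi_pkm"
    else
        echo "$spec skips test_kvm_dummy_pi_pkm"
    fi
done
```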

Executing Workload Terraform Options¶

Source: doc/user-guide/executing-workload/terraform-options.md

The Terraform validation backend runs any testcases in stages.

Stages¶

flowchart LR;
  provision{{Provision VMs}};;
  setup{{Setup VMs}};;
  exec{{Workload Execution}};;
  cleanup{{Restore/Cleanup}};;
  provision --> setup --> exec --> cleanup;;

- CSP Provisioning: Terraform scripts are used to provision any CSP VMs. For on-premises clusters, this step is skipped. See Terraform Configuration Parameters.
- VM Setup and Workload Execution: Ansible scripts are used to install software and execute the workloads. See Ansible Configuration Parameters.
- Cleanup: Terraform and Ansible scripts are used to restore the VM settings and to destroy the VMs. There is no configuration in this stage.

Terraform Configuration Parameters¶

You can configure the CSP resources during the terraform VM provisioning stage:

./ctest.sh --set AWS_ZONE=us-east-2 --set AWS_CUSTOM_TAGS=team=my,batch=test -R throughput -V

CSP Common Parameters¶

<CSP>_COMPARTMENT: Specify the compartment id of the Oracle Cloud resources.
<CSP>_CUSTOM_TAGS: Specify custom resource tags to be attached to any newly created CSP resources. The value should be a set of comma delimited key=value pairs, e.g., a=b,c=d,e=f.
<CSP>_REGION: Specify the CSP region value. If not specified, the region value will be parsed from the zone value.
<CSP>_RESOURCE_GROUP_ID: Specify the resource group id of the Alibaba* Cloud resources.
<CSP>_SPOT_INSTANCE: Specify whether to use the CSP spot instance for cost saving. The default value is true.
<CSP>_ZONE: Specify the CSP availability zone. The zone value must be prefixed with the region string.
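
Two of the rules above, the comma delimited key=value format of <CSP>_CUSTOM_TAGS and the region prefix of <CSP>_ZONE, are easy to check before starting a long run. The following is a minimal pre-flight sketch: both function names are hypothetical, and requiring a non-empty key and value is an assumption rather than a documented rule.

```shell
# Succeed if $1 is a comma delimited list of key=value pairs.
valid_custom_tags() {
    [ -n "$1" ] || return 1
    vt_rest=$1
    while [ -n "$vt_rest" ]; do
        vt_pair=${vt_rest%%,*}            # text before the first comma
        case "$vt_pair" in
            ?*=?*) ;;                     # assumed: non-empty key and value
            *) return 1 ;;
        esac
        case "$vt_rest" in
            *,*) vt_rest=${vt_rest#*,} ;; # drop the pair just checked
            *)   vt_rest="" ;;
        esac
    done
    return 0
}

# Succeed if zone $1 is prefixed with region $2.
zone_in_region() {
    case "$1" in
        "$2"*) return 0 ;;
        *)     return 1 ;;
    esac
}

if valid_custom_tags "team=my,batch=test" && zone_in_region us-east-2a us-east-2; then
    echo "parameters look consistent"
fi
```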

VM Work Group Parameters¶

<CSP>_<workgroup>_CPU_CORE_COUNT: Specify the visible CPU core number.
<CSP>_<workgroup>_CPU_MODEL_REGEX: Specify a regular expression pattern that the SUT cpu model must match. The SUT instance will be replaced if there is a mismatch.
<CSP>_<workgroup>_INSTANCE_TYPE: Specify workgroup instance type. The instance type is CSP specific.
<CSP>_<workgroup>_MEMORY_SIZE: Specify the memory size in GB.
<CSP>_<workgroup>_MIN_CPU_PLATFORM: Specify the minimum CPU platform value for Google* Cloud compute instances. Replace any whitespace with %20. For example, use Intel%20Ice%20Lake to specify a minimum platform of Intel Ice Lake.

See GCP for possible values.
<CSP>_<workgroup>_NIC_TYPE: Specify the Google Cloud NIC type. Possible values: GVNIC or VIRTIO_NET. The default is GVNIC.
<CSP>_<workgroup>_OS_DISK_IOPS: Specify the OS disk I/O performance numbers in I/O per second.
<CSP>_<workgroup>_OS_DISK_SIZE: Specify the OS disk size in GB.
<CSP>_<workgroup>_OS_DISK_THROUGHPUT: Specify the I/O throughput in MB/s.
<CSP>_<workgroup>_OS_DISK_TYPE: Specify the OS disk type.

See AWS, GCP, Azure, Tencent, and AliCloud.
<CSP>_<workgroup>_OS_IMAGE: Specify the OS virtual machine custom image. If specified, the value will void OS_TYPE and OS_DISK values.
<CSP>_<workgroup>_OS_TYPE: Specify the OS type. Possible values: ubuntu2004, ubuntu2204, or debian11. Note that debian11 may not work on all CSPs.
<CSP>_<workgroup>_THREADS_PER_CORE: Specify the thread number per CPU core.

where <workgroup> can be any of worker, client, and controller.

Data Disks Parameters¶

<CSP>_DISK_SPEC_<n>_DISK_COUNT: Specify the number of data disks to be mounted.
<CSP>_DISK_SPEC_<n>_DISK_FORMAT: Specify the data disk format as part of the disk_spec_<n> definition. The value depends on the OS image. ext4 is a common format.
<CSP>_DISK_SPEC_<n>_DISK_SIZE: Specify the data disk size in GB as part of the disk_spec_<n> definition.
<CSP>_DISK_SPEC_<n>_DISK_TYPE: Specify the data disk type as per CSP definition. Use the value local to use the instance local storage.

See AWS, GCP, Azure, Tencent, and AliCloud.
<CSP>_DISK_SPEC_<n>_DISK_IOPS: Specify the IOPS value of the data disks.
<CSP>_DISK_SPEC_<n>_DISK_PERFORMANCE: Specify the AliCloud performance level of the data disks.

See AliCloud.
<CSP>_DISK_SPEC_<n>_DISK_THROUGHPUT: Specify the I/O throughput value of the data disks.

See Azure.

Ansible Configuration Parameters¶

You can further configure the test parameters during the test execution as follows:

Use cmake -DTERRAFORM_OPTIONS= to define the TERRAFORM_OPTIONS options.

cmake -DTERRAFORM_OPTIONS="--docker --svrinfo --intel_publish" ..

Use ./ctest.sh --options= to add extra configurations to TERRAFORM_OPTIONS.

./ctest.sh --options="--docker --svrinfo --intel_publish" -R throughput -V

Common Parameters¶

docker_auth_reuse: Copy the docker authentication information to SUTs.
nosvrinfo/svrinfo: Disable/enable svrinfo SUT information detection.
svrinfo_options: Specify svrinfo options. Replace any whitespace in options with %20. The default is '' (no options).
nomsrinfo/msrinfo: Disable/enable msrinfo SUT information detection.
run_stage_iterations: Specify the number of iterations to repeat the workload executions. The default is 1.
skopeo_insecure_registries: Specify a list of insecure docker registries (comma delimited). Any access to the registries will use http.
skopeo_sut_accessible_registries: Specify a list of docker registries (comma delimited) that the SUT can directly access. The workload images are not copied to the SUT, assuming the SUT can directly pull the images.
terraform_delay: Specify the CSP provisioning retry delay in seconds, if any provision step failed. Default 10 seconds if terraform apply failed, or 0s if cpu model mismatched.
terraform_retries: Specify the retry times if cpu model mismatched. Default: 10.
wl_debug_timeout: Specify the debug breakpoint timeout value in seconds. The default is 3600.

SUT Parameters¶

sut_default_hugepagesz: When any hugepage is set, configure if the default hugepage size should be set. The default is false.
sut_kernel_args: Specify the list of additional kernel args to be applied on the SUT. The format is <group>:key=value[ key=value][,<group>:key=value...], where group is the SUT work group and key=value is the kernel arguments.
sut_kernel_modules: Specify additional kernel modules to be installed on the SUT. The format is <group>:module1 [module2...][,<group>:module...], where group is the SUT work group.
sut_reboot: Specify whether the SUT is allowed to be rebooted. The default is true.
sut_sudo: Specify whether the SUT is allowed to use sudo. The default is true.
sut_sysctl_options: Specify additional sysctl options to be set on the SUT. The format is <group>:key=value[ key=value][,<group>:key=value...], where group is the SUT work group and key=value is the sysctl setting.
sut_sysfs_options: Specify additional sysfs options to be set on the SUT. The format is <group>:key=value[ key=value][,<group>:key=value...], where group is the SUT work group and key=value is the sysfs setting.

Containerd Parameters¶

containerd_data_root: Specify the containerd data root directory. The default is /var/lib/containerd.
containerd_pause_registry: Specify the containerd pause image registry prefix. The default is registry.k8s.io.
containerd_version: Specify the containerd version. The default is Ubuntu:1.6.15, CentOS:1.6.10, or Debian:1.6.15.
containerd_reset: Reset and reinstall containerd. The default is false.
containerd_insecure_registry: Specify the list of insecure registries, separated with comma.

Docker Parameters¶

compose: Enable workload docker-compose execution.
docker: Enable workload docker execution.
docker_compose_plugin_version: Specify the docker compose plugin version. The default is 2.18.1.
docker_data_root: Specify the docker daemon data root directory. The default is /var/lib/docker.
docker_dist_repo: Specify the docker repository URL. The default is https://download.docker.com.
docker_version: Specify the docker version. The default is 20.10.23.
native: Enable workload native execution over docker image.

Kubernetes Parameters¶

k8s_cni: Specify the Kubernetes CNI. The default is flannel.
k8s_apiserver_ip: Specify the kubernetes api server ip. The default is controller's private_ip of terraform applied outputs.
k8s_apiserver_port: Specify the kubernetes api server port. The default is 6443.
k8s_calico_encapsulation: Specify the Calico CNI overlay networking. The default is VXLAN.
k8s_calico_version: Specify the Calico CNI version. The default is v3.25.
k8s_calico_mtu: Specify the MTU. The value can be 1500 or 9000. The default is 1500.
k8s_calicoctl_version: Specify the Calico CNI operator version. The default is v3.25.
k8s_calicovpp_version: Specify the Calicovpp operator version. The default is v3.25.
k8s_calicovpp_buffer_data_size: Specify Calico-vpp data-size buffer in Calicovpp configuration. The default is 2048.
k8s_calicovpp_cores: Specify how many CPU cores will be used for the l3fwd and calicovpp pod, respectively. Default is 1.
k8s_calicovpp_dsa_enable: Specify testing mode, value can be true, false for DSA memif, SW memif testing. Default is true.
k8s_delete_namespace_timeout: Specify the timeout value when deleting the Kubernetes namespace. The default is 10m (10 minutes).
k8s_enable_registry: Install a docker registry within the Kubernetes cluster to serve the workers. The workload images are copied to the docker registry. The default value is true.
k8s_flannel_version: Specify the flannel CNI version. The default is v0.21.5.
k8s_istio_install_dist_repo: Specify the istio distribution repository. The default is https://istio.io/downloadIstio.
k8s_istio_version: Specify the istio version. The default is 1.15.3.
k8s_nfd_registry: Specify the NFD image repository. The default is k8s.gcr.io/nfd.
k8s_nfd_version: Specify the NFD version. The default is v0.13.2.
k8s_pod_cidr: Specify the kubernetes pod subnet. The default is 10.244.0.0/16.
k8s_registry_port: Specify the in-cluster registry port. The default is 20668.
k8s_install: If True, force Kubernetes installation playbook to be run. Default False. Images for upload should be defined using wl_docker_images in validate.sh and passed as a string with , separator using TERRAFORM_OPTIONS.

Example: TERRAFORM_OPTIONS="${TERRAFORM_OPTIONS} --wl_docker_images=${REGISTRY}image-name-1${RELEASE},${REGISTRY}image-name-2${RELEASE}"

k8s_reset: Reset Kubernetes, if detected, and reinstall Kubernetes. The default is false.
k8s_service_cidr: Specify the kubernetes service subnet. The default is 10.96.0.0/12.
k8s_version: Specify the Kubernetes version. The default is 1.26.6.
k8s_plugins: Specify a list of additional Kubernetes device plugins. The supported option is nfd. The default is None.

Trace Module Parameters¶

collectd: Enable the collectd tracer.
collectd_interval: Specify the collectd sample time interval. The default is 10 seconds.
emon: Enable the emon tracer.
emon_post_processing: Specify whether to enable/disable Emon post-processing. The default is true.
emon_view: Select one or more of the three optional views, --socket-view, --core-view, and --thread-view, or use --no-detail-views to generate only the summary. By default, the system/core/thread views are generated. Replace any whitespace in options with %20. The default is --socket-view%20--core-view%20--thread-view. You can also use emon_view="" to generate only the summary by system.
gprofiler: Enable the gprofiler tracer.
gprofiler_image: Specify the gprofiler docker image. The default is docker.io/granulate/gprofiler.
gprofiler_options: Specify the gprofiler options. Replace any white space in options with %20. The default is --profiling-frequency=11%20--profiling-duration=2.
gprofiler_version: Specify the gprofiler version. The default is latest.
perf: Enable the perf tracer.
perf_action: Specify the perf action. The default is record.
perf_collection_time: Specify the perf record time. The default is infinity.
perf_fetch_data: Specify whether to retrieve the raw perf record data back to the logs directory. The default is false.
perf_flamegraph: Specify whether to generate flamegraph during post-processing. The default is false.
perf_flamegraph_collapse_options: Specify the flamegraph collapse command options. Replace any whitespace in options with %20. The default is --all.
perf_flamegraph_svg_options: Specify the flamegraph generation options. Replace any whitespace in options with %20. The default is --color=java%20--hash.
perf_record_options: Specify the perf record command options. Replace any whitespace in options with %20. The default is -a%20-g.
perf_stat_options: Specify the perf stat command options. Replace any whitespace in options with %20. The default is -a%20-I%20500%20-e%20cycles%20-e%20instructions.
perf_script_options: Specify the perf script command options. Replace any whitespace in options with %20. The default is '' (no options).
sar: Enable the sar tracer.
sar_options: Specify the sar command line options. Replace any whitespace in options with %20. The default is -B%20-b%20-d%20-p%20-H%20-I%20ALL%20-m%20ALL%20-n%20ALL%20-q%20-r%20-u%20ALL%20-P%20ALL%20-v%20-W%20-w%205.
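
Several of the parameters above require whitespace in the option value to be replaced with %20. A small helper can do the replacement so that the original command line options stay readable. This is a sketch: encode_option_value is a hypothetical name, and the --name=value form is assumed by analogy with the --wl_docker_images example.

```shell
# Replace every whitespace character in $1 with %20.
encode_option_value() {
    printf '%s\n' "$1" | sed 's/[[:space:]]/%20/g'
}

perf_opts=$(encode_option_value "-a -g")
echo "--perf --perf_record_options=$perf_opts"
# prints: --perf --perf_record_options=-a%20-g
```

The printed string could then be passed through ./ctest.sh --options= or cmake -DTERRAFORM_OPTIONS=, as described under Ansible Configuration Parameters.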

Instance Watch Parameters¶

The instance watch feature monitors the uptime and CPU utilization of a SUT instance. It is best suited for managing Cloud VM instances. If the uptime of the SUT instance exceeds a threshold and the CPU load is then consecutively measured to be low, the instance is automatically shut down (powered off).

instance_watch: Enable/disable instance watch. The default is false.
instance_watch_cpu_load: Specify the CPU load in percentage. The instance is considered low utilization if the load is below the threshold. The default is 10.
instance_watch_cpu_load_count: Specify the number of times that the CPU load must be consecutively below the threshold before considering the CPU as low utilized. The default is 3.
instance_watch_cpu_load_span: Specify the time between two CPU load measures. The default is 15s.
instance_watch_interval: Specify the instance uptime watch interval. The default is 30m.
instance_watch_shutdown_postpone: Specify the shutdown postpone time in minutes. The default is 30.
instance_watch_uptime: Specify the maximum instance uptime in minutes. The default is 360, i.e., 6 hours.
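
The shutdown decision described above can be modeled as follows, using the documented defaults (uptime of 360 minutes, CPU load threshold of 10 percent, 3 consecutive low samples). This is only an illustrative model, not the actual instance watch code: the function name should_shutdown is hypothetical, and the measurement interval and shutdown postponement are not modeled.

```shell
# should_shutdown <uptime-minutes> <cpu-load-sample>...
# Succeeds if the uptime exceeds the limit and the CPU load stayed below
# the threshold for the required number of consecutive samples.
should_shutdown() {
    ss_uptime=$1; shift
    ss_max_uptime=360      # instance_watch_uptime default (minutes)
    ss_threshold=10        # instance_watch_cpu_load default (percent)
    ss_needed=3            # instance_watch_cpu_load_count default

    [ "$ss_uptime" -gt "$ss_max_uptime" ] || return 1

    ss_low=0
    for ss_load in "$@"; do
        if [ "$ss_load" -lt "$ss_threshold" ]; then
            ss_low=$(( ss_low + 1 ))
        else
            ss_low=0       # a busy sample resets the streak
        fi
        if [ "$ss_low" -ge "$ss_needed" ]; then
            return 0
        fi
    done
    return 1
}

if should_shutdown 400 5 4 3; then
    echo "instance would be shut down"
fi
```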

Executing Workload Testcase¶

Source: doc/user-guide/executing-workload/testcase.md

Each workload defines a set of testcases, i.e., typical usage scenarios. Internally, each testcase encapsulates default values of workload configuration parameters. For example, the dummy workload's testcase dummy_pi_pkm calculates pi (π) to 2000 digits.

Naming patterns¶

You can list the testcases of a workload as follows:

cd build/workload/<WORKLOAD>
./ctest.sh -N

Testcases typically follow the pattern of test_[<sut>_]<workload>_<testcase>, where <sut> is optional and specific to the validation backend setting. The testcase name should be descriptive for the workload usage conditions.

For example, use _1n to indicate that the workload runs on a single worker node, and _3n to indicate that the workload runs on 3 worker nodes.

Special Test Cases¶

The following testcase suffixes are reserved:
- _gated: A testcase suffixed with _gated is designed for CI commit validation. The testcase is expected to be a quick test of the workload software stack. To improve CI efficiency, design the testcase such that the workload completes within 5 minutes.
- _preswa: A testcase suffixed with _preswa is designed for Pre-Si performance analysis. The testcase is expected to run with reduced complexity such that the workload can complete in a reasonable time in Simics and still represent the main characteristics of the full workload execution. The workload must define EVENT_TRACE_PARAMS and PRESWA_POI_PARAMS to enable windowed emon trace collection.
- _pkm: A testcase suffixed with _pkm is designed for Post-Si performance analysis. The testcase is expected to represent the common use case of the workload. If the _pkm testcase runs relatively long (say, more than 10 minutes), the workload must define EVENT_TRACE_PARAMS to enable windowed emon trace collection. Compared with _preswa, there is no complexity restriction in _pkm. Each workload should have at least 1 _pkm testcase and no more than 2 _pkm testcases.
- _pdt, _pnp: A testcase suffixed with _pnp or _pdt indicates that the test case contains recipes and configurations that have been approved by either the Performance PDT or the PNPJet forum. The software recipe versions and the workload configurations are frozen. This is designed for the users to repeat any previously approved performance data.

Testcase Configurations¶

To improve usability, define the configuration parameters of a testcase such that the workload can run on any typical platform. Here the typical platform is defined as the equivalent of an AWS xlarge instance, which includes 4 virtual cores and 16GB of memory. If a workload must use more cores or memory, the workload must declare the minimum platform requirement in cluster-config.yaml.m4 and README.md.

In addition to the testcases defined in CMakeLists.txt, a workload may define additional testcase configuration files to customize the testcase parameters for reproducing performance data in the workload performance report.

The testcase configuration files use the following convention: test-config-<sut|arch>[-<identifier>].yaml, where:
- <sut|arch> defines the target platform. For Cloud instances, <sut> is usually the Cloud provider name, matching one of the validation backend configurations, for example, aws will be matched against terraform-config.aws.tf. For On-Premises platforms, <sut> specifies the platform architecture, for example, icx or spr.
- <identifier> is an optional string to describe the purpose of the testcase overwrite, if there are multiple testcase configurations.
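
As an illustration of the naming convention, the following sketch builds a configuration file name from its parts. The function name test_config_name and the identifier value tuning are hypothetical.

```shell
# test_config_name <sut|arch> [<identifier>]
test_config_name() {
    if [ -n "${2:-}" ]; then
        printf 'test-config-%s-%s.yaml\n' "$1" "$2"
    else
        printf 'test-config-%s.yaml\n' "$1"
    fi
}

test_config_name aws           # test-config-aws.yaml
test_config_name icx tuning    # test-config-icx-tuning.yaml
```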

The following testcase configuration file overwrites the dummy workload testcases:

# test-config-aws.yaml
# Overwrite the test cases for performance tuning on AWS instances.

*_pi_pass:
    SCALE: 5000
    AWS_MACHINE_TYPE: m6i.4xlarge

*_pi_fail:
    SCALE: 5000
    AWS_MACHINE_TYPE: m6i.4xlarge

Please note that it is important to include the Cloud instance type as part of the overwrite to completely describe the running environment.
Describing an On-Premises platform setup is more involved. Please describe the complete setup as comments at the top of the configuration file.

Preparing Infrastructure Setup Ai¶

Source: doc/user-guide/preparing-infrastructure/setup-ai.md

Overview¶

This document is a guide for using AI workloads.

AI Workload Naming¶

AI workloads are named in the following format: <Model Name>-<Framework>-[Platform][-Additions], e.g., "3DUNet-OpenVINO-MLPerf". [Platform] is omitted if the platform is Xeon.

Model Name: Specify the AI deep learning model name.
Framework: Specify the framework used: TensorFlow, PyTorch, OpenVINO (Intel CPU only), or TensorRT (Nvidia GPU only).
Platform: Specify the platform: empty (Intel Xeon CPU), EPYC (AMD EPYC CPU), ARMv8/ARMv9 (AWS Graviton based CPU), Nvidia (Nvidia GPU), or Inferentia (AWS inference accelerator card).
Additions: Specify additional information for the workload.

AI Test Case Naming¶

<WL name>_inference_throughput_gated
<WL name>_inference_throughput_pkm
<WL name>_inference_latency
<WL name>_inference_accuracy
<WL name>_training_throughput
<WL name>_training_accuracy

Configuration:¶

AI workloads can be run on both bare metal and Cloud VMs. The AWS, GCP, and Azure clouds are suggested.

Suggested cloud instance type:

Intel ICX:¶

AWS cloud: m6i
GCP cloud: n2-highmem-96
Azure cloud: Dv5-series

AMD Milan:¶

AWS cloud: m6a
GCP cloud:
Azure cloud: Dasv5 and Dadsv5-series

AMD Rome:¶

AWS cloud: m5a
GCP cloud:
Azure cloud: Dav4 and Dasv4-series

AWS Graviton2:¶

AWS cloud: m6g

AWS Graviton3:¶

AWS cloud: c7g

AWS Inferentia:¶

AWS cloud: inf

Nvidia GPU:¶

AWS cloud: g4dn (T4)

Best Configuration:¶

For ICX platforms based AI workload: Tuning Guide for Deep Learning
For SPR platforms based AI workload: Tuning Guide for Deep Learning

Restriction¶

N/A

Node Labels¶

Setup the following node labels for AI workloads:

HAS-SETUP-BKC-AI=yes: Optional.

KPI output¶

KPI output example:

#================================================
#Workload Configuration
#================================================
##FRAMEWORK: PyTorch 1.13.0a0+gitd7607bd
##MODEL_NAME: DLRM
##MODEL_SIZE: 89137319
##MODEL_SOURCE: Facebook
##DATASET: Criteo 1TB Click Logs (terabyte)
##FUNCTION: inference
##MODE: throughput
##PRECISION: avx_fp32
##DATA_TYPE: real
##BATCH_SIZE: 1
##STEPS: 1
##INSTANCE_NUMBER: 2
##CORES_PER_INSTANCE: 56
#================================================
#Application Configuration
#================================================
##SCENARIO: offline
##SERVING_STACK: -
##MODEL_WORKERS: -
##REQUEST_PER_WORK: -
#================================================
#Metrics
#================================================
Average Throughput (samples/sec): 27168.18
Max Latency (ms): -1
Min Latency (ms): -1
Mean Latency (ms): 4.33
P50 Latency (ms): -1
P90 Latency (ms): -1
P95 Latency (ms): -1
P99 Latency (ms): -1
P999 Latency (ms): -1
TTT: -1
Samples: -1
Compute Utilization: -1
Memory Utilization: 89.79 GB
FLOPs: -1
Model Quality Metric Name: -1
Model Quality Value: -1
Cost Per Million Inferences: -1
#================================================
#Key KPI
#================================================
*Throughput (samples/sec): 27168.18

NOTE: To make sure the gprofiler telemetry data is accurate, use the _pkm test case or set a larger STEPS value.
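The KPI output example above follows a simple line-oriented layout: lines starting with ## carry configuration, plain name: value lines carry metrics, and lines starting with * mark the key KPI. The sketch below parses that layout. It is inferred from the example only and is not the parser used by list-kpi.sh; the function name is hypothetical.

```python
from typing import Dict, Tuple


def parse_kpi_output(text: str) -> Tuple[Dict[str, str], Dict[str, float], Dict[str, float]]:
    """Split KPI output into (configuration, metrics, key KPIs)."""
    config: Dict[str, str] = {}
    metrics: Dict[str, float] = {}
    primary: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue  # blank lines, banners, and section titles
        if line.startswith("##"):
            key, _, value = line[2:].partition(":")
            config[key.strip()] = value.strip()
            continue
        if line.startswith("#"):
            continue
        key, _, value = line.rpartition(":")
        tokens = value.split()
        if not tokens:
            continue
        try:
            number = float(tokens[0])  # ignore a trailing unit such as "GB"
        except ValueError:
            continue
        if key.startswith("*"):
            primary[key.lstrip("*").strip()] = number
        else:
            metrics[key.strip()] = number
    return config, metrics, primary


sample = """\
#Workload Configuration
##MODEL_NAME: DLRM
##BATCH_SIZE: 1
#Metrics
Mean Latency (ms): 4.33
Memory Utilization: 89.79 GB
#Key KPI
*Throughput (samples/sec): 27168.18
"""
config, metrics, primary = parse_kpi_output(sample)
print(config, metrics, primary)
```

Values of -1 in the example are parsed like any other number; a consumer would likely want to treat them as "not measured".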

Preparing Infrastructure Setup Auth¶

Source: doc/user-guide/preparing-infrastructure/setup-auth.md

A private docker registry is optional in most of the validation scenarios except if you want to run the workloads on an On-Premises Kubernetes cluster, or you explicitly setup a docker registry to store any newly built workload images.

This document describes how to authenticate to a docker registry if the registry requires authentication. Skip this document if there is no authentication requirement.

`REGISTRY_AUTH`¶

The cmake REGISTRY_AUTH option specifies how to authenticate to a private docker registry. Currently, docker and pass are the only supported values: docker uses the docker authentication mechanism, and pass uses docker-credential-pass with pass as the password store.

Docker¶

Enable the REGISTRY_AUTH option:

cmake -DREGISTRY=<url> -DREGISTRY_AUTH=docker ..

With the above command, the validation scripts will upload the docker authentication information specified in .docker/config.json as a Kubernetes imagePullSecret to the validation cluster, On-Premises or in Cloud.

CredHelpers or CredStore in .docker/config.json is not supported.
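Because credential helpers and credential stores are not supported, it can help to inspect the docker configuration before enabling REGISTRY_AUTH=docker. The sketch below is an assumption-labeled helper, not part of the validation scripts: it looks for the credHelpers and credsStore keys that the docker CLI uses in config.json, and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import List, Optional

# Keys the docker CLI uses for credential helpers and stores in config.json.
UNSUPPORTED_KEYS = ("credHelpers", "credsStore")


def unsupported_credential_settings(config: dict) -> List[str]:
    """Return the unsupported keys that are present and non-empty."""
    return [key for key in UNSUPPORTED_KEYS if config.get(key)]


def check_docker_config(path: Optional[Path] = None) -> List[str]:
    """Check a docker config file; defaults to ~/.docker/config.json."""
    if path is None:
        path = Path.home() / ".docker" / "config.json"
    if not path.is_file():
        return []
    return unsupported_credential_settings(json.loads(path.read_text()))


print(unsupported_credential_settings({"auths": {}, "credsStore": "desktop"}))
```

An empty list means the file relies only on the auths section, which is the case the guide describes as supported.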

Pass¶

Enable the REGISTRY_AUTH option:

cmake -DREGISTRY=<url> -DREGISTRY_AUTH=pass ..

The versions of the pass and docker-credential-pass binaries are read automatically. With the above command, the validation scripts make the pass and gpg session available on the execution machine.

Authenticate to Cloud Private Registry¶

Amazon Elastic Container Registry¶

make aws
$ aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <account-id>.dkr.ecr.<region>.amazonaws.com
$ exit

Note that the build script will auto-create the image repository namespaces.

Google Cloud Container Registry¶

make gcp
$ gcloud auth activate-service-account --key-file <key-file>
$ gcloud auth print-access-token | docker login -u oauth2accesstoken --password-stdin <registry-url>
$ exit

Note that the OAuth2 access token expires in an hour.

Azure Container Registry:¶

make azure
$ az acr login --name <registry-name> --expose-token --output tsv --query accessToken | docker login --username 00000000-0000-0000-0000-000000000000 --password-stdin <registry-url>
$ exit

Preparing Infrastructure Setup Bios¶

Source: doc/user-guide/preparing-infrastructure/setup-bios.md

Introduction¶

This code provides experimental support to change BIOS settings on SUT machines. The support is based on syscfg for most production systems.

As a BIOS update can potentially cause irreversible damage, you must explicitly enable the --sut_update_bios option and the --sut_reboot option for any BIOS-related operations.

Probe BIOS Versions and Knobs¶

The BIOS knobs differ between BIOS versions, which makes it difficult to provide a universal BIOS setup. To probe the BIOS version on your SUT systems, do the following:

./ctest.sh -R <testcase> -V --options='--sut_update_bios --sut_reboot --sut_bios_probe'

After running the testcase, the current BIOS setup is saved under the test case logs directory under worker-0-bios/syscfg.ini, where you can obtain the BIOS version and the BIOS knobs.

`cluster.bios`¶

As a workload developer, you can configure your workload to mandate certain BIOS settings, such as setting HyperThreading to be Disabled.

Write such configurations in cluster-config.yaml.m4:

- labels: {}
  bios:
    SE5C620.86B:
      "Intel(R) Hyper-Threading Tech": Enabled          # Disabled
      "CPU Power and Performance Policy": Performance   # "Balanced Performance", "Balanced Power", or "Power"
    EGSDCRB1.86B:
      ProcessorHyperThreadingDisable: "ALL LPs"         # "Single LP"
      ProcessorEppProfile: Performance                  # "Balanced Performance", "Balanced Power", or "Power"

where SE5C620.86B and EGSDCRB1.86B are BIOS version substrings. If there are multiple matches, the longest match takes precedence (according to a descending string sort).

The BIOS knobs such as Intel(R) Hyper-Threading Tech and their values Enabled must exactly match the BIOS definitions.
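The matching rule described above can be pictured as follows. This is a minimal sketch under stated assumptions: a configured key matches when it is a substring of the probed BIOS version, the longest key wins, and ties fall back to descending string order. The function name and the version strings in the usage lines are hypothetical, and the real validation scripts may differ in detail.

```python
from typing import Dict, Optional


def select_bios_settings(bios_version: str,
                         bios_config: Dict[str, Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Pick the settings whose key is the longest substring match."""
    matches = [key for key in bios_config if key in bios_version]
    if not matches:
        return None
    # Longest match first; equal lengths are resolved by descending string order.
    best = max(matches, key=lambda key: (len(key), key))
    return bios_config[best]


bios = {
    "SE5C620.86B": {"Intel(R) Hyper-Threading Tech": "Enabled"},
    "EGSDCRB1.86B": {"ProcessorHyperThreadingDisable": "ALL LPs"},
}
# Hypothetical version strings, for illustration only.
print(select_bios_settings("SE5C620.86B.01.01.0006", bios))
print(select_bios_settings("UNKNOWN.BIOS", bios))
```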

Global Options¶

You can overwrite any BIOS settings globally as follows:

./ctest.sh -R <testcase> -V --options="--sut_reboot --sut_update_bios --sut_bios_tool=syscfg --sut_bios_options=worker:ProcessorEppProfile=Performance"

where --sut_bios_options is a list of BIOS settings, each in the form group:key=value. Use , to separate the list items and %20 to escape any whitespace.
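The escaping rule above is easy to get wrong by hand when a knob name or value contains spaces. The sketch below builds the option value from (group, key, value) tuples. The function name is hypothetical, and it assumes keys and values contain no commas.

```python
from typing import Iterable, Tuple


def build_sut_bios_options(items: Iterable[Tuple[str, str, str]]) -> str:
    """Join group:key=value items with ',' and escape spaces as %20."""
    def escape(text: str) -> str:
        return text.replace(" ", "%20")

    return ",".join(
        f"{group}:{escape(key)}={escape(value)}" for group, key, value in items
    )


options = build_sut_bios_options([
    ("worker", "ProcessorHyperThreadingDisable", "ALL LPs"),
    ("worker", "ProcessorEppProfile", "Performance"),
])
print(f"--sut_bios_options={options}")
```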

Preparing Infrastructure Setup Cdn¶

Source: doc/user-guide/preparing-infrastructure/setup-cdn.md

This document is a guide for setting up CDN benchmark environment, including Hardware platform and Software configuration on network, storage and QAT.

HW Prerequisites¶

Setup 2 or 3 servers:
3-node: one client node; 2 CDN servers: worker-1, worker-2.
2-node: one client node; 1 CDN server: worker-1.
All servers support at least 100G network bandwidth, e.g. 1x 100G NIC
Connect all servers through a switch with at least 100G network capacity.
The CDN server requires 4 NVMe disks, each with at least 1.8 TB of capacity, preferably supporting PCIe Gen4 x4.
Consider NUMA balance when setting up the NVMe drives and NICs; this is important for performance tests.

flowchart TD;
  subgraph 3-node;
  subgraph Server_cluster;
  worker_1;
  worker_2;
  end

  100G_switch[[100G_switch]]-.-Client_Node;
  100G_switch[[100G_switch]]-.-worker_1;
  100G_switch[[100G_switch]]-.-worker_2;

  end

flowchart TD;
  subgraph 2-node;
    subgraph Server_cluster;
    worker_1;
    end

  100G_switch[[100G_switch]]-.-Client_Node;
  100G_switch[[100G_switch]]-.-worker_1;
  end

OS configuration¶

Install Ubuntu 22.04 Server or a later version on the CDN server.
Check that the NVMe driver and the NIC driver are loaded and working.
Setup network proxies if needed and append server (e.g. 192.168.2.200) and client (e.g. 192.168.2.100) 100G NIC IP to your no_proxy on client and server.

K8S Labels configuration¶

Please finish the section Network configuration, Storage configuration, or QAT hardware configuration, then label the corresponding nodes.

Command examples:

Label:

kubectl label node node_name HAS-SETUP-NIC-100G=yes

Unlabel:

kubectl label node node_name HAS-SETUP-NIC-100G-

CDN server worker-1:

For ICX,

HAS-SETUP-DISK-SPEC-1=yes
HAS-SETUP-NIC-100G=yes

For SPR,

HAS-SETUP-DISK-SPEC-1=yes
HAS-SETUP-NIC-100G=yes
HAS-SETUP-QAT=yes
HAS-SETUP-HUGEPAGE-2048kB-4096=yes

CDN server worker-2:

HAS-SETUP-NIC-100G=yes

Network configuration¶

Specify the 100G IP addresses of the servers. The defaults are defined in validate.sh; pass the real IP addresses as parameters before testing.

client	worker-1	worker-2
192.168.2.100	192.168.2.200	192.168.2.201

modify in validate.sh

NICIP_W1=${NICIP_W1:-192.168.2.200}
NICIP_W2=${NICIP_W2:-192.168.2.201}
NICIP_W1="real IP of worker-1"
NICIP_W2="real IP of worker-2"

or pass with ctest.sh

./ctest.sh --set NICIP_W1="real IP" NICIP_W2="real IP" ...

Test the network speed after setting up
On worker-1
```
iperf -s
```
On client node
```
iperf -c 192.168.2.200 -P 4
```

Storage configuration¶

This should be done on worker-1.

Prepare the cache disks for the cache-nginx pod. In the commands below, nvme?n1 is a placeholder: repeat each command once for each of the 4 disks.
Check NVMe drives and Partition drives
```
ls /dev/nvme*
```
```
/dev/nvme?n1
```
Create a primary partition /dev/nvme?n1p1
- If the disk is smaller than 2 TB
```
sudo fdisk /dev/nvme?n1
```
- If the disk is larger than 2 TB
```
sudo parted /dev/nvme?n1
```
Change drive attributes
```
sudo chown nobody /dev/nvme?n1p1
```
Format drives as ext4 (or xfs):
```
mkfs.ext4 -F /dev/nvme?n1p1
```

Create the cache mount points and mount the four partitions, one per mount point

mkdir /mnt/disk1 /mnt/disk2 /mnt/disk3 /mnt/disk4
mount -o defaults,noatime,nodiratime /dev/nvme?n1p1 /mnt/disk?

Add below content into /etc/fstab to auto-mount after reboot

/dev/nvme?n1p1 /mnt/disk? ext4 rw,noatime,seclabel,discard 0 0
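Since the ? placeholder stands for four concrete disks, the /etc/fstab entries can be generated rather than typed. The sketch below is illustrative only: the function name is hypothetical, and it assumes the drives are numbered nvme0n1 through nvme3n1 and are paired in order with /mnt/disk1 through /mnt/disk4. Check ls /dev/nvme* on your system before using the output.

```python
from typing import List


def fstab_entries(disk_count: int = 4, fs_type: str = "ext4",
                  options: str = "rw,noatime,seclabel,discard") -> List[str]:
    """Expand the nvme?n1p1 / disk? placeholders into concrete fstab lines."""
    lines = []
    for index in range(disk_count):
        device = f"/dev/nvme{index}n1p1"      # assumed numbering: nvme0n1 .. nvme3n1
        mountpoint = f"/mnt/disk{index + 1}"  # assumed pairing: disk1 .. disk4
        lines.append(f"{device} {mountpoint} {fs_type} {options} 0 0")
    return lines


print("\n".join(fstab_entries()))
```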

Change the storage I/O scheduler from the default mq-deadline to none:
```
echo none > /sys/block/nvme?n1/queue/scheduler
```
Check the partition status
```
sudo fdisk -l /dev/nvme*n*
```

QAT hardware configuration¶

Set up QAT Hardware for SPR worker-1, please refer to setup-qat-in-tree.

Monitor runtime performance¶

Use sar to monitor runtime network interface performance

  sar -n DEV 3 -h   # probe every 3s

Use iostat to monitor drive IO performance.

  iostat 5    # probe every 5s

Others¶

Install Intel E810-C CVL Ethernet Adaptor Driver

To confirm the NIC model, run the following command:

lspci | grep Eth
17:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GBASE-T (rev 02)
17:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GBASE-T (rev 02)
4b:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-C for QSFP (rev 02)
4b:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-C for QSFP (rev 02)

In this environment, Intel 100G E810-C NIC is used for CDN NGINX testing.

Install the kernel development package

To compile the driver on some kernel/arch combinations, you may need to install the kernel development package that matches the running kernel version. First, try installing it with:
```
sudo apt-get install linux-headers-$(uname -r)
```
Intel E810 series devices Ethernet Adapter Driver Installation
- Download the latest E810 series devices firmware update from https://www.intel.com/content/www/us/en/download/19626/non-volatile-memory-nvm-update-utility-for-intel-ethernet-network-adapters-e810-series-linux.html.
- Download the latest E810 series devices driver from https://www.intel.com/content/www/us/en/download/19630/intel-network-adapter-driver-for-e810-series-devices-under-linux.html.
- Build and install the NIC driver:
```
tar xvfz ice-1.6.7.tar.gz
cd ice-1.6.7/src
make clean
make
make install
rmmod ice; modprobe ice
```

Preparing Infrastructure Setup Containerd¶

Source: doc/user-guide/preparing-infrastructure/setup-containerd.md

Starting with Kubernetes v1.20, Kubernetes deprecated docker as a container runtime in favor of containerd. Installing containerd is a prerequisite for installing Kubernetes.

Installation¶

Install containerd from your OS packages:

apt-get install containerd       # Ubuntu or Debian
yum install containerd           # Centos

Setup Proxy¶

sudo mkdir -p /etc/systemd/system/containerd.service.d
printf "[Service]\nEnvironment=\"HTTP_PROXY=$http_proxy\" \"HTTPS_PROXY=$https_proxy\" \"NO_PROXY=$no_proxy\"\n" | sudo tee /etc/systemd/system/containerd.service.d/proxy.conf
sudo systemctl daemon-reload
sudo systemctl restart containerd

Setup Configuration Files¶

containerd config default | sudo tee /etc/containerd/config.toml
sudo sed -i 's/SystemdCgroup = .*/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd

Setup Insecure Registries¶

On-Premises workload validation based on Kubernetes requires a docker registry. If you need to set up any insecure registries with containerd, modify the containerd configuration as follows, assuming your private registry is foo.com:5000:

sudo sed -i 's|config_path =.*|config_path = "/etc/containerd/certs.d"|' /etc/containerd/config.toml
sudo mkdir -p /etc/containerd/certs.d/foo.com:5000
cat <<EOF | sudo tee /etc/containerd/certs.d/foo.com:5000/hosts.toml
server = "http://foo.com:5000"
[host."http://foo.com:5000"]
  capabilities = ["pull", "resolve"]
[plugin."io.containerd.grpc.v1.cri".registry.configs."foo.com:5000".tls]
  insecure_skip_verify = true
EOF
sudo systemctl restart containerd

On-Premises workload validation based on RKE2 requires a docker registry. If you need to set up any insecure registries with containerd, modify the rke2 configuration as follows, assuming your private registry is foo.com:5000:

cat <<EOF | sudo tee /etc/rancher/rke2/registries.yaml
mirrors:
  "foo.com:5000":
    endpoint:
      - "http://foo.com:5000"
EOF
sudo systemctl restart rke2-server

Setup Data Storage¶

Optionally, if you need to move the containerd storage location to, for example, /mnt/storage/containerd:

sudo sed -i 's|^root =.*|root = "/mnt/storage/containerd"|' /etc/containerd/config.toml
sudo systemctl restart containerd

Preparing Infrastructure Setup Dataset¶

Source: doc/user-guide/preparing-infrastructure/setup-dataset.md

The dataset labels are used to check whether a certain dataset is available on the host. Certain datasets are installed automatically if they are missing from the host.

For example, if the label is HAS-SETUP-DATASET-BERTLARGE-V2, the dataset is assumed to be available under /opt/dataset/bertlarge/v2.
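The label-to-path mapping in the example above can be sketched as follows. This generalizes from the single example in this guide, so treat the rule (strip the prefix, lowercase, turn - into /) as an assumption; the function name is hypothetical.

```python
DATASET_LABEL_PREFIX = "HAS-SETUP-DATASET-"


def dataset_path(label: str, root: str = "/opt/dataset") -> str:
    """Map a dataset label to the directory where the dataset is expected."""
    name = label.split("=", 1)[0]  # tolerate a trailing "=yes"
    if not name.startswith(DATASET_LABEL_PREFIX):
        raise ValueError(f"not a dataset label: {label!r}")
    suffix = name[len(DATASET_LABEL_PREFIX):]
    # Assumed rule: lowercase the remainder and use each '-' as a path separator.
    return root + "/" + suffix.lower().replace("-", "/")


print(dataset_path("HAS-SETUP-DATASET-BERTLARGE-V2"))
```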

Preparing Infrastructure Setup Disk Avail¶

Source: doc/user-guide/preparing-infrastructure/setup-disk-avail.md

The disk-avail labels are used to probe if the workload minimum disk availability threshold is met.

There are three label formats:

HAS-SETUP-DISK-AVAIL-192: Requires a minimum of 192GB of available disk space under /tmp (native workloads), /var/lib/docker (docker/compose workloads), /var/lib/kubelet (Kubernetes workloads), or C (Windows workloads).
HAS-SETUP-DISK-AVAIL-192-MNT-DISK1: Requires a minimum of 192GB of available disk space under /mnt/disk1. You can specify any disk path with / replaced with -.
HAS-SETUP-DISK-AVAIL-192-D: Requires a minimum of 192GB of available disk space on the drive letter D. This is specific to Windows.
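The three label formats above can be told apart with a small parser. The sketch below is an illustration, not the probing code itself: the names are hypothetical, a single-letter suffix is treated as a Windows drive letter, and any other suffix is lowercased and converted back into a path, which is an assumption based on the /mnt/disk1 example.

```python
import re
from typing import NamedTuple, Optional

_DISK_AVAIL_RE = re.compile(r"^HAS-SETUP-DISK-AVAIL-(?P<gb>\d+)(?:-(?P<where>.+))?$")


class DiskRequirement(NamedTuple):
    min_gb: int
    location: Optional[str]  # None means the backend's default location


def parse_disk_avail_label(label: str) -> DiskRequirement:
    """Parse a HAS-SETUP-DISK-AVAIL label into size and location."""
    match = _DISK_AVAIL_RE.match(label.split("=", 1)[0])
    if match is None:
        raise ValueError(f"not a disk-avail label: {label!r}")
    where = match.group("where")
    if where is None:
        location = None
    elif len(where) == 1 and where.isalpha():
        location = where.upper() + ":"  # Windows drive letter
    else:
        location = "/" + where.lower().replace("-", "/")  # assumed path rule
    return DiskRequirement(int(match.group("gb")), location)


for name in ("HAS-SETUP-DISK-AVAIL-192",
             "HAS-SETUP-DISK-AVAIL-192-MNT-DISK1",
             "HAS-SETUP-DISK-AVAIL-192-D"):
    print(parse_disk_avail_label(name))
```

On a Linux host, the parsed location could then be compared against shutil.disk_usage(path).free to approximate the check.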

Preparing Infrastructure Setup Dlb¶

Source: doc/user-guide/preparing-infrastructure/setup-dlb.md

DLB is supported in certain SPR SKUs. Please make sure your CPU SKU (QDF) supports DLB.

Note that DLB is broken with the latest BKC kernel, 5.15.0-spr.bkc.pc.2.10.0.x86_64. Before running the workload, make sure your SPR system has a DLB device by running the following command:

lspci | grep 2710

If there are devices listed, then please download the DLB driver from this link: https://www.intel.com/content/www/us/en/download/686372/intel-dynamic-load-balancer.html

Execute the following commands:

tar -xf dlb_linux_src_release_<dlb_driver_version>.txz
cd dlb/driver/dlb2/
make
sudo insmod dlb2.ko

Then you can run the workload on this machine.

Preparing Infrastructure Setup Docker¶

Source: doc/user-guide/preparing-infrastructure/setup-docker.md

Introduction¶

The docker validation backend runs any workloads locally on the development host, either through multiple docker sessions or through docker-compose.

Execute a Workload¶

Switch to the docker validation backend as follows:

cd build
cmake -DBACKEND=docker ..     # Select the docker backend

Then test any workload as follows:

cmake -DBENCHMARK=dummy ..                      # Select the dummy workload
./ctest.sh -N                                   # List all testcases
./ctest.sh -R _pkm -V                           # Run the _pkm testcase
./list-kpi.sh workload/dummy/logs-dummy_pi_pkm  # Show the KPIs

Setup arm64 Emulation¶

You can setup the development host as an arm64 emulator. To do so, run the setup.sh script:

script/march/setup.sh

Preparing Infrastructure Setup Hugepage¶

Source: doc/user-guide/preparing-infrastructure/setup-hugepage.md

Workloads that require hugepages must specify a HAS-SETUP-HUGEPAGE label in the format of HAS-SETUP-HUGEPAGE-<size>-<pages>, where <size> is the hugepage size and <pages> is the number of pages required. The <size> value must exactly match (case sensitive) one of the hugepage sizes supported under /sys/kernel/mm/hugepages. For example, to request 1024 pages of 2MB hugepages, use HAS-SETUP-HUGEPAGE-2048kB-1024.

If setting the default hugepage size is required, append -DEFAULTSZ to the label name. For example, HAS-SETUP-HUGEPAGE-2048kB-1024-DEFAULTSZ.
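To make the label format concrete, the sketch below parses a hugepage label and computes the total memory it requests. The names are hypothetical, and the pattern assumes sizes are written as a number followed by kB, as in the examples above.

```python
import re
from typing import NamedTuple

_HUGEPAGE_RE = re.compile(
    r"^HAS-SETUP-HUGEPAGE-(?P<size>\d+kB)-(?P<pages>\d+)(?P<default>-DEFAULTSZ)?$"
)


class HugepageRequest(NamedTuple):
    size: str               # for example "2048kB", matched case-sensitively
    pages: int
    set_default_size: bool  # True when the label ends with -DEFAULTSZ

    @property
    def total_kib(self) -> int:
        """Total requested memory in KiB (page size times page count)."""
        return int(self.size[:-2]) * self.pages


def parse_hugepage_label(label: str) -> HugepageRequest:
    match = _HUGEPAGE_RE.match(label.split("=", 1)[0])
    if match is None:
        raise ValueError(f"not a hugepage label: {label!r}")
    return HugepageRequest(
        match.group("size"),
        int(match.group("pages")),
        match.group("default") is not None,
    )


request = parse_hugepage_label("HAS-SETUP-HUGEPAGE-2048kB-1024")
print(request, request.total_kib // (1024 * 1024), "GiB")
```

For the example label, 1024 pages of 2048kB add up to 2 GiB, which is a quick sanity check that a node has enough memory to honor the request.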

Node Labels¶

To avoid creating a lot of node labels, it is recommended to specify the number of pages only as powers of 2. Label the worker node(s) with the following node labels:
- HAS-SETUP-HUGEPAGE-2048kB-512=yes Optional
- HAS-SETUP-HUGEPAGE-2048kB-1024=yes Optional
- HAS-SETUP-HUGEPAGE-2048kB-2048=yes Optional
- HAS-SETUP-HUGEPAGE-2048kB-4096=yes Optional

System Setup¶

Hugepages are set up automatically once the labels are in place. If for any reason you need to set up hugepages manually, configure them through the kernel boot parameters, as follows:

sudo grubby --update-kernel=DEFAULT --args="hugepages=1024"

Then reboot the machine for the hugepages to take effect.

On Ubuntu, edit /etc/default/grub (for example, with sudo vi /etc/default/grub) and add the number of hugepages to GRUB_CMDLINE_LINUX, like this:

GRUB_CMDLINE_LINUX="hugepages=1024"

Then run sudo update-grub and reboot.

To verify the changes, run cat /proc/meminfo | grep Huge.

Kubernetes only recognizes hugepages if they are preallocated through boot parameters.

Preparing Infrastructure Setup Hyperv¶

Source: doc/user-guide/preparing-infrastructure/setup-hyperv.md

Introduction¶

This document describes the Hyper-V host setup and how to create Windows/Linux VM images. The VM images can be used as OS images in the terraform backend and the hyperv SUT type.

Setup Hyper-V Host¶

On the Hyper-V host:
- Turn on the Virtualization option in BIOS.
- If the Hyper-V host is Windows Server 2003, set the IPEnableRouter parameter to 1 in the registry key HKLM\SYSTEM\ControlSet001\Services\Tcpip\Parameters.
- Follow the instructions to enable WinRM (https) on the Hyper-V host. In addition, change the WinRM authentication method to be basic:

Set-WSManInstance WinRM/Config/Service/Auth -ValueSet @{Basic = $true}
Set-WSManInstance WinRM/Config/Service/Auth -ValueSet @{Negotiate = $false}
Set-Item -Path WSMan:\localhost\Service\MaxConcurrentOperationsPerUser -Value 4294967295
Set-Item -Path WSMan:\localhost\Service\MaxConnections -Value 4294967295

Create a few directories:
C:\Users\Public\Documents\Hyper-V\osimages: Store any pre-downloaded OS images. To save time, you can predownload Windows and Linux OS images into this directory:
- windows-server-2022-gen1.vhdx: See Prepare Windows OS Image.
- ubuntu-server-2204.vhdx: See Prepare Linux OS Image.
C:\Users\Public\Documents\Hyper-V\runs: Store any run-time disks and VM instances.
C:\Users\Public\Documents\Hyper-V\scripts: Store any run-time powershell scripts.

On the Linux dev machine:
- Modify /etc/environment to add the HyperV host IP address to the no_proxy list. Source /etc/environment for the current ssh session.
- Create a configuration file under script/csp/.hyperv/config.json with content as follows:

{
  "winrm_user": "Administrator",
  "winrm_password": "xyzxyz"
}

Execute Workloads¶

Modify script/terraform/terraform-config.hyperv.tf to specify the Hyper-V host IP address and winrm port.
Configure TERRAFORM_SUT=hyperv to use the hyper-v SUT type.

cd build
cmake -DTERRAFORM_SUT=hyperv -DBENCHMARK=<workload> ..
make
./ctest.sh -R _pkm -V

Troubleshooting¶

If you need to log in to the VM instances:
- For Linux VM instances, use ./debug.sh to go into the terraform container, and then use ssh -i ssh_access.key -p <port> <user>@<ip> to log in to the Linux VM instances, where <port>, <user>, and <ip> are VM instance information available in the inventory.yaml file.
- For Windows VM instances, set a temporary admin password by changing ${password} in script/terraform/template/terraform/hyperv/local/template/cloud-init.windows.tpl. Then you can connect to the Windows VM instances from the Windows Hyper-V manager. Please restore cloud-init.windows.tpl after troubleshooting.

users:
  -
    name: Administrator
    primary_group: Administrators
    passwd: '${password}'

- To test WinRM connections, use the following command:

winrm identify -r:https://127.0.0.1:5986 -auth:basic -u:<user> -p:<passwd> -encoding:utf-8 -skipCAcheck -skipCNcheck

Advanced Topic: Create Custom OS Images¶

If for any reason, you need to create custom OS images, use the following sections to create Windows/Linux OS Images.

Create Windows VM OS Images¶

Use the following steps to create Windows OS images:

Download the Windows Server 2022 iso image from the Microsoft* evaluation center.
Launch the Windows VM using the Hyper-V manager with the following configurations:
Use Generation-1.
Create a disk image windows-server-2022.vhdx of about 11MB in size.
Create a DVD drive with the content from the Windows Server 2022 iso.
During the setup, select Windows Server 2022 Standard Evaluation (Desktop Experience):

After the OS setup, continue the VM setup:
Set execution policy unrestricted with powershell:

Set-ExecutionPolicy Unrestricted

Use diskpart to delete any system recovery partition, and use Computer Management->Disk Management to expand disk0 to the full extent. This is critical so that the disk can be expanded later to other sizes.

diskpart
> select disk 0
> select partition 3
> delete partition override

Download cloudbase-init:

Invoke-WebRequest -Uri https://github.com/cloudbase/cloudbase-init/releases/download/1.1.4/CloudbaseInitSetup_1_1_4_x64.msi -OutFile CloudbaseInitSetup_1_1_4_x64.msi -Proxy http://proxy-dmz.intel.com:911

Install cloudbase-init. During the installation, change the username to Administrator at this screen, and then click Install to install cloudbase-init. Pause to modify a few files in the next step, before proceeding to the final screen.

Modify c:\Program Files\Cloudbase Solutions\Cloudbase-init\conf\cloudbase-init-unattended.conf:

metadata_services=cloudbaseinit.metadata.services.nocloudservice.NoCloudConfigDriveService,cloudbaseinit.metadata.services.base.EmptyMetadataService
plugins=cloudbaseinit.plugins.common.sethostname.SetHostNamePlugin,cloudbaseinit.plugins.windows.createuser.CreateUserPlugin,cloudbaseinit.plugins.windows.extendvolumes.ExtendVolumesPlugin,cloudbaseinit.plugins.windows.winrmlistener.ConfigWinRMListenerPlugin,cloudbaseinit.plugins.common.userdata.UserDataPlugin
allow_reboot=true

Fire up services and modify the cloudbase-init service properties (Startup Type -> Delayed Start):

Modify c:\Program Files\Cloudbase Solutions\Cloudbase-init\conf\Unattend.xml as follows:

...
    <RunSynchronousCommand wcm:action="add">
      <Order>1</Order>
      <Path>sc.exe config cloudbase-init start= delayed-auto</Path>
      <Description>Start cloudbase-init</Description>
      <WillReboot>Never</WillReboot>
    </RunSynchronousCommand>
    <RunSynchronousCommand wcm:action="add">
      <Order>2</Order>
      <Path>cmd.exe /c ""C:\Program Files\Cloudbase Solutions\Cloudbase-Init\Python...
      ...
    </RunSynchronousCommand>
...

Go back to the cloudbase-init installation screen. Select both sysprep and shutdown, and then click Finish to complete the installation. Wait until sysprep is completed and the VM is shutdown.

You can now copy your windows-server-2022.vhdx (under c:\Users\Public\Public Documents\Hyper-V\Virtual hard disks) to an artifactory for reuse with Hyper-V. Do not boot this VM image again, as booting the image may invalidate the cloud-init status, in which case you have to repeat the cloud-init steps.

Setup Linux VM Images¶

Use the following steps to create Linux VM OS images:
- Linux ISO: Download the Ubuntu Server 2204 ISO image from Ubuntu* Cloud Images.
- Launch the Linux VM using Hyper-V as follows:
  - Use Generation-1.
  - Create a disk image ubuntu-server-2204.vhdx of about 10GB in size.
  - Create a DVD drive with the content from the Ubuntu Server 2204 iso image.
- Select the Ubuntu Server edition during the installation.
- During disk partitioning, de-select LVM to not use the LVM-managed disks. Make sure the root partition is the last partition and occupies the rest of the disk space.
- Select to install the openssh service.
- Create a ubuntu user and set an initial password.

After the OS setup, continue the VM setup:
Log in as the ubuntu user and install the Linux Integration Services.

sudo apt-get install -y linux-virtual \
  linux-cloud-tools-virtual \
  linux-tools-virtual

Cleanup cloud-init and shutdown the system:

# nic must have "old" name, eg. eth0 and not ens18
#cat << EOF > /etc/udev/rules.d/70-persistent-net.rules
#SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{dev_id}=="0x0", ATTR{type}=="1", NAME="eth0"
#EOF

# this prevents network configuration from happening, says so right in the name
#rm -f /etc/cloud/cloud.cfg.d/subiquity-disable-cloudinit-networking.cfg

# ConfigDrive needs to come first (I think)
#cat << EOF > /etc/cloud/cloud.cfg.d/99-pve.cfg
#datasource_list: [ConfigDrive, NoCloud]
#EOF

# this is super important, netplan files are not regenerated if they exist
#sudo rm -f /etc/netplan/00-installer-config.yaml
#sudo rm -f /etc/netplan/50-cloud-init.yaml

sudo cloud-init clean
sudo shutdown -h now

After shutting down, you can copy your ubuntu-server-2204.vhdx (under c:\Users\Public\Documents\Hyper-V\Virtual hard disks) to an artifactory for reuse with Hyper-V. Do not boot this VM image again, as booting the image may invalidate the cloud-init status, in which case you have to repeat the cloud-init steps.

Preparing Infrastructure Setup Kubernetes¶

Source: doc/user-guide/preparing-infrastructure/setup-kubernetes.md

kubernetes is the default validation backend to run single- or multi-container workloads on a cluster of machines.

Prerequisites¶

Starting with Kubernetes v1.20, Kubernetes deprecated docker as a container runtime in favor of containerd. Follow the containerd setup instructions to install and configure containerd on your system.

Setup Kubernetes¶

Follow the Ubuntu/CentOS instructions to setup a Kubernetes cluster. For full features, please install Kubernetes v1.21 or later.

You can build the workloads and run the workloads on the same machine by setting up a single-node Kubernetes cluster:

kubectl taint node --all node-role.kubernetes.io/master-
kubectl taint node --all node-role.kubernetes.io/control-plane-  # >= v1.20

Setup Node Feature Discovery (Ansible Automation)¶

Please refer to the execution role in the location below:

./script/terraform/template/ansible/kubernetes/roles/nfd/

Setup Node Feature Discovery (Manually)¶

Install node feature discovery as follows:

kubectl apply -k https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default

Preparing Infrastructure Setup Kvm¶

Source: doc/user-guide/preparing-infrastructure/setup-kvm.md

Introduction¶

A KVM host can serve as many VMs as its capacity (CPU, memory, and disk) allows. The VMs connect to each other via a virtual network and communicate with the Internet via NAT. When a single KVM host is not powerful enough, multiple KVM hosts must be set up to share the load. This document describes the setup details.

Single KVM Host¶

The setup-sut-kvm.sh script can be used to setup any KVM host. It is as simple as running the following command:

script/setup/setup-sut-kvm.sh <user>@<kvm-host-ip> --reset

Then modify script/terraform/terraform-config.kvm.tf to specify the KVM host URL as follows:

...
variable "kvm_hosts" {
  default = [{
    user   = "<user>"
    host   = "<kvm-host-ip>"
    port   = 22
    # DHCP must be enabled on the network interface
    networks = [ "default" ]
    # if specified, os image will reuse the storage pool
    # images (with same image names.)
    pool   = null
  }]
}

where the pool value can be null if you don't have an existing KVM storage pool that caches all OS images. For better efficiency, please create a KVM storage pool and cache the following OS images:
- jammy-server-cloudimg-amd64.img: Download from https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img
- windows-server-2022.qcow2: See Prepare Windows VM

Multiple KVM Hosts via VXLAN¶

VXLAN creates a virtual network on top of the physical network to enable the communication among multiple KVM hosts. Use the setup-sut-kvm.sh script to setup each KVM host.

./setup-sut-kvm.sh <user>@<kvm-host1-ip> <user>@<kvm-host2-ip> --vxlan --reset

Modify the script/terraform/terraform-config.kvm.tf as follows:

...
# multiple KVM hosts
variable "kvm_hosts" {
  default = [{
    user = "<user>"
    host = "<kvm-host1-ip>"
    port = 22
    # DHCP must be enabled on the network interface
    networks = [ "wsfbr0", "default" ]
    # if specified, os image will reuse the storage pool
    # images (with same image names.)
    pool = null
  }, {
    user = "<user>"
    host = "<kvm-host2-ip>"
    port = 22
    # DHCP must be enabled on the network interface
    networks = [ "wsfbr0", "default" ]
    # if specified, os image will reuse the storage pool
    # images (with same image names.)
    pool = null
  }]
}

where the wsfbr0 network interface is used for inter-KVM host communication and the default network interface is used to communicate to the Internet.

Multiple KVM Hosts via Bridged Network¶

You can also use bridged networking among KVM hosts. The VMs created by each KVM host connect via bridged networking. Since most IT infrastructures discourage or prohibit bridged networking, we need to add a JumpHost to isolate the VM network from the IT network, as follows:

flowchart LR;
  net1((IT Network));;
  hub[Jump Host];;
  net2((VM Network));;
  kvm1[KVM Host1];;
  kvm2[KVM Host2];;
  kvm3[KVM Host3];;
  net1 <--> hub <--> |.101| net2;
  net2 <--> |.105| kvm1;
  net2 <--> |.106| kvm2;
  net2 <--> |.107| kvm3;

JumpHost Setup¶

The JumpHost is equipped with two network interfaces: a primary network interface connects to the IT network and a secondary network interface connects to the VM network.

Proxy Server¶

We need to install a squid proxy server on the JumpHost so that KVM-Host1-3 can access the IT network through the HTTP proxy.

sudo apt-get install -y squid

Assuming the VM network is 192.168.121.0/24 and the JumpHost IP address is 192.168.121.101, the squid configuration is as follows:

# /etc/squid/squid.conf
acl localnet src 192.168.121.0/24       # RFC 1918 local private network (LAN)
acl localnet src fc00::/7               # RFC 4193 local private network range
acl localnet src fe80::/10              # RFC 4291 link-local (directly plugged) machines
acl SSL_ports port 443
acl Safe_ports port 80          # http
acl Safe_ports port 21          # ftp
acl Safe_ports port 443         # https
acl Safe_ports port 70          # gopher
acl Safe_ports port 210         # wais
acl Safe_ports port 1025-65535  # unregistered ports
acl Safe_ports port 280         # http-mgmt
acl Safe_ports port 488         # gss-http
acl Safe_ports port 591         # filemaker
acl Safe_ports port 777         # multiling http
acl intel dstdomain .intel.com
always_direct allow intel
never_direct allow !intel
cache_peer proxy-dmz.intel.com parent 911 0 default
http_access deny !Safe_ports
http_access allow localhost manager
http_access allow localnet
http_access deny manager
include /etc/squid/conf.d/*.conf
http_access allow localhost
http_access deny all
http_port 192.168.121.101:3128
coredump_dir /var/spool/squid
refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern -i (/cgi-bin/|\?) 0     0%      0
refresh_pattern \/(Packages|Sources)(|\.bz2|\.gz|\.xz)$ 0 0% 0 refresh-ims
refresh_pattern \/Release(|\.gpg)$ 0 0% 0 refresh-ims
refresh_pattern \/InRelease$ 0 0% 0 refresh-ims
refresh_pattern \/(Translation-.*)(|\.bz2|\.gz|\.xz)$ 0 0% 0 refresh-ims
refresh_pattern .               0       20%     4320

DHCP Server¶

Optionally, if there is no DHCP server on the VM network, we can install a DHCP server on the JumpHost as well:

sudo apt-get install -y isc-dhcp-server

The DHCP configuration is as follows:

# /etc/dhcp/dhcpd.conf
option domain-name "iotg";
default-lease-time 600;
max-lease-time 7200;
ddns-update-style none;
authoritative;

subnet 192.168.121.0 netmask 255.255.255.0 {
  range 192.168.121.20 192.168.121.100;
  option routers 192.168.121.101;
}
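The DHCP pool and the statically addressed hosts share one subnet, so it can help to confirm that they do not collide. The following Python sketch checks the addresses used in this section. The helper name check_dhcp_plan is hypothetical and not part of WSF.

```python
import ipaddress

def check_dhcp_plan(subnet, pool_start, pool_end, static_hosts):
    """Return the static hosts that are outside the subnet or inside the DHCP pool."""
    net = ipaddress.ip_network(subnet)
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    problems = []
    for host in static_hosts:
        addr = ipaddress.ip_address(host)
        if addr not in net:
            problems.append((host, "outside subnet"))
        elif start <= addr <= end:
            problems.append((host, "inside DHCP pool"))
    return problems

# JumpHost (.101) and KVM hosts (.105-.107), as assumed in this section.
static = ["192.168.121.101", "192.168.121.105", "192.168.121.106", "192.168.121.107"]
print(check_dhcp_plan("192.168.121.0/24", "192.168.121.20", "192.168.121.100", static))
```

An empty list means that none of the static addresses falls into the 192.168.121.20-100 pool.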

SSH Tunnels¶

We need to create a few SSH tunnels, each corresponding to a KVM host. This is required by the terraform libvirtd plugin, which does not understand how to access KVM hosts behind a JumpHost. Thus we create SSH tunnels to allow direct access to the KVM hosts via different ports on the JumpHost.

Assume the KVM host IP addresses are 192.168.121.105-107:

ssh -fNL '*:12105:192.168.121.105:22' root@192.168.121.105
ssh -fNL '*:12106:192.168.121.106:22' root@192.168.121.106
ssh -fNL '*:12107:192.168.121.107:22' root@192.168.121.107
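With more KVM hosts, the tunnel commands can be generated rather than typed. This Python sketch assumes that the JumpHost port is 12000 plus the last octet of the KVM host address. That rule is inferred from the 12105-12107 ports above and is not a documented WSF convention. The function name is hypothetical.

```python
import ipaddress

def tunnel_commands(kvm_hosts, user="root", base_port=12000):
    """Build one 'ssh -fNL' command per KVM host.

    Assumption: the JumpHost port is base_port plus the last octet of the
    KVM host address, which matches the 12105-12107 ports used above.
    """
    commands = []
    for host in kvm_hosts:
        last_octet = int(ipaddress.IPv4Address(host)) & 0xFF
        port = base_port + last_octet
        commands.append(f"ssh -fNL '*:{port}:{host}:22' {user}@{host}")
    return commands

for cmd in tunnel_commands(["192.168.121.105", "192.168.121.106", "192.168.121.107"]):
    print(cmd)
```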

KVM Host Setup¶

We need to create a network bridge and bind the physical network interface to it. The following is the setup of the KVM host 192.168.121.105:

# /etc/netplan/bridge.yaml
network:
  version: 2

  ethernets:
    enp136s0f3:
      dhcp4: false
      dhcp6: false

  bridges:
    br0:
      interfaces: [enp136s0f3]
      addresses: [192.168.121.105/24]
      # gateway4 is deprecated, use routes instead
      routes:
      - to: default
        via: 192.168.121.101
        metric: 100
        on-link: true
      mtu: 1500
      nameservers:
        addresses: [8.8.8.8]
      parameters:
        stp: true
        forward-delay: 4
      dhcp4: yes
      dhcp6: no

# bridge.yaml
<network>
  <name>wsfbr0</name>
  <forward bridge="br0" /> 
  <bridge name='wsfbr0' stp='on' delay='0'/>
  <dns enable='no'/>
  <ip address='172.22.0.2' prefix='24'/>
</network>

$ virsh net-create --file bridge.yaml

Proxy Setting¶

Finally, add the proxy settings to /etc/environment:

# /etc/environment
http_proxy=http://192.168.121.101:3128
https_proxy=http://192.168.121.101:3128
no_proxy=localhost,127.0.0.1

Control Where VMs Are Created¶

With multiple KVM hosts, you can control where each VM is provisioned by changing the kvm_hosts list:

variable "worker_profile" {
  default = {
    name = "worker"
    cpu_core_count = 2
    memory_size = 2
    vm_count = 1

    os_image = null
    os_type = "ubuntu2204"
    os_disk_size = 50

    data_disk_spec = null
    network_spec = null

    # Specifies which KVM host must be used to host
    # different workers.
    kvm_hosts = [ 0, 1, 2, 0, 1, 2 ]
  }
}

where each VM group has a separate kvm_hosts definition. The line kvm_hosts = [ 0, 1, 2, 0, 1, 2 ] specifies that the first worker, worker-0, is created on the first KVM host, the next worker, worker-1, is created on the second KVM host, and so on.
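The placement rule can be illustrated with a short Python sketch. It maps each worker to the KVM host index found at the same position in kvm_hosts. The function name is hypothetical, and the guide does not state what happens when vm_count exceeds the list length, so raising an error in that case is an assumption.

```python
def placement(kvm_hosts, vm_count, name="worker"):
    """Map each VM name to the index of the KVM host listed at the same position."""
    if vm_count > len(kvm_hosts):
        # Assumption: the guide does not describe this case, so fail loudly.
        raise ValueError("kvm_hosts has fewer entries than vm_count")
    return {f"{name}-{i}": kvm_hosts[i] for i in range(vm_count)}

print(placement([0, 1, 2, 0, 1, 2], vm_count=4))
# {'worker-0': 0, 'worker-1': 1, 'worker-2': 2, 'worker-3': 0}
```

Index 0 refers to the first entry of the KVM host list, so worker-3 lands on the first KVM host again.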

Troubleshooting¶

If you need to log in to the VM instances:
- For Linux VM instances, use ./debug.sh to go into the terraform container, and then use ssh -i ssh_access.key -p <port> <user>@<ip> to log in to the Linux VM instances, where <port>, <user>, and <ip> are VM instance information available in the inventory.yaml file.
- For Windows VM instances, use virsh domdisplay <instance-name> to retrieve the spice URL, similar to spice://10.165.31.154:5901. Then you can access the Windows display via a spice viewer. You can get the randomly generated Windows password from the inventory.yaml file.

Please note that if your spice URL is local only, for example spice://127.0.0.1:5901, you cannot access the spice display remotely. As a workaround, you can set up SSH port forwarding as follows: ssh -fNL 5901:127.0.0.1:5901 <user>@<kvm-host>, which forwards the spice connection of the KVM host to your local port 5901.

Advanced Topic: Create VM OS Images¶

Create Windows OS Image¶

This section describes the steps to create any custom Windows OS images to be used by the WSF terraform KVM backend.

Required ISO:¶

Download the following ISOs:

Windows ISO: Download Windows Server 2022 from the Microsoft* evaluation center.
virtio-win ISO: Download virtio-win from GitHub.

Prepare Windows VM:¶

Launch the Windows VM as follows:

virt-install --connect qemu:///system \
  --name ws2022 --ram 8192 --vcpus 8 \
  --network network=default,model=virtio \
  --disk path=windows-server-2022.qcow2,format=qcow2,device=disk,bus=virtio,size=11 \
  --os-type windows --os-variant win2k19 \
  --cdrom windows-server-2022.iso \
  --disk path=virtio.iso,device=cdrom \
  --graphics spice,listen=10.165.31.154

where windows-server-2022.iso is the Windows ISO, virtio.iso is the virtio-win ISO, and 10.165.31.154 is the public IP address of your KVM host.

You can now continue the Windows Server setup with any spice viewer at spice://10.165.31.154:5901:

Select Windows Server 2022 Standard Evaluation (Desktop Experience).
Select load driver to load the virtio storage driver from the virtio-win cdrom.

After the OS setup, continue the VM setup:

Bring up Device Manager. Install the virtio network driver (and more device drivers) from your virtio-win cdrom.

Set the execution policy to Unrestricted with PowerShell:

Set-ExecutionPolicy Unrestricted

Use diskpart to delete any system recovery partition, and use Computer Management->Disk Management to expand disk0 to the full extent. This is critical so that the disk can be expanded later to other sizes.

diskpart
> select disk 0
> select partition 3
> delete partition override

Download cloudbase-init:

Invoke-WebRequest -Uri https://github.com/cloudbase/cloudbase-init/releases/download/1.1.4/CloudbaseInitSetup_1_1_4_x64.msi -OutFile CloudbaseInitSetup_1_1_4_x64.msi -Proxy http://proxy-dmz.intel.com:911

Install cloudbase-init. During the installation, change the username to Administrator when prompted, and then click Install to install cloudbase-init. Before proceeding to the final screen, pause to modify a few files as described in the next step.

Modify c:\Program Files\Cloudbase Solutions\Cloudbase-init\conf\cloudbase-init-unattended.conf:

metadata_services=cloudbaseinit.metadata.services.nocloudservice.NoCloudConfigDriveService,cloudbaseinit.metadata.services.base.EmptyMetadataService
plugins=cloudbaseinit.plugins.common.sethostname.SetHostNamePlugin,cloudbaseinit.plugins.windows.createuser.CreateUserPlugin,cloudbaseinit.plugins.windows.extendvolumes.ExtendVolumesPlugin,cloudbaseinit.plugins.windows.winrmlistener.ConfigWinRMListenerPlugin,cloudbaseinit.plugins.common.userdata.UserDataPlugin
allow_reboot=true

Go back to the cloudbase-init installation screen. Select both sysprep and shutdown, and then click Finish to complete the installation. Wait until sysprep is completed and the VM is shut down.

You can now copy your windows-server-2022.qcow2 to an artifactory for reuse with KVM.

Preparing Infrastructure Setup Memory¶

Source: doc/user-guide/preparing-infrastructure/setup-memory.md

The memory labels are used to check whether the minimum memory size required by the workload is available.

For example, the label HAS-SETUP-MEMORY-192 indicates that the workload requires at least 192 GB of memory.
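As an illustration of the label convention, the following Python sketch extracts the size from a memory label and compares it with MemTotal from /proc/meminfo-style text. The function names are hypothetical. How WSF itself performs the check is not described here, and treating GB as 1024 x 1024 kB is an assumption.

```python
import re

LABEL_RE = re.compile(r"^HAS-SETUP-MEMORY-(\d+)$")

def required_memory_gb(label):
    """Return the minimum memory size in GB encoded in a HAS-SETUP-MEMORY label."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"not a memory label: {label}")
    return int(match.group(1))

def memory_available_gb(meminfo_text):
    """Read MemTotal (reported in kB) from /proc/meminfo-style text and convert to GB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) / (1024 * 1024)
    raise ValueError("MemTotal not found")

# Illustrative sample; on a real host, read the text from /proc/meminfo.
sample = "MemTotal:       263856724 kB\nMemFree:        1000 kB\n"
needed = required_memory_gb("HAS-SETUP-MEMORY-192")
print(needed, memory_available_gb(sample) >= needed)
```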

Preparing Infrastructure Setup Module¶

Source: doc/user-guide/preparing-infrastructure/setup-module.md

The set of HAS-SETUP-MODULE labels specifies requests to load kernel modules that are part of the OS distribution but are not loaded by default at boot.

Module Setup¶

The label should be specified in the format of HAS-SETUP-MODULE-<module-name>, where <module-name> is the module name.

System Setup¶

The kernel module can be loaded as follows (modprobe takes the module name without the .ko suffix):

sudo modprobe <module-name>

Node Labels¶

Add a node label to the worker node(s):
- HAS-SETUP-MODULE-msa: optional

Preparing Infrastructure Setup Network¶

Source: doc/user-guide/preparing-infrastructure/setup-network.md

For network workloads, the test environment must be prepared in advance.

Install high-speed NICs and link multiple workers together using either a high-speed switch or point-to-point cross-link cables. The network interface can be either a physical interface or a logical bond of multiple physical interfaces.

For example, use at least two Ice Lake (or Sapphire Rapids) Linux hosts running CentOS Stream. On each Linux host, install one Intel E810-CQDA2 or E810-2CQDA2 100GE NIC adaptor (each adaptor has two 100GE ports) under the CPU0 IIO root port PCIe bus (DO NOT install the 100GE NIC adaptor under the PCH PCIe bus).

Network Setup¶

Use two 100GE copper (or optical) cables to connect those 100GE ports as follows:

Host0 100GE physical port0 <-> Host1 100GE physical port0;  (100GE connection 0 between two Linux hosts)  
Host0 100GE physical port1 <-> Host1 100GE physical port1;  (100GE connection 1 between two Linux hosts)

> Note: After powering the machines on again, make sure the 100GE ports are linked up (the NIC port LEDs show a green light).

Make sure the Linux kernel 100GE driver package "ice" for the E810 adaptor is installed on your Linux hosts:
```
modprobe ice
```
Use the following Linux commands to check the ice driver/firmware version and to make sure the 100GE ports are linked up at 100GE full duplex:
```
ethtool ethX
ethtool -i ethX
```
- ethX is the interface name of each 100GE port in the Linux host ifconfig output.
- You can use the Linux kernel in-tree ice driver package. For reference, the latest ice driver package can be downloaded from https://sourceforge.net/projects/e1000/files/ice%20stable/
Use the ifconfig command to configure IP addresses on two different subnets (e.g., 192.168.8.x and 192.168.9.x) for the two 100GE connections, then ping the other Linux host to make sure the 100GE connections work.
For example, you can configure the IP addresses like this:
Host0 100GE physical port0(192.168.8.88) <-> Host1 100GE physical port0(192.168.8.99).
Host0 100GE physical port1(192.168.9.88) <-> Host1 100GE physical port1(192.168.9.99).
- Host0
```
sudo ifconfig <port0-name> 192.168.8.88 netmask 255.255.255.0 up
sudo ifconfig <port1-name> 192.168.9.88 netmask 255.255.255.0 up
```
- Host1
```
sudo ifconfig <port0-name> 192.168.8.99 netmask 255.255.255.0 up
sudo ifconfig <port1-name> 192.168.9.99 netmask 255.255.255.0 up
```
- Host0
```
ping 192.168.8.99
ping 192.168.9.99
```
On each Linux host, use the following command to find the PCIe bus:device:function numbers of the two E810 devices. These numbers are needed to bind the E810 PCIe devices to DPDK.
```
lspci | grep Eth | grep E810
```
On each Linux host, install the kernel-devel package that matches the currently running kernel, so that you can compile kernel modules (.ko):
```
yum install kernel-devel
```
- Make sure the installed kernel-devel version equals the uname -r version.
On each Linux host, we suggest binding the NIC ports to vfio-pci, because igb_uio is not supported on the newest OS releases.

To use vfio-pci, add intel_iommu=on iommu=pt to the Linux boot command line:
```
sudo grubby --update-kernel=DEFAULT --args="intel_iommu=on iommu=pt"
sudo reboot
```
Also enable Intel VT-d in the BIOS:

EDKII Menu → Socket Configuration → IIO Configuration → Intel VT for directed IO (VT-d) → Intel VT for directed IO → Enable

Use the following command to check that Virtualization Technology and the IOMMU are enabled:
```
dmesg | grep -E "DMAR|IOMMU"  
```
Then load the vfio-pci module:
```
sudo modprobe vfio-pci
```
On each Linux host, use the following command to set the 100GE ports to the ifconfig down state:
```
ifconfig <port-name> down
```
* If an E810 port is in the ifconfig up state, it cannot be bound to DPDK vfio-pci.

On each Linux host, download dpdk-21.11.tar.xz, decompress it, and use dpdk-devbind.py to bind the 100GE ports to DPDK vfio-pci:

wget http://fast.dpdk.org/rel/dpdk-21.11.tar.xz
tar xf dpdk-21.11.tar.xz
cd dpdk-21.11/usertools
./dpdk-devbind.py --status
./dpdk-devbind.py --bind=vfio-pci bus:dev:func bus:dev:func
./dpdk-devbind.py --status

Note: If dpdk-devbind.py fails to run, install its dependencies:
yum install -y numactl libfdt pciutils which python3

Note: To bind the 100GE ports to the kernel driver "ice" again, run:
```
./dpdk-devbind.py --bind=ice bus:dev:func bus:dev:func
```

Use the following commands to add labels to the K8S nodes.

Currently, the workloads l3fwd/vppfib/ngfw/ovs-dpdk/pktgen need this label for scheduling.

The label indicates that the two 100GE ports have been correctly bound to the DPDK vfio-pci driver on the Linux host.
```
kubectl label nodes nodeX HAS-SETUP-DPDK=yes --overwrite
kubectl label nodes nodeY HAS-SETUP-DPDK=yes --overwrite
```
On the Linux grub boot command line, set the number of hugepages to at least 2048:
```
grubby --update-kernel=DEFAULT --args="hugepages=2048"
reboot
```
The workload requests 2048 2MB hugepages. See Hugepage Setup for setup instructions.

After the 2MB hugepages are properly set up on the two Linux hosts, use the following commands to label the two K8S nodes. Currently, l3fwd/pktgen K8S scheduling needs this label.
```
kubectl label nodes nodeX HAS-SETUP-HUGEPAGE-2048kB-2048=yes --overwrite  
kubectl label nodes nodeY HAS-SETUP-HUGEPAGE-2048kB-2048=yes --overwrite
```
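The hugepage label encodes both the page size and the page count. The following Python sketch decodes it and computes the total reserved memory. It assumes the label format HAS-SETUP-HUGEPAGE-<page size in kB>kB-<page count>, which is inferred from the "2048 2MB hugepages" statement above. The function name is hypothetical.

```python
import re

def hugepage_request(label):
    """Split HAS-SETUP-HUGEPAGE-<size>kB-<count> into (page size in kB, page count)."""
    match = re.fullmatch(r"HAS-SETUP-HUGEPAGE-(\d+)kB-(\d+)", label)
    if match is None:
        raise ValueError(f"not a hugepage label: {label}")
    return int(match.group(1)), int(match.group(2))

size_kb, count = hugepage_request("HAS-SETUP-HUGEPAGE-2048kB-2048")
total_gib = size_kb * count / (1024 * 1024)
print(size_kb, count, total_gib)  # 2048 2048 4.0
```

In other words, 2048 pages of 2 MB each reserve 4 GiB of memory on every labeled node.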
Install the K8S/Docker environment. Configure one Linux host as both the K8S master and a node, and the other Linux host as a K8S node.

This workload can run with the K8S backend; it cannot run with the single-host Docker backend.
Create a configuration file named network_env.conf in the /etc directory (/etc/network_env.conf). This file contains the 100G port and device information. The content should follow this format:
```
dpdk_port1=0000:38:00.0    
dpdk_port2=0000:38:00.1  
dsa_dev1=0000:6a:01.0  
dsa_dev2=0000:6f:01.0  
dpdk_port1_srcmac_l3fwd=0xb4,0x96,0x91,0xb2,0xa6,0x48  
dpdk_port2_srcmac_l3fwd=0xb4,0x96,0x91,0xb2,0xa6,0x49  
dpdk_port1_destmac_l3fwd=0xb4,0x96,0x91,0xc3,0x85,0xd8  
dpdk_port1_destmac_l3fwd_tx=b4:96:91:c3:85:d8  
dpdk_port2_destmac_l3fwd=0xb4,0x96,0x91,0xc3,0x85,0xd9  
dpdk_port2_destmac_l3fwd_tx=b4:96:91:c3:85:d9  
dpdk_port_mac1=0xb4,0x96,0x91,0xb2,0xa5,0x10  
dpdk_port_mac2=0xb4,0x96,0x91,0xb2,0xa5,0x11  
neigh_port_mac1=0xb4,0x96,0x91,0x9b,0x79,0x38  
neigh_port_mac2=0xb4,0x96,0x91,0x9b,0x79,0x39  
```
- dpdk_port[1-9]: PCI address of a 100G port. This port is bound to vfio-pci and used by DPDK. You can use the command lspci | grep Eth | grep E810 to get the PCI address.
- dsa_dev[1-9]: PCI address of a DSA device. At the time of writing, Intel® DSA devices appear as devices of type “0b25” because the pci-id database has no entries for them yet. This device is bound to vfio-pci and used by DPDK. You can use the command lspci | grep 0b25 to get the DSA device address.
- dpdk_port_mac[1-9]: MAC address of the NIC port, in the format 0xb4,0x96,0x91,0xb2,0xa6,0xd8.
- neigh_port_mac[1-9]: MAC address of the NIC port on the other test machine that is connected to dpdk_port.
- dpdk_port[1-9]_srcmac_l3fwd: MAC address of the NIC port on this host, in the format 0xb4,0x96,0x91,0xb2,0xa6,0x48.
- dpdk_port[1-9]_destmac_l3fwd: MAC address of the NIC port on the other test machine that is connected to dpdk_port.
- dpdk_port[1-9]_destmac_l3fwd_tx: MAC address of the NIC port on the other test machine that is connected to dpdk_port, in the format b4:96:91:c3:85:d8.
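Because the file mixes three value formats, a quick format check before running a workload can catch typos. The following Python sketch is one way to do that. The function names and the key-to-format mapping are assumptions derived from the descriptions above, not from the workload scripts that consume the file.

```python
import re

PCI_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")
HEX_MAC_RE = re.compile(r"^0x[0-9a-fA-F]{2}(,0x[0-9a-fA-F]{2}){5}$")
COLON_MAC_RE = re.compile(r"^[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}$")

def parse_network_env(text):
    """Parse key=value lines, ignoring blank lines and surrounding whitespace."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if line and "=" in line:
            key, value = line.split("=", 1)
            entries[key.strip()] = value.strip()
    return entries

def invalid_entries(entries):
    """Return the keys whose values do not match the expected format."""
    bad = []
    for key, value in entries.items():
        if key.endswith("_tx"):
            pattern = COLON_MAC_RE      # b4:96:91:c3:85:d8
        elif "mac" in key:
            pattern = HEX_MAC_RE        # 0xb4,0x96,0x91,0xb2,0xa6,0x48
        else:
            pattern = PCI_RE            # dpdk_port<N> and dsa_dev<N>
        if not pattern.match(value):
            bad.append(key)
    return bad

sample = """
dpdk_port1=0000:38:00.0
dsa_dev1=0000:6a:01.0
dpdk_port1_srcmac_l3fwd=0xb4,0x96,0x91,0xb2,0xa6,0x48
dpdk_port1_destmac_l3fwd_tx=b4:96:91:c3:85:d8
"""
print(invalid_entries(parse_network_env(sample)))  # []
```

On a real host, the text would be read from /etc/network_env.conf instead of the inline sample.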

A workload can request a network as follows:
- HAS-SETUP-NETWORK-SPEC-1: The worker node must have a set of networks whose specification, network_spec_1, is specified in the terraform configuration files.

Node Labels:¶

Label the worker nodes with the following node labels:
- HAS-SETUP-NIC-25G=yes: Optional.
- HAS-SETUP-NIC-40G=yes: Optional.
- HAS-SETUP-NIC-100G=yes: Optional.
- HAS-SETUP-DPDK=yes: Required.
- HAS-SETUP-HUGEPAGE-2048kB-2048=yes: Required.
- HAS-SETUP-NETWORK-SPEC-1=yes: The worker node is equipped with the network described in network_spec_1.

Preparing Infrastructure Setup Nfd¶

Source: doc/user-guide/preparing-infrastructure/setup-nfd.md

Node Feature Discovery (NFD) lists platform capabilities and can be used for intelligent workload scheduling in Kubernetes.

Note: NFD and the Intel Device Plugins can be set up in WSF through Ansible automation; please refer to the available Ansible roles.

Intel Device Plugins¶

The Intel Device Plugins for Kubernetes provide a collection of device plugins that advertise Intel hardware resources.

Currently, the operator supports the QAT and SGX device plugins, among others.

Benefits of Completing WL's With NFD Based Labels Accompanying Intel Device Plugins¶

As far as possible, an Ansible role automates both the NFD and Intel Device Plugins projects to reduce manual work.
After NFD and the Intel Device Plugins are configured, WLs can be triggered without complicated or hardcoded values such as hardware type, hugepages, or OS. It is more robust for WLs to rely on NFD labels, which show whether SGX, QAT, etc. are properly configured and present in the system.

Note: Additionally, in testing, it was observed that Intel Device Plugins configured with NFD resulted in faster execution time for WL's such as OpenSSL3-RSAMB

Required: The QAT, SGX, etc. drivers are expected to be installed; otherwise, the Intel Device Plugins might run into issues.
An example of the QAT labels follows.

Note: Please update the labels in the WLs before execution. For example, in the OpenSSL3-RSAMB WL, the QAT labels should be updated as follows:

workload/OpenSSL3-RSAMB/kubernetes-config.yaml.m4

nodeSelector:
   intel.feature.node.kubernetes.io/qat: "true"

workload/OpenSSL3-RSAMB/cluster-config.yaml.m4

- labels:
    intel.feature.node.kubernetes.io/qat: required

Note: The same applies to all SGX-related or SGX-dependent WLs:

nodeSelector:
  intel.feature.node.kubernetes.io/sgx: "true"

and

- labels:
    intel.feature.node.kubernetes.io/sgx: required

Note: The same applies to all DLB-related or DLB-dependent WLs:

nodeSelector:
  intel.feature.node.kubernetes.io/dlb: "true"

and

- labels:
    intel.feature.node.kubernetes.io/dlb: required

Note: The same applies to all DSA-related or DSA-dependent WLs:

nodeSelector:
  intel.feature.node.kubernetes.io/dsa: "true"

and

- labels:
    intel.feature.node.kubernetes.io/dsa: required


Verify Node Feature Discovery¶

The NFD worker daemonset should be listed in the kube-system namespace, for example:

kube-system node-feature-discovery-worker      1    1    1    1    1    <none>    61s

To verify that NFD in Kubernetes is running as expected, use the following command:

kubectl label node --list --all

Listing labels for Node./node:
 feature.node.kubernetes.io/kernel-config.NO_HZ=true
 feature.node.kubernetes.io/cpu-pstate.turbo=true
 feature.node.kubernetes.io/cpu-cpuid.X87=true
 feature.node.kubernetes.io/cpu-cpuid.MOVDIR64B=true
 feature.node.kubernetes.io/cpu-rdt.RDTCMT=true
 kubernetes.io/arch=amd64
 feature.node.kubernetes.io/cpu-cpuid.AVX512FP16=true
 feature.node.kubernetes.io/cpu-cpuid.AVX512VPOPCNTDQ=true
 feature.node.kubernetes.io/cpu-power.sst_bf.enabled=true
 feature.node.kubernetes.io/system-os_release.VERSION_ID=22.04
 intel.power.node=true
 feature.node.kubernetes.io/cpu-cpuid.OSXSAVE=true
 intel.feature.node.kubernetes.io/dlb=true
 feature.node.kubernetes.io/cpu-cpuid.AVX512IFMA=true
 feature.node.kubernetes.io/cpu-cpuid.AVX512BF16=true
 feature.node.kubernetes.io/cpu-cpuid.AVX512BITALG=true
 feature.node.kubernetes.io/cpu-cpuid.AMXBF16=true
 feature.node.kubernetes.io/kernel-config.NO_HZ_IDLE=true
 feature.node.kubernetes.io/cpu-rdt.RDTL2CA=true
 feature.node.kubernetes.io/cpu-cstate.enabled=true
 feature.node.kubernetes.io/pci-0b40_8086.present=true
 feature.node.kubernetes.io/cpu-cpuid.VAES=true
 feature.node.kubernetes.io/kernel-version.revision=0
 intel.feature.node.kubernetes.io/qat=true
 feature.node.kubernetes.io/kernel-version.full=5.15.0-25-generic
 cndp=true
 feature.node.kubernetes.io/cpu-cpuid.FMA3=true
 feature.node.kubernetes.io/cpu-cpuid.SHA=true
 ethernet.intel.com/intel-ethernet-present=
 feature.node.kubernetes.io/cpu-cpuid.SERIALIZE=true
 feature.node.kubernetes.io/cpu-cpuid.CETIBT=true
 feature.node.kubernetes.io/pci-0b40_8086.sriov.capable=true
 feature.node.kubernetes.io/cpu-cpuid.TSXLDTRK=true
 kubernetes.io/hostname=node
 feature.node.kubernetes.io/cpu-cpuid.LAHF=true
 feature.node.kubernetes.io/kernel-version.minor=15
 feature.node.kubernetes.io/cpu-cpuid.MOVDIRI=true
 feature.node.kubernetes.io/system-os_release.VERSION_ID.major=22
 feature.node.kubernetes.io/cpu-cpuid.XSAVE=true
 feature.node.kubernetes.io/cpu-cpuid.AVX512VBMI2=true
 feature.node.kubernetes.io/cpu-cpuid.AVX512VBMI=true
 feature.node.kubernetes.io/cpu-hardware_multithreading=true
 feature.node.kubernetes.io/cpu-cpuid.WBNOINVD=true
 feature.node.kubernetes.io/cpu-model.id=143
 feature.node.kubernetes.io/cpu-cpuid.STIBP=true
 feature.node.kubernetes.io/kernel-version.major=5
 intel.feature.node.kubernetes.io/dsa=true
 feature.node.kubernetes.io/cpu-cpuid.SCE=true
 feature.node.kubernetes.io/system-os_release.ID=ubuntu
 feature.node.kubernetes.io/cpu-cpuid.WAITPKG=true
 feature.node.kubernetes.io/cpu-cpuid.IBPB=true
 feature.node.kubernetes.io/cpu-rdt.RDTMON=true
 feature.node.kubernetes.io/cpu-cpuid.VMX=true
 feature.node.kubernetes.io/cpu-rdt.RDTMBA=true
 feature.node.kubernetes.io/cpu-cpuid.CLDEMOTE=true
 feature.node.kubernetes.io/cpu-cpuid.MOVBE=true
 intel.feature.node.kubernetes.io/sgx=true
 feature.node.kubernetes.io/cpu-cpuid.FXSR=true
 feature.node.kubernetes.io/cpu-sgx.enabled=true
 feature.node.kubernetes.io/cpu-cpuid.AMXINT8=true
 feature.node.kubernetes.io/cpu-cpuid.CMPXCHG8=true
 feature.node.kubernetes.io/storage-nonrotationaldisk=true
 kubernetes.io/os=linux
 feature.node.kubernetes.io/cpu-cpuid.CETSS=true
 feature.node.kubernetes.io/memory-numa=true
 node-role.kubernetes.io/worker=
 feature.node.kubernetes.io/cpu-cpuid.ENQCMD=true
 feature.node.kubernetes.io/cpu-cpuid.AVX512VL=true
 feature.node.kubernetes.io/cpu-cpuid.VPCLMULQDQ=true
 feature.node.kubernetes.io/network-sriov.capable=true
 feature.node.kubernetes.io/cpu-pstate.status=active
 feature.node.kubernetes.io/cpu-cpuid.GFNI=true
 feature.node.kubernetes.io/cpu-cpuid.AVX512VNNI=true
 feature.node.kubernetes.io/cpu-cpuid.AVX=true
 beta.kubernetes.io/arch=amd64
 feature.node.kubernetes.io/cpu-cpuid.AESNI=true
 feature.node.kubernetes.io/cpu-cpuid.AVX512VP2INTERSECT=true
 feature.node.kubernetes.io/cpu-cpuid.AVX2=true
 feature.node.kubernetes.io/cpu-cpuid.AVX512BW=true
 feature.node.kubernetes.io/cpu-cpuid.AVX512CD=true
 feature.node.kubernetes.io/cpu-rdt.RDTL3CA=true
 feature.node.kubernetes.io/cpu-model.vendor_id=Intel
 feature.node.kubernetes.io/cpu-cpuid.AVX512F=true
 feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=04
 feature.node.kubernetes.io/cpu-cpuid.ADX=true
 feature.node.kubernetes.io/cpu-pstate.scaling_governor=powersave
 feature.node.kubernetes.io/cpu-cpuid.AVX512DQ=true
 feature.node.kubernetes.io/pci-0300_1a03.present=true
 beta.kubernetes.io/os=linux
 feature.node.kubernetes.io/cpu-rdt.RDTMBM=true
 feature.node.kubernetes.io/cpu-cpuid.FXSROPT=true
 feature.node.kubernetes.io/cpu-cpuid.AMXTILE=true
 feature.node.kubernetes.io/cpu-model.family=6
Listing labels for Node./controller:
 kubernetes.io/os=linux
 node-role.kubernetes.io/control-plane=
 node.kubernetes.io/exclude-from-external-load-balancers=
 beta.kubernetes.io/arch=amd64
 beta.kubernetes.io/os=linux
 kubernetes.io/arch=amd64
 kubernetes.io/hostname=controller

Verify Intel Device Plugin Operator¶

kubectl get pods --all-namespaces | grep 'inteldeviceplugins'

inteldeviceplugins-system       inteldeviceplugins-controller-manager-59b46b7949-hkp4g       2/2     Running   0          17m

Verify Intel QAT Device Plugin¶

kubectl get node <node-name> -o json | jq '.status.allocatable'

{
  "cndp/e2e": "1",
  "cpu": "125",
  "ephemeral-storage": "282566437625",
  "hugepages-1Gi": "8Gi",
  "hugepages-2Mi": "64Gi",
  "intel.com/ens107_intelnics_1": "1",
  "intel.com/ens107_intelnics_2": "4",
  "intel.com/ens107_intelnics_3": "1",
  "memory": "186797444Ki",
  "pods": "110",
  "power.intel.com/balance-performance": "76",
  "power.intel.com/balance-performance-node": "76",
  "power.intel.com/balance-power": "102",
  "power.intel.com/balance-power-node": "102",
  "power.intel.com/performance": "51",
  "power.intel.com/performance-node": "51",
  "qat.intel.com/generic": "32",
  "sgx.intel.com/enclave": "20",
  "sgx.intel.com/provision": "20"
}
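The allocatable map can also be checked programmatically. The following Python sketch lists the device-plugin resources with a non-zero count. The resource prefixes are taken from the outputs in this section, the function name is hypothetical, and the sample is a trimmed excerpt of the output above.

```python
import json

def available_devices(allocatable, prefixes=("qat.intel.com/", "sgx.intel.com/",
                                             "dlb.intel.com/", "dsa.intel.com/")):
    """Return the device-plugin resources whose allocatable count is above zero."""
    found = {}
    for name, value in allocatable.items():
        if name.startswith(prefixes) and int(value) > 0:
            found[name] = int(value)
    return found

# Trimmed excerpt of the '.status.allocatable' output shown above.
sample = json.loads('{"cpu": "125", "qat.intel.com/generic": "32", '
                    '"sgx.intel.com/enclave": "20", "sgx.intel.com/provision": "20"}')
print(available_devices(sample))
```

A resource that is missing or reported as "0" does not show up in the result, which suggests that the corresponding device plugin has nothing to offer on that node.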

kubectl get no -o json | jq '.items[].metadata.labels' | grep qat

 "intel.feature.node.kubernetes.io/qat": "true",

Verify Intel SGX Device Plugin¶

kubectl get node <node-name> -o json | jq '.status.allocatable'

{
  "cndp/e2e": "1",
  "cpu": "125",
  "ephemeral-storage": "282566437625",
  "hugepages-1Gi": "8Gi",
  "hugepages-2Mi": "64Gi",
  "intel.com/ens107_intelnics_1": "1",
  "intel.com/ens107_intelnics_2": "4",
  "intel.com/ens107_intelnics_3": "1",
  "memory": "186797444Ki",
  "pods": "110",
  "power.intel.com/balance-performance": "76",
  "power.intel.com/balance-performance-node": "76",
  "power.intel.com/balance-power": "102",
  "power.intel.com/balance-power-node": "102",
  "power.intel.com/performance": "51",
  "power.intel.com/performance-node": "51",
  "qat.intel.com/generic": "32",
  "sgx.intel.com/enclave": "20",
  "sgx.intel.com/provision": "20"
}

kubectl get no -o json | jq '.items[].metadata.labels' | grep sgx

"feature.node.kubernetes.io/cpu-sgx.enabled": "true",
"intel.feature.node.kubernetes.io/sgx": "true",

Verify Intel DLB Device Plugin¶

kubectl get node <node-name> -o json | jq '.status.allocatable'

{
  "cndp/e2e": "0",
  "cpu": "125",
  "dlb.intel.com/pf": "0",
  "dsa.intel.com/wq-user-dedicated": "0",
  "ephemeral-storage": "282566437625",
  "hugepages-1Gi": "4Gi",
  "hugepages-2Mi": "2Gi",
  "memory": "256095196Ki",
  "pods": "110",
  "power.intel.com/balance-performance": "76",
  "power.intel.com/balance-performance-node": "76",
  "power.intel.com/balance-power": "102",
  "power.intel.com/balance-power-node": "102",
  "power.intel.com/performance": "51",
  "power.intel.com/performance-node": "51",
  "qat.intel.com/generic": "16",
  "sgx.intel.com/enclave": "0",
  "sgx.intel.com/provision": "0"
}

kubectl get no -o json | jq '.items[].metadata.labels' | grep dlb

"intel.feature.node.kubernetes.io/dlb": "true",

Verify Intel DSA Device Plugin¶

kubectl get node <node-name> -o json | jq '.status.allocatable'

{
  "cndp/e2e": "0",
  "cpu": "125",
  "dlb.intel.com/pf": "0",
  "dsa.intel.com/wq-user-dedicated": "0",
  "ephemeral-storage": "282566437625",
  "hugepages-1Gi": "4Gi",
  "hugepages-2Mi": "2Gi",
  "memory": "256095196Ki",
  "pods": "110",
  "power.intel.com/balance-performance": "76",
  "power.intel.com/balance-performance-node": "76",
  "power.intel.com/balance-power": "102",
  "power.intel.com/balance-power-node": "102",
  "power.intel.com/performance": "51",
  "power.intel.com/performance-node": "51",
  "qat.intel.com/generic": "16",
  "sgx.intel.com/enclave": "0",
  "sgx.intel.com/provision": "0"
}

kubectl get no -o json | jq '.items[].metadata.labels' | grep dsa

"intel.feature.node.kubernetes.io/dsa": "true",

Ensure Intel Device Plugins Are Working After Deployment¶

kubectl get pods --all-namespaces | grep inteldeviceplugins-system

inteldeviceplugins-system       intel-dlb-plugin-2plwt                                        1/1     Running   12 (16m ago)   48m
inteldeviceplugins-system       intel-dsa-plugin-l946v                                        1/1     Running   12 (15m ago)   47m
inteldeviceplugins-system       intel-qat-plugin-7dxmj                                        1/1     Running   1 (13m ago)    49m
inteldeviceplugins-system       intel-sgx-plugin-dwjkg                                        1/1     Running   12 (16m ago)   49m
inteldeviceplugins-system       inteldeviceplugins-controller-manager-59b46b7949-55hvg        2/2     Running   0              10m
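The per-device label checks from the verification steps above can be combined into one pass over all nodes. The following Python sketch works on the JSON produced by kubectl get no -o json. The function name is hypothetical, and the sample is a trimmed stand-in with illustrative node names rather than real cluster output.

```python
import json

DEVICES = ("qat", "sgx", "dlb", "dsa")

def device_labels(node_list):
    """Map each node name to the devices whose intel.feature label is 'true'."""
    result = {}
    for item in node_list["items"]:
        labels = item["metadata"].get("labels", {})
        result[item["metadata"]["name"]] = [
            dev for dev in DEVICES
            if labels.get(f"intel.feature.node.kubernetes.io/{dev}") == "true"
        ]
    return result

# Trimmed stand-in for 'kubectl get no -o json' output.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "node", "labels": {
     "intel.feature.node.kubernetes.io/qat": "true",
     "intel.feature.node.kubernetes.io/dsa": "true"}}},
  {"metadata": {"name": "controller", "labels": {"kubernetes.io/os": "linux"}}}
]}
""")
print(device_labels(sample))
# {'node': ['qat', 'dsa'], 'controller': []}
```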

Preparing Infrastructure Setup Nginx Cache¶

Source: doc/user-guide/preparing-infrastructure/setup-nginx-cache.md

This document is a guide for setting up the Nginx-DLB benchmark environment, including the hardware platform and the software configuration of the storage.

HW Prerequisites¶

Set up 3 servers: one node (client) for the client deployment, and two nodes for the cache server (worker-0) and content server (worker-1) deployments.
The cache server (worker-0) requires 5 NVMe disks.

K8S Labels configuration¶

Please complete the Storage configuration section first, then label the corresponding nodes.

Command examples:

Label:

kubectl label node <node name> HAS-SETUP-NVMECACHE=yes

Unlabel:

kubectl label node <node name> HAS-SETUP-NVMECACHE-

Nginx cache server worker-0:

HAS-SETUP-NGINX-CACHE=yes

Storage configuration¶

This should be done on worker-0.

Prepare 5 NVMe disks for the nginx cache server pod. In the following, nvme? stands for any of the NVMe disks.

Check the NVMe drives and partition them:

ls /dev/nvme*

/dev/nvme1    /dev/nvme2    /dev/nvme3    /dev/nvme4    /dev/nvme5

Format the drives as ext4 (or xfs):

mkfs.ext4 /dev/nvme1n1
mkfs.ext4 /dev/nvme2n1
mkfs.ext4 /dev/nvme3n1
mkfs.ext4 /dev/nvme4n1
mkfs.ext4 /dev/nvme5n1

Create the cache mountpoints and mount the five drives:

mkdir -p /nginx/cache1
mount -o rw,noatime,seclabel,discard /dev/nvme1n1 /nginx/cache1
mkdir -p /nginx/cache2
mount -o rw,noatime,seclabel,discard /dev/nvme2n1 /nginx/cache2
mkdir -p /nginx/cache3
mount -o rw,noatime,seclabel,discard /dev/nvme3n1 /nginx/cache3
mkdir -p /nginx/cache4
mount -o rw,noatime,seclabel,discard /dev/nvme4n1 /nginx/cache4
mkdir -p /nginx/cache5
mount -o rw,noatime,seclabel,discard /dev/nvme5n1 /nginx/cache5
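Since the same three commands repeat for every disk, they can be generated. The following Python sketch only builds the command strings and does not run them. It assumes that the disks are named /dev/nvme1n1 through /dev/nvme5n1 as in this section (actual device names may differ on your host), and it uses mkdir -p so that the /nginx parent directory is created if needed. The function name is hypothetical.

```python
def cache_disk_commands(disk_count=5, fs="ext4", mount_root="/nginx"):
    """Build the format and mount commands for /dev/nvme<N>n1, N = 1..disk_count."""
    commands = []
    for n in range(1, disk_count + 1):
        device = f"/dev/nvme{n}n1"
        target = f"{mount_root}/cache{n}"
        commands.append(f"mkfs.{fs} {device}")
        commands.append(f"mkdir -p {target}")
        commands.append(f"mount -o rw,noatime,seclabel,discard {device} {target}")
    return commands

print("\n".join(cache_disk_commands()))
```

Review the printed commands before running them as root, because mkfs destroys any existing data on the target devices.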

Preparing Infrastructure Setup Qat In Tree¶

Source: doc/user-guide/preparing-infrastructure/setup-qat-in-tree.md

Intel® QuickAssist Technology (QAT) accelerates data encryption and compression. This document describes the in-tree driver setup.

QAT In-tree Driver Setup For 4xxx Device¶

Check System Prerequisites¶

The platform must have an Intel® QuickAssist Technology (QAT) device such as "4xxx".
The QAT Physical Functions (PFs) can be determined as follows:

lspci -d 8086:4942
76:00.0 Co-processor: Intel Corporation Device 4942 (rev 40)
...

or

lspci -d 8086:4940
6b:00.0 Co-processor: Intel Corporation Device 4940 (rev 40)
...

Firmware must be available. Check that these files exist:

/lib/firmware/qat_4xxx.bin or /lib/firmware/qat_4xxx.bin.xz

/lib/firmware/qat_4xxx_mmp.bin or /lib/firmware/qat_4xxx_mmp.bin.xz

If not, download them from:

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/qat_4xxx.bin
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/qat_4xxx_mmp.bin

After updating these files, run

sudo rmmod qat_4xxx
sudo modprobe qat_4xxx
sudo dracut --force

to update kernel modules and initramfs.
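The firmware check can be scripted. The following Python sketch reports which of the two firmware files are present neither uncompressed nor as .xz. The function name is hypothetical, and the default directory is the /lib/firmware path named above.

```python
from pathlib import Path

FIRMWARE = ("qat_4xxx.bin", "qat_4xxx_mmp.bin")

def missing_firmware(firmware_dir="/lib/firmware"):
    """Return firmware names found neither as <name> nor as <name>.xz."""
    base = Path(firmware_dir)
    return [
        name for name in FIRMWARE
        if not (base / name).exists() and not (base / (name + ".xz")).exists()
    ]

print(missing_firmware())
```

An empty list means that both firmware files are in place; any name that is printed needs to be downloaded.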

Required Kernel Information¶

Linux kernel v5.11+ (This is for crypto, for compression use v5.17+)
Fedora 34+ (for compression use 36+)
RHEL 8.4+ (for compression use 9.0+)
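For upstream kernels, the version requirement can be checked against the uname -r string. The following Python sketch compares only the major and minor numbers. It does not cover distribution kernels that backport the driver (which is presumably why RHEL and Fedora are listed separately above), and the function names are hypothetical.

```python
import re

def kernel_version(release):
    """Extract (major, minor) from a 'uname -r' string such as 5.15.0-25-generic."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release}")
    return int(match.group(1)), int(match.group(2))

def qat_support(release):
    """Report which QAT in-tree services the kernel version is new enough for."""
    version = kernel_version(release)
    return {"crypto": version >= (5, 11), "compression": version >= (5, 17)}

print(qat_support("5.15.0-25-generic"))
# {'crypto': True, 'compression': False}
```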

BIOS Settings¶

Intel VT-d and SR-IOV must be enabled in the platform (BIOS).

Grub Settings¶

Fedora:

sudo grubby --update-kernel=DEFAULT --args="intel_iommu=on vfio-pci.disable_denylist=1 iommu=pt default_hugepagesz=2M hugepagesz=2M hugepages=4096"
sudo reboot

Ubuntu:

sudo vim /etc/default/grub
move to GRUB_CMDLINE_LINUX add intel_iommu=on vfio-pci.disable_denylist=1 iommu=pt default_hugepagesz=2M hugepagesz=2M hugepages=4096
sudo update-grub
Reboot system
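After the reboot, it can be useful to confirm that the kernel actually booted with the arguments listed above. The following sketch compares a kernel command line with a list of required arguments; missing_boot_args is a hypothetical helper name, and reading /proc/cmdline is an assumption about how you would obtain the running command line on Linux.

```shell
#!/bin/sh
# missing_boot_args is a hypothetical helper: given a kernel command line in $1
# and required arguments in the remaining parameters, it prints every required
# argument that does not appear as a whole word in the command line.
missing_boot_args() {
    cmdline="$1"
    shift
    for arg in "$@"; do
        case " $cmdline " in
            *" $arg "*) ;;          # present as a whole word
            *) echo "$arg" ;;       # absent: report it
        esac
    done
}

# Compare the running kernel's command line with the settings above.
if [ -r /proc/cmdline ]; then
    missing_boot_args "$(cat /proc/cmdline)" \
        intel_iommu=on vfio-pci.disable_denylist=1 iommu=pt \
        default_hugepagesz=2M hugepagesz=2M hugepages=4096
fi
```

An empty output means all arguments are present. Whole-word matching matters here because hugepagesz=2M is also a substring of default_hugepagesz=2M.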

Install QATLib¶

Fedora 34+, using software package manager

# Install QATLib
sudo dnf install -y qatlib-devel

# Add your user to qat group and re-login to make the change effective
sudo usermod -a -G qat `whoami`
sudo su -l $USER

# Make sure qat service is started properly and ready for use.
sudo systemctl stop qat.service
sudo systemctl enable qat.service
sudo systemctl restart qat.service
sudo systemctl status qat.service

Other Distributions: Building From Source¶

Fedora

# Install dependencies
sudo dnf update -y
sudo dnf install -y gcc systemd-devel automake autoconf libtool
sudo dnf install -y openssl-devel zlib-devel yasm

Ubuntu

# Install dependencies
sudo apt update -y
sudo apt install -y build-essential cmake g++ pkg-config wget make yasm nasm libboost-all-dev libnl-genl-3-dev zlib1g zlib1g-dev
sudo apt install -y systemd m4 libudev-dev libssl-dev autoconf libtool tar git

Build & install¶

git clone https://github.com/intel/qatlib
cd qatlib
./autogen.sh
./configure --prefix=/usr --enable-service
make -j
sudo make -j install
sudo make samples-install

# Make sure qat service is started properly and ready to use
sudo systemctl stop qat.service
sudo systemctl enable qat.service
sudo systemctl restart qat.service
sudo systemctl status qat.service

QAT Drivers Uninstall¶

Remove the drivers and clean up the configuration. This is especially helpful if the QAT setup needs to be reconfigured.

sudo systemctl stop qat.service
# Change to the directory where the drivers and configuration are saved, such as "/opt/intel/QAT", and run the commands below:
make uninstall
make clean
make distclean

References¶

Please refer to the following links for detailed information on QAT In-tree Driver Setup For 4xxx Device

https://github.com/intel/qatlib/blob/main/INSTALL
intel-device-plugins-for-kubernetes/Dockerfile at main · intel/intel-device-plugins-for-kubernetes · GitHub

For more information on setting up PFs / VFs for specific QAT devices, please visit

https://doc.dpdk.org/guides/cryptodevs/qat.html

Preparing Infrastructure Setup Terraform¶

Source: doc/user-guide/preparing-infrastructure/setup-terraform.md

The terraform backend can be used to validate workloads on a remote cluster, On-Premises or on Cloud.

Setup Terraform for Cloud Validation¶

Follow the instructions in the WSF Cloud Setup to set up the development host.
The terraform backend supports Cloud vendors such as aws, gcp, azure, tencent, alicloud, and oracle. Each vendor has a corresponding configuration file: script/terraform/terraform-config.<SUT>.tf, where <SUT> is the Cloud vendor name. You can customize as needed.

Configure Cloud Account¶

If this is your first time, run the terraform build command:

Note: The commands below should be executed from the build folder, for example: ~/workload-services-framework/build# make build_terraform

make build_terraform

Then proceed with the Cloud account setup as follows:

make aws           # or make -C ../.. aws, if under build/workload/<workload>
$ aws configure    # please specify a region and output format as json
$ exit

make azure         # or make -C ../.. azure, if under build/workload/<workload>
$ az login
# if you have multiple subscriptions in your account, please set
# az account set --subscription="<subscription-id>"
$ exit

make gcp           # or make -C ../.. gcp, if under build/workload/<workload>
$ gcloud init --no-launch-browser
$ gcloud auth application-default login --no-launch-browser # Please make sure quota-project-id is set
$ gcloud auth activate-service-account --key-file <keyfile> # required only for running workloads in Google Kubernetes Engine
$ exit

make tencent       # or make -C ../.. tencent, if under build/workload/<workload>
$ tccli configure  # please specify a region
$ exit

make alicloud      # or make -C ../.. alicloud, if under build/workload/<workload>
$ aliyun configure # please specify a region
$ exit

make oracle        # or make -C ../.. oracle, if under build/workload/<workload>
$ oci setup config # please specify a compartment id and upload public key to the oracle cloud console
$ exit

Run Workload(s) Through Terraform¶

cd workload/<workload>
make
./ctest.sh -N

Cleanup Cloud Resources¶

If your terraform validation is interrupted for any reason, the Cloud resources may remain active. You can explicitly clean up any Cloud resources as follows:

Note: The commands below should be executed from the required workload folder, for example: ~/workload-services-framework/build/workload/OpenSSL3-RSAMB# make -C ../.. azure

make -C ../.. [aws|gcp|azure|tencent|alicloud]
$ cleanup
$ exit

Setup Terraform for On-Premises Validation¶

Follow the instructions in the WSF On-Premises Setup to set up the On-Premises hosts.
Customize terraform-config.static.tf to specify your cluster information.

Under script/terraform, you can create any terraform-config.mysut.tf out of terraform-config.static.tf, where mysut is your sut name. Use cmake -DTERRAFORM_SUT=mysut .. to configure it.

Now you can run any workload as follows:

cd build
cmake -DTERRAFORM_SUT=static -DBENCHMARK=workload/<workload-name> ..
make
./ctest.sh -N

Setup Terraform for KVM Validation¶

Follow the instructions in the WSF KVM Setup to set up the KVM environment.
Customize terraform-config.kvm.tf to specify your KVM host information.

Now you can run any workload as follows:

cd workload/<workload>
make
./ctest.sh -N

Telemetry Trace and Publishing Options¶

See Trace Module for available trace options. You can enable telemetry trace modules during the workload validation as follows:

cmake -DTERRAFORM_OPTIONS=--collectd ..
cd workload/<workload>
./ctest.sh -N

Additionally, you can add --svrinfo to TERRAFORM_OPTIONS to automatically detect the platform information as follows:

cmake -DTERRAFORM_OPTIONS=--svrinfo ..
cd workload/<workload>
./ctest.sh -N

Debugging¶

While the workload evaluation is in progress, you can log on to the remote instances to debug any issues you encounter. As the terraform engine runs inside a container, you first need to log in to the container as follows:

./debug.sh

The script will bring you to the container shell where you can perform, from the current directory or /opt/workspace, additional operations such as examining the workload execution logs and logging onto the workload instances.

Files of interest:
- cluster-config.yaml: The workload cluster configuration definition.
- terraform-config.yaml: The workload terraform entry point.
- workload-config.yaml: The workload configuration parameters.
- kubernetes-config.yaml[.mod.yaml]: The kubernetes deployment script (for containerized workloads).
- cluster.yaml: The ansible playbook to initialize the VM instances.
- deployment.yaml: The ansible playbook to run the workload.
- inventory.yaml: The provisioned VM information.
- tfplan.logs: The process logs.
- ssh_access.key[.pub]: The SSH keys for accessing the VM instances.
- template/*: Source code used to provision VMs and evaluate workloads.

$ cat inventory.yaml
...
        worker-0:
          ansible_host: 35.92.225.114
          ansible_user: ubuntu
...
$ ssh -i ssh_access.key ubuntu@35.92.225.114
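When there are several instances, copying addresses by hand from inventory.yaml becomes tedious. The sketch below prints a user@host pair for each entry, assuming that every host lists ansible_host before ansible_user as in the excerpt above. The name list_hosts is hypothetical, and the awk script is a line-oriented shortcut, not a YAML parser.

```shell
#!/bin/sh
# list_hosts is a hypothetical helper: it prints "<user>@<host>" for every
# entry in the inventory file $1 where an ansible_host line is followed by
# an ansible_user line. It matches lines textually and does not parse YAML.
list_hosts() {
    awk '
        $1 == "ansible_host:" { host = $2 }
        $1 == "ansible_user:" && host != "" { print $2 "@" host; host = "" }
    ' "$1"
}
```

For example, ssh -i ssh_access.key "$(list_hosts inventory.yaml | head -n 1)" would log on to the first listed instance.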

HINT: while working within the terraform container, if you need to ssh or scp data to public VMs, use the ssh identity ssh_access.key. If you are not under the default directory /opt/workspace, please use the absolute path.

For Example:

$ scp -r -i ssh_access.key /tmp/test.log <username>@<public_ip>:/tmp/
Warning: Permanently added '<public_ip>' (ED25519) to the list of known hosts.
test.log                              100%    5     0.1KB/s   00:00

Setting Breakpoint(s)¶

You can set one or many breakpoints by specifying the wl_debug option in TERRAFORM_OPTIONS or terraform-config.<sut>.tf:

cmake -DTERRAFORM_OPTIONS=--wl_debug=<BreakPoint>[,<BreakPoint>] ..

The following <BreakPoint>s are supported:
- PrepareStage: Pause when the workload is about to setup the host environment.
- RunStage: Pause when the workload is about to start the workload execution.
- CleanupStage: Pause when the workload is about to cleanup.

When a breakpoint is reached, the execution is paused for an hour (as specified by the wl_debug_timeout value.) You can explicitly resume the execution by creating a signaling file under /opt/workspace, as follows:

./debug.sh
$ touch ResumeRunStage
$ exit
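A mistyped breakpoint name is easy to miss in a long cmake command line. As a hedged sketch, the function below checks names against the three breakpoints listed above and prints the resulting --wl_debug option; wl_debug_option is a hypothetical helper and not part of the WSF scripts.

```shell
#!/bin/sh
# wl_debug_option is a hypothetical helper: it validates each breakpoint name
# against the supported list and prints "--wl_debug=<bp>[,<bp>...]".
# It returns non-zero on an unsupported name or when no name is given.
wl_debug_option() {
    list=""
    for bp in "$@"; do
        case "$bp" in
            PrepareStage|RunStage|CleanupStage)
                list="${list:+$list,}$bp" ;;
            *)
                echo "unsupported breakpoint: $bp" >&2
                return 1 ;;
        esac
    done
    [ -n "$list" ] && echo "--wl_debug=$list"
}
```

It could then be used as cmake -DTERRAFORM_OPTIONS="$(wl_debug_option PrepareStage RunStage)" ..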

Preparing Infrastructure Setup Wsf¶

Source: doc/user-guide/preparing-infrastructure/setup-wsf.md

This document describes the steps and scripts to set up the host environment for workload evaluation and development on the Cloud or On-Premises.

Network topology¶

In terms of network topology, the WSF assumes that there is a development (dev) machine for workload build and development, and a set of test machines (SUT, or System Under Test) for workload execution, as follows:

flowchart TD;
  dev[Development Host];;
  net((Network));;
  sut1[SUT1 Host];;
  sut2[SUT2 Host];;
  sut3[SUT3 Host];;
  dev <--> net;;
  net <--> sut1;;
  net <--> sut2;;
  net <--> sut3;;

where the SUT machines can be physical hosts in the On-Premises case, or virtualized VMs in the Cloud or KVM execution. If Kubernetes is used, it is assumed that the SUT hosts form a Kubernetes cluster, where one of the SUTs is the Kubernetes controller.

For Cloud evaluation, we can usually provision as many VM instances as required. For the On-Premises setup, depending on resource constraints, there can be many variations of the above diagram. The following are some of the common On-Premises variations:

| Name | Description and Restrictions |
|:-----|:-------------------------|
| Single Dev Default | A single dev machine also serves as a SUT. This is the default if you clone the repository without any changes. This is to quickly evaluate docker and Kubernetes single-container workloads or multi-container workloads without any node anti-affinity rules. The workloads cannot request any SUT system-level changes, such as hugepage, sysfs, or sysctls. Native workloads cannot run in this mode. Kubernetes uses a single-node setup with the controller untainted for execution, and a private registry (or the WSF official registry) on the same dev machine. Trace is not functional or accurate for most telemetry programs. |
| Single Dev with Loopback | A single dev machine also serves as a SUT. Requires a configuration modification to ssh back to the dev machine for workload execution. This is to evaluate native, docker and Kubernetes single-container workloads or multi-container workloads without any node anti-affinity rules. The workloads can request system-level changes such as hugepage, sysfs, and sysctls without any reboot of the system or restart of the containerd, docker or kubelet service. Kubernetes uses a single-node setup with the controller untainted for execution, and a private registry (or the WSF official registry) on the same dev machine. Most trace programs are functional. |
| Single Dev as the Kubernetes Controller | The dev machine also serves as a Kubernetes controller with additional Kubernetes worker nodes. This is to evaluate any native, docker and Kubernetes workloads. Must use a private registry (or the WSF official registry) for Kubernetes workload execution. The workloads can request any supported SUT system-level setups. Telemetry collections are fully functional. |
| Dedicated Dev, Kubernetes Controller and Workers | The dev machine, the Kubernetes controller, and Kubernetes workers are all on dedicated nodes. This is to evaluate any native, docker and Kubernetes workloads. For Kubernetes workload execution, it is optional to use any private registry. The workloads can request any supported SUT system-level setups. Telemetry collections are fully functional. |

In general, the WSF can work on any network topology, as long as the dev host can establish ssh (or winrm for Windows) connections to the SUT machines.

Prerequisites
Instructions of Cloud Setup
Instructions of On-Premises Setup
Instructions of KVM Setup
Instructions of ITEP Setup
Manual of Setup Scripts

Prerequisites¶

Ensure that your cluster hosts are properly set up in terms of hostname, networking, proxy and datetime:

The hostname must be unique within your subnet and follow RFC-1178, i.e., matching pattern [a-z][a-z0-9-]*.
The hostname must be properly registered with DNS so that all of your cluster hosts can reach each other via hostname.
If you are behind a corporate firewall, set up http_proxy, https_proxy and no_proxy in /etc/environment.
Sync your host datetime. An accurate system time is required for any credential authentication operations.
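The hostname rule above is simple to test before running any setup script. The following is a minimal sketch: is_valid_hostname is a hypothetical helper that applies the pattern [a-z][a-z0-9-]* to the whole name, and using uname -n with the domain part stripped is an assumption about how the short hostname is obtained.

```shell
#!/bin/sh
# is_valid_hostname is a hypothetical helper: it succeeds when $1 matches the
# pattern [a-z][a-z0-9-]* in full. LC_ALL=C keeps the character ranges
# limited to ASCII lowercase letters and digits.
is_valid_hostname() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z][a-z0-9-]*$'
}

# Check the short name of the current host (everything before the first dot).
name=$(uname -n)
if is_valid_hostname "${name%%.*}"; then
    echo "hostname ok: ${name%%.*}"
else
    echo "hostname does not match [a-z][a-z0-9-]*: ${name%%.*}"
fi
```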

Cloud Setup¶

To develop or evaluate workloads for Cloud, you need a Linux development machine. The development machine can be used to build workloads and provision Cloud VMs. The workload execution results are saved back to the development machine before publishing to the WSF portal, if specified.

flowchart LR;
  dev[Development Host];;
  proxy[Proxy];;
  csp[Cloud Service Provider];;
  dev <--> proxy <--> csp;;

where the proxy server is optional if the dev host can connect to the Internet directly.

Workload Type	Docker Registry	Dev Setup	SUT Setup	Registry Setup
`Native`	`N/A`	`setup-dev.sh`	`N/A`	`N/A`
`docker/compose`	`Optional`	`setup-dev.sh`	`N/A`	`setup-reg.sh`
`Kubernetes`	`Optional`	`setup-dev.sh`	`N/A`	`setup-reg.sh`

Use the following setup steps:
- Run the setup-dev.sh script to set up the development host.
- Follow the backend-specific instructions to initialize Cloud accounts. See Terraform Cloud Setup.

Optionally, run the setup-reg.sh script if you would like to set up a local docker registry for storing images or caching a public registry.

On-Premises Setup¶

Dedicated Dev and SUT¶

Start with a simple case, where the dev and the SUT hosts are dedicated. This is the recommended setup for simplicity and full functionality.

flowchart LR;
  dev[Dev Host];;
  net((Network));;
  sut1[SUT1 Host];;
  sut2[SUT2 Host];;
  sut3[SUT3 Host];;
  dev <--> net;;
  net <--> sut1;;
  net <--> sut2;;
  net <--> sut3;;

where, if Kubernetes is used, the SUT hosts are assumed to form a Kubernetes cluster, where one of the SUT hosts becomes the Kubernetes controller.

Workload Type	Docker Registry	Dev Setup	SUT Setup	Registry Setup
`Native`	`N/A`	`setup-dev.sh`	`setup-sut-native.sh`	`N/A`
`docker/compose`	`Optional`	`setup-dev.sh`	`setup-sut-docker.sh`	`setup-reg.sh`
`Kubernetes`	`Optional`	`setup-dev.sh`	`setup-sut-k8s.sh`	`setup-reg.sh`

Use the following setup steps:
- Run the setup-dev.sh script to set up the dev host.
- Optionally, run the setup-reg.sh script (on the dev host) if you plan to set up a local docker registry for building workloads and storing the built images.
- Depending on the workload type, run the setup-sut-native.sh, setup-sut-docker.sh, or setup-sut-k8s.sh script (on the dev host) to set up the SUT hosts. The native setup can run any baremetal native workloads. The docker setup can run most of the single-node containerized workloads (docker or docker compose). The Kubernetes setup can run all containerized workloads (not tied to any Cloud services) on premises.

Valid for Cloud native workloads only: if you do not set up a local docker registry, set k8s_enable_registry: true in script/terraform/terraform-config.static.tf. The terraform backend will auto-create an in-cluster docker registry to serve any Kubernetes workers.
If you have set up a local docker registry, make sure all SUT hosts can access the docker registry securely. With setup-reg.sh, this is done by appending all SUT hosts to the setup-reg.sh command line.

Combined Dev and Kubernetes Controller¶

In this case, you can combine the dev host and the Kubernetes controller to save a physical machine. A valid docker registry is required to serve the docker images to the Kubernetes workers.

flowchart LR;
  dev[Dev Host<br>Controller<br>Registry];;
  net((Network));;
  sut1[K8S Worker1];;
  sut2[K8S Worker2];;
  sut3[K8S Worker3];;
  dev <--> net;;
  net <--> sut1;;
  net <--> sut2;;
  net <--> sut3;;

Workload Type	Docker Registry	Dev Setup	SUT Setup	Registry Setup
`Native`	`N/A`	`setup-dev.sh`	`setup-sut-native.sh`	`N/A`
`docker/compose`	`Optional`	`setup-dev.sh`	`setup-sut-docker.sh`	`setup-reg.sh`
`Kubernetes`	`Required`	`setup-dev.sh`	`setup-sut-k8s.sh`	`setup-reg.sh`

Use the following setup steps:
- Run the setup-dev.sh script on the dev host.
- If you don't have a private registry, run the setup-reg.sh script to setup a private registry on the dev host. Make sure you append your SUT hosts to the command line so that the SUT hosts can securely access the private docker registry.
- Run the setup-sut-k8s.sh script to setup the workers. Use the dev host IP address as the controller IP address.

Use cmake -DREGISTRY=<registry_url> .. to set the private registry URL.
Set k8s_enable_registry: false in script/terraform/terraform-config.static.tf. This is the default.

Single Host Dev+SUT¶

In a constrained environment, you can combine the dev and SUT hosts onto the same physical host. The SUT access is through either the ansible local connection or a loop-back ssh connection. Please note that this is only for quick evaluation. The WSF scripts will not be able to perform automations that lead to a reboot of the system, or a restart of the containerd, dockerd, or kubelet service.

flowchart LR;
  dev[Dev/Controller<br>SUT/Registry];;
  dev <--> dev;;

Workload Type	Docker Registry	Dev Setup	SUT Setup	Registry Setup
`Native`	`N/A`	`setup-dev.sh`	`setup-sut-native.sh`	`N/A`
`docker/compose`	`Optional`	`setup-dev.sh`	`setup-sut-docker.sh`	`setup-reg.sh`
`Kubernetes`	`Required`	`setup-dev.sh`	`setup-sut-k8s.sh`	`setup-reg.sh`

Use the following setup steps:
- Run the setup-dev.sh script on the development host.
- If a registry is required, run the setup-reg.sh script on the dev host. Append all SUT info on the setup-reg.sh command line so that the SUTs can access the registry securely.
- Run the setup-sut-native.sh, setup-sut-docker.sh, or setup-sut-k8s.sh script to set up the workers. As there is only a single physical host in this setup, only workloads that require a single SUT can proceed.

Use cmake -DREGISTRY=<registry_url> .. to set the private registry URL.
If you use ssh loopback, modify the dev machine username and IP address in script/terraform/terraform-config.static.tf.
If the host has enough CPU cores and memory, consider using the KVM development setup for more flexibility.

KVM Setup¶

To evaluate workloads with KVM, you need a Linux dev machine and a KVM host. The KVM host can create as many VMs as needed to serve as the SUTs.

flowchart LR;
  dev[Dev];;
  kvm[KVM Host];;
  dev <--> kvm;;

Workload Type	Docker Registry	Dev Setup	SUT Setup
`Native`	`N/A`	`setup-dev.sh`	`setup-sut-kvm.sh`
`docker/compose`	`Optional`	`setup-dev.sh`	`setup-sut-kvm.sh`
`Kubernetes`	`Optional`	`setup-dev.sh`	`setup-sut-kvm.sh`

Use the following setup steps:
- Make sure Virtualization is turned on in your KVM host BIOS.
- Run the setup-dev.sh script to set up the dev host.
- Run the setup-sut-kvm.sh script to setup the KVM host.

Use cmake -DTERRAFORM_SUT=kvm .. to use the kvm setup.

HINT: To use the KVM setup for a specific workload, run a command such as cmake -DTERRAFORM_SUT=kvm -DPLATFORM=SPR -DBENCHMARK=OpenSSL3-RSAMB ../
Modify the IP address and credentials of your KVM host in script/terraform/terraform-config.kvm.tf.

ITEP Setup¶

To evaluate workloads with ITEP, make sure that ITEP is already deployed and that the edge cluster status is active.

Use the following setup steps:
- Run the setup-itep.sh script to install all necessary prerequisites.

Use cmake -DTERRAFORM_SUT=itep .. to use the ITEP setup.

HINT: To use the ITEP setup for a specific workload, run a command such as cmake -DTERRAFORM_SUT=itep -DPLATFORM=SPR -DBENCHMARK=OpenSSL3-RSAMB ../

Setup Scripts¶

Note: All scripts mentioned below, such as setup-dev.sh, setup-sut-kvm.sh, etc., are found inside script/setup/ and must be executed from there.
- setup-dev.sh: Set up the dev host.
- setup-reg.sh: Set up a private docker registry.
- setup-sut-native.sh: Set up the SUT host for native workload execution.
- setup-sut-docker.sh: Set up the SUT host for docker/docker-compose workload execution.
- setup-sut-k8s.sh: Set up the SUT host for Kubernetes workload execution.
- setup-sut-kvm.sh: Set up the KVM host.
- setup-itep.sh: Install all necessary prerequisites for ITEP.

setup-dev.sh¶

The setup-dev.sh script creates the host environment for building and evaluating workloads. The script does not require any arguments and performs the following operations underneath:
- Install Intel certificates.
- Install packages: cmake, make, m4, and gawk. These are minimum system requirements.
- Install and set up docker on your host.
- Add an alias ctest=./ctest.sh to your ~/.bashrc.

Usage: [options]

where options are:
- --nodaemonize: Do not install daemonize.
- --no-password: Do not ask for password. Use environment variable DEV_SUDO_PASSWORD where password is required.

The script requires sudo privilege. After the setup, for any changes to take effect, you must log out of the current ssh session and log back in to the development host.

setup-reg.sh¶

The setup-reg.sh script can help to create a private docker registry or a pass-through registry on the development host. The private docker registry can be used to save rebuilt docker images. A pass-through registry can provide a cached mirror to any official registries.

Usage: [options] <user>@<hostname|ip>[:port] [<user>@<host>...]

where options are:
- --port: Specify the SSH port number. Default 22.
- --force: Force replacing any existing certificate.
- --no-password: Do not ask for password. Use environment variable DEV_SUDO_PASSWORD, SUT_SUDO_PASSWORD, and/or SUT_SSH_PASSWORD where password is required.

The mandatory argument is the hostname or IP of the registry URL. The port is optional. The default port for a docker registry is 20666 and the default port for a pass-through registry is 20690. You can create both on the same host if required.

If a list of <user>@<host> is specified, the script will copy the registry certificate to the machines so that they can securely access the newly created docker registry.

Examples:

# Create a docker registry on my host xx.yy.zz
script/setup/setup-reg.sh xx.yy.zz

After the setup, you can set the REGISTRY value to be the registry URL, in the form of <hostname|ip>:<port>.

# Setup to use the docker registry
cd build
cmake -DREGISTRY=xx.yy.zz:20666 ..

setup-sut-native.sh¶

The setup-sut-native.sh script creates the host environment for baremetal native workload execution. The script takes the following arguments:

Usage: [options] <user@worker1-ip> [user@worker2-ip ...]

where options are:
- --port: Specify the SSH port number. Default 22.
- --no-password: Do not ask for password. Use environment variable DEV_SUDO_PASSWORD, SUT_SUDO_PASSWORD, and/or SUT_SSH_PASSWORD where password is required.

setup-sut-docker.sh¶

The setup-sut-docker.sh script creates the host environment for docker execution. The script takes the following arguments:

Usage: [options] <user@worker1-ip>

where options are:
- --port: Specify the SSH port number. Default 22.
- --nointelcert: Do not install Intel certificates.

setup-sut-k8s.sh¶

The setup-sut-k8s.sh script creates the host environment for Kubernetes. The script takes the following arguments:

Usage: [options] <user@controller-ip[:private_ip]> <user@worker1-ip[:private_ip]> [<user@worker2-ip[:private_ip]> ...]

where options are:
- --port: Specify the SSH port number. Default 22.
- --nointelcert: Do not install Intel certificates.
- --no-password: Do not ask for password. Use environment variable DEV_SUDO_PASSWORD, SUT_SUDO_PASSWORD, and/or SUT_SSH_PASSWORD where password is required.
- --worker: Switch to the worker group.
- --client: Switch to the client group.
- --controller: Switch to the controller group.

setup-sut-kvm.sh¶

The setup-sut-kvm.sh script creates the host environment for the KVM execution. The script takes the following arguments:

Usage: [options] <user@ip[:dev]> [<user@ip[:dev]> ...]

where options are:
- --port: Specify the SSH port number. Default 22.
- --hugepage: Specify the hugepage setup as hugepagesz/hugepages, for example, 2M/8192 or 1G/16.
- --reboot: Reboot the SUT after the setup.
- --mtu <value>: Specify the VXLAN MTU value.
- --dev <value>: Specify the VXLAN interface device.
- --vxlan: Setup VXLAN among multiple KVM hosts.
- --reset: Force re-installing KVM and VXLAN.
- --no-password: Do not ask for password. Use environment variable DEV_SUDO_PASSWORD, SUT_SUDO_PASSWORD, and/or SUT_SSH_PASSWORD where password is required.
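Before choosing a --hugepage value, it can help to know how much memory a given hugepagesz/hugepages pair reserves. The sketch below performs that arithmetic for sizes given in M or G, as in the examples 2M/8192 and 1G/16. The function hugepage_total_mib is a hypothetical helper for a sanity check on the dev host; it does not reflect how setup-sut-kvm.sh itself handles the option.

```shell
#!/bin/sh
# hugepage_total_mib is a hypothetical helper: given "<hugepagesz>/<hugepages>"
# (size suffixed with M or G), it prints the total reserved memory in MiB.
# It returns non-zero for any other format.
hugepage_total_mib() {
    case "$1" in
        */*) ;;
        *) return 1 ;;
    esac
    size=${1%%/*}
    count=${1#*/}
    case "$size" in
        *M) number=${size%M}; factor=1 ;;
        *G) number=${size%G}; factor=1024 ;;
        *) return 1 ;;
    esac
    # Both the page size number and the page count must be plain digits.
    case "$number$count" in
        ""|*[!0-9]*) return 1 ;;
    esac
    [ -n "$number" ] && [ -n "$count" ] || return 1
    echo $(( number * factor * count ))
}
```

Both examples from the option description reserve the same amount: 2M/8192 and 1G/16 each come to 16384 MiB (16 GiB).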

setup-itep.sh¶

The setup-itep.sh script installs all necessary prerequisites for the ITEP setup. The script takes the following arguments:

Usage: <ITEP_fqdn> <username>
