Developer Guide (Enhanced Comprehensive)¶
TL;DR: Deterministic comprehensive guide assembled from all mirrored source markdown files.
- Developer Guide (Enhanced Comprehensive)
- Source Files
- Component Design Build
- Component Design Cluster Config
- Component Design Cmakelists
- Component Design Compose Config
- Component Design Docker Config
- Component Design Dockerfile
- Component Design Image
- Component Design Kpi
- Component Design Kubernetes Config
- Component Design Native Script
- Component Design Nsys Hlprof
- Component Design Persistent Volumes
- Component Design Readme
- Component Design Secrets
- Component Design Stack
- Component Design Template
- Component Design Timezone
- Component Design Validate
- Component Design Workload
Source Files¶
- doc/developer-guide/component-design/build.md
- doc/developer-guide/component-design/cluster-config.md
- doc/developer-guide/component-design/cmakelists.md
- doc/developer-guide/component-design/compose-config.md
- doc/developer-guide/component-design/docker-config.md
- doc/developer-guide/component-design/dockerfile.md
- doc/developer-guide/component-design/image.md
- doc/developer-guide/component-design/kpi.md
- doc/developer-guide/component-design/kubernetes-config.md
- doc/developer-guide/component-design/native-script.md
- doc/developer-guide/component-design/north-traffic.md
- doc/developer-guide/component-design/nsys-hlprof.md
- doc/developer-guide/component-design/persistent-volumes.md
- doc/developer-guide/component-design/readme.md
- doc/developer-guide/component-design/secrets.md
- doc/developer-guide/component-design/stack.md
- doc/developer-guide/component-design/template.md
- doc/developer-guide/component-design/timezone.md
- doc/developer-guide/component-design/validate.md
- doc/developer-guide/component-design/workload.md
Component Design Build¶
Source: doc/developer-guide/component-design/build.md
The build.sh script performs the build of a workload.
Since the process is standardized, there is usually no need to customize it. You can use the following template as is to call the ready-made build.sh from under the script folder:
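A minimal build.sh of this form simply locates itself and sources the common script (this mirrors the QAT-Setup example later in this section):

```shell
#!/bin/bash -e
DIR="$(dirname "$(readlink -f "$0")")"
# delegate the build to the standardized framework script
. "$DIR"/../../script/build.sh
```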
Customizing with switches¶
In some cases, the script/build.sh can be customized as follows:
- BUILD_FILES: Specify an array of base file names to be included in the build.
- BUILD_OPTIONS: Specify any custom arguments to the docker build command.
- BUILD_CONTEXT: Optionally specify a relative directory name or an array of relative directory names, where the Dockerfiles are located. By default, the Dockerfiles are assumed to be located directly under the workload directory.
- FIND_OPTIONS: Specify any custom arguments to the find program to locate the set of Dockerfiles for building the docker images and for listing the BOMs.
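For illustration (the Dockerfile name and build argument below are placeholders), a customized build.sh sets these variables before sourcing the common script:

```shell
#!/bin/bash -e
DIR="$(dirname "$(readlink -f "$0")")"
BUILD_FILES=("Dockerfile.1.base")           # placeholder base file name
BUILD_OPTIONS="--build-arg RELEASE=latest"  # placeholder docker build argument
. "$DIR"/../../script/build.sh
```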
Template Expansion¶
The build.sh script automatically performs template expansion if you have templates defined in either the .m4 format or the .j2 format, except those under the template directory.
For example, if you define Dockerfile.m4 or Dockerfile.j2, the script will expand the template to Dockerfile before building the docker images.
More about templating systems can be found under Templating systems.
Build Dependencies¶
If your workload depends on one of the common software stacks, invoke the corresponding software stack's build.sh.
For example, if your image depends on QAT-Setup, your build.sh can be something like below:
#!/bin/bash -e
DIR="$(dirname "$(readlink -f "$0")")"
# build QAT-Setup (added section)
STACK="qat_setup" "$DIR"/../../stack/QAT-Setup/build.sh $@
# build our image(s)
. "$DIR"/../../script/build.sh
Component Design Cluster Config¶
Source: doc/developer-guide/component-design/cluster-config.md
The cluster-config.yaml manifest describes the machine specification to run the workloads. The specification is still evolving and subject to change.
The following example describes a 3-node cluster to be used in some workload:
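A minimal sketch of such a manifest (the hugepage label name is an illustrative placeholder):

```yaml
cluster:
- labels: {}      # worker node 1, no special setup
- labels: {}      # worker node 2
- labels:         # worker node 3 requires hugepages
    HAS-SETUP-HUGEPAGE-2048kB-4096: required
```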
The cluster-config.yaml consists of the following sections:
cluster: This section defines the post-Sil cluster configurations.
cluster.labels¶
The cluster.labels section describes any must have system level setup that a workload must use. The setup is specified in terms of a set of Kubernetes node labels as follows:
| Label | Description |
|---|---|
| `HAS-SETUP-DATASET` | This set of labels specifies the datasets available on the host. See also: Dataset Setup. |
| `HAS-SETUP-DISK-AVAIL` | This set of labels probes the disk availability to ensure there is enough data space available for workload execution. See also: Disk Avail Setup. |
| `HAS-SETUP-DISK-SPEC` | This set of labels specifies that SSD or NVMe disks be mounted on the worker node(s). See also: [Storage Setup][Storage Setup]. |
| `HAS-SETUP-HUGEPAGE` | This set of labels specifies the kernel hugepage settings. See also: Hugepage Setup. |
| `HAS-SETUP-MEMORY` | This label specifies the minimum memory required by the workload. See also: Memory Setup. |
| `HAS-SETUP-MODULE` | This set of labels specifies the kernel modules that the workload must use. See also: Module Setup. |
The label value is either `required` or `preferred` as follows:
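A sketch (the label names are illustrative placeholders):

```yaml
cluster:
- labels:
    HAS-SETUP-HUGEPAGE-2048kB-4096: required   # scheduling fails without it
    HAS-SETUP-DISK-SPEC-1: preferred           # used if available
```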
cluster.cpu_info¶
The cluster.cpu_info section describes any CPU-related constraints that a workload must use. The cpu_info section is currently declarative and is not enforced.
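A minimal sketch (the exact schema is an assumption; only the requirement that flag values match the host CPU flags comes from this document):

```yaml
cluster:
- labels: {}
  cpu_info:
    flags:          # assumed field name; values must appear in lscpu output
    - avx512f
```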
where the CPU flags must match what is shown by lscpu or cat /proc/cpuinfo.
cluster.mem_info¶
The cluster.mem_info section describes any memory constraints that a workload must use. The mem_info section is currently declarative and is not enforced.
Please also use the Kubernetes [resource constraints][resource constraints] to specify the workload memory requirements.
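A minimal sketch (the `available` field name is an assumption; only the GBytes unit comes from this document):

```yaml
cluster:
- labels: {}
  mem_info:
    available: 128   # assumed field name; value in GBytes
```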
where the available memory is in the unit of GBytes.
cluster.vm_group¶
The cluster.vm_group section describes the worker group that this worker node belongs to. Each worker group is a set of SUTs of similar specification. If not specified, the worker group is assumed to be worker.
Enforced by the terraform backend.
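A minimal sketch (the `client` group name follows the client-0/client-1 host naming used elsewhere in this guide):

```yaml
cluster:
- labels: {}            # no vm_group: defaults to the worker group
- labels: {}
  vm_group: client      # this node joins the client group
```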
cluster.off_cluster¶
The cluster.off_cluster section describes whether the worker node should be part of the Kubernetes cluster. This is ignored if the workload is not a Cloud Native workload or the execution is not through Kubernetes.
If not specified, all nodes are part of the Kubernetes cluster.
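A minimal sketch (field placement mirrors the other cluster-config examples and is an assumption):

```yaml
cluster:
- labels: {}
- labels: {}
  off_cluster: true    # keep this node out of the Kubernetes cluster
```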
cluster.sysctls¶
The cluster.sysctls section describes the sysctls that the workload expects to use. The sysctls are specified per worker group. Multiple sysctls are merged together and applied to all the worker nodes in the same workgroup.
Enforced by the terraform backend.
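A minimal sketch (the sysctl key and value are illustrative placeholders):

```yaml
cluster:
- labels: {}
  sysctls:
    net.core.somaxconn: 65535   # placeholder sysctl applied to this worker group
```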
cluster.sysfs¶
The cluster.sysfs section describes the sysfs or procfs controls that the workload expects to use. The controls are specified per worker group. Multiple controls are merged together and applied to all the worker nodes in the same workgroup.
Enforced by the terraform backend.
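A minimal sketch (the sysfs path and value are illustrative placeholders):

```yaml
cluster:
- labels: {}
  sysfs:
    /sys/kernel/mm/transparent_hugepage/enabled: never   # placeholder control
```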
cluster.bios¶
The cluster.bios section describes the bios settings that the workload expects to use. The controls are specified per worker group. Multiple controls are merged together and applied to all the worker nodes in the same workgroup.
Enforced by the terraform backend.
cluster:
- labels: {}
bios:
SE5C620.86B:
"Intel(R) Hyper-Threading Tech": Enabled # Disabled
"CPU Power and Performance Policy": Performance # "Balanced Performance", "Balanced Power", or "Power"
cluster.msr¶
The cluster.msr section describes the msr register settings that the workload expects to use. The controls are specified per worker group. Multiple controls are merged together and applied to all the worker nodes in the same workgroup.
Enforced by the terraform backend.
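A minimal sketch (the register address and value are hypothetical; consult the platform documentation before writing any MSR):

```yaml
cluster:
- labels: {}
  msr:
    "0x1a4": 0xf   # hypothetical register/value pair
```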
terraform¶
The terraform section overwrites the default configuration parameters of the terraform validation backend default. See Terraform Options for specific options.
Note that any options specified in `TERRAFORM_OPTIONS` or by the CLI take precedence. They will not be overridden by the parameters specified in this section.
Example of Enabling Kubernetes NUMA Controls¶
terraform:
k8s_kubeadm_options:
KubeletConfiguration:
cpuManagerPolicy: static
systemReserved:
cpu: 200m
topologyManagerPolicy: single-numa-node
topologyManagerScope: pod
memoryManagerPolicy: Static
reservedMemory:
- numaNode: 0
limits:
memory: 100Mi
featureGates:
CPUManager: true
TopologyManager: true
MemoryManager: true
Example of Enabling Kubernetes Per-Socket Topology Aware Controls¶
The configuration below enables topology-aware scheduling, including hardware core and socket awareness. Only integral values for CPU reservations are allowed; a misconfigured k8s deployment will result in SMTAlignmentError. This example is intended for advanced users only and requires Kubernetes 1.26.1 or higher.
terraform:
k8s_kubeadm_options:
KubeletConfiguration:
cpuManagerPolicy: static
cpuManagerPolicyOptions:
align-by-socket: "true"
distribute-cpus-across-numa: "true"
full-pcpus-only: "true"
systemReserved:
cpu: 1000m
topologyManagerPolicy: best-effort
topologyManagerPolicyOptions:
prefer-closest-numa-nodes: "true"
topologyManagerScope: pod
memoryManagerPolicy: Static
reservedMemory:
- numaNode: 0
limits:
memory: 100Mi
featureGates:
CPUManager: true
CPUManagerPolicyAlphaOptions: true
CPUManagerPolicyBetaOptions: true
CPUManagerPolicyOptions: true
MemoryManager: true
TopologyManager: true
TopologyManagerPolicyAlphaOptions: true
TopologyManagerPolicyBetaOptions: true
TopologyManagerPolicyOptions: true
Component Design Cmakelists¶
Source: doc/developer-guide/component-design/cmakelists.md
The CMakeLists.txt defines the cmake actions: build and test. It contains a set of directives and instructions that describe the project's source files and targets.
Writing the definition for workload¶
Let us start with a simple example:
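A minimal sketch (the workload name is a placeholder; the functions are those described below):

```cmake
add_workload("dummy_benchmark")
add_testcase(${workload})
```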
The add_workload function defines the cmake build rules for a workload. By convention, the name must be lower-case and must not contain any special characters except _. It is recommended to append the version info to indicate the implementation versioning. The function also defines a parent-scope variable workload holding the same name, which any subsequent function can use.
The add_testcase function defines a test case. You may define multiple test cases, each with a unique name and some configuration parameters. Internally, this gets routed to the validate.sh script with the specified parameters. (There is no argument in the above example.) The validation results are saved to the corresponding logs-$workload directory under the build tree. See also: Workload Testcases.
Note that the name of any `cmake` target must be unique across all workloads. Thus it is usually a concatenation of platform, feature, workload, and configuration.
Licensing Terms¶
If the workload requires the user to agree to any license terms, use the check_license function. The function prompts the user for license agreement and then saves the decision. If the user denies the license terms, the workload will be skipped during the build process. If there are multiple license terms, you can write as many check_license functions as needed.
check_license("media.xiorg.com" "Please agree to the license terms for downloading datasets from xiorg.com")
add_workload("foo" LICENSE "media.xiorg.com")
Fixed or Negative SUT¶
If the workload can only run on specific SUT (System Under Test), in this case azure, specify the SUT constraints as part of the add_workload function as follows:
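A sketch (the SUT keyword form mirrors the LICENSE constraint shown earlier and is an assumption):

```cmake
add_workload("foo" SUT "azure")
```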
where the azure SUT must be defined with script/terraform/terraform-config.<sut>.tf.
You can also specify a negative SUT name to remove the SUT type from selection. This will match all possible SUTs, except aws:
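A sketch (the negative-name form, prefixing the SUT with `-`, is an assumption following the same constraint syntax):

```cmake
add_workload("foo" SUT "-aws")
```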
Software Stack CMakeLists.txt¶
CMakeLists.txt for software stacks defines the software stack build and test targets. Let us start with a simple example:
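A minimal sketch (the stack name is a placeholder; the functions are those described below):

```cmake
add_stack("qat_setup")
add_testcase(${stack})
```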
The add_stack function defines the cmake build rules for a software stack. By convention, the name must be lower-case and must not contain any special characters except _. It is recommended to append the version info to indicate the implementation versioning. The function also defines a parent-scope variable stack holding the same name, which any subsequent function can use.
The add_testcase function defines a test case. You may define multiple test cases, each with a unique name and some configuration parameters. Internally, this gets routed to the validate.sh script with the specified parameters. (There is no argument in the above example.) The validation results are saved to the corresponding logs-$stack directory under the build tree.
Note that the name of any `cmake` target must be unique across all workloads. Thus it is usually a concatenation of platform, feature, stack, and configuration.
Similar to workload CMakeLists.txt, you can also use the check_git_repo function, the check_license function, and the SUT/LICENSE constraints.
See also¶
- Documentation of available CMake controls, including conditional blocks and loops
Component Design Compose Config¶
Source: doc/developer-guide/component-design/compose-config.md
The compose-config.yaml script is a manifest that describes how the workload container(s) should be scheduled (to the machine cluster described by cluster-config.yaml.) This is the standard docker-compose script.
You can choose to write compose-config.yaml in any of the following formats:
- compose-config.yaml: For simple workloads, you can directly write the docker-compose script.
- compose-config.yaml.m4: Use the .m4 template to add conditional statements in the docker-compose script.
- compose-config.yaml.j2: Use the .j2 template to add conditional statements in the docker-compose script.
Image Name¶
The container image in compose-config.yaml should use the full name in the format of <REGISTRY><image-name><RELEASE>, where <REGISTRY> is the docker registry URL (if any) and the <RELEASE> is the release version, (or :latest if not defined.)
If you use the .m4 template, the IMAGENAME macro can expand an image name to include the registry and release information:
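For example (the service and image name `dummy-benchmark` is a placeholder; the `include(config.m4)` line and `IMAGENAME` macro follow the usage shown elsewhere in this guide):

```m4
include(config.m4)

services:
  dummy-benchmark:
    image: IMAGENAME(dummy-benchmark)
```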
dummy-benchmark must match what is defined in JOB_FILTER.
If you use the .j2 template, you must write the image name as follows:
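For example (the service and image names are placeholders; the variable expansion follows the full-name format described above):

```yaml
services:
  dummy-benchmark:
    image: "{{ REGISTRY }}dummy-benchmark{{ RELEASE }}"
```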
Component Design Docker Config¶
Source: doc/developer-guide/component-design/docker-config.md
The docker-config.yaml script is a manifest that describes how to schedule the workload container(s) on multiple hosts (described by cluster-config.yaml.)
You can choose to write docker-config.yaml in any of the following formats:
- docker-config.yaml.m4: Use the .m4 template to add conditional statements in the docker-config script.
- docker-config.yaml.j2: Use the .j2 template to add conditional statements in the docker-config script.
DOCKER-CONFIG Format¶
The docker-config.yaml uses the following syntax:
worker-0:
- image: "{{ REGISTRY }}image-name{{ IMAGESUFFIX }}{{ RELEASE }}"
options:
- -e VAR1=VALUE1
- -e VAR2=VALUE2
command: "/bin/bash -c 'echo hello world'"
export-logs: true
- The top-level keys are the SUT hosts described in cluster-config.yaml. The SUT hosts are named after their SUT workgroup. For example, the workers are named worker-0, worker-1, etc., and the clients are named client-0, client-1, etc.
- The value of each SUT host is a list of containers to be scheduled on that host. The list order is not enforced.
- Each container is described as a dictionary of:
  - image: Specify the full docker image name.
  - options: Specify the docker run command line arguments, as a string or a list.
  - command: Optional. Specify any startup command. This overwrites whatever is defined in the docker image.
  - export-logs or service-logs: Optional. Specify whether logs should be collected from the container.

The script first collects logs from containers whose `export-logs` is true, which also signals that the workload execution is completed. It then collects logs from containers whose `service-logs` is true. `export-logs` and `service-logs` are mutually exclusive and cannot both be true.
Test Time Considerations¶
At test time, the validation script launches the containers described in docker-config.yaml, for example, 2 containers on worker-0 and 1 on worker-1. The launch order is not enforced, thus the workload must implement an alternative synchronization mechanism if the launch order is important.
If `docker-config.yaml` exists, its settings take precedence over `DOCKER_IMAGE` and `DOCKER_OPTIONS` specified in `validate.sh`.
To facilitate SUT-level network communication, the list of all SUT private IP addresses is provided to each container runtime as environment variables, for example, WORKER_0_HOST=10.20.30.40, WORKER_1_HOST=20.30.40.50, CLIENT_0_HOST=30.40.50.60, etc. The workload can then use the IP addresses to set up services and communicate among the SUT hosts.
Component Design Dockerfile¶
Source: doc/developer-guide/component-design/dockerfile.md
The workload Dockerfile must meet certain requirements to facilitate image build, validation execution and data collection.
Use Template¶
You can use m4 template in constructing Dockerfiles, which avoids duplication of identical steps. Any files with the .m4 suffix will be replaced with the corresponding files without the suffix, during the build process.
Set Build Order¶
If there are multiple Dockerfiles under the workload directory, the build order is determined by the filename pattern Dockerfile.[1-9].<string>. The bigger the number in the middle of the filename, the earlier the build script builds the Dockerfile. If two Dockerfiles have the same number, the build order is platform-specific.
Filename:
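For illustration (the `<string>` parts are placeholders):

```text
Dockerfile.2.base    # bigger number: built earlier
Dockerfile.1.app     # smaller number: built later
```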
Specify Image Name¶
The first line of the Dockerfile is used to specify the docker image name, as follows:
Note: If an optional `# syntax=` line is added, it should precede the name line.
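For illustration (the image name, base image, and versions are placeholders), the first line is a comment that names the image:

```dockerfile
# dummy-benchmark
ARG OS_VER=22.04
ARG OS_IMAGE=ubuntu
FROM ${OS_IMAGE}:${OS_VER}
```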
Final images, that are pushed to the docker registry:
Intermediate images, that are not pushed to the docker registry:
Output:
Note: The image `TAG` may differ based on the `RELEASE` setting. If unspecified, `latest` is used.
Note: For `ARMv*` platforms, the image names will be appended with an `-arm64` suffix, so that they can coexist with `x86` platform images on the same host.
Naming Convention:¶
As a convention, the image name uses the pattern [<platform>-]<workload>-<other names>, and it must be unique. The platform prefix is required if the image is platform-specific, and optional if the image can run on any platform.
List Ingredients¶
Any significant ingredients used in the workload must be marked with the ARG statement, so that we can easily list ingredients of a workload, for example:
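A sketch with a hypothetical nginx ingredient (the names, version, and URL are illustrative):

```dockerfile
ARG NGINX_VER=1.24.0
ARG NGINX_REPO=https://nginx.org/download/nginx-${NGINX_VER}.tar.gz
RUN curl -fsSL -o nginx.tar.gz ${NGINX_REPO}
```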
The following ARG suffixes are supported:
- _REPO or _REPOSITORY: Specify the ingredient source repository location.
- _VER or _VERSION: Specify the ingredient version.
- _IMG or _IMAGE: Specify an ingredient docker image.
- _PKG or _PACKAGE: Specify an ingredient OS package, such as deb or rpm.
`_VER` and the corresponding `_REPO`/`_PACKAGE`/`_IMAGE` must come in pairs to properly show up in the Wiki ingredient table. For example, if you define `OS_VER`, then there should be an `OS_IMAGE` definition.
Export Status & Logs¶
It is the workload developer's responsibility to design how to start the workload and how to stop the workload. However, it is a common requirement for the validation runtime to reliably collect execution logs and any telemetry data for analysing the results.
Export to FIFO¶
The workload image must create a FIFO under /export-logs path, and then archive:
- The workload exit code (in `status`). Note: Any exit code other than `0` returned in `status` marks the execution as failed.
- Any workload-specific logs, which can be used to generate performance indicators.

Note: The FIFO path can be overridden from `/export-logs` by setting the `EXPORT_LOGS=/my/custom/path` variable in `validate.sh` to an absolute path to the FIFO inside the container.
For example:
RUN mkfifo /export-logs
CMD (./run-workload.sh; echo $? > status) 2>&1 | tee output.logs && \
tar cf /export-logs status output.logs && \
sleep infinity
- `RUN mkfifo /export-logs` creates a FIFO for log export.
- `CMD` executes the workload and collects logs:
  1. `(./run-workload.sh;` executes the workload;
  2. `echo $? > status)` writes the exit code to `status`;
  3. `2>&1` redirects standard error to standard output;
  4. `| tee output.logs` sends the output to both the terminal and the `output.logs` file;
  5. `tar cf /export-logs status output.logs` writes a tarball archive containing `status` and `output.logs` to the `/export-logs` FIFO;
  6. `sleep infinity` is mandatory to hold the container for log retrieval.
Alternatively, a list of files can be echoed to /export-logs, for example:
RUN mkfifo /export-logs
CMD (./run-workload.sh; echo $? > status) 2>&1 | tee output.logs && \
echo "status output.logs" > /export-logs && \
sleep infinity
The difference is only in step 5 of CMD: echo "status output.logs" > /export-logs sends the list of files to the queue instead of a tarball.
Import from FIFO¶
The validation backend (script/validate.sh) imports the logs data through the FIFO, as follows for any docker execution:
# docker
docker exec <container-id> sh -c 'cat /export-logs > /tmp/tmp.tar; tar tf /tmp/tmp.tar > /dev/null && cat /tmp/tmp.tar || tar cf - $(cat /tmp/tmp.tar)' | tar xf -
# kubernetes
kubectl exec <pod-id> -- sh -c 'cat /export-logs > /tmp/tmp.tar; tar tf /tmp/tmp.tar > /dev/null && cat /tmp/tmp.tar || tar cf - $(cat /tmp/tmp.tar)' | tar xf -
The above command blocks while the workload execution is in progress and exits after the workload is completed (thus signaling that it is time for cleanup).
ENTRYPOINT reserved feature¶
Do not use ENTRYPOINT in the Dockerfile. This is a reserved feature for future extension.
Workaround the software.intel.com proxy issue¶
The Intel proxy setting includes intel.com in the no_proxy setting. This is generally fine, but software.intel.com is an exception that must go through the proxy. Use the following workaround on the specific command that needs to bypass the intel.com restriction:
RUN no_proxy=$(echo $no_proxy | tr ',' '\n' | grep -v -E '^.?intel.com$' | tr '\n' ',') yum install -y intel-hpckit
See Also¶
- How to Create a Workload
- Provisioning Specification
- [Workload with Dataset][Workload with Dataset]
Component Design Image¶
Source: doc/developer-guide/component-design/image.md
The document describes how to build VM images with HashiCorp's Packer. The projects are located under the image directory.
Each folder under image equates to a VM image project. For an example, look at image/HostOS.
Prerequisites¶
This document does not cover the required WSF preconfiguration. Follow the required steps to:
- Configure the WSF environment instructions
- Configure your Terraform backend. If you haven't set up terraform, please follow the instructions to set up terraform for Cloud validation.
Navigating the WSF VM Image folder structures¶
./image : Contains the VM image projects. Each folder is a VM image project.
./script/terraform/ : Contains the Terraform configuration files for each CSP. For example: terraform-config.azure.tf contains the Azure variables, including the os_type = "ubuntu2204"
./script/terraform/ : Custom files can be created and used during cmake; for example, create terraform-config.my-custom-azure.tf and use -DTERRAFORM_SUT=my-custom-azure.
./script/terraform/template/packer/<csp>/generic : Includes Packer files, including VM Image Offer/Publisher/SKU mapping.
Note that only the os_type variable defined in the terraform-config file is required; the Offer/Publisher/SKU are automatically set based on the os_type. See the mapping here: ./script/terraform/template/packer/<csp>/generic
Getting Started¶
Overview on how to build VM images using the WSF framework.
Building VM Images¶
Assuming you have configured WSF environment and Terraform backend, you can start building VM images.
Below is an example of how to build a VM image using the HostOS project.
# Clone the repo and create a build directory
git clone https://github.com/intel/workload-services-framework.git wsf-fork
cd wsf-fork
mkdir build
cd build
# Run cmake to build the project. This uses the HostOS project under the image folder, and the Azure configuration under script/terraform
cmake -DBENCHMARK=image/HostOS -DPLATFORM=SPR -DTERRAFORM_SUT=azure ..
# Build the Terraform containers/configuration
make build_terraform
# Log into Azure container, login to Azure, and exit
make azure
az login
exit
# Run the 'make' command to start VM Image creation process
make
Note how the cmake command specifies the BENCHMARK, PLATFORM, and TERRAFORM_SUT variables:
- -DBENCHMARK: Specifies the VM image project to be used. In this case, it is image/HostOS.
- -DPLATFORM: Specifies the target platform; SPR in this example.
- -DTERRAFORM_SUT: Specifies the CSP configuration to be used. These configuration files are located at ./script/terraform/*.tf. For example, ./script/terraform/terraform-config.azure.tf is used for Azure. Custom files can be created and used during cmake; for example, create terraform-config.my-custom-azure.tf and specify -DTERRAFORM_SUT=my-custom-azure to use it.
Cleanup process¶
# From the 'build' folder, log into the Azure container
make azure
# To cleanup just the terraform files
cleanup
# To cleanup images
cleanup --images
# Exit Azure container
exit
CMakeLists.txt¶
Make the project depend on the terraform backend, and use the add_image function to declare the VM image project name as follows:
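A minimal sketch (the project name follows the HostOS example; the exact argument form is an assumption):

```cmake
add_image("HostOS")
```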
build.sh - Configuration of Image creation¶
build.sh can be used to pass custom values to variables for the image creation process.
In the build.sh script, you should specify the project variables and call the script/terraform/packer.sh script to build the VM images.
The packer.sh script takes the project name as its first argument, as in the `packer.sh generic` invocation below; any remaining arguments are passed through. The project name defines the packer script location, expected to be under template/packer/<csp>/<project-name>, where <csp> is the Cloud Service Provider.
The script defines the following environment variables where you can include in your project definitions:
- `OWNER`: The owner string.
- `REGION`: The region string.
- `ZONE`: The availability zone string.
- `NAMESPACE`: The randomly generated namespace string of the current packer run.
- `INSTANCE_TYPE`: The CSP instance type.
- `SPOT_INSTANCE`: A boolean value specifying whether the build should use a spot instance.
- `OS_DISK_TYPE`: The OS disk type.
- `OS_DISK_SIZE`: The OS disk size.
- `ARCHITECTURE`: The architecture: `x86_64`, `amd64`, or `arm64`.
- `SSH_PROXY_HOST`: The socks5 proxy host name.
- `SSH_PROXY_PORT`: The socks5 proxy port value.
- `OS_IMAGE`: The os_image value.
An example of build.sh may look like the following:
#!/bin/bash -e
COMMON_PROJECT_VARS=(
'owner=$OWNER'
'region=$REGION'
'zone=$ZONE'
'job_id=$NAMESPACE'
'instance_type=$INSTANCE_TYPE'
'spot_instance=$SPOT_INSTANCE'
'os_disk_type=$OS_DISK_TYPE'
'os_disk_size=$OS_DISK_SIZE'
'architecture=$ARCHITECTURE'
'ssh_proxy_host=$SSH_PROXY_HOST'
'ssh_proxy_port=$SSH_PROXY_PORT'
'image_name=wsf-${OS_TYPE}-${ARCHITECTURE}-dataset-ai'
'ansible_playbook=../../../ansible/custom/install.yaml'
)
DIR="$( cd "$( dirname "$0" )" &> /dev/null && pwd )"
. "$DIR"/../../script/terraform/packer.sh generic $@
Optionally, you can also define CSP-specific variables, which will be merged with the common variables when running packer.sh:
AWS_PROJECT_VARS=(
'subnet_id=$SUBNET_ID'
'security_group_id=$SECURITY_GROUP_ID'
)
GCP_PROJECT_VARS=(
'subnet_id=$SUBNET_ID'
'project_id=$PROJECT_ID'
'min_cpu_platform=$MIN_CPU_PLATFORM'
'firewall_rules=$FIREWALL_RULES'
)
AZURE_PROJECT_VARS=(
'subscription_id=$SUBSCRIPTION_ID'
'availability_zone=$AVAILABILITY_ZONE'
'network_name=$NETWORK_NAME'
'subnet_name=$SUBNET_NAME'
'managed_resource_group_name=$RESOURCE_GROUP_NAME'
)
ORACLE_PROJECT_VARS=(
'subnet_id=$SUBNET_ID'
'compartment=$COMPARTMENT'
'cpu_core_count=$CPU_CORE_COUNT'
'memory_size=$MEMORY_SIZE'
)
TENCENT_PROJECT_VARS=(
'vpc_id=$VPC_ID'
'subnet_id=$SUBNET_ID'
'resource_group_id=$RESOURCE_GROUP_ID'
'security_group_id=$SECURITY_GROUP_ID'
'os_image_id=$OS_IMAGE_ID'
)
ALICLOUD_PROJECT_VARS=(
'vpc_id=$VPC_ID'
'resource_group_id=$RESOURCE_GROUP_ID'
'security_group_id=$SECURITY_GROUP_ID'
'vswitch_id=$VSWITCH_ID'
'os_image_id=$OS_IMAGE_ID'
)
KVM_PROJECT_VARS=(
'kvm_host=$KVM_HOST'
'kvm_host_user=$KVM_HOST_USER'
'kvm_host_port=$KVM_HOST_PORT'
'pool_name=${KVM_HOST_POOL/null/osimages}' # Must exist
'os_image=null'
)
The image building Ansible playbooks can also be applied directly to a static worker:
STATIC_PROJECT_VARS=(
'ssh_port=$SSH_PORT'
'user_name=$USER_NAME'
'public_ip=$PUBLIC_IP'
'private_ip=$PRIVATE_IP'
)
Ansible playbooks¶
You should write custom installation scripts in Ansible playbooks, usually located under template/ansible/<project-name>/install.yaml. This location can be overridden if you specify ansible_playbook in build.sh.
See basevm-generic for an example of creating a debian11 VM image with an updated 5.19 kernel.
More information about Ansible playbooks can be found in Ansible playbooks documentation.
Declare Ingredients in Ansible playbooks¶
Any component ingredients should be declared in defaults/*.yaml or defaults/*.yml:
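A sketch (the ingredient name, version, and URL are illustrative):

```yaml
# defaults/main.yaml -- ingredient declarations
nginx_version: "1.24.0"
nginx_repository: "https://nginx.org/download/nginx-1.24.0.tar.gz"
```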
where you can use the pairs of _version + _package or _version + _repository.
- `_version`, `_ver`: Declare the ingredient version.
- `_repository`, `_repo`: Declare the ingredient repository.
- `_package`, `_pkg`: Declare the package location.

Note: The variable name suffix is case-insensitive; for example, `_version` can also be written as `_VERSION`.
Component Design Kpi¶
Source: doc/developer-guide/component-design/kpi.md
The kpi.sh script parses the validation output and exports a set of key/value pairs to represent the workload performance.
Output format¶
The following is some example of the KPI data:
# this is a test ## Optional comments
## threads: 4 ## Tunable parameters overwrite
throughput: 123.45 ## Simple key/value
throughput (op/s): 123.45 ## Key, unit (in parentheses) and value
*throughput (images/s): 123.45 ## Primary KPI for regression reporting
throughput: 123.45 # This is a tooltip ## Comment shown as tooltip in UI
Please note that it is crucial that the decimal separator is a point (`.`), not a comma (`,`).
Parsing the output¶
To avoid introducing additional software dependencies, it is recommended to use gawk to parse the validation logs and format the output.
The validation output is assumed to be stored at 1 layer under the current directory. The kpi.sh example is as follows:
where `2>/dev/null` suppresses any error message if `*/output.logs` does not exist, and `|| true` makes `kpi.sh` always return an OK status.
Check The GNU AWK User's Guide for more information on how to write a parsing script using `gawk`.
Component Design Kubernetes Config¶
Source: doc/developer-guide/component-design/kubernetes-config.md
The kubernetes-config.yaml script is a manifest that describes how the workload container(s) should be scheduled (to the machine cluster described by cluster-config.yaml.) This is the standard Kubernetes script.
Templating possibilities¶
You can choose to write kubernetes-config.yaml in any of the following formats:
- kubernetes-config.yaml: For simple workloads, you can directly write the Kubernetes deployment scripts.
- kubernetes-config.yaml.m4: Use the .m4 template to add conditional statements in the Kubernetes deployment scripts.
- kubernetes-config.yaml.j2: Use the .j2 template to add conditional statements in the Kubernetes deployment scripts.
- helm charts: For complex deployment scripts, you can use any helm charts under the helm directory.
Image Name¶
The container image in kubernetes-config.yaml should use the full name in the format of <REGISTRY><image-name><IMAGESUFFIX><RELEASE>, where <REGISTRY> is the docker registry URL (if any), <IMAGESUFFIX> is the platform suffix, and the <RELEASE> is the release version, (or :latest if not defined.)
If you use the .m4 template, the IMAGENAME macro can expand an image name to include the registry and release information:
include(config.m4)
...
spec:
...
spec:
containers:
- name: database
image: IMAGENAME(wordpress5mt-defn(`DATABASE'))
...
If you use the .j2 template or helm charts, you must write the image name as follows:
...
spec:
...
spec:
containers:
- name: database
image: "{{ REGISTRY }}wordpress5mt-{{ DATABASE }}{{ IMAGESUFFIX }}{{ RELEASE }}"
...
About imagePullPolicy¶
To ensure that the validation always runs on the latest code, it is recommended to use imagePullPolicy: Always.
Not all docker images are built equally, however. For images that are less frequently updated and less performance sensitive, it is preferable to use imagePullPolicy: IfNotPresent.
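For example, a hedged container spec sketch (the image names are hypothetical) mixing the two policies:

```yaml
spec:
  containers:
  - name: database
    image: my-registry.example.com/wordpress5mt-database:latest  # hypothetical name
    imagePullPolicy: Always        # frequently rebuilt, validation-critical image
  - name: helper
    image: my-registry.example.com/wordpress5mt-helper:latest    # hypothetical name
    imagePullPolicy: IfNotPresent  # rarely updated, not performance sensitive
```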
About podAntiAffinity¶
To spread the pods onto different nodes, use podAntiAffinity as follows:
...
metadata:
labels:
app: foo
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- foo
topologyKey: "kubernetes.io/hostname"
...
If you use the .m4 template, you can use the PODANTIAFFINITY macro:
If you use the .j2 template or helm charts, there is no convenient function for the above. You have to write the podAntiAffinity terms explicitly.
See Also¶
- Requirements for Internet Hosts - RFC-1123
- Choosing a name for your computer - RFC-1178
- K8s Label - syntax and character set
Component Design Native Script¶
Source: doc/developer-guide/component-design/native-script.md
Terraform scripts can be used to provision any Cloud SUT instances. Usually, Cloud provisioning needs are common among many workloads, so the terraform scripts are shared under [script/terraform/template/terraform][terraform template]. However, if a workload provides custom terraform scripts under workload/<workload>/template/terraform/<CSP>/main, where <workload> is the workload name and <CSP> is the Cloud provider abbreviation, the specified terraform scripts will be used instead to provision the SUT instances.
Custom Terraform scripts¶
Custom terraform scripts should be used for unique provisioning requirements only. Avoid custom terraform scripts for trivial needs, as doing so duplicates common code.
If provided, the custom terraform scripts of a workload must be implemented as follows:
- The scripts must implement the same set of input variables such that they can be used (as a module) by [script/terraform/terraform-config.<CSP>.tf][terraform config csp]. For example, the scripts must take region, zone, and owner as input variables.
- The scripts must implement the same set or a superset of output variables as required by
[script/terraform/terraform-config.<CSP>.tf][terraform config csp]. For example, the scripts must export the
SUT instance public and private IPs.
- The scripts must properly tag the Cloud resources with owner: <name> so that the CSP cleanup script can cleanup the corresponding
Cloud resources.
See [HammerDB TPCC PAAS][HammerDB TPCC PAAS] for an example of customizing terraform scripts.
Ansible Scripts¶
Ansible scripts can be used to customize the workload host setup, workload execution, and cleanup.
flowchart LR;
installation[Installation];
deployment[Deployment];
cleanup[Cleanup];
installation --> deployment --> cleanup;
Please observe the following requirements when writing your custom Ansible scripts:
- If specified, the installation playbook should be at template/ansible/custom/installation.yaml, relative to the workload directory. Additional
roles can be present underneath the custom/roles directory. The installation playbook must install all benchmark software onto the SUTs.
- If specified, the cleanup Ansible playbook should be at template/ansible/custom/cleanup.yaml, relative to the workload directory. The cleanup
script should remove any installed software and restore the system environment.
- If specified, the workload execution playbook should be at template/ansible/custom/deployment.yaml, relative to the workload directory. Additional roles can be present underneath the custom/roles directory. The deployment playbook typically implements the following features:
flowchart TB;
start((Start));
exec[Execution];
stop((End));
start --> exec --> | itr | exec --> stop;
The start stage prepares the workload execution, for example, creating and copying the execution scripts to the SUT. Note that any software installation should be done in installation.yaml, not in deployment.yaml. The execution stage runs the workload multiple times (as specified by the iteration number wl_run_iterations).
Within each iteration, perform the following steps:
- Invoke the timing role to record the workload start timing.
- Run the workload. Save the workload status and logs under {{ wl_logs_dir }}/itr-{{ itr }}/<pod>, where <pod> is an arbitrary directory that identifies the benchmark pod.
- During the workload execution, invoke the trace role such that telemetry can be collected during the workload execution.
- Invoke the timing role to record the workload stop timing.
# deployment.yaml
- hosts: "{{ ('controller' in groups) | ternary('controller','localhost') }}"
gather_facts: no
become: false
tasks:
- name: run workloads over iterations
include_role:
name: deployment
when: (ansible_connection|default('ssh')) in ['ssh','local']
loop: "{{ range(1, wl_run_iterations | default(1) | int + 1, 1) | list }}"
loop_control:
loop_var: itr
# roles/deployment/tasks/main.yaml
- include_role:
name: timing
tasks_from: start-iteration
- name: run the workload
...
- name: invoke traces
include_role:
name: trace
vars:
trace_waitproc_pid: "{{ workload_process_pid }}"
trace_logs_scripts: "..."
- include_role:
name: timing
tasks_from: stop-iteration
- name: collect trace data
include_role:
name: trace
tasks_from: collect
when: wl_trace_modules | default('') | split(',') | reject('==','') | length > 0
ignore_errors: yes
run_once: true
- name: "collect workload logs under {{ wl_logs_dir }}/itr-{{ itr }}/benchmark"
file:
path: "{{ wl_logs_dir }}/itr-{{ itr }}/benchmark"
state: directory
delegate_to: localhost
run_once: true
...
The Ansible playbooks can use the following workload parameters:
| Name | Description |
|---|---|
| wl_name | The workload name. |
| wl_namespace | A unique identifier to identify this workload execution. |
| wl_run_iterations | Specify how many times a workload must execute in sequence. |
| workload_config | The content of the workload-config yaml file. |
List Ingredients¶
Any significant ingredients used in the workload must be declared with a matched pair of variable definitions in defaults/main.yaml or defaults/main/*.yaml, so that we can easily list the ingredients of a workload.
The following suffixes are supported:
- _REPO or _REPOSITORY: Specify the ingredient source repository location.
- _VER or _VERSION: Specify the ingredient version.
- _IMG or _IMAGE: Specify an ingredient docker image.
- _PKG or _PACKAGE: Specify an ingredient OS package, such as deb or rpm.
_VER and the corresponding _REPO/_PACKAGE/_IMAGE must be defined in pairs to properly show up in the Wiki ingredient table. For example, if you define OS_VER, then there should be an OS_IMAGE definition. Avoid complicated jinja templates in the variable definitions. For ingredient parsing purposes, the only supported jinja pattern is {{ variable }}.
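A hedged sketch of such paired definitions in defaults/main.yaml (the ingredient names and versions are illustrative):

```yaml
# defaults/main.yaml (illustrative ingredient declarations)
OS_VER: "24.04"
OS_IMAGE: "ubuntu"

NGINX_VER: "1.24.0"
NGINX_PKG: "nginx"
```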
Ansible Script Examples¶
See [SpecSERT][SpecSERT] for an example of customizing Ansible scripts.
Ansible Resources¶
Ansible Inventory¶
The following ansible inventory groups can be used in the scripts:
| Name | Description |
|---|---|
| workload_hosts | This group contains the VM hosts that are used to run the workloads. If Kubernetes is used, the group refers to VM workers that are within the Kubernetes cluster. |
| cluster_hosts | If Kubernetes is used, this group refers to all VM workers and the Kubernetes controller. |
| off_cluster_hosts | This group refers to VM hosts that are outside the Kubernetes cluster. |
| trace_hosts | This group includes all VM hosts that must collect performance traces. |
| controller | This is the Kubernetes controller group. |
| worker, client, etc | These are the workload VM groups. |
| vsphere_hosts, kvm_hosts, etc | These groups contain the physical hosts where the workload VMs reside. |
Avoid using hosts: all in your custom ansible scripts, as the hosts may refer to different types of hosts, for example, workload VMs or the physical hosts where the VMs reside. Use hosts: cluster_hosts:off_cluster_hosts to refer to all workload VMs.
The following code paths are available to ansible scripts:
| Directory | Description |
|---|---|
| /opt/workspace | Map to the logs directory of the current workload. |
| /opt/workload | Map to the source directory of the current workload. |
| /opt/<backend> | Map to the backend-specific resources. For terraform, this is /opt/terraform, which maps to <project>/script/terraform. |
| /opt/project | Map to the root directory of the WSF code base. |
Avoid using /opt/project to access any backend-specific resources. Use /opt/<backend> instead. This makes it possible to selectively use either the resources within the backend container or the current code in the repository.
Common Ansible Scripts¶
The following ansible roles/tasks can be invoked for common functionalities:
- containerd: This role installs the containerd engine.
- docker: This role installs the docker engine.
- trace: The trace role starts the trace procedure and waits for the workload to complete.
  - trace_waitproc_pid: The pid of the workload process.
  - trace_logs_scripts: The list of commands to show service logs. The logs will be piped to the trace procedure for determining ROI triggers.
  - trace_logs_host: The host that the trace_logs_scripts scripts run on. This is optional.
  - trace_status_file: Specify a status file that contains the workload return status code.
Please note that the trace role should not be executed on multiple hosts. Use run_once: true to restrict the execution to the first host:
- include_role:
name: trace
run_once: true
Once the trace role completes, you can retrieve the trace results by invoking the `collect` task of the `trace` role:
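Mirroring the collect step shown in the deployment example earlier in this section, the invocation looks like:

```yaml
- include_role:
    name: trace
    tasks_from: collect
  run_once: true
  ignore_errors: yes
```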
- **`docker-image`**: The `docker-image` role copies a set of docker images to a remote docker daemon (`to-daemon.yaml`) or a remote registry (`to-registry.yaml`):
- `wl_docker_images`: The dictionary of docker images. The keys are the image names and the values are boolean, true if the images are from a secured docker registry. If unsure, set to `true`.
- **`timing`**: The `timing` role records the start/stop timing of various workload stages with the following tasks to be invoked with:
- **start/stop-setup**: Record the setup timing.
- **start/stop-image-transfer**: Record the image transfer timing.
- **start/stop-iteration**: Record the workload iteration timing.
- **start/stop-roi**: Record the workload trace ROI timing.
[terraform template]: ../../../script/terraform/template/terraform
[terraform config csp]: ../../../script/terraform/terraform-config.aws.tf
[HammerDB TPCC PAAS]: ../../../workload/HammerDB-TPCC-PAAS
[SpecSERT]: ../../../workload/HammerDB-TPCC-PAAS/README.md
Component Design North Traffic¶
Source: doc/developer-guide/component-design/north-traffic.md
Introduction¶
This article describes the technique used to create workloads with North-South traffic, i.e., traffic that enters and leaves a Kubernetes cluster. The implementation is based on the terraform backend.
flowchart TD;
client1[client 1];
client2[client 2];
controller[Kubernetes<br>Controller];
sut1[Kubernetes Worker 1];
sut2[Kubernetes Worker 2];
sut3[Kubernetes Worker 3];
client1 & client2 <--> controller;
controller <--> sut1 & sut2 & sut3;
Request Off-Cluster Nodes¶
You can request one or many worker nodes to be off the Kubernetes cluster as follows in cluster-config.yaml.m4:
The off_cluster option indicates that the requested node is not part of the Kubernetes cluster.
Optionally, you can define any variables that you might want to pass to the ansible scripts, under terraform:
Install Software on Off-Cluster Nodes¶
Unlike Kubernetes workers, where the terraform scripts apply the Kubernetes deployment scripts directly, you have to write custom ansible scripts to install any native software on the off-cluster nodes.
You can overwrite the template/ansible/kubernetes/installation.yaml script to install any custom software on the off-cluster nodes:
- import_playbook: installation.yaml.origin
- hosts: worker-1
become: yes
gather_facts: no
tasks:
- name: Install docker
include_role:
name: docker
- hosts: worker-1
gather_facts: no
tasks:
- name: Transfer client image
include_role:
name: image-to-daemon
vars:
images:
- key: "{{ off_cluster_docker_image_name }}"
value: false
wl_docker_images: "{{ images | items2dict }}"
The above playbooks install docker on the off-cluster node and then transfer the custom docker image to the node. Here wl_docker_images is a dictionary of docker images, where the keys are the docker images and the values are either true or false, indicating whether the docker images are from a secured or unsecured docker registry. Use false if you are not sure.
Execute Software on Off-Cluster Nodes¶
You should manage the native software execution on the off-cluster nodes, by inserting ansible snippets into the regular Kubernetes
execution process, managed by template/ansible/kubernetes/deployment.yaml and template/ansible/kubernetes/roles/deployment/tasks/process-traces-and-logs.yaml, where the former controls the entire process and the latter performs trace and log collection during the workload execution.
Overwrite template/ansible/kubernetes/deployment.yaml if you need to do any preparation work, for example, to initialize the Kubernetes deployment script with real cluster IP address, as follows:
- hosts: localhost
gather_facts: no
tasks:
- name: rewrite cluster IP
replace:
path: "{{ wl_logs_dir }}/kubernetes-config.yaml.mod.yaml"
regexp: "127.0.0.1"
replace: "{{ hostvars['controller-0']['private_ip'] }}"
- import_playbook: deployment.yaml.origin
127.0.0.1 is a placeholder IP address.
Overwrite the template/ansible/kubernetes/roles/deployment/tasks/process-traces-and-logs.yaml script to insert off-cluster software execution.
Depending on the workload design, if the off-cluster node is used simply as a client simulator and the workload traces and logs are within the Kubernetes cluster, then you can simply start the off-cluster node and then release the control back to the terraform original code:
# This step should not block as the Kubernetes services may not be up and running yet.
- name: start off-cluster-node
command: "docker run --rm -d {{ off_cluster_docker_image_name }}"
register: container
delegate_to: worker-1
- name: resume Kubernetes routines
include_tasks:
file: process-traces-and-logs.yaml.origin
- name: destroy container
command: "docker rm -f {{ container.stdout }}"
delegate_to: worker-1
If the off-cluster node is where the traces and logs must be collected, you can use the off-cluster-docker.yaml script, which is a common utility that monitors the container until completion, at the same time collecting traces for all the hosts (including the Kubernetes workers.)
- name: start off-cluster-node
command: "docker run --rm -d {{ off_cluster_docker_image_name }}"
register: container
delegate_to: worker-1
- name: monitor the docker execution and process traces and logs
include_tasks:
file: off-cluster-docker.yaml
vars:
off_cluster_host: worker-1
off_cluster_container_id: "{{ container.stdout }}"
- name: destroy container
command: "docker rm -f {{ container.stdout }}"
delegate_to: worker-1
If your off-cluster node execution isn't based on docker, you can model from the off-cluster-docker.yaml script to write your
own traces and logs collection routine.
Component Design Nsys Hlprof¶
Source: doc/developer-guide/component-design/nsys-hlprof.md
Introduction¶
The trace tools nsys and hlprof are workload profiling tools for CUDA and Habana Gaudi accelerators, respectively. This document describes the steps required to integrate nsys and hlprof.
The nsys Trace Tool¶
Restrictions¶
The nsys trace tool is unlike other trace tools, which work on the host system independent of the workload execution. The nsys tool requires that the workload be launched by the nsys launch command. This limitation restricts the tool usage scenarios:
- The nsys tool does not support :0 or :host tracing placements.
- The nsys tool must be used to launch the workload executable. The current implementation limits nsys to containerized workloads only, run under the docker engine.
Create nsys Containers¶
The nsys tool must be installed within the workload containers. Since the nsight-system installation alone occupies about 1.2GB, you might want to create different container images with and without nsys. You can use the condition [[ " $TERRAFORM_OPTIONS $CTESTSH_OPTIONS " = *" --nsys "* ]] to switch between container images.
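A hedged build.sh sketch of that switch (the Dockerfile.nsys name is hypothetical, and the option values are hard-coded here for illustration):

```shell
# Pick the nsys-enabled Dockerfile only when " --nsys " appears in the
# combined option strings (values hard-coded for illustration).
TERRAFORM_OPTIONS=""
CTESTSH_OPTIONS="--nsys"
if [[ " $TERRAFORM_OPTIONS $CTESTSH_OPTIONS " = *" --nsys "* ]]; then
    DOCKERFILE="Dockerfile.nsys"   # hypothetical nsys-enabled variant
else
    DOCKERFILE="Dockerfile"
fi
echo "building with $DOCKERFILE"
```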
ARG OS_VER=24.04
ARG OS_IMAGE=ubuntu
FROM ${OS_IMAGE}:${OS_VER}
RUN apt-get update -y && apt-get install -y --no-install-recommends gnupg curl && \
apt-get clean -y && rm -rf /var/lib/apt/lists/*
ARG NVIDIA_DEVTOOLS_VER=3bf863cc
ARG NVIDIA_DEVTOOLS_REPO=http://developer.download.nvidia.com/compute/cuda/repos
RUN curl --netrc-optional --retry 10 --retry-connrefused -fsSL -o /tmp/${NVIDIA_DEVTOOLS_VER}.pub ${NVIDIA_DEVTOOLS_REPO}/$(. /etc/os-release;echo $ID$VERSION_ID | tr -d .)/$(uname -m)/${NVIDIA_DEVTOOLS_VER}.pub && \
gpg --yes --dearmor -o /usr/share/keyrings/nvidia-devtools.gpg /tmp/${NVIDIA_DEVTOOLS_VER}.pub && \
echo "deb [signed-by=/usr/share/keyrings/nvidia-devtools.gpg] ${NVIDIA_DEVTOOLS_REPO}/$(. /etc/os-release;echo $ID$VERSION_ID | tr -d .)/$(uname -m) /" > /etc/apt/sources.list.d/nvidia-devtools.list && \
apt-get update -y && apt-get install -y --no-install-recommends nsight-systems && \
apt-get clean -y && rm -rf /var/lib/apt/lists/*
ENV PATH=/usr/lib/nsight-systems/host-linux-x64:$PATH
...
RUN mkfifo /export-logs
CMD (nsys launch /run_test.sh; echo $? > status) 2>&1 | tee output.logs && \
echo "status output.logs" > /export-logs && \
sleep infinity
Note that the workload executable is launched via nsys launch.
The following points must be followed:
- The executable nsys must be on the PATH.
- The executable QdstrmImporter must be on the PATH.
If successful, your logs directory should contain the trace data:
$ ls
nsys-c0r1.logs nsys-c0r1.nsys-rep.logs
nsys-c0r1.nsys-rep nsys-c0r1.nsys-rep_nvtx_sum.csv
nsys-c0r1.nsys-rep_cuda_api_sum.csv nsys-c0r1.nsys-rep_openacc_sum.csv
nsys-c0r1.nsys-rep_cuda_api_sync.csv nsys-c0r1.nsys-rep_opengl_khr_gpu_range_sum.csv
nsys-c0r1.nsys-rep_cuda_gpu_kern_sum.csv nsys-c0r1.nsys-rep_opengl_khr_range_sum.csv
nsys-c0r1.nsys-rep_cuda_gpu_mem_size_sum.csv nsys-c0r1.nsys-rep_openmp_sum.csv
nsys-c0r1.nsys-rep_cuda_gpu_mem_time_sum.csv nsys-c0r1.nsys-rep_osrt_sum.csv
nsys-c0r1.nsys-rep_cuda_memcpy_async.csv nsys-c0r1.nsys-rep_um_cpu_page_faults_sum.csv
nsys-c0r1.nsys-rep_cuda_memcpy_sync.csv nsys-c0r1.nsys-rep_um_sum.csv
nsys-c0r1.nsys-rep_cuda_memset_sync.csv nsys-c0r1.nsys-rep_um_total_sum.csv
nsys-c0r1.nsys-rep_dx11_pix_sum.csv nsys-c0r1.nsys-rep_vulkan_gpu_marker_sum.csv
nsys-c0r1.nsys-rep_dx12_gpu_marker_sum.csv nsys-c0r1.nsys-rep_vulkan_marker_sum.csv
nsys-c0r1.nsys-rep_dx12_mem_ops.csv nsys-c0r1.nsys-rep_wddm_queue_sum.csv
nsys-c0r1.nsys-rep_dx12_pix_sum.csv nsys-collect.logs
nsys-c0r1.nsys-rep_gpu_gaps.csv TRACE_START
nsys-c0r1.nsys-rep_gpu_time_util.csv TRACE_STOP
The hlprof Trace Tool¶
Restrictions¶
The hlprof trace tool is unlike other trace tools, which work on the host system independent of the workload execution. The hlprof tool requires that the workload be launched with the environment variable HABANA_PROFILE=1. This limitation restricts the tool usage scenarios:
- The hlprof tool does not support :0 or :host tracing placements.
- The hlprof tool must be used to launch the workload executable. The current implementation limits hlprof to containerized workloads only, run under the docker engine.
- The hlprof tool cannot precisely stop a trace collection based on a stop phrase, unlike other trace tools. The stopping mechanism is controlled by the hlprof_options variable, which by default is defined as -g 1-2 -b 250, or capturing traces up to 2 enqueue invocations.
Create hlprof Containers¶
The hl-prof-config tool must be installed within the workload containers. This is usually preinstalled in the Habana Gaudi base-image containers.
ARG HABANA_VER="1.16.0-526"
ARG HABANA_IMG=vault.habana.ai/gaudi-docker/1.16.0/ubuntu22.04/habanalabs/pytorch-installer-2.2.2
FROM ${HABANA_IMG}:${HABANA_VER}
...
RUN mkfifo /export-logs
CMD (HABANA_PROFILE=1 /run_test.sh; echo $? > status) 2>&1 | tee output.logs && \
echo "status output.logs" > /export-logs && \
sleep infinity
HABANA_PROFILE=1 enables the Habana Gaudi trace system.
If successful, your logs directory should contain the trace data:
$ ls -1 -R workload/LLMs-PyTorch-OOB/logs-mygaudi_llms_pytorch_gaudi_inference_throughput_bfloat16_pkm/worker-0-1-hlprof/
workload/LLMs-PyTorch-OOB/logs-mygaudi_llms_pytorch_gaudi_inference_throughput_bfloat16_pkm/worker-0-1-hlprof/:
hlprof-c0r1
hlprof-c0r1.logs
hlprof-collect.logs
TRACE_START
workload/LLMs-PyTorch-OOB/logs-mygaudi_llms_pytorch_gaudi_inference_throughput_bfloat16_pkm/worker-0-1-hlprof/hlprof-c0r1:
hlprof-c0r1_84.hltv
Component Design Persistent Volumes¶
Source: doc/developer-guide/component-design/persistent-volumes.md
Introduction¶
The WSF supports OpenEBS or local-static-provisioner as optional Kubernetes plugins for local persistent volumes.
OpenEBS¶
Request OpenEBS support¶
Request to install the OpenEBS operator as follows in cluster-config.yaml.m4:
This requests that the OpenEBS operator be installed in the Kubernetes cluster. The default storage class is local-hostpath, which uses the storage path /mnt/disk1. You can define additional storage classes in your workload.
Use Persistent Volume¶
In your workload Kubernetes deployment script (or in helm charts), declare PersistentVolumeClaim and VolumeMounts as follows:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: local-hostpath-pvc
spec:
storageClassName: local-hostpath
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5G
---
apiVersion: batch/v1
kind: Job
metadata:
name: dummy-benchmark
spec:
template:
spec:
containers:
- name: dummy-benchmark
image: IMAGENAME(Dockerfile)
imagePullPolicy: IMAGEPOLICY(Always)
env:
- name: `SCALE'
value: "SCALE"
- name: `RETURN_VALUE'
value: "RETURN_VALUE"
- name: `SLEEP_TIME'
value: "SLEEP_TIME"
volumeMounts:
- mountPath: /mnt/disk1
name: local-storage
volumes:
- name: local-storage
persistentVolumeClaim:
claimName: local-hostpath-pvc
restartPolicy: Never
Local-Static-Provisioner¶
Request Local-Static-Provisioner Support¶
Request to install the local-static-provisioner plugin as follows in cluster-config.yaml.m4:
cluster:
- labels:
HAS-SETUP-DISK-SPEC-1: required
terraform:
k8s_plugins:
- local-storage-provisioner
This requests that the local-static-provisioner plugin be installed in the Kubernetes cluster. The default storage class is local-static-storage, which uses the storage path /mnt/disk1. You can define additional storage classes in your workload.
Use Persistent Volume¶
In your workload Kubernetes deployment script (or in helm charts), declare PersistentVolumeClaim and VolumeMounts as follows:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: local-claim
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
storageClassName: local-static-storage
---
apiVersion: batch/v1
kind: Job
metadata:
name: dummy-benchmark
spec:
template:
spec:
containers:
- name: dummy-benchmark
image: IMAGENAME(Dockerfile)
imagePullPolicy: IMAGEPOLICY(Always)
env:
- name: `SCALE'
value: "SCALE"
- name: `RETURN_VALUE'
value: "RETURN_VALUE"
- name: `SLEEP_TIME'
value: "SLEEP_TIME"
volumeMounts:
- mountPath: /mnt/disk1
name: local-storage
volumes:
- name: local-storage
persistentVolumeClaim:
claimName: local-claim
restartPolicy: Never
Component Design Readme¶
Source: doc/developer-guide/component-design/readme.md
Each workload should have a README with the required instructions.
Sections¶
The workload README should have the following sections:
- Introduction: Introduce the workload and any background information.
- Test Case: Describe the test cases.
- Configuration: Describe the workload configuration parameters.
- Execution: Show some examples of how to run the workload.
- KPI: Describe the KPI definitions and the meanings of the values.
- Performance BKM: Describe system setup and any performance tuning tips.
- Index Info: List the workload indexing information.
- Validation Notes: This section is auto-inserted by the validation team. New workloads should remove this section.
- See Also: Add any workload-related references.
See the dummy workload README for reference.
Performance BKM¶
It is recommended to include (but not be limited to) the following information in the Performance BKM section:
- The minimum system setup (and the corresponding test case).
- The recommended system setup (and the corresponding test case).
- Workload parameter tuning guidelines.
- Links to any performance report(s).
Component Design Secrets¶
Source: doc/developer-guide/component-design/secrets.md
Introduction¶
This document describes how to handle secrets (such as an access token) in the workload development.
Configure Secrets¶
Store user secrets under $PROJECTDIR/script/csp/.<domain>/config.json (with mode 600), where $PROJECTDIR is the root of the repository and .<domain>/config.json is a domain-specific configuration file. The JSON format is preferred, but it can be any convenient format.
Read Secrets¶
The workload validate.sh can read the workload secrets into environment variables. Special care must be taken not to expose the secret values:
- Declare the secret variable in WORKLOAD_PARAMS with a leading -. This will ensure that the secret values won't be accidentally shown on the screen, in any of the visible configuration files, or be uploaded to the WSF dashboard in subsequent operations.
- The WSF assumes a limited set of host-level utilities that can be used in bash scripts. jq (a popular utility to access json constructs) is not one of them. You can instead use sed to parse the json configuration file. While parsing the secret values, pay attention not to expose the values directly on the command line.
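A hedged sketch (the config.json layout and the token key name are assumptions) of pulling a secret into a variable with sed, without echoing the value:

```shell
# Create an example config for illustration only; real secrets live in
# $PROJECTDIR/script/csp/.<domain>/config.json with mode 600.
CFG="$(mktemp)"
printf '{"token": "s3cr3t"}\n' > "$CFG"
chmod 600 "$CFG"
# Extract the value of "token"; it goes straight into a variable and is
# never printed or passed as a command-line argument to another process.
TOKEN="$(sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$CFG")"
```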
Use Secrets in Docker¶
To use the workload secrets in a docker execution, declare DOCKER_OPTIONS in validate.sh:
or use a dedicated docker-config.yaml:
Do not expose the TOKEN value on the command line. Let docker read from the environment instead.
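For example (a hedged sketch), pass only the variable name in the docker options so docker inherits the value from the environment:

```shell
# "-e TOKEN" with no "=value" lets docker read $TOKEN from the calling
# environment, keeping the secret value off the command line.
DOCKER_OPTIONS="-e TOKEN"
```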
Use Secrets in Docker-Compose¶
To use workload secrets in a docker-compose file, use the environment variables to access the secret values:
Use Secrets in Kubernetes Scripts/Helm Charts¶
Use the workload-config secret (auto-generated) to access the workload secrets in a Kubernetes configuration file or in Helm Charts:
# kubernetes-config.yaml
apiVersion: batch/v1
kind: Job
metadata:
name: my-workload
spec:
template:
spec:
containers:
- name: my-workload
image: ...
env:
- name: TOKEN
valueFrom:
secretKeyRef:
name: workload-config
key: TOKEN
...
Access Secrets in Native Ansible Scripts¶
Use the following code snippets to use the workload secrets as environment variables. Be careful not to show the secret values in the ansible debugging output or on the command line on a SUT.
# deployment.yaml
- name: Use my secret
command: curl --header $TOKEN ...
environment: "{{ workload_secrets }}"
vars:
workload_secrets: "{{ lookup('file', wl_logs_dir + '/.workload-secret.yaml') | from_yaml }}"
Component Design Stack¶
Source: doc/developer-guide/component-design/stack.md
A software stack is the underlying software layers that a workload is constructed upon. The software layers include reusable software libraries, ansible scripts, docker images, and microservices.
Structure¶
A software stack consists of the following elements, some described in this document and others in the linked documents.
- Dockerfiles: A software stack may contain one or many Dockerfiles.
- CMakeLists.txt: A manifest to configure cmake.
- build.sh: A script for building the workload docker image(s).
Optionally, software stacks can define unit tests similar to how workloads work to verify software stack functionalities.
See Also¶
Component Design Template¶
Source: doc/developer-guide/component-design/template.md
There are templating systems, based on M4 macros (*.m4 files) and Jinja2 (*.j2 files), built into the workload build process. You can use them to simplify the workload recipe development by encapsulating any duplicated steps.
Note: This document lacks information about Jinja templating. We are working on improvements.
Usage¶
To use the template system, create one or more .m4/.j2 files under your workload folder, and put any shared .m4/.j2 templates under the template folder of the workload, feature, platform, or top directory. During the build process, those .m4/.j2 files will be expanded to either .tmpm4.xyzt or .tmpj2.xyzt, where xyzt is a random string. The temporary files will be removed after the build.
Example¶
The following sample uses ippmb.m4 to encapsulate the IPP library installation steps:
where ippmb.m4 will be expanded to:
# SPR/Crypto/template/ippmb.m4
ARG IPP_CRYPTO_VERSION="ippcp_2020u3"
ARG IPP_CRYPTO_REPO=https://github.com/intel/ipp-crypto.git
RUN git clone -b ${IPP_CRYPTO_VERSION} --depth 1 ${IPP_CRYPTO_REPO} && \
cd /ipp-crypto/sources/ippcp/crypto_mb && \
cmake . -B"../build" \
-DOPENSSL_INCLUDE_DIR=/usr/local/include/openssl \
-DOPENSSL_LIBRARIES=/usr/local/lib64 \
-DOPENSSL_ROOT_DIR=/usr/local/bin/openssl && \
cd ../build && \
make crypto_mb && \
make install
Pre-defined Variables:¶
- PLATFORM: The platform name that the workload is defined for.
- FEATURE: The hero feature name that the workload is defined under.
- WORKLOAD: The workload name.
- REGISTRY: The private registry.
- RELEASE: The release version.
Component Design Timezone¶
Source: doc/developer-guide/component-design/timezone.md
Introduction¶
It is not a general requirement to align the container time zone with what is on the SUT host. However, if you need to sync the container date and time with an external PDU, it is desirable to align the container time zone with what is on the SUT host.
The time zone information is in the file /etc/localtime and optionally with an environment variable TZ.
Docker Execution¶
For workloads that run with docker, the validation script automatically exposes the TZ environment variable.
The workload should perform the following steps to properly use the TZ value:
- Install the tzdata package.
- Link /etc/localtime: ln -sf /usr/share/zoneinfo/$TZ /etc/localtime.
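The steps above can be sketched in a Dockerfile as follows (the base image and run script are hypothetical; linking at container start lets a runtime TZ value take effect):

```dockerfile
FROM ubuntu:24.04
# Step 1: install the tzdata package.
RUN apt-get update && apt-get install -y --no-install-recommends tzdata && \
    apt-get clean -y && rm -rf /var/lib/apt/lists/*
# Step 2: link /etc/localtime from the TZ value at container start,
# then hand off to the (hypothetical) workload entry script.
CMD ln -sf "/usr/share/zoneinfo/$TZ" /etc/localtime && exec /run_test.sh
```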
Most of the time, however, you can bypass the above steps by just mounting /etc/localtime from the host, i.e., specify -v /etc/localtime:/etc/localtime:ro in DOCKER_OPTIONS.
Docker Compose¶
The TZ environment variable is exposed to the docker-compose file. You should use it in your docker-compose file:
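A hedged docker-compose fragment (the service and image names are hypothetical):

```yaml
services:
  benchmark:                # hypothetical service name
    image: my-image:latest  # hypothetical image
    environment:
      - TZ=${TZ}            # propagate the host time zone into the container
```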
Kubernetes¶
The validation script automatically exposes a workload-config secret in your namespace. The secret contains:
- TZ: The time zone string.
You can configure it in your Kubernetes/Helm scripts:
apiVersion: v1
kind: Pod
metadata:
name: my-pod
spec:
containers:
- name: my-pod
image: my-pod-image
env:
- name: TZ
valueFrom:
secretKeyRef:
name: workload-config
key: TZ
Component Design Validate¶
Source: doc/developer-guide/component-design/validate.md
The validate.sh script initiates the workload execution.
Example¶
An example of typical validate.sh is shown as follows:
#!/bin/bash -e
# Read test case configuration parameters
...
# Logs Setting
DIR=$(dirname $(readlink -f "$0"))
. "$DIR/../../script/overwrite.sh"
# Workload Setting
WORKLOAD_PARAMS=(CONFIG1 CONFIG2 CONFIG3)
# Docker Setting
DOCKER_IMAGE="Dockerfile"
DOCKER_OPTIONS=""
# Kubernetes Setting
RECONFIG_OPTIONS="-DCONFIG=$CONFIG"
JOB_FILTER="job-name=benchmark"
. "$DIR/../../script/validate.sh"
where . "$DIR/../../script/overwrite.sh" is a script to support workload parameter overwrite via ctest.sh command line, and . "$DIR/../../script/validate.sh" is a script for workload execution. The validate.sh saves any validation results to the current directory.
Reserved variables¶
The following script variables are reserved. Avoid overwriting their values in validate.sh:
- PLATFORM
- WORKLOAD
- TESTCASE
- DESCRIPTION
- REGISTRY
- RELEASE
- IMAGEARCH
- IMAGESUFFIX
- TIMEOUT
- SCRIPT
Optional parameters¶
Optionally, after `. "$DIR/../../script/overwrite.sh"`, you can invoke `. "$DIR/../../script/sut-info.sh"`, which queries the Cloud CLI for SUT information. The SUT information is saved as shell variables, for example:

```shell
SUTINFO_CSP=gcp
SUTINFO_WORKER_VCPUS=6
SUTINFO_WORKER_MEMORY=4096
SUTINFO_CLIENT_VCPUS=2
SUTINFO_CLIENT_MEMORY=2048
SUTINFO_CONTROLLER_VCPUS=2
SUTINFO_CONTROLLER_MEMORY=2048
```
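These variables can then drive conditional tuning in `validate.sh`; a hedged sketch (the `THREADS` parameter and the threshold are illustrative, not from the source):

```shell
# Illustrative only: size a hypothetical THREADS parameter from SUT info.
SUTINFO_WORKER_VCPUS=6        # set by sut-info.sh in a real run
if [ "${SUTINFO_WORKER_VCPUS:-0}" -ge 4 ]; then
    THREADS="$SUTINFO_WORKER_VCPUS"
else
    THREADS=2                 # conservative fallback for small SUTs
fi
echo "THREADS=$THREADS"
```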
Validation Parameters¶
`WORKLOAD_PARAMS`: Specify the workload configuration parameters as an array variable of workload variables. The configuration parameters are shown as software configuration metadata in the WSF dashboard. Workload configuration parameters can also be accessed in Ansible using `wl_tunables.<VARIABLE_NAME>`, e.g., `wl_tunables.SCALE`.
- A designated prefix on a workload parameter specifies that the parameter is a secret. The script ensures that the value is not exposed in any console printout.
- You can append a workload parameter description after `#` to print help messages to the user. Use backslash escapes or the `cat` workaround for multi-line descriptions.
```shell
WORKLOAD_PARAMS=(
    "SCALE#This parameter specifies the number of PI digits."
    "RETURN_VALUE#$(cat <<EOF
You can emulate the workload exit code by explicitly
specifying the return exit code.
EOF
)"
)
```
- `WORKLOAD_TAGS`: Specify any workload-related tags as a space-separated string.
- `DOCKER_IMAGE`: If the workload is a single-container workload and supports docker run, specify either the docker image name or the `Dockerfile` used to compile the docker image. If the workload does not support docker run, leave the variable value empty.
- `DOCKER_OPTIONS`: Specify any docker run options, if the workload supports docker run.
- `J2_OPTIONS`: Specify any configuration parameters when expanding the Jinja2 `.j2` templates.
- `RECONFIG_OPTIONS`: Specify any configuration parameters when expanding any Kubernetes deployment script as a `.m4` template.
- `HELM_OPTIONS`: Specify any helm chart build options. This applies to any Kubernetes workloads with deployment scripts written as helm charts.
- `JOB_FILTER`: Specify which job/deployment is used to monitor the validation progress and, after validation completion, retrieve the validation logs. You can specify multiple job/deployment filters, using `,` as a separator. The first filter is for the benchmark pods, and the rest are for service pods. For jobs with multiple containers, you can specify the container name as a qualifier, for example, `job-name=dummy-benchmark:dummy-benchmark`.
- `SCRIPT_ARGS`: Specify the script arguments for `kpi.sh` or `setup.sh`.
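For instance, a Kubernetes workload with one benchmark job and one service deployment might combine these settings as follows (all names and values are hypothetical):

```shell
# Hypothetical settings; names and values are illustrative, not from the source.
RECONFIG_OPTIONS="-DSCALE=${SCALE:-1000}"
HELM_OPTIONS="--set image.tag=latest"
# First filter: benchmark pods (with a container qualifier); second: service pods.
JOB_FILTER="job-name=dummy-benchmark:dummy-benchmark,app=dummy-server"
echo "$JOB_FILTER"
```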
Event Tracing Parameters¶
`EVENT_TRACE_PARAMS`: Specify the event tracing parameters:
- `roi`: Specify ROI-based trace parameters: `roi,<start-phrase>,<end-phrase>[,roi,<start-phrase>,<end-phrase> ...]`. For example, the trace parameters can be `roi,begin region of interest,end region of interest`. The workload must be instrumented to print these phrases in the console output.
  For more sophisticated multi-line, context-based ROIs, if the start-phrase or end-phrase string starts and ends with `/`, the string is treated as a regular expression. Use `~` to represent any newline character. For example, `/~iteration 10.*start workload/` triggers the start of the ROI after the 10th iteration. An additional delay can be appended to the start/stop string, as in `START_BENCHMARK+5s`, which specifies that the ROI starts 5 seconds after identifying the starting phrase `START_BENCHMARK`.
- `time`: Specify time-based trace parameters: `time,<start-time>,<trace-duration>[,time,<start-time>,<trace-duration>]`. For example, if the trace parameters are `time,30,10`, the trace collection starts 30 seconds after the workload containers become ready and the collection duration is 10 seconds.

For short-ROI workloads (less than a few seconds), it is recommended that you specify an empty `EVENT_TRACE_PARAMS` value, meaning that the trace ROI is the entirety of the workload execution, which ensures that the trace collection catches the short duration of the workload execution. Between `roi` and `time`, use `roi` if possible and use `time` as a last resort if the workload does not output anything meaningful to indicate an ROI. Note that none of the event tracing mechanisms is timing-accurate. Define the event trace parameter values with a high timing tolerance, at least in seconds.
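The two forms might look as follows in `validate.sh` (the phrases are illustrative; they must match what the workload actually prints):

```shell
# ROI-based tracing: the workload must print these exact phrases.
EVENT_TRACE_PARAMS="roi,begin region of interest,end region of interest"
# Time-based alternative: start 30s after the containers become ready, trace 10s.
#EVENT_TRACE_PARAMS="time,30,10"
echo "$EVENT_TRACE_PARAMS"
```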
PRESWA Parameters¶
The Pre-Si analysis pipeline additionally requires identifying the Process of Interest (POI) of a workload. For example, in a client-server workload, the POI is the service process. The POI is specified as a regular expression that matches the workload process. If the workload uses Kubernetes orchestration, the workload must also specify a pod filter to uniquely identify the pod.
`PRESWA_POI_PARAMS`: Specify the Pre-Si POI parameters as follows: `process-name-filter [pod-label-filter]`, where the process-name-filter is a regular expression string to filter the process names, and the pod-label-filter (optional for docker) is the Kubernetes label filter to uniquely identify the pod. For example: `mongo app=server`.
With docker, the process info can be obtained through `/sys/fs/cgroup/systemd/docker/<container-id>/cgroup.procs`. With Kubernetes and the containerd runtime, the process info can be obtained through `/sys/fs/cgroup/systemd/system.slice/containerd.service/kubepods-besteffort-pod<pod-uid>.slice:cri-containerd:<container-uid>/cgroup.procs`, where `<pod-uid>` and `<container-uid>` can be obtained from `kubectl get pod -A -o json`.
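A hedged sketch of the docker case, assuming cgroup v1 with the systemd driver as described above (the container id is a placeholder):

```shell
# Assumes cgroup v1 with the systemd driver; container_id is a placeholder.
container_id="0123abcd"
procs="/sys/fs/cgroup/systemd/docker/${container_id}/cgroup.procs"
# Resolve the process name of every PID in the container, if the file exists.
if [ -r "$procs" ]; then
    while read -r pid; do
        cat "/proc/${pid}/comm"
    done < "$procs"
fi
```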
Component Design Workload¶
Source: doc/developer-guide/component-design/workload.md
The Workload Service Framework (WSF) supports the following types of workloads:
- Native workloads: The workload runs directly on the SUT (System Under Test) hosts. The workload logic is implemented by Ansible scripts.
- Containerized workloads: The workload runs under either docker or Kubernetes. The workload logic is implemented by a set of Dockerfiles and docker/Kubernetes configuration files.
Native Workloads¶
A native workload consists of the following elements:
- CMakeLists.txt: A manifest to configure how to build and test the workload.
- build.sh: A script for building the workload. Strictly speaking, native workloads do not need a separate build process; they build on the SUTs if required. This is just a placeholder script for scanning and listing workload ingredients.
- validate.sh: A script to define how to execute the workload.
- kpi.sh: A script for extracting KPI data out of the workload execution logs.
- cluster-config.yaml.m4: A manifest to describe how to provision the SUTs.
- Native Scripts: The native scripts that implement the workload logic, including Ansible scripts (workload execution logic) and optional Terraform scripts (SUT provisioning logic).
- README: A README to introduce the workload, configure parameters, and provide other related information.
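Since build.sh is only a placeholder for native workloads, it can simply delegate to the shared script, consistent with the ready-made `build.sh` template; a hedged sketch (the existence guard is added here for illustration):

```shell
#!/bin/bash -e
# Placeholder build.sh for a native workload: nothing is compiled locally;
# delegating to the shared build script lets the framework scan and list
# workload ingredients.
DIR="$(dirname "$(readlink -f "$0")")"
# Guard added for illustration; a real workload sources the script directly.
if [ -e "$DIR/../../script/build.sh" ]; then
    . "$DIR/../../script/build.sh"
fi
```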
Containerized Workloads¶
A containerized workload can run under docker (single-container) or Kubernetes (single- or multiple-containers). The workload consists of the following elements:
- CMakeLists.txt: A manifest to configure how to build and test the workload.
- build.sh: A script for building the workload docker image(s).
- validate.sh: A script for executing the workload.
- kpi.sh: A script for extracting KPI data out of the workload execution logs.
- compose-config.yaml.m4/j2: An optional manifest to describe how to schedule the containers with docker-compose.
- cluster-config.yaml.m4/j2: A manifest to describe how to provision a machine or a set of machines for running the workload.
- Dockerfiles: A workload may contain one or multiple Dockerfiles.
- kubernetes-config.yaml.m4/j2 or helm charts: An optional manifest to describe how to schedule the containers to a Kubernetes cluster.
- Native Scripts: Optionally, the workload may provide native scripts for customizing the workload execution logic (Ansible scripts) or the SUT provisioning logic (Terraform scripts).
- README: A README to introduce the workload, configure parameters, and provide other related information.
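Putting the elements together, a containerized workload directory might look like the following (the file set is illustrative; whether `.m4` or `.j2` templates are used varies per workload):

```
my-workload/
├── CMakeLists.txt
├── build.sh
├── validate.sh
├── kpi.sh
├── Dockerfile
├── cluster-config.yaml.m4
├── compose-config.yaml.m4
├── kubernetes-config.yaml.m4
└── README.md
```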