Developer Guide (Enhanced Comprehensive)¶
TL;DR: Deterministic comprehensive guide assembled from all mirrored source markdown files.
- Developer Guide (Enhanced Comprehensive)
- Source Files
- Component Design Build
- Component Design Cluster Config
- Component Design Cmakelists
- Component Design Compose Config
- Component Design Docker Config
- Component Design Dockerfile
- Component Design Image
- Component Design Kpi
- Component Design Kubernetes Config
- Component Design Native Script
- Component Design Nsys Hlprof
- Component Design Persistent Volumes
- Component Design Readme
- Component Design Secrets
- Component Design Stack
- Component Design Template
- Component Design Timezone
- Component Design Validate
- Component Design Workload
Source Files¶
- doc/developer-guide/component-design/build.md
- doc/developer-guide/component-design/cluster-config.md
- doc/developer-guide/component-design/cmakelists.md
- doc/developer-guide/component-design/compose-config.md
- doc/developer-guide/component-design/docker-config.md
- doc/developer-guide/component-design/dockerfile.md
- doc/developer-guide/component-design/image.md
- doc/developer-guide/component-design/kpi.md
- doc/developer-guide/component-design/kubernetes-config.md
- doc/developer-guide/component-design/native-script.md
- doc/developer-guide/component-design/north-traffic.md
- doc/developer-guide/component-design/nsys-hlprof.md
- doc/developer-guide/component-design/persistent-volumes.md
- doc/developer-guide/component-design/readme.md
- doc/developer-guide/component-design/secrets.md
- doc/developer-guide/component-design/stack.md
- doc/developer-guide/component-design/template.md
- doc/developer-guide/component-design/timezone.md
- doc/developer-guide/component-design/validate.md
- doc/developer-guide/component-design/workload.md
Component Design Build¶
Source: doc/developer-guide/component-design/build.md
The build.sh script performs the build of a workload.
Since the process is standardized, there is usually no need to customize it. You can use the following template as is to call the ready-made build.sh from under the script folder:
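A minimal build.sh of this form simply locates itself and sources the common script (this mirrors the QAT-Setup example later in this section):

```shell
#!/bin/bash -e
DIR="$(dirname "$(readlink -f "$0")")"
# delegate the build to the standardized framework script
. "$DIR"/../../script/build.sh
```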
Customizing with switches¶
In some cases, the script/build.sh can be customized as follows:
- BUILD_FILES: Specify an array of base file names to be included in the build.
- BUILD_OPTIONS: Specify any custom arguments to the docker build command.
- BUILD_CONTEXT: Optionally specify a relative directory name or an array of relative directory names, where the Dockerfiles are located. By default, the Dockerfiles are assumed to be located directly under the workload directory.
- FIND_OPTIONS: Specify any custom arguments to the find program to locate the set of Dockerfiles for building the docker images and for listing the BOMs.
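For illustration (the Dockerfile name and build argument below are placeholders), a customized build.sh sets these variables before sourcing the common script:

```shell
#!/bin/bash -e
DIR="$(dirname "$(readlink -f "$0")")"
BUILD_FILES=("Dockerfile.1.base")           # placeholder base file name
BUILD_OPTIONS="--build-arg RELEASE=latest"  # placeholder docker build argument
. "$DIR"/../../script/build.sh
```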
Template Expansion¶
The build.sh script automatically performs template expansion if you have templates defined in either the .m4 format or the .j2 format, except those under the template directory.
For example, if you define Dockerfile.m4 or Dockerfile.j2, the script will expand the template to Dockerfile before building the docker images.
More about templating systems can be found under Templating systems.
Build Dependencies¶
If your workload depends on one of the common software stacks, invoke the corresponding software stack's build.sh.
For example, if your image depends on QAT-Setup, your build.sh can be something like below:
#!/bin/bash -e
DIR="$(dirname "$(readlink -f "$0")")"
# build QAT-Setup (added section)
STACK="qat_setup" "$DIR"/../../stack/QAT-Setup/build.sh $@
# build our image(s)
. "$DIR"/../../script/build.sh
Component Design Cluster Config¶
Source: doc/developer-guide/component-design/cluster-config.md
The cluster-config.yaml manifest describes the machine specification to run the workloads. The specification is still evolving and subject to change.
The following example describes a 3-node cluster to be used in some workload:
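A minimal sketch of such a manifest (the hugepage label name is an illustrative placeholder):

```yaml
cluster:
- labels: {}      # worker node 1, no special setup
- labels: {}      # worker node 2
- labels:         # worker node 3 requires hugepages
    HAS-SETUP-HUGEPAGE-2048kB-4096: required
```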
The cluster-config.yaml consists of the following sections:
cluster: This section defines the post-Sil cluster configurations.
cluster.labels¶
The cluster.labels section describes any must have system level setup that a workload must use. The setup is specified in terms of a set of Kubernetes node labels as follows:
| Label | Description |
|---|---|
| `HAS-SETUP-DATASET` | This set of labels specifies the datasets available on the host. See also: Dataset Setup. |
| `HAS-SETUP-DISK-AVAIL` | This set of labels probes the disk availability to ensure there is enough data space available for workload execution. See also: Disk Avail Setup. |
| `HAS-SETUP-DISK-SPEC` | This set of labels specifies that SSD or NVMe disks be mounted on the worker node(s). See also: [Storage Setup][Storage Setup]. |
| `HAS-SETUP-HUGEPAGE` | This set of labels specifies the kernel hugepage settings. See also: Hugepage Setup. |
| `HAS-SETUP-MEMORY` | This label specifies the minimum memory required by the workload. See also: Memory Setup. |
| `HAS-SETUP-MODULE` | This set of labels specifies the kernel modules that the workload must use. See also: Module Setup. |
The label value is either `required` or `preferred` as follows:
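A sketch (the label names are illustrative placeholders):

```yaml
cluster:
- labels:
    HAS-SETUP-HUGEPAGE-2048kB-4096: required   # scheduling fails without it
    HAS-SETUP-DISK-SPEC-1: preferred           # used if available
```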
cluster.cpu_info¶
The cluster.cpu_info section describes any CPU-related constraints that a workload must use. The cpu_info section is currently declarative and is not enforced.
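A minimal sketch (the exact schema is an assumption; only the requirement that flag values match the host CPU flags comes from this document):

```yaml
cluster:
- labels: {}
  cpu_info:
    flags:          # assumed field name; values must appear in lscpu output
    - avx512f
```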
where the CPU flags must match what is shown by lscpu or cat /proc/cpuinfo.
cluster.mem_info¶
The cluster.mem_info section describes any memory constraints that a workload must use. The mem_info section is currently declarative and is not enforced.
Please also use the Kubernetes [resource constraints][resource constraints] to specify the workload memory requirements.
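A minimal sketch (the `available` field name is an assumption; only the GBytes unit comes from this document):

```yaml
cluster:
- labels: {}
  mem_info:
    available: 128   # assumed field name; value in GBytes
```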
where the available memory is in the unit of GBytes.
cluster.vm_group¶
The cluster.vm_group section describes the worker group that this worker node belongs to. Each worker group is a set of SUTs of similar specification. If not specified, the worker group is assumed to be worker.
Enforced by the terraform backend.
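A minimal sketch (the `client` group name follows the client-0/client-1 host naming used elsewhere in this guide):

```yaml
cluster:
- labels: {}            # no vm_group: defaults to the worker group
- labels: {}
  vm_group: client      # this node joins the client group
```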
cluster.off_cluster¶
The cluster.off_cluster section describes whether the worker node should be part of the Kubernetes cluster. This is ignored if the workload is not a Cloud Native workload or the execution is not through Kubernetes.
If not specified, all nodes are part of the Kubernetes cluster.
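A minimal sketch (field placement mirrors the other cluster-config examples and is an assumption):

```yaml
cluster:
- labels: {}
- labels: {}
  off_cluster: true    # keep this node out of the Kubernetes cluster
```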
cluster.sysctls¶
The cluster.sysctls section describes the sysctls that the workload expects to use. The sysctls are specified per worker group. Multiple sysctls are merged together and applied to all the worker nodes in the same workgroup.
Enforced by the terraform backend.
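A minimal sketch (the sysctl key and value are illustrative placeholders):

```yaml
cluster:
- labels: {}
  sysctls:
    net.core.somaxconn: 65535   # placeholder sysctl applied to this worker group
```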
cluster.sysfs¶
The cluster.sysfs section describes the sysfs or procfs controls that the workload expects to use. The controls are specified per worker group. Multiple controls are merged together and applied to all the worker nodes in the same workgroup.
Enforced by the terraform backend.
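A minimal sketch (the sysfs path and value are illustrative placeholders):

```yaml
cluster:
- labels: {}
  sysfs:
    /sys/kernel/mm/transparent_hugepage/enabled: never   # placeholder control
```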
cluster.bios¶
The cluster.bios section describes the bios settings that the workload expects to use. The controls are specified per worker group. Multiple controls are merged together and applied to all the worker nodes in the same workgroup.
Enforced by the terraform backend.
cluster:
- labels: {}
bios:
SE5C620.86B:
"Intel(R) Hyper-Threading Tech": Enabled # Disabled
"CPU Power and Performance Policy": Performance # "Balanced Performance", "Balanced Power", or "Power"
cluster.msr¶
The cluster.msr section describes the msr register settings that the workload expects to use. The controls are specified per worker group. Multiple controls are merged together and applied to all the worker nodes in the same workgroup.
Enforced by the terraform backend.
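A minimal sketch (the register address and value are hypothetical; consult the platform documentation before writing any MSR):

```yaml
cluster:
- labels: {}
  msr:
    "0x1a4": 0xf   # hypothetical register/value pair
```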
terraform¶
The terraform section overwrites the default configuration parameters of the terraform validation backend default. See Terraform Options for specific options.
Note that any options specified in `TERRAFORM_OPTIONS` or by the CLI take precedence. They will not be overridden by the parameters specified in this section.
Example of Enabling Kubernetes NUMA Controls¶
terraform:
k8s_kubeadm_options:
KubeletConfiguration:
cpuManagerPolicy: static
systemReserved:
cpu: 200m
topologyManagerPolicy: single-numa-node
topologyManagerScope: pod
memoryManagerPolicy: Static
reservedMemory:
- numaNode: 0
limits:
memory: 100Mi
featureGates:
CPUManager: true
TopologyManager: true
MemoryManager: true
Example of Enabling Kubernetes Per-Socket Topology Aware Controls¶
The configuration below enables topology-aware scheduling, including hardware core and socket awareness. Only integral values for CPU reservations are allowed; a misconfigured k8s deployment will result in SMTAlignmentError. This example is intended for advanced users only and requires Kubernetes 1.26.1 or higher.
terraform:
k8s_kubeadm_options:
KubeletConfiguration:
cpuManagerPolicy: static
cpuManagerPolicyOptions:
align-by-socket: "true"
distribute-cpus-across-numa: "true"
full-pcpus-only: "true"
systemReserved:
cpu: 1000m
topologyManagerPolicy: best-effort
topologyManagerPolicyOptions:
prefer-closest-numa-nodes: "true"
topologyManagerScope: pod
memoryManagerPolicy: Static
reservedMemory:
- numaNode: 0
limits:
memory: 100Mi
featureGates:
CPUManager: true
CPUManagerPolicyAlphaOptions: true
CPUManagerPolicyBetaOptions: true
CPUManagerPolicyOptions: true
MemoryManager: true
TopologyManager: true
TopologyManagerPolicyAlphaOptions: true
TopologyManagerPolicyBetaOptions: true
TopologyManagerPolicyOptions: true
Component Design Cmakelists¶
Source: doc/developer-guide/component-design/cmakelists.md
The CMakeLists.txt defines the cmake actions: build and test. It contains a set of directives and instructions that describe the project's source files and targets.
Writing the definition for workload¶
Let us start with a simple example:
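A minimal sketch (the workload name is a placeholder; the functions are those described below):

```cmake
add_workload("dummy_benchmark")
add_testcase(${workload})
```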
The add_workload function defines the cmake build rules for a workload. By convention, the name must be lower-case and must not contain any special characters except _. It is recommended to append the version info to indicate the implementation versioning. The function also defines a parent-scope variable workload holding the same name, which any subsequent function can use.
The add_testcase function defines a test case. You may define multiple test cases, each with a unique name and some configuration parameters. Internally, this gets routed to the validate.sh script with the specified parameters. (There is no argument in the above example.) The validation results are saved to the corresponding logs-$workload directory under the build tree. See also: Workload Testcases.
Note that the name of any `cmake` target must be unique across all workloads. Thus it is usually a concatenation of platform, feature, workload, and configuration.
Licensing Terms¶
If the workload requires the user to agree to any license terms, use the check_license function. The function prompts the user for license agreement and then saves the decision. If the user denies the license terms, the workload will be skipped during the build process. If there are multiple license terms, you can write as many check_license functions as needed.
check_license("media.xiorg.com" "Please agree to the license terms for downloading datasets from xiorg.com")
add_workload("foo" LICENSE "media.xiorg.com")
Fixed or Negative SUT¶
If the workload can only run on specific SUT (System Under Test), in this case azure, specify the SUT constraints as part of the add_workload function as follows:
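A sketch (the SUT keyword form mirrors the LICENSE constraint shown earlier and is an assumption):

```cmake
add_workload("foo" SUT "azure")
```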
where the azure SUT must be defined with script/terraform/terraform-config.<sut>.tf.
You can also specify a negative SUT name to remove the SUT type from selection. This will match all possible SUTs, except aws:
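A sketch (the negative-name form, prefixing the SUT with `-`, is an assumption following the same constraint syntax):

```cmake
add_workload("foo" SUT "-aws")
```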
Software Stack CMakeLists.txt¶
CMakeLists.txt for software stacks defines the software stack build and test targets. Let us start with a simple example:
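A minimal sketch (the stack name is a placeholder; the functions are those described below):

```cmake
add_stack("qat_setup")
add_testcase(${stack})
```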
The add_stack function defines the cmake build rules for a software stack. By convention, the name must be lower-case and must not contain any special characters except _. It is recommended to append the version info to indicate the implementation versioning. The function also defines a parent-scope variable stack holding the same name, which any subsequent function can use.
The add_testcase function defines a test case. You may define multiple test cases, each with a unique name and some configuration parameters. Internally, this gets routed to the validate.sh script with the specified parameters. (There is no argument in the above example.) The validation results are saved to the corresponding logs-$stack directory under the build tree.
Note that the name of any `cmake` target must be unique across all workloads. Thus it is usually a concatenation of platform, feature, stack, and configuration.
Similar to workload CMakeLists.txt, you can also use the check_git_repo function, the check_license function, and the SUT/LICENSE constraints.
See also¶
- Documentation of available CMake controls, including conditional blocks and loops
Component Design Compose Config¶
Source: doc/developer-guide/component-design/compose-config.md
The compose-config.yaml script is a manifest that describes how the workload container(s) should be scheduled (to the machine cluster described by cluster-config.yaml.) This is the standard docker-compose script.
You can choose to write compose-config.yaml in any of the following formats:
- compose-config.yaml: For simple workloads, you can directly write the docker-compose script.
- compose-config.yaml.m4: Use the .m4 template to add conditional statements in the docker-compose script.
- compose-config.yaml.j2: Use the .j2 template to add conditional statements in the docker-compose script.
Image Name¶
The container image in compose-config.yaml should use the full name in the format of <REGISTRY><image-name><RELEASE>, where <REGISTRY> is the docker registry URL (if any) and the <RELEASE> is the release version, (or :latest if not defined.)
If you use the .m4 template, the IMAGENAME macro can expand an image name to include the registry and release information:
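For example (the service and image name `dummy-benchmark` is a placeholder; the `include(config.m4)` line and `IMAGENAME` macro follow the usage shown elsewhere in this guide):

```m4
include(config.m4)

services:
  dummy-benchmark:
    image: IMAGENAME(dummy-benchmark)
```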
dummy-benchmark must match what is defined in JOB_FILTER.
If you use the .j2 template, you must write the image name as follows:
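For example (the service and image names are placeholders; the variable expansion follows the full-name format described above):

```yaml
services:
  dummy-benchmark:
    image: "{{ REGISTRY }}dummy-benchmark{{ RELEASE }}"
```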
Component Design Docker Config¶
Source: doc/developer-guide/component-design/docker-config.md
The docker-config.yaml script is a manifest that describes how to schedule the workload container(s) on multiple hosts (described by cluster-config.yaml.)
You can choose to write docker-config.yaml in any of the following formats:
- docker-config.yaml.m4: Use the .m4 template to add conditional statements in the docker-config script.
- docker-config.yaml.j2: Use the .j2 template to add conditional statements in the docker-config script.
DOCKER-CONFIG Format¶
The docker-config.yaml uses the following syntax:
worker-0:
- image: "{{ REGISTRY }}image-name{{ IMAGESUFFIX }}{{ RELEASE }}"
options:
- -e VAR1=VALUE1
- -e VAR2=VALUE2
command: "/bin/bash -c 'echo hello world'"
export-logs: true
- The top-level keys are the SUT hosts described in cluster-config.yaml. The SUT hosts are named after their SUT workgroup. For example, the workers are named worker-0, worker-1, etc., and the clients are named client-0, client-1, etc.
- The value of each SUT host is a list of containers to be scheduled on that host. The list order is not enforced.
- Each container is described as a dictionary of:
  - image: Specify the full docker image name.
  - options: Specify the docker run command line arguments, as a string or a list.
  - command: Optional. Specify any startup command. This overwrites whatever is defined in the docker image.
  - export-logs or service-logs: Optional. Specify whether logs should be collected from the container.

The script first collects logs from containers whose `export-logs` is true, which also signals that the workload execution is completed. It then collects logs from containers whose `service-logs` is true. `export-logs` and `service-logs` are mutually exclusive and cannot both be true.
Test Time Considerations¶
At test time, the validation script launches the containers described in docker-config.yaml, for example, 2 containers on worker-0 and 1 on worker-1. The launch order is not enforced, thus the workload must implement an alternative synchronization mechanism if the launch order is important.
If `docker-config.yaml` exists, its settings take precedence over `DOCKER_IMAGE` and `DOCKER_OPTIONS` specified in `validate.sh`.
To facilitate SUT-level network communication, the list of all SUT private IP addresses is provided to each container runtime as environment variables, for example, WORKER_0_HOST=10.20.30.40, WORKER_1_HOST=20.30.40.50, CLIENT_0_HOST=30.40.50.60, etc. The workload can then use the IP addresses to set up services and communicate among the SUT hosts.
Component Design Dockerfile¶
Source: doc/developer-guide/component-design/dockerfile.md
The workload Dockerfile must meet certain requirements to facilitate image build, validation execution and data collection.
Use Template¶
You can use m4 template in constructing Dockerfiles, which avoids duplication of identical steps. Any files with the .m4 suffix will be replaced with the corresponding files without the suffix, during the build process.
Set Build Order¶
If there are multiple Dockerfiles under the workload directory, the build order is determined by the filename pattern Dockerfile.[1-9].<string>. The bigger the number in the middle of the filename, the earlier the build script builds the Dockerfile. If two Dockerfiles have the same number, the build order is platform-specific.
Filename:
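For illustration (the `<string>` parts are placeholders):

```text
Dockerfile.2.base    # bigger number: built earlier
Dockerfile.1.app     # smaller number: built later
```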
Specify Image Name¶
The first line of the Dockerfile is used to specify the docker image name, as follows:
Note: If an optional `# syntax=` line is added, it should precede the name line.
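For illustration (the image name, base image, and versions are placeholders), the first line is a comment that names the image:

```dockerfile
# dummy-benchmark
ARG OS_VER=22.04
ARG OS_IMAGE=ubuntu
FROM ${OS_IMAGE}:${OS_VER}
```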
Final images, that are pushed to the docker registry:
Intermediate images, that are not pushed to the docker registry:
Output:
Note: The image `TAG` may differ based on the `RELEASE` setting. If unspecified, `latest` is used.
Note: For `ARMv*` platforms, the image names will be appended with an `-arm64` suffix, so that they can coexist with `x86` platform images on the same host.
Naming Convention:¶
As a convention, the image name uses the pattern [<platform>-]<workload>-<other names>, and it must be unique. The platform prefix is required if the image is platform-specific, and optional if the image can run on any platform.
List Ingredients¶
Any significant ingredients used in the workload must be marked with the ARG statement, so that we can easily list ingredients of a workload, for example:
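A sketch with a hypothetical nginx ingredient (the names, version, and URL are illustrative):

```dockerfile
ARG NGINX_VER=1.24.0
ARG NGINX_REPO=https://nginx.org/download/nginx-${NGINX_VER}.tar.gz
RUN curl -fsSL -o nginx.tar.gz ${NGINX_REPO}
```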
The following ARG suffixes are supported:
- _REPO or _REPOSITORY: Specify the ingredient source repository location.
- _VER or _VERSION: Specify the ingredient version.
- _IMG or _IMAGE: Specify an ingredient docker image.
- _PKG or _PACKAGE: Specify an ingredient OS package, such as deb or rpm.
`_VER` and the corresponding `_REPO`/`_PACKAGE`/`_IMAGE` must come in pairs to properly show up in the Wiki ingredient table. For example, if you define `OS_VER`, then there should be an `OS_IMAGE` definition.
Export Status & Logs¶
It is the workload developer's responsibility to design how to start the workload and how to stop the workload. However, it is a common requirement for the validation runtime to reliably collect execution logs and any telemetry data for analysing the results.
Export to FIFO¶
The workload image must create a FIFO under /export-logs path, and then archive:
- The workload exit code (in `status`). Note: Any exit code other than `0` returned in `status` marks the execution as failed.
- Any workload-specific logs, which can be used to generate performance indicators.

Note: The FIFO path can be overridden from `/export-logs` by setting the `EXPORT_LOGS=/my/custom/path` variable in `validate.sh` to an absolute path to the FIFO inside the container.
For example:
RUN mkfifo /export-logs
CMD (./run-workload.sh; echo $? > status) 2>&1 | tee output.logs && \
tar cf /export-logs status output.logs && \
sleep infinity
- `RUN mkfifo /export-logs` creates a FIFO for log export.
- `CMD` executes the workload and collects logs:
  1. `(./run-workload.sh;` executes the workload;
  2. `echo $? > status)` writes the exit code to `status`;
  3. `2>&1` redirects standard error to standard output;
  4. `| tee output.logs` sends the output to both the terminal and the `output.logs` file;
  5. `tar cf /export-logs status output.logs` writes a tarball archive containing `status` and `output.logs` to the `/export-logs` FIFO;
  6. `sleep infinity` is mandatory to hold the container for log retrieval.
Alternatively, a list of files can be echoed to /export-logs, for example:
RUN mkfifo /export-logs
CMD (./run-workload.sh; echo $? > status) 2>&1 | tee output.logs && \
echo "status output.logs" > /export-logs && \
sleep infinity
The difference is only in step 5 of CMD: echo "status output.logs" > /export-logs sends the list of files to the queue instead of a tarball.
Import from FIFO¶
The validation backend (script/validate.sh) imports the logs data through the FIFO, as follows for any docker execution:
# docker
docker exec <container-id> sh -c 'cat /export-logs > /tmp/tmp.tar; tar tf /tmp/tmp.tar > /dev/null && cat /tmp/tmp.tar || tar cf - $(cat /tmp/tmp.tar)' | tar xf -
# kubernetes
kubectl exec <pod-id> -- sh -c 'cat /export-logs > /tmp/tmp.tar; tar tf /tmp/tmp.tar > /dev/null && cat /tmp/tmp.tar || tar cf - $(cat /tmp/tmp.tar)' | tar xf -
The above command blocks while the workload execution is in progress and exits after the workload is completed (thus signaling that it is time for cleanup).
ENTRYPOINT reserved feature¶
Do not use ENTRYPOINT in the Dockerfile. This is a reserved feature for future extension.
Workaround the software.intel.com proxy issue¶
The Intel proxy setting includes intel.com in the no_proxy setting. This is generally fine, but software.intel.com is an exception that must go through the proxy. Use the following workaround on the specific command that needs to bypass the intel.com restriction:
RUN no_proxy=$(echo $no_proxy | tr ',' '\n' | grep -v -E '^.?intel.com$' | tr '\n' ',') yum install -y intel-hpckit
See Also¶
- How to Create a Workload
- Provisioning Specification
- [Workload with Dataset][Workload with Dataset]
Component Design Image¶
Source: doc/developer-guide/component-design/image.md
The document describes how to build VM images with HashiCorp's Packer. The projects are located under the image directory.
Each folder under image equates to a VM image project. For an example, look at image/HostOS.
Prerequisites¶
This document does not cover the required WSF preconfiguration. Follow the required steps to:
- Configure the WSF environment instructions
- Configure your Terraform backend. If you haven't set up terraform, please follow the instructions to set up terraform for Cloud validation.
Navigating the WSF VM Image folder structures¶
./image : Contains the VM image projects. Each folder is a VM image project.
./script/terraform/ : Contains the Terraform configuration files for each CSP. For example: terraform-config.azure.tf contains the Azure variables, including the os_type = "ubuntu2204"
./script/terraform/ : Custom files can be created and used during cmake; for example, create terraform-config.my-custom-azure.tf and use -DTERRAFORM_SUT=my-custom-azure.
./script/terraform/template/packer/<csp>/generic : Includes Packer files, including VM Image Offer/Publisher/SKU mapping.
Note that only the os_type variable defined in the terraform-config file is required; the Offer/Publisher/SKU are automatically set based on the os_type. See the mapping here: ./script/terraform/template/packer/<csp>/generic
Getting Started¶
Overview on how to build VM images using the WSF framework.
Building VM Images¶
Assuming you have configured WSF environment and Terraform backend, you can start building VM images.
Below is an example of how to build a VM image using the HostOS project.
# Clone the repo and create a build directory
git clone https://github.com/intel/workload-services-framework.git wsf-fork
cd wsf-fork
mkdir build
cd build
# Run cmake to build the project. This uses the HostOS project under the image folder, and the Azure configuration under script/terraform
cmake -DBENCHMARK=image/HostOS -DPLATFORM=SPR -DTERRAFORM_SUT=azure ..
# Build the Terraform containers/configuration
make build_terraform
# Log into Azure container, login to Azure, and exit
make azure
az login
exit
# Run the 'make' command to start VM Image creation process
make
Note how the cmake command specifies the BENCHMARK, PLATFORM, and TERRAFORM_SUT variables:
- -DBENCHMARK: Specifies the VM image project to be used. In this case, it is image/HostOS.
- -DPLATFORM: Specifies the target platform; SPR in this example.
- -DTERRAFORM_SUT: Specifies the CSP configuration to be used. These configuration files are located at ./script/terraform/*.tf. For example, ./script/terraform/terraform-config.azure.tf is used for Azure. Custom files can be created and used during cmake; for example, create terraform-config.my-custom-azure.tf and specify -DTERRAFORM_SUT=my-custom-azure to use it.
Cleanup process¶
# From the 'build' folder, log into the Azure container
make azure
# To cleanup just the terraform files
cleanup
# To cleanup images
cleanup --images
# Exit Azure container
exit
CMakeLists.txt¶
Make the project depend on the terraform backend, and use the add_image function to declare the VM image project name as follows:
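A minimal sketch (the project name follows the HostOS example; the exact argument form is an assumption):

```cmake
add_image("HostOS")
```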
build.sh - Configuration of Image creation¶
build.sh can be used to pass custom values to variables for the image creation process.
In the build.sh script, you should specify the project variables and call the script/terraform/packer.sh script to build the VM images.
The packer.sh script takes the project name as its first argument, as in the `packer.sh generic` invocation below; any remaining arguments are passed through. The project name defines the packer script location, expected to be under template/packer/<csp>/<project-name>, where <csp> is the Cloud Service Provider.
The script defines the following environment variables where you can include in your project definitions:
- `OWNER`: The owner string.
- `REGION`: The region string.
- `ZONE`: The availability zone string.
- `NAMESPACE`: The randomly generated namespace string of the current packer run.
- `INSTANCE_TYPE`: The CSP instance type.
- `SPOT_INSTANCE`: A boolean value specifying whether the build should use a spot instance.
- `OS_DISK_TYPE`: The OS disk type.
- `OS_DISK_SIZE`: The OS disk size.
- `ARCHITECTURE`: The architecture: `x86_64`, `amd64`, or `arm64`.
- `SSH_PROXY_HOST`: The socks5 proxy host name.
- `SSH_PROXY_PORT`: The socks5 proxy port value.
- `OS_IMAGE`: The os_image value.
An example of build.sh may look like the following:
#!/bin/bash -e
COMMON_PROJECT_VARS=(
'owner=$OWNER'
'region=$REGION'
'zone=$ZONE'
'job_id=$NAMESPACE'
'instance_type=$INSTANCE_TYPE'
'spot_instance=$SPOT_INSTANCE'
'os_disk_type=$OS_DISK_TYPE'
'os_disk_size=$OS_DISK_SIZE'
'architecture=$ARCHITECTURE'
'ssh_proxy_host=$SSH_PROXY_HOST'
'ssh_proxy_port=$SSH_PROXY_PORT'
'image_name=wsf-${OS_TYPE}-${ARCHITECTURE}-dataset-ai'
'ansible_playbook=../../../ansible/custom/install.yaml'
)
DIR="$( cd "$( dirname "$0" )" &> /dev/null && pwd )"
. "$DIR"/../../script/terraform/packer.sh generic $@
Optionally, you can also define CSP-specific variables, which will be merged with the common variables when running packer.sh:
AWS_PROJECT_VARS=(
'subnet_id=$SUBNET_ID'
'security_group_id=$SECURITY_GROUP_ID'
)
GCP_PROJECT_VARS=(
'subnet_id=$SUBNET_ID'
'project_id=$PROJECT_ID'
'min_cpu_platform=$MIN_CPU_PLATFORM'
'firewall_rules=$FIREWALL_RULES'
)
AZURE_PROJECT_VARS=(
'subscription_id=$SUBSCRIPTION_ID'
'availability_zone=$AVAILABILITY_ZONE'
'network_name=$NETWORK_NAME'
'subnet_name=$SUBNET_NAME'
'managed_resource_group_name=$RESOURCE_GROUP_NAME'
)
ORACLE_PROJECT_VARS=(
'subnet_id=$SUBNET_ID'
'compartment=$COMPARTMENT'
'cpu_core_count=$CPU_CORE_COUNT'
'memory_size=$MEMORY_SIZE'
)
TENCENT_PROJECT_VARS=(
'vpc_id=$VPC_ID'
'subnet_id=$SUBNET_ID'
'resource_group_id=$RESOURCE_GROUP_ID'
'security_group_id=$SECURITY_GROUP_ID'
'os_image_id=$OS_IMAGE_ID'
)
ALICLOUD_PROJECT_VARS=(
'vpc_id=$VPC_ID'
'resource_group_id=$RESOURCE_GROUP_ID'
'security_group_id=$SECURITY_GROUP_ID'
'vswitch_id=$VSWITCH_ID'
'os_image_id=$OS_IMAGE_ID'
)
KVM_PROJECT_VARS=(
'kvm_host=$KVM_HOST'
'kvm_host_user=$KVM_HOST_USER'
'kvm_host_port=$KVM_HOST_PORT'
'pool_name=${KVM_HOST_POOL/null/osimages}' # Must exist
'os_image=null'
)
The image building Ansible playbooks can also be applied directly to a static worker:
STATIC_PROJECT_VARS=(
'ssh_port=$SSH_PORT'
'user_name=$USER_NAME'
'public_ip=$PUBLIC_IP'
'private_ip=$PRIVATE_IP'
)
Ansible playbooks¶
You should write custom installation scripts in Ansible playbooks, usually located under template/ansible/<project-name>/install.yaml. This location can be overridden if you specify ansible_playbook in build.sh.
See basevm-generic for an example of creating a debian11 VM image with an updated 5.19 kernel.
More information about Ansible playbooks can be found in Ansible playbooks documentation.
Declare Ingredients in Ansible playbooks¶
Any component ingredients should be declared in defaults/*.yaml or defaults/*.yml:
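A sketch (the ingredient name, version, and URL are illustrative):

```yaml
# defaults/main.yaml -- ingredient declarations
nginx_version: "1.24.0"
nginx_repository: "https://nginx.org/download/nginx-1.24.0.tar.gz"
```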
where you can use the pairs of _version + _package or _version + _repository.
- `_version`, `_ver`: Declare the ingredient version.
- `_repository`, `_repo`: Declare the ingredient repository.
- `_package`, `_pkg`: Declare the package location.

Note: The variable name suffix is case-insensitive; for example, `_version` can also be written as `_VERSION`.
Component Design Kpi¶
Source: doc/developer-guide/component-design/kpi.md
The kpi.sh script parses the validation output and exports a set of key/value pairs to represent the workload performance.
Output format¶
The following is some example of the KPI data:
# this is a test ## Optional comments
## threads: 4 ## Tunable parameters overwrite
throughput: 123.45 ## Simple key/value
throughput (op/s): 123.45 ## Key, unit (in parentheses) and value
*throughput (images/s): 123.45 ## Primary KPI for regression reporting
throughput: 123.45 # This is a tooltip ## Comment shown as tooltip in UI
Please note that it is crucial that the decimal separator is a point (`.`), not a comma (`,`).
Parsing the output¶
To avoid introducing additional software dependencies, it is recommended to use gawk to parse the validation logs and format the output.
The validation output is assumed to be stored at 1 layer under the current directory. The kpi.sh example is as follows:
where `2>/dev/null` suppresses any error message if `*/output.logs` does not exist, and `|| true` makes `kpi.sh` always return an OK status.
Check The GNU AWK User's Guide for more information on how to write a parsing script using `gawk`.
Component Design Kubernetes Config¶
Source: doc/developer-guide/component-design/kubernetes-config.md
The kubernetes-config.yaml script is a manifest that describes how the workload container(s) should be scheduled (to the machine cluster described by cluster-config.yaml.) This is the standard Kubernetes script.
Templating possibilities¶
You can choose to write kubernetes-config.yaml in any of the following formats:
- kubernetes-config.yaml: For simple workloads, you can directly write the Kubernetes deployment scripts.
- kubernetes-config.yaml.m4: Use the .m4 template to add conditional statements in the Kubernetes deployment scripts.
- kubernetes-config.yaml.j2: Use the .j2 template to add conditional statements in the Kubernetes deployment scripts.
- helm charts: For complex deployment scripts, you can use any helm charts under the helm directory.
Image Name¶
The container image in kubernetes-config.yaml should use the full name in the format of <REGISTRY><image-name><IMAGESUFFIX><RELEASE>, where <REGISTRY> is the docker registry URL (if any), <IMAGESUFFIX> is the platform suffix, and the <RELEASE> is the release version, (or :latest if not defined.)
If you use the .m4 template, the IMAGENAME macro can expand an image name to include the registry and release information:
include(config.m4)
...
spec:
...
spec:
containers:
- name: database
image: IMAGENAME(wordpress5mt-defn(`DATABASE'))
...
If you use the .j2 template or helm charts, you must write the image name as follows:
...
spec:
...
spec:
containers:
- name: database
image: "{{ REGISTRY }}wordpress5mt-{{ DATABASE }}{{ IMAGESUFFIX }}{{ RELEASE }}"
...
About imagePullPolicy¶
To ensure that the validation always runs on the latest code, it is recommended to use imagePullPolicy: Always.
Not all docker images are built equally, however. For images that are less frequently updated and less performance sensitive, it is preferable to use imagePullPolicy: IfNotPresent.
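For example, a hedged container spec sketch (the image names are hypothetical) mixing the two policies:

```yaml
spec:
  containers:
  - name: database
    image: my-registry.example.com/wordpress5mt-database:latest  # hypothetical name
    imagePullPolicy: Always        # frequently rebuilt, validation-critical image
  - name: helper
    image: my-registry.example.com/wordpress5mt-helper:latest    # hypothetical name
    imagePullPolicy: IfNotPresent  # rarely updated, not performance sensitive
```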
About podAntiAffinity¶
To spread the pods onto different nodes, use podAntiAffinity as follows:
...
metadata:
labels:
app: foo
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- foo
topologyKey: "kubernetes.io/hostname"
...
If you use the .m4 template, you can use the PODANTIAFFINITY macro:
If you use the .j2 template or helm charts, there is no convenient function for the above. You have to write the podAntiAffinity terms explicitly.
See Also¶
- Requirements for Internet Hosts - RFC-1123
- Choosing a name for your computer - RFC-1178
- K8s Label - syntax and character set
Component Design Native Script¶
Source: doc/developer-guide/component-design/native-script.md
Terraform scripts can be used to provision any Cloud SUT instances. Usually, Cloud provisioning needs are common among many workloads, so the terraform scripts are shared under [script/terraform/template/terraform][terraform template]. However, if a workload provides custom terraform scripts under workload/<workload>/template/terraform/<CSP>/main, where <workload> is the workload name and <CSP> is the Cloud provider abbreviation, the specified terraform scripts will be used instead to provision the SUT instances.
Custom Terraform scripts¶
Custom terraform scripts should be used for unique provisioning requirements only. Avoid custom terraform scripts for trivial needs, as doing so duplicates common code.
If provided, the custom terraform scripts of a workload must be implemented as follows:
- The scripts must implement the same set of input variables such that they can be used (as a module) by [script/terraform/terraform-config.<CSP>.tf][terraform config csp]. For example, the scripts must take region, zone, and owner as input variables.
- The scripts must implement the same set or a superset of output variables as required by
[script/terraform/terraform-config.<CSP>.tf][terraform config csp]. For example, the scripts must export the
SUT instance public and private IPs.
- The scripts must properly tag the Cloud resources with owner: <name> so that the CSP cleanup script can cleanup the corresponding
Cloud resources.
See [HammerDB TPCC PAAS][HammerDB TPCC PAAS] for an example of customizing terraform scripts.
Ansible Scripts¶
Ansible scripts can be used to customize the workload host setup, workload execution, and cleanup.
flowchart LR;
installation[Installation];
deployment[Deployment];
cleanup[Cleanup];
installation --> deployment --> cleanup;
Please observe the following requirements when writing your custom Ansible scripts:
- If specified, the installation playbook should be at template/ansible/custom/installation.yaml, relative to the workload directory. Additional
roles can be present underneath the custom/roles directory. The installation playbook must install all benchmark software onto the SUTs.
- If specified, the cleanup Ansible playbook should be at template/ansible/custom/cleanup.yaml, relative to the workload directory. The cleanup
script should remove any installed software and restore the system environment.
- If specified, the workload execution playbook should be at template/ansible/custom/deployment.yaml, relative to the workload directory. Additional roles can be present underneath the custom/roles directory. The deployment playbook typically implements the following features:
flowchart TB;
start((Start));
exec[Execution];
stop((End));
start --> exec --> | itr | exec --> stop;
The start stage prepares the workload execution, for example, creating and copying the execution scripts to the SUT. Note that any software installation should be done in installation.yaml, not in deployment.yaml. The execution stage runs the workload multiple times (as specified by the iteration number wl_run_iterations).
Within each iteration, perform the following steps:
- Invoke the timing role to record the workload start timing.
- Run the workload. Save the workload status and logs under {{ wl_logs_dir }}/itr-{{ itr }}/<pod>, where <pod> is an arbitrary directory that identifies the benchmark pod.
- During the workload execution, invoke the trace role such that telemetry can be collected during the workload execution.
- Invoke the timing role to record the workload stop timing.
# deployment.yaml
- hosts: "{{ ('controller' in groups) | ternary('controller','localhost') }}"
gather_facts: no
become: false
tasks:
- name: run workloads over iterations
include_role:
name: deployment
when: (ansible_connection|default('ssh')) in ['ssh','local']
loop: "{{ range(1, wl_run_iterations | default(1) | int + 1, 1) | list }}"
loop_control:
loop_var: itr
# roles/deployment/tasks/main.yaml
- include_role:
name: timing
tasks_from: start-iteration
- name: run the workload
...
- name: invoke traces
include_role:
name: trace
vars:
trace_waitproc_pid: "{{ workload_process_pid }}"
trace_logs_scripts: "..."
- include_role:
name: timing
tasks_from: stop-iteration
- name: collect trace data
include_role:
name: trace
tasks_from: collect
when: wl_trace_modules | default('') | split(',') | reject('==','') | length > 0
ignore_errors: yes
run_once: true
- name: "collect workload logs under {{ wl_logs_dir }}/itr-{{ itr }}/benchmark"
file:
path: "{{ wl_logs_dir }}/itr-{{ itr }}/benchmark"
state: directory
delegate_to: localhost
run_once: true
...
The Ansible playbooks can use the following workload parameters:
| Name | Description |
|---|---|
| wl_name | The workload name. |
| wl_namespace | A unique identifier to identify this workload execution. |
| wl_run_iterations | Specify how many times a workload must execute in sequence. |
| workload_config | The content of the workload-config yaml file. |
List Ingredients¶
Any significant ingredients used in the workload must be declared with a matched pair of variable definitions in defaults/main.yaml or defaults/main/*.yaml, so that we can easily list the ingredients of a workload.
The following suffixes are supported:
- _REPO or _REPOSITORY: Specify the ingredient source repository location.
- _VER or _VERSION: Specify the ingredient version.
- _IMG or _IMAGE: Specify an ingredient docker image.
- _PKG or _PACKAGE: Specify an ingredient OS package, such as deb or rpm.
_VER and the corresponding _REPO/_PACKAGE/_IMAGE must be defined in pairs to properly show up in the Wiki ingredient table. For example, if you define OS_VER, then there should be an OS_IMAGE definition. Avoid complicated jinja templates in the variable definitions. For ingredient parsing purposes, the only supported jinja pattern is {{ variable }}.
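A hedged sketch of such paired definitions in defaults/main.yaml (the ingredient names and versions are illustrative):

```yaml
# defaults/main.yaml (illustrative ingredient declarations)
OS_VER: "24.04"
OS_IMAGE: "ubuntu"

NGINX_VER: "1.24.0"
NGINX_PKG: "nginx"
```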
Ansible Script Examples¶
See [SpecSERT][SpecSERT] for an example of customizing Ansible scripts.
Ansible Resources¶
Ansible Inventory¶
The following ansible inventory groups can be used in the scripts:
| Name | Description |
|---|---|
| workload_hosts | This group contains the VM hosts that are used to run the workloads. If Kubernetes is used, the group refers to VM workers that are within the Kubernetes cluster. |
| cluster_hosts | If Kubernetes is used, this group refers to all VM workers and the Kubernetes controller. |
| off_cluster_hosts | This group refers to VM hosts that are outside the Kubernetes cluster. |
| trace_hosts | This group includes all VM hosts that must collect performance traces. |
| controller | This is the Kubernetes controller group. |
| worker, client, etc | These are the workload VM groups. |
| vsphere_hosts, kvm_hosts, etc | These groups contain the physical hosts where the workload VMs reside. |
Avoid using hosts: all in your custom ansible scripts, as the hosts may refer to different types of hosts, for example, workload VMs or the physical hosts where the VMs reside. Use hosts: cluster_hosts:off_cluster_hosts to refer to all workload VMs.
The following code paths are available to ansible scripts:
| Directory | Description |
|---|---|
| /opt/workspace | Map to the logs directory of the current workload. |
| /opt/workload | Map to the source directory of the current workload. |
| /opt/<backend> | Map to the backend-specific resources. For terraform, this is /opt/terraform, which maps to <project>/script/terraform. |
| /opt/project | Map to the root directory of the WSF code base. |
Avoid using /opt/project to access any backend-specific resources. Use /opt/<backend> instead. This makes it possible to selectively use either the resources within the backend container or the current code in the repository.
Common Ansible Scripts¶
The following ansible roles/tasks can be invoked for common functionalities:
- containerd: This role installs the containerd engine.
- docker: This role installs the docker engine.
- trace: The trace role starts the trace procedure and waits for the workload to complete.
  - trace_waitproc_pid: The pid of the workload process.
  - trace_logs_scripts: The list of commands to show service logs. The logs will be piped to the trace procedure for determining ROI triggers.
  - trace_logs_host: The host that the trace_logs_scripts scripts run on. This is optional.
  - trace_status_file: Specify a status file that contains the workload return status code.
Please note that the trace role should not be executed on multiple hosts. Use run_once: true to restrict the execution to the first host:
- include_role:
name: trace
run_once: true
Once the trace role completes, you can retrieve the trace results by invoking the `collect` task of the `trace` role:
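Mirroring the collect step shown in the deployment example earlier in this section, the invocation looks like:

```yaml
- include_role:
    name: trace
    tasks_from: collect
  run_once: true
  ignore_errors: yes
```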
- **`docker-image`**: The `docker-image` role copies a set of docker images to a remote docker daemon (`to-daemon.yaml`) or a remote registry (`to-registry.yaml`):
- `wl_docker_images`: The dictionary of docker images. The keys are the image names and the values are boolean, true if the images are from a secured docker registry. If unsure, set to `true`.
- **`timing`**: The `timing` role records the start/stop timing of various workload stages with the following tasks to be invoked with:
- **start/stop-setup**: Record the setup timing.
- **start/stop-image-transfer**: Record the image transfer timing.
- **start/stop-iteration**: Record the workload iteration timing.
- **start/stop-roi**: Record the workload trace ROI timing.
[terraform template]: ../../../script/terraform/template/terraform
[terraform config csp]: ../../../script/terraform/terraform-config.aws.tf
[HammerDB TPCC PAAS]: ../../../workload/HammerDB-TPCC-PAAS
[SpecSERT]: ../../../workload/HammerDB-TPCC-PAAS/README.md
Component Design North Traffic¶
Source: doc/developer-guide/component-design/north-traffic.md
Introduction¶
This article describes the technique used to create workloads with North-South traffic, i.e., traffic that enters and leaves a Kubernetes cluster. The implementation is based on the terraform backend.
flowchart TD;
client1[client 1];
client2[client 2];
controller[Kubernetes<br>Controller];
sut1[Kubernetes Worker 1];
sut2[Kubernetes Worker 2];
sut3[Kubernetes Worker 3];
client1 & client2 <--> controller;
controller <--> sut1 & sut2 & sut3;
Request Off-Cluster Nodes¶
You can request one or many worker nodes to be off the Kubernetes cluster as follows in cluster-config.yaml.m4:
The off_cluster option indicates that the requested node is not part of the Kubernetes cluster.
Optionally, you can define any variables that you might want to pass to the ansible scripts, under terraform:
Install Software on Off-Cluster Nodes¶
Unlike Kubernetes workers, where the terraform scripts apply the Kubernetes deployment scripts directly, you have to write custom ansible scripts to install any native software on the off-cluster nodes.
You can overwrite the template/ansible/kubernetes/installation.yaml script to install any custom software on the off-cluster nodes:
- import_playbook: installation.yaml.origin
- hosts: worker-1
become: yes
gather_facts: no
tasks:
- name: Install docker
include_role:
name: docker
- hosts: worker-1
gather_facts: no
tasks:
- name: Transfer client image
include_role:
name: image-to-daemon
vars:
images:
- key: "{{ off_cluster_docker_image_name }}"
value: false
wl_docker_images: "{{ images | items2dict }}"
The above playbooks install docker on the off-cluster node and then transfer the custom docker image to the node. Here wl_docker_images is a dictionary of docker images, where the keys are the docker images and the values are either true or false, indicating whether the docker images are from a secured or unsecured docker registry. Use false if you are not sure.
Execute Software on Off-Cluster Nodes¶
You should manage the native software execution on the off-cluster nodes, by inserting ansible snippets into the regular Kubernetes
execution process, managed by template/ansible/kubernetes/deployment.yaml and template/ansible/kubernetes/roles/deployment/tasks/process-traces-and-logs.yaml, where the former controls the entire process and the latter performs trace and log collection during the workload execution.
Overwrite template/ansible/kubernetes/deployment.yaml if you need to do any preparation work, for example, to initialize the Kubernetes deployment script with real cluster IP address, as follows:
- hosts: localhost
gather_facts: no
tasks:
- name: rewrite cluster IP
replace:
path: "{{ wl_logs_dir }}/kubernetes-config.yaml.mod.yaml"
regexp: "127.0.0.1"
replace: "{{ hostvars['controller-0']['private_ip'] }}"
- import_playbook: deployment.yaml.origin
127.0.0.1 is a placeholder IP address.
Overwrite the template/ansible/kubernetes/roles/deployment/tasks/process-traces-and-logs.yaml script to insert off-cluster software execution.
Depending on the workload design, if the off-cluster node is used simply as a client simulator and the workload traces and logs are within the Kubernetes cluster, then you can simply start the off-cluster node and then release the control back to the terraform original code:
# This step should not block as the Kubernetes services may not be up and running yet.
- name: start off-cluster-node
command: "docker run --rm -d {{ off_cluster_docker_image_name }}"
register: container
delegate_to: worker-1
- name: resume Kubernetes routines
include_tasks:
file: process-traces-and-logs.yaml.origin
- name: destroy container
command: "docker rm -f {{ container.stdout }}"
delegate_to: worker-1
If the off-cluster node is where the traces and logs must be collected, you can use the off-cluster-docker.yaml script, which is a common utility that monitors the container until completion, at the same time collecting traces for all the hosts (including the Kubernetes workers.)
- name: start off-cluster-node
command: "docker run --rm -d {{ off_cluster_docker_image_name }}"
register: container
delegate_to: worker-1
- name: monitor the docker execution and process traces and logs
include_tasks:
file: off-cluster-docker.yaml
vars:
off_cluster_host: worker-1
off_cluster_container_id: "{{ container.stdout }}"
- name: destroy container
command: "docker rm -f {{ container.stdout }}"
delegate_to: worker-1
If your off-cluster node execution isn't based on docker, you can model from the off-cluster-docker.yaml script to write your
own traces and logs collection routine.
Component Design Nsys Hlprof¶
Source: doc/developer-guide/component-design/nsys-hlprof.md
Introduction¶
The trace tools nsys and hlprof are workload profiling tools for CUDA and Habana Gaudi accelerators, respectively. This document describes the steps required to integrate nsys and hlprof.
The nsys Trace Tool¶
Restrictions¶
The nsys trace tool is unlike other trace tools, which work on the host system independent of the workload execution. The nsys tool requires that the workload be launched by the nsys launch command. This limitation restricts the tool usage scenarios:
- The nsys tool does not support :0 or :host tracing placements.
- The nsys tool must be used to launch the workload executable. The current implementation limits nsys to containerized workloads only, run under the docker engine.
Create nsys Containers¶
The nsys tool must be installed within the workload containers. Since the nsight-system installation alone occupies about 1.2GB, you might want to create different container images with and without nsys. You can use the condition [[ " $TERRAFORM_OPTIONS $CTESTSH_OPTIONS " = *" --nsys "* ]] to switch between container images.
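A hedged build.sh sketch of that switch (the Dockerfile.nsys name is hypothetical, and the option values are hard-coded here for illustration):

```shell
# Pick the nsys-enabled Dockerfile only when " --nsys " appears in the
# combined option strings (values hard-coded for illustration).
TERRAFORM_OPTIONS=""
CTESTSH_OPTIONS="--nsys"
if [[ " $TERRAFORM_OPTIONS $CTESTSH_OPTIONS " = *" --nsys "* ]]; then
    DOCKERFILE="Dockerfile.nsys"   # hypothetical nsys-enabled variant
else
    DOCKERFILE="Dockerfile"
fi
echo "building with $DOCKERFILE"
```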
ARG OS_VER=24.04
ARG OS_IMAGE=ubuntu
FROM ${OS_IMAGE}:${OS_VER}
RUN apt-get update -y && apt-get install -y --no-install-recommends gnupg curl && \
apt-get clean -y && rm -rf /var/lib/apt/lists/*
ARG NVIDIA_DEVTOOLS_VER=3bf863cc
ARG NVIDIA_DEVTOOLS_REPO=http://developer.download.nvidia.com/compute/cuda/repos
RUN curl --netrc-optional --retry 10 --retry-connrefused -fsSL -o /tmp/${NVIDIA_DEVTOOLS_VER}.pub ${NVIDIA_DEVTOOLS_REPO}/$(. /etc/os-release;echo $ID$VERSION_ID | tr -d .)/$(uname -m)/${NVIDIA_DEVTOOLS_VER}.pub && \
gpg --yes --dearmor -o /usr/share/keyrings/nvidia-devtools.gpg /tmp/${NVIDIA_DEVTOOLS_VER}.pub && \
echo "deb [signed-by=/usr/share/keyrings/nvidia-devtools.gpg] ${NVIDIA_DEVTOOLS_REPO}/$(. /etc/os-release;echo $ID$VERSION_ID | tr -d .)/$(uname -m) /" > /etc/apt/sources.list.d/nvidia-devtools.list && \
apt-get update -y && apt-get install -y --no-install-recommends nsight-systems && \
apt-get clean -y && rm -rf /var/lib/apt/lists/*
ENV PATH=/usr/lib/nsight-systems/host-linux-x64:$PATH
...
RUN mkfifo /export-logs
CMD (nsys launch /run_test.sh; echo $? > status) 2>&1 | tee output.logs && \
echo "status output.logs" > /export-logs && \
sleep infinity
Note that the workload executable is launched via nsys launch.
The following points must be followed:
- The executable nsys must be on the PATH.
- The executable QdstrmImporter must be on the PATH.
If successful, your logs directory should contain the trace data:
$ ls
nsys-c0r1.logs nsys-c0r1.nsys-rep.logs
nsys-c0r1.nsys-rep nsys-c0r1.nsys-rep_nvtx_sum.csv
nsys-c0r1.nsys-rep_cuda_api_sum.csv nsys-c0r1.nsys-rep_openacc_sum.csv
nsys-c0r1.nsys-rep_cuda_api_sync.csv nsys-c0r1.nsys-rep_opengl_khr_gpu_range_sum.csv
nsys-c0r1.nsys-rep_cuda_gpu_kern_sum.csv nsys-c0r1.nsys-rep_opengl_khr_range_sum.csv
nsys-c0r1.nsys-rep_cuda_gpu_mem_size_sum.csv nsys-c0r1.nsys-rep_openmp_sum.csv
nsys-c0r1.nsys-rep_cuda_gpu_mem_time_sum.csv nsys-c0r1.nsys-rep_osrt_sum.csv
nsys-c0r1.nsys-rep_cuda_memcpy_async.csv nsys-c0r1.nsys-rep_um_cpu_page_faults_sum.csv
nsys-c0r1.nsys-rep_cuda_memcpy_sync.csv nsys-c0r1.nsys-rep_um_sum.csv
nsys-c0r1.nsys-rep_cuda_memset_sync.csv nsys-c0r1.nsys-rep_um_total_sum.csv
nsys-c0r1.nsys-rep_dx11_pix_sum.csv nsys-c0r1.nsys-rep_vulkan_gpu_marker_sum.csv
nsys-c0r1.nsys-rep_dx12_gpu_marker_sum.csv nsys-c0r1.nsys-rep_vulkan_marker_sum.csv
nsys-c0r1.nsys-rep_dx12_mem_ops.csv nsys-c0r1.nsys-rep_wddm_queue_sum.csv
nsys-c0r1.nsys-rep_dx12_pix_sum.csv nsys-collect.logs
nsys-c0r1.nsys-rep_gpu_gaps.csv TRACE_START
nsys-c0r1.nsys-rep_gpu_time_util.csv TRACE_STOP
The hlprof Trace Tool¶
Restrictions¶
The hlprof trace tool is unlike other trace tools, which work on the host system independent of the workload execution. The hlprof tool requires that the workload be launched with the environment variable HABANA_PROFILE=1. This limitation restricts the tool usage scenarios:
- The hlprof tool does not support :0 or :host tracing placements.
- The hlprof tool must be used to launch the workload executable. The current implementation limits hlprof to containerized workloads only, run under the docker engine.
- The hlprof tool cannot precisely stop a trace collection based on a stop phrase, unlike other trace tools. The stopping mechanism is controlled by the hlprof_options variable, which by default is defined as -g 1-2 -b 250, or capturing traces up to 2 enqueue invocations.
Create hlprof Containers¶
The hl-prof-config tool must be installed within the workload containers. This is usually preinstalled in the Habana Gaudi base-image containers.
ARG HABANA_VER="1.16.0-526"
ARG HABANA_IMG=vault.habana.ai/gaudi-docker/1.16.0/ubuntu22.04/habanalabs/pytorch-installer-2.2.2
FROM ${HABANA_IMG}:${HABANA_VER}
...
RUN mkfifo /export-logs
CMD (HABANA_PROFILE=1 /run_test.sh; echo $? > status) 2>&1 | tee output.logs && \
echo "status output.logs" > /export-logs && \
sleep infinity
HABANA_PROFILE=1 enables the Habana Gaudi trace system.
If successful, your logs directory should contain the trace data:
$ ls -1 -R workload/LLMs-PyTorch-OOB/logs-mygaudi_llms_pytorch_gaudi_inference_throughput_bfloat16_pkm/worker-0-1-hlprof/
workload/LLMs-PyTorch-OOB/logs-mygaudi_llms_pytorch_gaudi_inference_throughput_bfloat16_pkm/worker-0-1-hlprof/:
hlprof-c0r1
hlprof-c0r1.logs
hlprof-collect.logs
TRACE_START
workload/LLMs-PyTorch-OOB/logs-mygaudi_llms_pytorch_gaudi_inference_throughput_bfloat16_pkm/worker-0-1-hlprof/hlprof-c0r1:
hlprof-c0r1_84.hltv
Component Design Persistent Volumes¶
Source: doc/developer-guide/component-design/persistent-volumes.md
Introduction¶
The WSF supports OpenEBS or local-static-provisioner as optional Kubernetes plugins for local persistent volumes.
OpenEBS¶
Request OpenEBS support¶
Request to install the OpenEBS operator as follows in cluster-config.yaml.m4:
This requests that the OpenEBS operator be installed in the Kubernetes cluster. The default storage class is local-hostpath, which uses the storage path /mnt/disk1. You can define additional storage classes in your workload.
Use Persistent Volume¶
In your workload Kubernetes deployment script (or in helm charts), declare PersistentVolumeClaim and VolumeMounts as follows:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: local-hostpath-pvc
spec:
storageClassName: local-hostpath
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5G
---
apiVersion: batch/v1
kind: Job
metadata:
name: dummy-benchmark
spec:
template:
spec:
containers:
- name: dummy-benchmark
image: IMAGENAME(Dockerfile)
imagePullPolicy: IMAGEPOLICY(Always)
env:
- name: `SCALE'
value: "SCALE"
- name: `RETURN_VALUE'
value: "RETURN_VALUE"
- name: `SLEEP_TIME'
value: "SLEEP_TIME"
volumeMounts:
- mountPath: /mnt/disk1
name: local-storage
volumes:
- name: local-storage
persistentVolumeClaim:
claimName: local-hostpath-pvc
restartPolicy: Never
Local-Static-Provisioner¶
Request Local-Static-Provisioner Support¶
Request to install the local-static-provisioner plugin as follows in cluster-config.yaml.m4:
cluster:
- labels:
HAS-SETUP-DISK-SPEC-1: required
terraform:
k8s_plugins:
- local-storage-provisioner
This requests that the local-static-provisioner plugin be installed in the Kubernetes cluster. The default storage class is local-static-storage, which uses the storage path /mnt/disk1. You can define additional storage classes in your workload.
Use Persistent Volume¶
In your workload Kubernetes deployment script (or in helm charts), declare PersistentVolumeClaim and VolumeMounts as follows:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: local-claim
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
storageClassName: local-static-storage
---
apiVersion: batch/v1
kind: Job
metadata:
name: dummy-benchmark
spec:
template:
spec:
containers:
- name: dummy-benchmark
image: IMAGENAME(Dockerfile)
imagePullPolicy: IMAGEPOLICY(Always)
env:
- name: `SCALE'
value: "SCALE"
- name: `RETURN_VALUE'
value: "RETURN_VALUE"
- name: `SLEEP_TIME'
value: "SLEEP_TIME"
volumeMounts:
- mountPath: /mnt/disk1
name: local-storage
volumes:
- name: local-storage
persistentVolumeClaim:
claimName: local-claim
restartPolicy: Never
Component Design Readme¶
Source: doc/developer-guide/component-design/readme.md
Each workload should have a README with the required instructions.
Sections¶
The workload README should have the following sections:
- Introduction: Introduce the workload and any background information.
- Test Case: Describe the test cases.
- Configuration: Describe the workload configuration parameters.
- Execution: Show some examples of how to run the workload.
- KPI: Describe the KPI definitions and the meanings of the values.
- Performance BKM: Describe system setup and any performance tuning tips.
- Index Info: List the workload indexing information.
- Validation Notes: This section is auto-inserted by the validation team. New workloads should remove this section.
- See Also: Add any workload-related references.
See the dummy workload README for reference.
Performance BKM¶
It is recommended to include (but not be limited to) the following information in the Performance BKM section:
- The minimum system setup (and the corresponding test case).
- The recommended system setup (and the corresponding test case).
- Workload parameter tuning guidelines.
- Links to any performance report(s).
Component Design Secrets¶
Source: doc/developer-guide/component-design/secrets.md
Introduction¶
This document describes how to handle secrets (such as an access token) in the workload development.
Configure Secrets¶
Store user secrets under $PROJECTDIR/script/csp/.<domain>/config.json (with mode 600), where $PROJECTDIR is the root of the repository and .<domain>/config.json is a domain-specific configuration file. The JSON format is preferred, but it can be any convenient format.
Read Secrets¶
The workload validate.sh can read the workload secrets into environment variables. Special care must be taken not to expose the secret values:
- Declare the secret variable in WORKLOAD_PARAMS with a leading -. This will ensure that the secret values won't be accidentally shown on the screen, in any of the visible configuration files, or be uploaded to the WSF dashboard in subsequent operations.
- The WSF assumes a limited set of host-level utilities that can be used in bash scripts. jq (a popular utility to access json constructs) is not one of them. You can instead use sed to parse the json configuration file. While parsing the secret values, pay attention not to expose the values directly on the command line.
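A hedged sketch (the config.json layout and the token key name are assumptions) of pulling a secret into a variable with sed, without echoing the value:

```shell
# Create an example config for illustration only; real secrets live in
# $PROJECTDIR/script/csp/.<domain>/config.json with mode 600.
CFG="$(mktemp)"
printf '{"token": "s3cr3t"}\n' > "$CFG"
chmod 600 "$CFG"
# Extract the value of "token"; it goes straight into a variable and is
# never printed or passed as a command-line argument to another process.
TOKEN="$(sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$CFG")"
```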
Use Secrets in Docker¶
To use the workload secrets in a docker execution, declare DOCKER_OPTIONS in validate.sh:
or use a dedicated docker-config.yaml:
Do not expose the TOKEN value on the command line. Let docker read from the environment instead.
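For example (a hedged sketch), pass only the variable name in the docker options so docker inherits the value from the environment:

```shell
# "-e TOKEN" with no "=value" lets docker read $TOKEN from the calling
# environment, keeping the secret value off the command line.
DOCKER_OPTIONS="-e TOKEN"
```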
Use Secrets in Docker-Compose¶
To use workload secrets in a docker-compose file, use the environment variables to access the secret values:
Use Secrets in Kubernetes Scripts/Helm Charts¶
Use the workload-config secret (auto-generated) to access the workload secrets in a Kubernetes configuration file or in Helm Charts:
# kubernetes-config.yaml
apiVersion: batch/v1
kind: Job
metadata:
name: my-workload
spec:
template:
spec:
containers:
- name: my-workload
image: ...
env:
- name: TOKEN
valueFrom:
secretKeyRef:
name: workload-config
key: TOKEN
...
Access Secrets in Native Ansible Scripts¶
Use the following code snippets to use the workload secrets as environment variables. Be careful not to show the secret values in the ansible debugging output or on the command line on a SUT.
# deployment.yaml
- name: Use my secret
command: curl --header $TOKEN ...
environment: "{{ workload_secrets }}"
vars:
workload_secrets: "{{ lookup('file', wl_logs_dir + '/.workload-secret.yaml') | from_yaml }}"
Component Design Stack¶
Source: doc/developer-guide/component-design/stack.md
A software stack is the underlying software layers that a workload is constructed upon. The software layers include reusable software libraries, ansible scripts, docker images, and microservices.
Structure¶
A software stack consists of the following elements, some described in this document and others in the linked documents.
- Dockerfiles: A software stack may contain one or many Dockerfiles.
- CMakeLists.txt: A manifest to configure cmake.
- build.sh: A script for building the workload docker image(s).
Optionally, software stacks can define unit tests similar to how workloads work to verify software stack functionalities.
See Also¶
Component Design Template¶
Source: doc/developer-guide/component-design/template.md
There are templating systems, based on M4 macros (*.m4 files) and Jinja2 (*.j2 files), built into the workload build process. You can use them to simplify the workload recipe development by encapsulating any duplicated steps.
Note: This document lacks information about Jinja templating. We are working on improvements.
Usage¶
To use the template system, create one or more .m4/.j2 files under your workload folder, and put any shared .m4/.j2 templates under the template folder of the workload, feature, platform, or top directory. During the build process, those .m4/.j2 files will be expanded to either .tmpm4.xyzt or .tmpj2.xyzt, where xyzt is a random string. The temporary files will be removed after the build.
Example¶
The following sample uses ippmb.m4 to encapsulate the IPP library installation steps:
where ippmb.m4 will be expanded to:
# SPR/Crypto/template/ippmb.m4
ARG IPP_CRYPTO_VERSION="ippcp_2020u3"
ARG IPP_CRYPTO_REPO=https://github.com/intel/ipp-crypto.git
RUN git clone -b ${IPP_CRYPTO_VERSION} --depth 1 ${IPP_CRYPTO_REPO} && \
cd /ipp-crypto/sources/ippcp/crypto_mb && \
cmake . -B"../build" \
-DOPENSSL_INCLUDE_DIR=/usr/local/include/openssl \
-DOPENSSL_LIBRARIES=/usr/local/lib64 \
-DOPENSSL_ROOT_DIR=/usr/local/bin/openssl && \
cd ../build && \
make crypto_mb && \
make install
Pre-defined Variables:¶
- PLATFORM: The platform name that the workload is defined for.
- FEATURE: The hero feature name that the workload is defined under.
- WORKLOAD: The workload name.
- REGISTRY: The private registry.
- RELEASE: The release version.
Component Design Timezone¶
Source: doc/developer-guide/component-design/timezone.md
Introduction¶
It is not a general requirement to align the container time zone with what is on the SUT host. However, if you need to sync the container date and time with an external PDU, it is desirable to align the container time zone with what is on the SUT host.
The time zone information is in the file /etc/localtime and optionally with an environment variable TZ.
Docker Execution¶
For workloads that run with docker, the validation script automatically exposes the TZ environment variable.
The workload should perform the following steps to properly use the TZ value:
- Install the tzdata package.
- Link /etc/localtime: ln -sf /usr/share/zoneinfo/$TZ /etc/localtime.
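The steps above can be sketched in a Dockerfile as follows (the base image and run script are hypothetical; linking at container start lets a runtime TZ value take effect):

```dockerfile
FROM ubuntu:24.04
# Step 1: install the tzdata package.
RUN apt-get update && apt-get install -y --no-install-recommends tzdata && \
    apt-get clean -y && rm -rf /var/lib/apt/lists/*
# Step 2: link /etc/localtime from the TZ value at container start,
# then hand off to the (hypothetical) workload entry script.
CMD ln -sf "/usr/share/zoneinfo/$TZ" /etc/localtime && exec /run_test.sh
```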
Most of the time, however, you can bypass the above steps by just mounting /etc/localtime from the host, i.e., specify -v /etc/localtime:/etc/localtime:ro in DOCKER_OPTIONS.
Docker Compose¶
The TZ environment variable is exposed to the docker-compose file. You should use it in your docker-compose file:
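A hedged docker-compose fragment (the service and image names are hypothetical):

```yaml
services:
  benchmark:                # hypothetical service name
    image: my-image:latest  # hypothetical image
    environment:
      - TZ=${TZ}            # propagate the host time zone into the container
```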
Kubernetes¶
The validation script automatically exposes a workload-config secret in your namespace. The secret contains:
- TZ: The time zone string.
You can configure it in your Kubernetes/Helm scripts:
apiVersion: v1
kind: Pod
metadata:
name: my-pod
spec:
containers:
- name: my-pod
image: my-pod-image
env:
- name: TZ
valueFrom:
secretKeyRef:
name: workload-config
key: TZ
Component Design Validate¶
Source: doc/developer-guide/component-design/validate.md
The validate.sh script initiates the workload execution.
Example¶
An example of typical validate.sh is shown as follows:
#!/bin/bash -e
# Read test case configuration parameters
...
# Logs Setting
DIR=$(dirname $(readlink -f "$0"))
. "$DIR/../../script/overwrite.sh"
# Workload Setting
WORKLOAD_PARAMS=(CONFIG1 CONFIG2 CONFIG3)
# Docker Setting
DOCKER_IMAGE="Dockerfile"
DOCKER_OPTIONS=""
# Kubernetes Setting
RECONFIG_OPTIONS="-DCONFIG=$CONFIG"
JOB_FILTER="job-name=benchmark"
. "$DIR/../../script/validate.sh"
where . "$DIR/../../script/overwrite.sh" is a script to support workload parameter overwrite via ctest.sh command line, and . "$DIR/../../script/validate.sh" is a script for workload execution. The validate.sh saves any validation results to the current directory.
Reserved variables¶
The following script variables are reserved. Avoid overwriting their values in validate.sh:
- PLATFORM
- WORKLOAD
- TESTCASE
- DESCRIPTION
- REGISTRY
- RELEASE
- IMAGEARCH
- IMAGESUFFIX
- TIMEOUT
- SCRIPT
Optional parameters¶
Optionally, after `. "$DIR/../../script/overwrite.sh"`, you can invoke `. "$DIR/../../script/sut-info.sh"`, which queries the Cloud CLI for SUT information. The SUT information is saved as shell variables, for example:

```shell
SUTINFO_CSP=gcp
SUTINFO_WORKER_VCPUS=6
SUTINFO_WORKER_MEMORY=4096
SUTINFO_CLIENT_VCPUS=2
SUTINFO_CLIENT_MEMORY=2048
SUTINFO_CONTROLLER_VCPUS=2
SUTINFO_CONTROLLER_MEMORY=2048
```
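These variables can then drive conditional tuning in `validate.sh`; a hedged sketch (the `THREADS` parameter and the threshold are illustrative, not from the source):

```shell
# Illustrative only: size a hypothetical THREADS parameter from SUT info.
SUTINFO_WORKER_VCPUS=6        # set by sut-info.sh in a real run
if [ "${SUTINFO_WORKER_VCPUS:-0}" -ge 4 ]; then
    THREADS="$SUTINFO_WORKER_VCPUS"
else
    THREADS=2                 # conservative fallback for small SUTs
fi
echo "THREADS=$THREADS"
```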
Validation Parameters¶
`WORKLOAD_PARAMS`: Specify the workload configuration parameters as an array variable of workload variables. The configuration parameters are shown as software configuration metadata in the WSF dashboard. Workload configuration parameters can also be accessed in Ansible using `wl_tunables.<VARIABLE_NAME>`, e.g., `wl_tunables.SCALE`.
- A designated prefix on a workload parameter specifies that the parameter is a secret. The script ensures that the value is not exposed in any console printout.
- You can append a workload parameter description after `#` to print help messages to the user. Use backslash escapes or the `cat` workaround for multi-line descriptions.
```shell
WORKLOAD_PARAMS=(
    "SCALE#This parameter specifies the number of PI digits."
    "RETURN_VALUE#$(cat <<EOF
You can emulate the workload exit code by explicitly
specifying the return exit code.
EOF
)"
)
```
- `WORKLOAD_TAGS`: Specify any workload-related tags as a space-separated string.
- `DOCKER_IMAGE`: If the workload is a single-container workload and supports docker run, specify either the docker image name or the `Dockerfile` used to compile the docker image. If the workload does not support docker run, leave the variable value empty.
- `DOCKER_OPTIONS`: Specify any docker run options, if the workload supports docker run.
- `J2_OPTIONS`: Specify any configuration parameters when expanding the Jinja2 `.j2` templates.
- `RECONFIG_OPTIONS`: Specify any configuration parameters when expanding any Kubernetes deployment script as a `.m4` template.
- `HELM_OPTIONS`: Specify any helm chart build options. This applies to any Kubernetes workloads with deployment scripts written as helm charts.
- `JOB_FILTER`: Specify which job/deployment is used to monitor the validation progress and, after validation completion, retrieve the validation logs. You can specify multiple job/deployment filters, using `,` as a separator. The first filter is for the benchmark pods, and the rest are for service pods. For jobs with multiple containers, you can specify the container name as a qualifier, for example, `job-name=dummy-benchmark:dummy-benchmark`.
- `SCRIPT_ARGS`: Specify the script arguments for `kpi.sh` or `setup.sh`.
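For instance, a Kubernetes workload with one benchmark job and one service deployment might combine these settings as follows (all names and values are hypothetical):

```shell
# Hypothetical settings; names and values are illustrative, not from the source.
RECONFIG_OPTIONS="-DSCALE=${SCALE:-1000}"
HELM_OPTIONS="--set image.tag=latest"
# First filter: benchmark pods (with a container qualifier); second: service pods.
JOB_FILTER="job-name=dummy-benchmark:dummy-benchmark,app=dummy-server"
echo "$JOB_FILTER"
```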
Event Tracing Parameters¶
`EVENT_TRACE_PARAMS`: Specify the event tracing parameters:
- `roi`: Specify ROI-based trace parameters: `roi,<start-phrase>,<end-phrase>[,roi,<start-phrase>,<end-phrase> ...]`. For example, the trace parameters can be `roi,begin region of interest,end region of interest`. The workload must be instrumented to print these phrases in the console output.
  For more sophisticated multi-line, context-based ROIs, if the start-phrase or end-phrase string starts and ends with `/`, the string is treated as a regular expression. Use `~` to represent any newline character. For example, `/~iteration 10.*start workload/` triggers the start of the ROI after the 10th iteration. An additional delay can be appended to the start/stop string, as in `START_BENCHMARK+5s`, which specifies that the ROI starts 5 seconds after identifying the starting phrase `START_BENCHMARK`.
- `time`: Specify time-based trace parameters: `time,<start-time>,<trace-duration>[,time,<start-time>,<trace-duration>]`. For example, if the trace parameters are `time,30,10`, the trace collection starts 30 seconds after the workload containers become ready and the collection duration is 10 seconds.

For short-ROI workloads (less than a few seconds), it is recommended that you specify an empty `EVENT_TRACE_PARAMS` value, meaning that the trace ROI is the entirety of the workload execution, which ensures that the trace collection catches the short duration of the workload execution. Between `roi` and `time`, use `roi` if possible and use `time` as a last resort if the workload does not output anything meaningful to indicate an ROI. Note that none of the event tracing mechanisms is timing-accurate. Define the event trace parameter values with a high timing tolerance, at least in seconds.
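The two forms might look as follows in `validate.sh` (the phrases are illustrative; they must match what the workload actually prints):

```shell
# ROI-based tracing: the workload must print these exact phrases.
EVENT_TRACE_PARAMS="roi,begin region of interest,end region of interest"
# Time-based alternative: start 30s after the containers become ready, trace 10s.
#EVENT_TRACE_PARAMS="time,30,10"
echo "$EVENT_TRACE_PARAMS"
```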
PRESWA Parameters¶
The Pre-Si analysis pipeline additionally requires identifying the Process of Interest (POI) of a workload. For example, in a client-server workload, the POI is the service process. The POI is specified as a regular expression that matches the workload process. If the workload uses Kubernetes orchestration, the workload must also specify a pod filter to uniquely identify the pod.
`PRESWA_POI_PARAMS`: Specify the Pre-Si POI parameters as follows: `process-name-filter [pod-label-filter]`, where the process-name-filter is a regular expression string to filter the process names, and the pod-label-filter (optional for docker) is the Kubernetes label filter to uniquely identify the pod. For example: `mongo app=server`.
With docker, the process info can be obtained through `/sys/fs/cgroup/systemd/docker/<container-id>/cgroup.procs`. With Kubernetes and the containerd runtime, the process info can be obtained through `/sys/fs/cgroup/systemd/system.slice/containerd.service/kubepods-besteffort-pod<pod-uid>.slice:cri-containerd:<container-uid>/cgroup.procs`, where `<pod-uid>` and `<container-uid>` can be obtained from `kubectl get pod -A -o json`.
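A hedged sketch of the docker case, assuming cgroup v1 with the systemd driver as described above (the container id is a placeholder):

```shell
# Assumes cgroup v1 with the systemd driver; container_id is a placeholder.
container_id="0123abcd"
procs="/sys/fs/cgroup/systemd/docker/${container_id}/cgroup.procs"
# Resolve the process name of every PID in the container, if the file exists.
if [ -r "$procs" ]; then
    while read -r pid; do
        cat "/proc/${pid}/comm"
    done < "$procs"
fi
```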
Component Design Workload¶
Source: doc/developer-guide/component-design/workload.md
The Workload Service Framework (WSF) supports the following types of workloads:
- Native workloads: The workload runs directly on the SUT (System Under Test) hosts. The workload logic is implemented by Ansible scripts.
- Containerized workloads: The workload runs under either docker or Kubernetes. The workload logic is implemented by a set of Dockerfiles and docker/Kubernetes configuration files.
Native Workloads¶
A native workload consists of the following elements:
- CMakeLists.txt: A manifest to configure how to build and test the workload.
- build.sh: A script for building the workload. Strictly speaking, native workloads do not need a separate build process; they build on the SUTs if required. This is just a placeholder script for scanning and listing workload ingredients.
- validate.sh: A script to define how to execute the workload.
- kpi.sh: A script for extracting KPI data out of the workload execution logs.
- cluster-config.yaml.m4: A manifest to describe how to provision the SUTs.
- Native Scripts: The native scripts that implement the workload logic, including Ansible scripts (workload execution logic) and optional Terraform scripts (SUT provisioning logic).
- README: A README to introduce the workload, configure parameters, and provide other related information.
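Since build.sh is only a placeholder for native workloads, it can simply delegate to the shared script, consistent with the ready-made `build.sh` template; a hedged sketch (the existence guard is added here for illustration):

```shell
#!/bin/bash -e
# Placeholder build.sh for a native workload: nothing is compiled locally;
# delegating to the shared build script lets the framework scan and list
# workload ingredients.
DIR="$(dirname "$(readlink -f "$0")")"
# Guard added for illustration; a real workload sources the script directly.
if [ -e "$DIR/../../script/build.sh" ]; then
    . "$DIR/../../script/build.sh"
fi
```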
Containerized Workloads¶
A containerized workload can run under docker (single-container) or Kubernetes (single- or multiple-containers). The workload consists of the following elements:
- CMakeLists.txt: A manifest to configure how to build and test the workload.
- build.sh: A script for building the workload docker image(s).
- validate.sh: A script for executing the workload.
- kpi.sh: A script for extracting KPI data out of the workload execution logs.
- compose-config.yaml.m4/j2: An optional manifest to describe how to schedule the containers with docker-compose.
- cluster-config.yaml.m4/j2: A manifest to describe how to provision a machine or a set of machines for running the workload.
- Dockerfiles: A workload may contain one or multiple Dockerfiles.
- kubernetes-config.yaml.m4/j2 or helm charts: An optional manifest to describe how to schedule the containers to a Kubernetes cluster.
- Native Scripts: Optionally, the workload may provide native scripts for customizing the workload execution logic (Ansible scripts) or the SUT provisioning logic (Terraform scripts).
- README: A README to introduce the workload, configure parameters, and provide other related information.
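Putting the elements together, a containerized workload directory might look like the following (the file set is illustrative; whether `.m4` or `.j2` templates are used varies per workload):

```
my-workload/
├── CMakeLists.txt
├── build.sh
├── validate.sh
├── kpi.sh
├── Dockerfile
├── cluster-config.yaml.m4
├── compose-config.yaml.m4
├── kubernetes-config.yaml.m4
└── README.md
```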