NGINX AIO Thread with Intel® DLB on 4th Gen Intel® Xeon® Scalable Processors
Introduction
As one of the most popular CDN frameworks, NGINX is widely used as a CDN cache server. This document is for users who are already familiar with NGINX and Intel® DLB (Dynamic Load Balancer) technology. To improve performance, NGINX provides the AIO thread module, which offloads blocking file operations from the worker processes to a thread pool. In this workload, an Intel® DLB hardware queue replaces the software queue in the AIO thread module to further improve performance.
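For orientation, stock NGINX enables the AIO thread module through the `thread_pool` and `aio threads` directives. The sketch below writes a minimal illustrative fragment to a file. The pool name `cdn_pool`, the thread count, the queue size, the listen port, and the output file name are all assumptions rather than values from this workload, and the DLB-enabled build may need additional settings that are not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical output path; not part of the workload package.
CONF_FRAGMENT="${CONF_FRAGMENT:-./aio_threads_example.conf}"

cat > "$CONF_FRAGMENT" <<'EOF'
# Main context: define a named pool of worker threads (values are illustrative).
thread_pool cdn_pool threads=32 max_queue=65536;

events {
    worker_connections 1024;
}

http {
    server {
        listen 8080;
        location / {
            # Hand blocking file reads to the thread pool instead of the worker.
            aio threads=cdn_pool;
            sendfile on;
        }
    }
}
EOF
echo "wrote $CONF_FRAGMENT"
```

The `max_queue` value is the length of the software task queue that waits for a free thread, which is the queue this workload replaces with a hardware queue.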

Test Setup
The workload targets the 4th Gen Intel® Xeon® Scalable processor family. The tables below give the reference system and software configuration used for the workload evaluation. Adjust individual settings to match your own system.

Reference System Configuration
| Name | Version |
|---|---|
| BIOS Version | EGSDCRB1.86B.0091.D05.2210161326 |
| OS | CentOS Stream 8 |
| Kernel | 5.15.0-spr.bkc.pc.12.7.15.x86_64 |
| Microcode | 0xab000110 |
| IRQ Balance | Disabled |
| QDF/Stepping | Q23U |
| Base Frequency | 1.9GHz |
| Maximum Frequency | 3.5GHz |
| All-core Maximum Frequency | 2.9GHz |
| CPU(s) | 120 |
| Thread(s) per Core | 2 |
| Core(s) per Socket | 60 |
| Socket(s) | 2 (both sockets are populated; tests run on 1 socket) |
| NUMA Node(s) | 1 |
| Prefetchers | L2 HW, L2 Adj., DCU HW, DCU IP |
| Turbo | Enabled |
| PPIN(s) | 2921684c17583b38 |
| Power & Perf Policy | Performance |
| TDP | 350 watts |
| Frequency Driver | intel_pstate |
| Frequency Governor | Performance |
| Frequency (MHz) | 2880 |
| Max C-State | 9 |
| Installed Memory | 256GB (8x32GB 4800 MT/s [4400 MT/s]) on each socket |
| Huge Pages Size | 2048 kB |
| Transparent Huge Pages | Always |
| Automatic NUMA Balancing | Disabled |
| NIC Summary | Ethernet Controller I225-LM, Ethernet Controller E810-C for QSFP, Ethernet Controller E810-C for QSFP |
| Drive Summary | INTEL SSDPF2KX038TZ |
Reference Software Configuration
| Name | Version |
|---|---|
| NGINX version | 1.16.1 |
| Compiler | gcc version 8.5.0 20210514 (Red Hat 8.5.0-15) (GCC) |
| wrk (client side) | wrk 4.0.0 [epoll] Copyright (C) 2012 Will Glozer |
| DLB Driver | RELEASE_VER_7.8.0 |
| NIC FW ICE driver version | NETWORK_DRIVERS_V27.6.1 |
| Kernel Parameters | intel_iommu=on,sm_on iommu=on dma_debug=off |
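
Before running the workload, it can help to confirm that the kernel parameters from the table are actually in effect. The following is a minimal sketch, assuming the parameters are passed as space-separated tokens on the kernel command line; the function name `check_cmdline` is hypothetical and not part of the workload package.

```shell
#!/usr/bin/env bash
# Report whether each expected parameter appears in a kernel command line.
# Usage: check_cmdline [cmdline-file]   (defaults to /proc/cmdline)
check_cmdline() {
    local file="${1:-/proc/cmdline}"
    local missing=0 param cmdline
    # Pad with spaces so that only whole, space-separated tokens match.
    cmdline=" $(tr '\n' ' ' < "$file") "
    for param in "intel_iommu=on,sm_on" "iommu=on" "dma_debug=off"; do
        if [[ "$cmdline" == *" $param "* ]]; then
            echo "ok: $param"
        else
            echo "missing: $param"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_cmdline` with no argument inspects the running kernel; the function returns non-zero if any parameter is absent.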
Test setup
Set the IRQ affinity and CPU performance mode on the origin server, cache server and client server. Run the scripts in the configs folder:
pkill -9 irqbalance
./set_performance_governor.sh
./set_irq_affinity_cpulist.sh 0-119 ens1f1 #change the cpu cores and network interface based on your environment.
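
The contents of `set_performance_governor.sh` are not reproduced in this document. As a rough illustration of what such a script typically does, the sketch below writes `performance` to each CPU's `scaling_governor` file under the standard cpufreq sysfs path. The function body and its optional root argument are assumptions and may differ from the script shipped in the configs folder.

```shell
#!/usr/bin/env bash
# Write the "performance" governor to every CPU's cpufreq policy.
# The optional argument overrides the sysfs root (useful for dry runs).
set_performance_governor() {
    local root="${1:-/sys/devices/system/cpu}"
    local gov count=0
    for gov in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$gov" ] || continue   # glob did not match, or no cpufreq support
        echo performance > "$gov" && count=$((count + 1))
    done
    echo "set performance governor on $count CPU(s)"
}
```

On a real system the function must run as root, because the sysfs files are writable only by root.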
Content Server Setup
- Download the NGINX 1.16.1 source code.
- Install the dependency libraries (Ubuntu is taken as the example; on other operating systems, install the corresponding tools and libraries).
- Build the NGINX with the command below
## set your prefix path
mkdir -p $PREFIX/var/www/tmp
./configure \
--prefix=$PREFIX \
--sbin-path=$PREFIX/sbin/nginx \
--modules-path=$PREFIX/lib64/nginx/modules \
--conf-path=$PREFIX/etc/nginx/nginx.conf \
--error-log-path=$PREFIX/var/www/log/error.log \
--pid-path=$PREFIX/var/www/nginx.pid \
--lock-path=$PREFIX/var/www/nginx.lock \
--http-log-path=$PREFIX/var/www/log/access.log \
--http-client-body-temp-path=$PREFIX/var/www/tmp/client_body \
--http-proxy-temp-path=$PREFIX/var/www/tmp/proxy \
--http-fastcgi-temp-path=$PREFIX/var/www/tmp/fastcgi \
--http-uwsgi-temp-path=$PREFIX/var/www/tmp/uwsgi \
--http-scgi-temp-path=$PREFIX/var/www/tmp/scgi \
--user=nobody --group=nobody \
--with-select_module --with-poll_module --with-threads \
--with-pcre
make -j`nproc`
make install
- Copy the nginx.conf in the origin folder to replace the $PREFIX/etc/nginx/nginx.conf and change the server IP address in the configuration according to your environment.
- Copy the bin folder in the origin folder to $PREFIX folder.
- Start the content generator and origin server
cd $PREFIX
##start 10K, 100K, 1M object generator
./bin/http_obj_gen_10k.py --host localhost --port 8888 --obj-dist fixed > /dev/null 2> /dev/null
./bin/http_obj_gen_100k.py --host localhost --port 8888 --obj-dist fixed > /dev/null 2> /dev/null
./bin/http_obj_gen.py --host localhost --port 8888 --obj-dist fixed > /dev/null 2> /dev/null
##start origin server
${PWD}/sbin/nginx -c ${PWD}/etc/nginx/nginx.conf
Cache Server Setup
Note: For the non-DLB case, ignore the DLB-related operations in the steps below.
1. Mount the NVMe disks
##modify the nvme disk name according to your environment
mount -o rw,noatime,seclabel,discard /dev/nvme1n1 /cdn/cache1
mount -o rw,noatime,seclabel,discard /dev/nvme2n1 /cdn/cache2
mount -o rw,noatime,seclabel,discard /dev/nvme3n1 /cdn/cache3
mount -o rw,noatime,seclabel,discard /dev/nvme4n1 /cdn/cache4
mount -o rw,noatime,seclabel,discard /dev/nvme5n1 /cdn/cache5
##set the DLB library path
export LIBDLB_LOCAL_PATH=<libdlb path>
##set your prefix path
mkdir -p $PREFIX/var/www/tmp
./configure \
--prefix=$PREFIX \
--sbin-path=$PREFIX/sbin/nginx \
--modules-path=$PREFIX/lib64/nginx/modules \
--conf-path=$PREFIX/etc/nginx/nginx.conf \
--error-log-path=$PREFIX/var/www/log/error.log \
--pid-path=$PREFIX/var/www/nginx.pid \
--lock-path=$PREFIX/var/www/nginx.lock \
--http-log-path=$PREFIX/var/www/log/access.log \
--http-client-body-temp-path=$PREFIX/var/www/tmp/client_body \
--http-proxy-temp-path=$PREFIX/var/www/tmp/proxy \
--http-fastcgi-temp-path=$PREFIX/var/www/tmp/fastcgi \
--http-uwsgi-temp-path=$PREFIX/var/www/tmp/uwsgi \
--http-scgi-temp-path=$PREFIX/var/www/tmp/scgi \
--user=nobody --group=nobody \
--with-select_module \
--with-poll_module \
--with-threads \
--with-file-aio \
--with-pcre
make -j`nproc`
make install
***Note: Before each test iteration, run the page cache cleanup script clr_drop_cache.sh in the cache-server folder.***
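
The contents of `clr_drop_cache.sh` are not shown in this document. A common way to clear the Linux page cache between runs, and a plausible approximation of what the script does, is to flush dirty pages and then write `3` to `/proc/sys/vm/drop_caches`. The function name and its optional target argument below are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Flush dirty pages, then ask the kernel to drop clean caches.
# The optional argument overrides the target file (useful for dry runs).
drop_page_cache() {
    local target="${1:-/proc/sys/vm/drop_caches}"
    sync                   # write dirty pages back so they become droppable
    echo 3 > "$target"     # 3 = page cache plus dentries and inodes
}
```

Writing to the real `drop_caches` file requires root. Clearing the cache before each iteration means every run starts with cold reads from the NVMe disks, so iterations stay comparable.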
Client Server Setup
- Download wrk 4.0.0 and compile it as described in its README.
- Copy the files in the wrk-client folder into your wrk folder.
- Modify the IP address in wrk1mb.sh, wrk100kb.sh and wrk10kb.sh, then run run_mix.sh.
- Finally, the latency, RPS and bandwidth information appears in the log file, for example:
connection delay time is 1628664 ns
Running 5m test @ http://192.168.100.1:8082
50 threads and 30700 connections(ramping 30700 per sec)
Thread Stats Avg Stdev Max +/- Stdev
Latency 3.52s 544.01ms 9.18s 72.66%
Connect 512.39ms 324.55ms 1.12s 57.46%
Delay 558.98ms 214.80ms 2.00s 68.72%
Req/Sec 157.87 80.47 1.15k 70.67%
Latency Distribution
50.00% 3.51s
75.00% 3.84s
90.00% 4.19s
99.00% 4.96s
99.90% 5.70s
99.99% 7.13s
Connect Distribution
50.00% 508.61ms
75.00% 792.06ms
90.00% 965.97ms
99.00% 1.09s
99.90% 1.11s
99.99% 1.11s
Delay Distribution
50.00% 541.15ms
75.00% 692.84ms
90.00% 842.74ms
99.00% 1.13s
99.90% 1.41s
99.99% 1.73s
2324796 requests in 5.00m, 2.23TB read
Socket errors: connect 0, read 0, write 0, pconn 33, nodata 0, timeout 0, connect_timeout 0, delay_timeout 265
Requests/sec: 7746.40
Transfer/sec: 7.60GB
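
When comparing several runs, it is convenient to pull only the headline numbers out of each log. The helper below is a small sketch, assuming the log keeps the `Requests/sec:` and `Transfer/sec:` summary lines in the form shown above; the function name `summarize_wrk_log` is hypothetical.

```shell
#!/usr/bin/env bash
# Print "<requests per second> <transfer per second>" from a wrk log file.
summarize_wrk_log() {
    awk '
        $1 == "Requests/sec:" { rps  = $2 }   # e.g. 7746.40
        $1 == "Transfer/sec:" { xfer = $2 }   # e.g. 7.60GB
        END { print rps, xfer }
    ' "$1"
}
```

For example, `summarize_wrk_log run1.log` prints the two values on one line, which can be appended to a results file for each iteration (`run1.log` is a placeholder name).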
Copyright (c) Intel Corporation.
Neither the name of Intel Corporation nor the names of its suppliers may be used to endorse or promote products derived from this software without specific prior written permission.
No reverse engineering, decompilation, or disassembly of this software is permitted.
DISCLAIMER. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.