Scalability Performance Analysis and Tuning on OpenFOAM in HPC Cluster Environment
Achieving good scalability for OpenFOAM and other HPC scientific applications across many compute nodes in an HPC cluster requires understanding the workload through profile analysis and comparing different hardware, which helps pinpoint bottlenecks among the components of the cluster.
One of the bottlenecks in the scalability of OpenFOAM, and of HPC applications in general, is the use of MPI collective operations. Collective operations typically force user processes to wait for incoming data to arrive at the compute nodes, consuming CPU clock cycles and limiting the application's ability to scale. As the number of nodes grows, the time the network needs to process these collectives also increases substantially. Reducing the time needed to perform collectives frees the CPU from those operations, leaving more CPU cycles for the application itself and greatly improving scalability.
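To make the cost concrete, the following is a minimal sketch (not taken from OpenFOAM's source) of a conjugate-gradient-style iteration in C, where each step needs a global dot product via a blocking MPI_Allreduce; while the collective runs, every rank simply waits, which is the scaling cost described above. The vector length and iteration count are illustrative only.

```c
#include <mpi.h>
#include <stdio.h>

#define N 100000  /* local vector length (illustrative only) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    static double p[N], q[N];
    for (int i = 0; i < N; i++) { p[i] = 1.0; q[i] = 2.0; }

    for (int iter = 0; iter < 100; iter++) {
        /* Local part of the global dot product */
        double local = 0.0;
        for (int i = 0; i < N; i++)
            local += p[i] * q[i];

        /* Blocking global reduction: every rank stalls here until the
         * collective completes, and the cost grows with the node count. */
        double global = 0.0;
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        /* ... 'global' would drive the solution and residual update ... */
        (void)global;
    }

    MPI_Finalize();
    return 0;
}
```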
In this study, the HPC Advisory Council investigates ways to improve the scalability of OpenFOAM solvers on an HPC cluster by using the collective offload technologies available in the network. We run MPI profile analysis on the OpenFOAM solvers to understand their performance and scaling behavior on high-speed, low-latency InfiniBand networks, and we apply several profiling and analysis methods to locate the bottlenecks and to evaluate how effectively tuning improves application performance.
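The abstract does not name a specific profiling tool, but MPI profile analysis of this kind is commonly built on the PMPI interface. The sketch below, offered only as an assumed illustration, wraps MPI_Allreduce to count calls and accumulate time without modifying the application; compiled into a library linked ahead of (or preloaded over) the MPI library, it reports per-rank totals at MPI_Finalize.

```c
#include <mpi.h>
#include <stdio.h>

static long   allreduce_calls = 0;
static double allreduce_time  = 0.0;

/* Intercept MPI_Allreduce, forward to the real implementation via PMPI,
 * and record how long the application spent inside the collective. */
int MPI_Allreduce(const void *sendbuf, void *recvbuf, int count,
                  MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
{
    double t0 = MPI_Wtime();
    int rc = PMPI_Allreduce(sendbuf, recvbuf, count, datatype, op, comm);
    allreduce_time += MPI_Wtime() - t0;
    allreduce_calls++;
    return rc;
}

/* Print the per-rank summary when the application shuts MPI down. */
int MPI_Finalize(void)
{
    int rank;
    PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("[rank %d] MPI_Allreduce: %ld calls, %.3f s total\n",
           rank, allreduce_calls, allreduce_time);
    return PMPI_Finalize();
}
```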
We also present the optimization techniques and network profiling results, to further understand how some of the OpenFOAM solvers depend on the network and the MPI library, and the options for optimizing particular OpenFOAM solvers using MPI offloads.
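The abstract does not specify which offload mechanism is used, so the following is only a sketch of the general communication pattern that network collective offloads accelerate: a non-blocking MPI-3 collective (MPI_Iallreduce) started early, overlapped with independent local work, and completed only when the result is needed. With hardware that can progress the collective, the CPU cycles spent waiting in the blocking version become available for computation.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local_sum  = (double)(rank + 1);
    double global_sum = 0.0;
    MPI_Request req;

    /* Start the reduction; control returns to the caller immediately. */
    MPI_Iallreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);

    /* Independent local work proceeds while the network (ideally a
     * collective-offload engine) progresses the reduction. */
    double busy = 0.0;
    for (int i = 0; i < 1000000; i++)
        busy += (double)i * 1e-9;

    /* Block only when the reduced value is actually needed. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    if (rank == 0)
        printf("global_sum = %g (busy work = %g)\n", global_sum, busy);

    MPI_Finalize();
    return 0;
}
```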