CPU/GPU and Parallel Libraries#
Introduction#
Ibex provides a wide range of specialized libraries that enable developers to accelerate their applications, leverage hardware capabilities, and achieve optimal performance.
CPU/GPU Libraries#
HPC clusters often offer a variety of libraries optimized for specific hardware architectures, such as CPUs and GPUs. Some of the common CPU/GPU libraries include:
BLAS (Basic Linear Algebra Subprograms): A library for linear algebra operations like matrix multiplications and vector operations; a minimal call is sketched after this list.
cuBLAS: A GPU-accelerated version of BLAS, designed for NVIDIA GPUs and CUDA programming.
FFTW (Fastest Fourier Transform in the West): A library for computing discrete Fourier transforms efficiently.
cuFFT: The GPU-accelerated counterpart of FFTW, optimized for CUDA programming.
LAPACK (Linear Algebra PACKage): A library for solving linear systems, eigenvalue problems, and singular value decomposition.
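As a minimal sketch of how such a library is used, the following C program multiplies two 2x2 matrices through the standard CBLAS interface; which BLAS implementation provides it (e.g. OpenBLAS or Intel MKL) and the exact module name and link flags depend on what you load on Ibex.
#include <stdio.h>
#include <cblas.h>

int main(void) {
    /* Row-major 2x2 matrices: C = 1.0 * A * B + 0.0 * C */
    double A[4] = {1.0, 2.0, 3.0, 4.0};
    double B[4] = {5.0, 6.0, 7.0, 8.0};
    double C[4] = {0.0, 0.0, 0.0, 0.0};

    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2,        /* M, N, K */
                1.0, A, 2,      /* alpha, A, lda */
                B, 2,           /* B, ldb */
                0.0, C, 2);     /* beta, C, ldc */

    printf("C = [%g %g; %g %g]\n", C[0], C[1], C[2], C[3]);
    return 0;
}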
Parallel Libraries#
Parallel libraries enable developers to efficiently distribute tasks across multiple processing units, such as CPU cores or GPUs, to achieve improved performance. Some widely used parallel libraries include:
OpenMP (Open Multi-Processing): A popular API for shared-memory parallel programming, commonly used to parallelize loops and sections of code; a short example follows this list.
MPI (Message Passing Interface): A standard for distributed-memory parallel programming, enabling communication and synchronization between processes.
CUDA: A parallel computing platform and programming model for NVIDIA GPUs, allowing developers to leverage the power of parallel processing.
OpenACC: A directive-based approach to parallel programming, designed to offload compute-intensive regions of CPU code to accelerators such as GPUs.
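To illustrate the directive-based, shared-memory model, the sketch below parallelizes a reduction loop with OpenMP; it assumes a compiler with OpenMP support and the usual enabling flag (e.g. -fopenmp for gcc).
#include <stdio.h>
#include <omp.h>

int main(void) {
    const int n = 1000000;
    double sum = 0.0;

    /* Split the loop iterations across threads and combine
       the partial sums with a reduction clause. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += 1.0 / (i + 1.0);
    }

    printf("sum = %f using up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}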
Loading Libraries as Modules#
To use these libraries, you can load them as modules using the module system:
module load library-name
For example, to load the CUDA library, use:
module load cuda
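To check which versions of a library are installed before loading one, you can query the module system (the exact listing depends on Ibex's module tree):
module avail cuda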
Parallel Programming with Libraries#
On Ibex, various versions of OpenMPI, MPICH, and CUDA are installed for users to compile with. Loading the modules for these libraries updates the environment so that build tools such as cmake or autoconf can discover them.
If the discovery fails, preset environment variables can be used to point to the installation paths of these libraries. For example, to compile an MPI code:
module load openmpi
module load gcc
mpicc -c my_app.c -I${OPENMPI_HOME}/include
mpicc -o my_app my_app.o -L${OPENMPI_HOME}/lib -lmpi
The above commands compile the source with the OpenMPI headers available to the compiler, then link the resulting object file against the MPI library.
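The contents of my_app.c are not shown in this guide; as a hypothetical example, a minimal MPI program that would build with the commands above could look like this:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}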
Once compiled, the executable my_app can be launched in a SLURM batch script with either mpirun or srun; the latter is recommended.
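A minimal jobscript along those lines might look like the following; the job name, task counts, and time limit are placeholders to adapt to your own allocation:
#!/bin/bash
#SBATCH --job-name=my_app
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:10:00

module load gcc
module load openmpi

# srun starts one MPI rank per allocated task
srun ./my_app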
For source code that requires the CUDA Toolkit to build NVIDIA GPU-enabled binaries, load the cuda module and use the nvcc compiler to build the device code.
It is recommended to allocate a GPU node and compile your source either in an interactive session or in a batch jobscript.
srun --gpus=1 --time=0:10:00 --pty bash
module load cuda
module load gcc
nvcc -c device_code.cu -I${CUDATOOLKIT_HOME}/include
gcc -o my_gpu_app device_code.o -L${CUDATOOLKIT_HOME}/lib -lcudart -lcublas
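As with the MPI example, device_code.cu is hypothetical; a minimal source that compiles and links with the commands above could be a simple vector addition (depending on the host code, linking nvcc-produced objects with gcc may additionally require -lstdc++):
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    float *a, *b, *c;

    /* Managed (unified) memory keeps the example short */
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    /* Launch enough 256-thread blocks to cover n elements */
    add<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);
    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return 0;
}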
When building your CUDA code on a login node with the intention of running it on multiple NVIDIA GPU microarchitectures, the following nvcc flags will help:
module load cuda
nvcc -c device_code.cu -I${CUDATOOLKIT_HOME}/include \
-gencode=arch=compute_80,code=sm_80 \
-gencode=arch=compute_75,code=compute_75 \
-gencode=arch=compute_70,code=compute_70 \
-gencode=arch=compute_61,code=compute_61
gcc -o my_gpu_app device_code.o -L${CUDATOOLKIT_HOME}/lib -lcudart -lcublas