Distributed Deep Learning with TensorFlow 2.x
Note
Page under construction