Distributed PyTorch on Great Lakes

Distributed Data Parallel (DDP) Guide for Great Lakes

A step-by-step guide to setting up multi-node, multi-GPU training with PyTorch's DistributedDataParallel (DDP) framework. The focus is on configuring DDP to run efficiently on Great Lakes, a SLURM cluster, with attention to scalability and resource utilization, plus debugging techniques and GPU monitoring tips for diagnosing and optimizing performance. The guide includes reusable code to streamline distributed training across multiple nodes.
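The core of any such setup is initializing one process per GPU and forming a process group from SLURM's environment variables. Below is a minimal sketch of that initialization; it assumes the job is launched with one SLURM task per GPU (e.g., via srun) and that MASTER_ADDR and MASTER_PORT are exported in the job script. It is illustrative only, not the exact code used on Great Lakes.

# Minimal DDP initialization sketch for a SLURM cluster such as Great Lakes.
# Assumes one SLURM task per GPU and that MASTER_ADDR/MASTER_PORT point to
# the first node of the job (exported in the sbatch script).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_distributed():
    rank = int(os.environ["SLURM_PROCID"])         # global rank of this task
    world_size = int(os.environ["SLURM_NTASKS"])   # total number of tasks
    local_rank = int(os.environ["SLURM_LOCALID"])  # task index on this node

    # Bind this process to a single GPU before initializing NCCL.
    torch.cuda.set_device(local_rank)

    dist.init_process_group(
        backend="nccl",
        init_method="env://",
        rank=rank,
        world_size=world_size,
    )
    return rank, local_rank, world_size

if __name__ == "__main__":
    rank, local_rank, world_size = setup_distributed()

    # Wrap the model so gradients are synchronized across all ranks.
    model = torch.nn.Linear(128, 10).cuda(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])

    # ... build a DataLoader with DistributedSampler and run the training loop ...

    dist.destroy_process_group()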

PyTorch Build History

The following table lists the PyTorch versions that have been built and verified on the Great Lakes HPC cluster and that are used throughout this guide to ensure compatibility and good performance. Each entry records the build time, compiler, CUDA, cuSPARSELt, cuDSS, and cuDNN versions, NCCL, Gloo, and MPI support, and the CXX11 ABI setting. Build instructions, including the specific compiler flags, dependencies, and configuration settings, can be found in the installation section; these details ensure reproducibility in the Great Lakes environment.

PyTorch Version | Build Time | Compiler   | CUDA   | cuSPARSELt | cuDSS | cuDNN | NCCL Support | GLOO Support | MPI Support | CXX11 ABI
2.6.0           | 1.1 h      | GCC/10.3.0 | 12.6.3 | 0.5.2      | 0.4.0 | 9.6.0 | Yes          | Yes          | Yes         | Yes
2.4.0           | 1.2 h      | GCC/10.3.0 | 12.6.3 | 0.5.2      | 0.4.0 | 9.6.0 | Yes          | Yes          | No          | No
2.4.0           | 54 m       | GCC/10.3.0 | 12.3.0 | 0.5.2      | 0.4.0 | 9.6.0 | Yes          | Yes          | No          | No
2.4.0           | 55 m       | GCC/10.3.0 | 11.8.0 | 0.5.2      | 0.4.0 | 9.6.0 | Yes          | Yes          | No          | No
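A quick way to confirm which of these capabilities your loaded environment actually provides is to query PyTorch directly. The check below is a generic sketch using standard PyTorch APIs, not a Great Lakes-specific tool; run it inside the module or virtual environment you intend to use.

# Report the distributed capabilities of the active PyTorch build,
# matching the columns in the table above.
import torch
import torch.distributed as dist

print("PyTorch:", torch.__version__)
print("CUDA:", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
print("NCCL support:", dist.is_nccl_available())
print("GLOO support:", dist.is_gloo_available())
print("MPI support:", dist.is_mpi_available())
print("CXX11 ABI:", torch.compiled_with_cxx11_abi())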

"PyTorch, the PyTorch logo and any related marks are trademarks of The Linux Foundation."