Configuration

Hardware

| Node Type | Standard | Large Memory | GPU (TitanV) | GPU (V100) | GPU (RTX6000 Pro Blackwell) |
|---|---|---|---|---|---|
| Number of Nodes | 95 | 5 | 3 | 1 | 1 |
| Processors | 2x 2.5 GHz Intel Haswell (Xeon E5-2680v3) | 4x 2.2 GHz Intel Westmere (Xeon E7-4850) | 2x 2.10 GHz Intel Broadwell (Xeon E5-2620v4) | 2x 2.5 GHz Intel Cascade Lake (Xeon Gold 6248) | 2x AMD EPYC (9575F) |
| Cores per Node | 24 | 56 | 16 | 40 | 128 |
| RAM | 128 GB (122.8 GB requestable) | 1.5 TB (1,542 GB requestable) | 125 GB (122.88 GB requestable) | 191 GB (184.3 GB requestable) | 1.5 TB |
| GPU | N/A | N/A | 4x Nvidia TitanV | 3x Nvidia Tesla V100 | 8x RTX6000 Pro Blackwell |
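
When sizing a job, it can help to know how much requestable memory each core gets if a node's memory is split evenly. The following is a minimal sketch of that arithmetic using the figures from the hardware table. The dictionary and function names are illustrative, not part of any Armis2 tool, and the RTX6000 Pro Blackwell node is left out because the table gives no requestable figure for it.

```python
# Cores and requestable RAM (GB) per node, copied from the hardware table.
# The RTX6000 Pro Blackwell node is omitted: no requestable value is listed.
NODE_TYPES = {
    "Standard": {"cores": 24, "requestable_gb": 122.8},
    "Large Memory": {"cores": 56, "requestable_gb": 1542.0},
    "GPU (TitanV)": {"cores": 16, "requestable_gb": 122.88},
    "GPU (V100)": {"cores": 40, "requestable_gb": 184.3},
}


def requestable_gb_per_core(node_type: str) -> float:
    """Return a node's requestable memory divided evenly across its cores."""
    spec = NODE_TYPES[node_type]
    return spec["requestable_gb"] / spec["cores"]


if __name__ == "__main__":
    for name in NODE_TYPES:
        print(f"{name}: {requestable_gb_per_core(name):.2f} GB per core")
```

This is only an even split. A job may request more or less memory per core than this, subject to the limits of the node type.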

GPUs

Armis2 has 1 node with 8 RTX6000 Pro Blackwell GPUs, 1 node with 3 NVIDIA Tesla V100 GPUs, and 3 nodes with 4 NVIDIA TitanV 12 GB GPUs each.

| GPU Model | NVIDIA Tesla V100 | NVIDIA TitanV | NVIDIA RTX6000 Pro Blackwell |
|---|---|---|---|
| GPU Architecture | Volta | Volta | Blackwell |
| Peak double precision floating point perf. (FP64) | 7.066 TFLOPS | 7.450 TFLOPS | N/A |
| Peak single precision floating point perf. (FP32) | 14.13 TFLOPS | 14.90 TFLOPS | 126.0 TFLOPS |
| Memory bandwidth (ECC off) | 897.0 GB/s | 651.3 GB/s | 1597 GB/s |
| Memory size and type | 16 GB HBM2 | 12 GB HBM2 | 96 GB GDDR7 |
| CUDA cores | 5120 | 5120 | 24064 |
| RT cores | N/A | N/A | 188 |
| Tensor cores | 640 | 640 | 752 |
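
The node counts and per-GPU figures above can be combined into cluster-wide totals. The sketch below does that arithmetic. The names are illustrative, and the FP32 total is a theoretical peak obtained by summing the per-GPU figures, not a measured result.

```python
# (model, nodes, GPUs per node, peak FP32 TFLOPS per GPU, memory GB per GPU)
# Values are taken from the node list and the GPU table above.
GPU_NODES = [
    ("Tesla V100", 1, 3, 14.13, 16),
    ("TitanV", 3, 4, 14.90, 12),
    ("RTX6000 Pro Blackwell", 1, 8, 126.0, 96),
]


def gpu_totals(rows):
    """Sum GPU count, GPU memory, and theoretical peak FP32 across all nodes."""
    totals = {"gpus": 0, "memory_gb": 0, "peak_fp32_tflops": 0.0}
    for _model, nodes, per_node, fp32_tflops, mem_gb in rows:
        count = nodes * per_node
        totals["gpus"] += count
        totals["memory_gb"] += count * mem_gb
        totals["peak_fp32_tflops"] += count * fp32_tflops
    return totals


if __name__ == "__main__":
    print(gpu_totals(GPU_NODES))
```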

Networking

The compute nodes are all interconnected with InfiniBand networking. The InfiniBand fabric is based on the Mellanox enhanced data rate (EDR) platform in the Voltaire Grid Director 4700, which provides 100 Gbps of bandwidth and sub-5 μs latency per host. Five Grid Director 4700 switches are connected to each other with 240 Gbps of bandwidth each.
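
To put the 100 Gbps per-host figure in perspective, the sketch below estimates the ideal time to move a given amount of data over a link. It assumes decimal units (1 GB = 8 gigabits) and ignores protocol overhead, contention, and storage speed, so real transfers will be slower. The function name is illustrative.

```python
def ideal_transfer_seconds(size_gb: float, link_gbps: float) -> float:
    """Best-case seconds to move size_gb gigabytes over a link_gbps link."""
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    return size_gb * 8 / link_gbps  # 8 bits per byte


if __name__ == "__main__":
    # 100 GB over the 100 Gbps InfiniBand link described above.
    print(ideal_transfer_seconds(100, 100))
```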

In addition to the InfiniBand networking, there is a gigabit Ethernet network that also connects all of the nodes. This is used for node management and NFS file system access.

To discuss high-speed connections to the Armis2 cluster, please contact [email protected].

Storage

The high-speed home and scratch file systems are provided by Turbo Research Storage. Turbo is a high-capacity, fast, reliable, and secure data storage service that allows investigators across U-M to connect their data to the computing resources necessary for their research, including our Armis2 HPC cluster. Turbo supports storage of sensitive data.

Operation

Computing jobs on Armis2 are managed completely through the Slurm workload manager. See the Armis2 User Guide for directions on how to submit and manage jobs. For advanced information on how to use Slurm on Armis2, see the Slurm User Guide for Armis2.
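
As a rough illustration of what a Slurm batch script looks like, the sketch below assembles one from common `#SBATCH` options. The partition name `standard`, the account name `example_account`, and the helper function itself are placeholders, not Armis2 settings. The user guides above are the authority for the values to use on Armis2.

```python
def build_sbatch_script(
    job_name: str,
    command: str,
    *,
    nodes: int = 1,
    ntasks_per_node: int = 1,
    mem: str = "4G",
    time_limit: str = "00:10:00",
    partition: str = "standard",        # placeholder partition name
    account: str = "example_account",   # placeholder account name
    gpus: int = 0,
) -> str:
    """Return the text of a Slurm batch script built from common options."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --account={account}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={ntasks_per_node}",
        f"#SBATCH --mem={mem}",
        f"#SBATCH --time={time_limit}",
    ]
    if gpus > 0:
        # Request GPUs per node as a generic resource.
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
    lines += ["", command, ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_sbatch_script("demo", "srun hostname", gpus=1))
```

The resulting text can be saved to a file and submitted with `sbatch`.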

Software

Operating Software

The Armis2 cluster runs Red Hat Enterprise Linux (RHEL) 8. We update the operating system on Armis2 as Red Hat releases new versions and as our library of third-party applications adds support for them. Because we must support several types of drivers (AFS file system drivers, InfiniBand network drivers, and NVIDIA GPU drivers) and dozens of third-party applications, we are cautious about upgrading and can lag Red Hat releases by months.

Compilers, Parallel, and Scientific Libraries

Armis2 supports the GNU Compiler Collection, the Intel Compilers, and the PGI Compilers for C and Fortran. The Armis2 cluster's parallel library is OpenMPI. The default versions are 1.10.7 (i686) and 3.1.2 (x86_64), and a limited number of earlier versions are also available. Armis2 provides the Intel Math Kernel Library (MKL) set of high-performance mathematical libraries. Other common scientific libraries are compiled from source and include HDF5, NetCDF, FFTW3, Boost, and others.

Software installed on Armis2 must be compatible with these compilers and libraries.
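
One quick way to see which of these toolchains are visible in a session is to look for their usual command names on `PATH`. The sketch below assumes the conventional executable names for GCC, the classic Intel compilers, PGI, and the OpenMPI wrappers. Whether they appear depends on how the session environment is set up, which this page does not describe.

```python
import shutil

# Conventional command names (an assumption; not confirmed for Armis2).
TOOLS = {
    "GNU C": "gcc",
    "GNU Fortran": "gfortran",
    "Intel C": "icc",
    "Intel Fortran": "ifort",
    "PGI C": "pgcc",
    "PGI Fortran": "pgfortran",
    "MPI C wrapper": "mpicc",
    "MPI Fortran wrapper": "mpifort",
}


def find_tools(tools=TOOLS):
    """Map each tool label to its full path on PATH, or None if not found."""
    return {label: shutil.which(cmd) for label, cmd in tools.items()}


if __name__ == "__main__":
    for label, path in find_tools().items():
        print(f"{label}: {path or 'not found on PATH'}")
```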

Application Software

Armis2 supports a wide range of application software. We license common engineering simulation software (e.g., Ansys, Abaqus, VASP). We also provide software for statistics, mathematics, debugging, and profiling. Please contact us to ask whether a particular application is currently available.