RELION on Great Lakes
- Introduction & Documentation
- Connecting to Great Lakes (GUI)
- Compute & Running Tabs
- Basic Parameters / Partitions
- Running Modes & MPI
- Using GPUs
- Motion Correction & CTF
- SBATCH Template
- Known Problems
Introduction & Documentation
RELION (REgularised LIkelihood OptimisatioN, pronounced rely-on) is a program implementing an empirical Bayesian approach for cryo-EM refinement.
Connecting to Great Lakes (GUI)
Two supported ways to open the RELION GUI on Great Lakes:
Open OnDemand (OOD), preferred
- Open greatlakes.arc-ts.umich.edu in a browser.
- Go to Interactive Apps > Basic Desktop.
- Fill in the form: partition standard, 2 hours, 4 cores, 8 GB memory.
- Launch the session and open a terminal on the allocated desktop.
- Load the RELION module:
module load relion-***/x.x.x
- Change to your project directory and run:
relion
SSH with X-forwarding
ssh -X greatlakes.arc-ts.umich.edu
module load relion-***/x.x.x
cd /path/to/your/relion/project/directory
relion
Compute and Running tabs (overview)
Compute tab options control how RELION reads data and uses memory / I/O:
- Use parallel disc I/O? Yes: all MPI followers read images. No: only the leader reads and distributes them (--no_parallel_disc_io).
- Number of pooled particles: affects memory usage.
- Pre-read all particles into RAM? Adds --preread_images; can improve performance if the dataset is small.
- Copy particles to scratch directory: for example /tmpssd/$SLURM_JOB_ID.
- Combine iterations through disk? Usually set to No to avoid extra I/O.
- Use GPU acceleration? Yes for relion-gpu; add the appropriate SBATCH flags to allocate GPUs.
Running tab options map to SBATCH arguments:
- Number of MPI procs → --ntasks
- Number of threads → --cpus-per-task
- Submit to queue? Set to Yes
- Queue name → --partition
- Queue submit command: usually sbatch
- Account → --account
- Walltime → --time
- Memory Per Thread → --mem-per-cpu
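The mapping above can be previewed with a small helper. This is only a sketch: the function name running_tab_to_sbatch and the account value example_acct are placeholders invented for illustration, and RELION itself writes these lines when it fills in the submission template.

```shell
#!/usr/bin/env bash
# Print the #SBATCH lines that correspond to the Running tab fields.
# Arguments (example values supplied by the caller):
#   $1 MPI procs, $2 threads, $3 queue name, $4 account, $5 walltime, $6 memory per thread
running_tab_to_sbatch() {
    local mpi_procs=$1 threads=$2 queue=$3 account=$4 walltime=$5 mem=$6
    printf '#SBATCH --ntasks=%s\n'        "$mpi_procs"
    printf '#SBATCH --cpus-per-task=%s\n' "$threads"
    printf '#SBATCH --partition=%s\n'     "$queue"
    printf '#SBATCH --account=%s\n'       "$account"
    printf '#SBATCH --time=%s\n'          "$walltime"
    printf '#SBATCH --mem-per-cpu=%s\n'   "$mem"
}

# Example: 3 MPI procs, 4 threads, gpu partition, 8 hours, 10g per thread.
# "example_acct" is a hypothetical account name.
running_tab_to_sbatch 3 4 gpu example_acct 8:00:00 10g
```

Comparing this output with the header of a generated submission script is a quick way to confirm that each GUI field ended up in the directive you expected.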
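Note on --preread_images
The RAM needed to pre-read all particles can be estimated as:
$$\text{RAM (GiB)} = \frac{N \times \text{boxsize}^2 \times 4}{2^{30}}$$
where:
- N = Number of particles
- boxsize = Box size (in pixels) per particle
- 4 = Number of bytes per pixel (float32)
- $2^{30}$ = Number of bytes in a gibibyte (GiB)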
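The estimate defined by these terms (N × boxsize² × 4 bytes, divided by 2^30 bytes per GiB) is easy to script before requesting memory. The helper below is a minimal sketch; the function name preread_gib is made up for this example, and awk is used because shell arithmetic is integer-only.

```shell
#!/usr/bin/env bash
# Estimate the RAM (in GiB) needed to pre-read all particles:
#   N * boxsize^2 * 4 bytes per pixel / 1073741824 bytes per GiB (2^30)
preread_gib() {
    local n_particles=$1 boxsize=$2
    LC_ALL=C awk -v n="$n_particles" -v b="$boxsize" \
        'BEGIN { printf "%.1f\n", n * b * b * 4 / 1073741824 }'
}

preread_gib 100000 350   # prints 45.6
```

The result covers the particle images only, so the memory requested for the job should leave headroom for everything else the job needs.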
For example, with N = 100,000 particles and boxsize = 350: $100{,}000 \times 350^2 \times 4 / 2^{30} \approx 45.6$ GiB.
Basic Parameters to Get Started
The original document gives example configurations by job type. Below are short summaries for typical settings.
Class2D & Class3D on GPUs (GPU partition)
- Compute: Use parallel disc I/O? Yes
- Running: Number of MPI procs: 3; Number of threads: 4; Partition: gpu; Walltime: 8:00:00; Memory per thread: 10g
- SBATCH extras: --gpus=v100:2, --nodes=1
SPGPU Partition
Settings are similar to the GPU partition, but the examples use A40 GPUs and the spgpu partition.
Class2D & Class3D on CPUs (standard partition)
- Partition: standard
- Number of MPI procs: 32; threads: 8; Walltime: 8:00:00; Memory per thread: 5g
Running Modes
RELION supports both multi-threading and distributed MPI tasks. Some job types only run single-task; others support MPI and multi-threading. Examples:
- Import: single task only.
- CTF estimation (ctffind): multiple distributed tasks, single-threaded.
- MotionCor2: distributed tasks with multi-threading.
When MPI procs > 1 you must run MPI-enabled executables, typically launched via srun --mpi=pmix_v4.
Using GPUs
To enable GPUs: set Use GPU acceleration? to Yes and allocate GPUs in SBATCH directives, e.g. --gpus=v100:4. Leave "Which GPUs to use" blank in the GUI unless you need specific devices.
Match the partition to GPU availability (gpu, spgpu, gpu_mig40) and set threads per GPU to 4. Increasing threads too much may crash jobs.
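The GPU example on this page pairs 3 MPI procs with --gpus=v100:2. One reading of that, treated here as an assumption rather than a site rule, is that one rank acts as the leader and does no GPU work, so the MPI proc count is the GPU count plus one. The sketch below encodes that assumption; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Extract the GPU count from a flag such as --gpus=v100:2 or --gpus=4
# by stripping everything up to the last "=" or ":".
gpu_count_from_flag() {
    local flag=$1
    echo "${flag##*[=:]}"
}

# ASSUMPTION: one leader rank plus ranks_per_gpu worker ranks per GPU.
mpi_procs_for_gpus() {
    local gpus=$1 ranks_per_gpu=${2:-1}
    echo $(( gpus * ranks_per_gpu + 1 ))
}

gpus=$(gpu_count_from_flag "--gpus=v100:2")
echo "GPUs: $gpus, MPI procs: $(mpi_procs_for_gpus "$gpus")"   # prints GPUs: 2, MPI procs: 3
```

If your job type does not use a separate leader rank, drop the extra task and set MPI procs equal to the GPU count instead.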
Motion Correction & CTF
RELION's CPU motion correction
CPU-only motion correction can be memory intensive (about 8 GB per CPU). The original document includes an equation to estimate the memory required from the image size and number of frames.
MotionCor2 (external GPU tool)
- Set RELION motion implementation to No and provide the correct MotionCor2 executable path (example: /sw/pkgs/lsi/motioncor2/1.4.7/bin/motioncor2).
CTFFIND-4
CTFFIND-4 is CPU-only; set the CTFFIND-4.1 executable field to the correct path (example: /sw/pkgs/lsi/ctffind/4.1.14/bin/ctffind).
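A wrong executable path usually only shows up after the job has been queued, so it can help to check the paths from a terminal first. This is a small sketch; check_executable is a hypothetical helper, and the commented paths are the examples quoted on this page, which may differ from what is installed.

```shell
#!/usr/bin/env bash
# Report whether a path points to an executable file.
# Returns 0 when it does, 1 otherwise.
check_executable() {
    local label=$1 path=$2
    if [ -x "$path" ] && [ ! -d "$path" ]; then
        echo "$label: OK ($path)"
    else
        echo "$label: not found or not executable ($path)" >&2
        return 1
    fi
}

# Usage, with the example paths from this page:
#   check_executable MotionCor2 /sw/pkgs/lsi/motioncor2/1.4.7/bin/motioncor2
#   check_executable CTFFIND    /sw/pkgs/lsi/ctffind/4.1.14/bin/ctffind
```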
SBATCH submission template
#!/usr/bin/env bash
#SBATCH --job-name=XXXnameXXXrun
#SBATCH --ntasks=XXXmpinodesXXX
#SBATCH --partition=XXXqueueXXX
#SBATCH --cpus-per-task=XXXthreadsXXX
#SBATCH --error=XXXerrfileXXX
#SBATCH --output=XXXoutfileXXX
#SBATCH --open-mode=append
#SBATCH --account=XXXextra1XXX
#SBATCH --time=XXXextra2XXX
#SBATCH --mem-per-cpu=XXXextra3XXX
#SBATCH XXXextra4XXX
#SBATCH XXXextra5XXX
#SBATCH XXXextra6XXX
#SBATCH XXXextra7XXX
module --redirect list
cmd_exec=$(cat << EOF
XXXcommandXXX
EOF
)
# Source additional MPI utilities if available
mpi_utils="/sw/pkgs/lsi/relion-gpu/utils/add_extra_mpi_task.sh"
if [[ -f "$mpi_utils" ]]; then
    # shellcheck source=/dev/null
    source "$mpi_utils"
fi
# RELION will have two commands on separate lines in certain job types.
# This deals with that case.
lines=$(wc -l <<< "$cmd_exec")
counter=0
while IFS=$'\n' read -r cmd; do
    counter=$((counter + 1))
    if [ "$lines" -gt 1 ]; then
        echo ""
        echo "Command $counter:"
        echo "srun --mem-per-cpu=XXXextra3XXX --mpi=pmix_v4 ${cmd}"
        echo ""
        srun --mem-per-cpu=XXXextra3XXX --mpi=pmix_v4 ${cmd}
        echo ""
    else
        echo ""
        echo "Command:"
        echo "srun --mem-per-cpu=XXXextra3XXX --mpi=pmix_v4 ${cmd}"
        echo ""
        srun --mem-per-cpu=XXXextra3XXX --mpi=pmix_v4 ${cmd}
        echo ""
    fi
done <<< "$cmd_exec"
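RELION replaces the XXX...XXX placeholders in this template with the values from the Running tab before submitting. To preview what a filled-in header looks like, the substitution can be mimicked with sed. This is an illustrative sketch only: fill_template is a hypothetical helper, not part of RELION, and it assumes values contain no |, & or backslash characters.

```shell
#!/usr/bin/env bash
# Fill XXXnameXXX placeholders in template text read from stdin.
# Each argument is a name=value pair, e.g. mpinodes=3 or extra2=8:00:00.
fill_template() {
    local text pair name value
    text=$(cat)
    for pair in "$@"; do
        name=${pair%%=*}     # text before the first "="
        value=${pair#*=}     # text after the first "="
        text=$(printf '%s\n' "$text" | sed "s|XXX${name}XXX|${value}|g")
    done
    printf '%s\n' "$text"
}

# Preview two header lines with example values.
printf '%s\n' '#SBATCH --ntasks=XXXmpinodesXXX' '#SBATCH --time=XXXextra2XXX' |
    fill_template mpinodes=3 extra2=8:00:00
```

In the template above, extra1 carries the account, extra2 the walltime, and extra3 the memory per CPU, so those are the names to pass when previewing the full header.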
The file add_extra_mpi_task.sh is below:
#!/usr/bin/env bash
# Don't bother unless nodes have been allocated
if [[ -z $SLURM_JOB_NODELIST ]]; then
    if [[ -n $SLURM_HOSTFILE ]]; then
        unset SLURM_HOSTFILE
    fi
    return
fi
# Don't bother unless nodes have GPUs
if [[ -z $SLURM_JOB_GPUS ]]; then
    if [[ -n $SLURM_HOSTFILE ]]; then
        unset SLURM_HOSTFILE
    fi
    return
fi
# Don't bother unless multiple tasks have been allocated, and the number is odd
if [[ -z $SLURM_NTASKS_PER_NODE ]]; then
    if [[ -n $SLURM_HOSTFILE ]]; then
        unset SLURM_HOSTFILE
    fi
    return
# Check if SLURM_NTASKS_PER_NODE is less than 2
elif [[ ${SLURM_NTASKS_PER_NODE} -lt 2 ]]; then
    if [[ -n $SLURM_HOSTFILE ]]; then
        unset SLURM_HOSTFILE
    fi
    return
# Check if SLURM_NTASKS_PER_NODE is an even number
elif [[ $((SLURM_NTASKS_PER_NODE%2)) == 0 ]]; then
    if [[ -n $SLURM_HOSTFILE ]]; then
        unset SLURM_HOSTFILE
    fi
    return
fi
# Build the hostfile. On a single node, list it once per task. On multiple
# nodes, list the first node once, then every node (tasks per node - 1) times.
mapfile -t array < <(scontrol show hostname "$SLURM_JOB_NODELIST")
file=$(mktemp --suffix .SLURM_JOB_NODELIST)
if [[ ${#array[@]} -eq 1 ]]; then
    for (( j = 0; j < SLURM_NTASKS_PER_NODE; j++ )); do
        echo "${array[0]}" >> "$file"
    done
else
    echo "${array[0]}" > "$file"
    for (( i = 0; i < SLURM_JOB_NUM_NODES; i++ )); do
        for (( j = 0; j < SLURM_NTASKS_PER_NODE - 1; j++ )); do
            echo "${array[i]}" >> "$file"
        done
    done
fi
# All conditions met, set hostfile and distribution, unset ntasks per node
export SLURM_HOSTFILE=$file
export SLURM_DISTRIBUTION=arbitrary
unset SLURM_NTASKS_PER_NODE
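To see what the generated hostfile looks like without running inside a Slurm allocation, the loop logic of add_extra_mpi_task.sh can be reproduced as a pure function. This is a sketch of the layout only; hostfile_layout and the node names gl1001 and gl1002 are hypothetical.

```shell
#!/usr/bin/env bash
# Print the hostfile layout built by the script above.
# Usage: hostfile_layout TASKS_PER_NODE NODE [NODE...]
hostfile_layout() {
    local tasks_per_node=$1
    shift
    local first=$1 node j
    if [ "$#" -eq 1 ]; then
        # Single node: one line per task.
        j=0
        while [ "$j" -lt "$tasks_per_node" ]; do
            echo "$first"
            j=$((j + 1))
        done
    else
        # Multiple nodes: the first node once, then every node
        # (tasks_per_node - 1) times, the first node included.
        echo "$first"
        for node in "$@"; do
            j=1
            while [ "$j" -lt "$tasks_per_node" ]; do
                echo "$node"
                j=$((j + 1))
            done
        done
    fi
}

# Two nodes with 3 tasks per node: gl1001 appears 3 times, gl1002 twice.
hostfile_layout 3 gl1001 gl1002
```

With two nodes and three tasks per node, the first node hosts three ranks and the second hosts two, five ranks in total.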
Known Problems
Zombification
Sometimes an MPI rank exits and the leader waits indefinitely. Cancel the job and restart or continue from the last checkpoint.
Not enough GPU memory
Reduce the number of classes, the pool size, or the box size, or run on CPUs.
Overburdening GPUs
Assigning too many threads or tasks per GPU can cause CUDA allocation errors. Match the number of tasks to the GPUs allocated.
Empty or Corrupted Micrographs
If a micrograph is 0 bytes, RELION may fail. Check file sizes (ls -l *.mrc) or inspect an individual image with relion_image_handler --stats --i followed by the file name.
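For large datasets, scanning ls output by eye is error-prone. The sketch below lists zero-byte .mrc files under a directory with find; the function name find_empty_micrographs is made up for this example, and it detects only empty files, not files that are corrupted but non-empty.

```shell
#!/usr/bin/env bash
# List zero-byte .mrc files under a directory (default: current directory).
# -size 0 matches files that occupy zero blocks, i.e. empty files.
find_empty_micrographs() {
    local dir=${1:-.}
    find "$dir" -type f -name '*.mrc' -size 0
}

# Usage (the path is a placeholder):
#   find_empty_micrographs /path/to/your/relion/project/directory
```

Because -type f skips symbolic links, point the function at the directory holding the real files if your project links to micrographs stored elsewhere.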
