AlphaFold 3

AlphaFold 3 (AF3) is a protein structure prediction pipeline created by Google DeepMind.

Using AF3

AF3 is available on Great Lakes as a software module. To enable it, load it as you would any other.

module load Bioinformatics alphafold/3.0.0

The module will add the installation directory to your $PATH as well as set the following environment variables:

  • AF3_PARAMS_DIR – directory containing the model parameters and databases used to run AF3. These files are enormous, so users should use the copies provided in the directory pointed to by $AF3_PARAMS_DIR rather than downloading their own.
  • AF3_SCRIPT_DIR – installation directory containing the Python script (run_alphafold.py) used to run AF3.

When the module loads, it activates a virtual environment containing the Python packages required to run AF3. Users cannot modify this virtual environment. Load the AF3 module, run AF3 (see instructions below), then unload the AF3 module to exit the virtual environment:

module unload alphafold

For more information on software modules, see the module documentation.

Submitting an AF3 Job

You will need to use Slurm to request GPUs for running AF3. For information on job submission syntax, see the Slurm User Guide.

To run AF3, invoke $AF3_SCRIPT_DIR/run_alphafold.py from within your Slurm batch file. Load the AF3 module immediately before running this script and unload it immediately afterward to avoid issues with the virtual environment:

module load Bioinformatics alphafold/3.0.0
python $AF3_SCRIPT_DIR/run_alphafold.py \
    --json_path=$AF3_PARAMS_DIR/input/fold_input.json \
    --output_dir=your_output_dir \
    --db_dir=$AF3_PARAMS_DIR/db_dir \
    --model_dir=$AF3_PARAMS_DIR/af3_model_params
module unload alphafold
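The --json_path flag points at a JSON job description; the module provides a sample at $AF3_PARAMS_DIR/input/fold_input.json, but you can write your own. A minimal sketch of generating one in Python, with placeholder job name and sequence (field names follow the AF3 input format documented in the AF3 GitHub repository):

```python
import json

# Sketch of an AF3 input file; "example_job" and the amino-acid
# sequence below are placeholders -- substitute your own.
fold_input = {
    "name": "example_job",
    "sequences": [
        {
            "protein": {
                "id": ["A"],                      # chain ID(s)
                "sequence": "MVLSPADKTNVKAAW",    # placeholder sequence
            }
        }
    ],
    "modelSeeds": [1],
    "dialect": "alphafold3",
    "version": 1,
}

# Write the job description; pass this path to --json_path.
with open("fold_input.json", "w") as f:
    json.dump(fold_input, f, indent=2)
```

You would then point --json_path at the resulting file instead of the bundled sample.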

The only GPUs on Great Lakes that can run AF3 are the NVIDIA A100 GPUs in the gpu_mig40 partition, so make sure you submit AF3 jobs to that partition. You will also want to export a few environment variables (see the example script below) to optimize memory utilization.

Tying everything together, your job submission script for this run would look like this:

#!/bin/bash
#SBATCH --job-name=af3
#SBATCH --mail-user=[email protected]
#SBATCH --mail-type=BEGIN,END
#SBATCH --cpus-per-task=1
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=128gb
#SBATCH --time=1:00:00
#SBATCH --account=example
#SBATCH --partition=gpu_mig40
#SBATCH --output=/home/%u/%x-%j.log

export XLA_PYTHON_CLIENT_PREALLOCATE=false
export TF_FORCE_UNIFIED_MEMORY=true
export XLA_CLIENT_MEM_FRACTION=3.2
export XLA_FLAGS="--xla_gpu_enable_triton_gemm=false"
module load Bioinformatics alphafold/3.0.0
python $AF3_SCRIPT_DIR/run_alphafold.py \
    --json_path=$AF3_PARAMS_DIR/input/fold_input.json \
    --output_dir=your_output_dir \
    --db_dir=$AF3_PARAMS_DIR/db_dir \
    --model_dir=$AF3_PARAMS_DIR/af3_model_params
module unload alphafold
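Assuming you save the script above as af3.sbat (the filename is an example, not required), it can be submitted and monitored with standard Slurm commands:

```shell
sbatch af3.sbat    # submit the job; Slurm prints the assigned job ID
squeue -u $USER    # check the queue status of your running/pending jobs
```

Output from the run appears in the log file named by the #SBATCH --output line.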

AF3 GitHub

https://github.com/google-deepmind/alphafold3/