How do I take the software commands I usually issue one at a time and turn them into a list of commands, either as lines in my cluster batch file or as a separate file that the batch file calls?
This really depends on the software. For example, Matlab has instructions. R has some tasty recipes. Stata has do-files. Software Carpentry can show you many more, including Python scripts.
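As a software-neutral illustration, here is a minimal sketch that uses plain shell commands in place of Matlab, R, or Stata commands. The file names steps.sh and steps.log are hypothetical. The idea is the same for any package: the commands you would type one at a time go into a file, in order, and the file is then run in one go.

```shell
#!/bin/bash
# Save the commands you would normally type by hand into a file,
# one command per line, in the order they should run.
cat > steps.sh <<'EOF'
#!/bin/bash
echo "step 1: preparing input"
echo "step 2: running analysis"
echo "step 3: saving results"
EOF

# Run the whole list without typing anything interactively,
# and keep the output in a log file.
bash steps.sh > steps.log

# Look at what the commands printed.
cat steps.log
```

A batch file can then call the script with a single line (here, bash steps.sh) instead of repeating every command.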
I have my software script. Is there more?
Yes. You also need a batch script that tells the cluster what resources your job will need to run successfully.
Before you move on to adding the commands to ask for what is needed on the cluster, take a moment to understand the environment you will be submitting that batch file on.
The clusters run a Linux operating system, together with Slurm (job scheduling, billing, and management) and lmod (software manager), to make the cluster work.
What do I need to know about navigating when I log in?
- This environment has a shell by default rather than a desktop like a PC. The HPC clusters run bash. The shell relies on environment variables that configure how you use the system. The most used environment variable is the PATH variable, which lists all the directories in which the shell will search for a command, but there may be many others, depending on the particular software package. If this environment is new to you, visit
- The directory structure starts with /
- /home is your home directory, like the Documents folder on your PC or laptop, and this is where you want to put your batch file
- /scratch is a folder where you can write files that are only needed temporarily
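The navigation points above can be tried out with a few commands. This is a minimal sketch that assumes only a bash shell. Whether /scratch exists on the machine you run it on is checked rather than assumed, and ls is used purely as an example command to look up.

```shell
#!/bin/bash
# Confirm the shell is bash by printing its version.
echo "bash version: $BASH_VERSION"

# PATH is a colon-separated list; print one directory per line.
printf '%s\n' "$PATH" | tr ':' '\n'

# Show where the shell finds a command (ls here) when it searches PATH.
ls_path=$(command -v ls)
echo "ls is found at: $ls_path"

# The directory structure starts at / ; list the top-level folders.
ls /

# Go to your home directory and print its full path.
cd ~ && pwd

# /scratch is for temporary files. Check that it exists before using it.
if [ -d /scratch ]; then
    echo "/scratch is available"
else
    echo "/scratch not found on this machine"
fi
```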
What do I need to know about Slurm?
Slurm is the software that schedules, bills, and manages jobs submitted by users. You need to add Slurm commands (lines beginning with #SBATCH) to the top of your batch file to tell the cluster where to run your job and what resources your job needs. In return, after you submit the job, Slurm can tell you when it may run. It's much like making a reservation at a restaurant: if you need a large table or go at a busy time, you will wait longer to be seated. Likewise, a job that asks for more resources, or is submitted when the cluster is busy, will wait longer to start.
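To make this concrete, here is a minimal sketch of a batch file with Slurm lines at the top. The file name myjob.sh, the partition name general, and every resource value are assumptions chosen for illustration; your cluster will have its own partition names and limits. The sketch writes the batch file and only tries to submit it if the sbatch command is actually present.

```shell
#!/bin/bash
# Write a small batch file. Everything between the EOF markers
# is the content of myjob.sh (hypothetical file name).
cat > myjob.sh <<'EOF'
#!/bin/bash
# A label for the job (hypothetical).
#SBATCH --job-name=myjob
# Where to run the job; the partition name is an assumption.
#SBATCH --partition=general
# Number of tasks (processes) and CPU cores per task.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
# Memory for the job.
#SBATCH --mem=4G
# Wall-clock time limit in HH:MM:SS.
#SBATCH --time=01:00:00
# File for the job's output; %j is replaced by the job ID.
#SBATCH --output=myjob_%j.out

# The commands for the job go below the #SBATCH lines.
echo "Job running on $(hostname)"
EOF

# Submit only when Slurm is available on this machine.
if command -v sbatch >/dev/null 2>&1; then
    sbatch myjob.sh          # Slurm replies with the job ID
    squeue -u "$USER"        # list your pending and running jobs
else
    echo "sbatch not found; myjob.sh was written but not submitted"
fi
```

Once a job is queued, squeue --start -j followed by the job ID asks Slurm for its current estimate of when the job will start. Smaller requests for time, memory, and cores generally mean a shorter wait, which is the restaurant reservation idea in practice.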
What do I need to know about lmod?
lmod is a software management tool that lets the cluster provide multiple versions of the same software, so researchers can keep a consistent version for a given project. So perhaps you have a project that started in Matlab 5.1 and must be run in that version for the length of the grant. Having the Matlab 5.1 module allows for that, while other researchers can run the newest version. The module is also something you need to load in your batch file to indicate which version of the software you expect your commands to run on.
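As a sketch of how that looks in a batch file, the example below pins the Matlab 5.1 version mentioned above. The module name matlab/5.1 and the file name versioned_job.sh are assumptions; the actual module names on your cluster are whatever module avail reports. The sketch writes the batch file and lists available modules only if the module command is defined in the current shell.

```shell
#!/bin/bash
# Write a batch file that pins the software version it runs on.
cat > versioned_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=versioned
#SBATCH --time=00:30:00

# Start from a clean slate so modules loaded at login do not leak in.
module purge
# Load the exact version the project depends on (hypothetical module name).
module load matlab/5.1
# Record the loaded modules in the job's output for later reference.
module list

# The software commands for the job go here.
EOF

# On a cluster with lmod, see which Matlab versions are offered.
if type module >/dev/null 2>&1; then
    module avail matlab
else
    echo "module command not found; versioned_job.sh was written only"
fi
```

Keeping module purge and module load inside the batch file, rather than relying on whatever was loaded at login, means the job runs on the same version every time it is submitted.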