HPC Clusters - Command Line for Beginners / ITS Documentation

Welcome to using the cluster from the command line, where you can do things that are not available through Open OnDemand.

If you are using data types that may need special consideration, be sure to visit the Safe Computing Data Guide.

Why use the command line?

One of the most common reasons to use command line access to the cluster is to run jobs without having to watch or interact with them in real time. This is particularly helpful for jobs that run for many hours, or for using the clusters at off-peak times so your job(s) get scheduled sooner. If you typically use software with a browser-like feel (MATLAB, Stata, etc.), the most similar environment is Open OnDemand. Once you have a set of commands that you run in your software against a data set, you can put them in a file and submit that file to the cluster to run while you walk away.

Is the command line hard to learn?

It is more different than hard. It requires testing and preparation.

First, test your code in the software and settle on the commands that need to be run and how to point them at your data in the Linux environment.
Learn how to set up your environment so you can log in and use the software you need.
Copy the files you need to the appropriate location on the cluster.
Second, create a batch script to tell the cluster:
- What Slurm Account to run the job on
- What type of resources you need for the job to complete (often referred to as the geometry). Getting this right can be the hardest part of getting your job to start running. You can find out about these resources in the user guides (Great Lakes, Armis, Lighthouse):
  - nodes
  - CPUs needed
  - memory per CPU
  - partition
  - time you want or expect it to run
- Where your files are
  - Data input
- Where to send your output and notifications
  - Data output
  - Errors
  - Emails of job status
- How long you expect your job to run (Note: You cannot run longer than the maximum wall time on the cluster you are using. Look at its configuration to find the maximum wall time.)
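The items above can be sketched as a single batch script. This is a minimal illustration, not a template from the user guides: the account name, partition name, email address, file names, and resource values are all placeholders chosen for the example, and the final command only stands in for your real software call. Check the user guide for your cluster for the values that actually apply.

```shell
#!/bin/bash
# Minimal Slurm batch script sketch. All values are placeholders.

# A name for the job, and the Slurm account to run it on (hypothetical)
#SBATCH --job-name=example_job
#SBATCH --account=example_account

# Geometry: partition, nodes, CPUs, memory per CPU, and wall time
# (the partition name "standard" is an assumption; check your user guide)
#SBATCH --partition=standard
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --time=01:00:00

# Where to send output and errors (%x = job name, %j = job ID)
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

# Email notifications of job status (placeholder address)
#SBATCH --mail-user=uniqname@example.edu
#SBATCH --mail-type=BEGIN,END,FAIL

# Where your files are (hypothetical names)
INPUT_FILE="input.dat"
OUTPUT_DIR="results"
mkdir -p "$OUTPUT_DIR"

# SLURM_JOB_ID is set by Slurm; the fallback lets the script run elsewhere
echo "Job ${SLURM_JOB_ID:-not-under-slurm} started on ${HOSTNAME:-unknown}"

# Stand-in for the real work: replace this line with your software command
echo "would process $INPUT_FILE" > "$OUTPUT_DIR/summary.txt"
```

Because the `#SBATCH` lines are ordinary comments to bash, the script can be test-run directly with `bash` before it is submitted. Slurm reads those lines only when the file is submitted with `sbatch`.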
Finally, monitor your job regularly from the time it is queued until it ends or needs to be terminated due to errors.
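As a rough sketch of what monitoring can look like, the helper below wraps two standard Slurm commands, `squeue` and `sacct`. The function name `check_job` is made up for this example, and the columns requested from `sacct` are just one reasonable selection. It assumes the Slurm client tools are available, which is normally only true on a cluster login node.

```shell
#!/bin/bash
# check_job: hypothetical helper that reports on one job by its job ID.
check_job() {
    local job_id="$1"

    # Stop early when the Slurm commands are not installed on this machine
    if ! command -v squeue >/dev/null 2>&1; then
        echo "Slurm commands not found; run this on a cluster login node." >&2
        return 1
    fi

    # Jobs that are still pending or running appear in the queue listing
    squeue --jobs="$job_id"

    # The accounting record shows state, elapsed time, and exit code
    sacct --jobs="$job_id" --format=JobID,JobName,State,Elapsed,ExitCode
}
```

A typical sequence would be to submit the script with `sbatch`, note the job ID it prints, and then call `check_job` with that ID from time to time. `squeue -u $USER` lists all of your queued and running jobs, and `scancel` followed by the job ID terminates a job that has gone wrong.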