Policies / ITS Documentation

Partition Policies

Slurm partitions represent collections of nodes for a computational purpose, and are equivalent to Torque queues. For more Great Lakes hardware specifications, see the Configuration page.

Partitions:

debug: The goal of debug is to allow users to run jobs quickly for debugging purposes.
- Maximum jobs per user: 1
- Maximum walltime: 4 hours
- Maximum processors per job: 8
- Maximum memory per job: 40 GB
- Higher scheduling priority
standard: Standard compute nodes used for most work.
- Max walltime: 14 days
- Default partition if none specified
standard-oc: These nodes will be configured with additional software that can only be used on-campus, but are otherwise identical to standard compute nodes.
- Max walltime: 14 days
gpu: Allows use of NVIDIA Tesla V100 GPUs.
- Max walltime: 14 days
spgpu: Allows use of NVIDIA A40 GPUs.
- Max walltime: 14 days
largemem: Allows use of a compute node with 1.5 TB of RAM.
- Max walltime: 14 days
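
As a rough illustration of how the limits above constrain a job request, the following Python sketch checks a request against the listed values before it is submitted. The table is transcribed from the list above; the function name check_request and its parameters are hypothetical and are not part of Slurm or of any ARC tool. The scheduler remains the authority on what is accepted.

```python
from datetime import timedelta

# Limits copied from the partition list above.
# None means the list does not state a limit for that field.
PARTITION_LIMITS = {
    "debug": {"max_walltime": timedelta(hours=4), "max_cpus": 8, "max_mem_gb": 40},
    "standard": {"max_walltime": timedelta(days=14), "max_cpus": None, "max_mem_gb": None},
    "standard-oc": {"max_walltime": timedelta(days=14), "max_cpus": None, "max_mem_gb": None},
    "gpu": {"max_walltime": timedelta(days=14), "max_cpus": None, "max_mem_gb": None},
    "spgpu": {"max_walltime": timedelta(days=14), "max_cpus": None, "max_mem_gb": None},
    "largemem": {"max_walltime": timedelta(days=14), "max_cpus": None, "max_mem_gb": None},
}


def check_request(partition, walltime, cpus=1, mem_gb=1):
    """Return a list of problems with a job request (hypothetical helper).

    An empty list means the request fits the limits listed above.
    """
    limits = PARTITION_LIMITS.get(partition)
    if limits is None:
        return [f"unknown partition: {partition}"]

    problems = []
    if walltime > limits["max_walltime"]:
        problems.append(f"walltime {walltime} exceeds {limits['max_walltime']}")
    if limits["max_cpus"] is not None and cpus > limits["max_cpus"]:
        problems.append(f"{cpus} processors exceeds {limits['max_cpus']}")
    if limits["max_mem_gb"] is not None and mem_gb > limits["max_mem_gb"]:
        problems.append(f"{mem_gb} GB memory exceeds {limits['max_mem_gb']} GB")
    return problems


if __name__ == "__main__":
    # A 5-hour, 16-processor, 64 GB request breaks all three debug limits.
    for problem in check_request("debug", timedelta(hours=5), cpus=16, mem_gb=64):
        print(problem)
```

The per-user job limit on debug (one job) is not modeled here, because it depends on the state of the queue rather than on a single request.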

Account/Association Limits

To keep usage fair between accounts, we have set resource limits on each Great Lakes root account; these limits are described in the Great Lakes User Guide.

Limits can be set on a Slurm association or on a Slurm account. This allows a PI to limit individual users or the collective set of users in an account as the PI sees fit. The following values can be used to limit either an account or user association, unless noted otherwise below:

Current Great Lakes partition limits:

MaxJobs
- Maximum number of jobs allowed to run at one time
- Account example: testaccount can have 10 simultaneously running jobs (testuser1 has 8 running jobs and testuser2 has 2 running jobs for a total of 10 running jobs)
- Association example: testuser can have 2 simultaneously running jobs
MaxWall
- Maximum duration of a job
- Account example: all users on testaccount can run jobs for up to 3 days
- Association example: testuser’s jobs can run up to 3 days
MaxTRES (CPU, Memory, GPU or billing units)
- Maximum number of TRES the running jobs can simultaneously use
- NOTE: CPU, Memory, and GPU can also be limited on a user’s individual job
- Account example: testaccount’s running jobs can collectively use up to 5 GPUs (testuser1’s jobs are using 3 GPUs and testuser2’s jobs are using 2 GPUs for a total of 5 GPUs)
- Association example: testuser’s running jobs can collectively use up to 10 cores
- Job example: testuser can run a single job using up to 10 cores
GrpTRESMins (billing units)
- The total number of TRES minutes that can possibly be used by past, present, and future jobs. This is primarily used for setting spending limits.
- Account example: all users on testaccount share a spending limit of $1000
- Association example: testuser has a spending limit of $1000
GrpTRESRunMins
- The total number of TRES minutes used by all running jobs. This takes into consideration the time limit of running jobs. If the limit is reached, no new jobs are started until other jobs finish.
- Account example: all users on testaccount share a pool of 1000 CPU minutes for running jobs (users have 10 serial jobs each with 100 minutes remaining to completion)
- Association example: testuser can have up to 100 CPU minutes of running jobs (1 job with 100 CPU minutes remaining, 2 with 50 minutes remaining, etc.)
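
The GrpTRESRunMins arithmetic in the examples above can be sketched in a few lines of Python. This is a simplified model for illustration, not Slurm's implementation: each job is represented as a hypothetical (cpus, minutes_remaining) pair, and the function names are made up for this example.

```python
def running_cpu_minutes(jobs):
    """Sum CPU-minutes still committed to running jobs.

    Each job is a (cpus, minutes_remaining) pair, where minutes_remaining
    is the time left before the job reaches its time limit.
    """
    return sum(cpus * minutes for cpus, minutes in jobs)


def can_start(jobs, new_job, limit):
    """Return True if starting new_job keeps the total within limit."""
    cpus, minutes = new_job
    return running_cpu_minutes(jobs) + cpus * minutes <= limit


if __name__ == "__main__":
    # Account example above: 10 serial jobs, each with 100 minutes remaining,
    # fill a pool of 1000 CPU-minutes.
    jobs = [(1, 100)] * 10
    print(running_cpu_minutes(jobs))          # 1000
    print(can_start(jobs, (1, 10), 1000))     # False: the pool is full
    print(can_start(jobs[:9], (1, 100), 1000))  # True: one slot has freed up
```

In this model, committed minutes shrink as running jobs approach their time limits, which is why a waiting job can become eligible to start even before another job finishes.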

Periodic Spending Limits

The PI has the ability to set a monthly or yearly (fiscal year) spending limit on a Slurm account. Spending limits will be updated at the beginning of each month. As an example, if the testaccount account has a monthly spending limit of $1000 and this is used up on January 22nd, jobs will be unable to run until February 1st when the limit will reset with another $1000 to spend.
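
The monthly reset in the example above can be expressed as a small Python sketch. The function names and the way spending is tracked are assumptions made for illustration; they do not describe how ARC implements spending limits.

```python
from datetime import date


def next_monthly_reset(today):
    """Return the first day of the month after `today`."""
    if today.month == 12:
        return date(today.year + 1, 1, 1)
    return date(today.year, today.month + 1, 1)


def can_run(spent, limit):
    """Return True while the account has budget left for the period."""
    return spent < limit


if __name__ == "__main__":
    # testaccount exhausts its $1000 monthly limit on January 22nd.
    today = date(2025, 1, 22)
    print(can_run(spent=1000, limit=1000))  # False
    print(next_monthly_reset(today))        # 2025-02-01
```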

Please contact ARC if you would like to implement any of these limits.

Billing Policies

A job will be charged based on the percentage of the node it uses (based on CPU, memory, and GPU usage). The charge is the maximum of the weighted charges for CPU, memory, and, where applicable, GPU. If you use 1 core and all the memory of the machine, or all the cores and minimal memory, you’ll be charged for the entire machine.
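
The "maximum of the weighted charges" rule can be illustrated with a short Python sketch. It assumes that each resource is weighted by the job's share of one node, and it uses a hypothetical node with 36 cores and 180 GB of memory; the actual billing weights and node sizes are not given here and may differ.

```python
def billed_fraction(cpus, mem_gb, gpus=0, *, node_cpus, node_mem_gb, node_gpus=0):
    """Return the fraction of one node a job is charged for.

    Assumption for illustration: each resource is weighted by the share
    of the node it occupies, and the largest share determines the charge.
    """
    shares = [cpus / node_cpus, mem_gb / node_mem_gb]
    if node_gpus:
        shares.append(gpus / node_gpus)
    return max(shares)


if __name__ == "__main__":
    # Hypothetical node size, chosen only for this example.
    node = {"node_cpus": 36, "node_mem_gb": 180}

    # 1 core with all the memory is charged as the entire node.
    print(billed_fraction(1, 180, **node))   # 1.0
    # All the cores with minimal memory is also the entire node.
    print(billed_fraction(36, 1, **node))    # 1.0
    # Half the cores and a quarter of the memory is charged as half the node.
    print(billed_fraction(18, 45, **node))   # 0.5
```

Under this model, requesting memory in proportion to cores avoids paying for cores that sit idle behind a large memory request.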

Multiple Shortcodes, up to four, can be used. If more than one Shortcode is used, the amount charged to each can be split by percentage.

Great Lakes accounts can be initiated by sending an email to [email protected].

Refund Policy

ARC operates our HPC clusters to the best of our abilities, but there can be events, both within and outside of our control, which may cause interruptions to your jobs. You are responsible for due diligence around your use of the ARC HPC resources and taking measures to maximize your research. These actions may include:

Backing up data to permanent storage locations
Checkpointing your code to minimize impacts from job interruptions
Error checking in your scripts
Understanding the operation of the system and the user guide for the HPC cluster, including per job charges which may be greater than expected

Refunds, if any, are at the discretion of ARC and will only be enacted during system-wide preventable issues. This does not include hardware failure, power failures, job failures, or similar issues.

Scratch Purge Evasion Policy

Effective Date: June 2025

This policy outlines the procedures for identifying and addressing individuals who circumvent the monthly ARC scratch purge for Great Lakes.

Scratch Usage Policy Overview:

The /scratch storage on all HPC clusters is intended as fast temporary storage for active research data being used by cluster jobs, not for medium- or long-term storage. /scratch resources are limited and shared among all cluster users, and attempts to bypass this policy negatively impact the broader research community. Data that has not been used for 60 days is subject to deletion. No data stored on /scratch should be considered permanent, and critical data should be stored elsewhere for safekeeping. Attempts to evade this policy by modifying file metadata, including artificially updating file timestamps to make inactive data appear recently used, are inconsistent with acceptable /scratch usage and may be treated as a violation of this policy.

Prohibited Actions:

- Users are prohibited from implementing any code or tool that alters files' access or modification times outside normal usage to prevent the removal of inactive data during the regular /scratch purge process.

Enforcement Process:

- First Offense: Users found to be violating the policy for the first time will receive an emailed notification from ARC staff advising them of the detected evasion and warning against future violations. A list of files will also be provided via email. ARC or Unit IT staff will be available to discuss alternative tools and processes that align with responsible scratch usage.

- Second Offense: Repeat offenders will have access to the cluster suspended until they meet with the ARC director or an appointed staff member. During this meeting, the user must demonstrate understanding of the policy and acknowledge the need for compliance. A list of files will be provided via email. ARC or Unit IT staff will be available to recommend acceptable tools and processes.

- Third Offense: Users committing a third offense will have access to the cluster suspended until they meet with the ARC director, attended by their faculty advisor or another authority. This meeting aims to ensure all parties involved understand the policy. A list of files will be provided, and alternative practices will be suggested by ARC or Unit IT staff to support responsible scratch usage.

- Fourth Offense: Users found in violation for the fourth time will face permanent revocation of cluster access.

This policy is designed to maintain fair and efficient use of ARC resources.

Terms of Usage and User Responsibility

Data is not backed up. None of the data on Great Lakes is backed up. The data that you keep in your home directory, /tmp or any other filesystem is exposed to immediate and permanent loss at all times. You are responsible for mitigating your own risk. ARC provides more durable storage on Turbo, Locker, and Data Den. See Storage Systems & Services for more information on these.
Your usage is tracked and may be used for reports. We track a lot of job data and store it for a long time. We use this data to generate usage reports and look at patterns and trends. We may report this data, including your individual data, to your adviser, department head, dean, or other administrator or supervisor.
Maintaining the overall stability of the system is paramount to us. While we make every effort to ensure that every job completes in the most efficient and accurate way possible, the stability of the cluster is our primary concern. This may affect you, but mostly we hope it benefits you.
System availability is based on our best efforts. We are staffed to provide support during normal business hours. We try very hard to provide support as broadly as possible, but cannot guarantee support 24 hours a day. Additionally, we perform system maintenance on a periodic basis, driven by the availability of software updates, staffing availability, and input from the user community. We do our best to schedule around your needs, but there will be times when the system is unavailable. We will announce scheduled outages at least one month in advance on the ARC home page; we will announce unscheduled outages on that same page as quickly as we can, with as much detail as we have. You can also track ARC on Twitter (@umichARC).
Great Lakes is intended only for non-commercial, academic research and instruction. Commercial use of some of the software on Great Lakes is prohibited by software licensing terms. Prohibited uses include product development or validation, any service for which a fee is charged, and, in some cases, research involving proprietary data that will not be made available publicly. Please contact [email protected] if you have any questions about this policy, or about whether your work may violate these terms.
You are responsible for the security of sensitive codes and data. If you will be storing export-controlled or other sensitive or secure software, libraries, or data on the cluster, it is your responsibility to ensure that it is secured to the standards set by the most restrictive governing rules. We cannot reasonably monitor everything that is installed on the cluster and cannot be responsible for it; that responsibility rests with you, the end user.
Data subject to HIPAA regulations may not be stored or processed on the cluster.

For more information on HIPAA, see the ITS Guide.
For questions about Protected Health Information (PHI), contact Michigan Medicine Corporate Compliance at [email protected].

User Responsibilities

Users must manage data appropriately in their various locations:

/home
- 80 GB quota, mounted on Turbo
/scratch (more information below)
/tmp
/tmpssd
customer-provided NFS

Scratch Storage Policies

File quotas on /scratch are per root account (a PI or project account) and shared between child accounts (individual users):

10 TB storage limit
1 million file (inode) limit

These limits may be increased if needed. If you need more scratch space or a higher file limit for your account, please email us at [email protected]. These requests must come from an administrator on the account and should include an explanation of why the increase is required.

Users should keep in mind that /scratch has an auto-purge policy: any data that has not been accessed for 60 days will be automatically deleted by the system. Scratch file systems are not backed up. Critical files should be backed up to another location.
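
To find files that may soon be purged so they can be copied elsewhere, a read-only scan along the following lines can help. This is a minimal sketch that assumes the purge is keyed on each file's last-access time (atime); the exact criteria used by the purge process are not stated here. The function name stale_files is hypothetical. The scan only reads file metadata and does not alter any timestamps.

```python
import os
import time


def stale_files(root, max_age_days=60, now=None):
    """Return a sorted list of files under root not accessed in max_age_days.

    Assumption: "unaccessed" is judged by st_atime. os.lstat reads
    metadata only, so this scan does not update any access times.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                atime = os.lstat(path).st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if atime < cutoff:
                stale.append(path)
    return sorted(stale)


if __name__ == "__main__":
    import sys

    # Usage: python stale_files.py <directory under /scratch>
    for path in stale_files(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

Files reported by the scan that are still needed should be copied to durable storage; files that are no longer needed can be left for the purge.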

Login Node Usage

Appropriate uses for the Great Lakes login nodes include:

Transferring small files to and from the cluster
Ordinary data management tasks, such as moving files, creating directories, etc.
Creating, modifying, and compiling code and submission scripts
Submitting and monitoring the status of jobs
Testing executables to ensure they will run on the cluster and its infrastructure

Please be aware that the system now limits the use of Great Lakes login nodes to a maximum of 2 CPUs and 4 GB of memory. The login nodes should primarily be used for interacting with Slurm, managing your files, and quick debugging tasks. For any programs requiring more extensive resources or longer run times, we encourage you to use interactive jobs. We reserve the right to terminate any processes that we believe may disrupt other users.

Security on Great Lakes & Use of Sensitive Data

Applications and data are protected by secure physical facilities and infrastructure as well as a variety of network and security monitoring systems. These systems provide basic but important security measures including:

Secure access – All access to Great Lakes is via SSH or Globus. SSH has a long track record as a secure remote-access protocol.
Built-in firewalls – All of the Great Lakes servers have firewalls that restrict access to only what is needed.
Unique users – Great Lakes adheres to the University guideline of one person per login ID and one login ID per person.
Multi-factor authentication (MFA) – For all interactive sessions, Great Lakes requires both a UM Kerberos password and Okta authentication. File transfer sessions require a Kerberos password.
Private subnets – Other than the login and file transfer computers that are part of Great Lakes, all of the computers are on a network that is private within the University network and are unreachable from the Internet.
Flexible data storage – Researchers can control the security of their own data storage by securing their storage as they require and having it mounted via NFSv3 or NFSv4 on Great Lakes. Another option is to make use of Great Lakes’ local scratch storage, which is considered secure for many types of data. Note: Great Lakes is not considered secure for data covered by HIPAA.