2025 ARC Winter Maintenance - /scratch Data Migration Assistance

/scratch on Great Lakes will be deleted January 17th.
Read carefully below to avoid data loss.

From January 17 through 24, we will be replacing the hardware that manages the Great Lakes /scratch file system. This replacement requires a complete rebuild of the /scratch file system, so all files currently in /scratch or /ime will be wiped to make way for a new file system.

It is up to you to back up any critical contents of your /scratch directory before the maintenance.
The last day /scratch will be available is January 16, so data must be copied before that date.
Make sure you leave enough time for the data to be copied.

ARC-Provided Copy of /scratch Will Be Limited to Data from Before December 18th

For any data created or modified after December 17th, you need to take action to save it to another location before January 16th.

On December 18th, ARC will start making a copy of the data on /scratch. Thus, any data created or modified after December 17th might not be in that copy. Access to this copy will be available for 45 days after the cluster comes back online. Users are responsible for copying any data they need that was created or modified between December 17th and January 17th to another location.
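To see which of your files fall into that window, a minimal sketch using GNU find's -newermt test can list regular files modified since a cutoff date. It assumes "December 17th" means 2024-12-17 (the December before this January 2025 maintenance) and uses that as the default cutoff, counted from midnight local time so all of December 17th is included. The function name list_recent_files is hypothetical, not an ARC tool.

```shell
# Sketch: list regular files modified on or after a cutoff date.
# Requires GNU find (-newermt). list_recent_files is a hypothetical helper.
# Usage: list_recent_files DIR [CUTOFF]
list_recent_files() {
    local dir="$1"
    # Assumed default cutoff: December 17th, 2024.
    local cutoff="${2:-2024-12-17}"
    # -newermt compares each file's modification time with the cutoff,
    # parsed as midnight local time, so files changed on Dec 17 are included.
    find "$dir" -type f -newermt "$cutoff" -print
}
```

For example, list_recent_files /scratch/project_root/account0/username > recent-files.txt writes the list to a file you can review before deciding what to copy.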

NOTE: /scratch is a global temporary space, and high-value data you cannot part with should not reside there for any extended period of time. It has no backups or replication and is subject to loss in a failure.

Data you cannot part with should be copied promptly to persistent storage such as Turbo or Data Den. See the Scratch Policies.

Data Storage Options other than Scratch

ARC provides two primary services for persistent storage: Turbo and Data Den.

Both will remain available while Great Lakes is down, through Globus.org or, for Turbo, through direct mounts.

No-cost starter amounts of Turbo (10 TB) and Data Den (100 TB) are available to every PI under the U-M Research Computing Package (UMRCP).
Additional capacity is available for a cost. Medicine and LSA also support these services for their researchers, lowering costs beyond the UMRCP limits.

To Get Ready to Copy Data

If you plan to copy data to Turbo or Data Den, the following need to be in place:

  1. Your PI has signed up for UMRCP. Do not delay: processing requests takes a few workdays, and the PI is emailed when services are ready and needs to notify users.
  2. Your PI or project admin, as listed in the Portal, has given you access to a Turbo or Data Den volume created through UMRCP or another source.

Granting Access to Turbo or Data Den

User access to Turbo and Data Den is managed through the ARC RMP Portal. Users are added under the USERS tab for the specific Turbo or Data Den volume. After a user is added, permissions sync every hour, and the user must log out and back in for the new permissions to take effect.

Only Project Admins listed in the portal can add or remove users, either in the portal or by emailing [email protected] to grant access. Existing admins can make other users admins in the portal. This allows PIs to delegate management of ARC services to other lab members.
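Once you have been added and have logged out and back in, a quick check like the sketch below can confirm that a Turbo mount is present and that you can read and write it before you start a long copy. check_volume_access is a hypothetical helper that tests only ordinary directory permissions, so a failure right after being added may simply mean the hourly sync has not run yet. Data Den is normally reached through Globus rather than a mount, so this check applies to Turbo.

```shell
# Sketch: confirm a mounted volume exists and is readable/writable.
# check_volume_access is a hypothetical helper, not an ARC tool.
# Usage: check_volume_access DIR
check_volume_access() {
    local vol="$1"
    if [ ! -d "$vol" ]; then
        echo "not found or not mounted: $vol" >&2
        return 1
    fi
    # -r/-w/-x: read, write, and search (enter) permission on the directory
    if [ ! -r "$vol" ] || [ ! -w "$vol" ] || [ ! -x "$vol" ]; then
        echo "no read/write access to $vol (permissions may not have synced yet)" >&2
        return 1
    fi
    echo "ok: $vol is readable and writable"
}
```

For example: check_volume_access /nfs/turbo/<volume-name>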

How to Copy /scratch Data to Turbo

When Turbo is mounted on Great Lakes, it is available at /nfs/turbo/<volume-name>. The volume name is most commonly school-uniquename, e.g., coe-brockp, but could be different. Turbo is active storage and can be used from any computer system on campus and through Globus. It is generally your group's workspace for high-value data in use: it is fast enough for the cluster, but it also provides features such as replication and snapshots (on by default) that protect data in ways /scratch does not.
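Before copying, it can help to compare the size of what you plan to move with the free space reported for the destination. The sketch below assumes GNU du and a df that supports POSIX output (-P); check_space is a hypothetical name. Disk usage can differ between file systems, and if the volume enforces a quota that df does not reflect, treat the numbers as a rough guide and confirm your capacity in the Portal.

```shell
# Sketch: compare source size with free space on the destination.
# check_space is a hypothetical helper; sizes are in KiB.
# Usage: check_space SRC_DIR DEST_DIR
check_space() {
    local src="$1" dest="$2" need_kb avail_kb
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    # du -sk: total disk usage of the source tree in KiB
    need_kb=$(du -sk "$src" | awk '{print $1}')
    # df -Pk: POSIX format, one line per file system; column 4 is available KiB
    avail_kb=$(df -Pk "$dest" | awk 'NR==2 {print $4}')
    echo "source needs ~${need_kb} KiB; destination has ${avail_kb} KiB free"
    if [ "$need_kb" -gt "$avail_kb" ]; then
        echo "WARNING: not enough free space on destination" >&2
        return 1
    fi
}
```

For example: check_space /scratch/project_root/account0/username/importantfolder /nfs/turbo/<volume-name>/folder/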

Because Turbo is mounted on Great Lakes, you can move data with commands like cp or through the Globus website. Be careful when using Globus to copy other people’s data that you have access to: on the destination, the copied files may end up owned by your username and a different group. The cp command preserves file permissions and timestamps when given the -p flag.

Copy command example:
cp -rp /scratch/project_root/account0/username/importantfolder  /nfs/turbo/<volume-name>/folder/
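After a copy finishes, a rough consistency check is to compare the number of regular files and their total size in bytes on each side. This sketch assumes GNU find (for -printf); summarize_dir and verify_copy are hypothetical helpers. It does not compare file contents, only counts and sizes.

```shell
# Sketch: compare file count and total bytes between two directory trees.
# summarize_dir and verify_copy are hypothetical helpers. Requires GNU find.
summarize_dir() {
    # Prints "<file count> <total bytes>" for regular files under $1.
    # printf "%.0f" avoids scientific notation for large totals in some awks.
    find "$1" -type f -printf '%s\n' |
        awk '{n++; s+=$1} END {printf "%d %.0f\n", n, s}'
}

# Usage: verify_copy SRC_DIR DEST_DIR  (exit status 0 if summaries match)
verify_copy() {
    local src_sum dest_sum
    src_sum=$(summarize_dir "$1")
    dest_sum=$(summarize_dir "$2")
    echo "source: $src_sum  destination: $dest_sum"
    [ "$src_sum" = "$dest_sum" ]
}
```

Because cp -rp copies importantfolder into folder/, compare against the copied folder itself, e.g. verify_copy /scratch/project_root/account0/username/importantfolder /nfs/turbo/<volume-name>/folder/importantfolder. A non-zero exit status means the counts or sizes differ.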

[Figure: Globus example settings]

Using Syncing Tools - rsync and Globus

You can also use the 'sync' option under Globus Transfer & Timer Options. This is similar to using rsync or other syncing tools. It can update a folder in place to match the source folder on /scratch, and it is faster than making a new copy when most of the data is unchanged.
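With rsync, the same idea looks like the sketch below; sync_folder is a hypothetical wrapper, and the source and destination are whatever folders you choose. Re-running it transfers only new or changed files. It deliberately omits --delete, so files removed from the source are not removed from the destination.

```shell
# Sketch: one-way sync of a folder's contents into another folder.
# sync_folder is a hypothetical wrapper around rsync.
# Usage: sync_folder SRC_DIR DEST_DIR [extra rsync options...]
sync_folder() {
    local src="$1" dest="$2"
    shift 2
    # -a: archive mode (recursive; preserves permissions, times, symlinks).
    # The trailing slash on "$src/" syncs the folder's contents into DEST.
    rsync -a "$@" "$src/" "$dest/"
}
```

Passing --dry-run -v as extra options (sync_folder SRC DEST --dry-run -v) lists what would be transferred without copying anything.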

How to Copy /scratch Data to Data Den

Data Den is an archive, and most users interact with it only through Globus.org. It also requires a small number of very large files, so users normally need to bundle their files with tar or zip before upload. ARC provides an automated tool, archivetar, that manages the tar and Globus upload for you, so it is often possible to place a copy of your data in the archive with only three commands.

From the Archivetar Quick Start

# Load the archivetar tools
module load archivetar

# Change to the directory to archive
cd /scratch/project_root/project0/<toarchive>/

# Archive all data in the current folder:
# tar and upload my-prefix-1.tar, my-prefix-2.tar, ... my-prefix-N.tar
# to the Data Den folder given by --destination-dir <path>
archivetar --prefix my-prefix --destination-dir /datadenvolume/<target-dataden-folder>/
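archivetar automates the bundling and upload. If you would rather bundle a single folder by hand with tar and then upload the result through Globus yourself, a minimal sketch looks like this; bundle_folder is a hypothetical helper, and it writes one uncompressed tar file.

```shell
# Sketch: bundle one folder into a single tar file and read it back.
# bundle_folder is a hypothetical helper, not part of archivetar.
# Usage: bundle_folder FOLDER OUTPUT.tar
bundle_folder() {
    local folder="$1" out="$2"
    # -C switches to the parent directory first, so the archive stores
    # paths starting at the folder name rather than the full /scratch path.
    tar -cf "$out" -C "$(dirname "$folder")" "$(basename "$folder")" || return 1
    # Listing the archive reads it back end to end; a non-zero exit here
    # suggests the tar file is incomplete or damaged.
    tar -tf "$out" > /dev/null
}
```

For example: bundle_folder /scratch/project_root/project0/<toarchive> /scratch/project_root/project0/toarchive.tar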

Post Maintenance Workflow

When Great Lakes returns to service, the copy of /scratch from December 18th will be available at /scratch_bak for 45 days. You will need to copy the data you want back to where you want it before then, or it will be lost forever. For any data modified after the 17th that you saved to your own locations, you will need to reconcile it with the /scratch_bak copy to create a consistent view of all your data.
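One way to merge the /scratch_bak copy with data you saved yourself is to copy back only files that are missing or older at the destination. This sketch assumes GNU cp (-u, --update) and that /scratch_bak mirrors the old /scratch layout; restore_from_backup is a hypothetical name. It never overwrites a newer file, but it will bring back files you deliberately deleted after December 17th, so review the result.

```shell
# Sketch: copy from a backup tree without overwriting newer files.
# restore_from_backup is a hypothetical helper. Requires GNU cp.
# Usage: restore_from_backup BACKUP_DIR TARGET_DIR
restore_from_backup() {
    local backup="$1" target="$2"
    mkdir -p "$target"
    # -r recursive, -p preserve mode and timestamps,
    # -u copy only if the target file is missing or older than the source.
    # "$backup/." copies the folder's contents, not the folder itself.
    cp -rpu "$backup/." "$target/"
}
```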

Lastly, make sure you understand the /scratch policies: /scratch should never be the only copy of high-value data, and the 100 TB of Data Den offered through UMRCP often makes a good second copy.

Important dates and reminders

  • Great Lakes /scratch will be wiped January 17th

  • ARC will provide a copy of /scratch from December 18th

  • After the maintenance, /scratch_bak will be mounted read-only to copy any data back from the December 18th copy

  • /scratch_bak will be decommissioned after Monday, March 10th, 2025. Ensure you move your data before that date

If you have questions about how to move data off of /scratch, please email us.

FAQ/Notes

Q: I don't store data in /scratch or /ime and use only /home or /nfs/turbo/ paths. Does this impact me?
A: No, only /scratch and /ime are impacted.

Q: I am a PI and my students use Great Lakes. What are my responsibilities?
A: Contact all your students and make sure they understand this message and have backed up everything they need, to ARC storage or another service, well before January 17th. If you have not already, register for UMRCP to provide an alternative storage location for your lab members, and grant them access in the Portal once resources are provisioned.

Q: I'm not sure if my group uses /scratch. How can I tell?
A: In most cases, you can check in the Portal or by using the command scratch-quota <project_root> (e.g., scratch-quota brockp_root); if blocks is 0, nothing is stored. If a user has access to other accounts, such as a departmental project (e.g., engin_root or lsa_root), they may store data there. Only that user can confirm whether they store any data in /scratch.
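If you want to check a project tree yourself, the sketch below looks for any regular file you own under a directory and stops at the first match. has_my_files is a hypothetical helper; it assumes GNU find (for -quit), and it silently skips directories you cannot read, so it only speaks for your own account.

```shell
# Sketch: report whether the current user owns any files under a directory.
# has_my_files is a hypothetical helper. Requires GNU find (-quit).
# Usage: has_my_files DIR
has_my_files() {
    local dir="$1" first
    # -user matches files owned by the current account; -print -quit stops
    # at the first hit. Permission errors are discarded.
    first=$(find "$dir" -type f -user "$(id -un)" -print -quit 2>/dev/null)
    if [ -n "$first" ]; then
        echo "yes, e.g. $first"
    else
        echo "no files owned by $(id -un) found under $dir"
        return 1
    fi
}
```

For example: has_my_files /scratch/brockp_root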

Q: I use ARC services but not Great Lakes. Does this impact me?
A: No. Only data in /scratch or /ime on Great Lakes is impacted; no other cluster or storage system is affected.

Q: My cp / dcp / rsync command is taking too long and my terminal times out. How can I avoid this?
A: You can try using Globus, which will email you when the transfer completes; you can monitor progress online in the Activity tab. You can also use a tool like tmux or screen. If you use these tools, make sure you reconnect to the specific login node the session was started on. For example, if your prompt shows user@gl-login1 when you start tmux, reconnect with ssh to gl-login1.arc-ts.umich.edu rather than the usual greatlakes.arc-ts.umich.edu.
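As a sketch of the tmux approach, the function below prints the login node name and starts the copy in a detached tmux session, which keeps running if your SSH connection drops. start_copy_session and the session name are hypothetical; the copy command follows the cp -rp example earlier on this page.

```shell
# Sketch: run a long copy inside a detached tmux session.
# start_copy_session is a hypothetical helper.
# Usage: start_copy_session NAME SRC DEST
start_copy_session() {
    local name="$1" src="$2" dest="$3"
    # Note this node name (e.g. gl-login1) and ssh back to it to reattach.
    echo "starting on $(uname -n)"
    # printf %q escapes the paths (e.g. spaces) for the command string
    # that tmux runs; the session ends when cp finishes.
    tmux new-session -d -s "$name" \
        "cp -rp -- $(printf '%q' "$src") $(printf '%q' "$dest")"
}
# Later, to watch progress:  tmux attach -t NAME   (detach with Ctrl-b d)
# To see if it is still running:  tmux has-session -t NAME
```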

Q: I depend on /scratch for everything important; this is really inconvenient.
A: We are sorry we have to make this change, but please re-read the Scratch Storage Policies in the Great Lakes User Guide. /scratch is a global 'scratch pad' and has never had backups, replication, or snapshots, so data there is always at risk of loss. It is for in-flight, very short-lived data, and all data not used in the last 60 days is deleted; it is more of a data buffer within a workflow than storage. /scratch is free to use, but it is a high-risk environment, and high-value data should always be moved quickly after creation to Data Den (archive) or Turbo (with replication and snapshots). All research groups can get no-cost starter volumes through the U-M Research Computing Package.