Read Carefully below to avoid data loss
From January 17 through 24, we will be updating the hardware that manages the Great Lakes /scratch file system. This replacement will necessitate a complete rebuild of the /scratch file system, meaning all files currently in /scratch or /ime will be wiped for a new filesystem.
It is up to you to back up any critical contents of your /scratch directory before maintenance.
The last day /scratch will be available is January 16, so data must be copied before that date.
Ensure that you leave enough time for that data to be copied.
ARC provided Copy of Scratch will be limited to data before December 18th
On December 18th ARC will start making a copy of data on /scratch. Thus any data created or modified after December 17th might not be in that copy. Access to this copy will be made available for 45 days after the cluster comes back online. Users are responsible for keeping any data they need that was created between December 17th and January 17th by copying it to another location.
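To see which of your files may be missing from the ARC copy, you can list files modified after the cutoff. The following is a minimal sketch, not an ARC-provided tool: the function name `list_modified_after` is hypothetical, it assumes a `find` that supports `-newermt` (GNU and BSD `find` do), and the default cutoff assumes the December in question is December 2024.

```shell
# List files under a directory that were modified after a cutoff time.
# Assumption: find supports -newermt (GNU and BSD find do).
# Assumption: the default cutoff year (2024) is inferred, adjust as needed.
list_modified_after() {
    dir="$1"
    cutoff="${2:-2024-12-17 23:59:59}"
    find "$dir" -type f -newermt "$cutoff" -print
}

# Example (the path is a placeholder in the style used elsewhere on this page):
# list_modified_after /scratch/project_root/account0/username
```

Any file this prints was changed after December 17th and should be copied to your own storage rather than relied on from the ARC copy.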
Data you cannot part with should be quickly copied to persistent storage such as Turbo or Data Den. See Scratch Policies
Data Storage Options other than Scratch
ARC provides 2 primary services for persistent storage, Turbo and Data Den.
Both will remain available while Great Lakes is down, through Globus.org or direct mounts of Turbo.
Additional capacity is available for a cost. Medicine and LSA provide support for these services to their researchers to lower costs beyond UMRCP limits.
To Get Ready to Copy Data
If you plan to copy data to Turbo or Data Den, these things need to be in place:
- Your PI has signed up for UMRCP. Do not delay: processing requests takes a few workdays. The PI is emailed when services are ready and needs to notify users.
- Your PI or project admin, as listed in the Portal, has given you access to a Turbo or Data Den volume created by UMRCP or another source.
Granting Access to Turbo or Data Den
User access to Turbo and Data Den on ARC services is managed through our RMP Portal. Users are added under the USERS tab in the portal, under the specific Turbo or Data Den volume. Once a user is added, syncing happens every hour, and you will need to log out and back in for the appropriate permissions to be set.
How to Copy /scratch Data to Turbo
When Turbo is mounted on Great Lakes, it is mounted at /nfs/turbo/<volume-name>. The volume name is most commonly school-uniquename (e.g. coe-brockp) but could be different. Turbo is active storage and can be used on any computer system on campus and through Globus. It is generally your group's workspace for high-value data in use: it is fast enough for the cluster but provides features such as replication and snapshots (on by default) to protect data in ways /scratch does not.
Because Turbo is mounted on Great Lakes, you can use commands like cp or the Globus website to move data. Be careful if you use Globus to copy other people's data that you have access to: it may set the file owner to your username and assign a different group on the destination. The cp command will preserve file permissions with the -p flag.
cp -rp /scratch/project_root/account0/username/importantfolder /nfs/turbo/<volume-name>/folder/
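After a large copy it can be worth confirming that the destination matches the source before the source is wiped. The sketch below wraps the `cp -rp` command above and follows it with a recursive comparison. The function name `copy_and_verify` is hypothetical, and the destination directory is assumed to exist already.

```shell
# Copy a directory tree, preserving permissions and timestamps, then verify it.
# Assumption: "$dest" already exists and is a directory.
copy_and_verify() {
    src="$1"    # e.g. /scratch/project_root/account0/username/importantfolder
    dest="$2"   # e.g. /nfs/turbo/<volume-name>/folder
    cp -rp "$src" "$dest"/ || return 1
    # diff -r compares contents recursively; -q only reports which files differ.
    # It exits non-zero if anything differs or is missing.
    diff -rq "$src" "$dest/$(basename "$src")"
}

# Example:
# copy_and_verify /scratch/project_root/account0/username/importantfolder /nfs/turbo/<volume-name>/folder
```

Note that `diff -r` reads every file on both sides, so for very large data sets the check can take about as long as the copy itself.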
Globus Example Settings
Using Syncing Tools - rsync and Globus
You can also use the 'sync' option under Globus Transfer & Timer Options. This is similar to using rsync or similar syncing tools. It can be used to update a folder in place to match the source folder on scratch, and is faster than making a new copy when most of the data is unchanged.
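For the command-line equivalent, a minimal rsync sketch might look like the following. The function names `sync_dir` and `preview_sync` are hypothetical. The rsync options used (`-a`, `--dry-run`, `--itemize-changes`) are standard, but whether these are the best settings for your data is an assumption.

```shell
# Update "$dest" in place so it contains everything in "$src".
# -a preserves permissions, times, and symlinks. The trailing slash on the
# source means "the contents of this directory" rather than the directory itself.
sync_dir() {
    src="$1"
    dest="$2"
    rsync -a "$src"/ "$dest"/
}

# Show what would be transferred without copying anything.
preview_sync() {
    src="$1"
    dest="$2"
    rsync -a --dry-run --itemize-changes "$src"/ "$dest"/
}

# Example:
# preview_sync /scratch/project_root/account0/username/importantfolder /nfs/turbo/<volume-name>/folder
# sync_dir     /scratch/project_root/account0/username/importantfolder /nfs/turbo/<volume-name>/folder
```

Running `sync_dir` a second time only transfers files that changed since the previous run. Without `--delete`, files removed from the source stay on the destination, which is usually the safer choice for a backup.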
How to Copy /scratch Data to Data Den
Data Den is an archive, and most users interact with it only through Globus.org. It also requires a small number of very large files, so users normally need to bundle files with tar or zip before upload. ARC provides an automated tool, archivetar, that manages the tar + Globus upload for you. Thus it is often possible to place a copy of data in the archive with only 3 commands.
From the Archivetar Quick Start
# Load the archivetar tools
module load archivetar
# Change to the directory to archive
cd /scratch/project_root/project0/<toarchive>/
# Run the archive command to archive all data in the current folder.
# It creates and uploads my-prefix-1.tar, my-prefix-2.tar, ... my-prefix-N.tar to the Data Den folder given by --destination-dir <path>
archivetar --prefix my-prefix --destination-dir /datadenvolume/<target-dataden-folder>/
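If you prefer to bundle files by hand before a Globus upload, the manual step that archivetar automates can be sketched with plain `tar`. This is an illustration only: the function name `bundle_dir` is hypothetical, and it produces a single uncompressed tar file rather than the multiple tar files archivetar creates.

```shell
# Bundle one directory into a single tar file so that many small files
# become one large file suitable for an archive.
bundle_dir() {
    src="$1"   # directory to bundle
    out="$2"   # tar file to create, e.g. on Turbo or another staging area
    # -C changes into the parent first so the archive stores relative paths.
    tar -cf "$out" -C "$(dirname "$src")" "$(basename "$src")"
}

# Example:
# bundle_dir /scratch/project_root/project0/<toarchive> /nfs/turbo/<volume-name>/toarchive.tar
# tar -tf /nfs/turbo/<volume-name>/toarchive.tar   # list the contents to check the bundle
```

The tar file needs roughly as much free space as the original directory, so write it to a location with enough room.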
Post Maintenance Workflow
When Great Lakes returns to service, the copy of /scratch from December 18th will be available at /scratch_bak for 45 days. You will need to copy the data back to where you want it before then, or the data will be lost forever. If you saved data to your own locations that was modified after the 17th, you will need to reconcile it with the restored copy to create a consistent view of all your data.
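One simple way to build that consistent view is to copy the older ARC copy first and then copy your own, newer files over it. The sketch below assumes your saved copy holds the newer version wherever both sources contain the same file. The function name `restore_merged` is hypothetical, and the layout of directories under /scratch_bak is not specified here, so all paths are placeholders.

```shell
# Rebuild a working directory from two sources:
#   1. the ARC copy under /scratch_bak (older, taken starting December 18th)
#   2. your own saved copy (may contain files modified after December 17th)
# Copying the older data first and the newer data second lets newer files win.
restore_merged() {
    bak="$1"      # directory inside /scratch_bak
    saved="$2"    # your own copy, e.g. on Turbo
    target="$3"   # where the merged result should live
    mkdir -p "$target" || return 1
    cp -rp "$bak"/. "$target"/ || return 1
    # -f replaces destination files that cannot be opened for writing.
    cp -rpf "$saved"/. "$target"/
}

# Example (placeholder paths):
# restore_merged /scratch_bak/<path-to-your-data> /nfs/turbo/<volume-name>/folder /scratch/<path-to-your-data>
```

This approach does not track deletions: a file you deleted after December 17th will reappear if it is still present in the /scratch_bak copy.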
Lastly, make sure you understand the /scratch policies. For high-value data, scratch should never be the only copy, and the 100TB of Data Den offered through UMRCP is often a good second copy.
Important dates and reminders
- Great Lakes /scratch will be wiped January 17th
- ARC will provide a copy of /scratch from December 18th
- After the maintenance, /scratch_bak will be mounted read-only to copy any data back from the December 18th copy
- /scratch_bak will be decommissioned after Monday, March 10th, 2025. Ensure you move your data before that date
If you have questions about how to move data off of /scratch, please email us.
FAQ/Notes
Q: I don't store data in /scratch or /ime and use only /home or /nfs/turbo/ paths, does this impact me?
A: No, only /scratch and /ime are impacted.
Q: I am a PI and my students use Great Lakes. What are my responsibilities?
A: Contact all your students and ensure they understand this message and have backed up everything they need, to the storage described here or to another service, well before January 17th. If you have not already, register for UMRCP to provide an alternative storage location for your lab members, and grant them access in the Portal once resources are provisioned.
Q: I'm not sure if my group uses /scratch, how can I tell?
A: In most cases, you can check in the Portal or use the command scratch-quota <project_root> (e.g. scratch-quota brockp_root). If blocks is 0, nothing is stored. If a user has access to other accounts, such as a departmental project (e.g. engin_root or lsa_root), they may store data there. Only that user can confirm whether they store any data in /scratch.
Q: I use ARC services but not Great Lakes. Does this impact me?
A: No, only data in /scratch or /ime on Great Lakes is impacted. No other cluster or storage system is affected.
Q: My cp / dcp / rsync command is taking too long and my terminal times out, how can I avoid this?
A: You can try using Globus, which will email you when the transfer is complete; you can monitor progress online in the Activity tab. You can also use a tool like tmux or screen. If you use these tools, ensure that when you reconnect, you connect to the specific login node the session was started on. For example, if your prompt shows user@gl-login1 when you log in, before you start tmux, then when you reconnect, ssh to gl-login1.arc-ts.umich.edu rather than the normal greatlakes.arc-ts.umich.edu.
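A typical tmux workflow might look like the sketch below. The helper `remember_login_node` and the session name `scratchcopy` are hypothetical. The interactive tmux steps are shown as comments because they need a terminal.

```shell
# Record which login node you are on before starting a long-running session,
# so you know where to reconnect later.
remember_login_node() {
    node_file="${1:-$HOME/.copy_session_node}"
    uname -n > "$node_file"
    echo "Session will live on: $(cat "$node_file")"
}

# Interactive steps:
#   remember_login_node            # note the node name, e.g. gl-login1
#   tmux new -s scratchcopy        # start a named session
#   cp -rp <source> <destination>  # run the long copy inside the session
#   (press Ctrl-b then d to detach; the copy keeps running)
#
# Later, from your own machine:
#   ssh gl-login1.arc-ts.umich.edu # the node recorded above
#   tmux attach -t scratchcopy     # reattach to the running session
```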
Q: I depend on /scratch for everything important, this is really inconvenient.
A: We are sorry we have to make this change, but please re-read the Scratch Storage Policies in the Great Lakes User Guide. Scratch is a global 'scratch pad' and has never had backups, replication, or snapshots, so data there is always at risk of loss. It is for in-flight data that is very short-lived, and all data not used in the last 60 days is deleted. It is more of a data buffer as part of a workflow than storage. Scratch is free to use, but it is a high-risk environment, and high-value data should always be moved to Data Den Archive or Turbo with Replication and Snapshots quickly after creation. All research groups can get no-cost starter volumes through the U-M Research Computing Package.