Storage & Filesystems
Rockfish uses the General Parallel File System (GPFS), also known as IBM Spectrum Scale, to deploy a scalable, high-performance scratch file system. GPFS provides a shared file system with a global namespace and simultaneous file access from multiple compute nodes. It also provides high recoverability and data availability. All allocations on these filesets are shared by the research group and are subject to a per-group quota.
Storage provided by ARCH is only for research and educational data, code, and documents for use on the Rockfish cluster. ARCH creates, modifies, and enforces quotas based on group allocations. In addition, ARCH reserves the right to delete, move, or otherwise make data unavailable on any storage system as deemed necessary by ARCH personnel to maintain the overall quality of service.
While ARCH makes every effort to maintain the availability and integrity of our storage systems, they are not backed up by default. Users are responsible for purchasing backup services or setting up their own backups of their data.
Because ARCH has finite resources, increases to storage quotas are granted on an as-needed basis, at the discretion of ARCH staff, based on system conditions, available space, and need. Before requesting an increase in storage, please remove any data from the system that is no longer needed.
If you have a project requiring temporary storage, please reach out to ARCH staff to discuss your needs.
Filesystems At a Glance
| Filesystem | System Type | Total Size | Block Size | Default Quota | Files per TB | Backed Up? |
| /scratch4/ | IBM GPFS | 3.8 PB | 4 MB | 10 TB | 2,000 | No |
| /scratch16/ | IBM GPFS | 3.6 PB | 16 MB | N/A | 1,000 | No |
| /data/ | IBM GPFS | 5.1 PB | 16 MB | 20 TB | 400 | No |
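As a rough illustration of the per-TB file limits in the table, the sketch below estimates how many files a group allocation could hold. It assumes the file-count limit scales linearly with the allocation size, which the table implies but does not state outright; the `FILES_PER_TB` mapping and the `max_files` helper are hypothetical names used only for this example.

```python
# Illustrative only: the figures are copied from the table above.
# Assumption: the file-count limit scales linearly with the allocation size.
FILES_PER_TB = {
    "/scratch4/": 2000,
    "/scratch16/": 1000,
    "/data/": 400,
}


def max_files(filesystem: str, allocation_tb: float) -> int:
    """Estimate the file-count limit for an allocation of the given size in TB."""
    return int(FILES_PER_TB[filesystem] * allocation_tb)


if __name__ == "__main__":
    # Default allocations from the table: 10 TB on /scratch4/, 20 TB on /data/.
    print(max_files("/scratch4/", 10))  # 20000
    print(max_files("/data/", 20))      # 8000
```

Under that assumption, a default /data/ allocation holds far fewer files than a default /scratch4/ allocation, so bundling many small files into archives before moving them to /data/ may help stay within the limit.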
- Each user on the Rockfish cluster is provided 50GB of home directory storage through ZFS on Linux.
- The home filesystem uses NVMe SSD drives.
- /home/ is intended for small, frequently used items such as source code and scripts. It must not be used for input/output (I/O, i.e., reading or writing) by jobs and programs.
- Limited file recovery is available for home areas.
- Scratch space is held on ARCH’s high-performance GPFS parallel file system.
- Scratch is intended to be used for staging data which is required/generated by computational processes running on the cluster. This type of data is often referred to as “intermediate data” and should be data that can easily be deleted at the conclusion of a project.
- Files in the scratch filesystems are not backed up and cannot be recovered. If a file is accidentally deleted, purged, or lost in a filesystem crash, it cannot be restored. Files in either scratch filesystem with an access time older than 30 days are automatically purged. It is the user's responsibility to regularly move important files to /data/ or back them up locally.
- Scratch space is available on two different file systems, with 4 MB and 16 MB block sizes.
- /scratch4/ (4 MB block size)
- Groups are given access to a 10T allocation by default.
- If your research involves genomics, bioinformatics, or mechanical engineering, /scratch4/ is likely where you will get the best performance.
- /scratch16/ (16 MB block size)
- Groups are not given a /scratch16/ allocation by default, but can request one if their workflow would be improved by a larger block size.
- /scratch16/ is optimized for sequential IO and streaming data flows.
- If your research involves physics or some areas of chemistry, /scratch16/ is likely where you will get the best performance.
- /data/ (16 MB block size)
- All groups are given access to a 20T allocation by default.
- /data/ is intended to accumulate “high-value” data that may be needed in the future for follow-ups, mining, etc.
- Files on /data/ are not backed up, but are protected with disaster recovery. Files are not snapshotted and thus are not protected against users moving or deleting files.
- PIs interested in backups of their data can contact email@example.com
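Because scratch files with an access time older than 30 days are purged, it can help to list files that are approaching that age before they disappear. The following is a minimal sketch, not an ARCH-provided tool: the `files_at_risk` function and the 25-day warning threshold are assumptions made for illustration, and the directory to scan is whatever path your group has been allocated.

```python
import os
import stat
import sys
import time
from pathlib import Path
from typing import Iterator, Optional

PURGE_AGE_DAYS = 30      # purge threshold stated in the scratch policy
SECONDS_PER_DAY = 86400


def files_at_risk(root: str, warn_days: int = 25,
                  now: Optional[float] = None) -> Iterator[Path]:
    """Yield regular files under root whose access time is at least warn_days old.

    warn_days is a hypothetical early-warning threshold below PURGE_AGE_DAYS.
    """
    now = time.time() if now is None else now
    cutoff = now - warn_days * SECONDS_PER_DAY
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.lstat()  # reading metadata does not change the access time
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode) and info.st_atime <= cutoff:
                yield path


if __name__ == "__main__":
    # Usage: python scan_scratch.py <directory> [<directory> ...]
    for root in sys.argv[1:]:
        for path in files_at_risk(root):
            print(path)
```

Files reported by the scan that are still needed can then be copied to /data/ or to local storage. Note that walking a very large directory tree puts metadata load on the filesystem, so a scan like this is best run occasionally rather than in a tight loop.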