Frequently Asked Questions

General Topics:

The Rockfish cluster is a resource available to researchers from Johns Hopkins University, Morgan State University, and XSEDE. Data that is subject to restrictions such as HIPAA/PHI is NOT allowed. If your research involves an IRB and the data is de-identified, please contact our help system for additional information.

Rockfish uses “ColdFront” to allow PIs and users to easily request and manage allocations and user accounts.

NOTE: As of April 15, 2022, all “active” PIs using the Bluecrab cluster have a startup allocation on Rockfish (50,000 hours). Use this trial access to benchmark your codes and gather the information needed to submit a proposal by November 2022.

All users get a 50GB HOME directory. This directory is backed up once a week.

All groups will get a 10TB allocation on the parallel (GPFS) file system (see below for the file system organization)

HOME directories are backed up once a week to an off-site location. Backup policies for other file systems are TBD.

ssh [-XY] login.rockfish.jhu.edu -l userid

The userid for most users is the JHED ID (for example, jcombar1).
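
For example, a user with JHED jcombar1 would log in with:

ssh -Y login.rockfish.jhu.edu -l jcombar1

The -Y flag enables trusted X11 forwarding and is only needed for graphical applications.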

This work was carried out at the Advanced Research Computing at Hopkins (ARCH) core facility (rockfish.jhu.edu), which is supported by the National Science Foundation (NSF) grant number OAC-1920103.

Add any other funding agencies and grant numbers as appropriate.

The base system for rockfish.jhu.edu was deployed using a grant from the National Science Foundation (NSF). In addition to providing HPC and data-intensive computing resources to accomplish the projects described in the MRI proposal, the cluster provides the common infrastructure for other research groups to add resources (condos), to increase compute capacity, and to gain access to larger allocations.


Videos:

PIs and users should log in to ColdFront, create an account, request allocations, and add user accounts to allocations. An allocation is linked to the PI and its users, but it has no resources; resources will be added after the Advanced Computing Committee approves the required proposal (see Allocations below). This video describes the process to create accounts, request allocations, request user accounts, and designate a proxy.

Allocations:

  1. PIs need to submit a short proposal.
  2. PIs may request three types of allocations: regular, large memory, and GPU.
  3. Rockfish also provides “Startup” allocations for new research groups to become familiar with the environment, run benchmarks, and have a better basis to submit proposals. This is a one-time proposal. Please send an email to help@rockfish.jhu.edu requesting access to the GPU, LM, or regular compute nodes.
  • To find out group utilization: “sbalance -a group-name”
  • To find out user utilization: “test-sbalance -u $USER”

Job scheduling and management (SLURM):

Type the command “sinfo -s” to get a list of the partitions/queues.

“sinfo -p partition-name” will display the utilization for that partition.

Use the “interact” command to request an interactive session; run “interact -usage” for help.

“sinfo -s”  or  “sinfo -p name-of-partition”

 

Rockfish has a limited number (10) of large-memory nodes. These nodes should be used ONLY if the job needs more than 192GB of memory. PIs will be given an allocation (PI-userid_bigmem) to be used only when submitting jobs to the LM nodes.

Likewise, Rockfish has a limited number (10) of GPU nodes.  Each node has 48 cores and 4 A100 GPUs. PIs should request an allocation (PI-userid_a100)  for the GPU nodes. All jobs submitted to this queue will use this allocation.  Each GPU is associated with 12 cores.
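
A minimal batch-script sketch for a single-GPU job, assuming the GPU partition is named “a100” (verify the real partition name with “sinfo -s”); the job name and commands are illustrative:

#!/bin/bash
#SBATCH --job-name=gpu-test
#SBATCH -A PI-userid_a100        # the GPU allocation described above (replace PI-userid)
#SBATCH -p a100                  # partition name is an assumption; check sinfo -s
#SBATCH --gres=gpu:1             # request one A100 GPU
#SBATCH --cpus-per-task=12       # 12 cores are associated with each GPU
#SBATCH --time=01:00:00

nvidia-smi                       # confirm the GPU is visible, then launch your code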

Users who belong to multiple groups or have different Slurm allocations (for example, regular memory, GPU, and bigmem) need to use the Slurm flag (#SBATCH -A account-name) to select the Slurm account they want to use. For example, to use a second PI's allocation (PI = johnDoe1):

#SBATCH -A johndoe1
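
For context, a minimal sketch of a complete batch script using that flag (job name, resources, and the final command are illustrative):

#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH -A johndoe1              # charge this job to the second PI's allocation
#SBATCH --ntasks=1
#SBATCH --time=00:30:00

srun hostname                    # replace with your actual command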

You will need to run the “sqme” command to find out the JobID and the node(s) where your job is running. For example: JobID=123456789, Node=c001.

Type at the prompt:

srun --jobid=123456789 -w c001 --pty /bin/bash    (note the double dashes before “jobid” and “pty”)

Help:

Submit a ticket to help@rockfish.jhu.edu. Include a detailed description of the problem, your userid, and a screenshot if possible.

Basic commands:

(Intel compilers)  ifort/icc   -xHOST -O3 -o code.x code.f90  [other flags]

(GNU compilers)  gcc/gfortran -O3 -march=native -mtune=native  [or -march=cascadelake on the compute nodes]
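
For example, a complete GNU compile line for the same source file used in the Intel example above:

gfortran -O3 -march=native -mtune=native -o code.x code.f90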

Windows 10 users can get a shell on Windows by installing the “Linux Bash Shell” (link).

 

ssh [-XY] login.rockfish.jhu.edu -l userid  [-p 22]   (items in brackets are optional)

Data Transfer:

DTNs (Data Transfer Nodes) are a set of dedicated nodes for file transfer. These servers are GlobusConnect endpoints and should be used to transfer large amounts of data (hundreds of GBs or more).

The Rockfish endpoint is “Rockfish User data”.

  • Use the GlobusConnect endpoint
  • Request a GlobusConnect account
  • Log in to your GlobusConnect account
  • Select the endpoints (for example, MARCC or Rockfish)
  • Authenticate to your endpoints
  • Select the file(s) to transfer
  • Start the file transfer
 
If you need to transfer many (thousands of) small files:
  • Compress them into a tar file of at least 100GB in size. This will give better performance and will not ‘break’ the data transfer node. For example: “tar -zcvf junk.tgz JUNK”. This command compresses all the files in directory JUNK into the compressed file junk.tgz.
  • Follow the same process as above

Please note that if you have terabytes of data to move, the DTNs will give better performance if you split the data into several chunks instead of one big file (see the sketch below).
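
A minimal sketch of chunking with the standard “split” utility (the file and directory names and the 200GB chunk size are illustrative):

tar -zcvf bigdata.tgz BIGDIR                  # archive the directory
split -b 200G bigdata.tgz bigdata.tgz.part-   # split into ~200GB chunks

# after transferring the chunks, reassemble and extract at the destination:
cat bigdata.tgz.part-* > bigdata.tgz
tar -zxvf bigdata.tgz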

  • Aspera can be used from any login node, but if you are planning to transfer large amounts of data we strongly recommend you use the Data Transfer Nodes (rfdtn1 or rfdtn2).
  • module load Aspera-Connect
  • ascp -T -l8G -i /data/apps/extern/Aspera-Connect/4.1.1/etc/asperaweb_id_dsa.openssh anonftp@ftp.ncbi.nlm.nih.gov:gene/DATA /scratch4/jcombar1
  1. -T: do not encrypt.
  2. -l8G: maximum transfer rate of 8G (8000 Mbps).
  • Download FileZilla (web search)
  • Install FileZilla (local machine)
  • Launch FileZilla. Your local machine's files and folders should be visible on the left side
  • Click on the top left “icon” or click File-> Site Manager. A new window pops up
  • Click on New site and name it “Rockfish”
  • Click on “General”
  • Host: rfdtn1.rockfish.jhu.edu Port 22   (or rfdtn2,  rfdtn3  for HORNet connectivity) 
  • Protocol: SFTP – SSH File Transfer Protocol   (select)
  • Logon Type: Interactive (select)
  • User: Your Rockfish userid  (for example: jdoe12345)  (Type)
  • Password: Leave blank (recommended)
  • Click on “Transfer Settings”
  • Select “Limit number of simultaneous connections” and set it to 1
  • Click on “Connect”
  • You should be connected. Rockfish files and folders should be visible on the right side
  • Select and drag files/folders
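
If you prefer the command line, the same SFTP service that FileZilla uses is reachable with the standard sftp client (the userid is an example):

sftp jdoe12345@rfdtn1.rockfish.jhu.edu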

Scripts:

Users and PIs may want to find out user and group utilization for the current quarter by using the script “user-sbalance”. This script will report the utilization for the individual user ($USER); if $USER is the PI, it will report utilization for the whole group. For example:
drcomb1> user-sbalance    ## the user in this case is drcomb1
Allocation:               Userid:
jaimecomb                        4.3 / 5000.0       0.1%
    drcomb1                      2.7                0.1%
jaimecomb_gpu                  212.3 / 50000.0      0.4%
    drcomb1                      0.8                0.0%
jaimecomb_bigmem                 0.0 / 10000.0      0.0%
    drcomb1                      0.0                0.0%
Optionally, “user-sbalance -g” will display information for all members of the group.

“quotas.py” can be used to find out how much data a group has on Rockfish file systems. It provides utilization and quotas as well as the number of files. If a user belongs to several groups, the information will be displayed per group.

Example:

drcomb1>  quotas.py

Home Directory Usage for user drcomb1:

Used      Quota      Percent   Files
5.34 GB   50.00 GB   10.68%    153,396

Quota Usage for Group jaimecomb:

FS          Used        Quota      Used %   Files     Files Quota   Files %
data        6.52 GB     1.00 TB    0%       4,061     409,600       0%
scratch4    470.68 GB   10.00 TB   4.00%    12,140    20,971,520    0%
scratch16   1.24 TB     10.00 TB   12.00%   75,121    10,485,760    0%