Frequently Asked Questions

General Topics:

The Rockfish cluster is a resource available to researchers from Johns Hopkins University, Morgan State University, and XSEDE. Data that is subject to restrictions such as HIPAA/PHI is NOT allowed. If your research involves an IRB and the data is de-identified, please contact our help system for additional information.

Rockfish uses “ColdFront” to allow PIs and users to easily request and manage allocations and user accounts.

NOTE: As of April 15, 2022, all “active” PIs using the Bluecrab cluster have a startup allocation on Rockfish (50,000 hours). Use this trial access to benchmark your codes and gather the information needed to submit a proposal by November 2022.

All users get a 50GB HOME directory. This directory is backed up once a week.

All groups will get a 10TB allocation on the parallel (GPFS) file system (see below for the file system organization)

HOME directories are backed up once a week to an off-site location. Backup policies for other file systems are TBD.

ssh [-XY] login.rockfish.jhu.edu -l userid

The userid for most users is the JHED ID (for example, jcombar1).
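
For example, a user with JHED jcombar1 would log in with:

ssh -Y login.rockfish.jhu.edu -l jcombar1

The -Y flag enables trusted X11 forwarding and is only needed for graphical applications.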

This work was carried out at the Advanced Research Computing at Hopkins (ARCH) core facility (rockfish.jhu.edu), which is supported by the National Science Foundation (NSF) grant number OAC-1920103.

Add any other funding agencies and grant numbers as appropriate.

The base system for rockfish.jhu.edu was deployed using a grant from the National Science Foundation (NSF). In addition to providing HPC and data-intensive computing resources to accomplish the projects described in the MRI proposal, the cluster provides the common infrastructure for other research groups to add resources (condos), to increase compute capacity, and to gain access to larger allocations.


Videos:

PIs and users should log in to ColdFront, create an account, request allocations, and add user accounts to allocations. An allocation is linked to the PI and its users, but it has no resources; resources will be added after the Advanced Computing Committee approves the required proposal (see Allocations below). This video describes the process to create accounts, request allocations, request user accounts, and designate a proxy.

Allocations:

  1. PIs need to submit a short proposal.
  2. PIs may request three types of allocations: regular, large memory, and GPU.
  3. Rockfish also provides “Startup” allocations for new research groups to become familiar with the environment, run benchmarks, and have a better basis to submit proposals. This is a one-time proposal. Please send an email to help@rockfish.jhu.edu requesting access to the GPU, LM, or regular compute nodes.
  • To find out group utilization: “sbalance -a group-name”
  • To find out user utilization: “test-sbalance -u $USER”

Job scheduling and management (SLURM):

Type the command “sinfo -s” to get a list of the partitions/queues.

“sinfo -p partition-name” will display the utilization for that partition.

Use the “interact” command to request an interactive session; run “interact -usage” for help.

“sinfo -s”  or  “sinfo -p name-of-partition”

 

Rockfish has a limited number (10) of large-memory nodes. These nodes should be used ONLY if the job needs more than 192GB of memory. PIs will be given an allocation (PI-userid_bigmem) to be used only when submitting jobs to the LM nodes.

Likewise, Rockfish has a limited number (10) of GPU nodes.  Each node has 48 cores and 4 A100 GPUs. PIs should request an allocation (PI-userid_a100)  for the GPU nodes. All jobs submitted to this queue will use this allocation.  Each GPU is associated with 12 cores.
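
A minimal batch-script sketch for a single-GPU job, assuming the GPU partition is named “a100” (verify the real partition name with “sinfo -s”); the job name and commands are illustrative:

#!/bin/bash
#SBATCH --job-name=gpu-test
#SBATCH -A PI-userid_a100        # the GPU allocation described above (replace PI-userid)
#SBATCH -p a100                  # partition name is an assumption; check sinfo -s
#SBATCH --gres=gpu:1             # request one A100 GPU
#SBATCH --cpus-per-task=12       # 12 cores are associated with each GPU
#SBATCH --time=01:00:00

nvidia-smi                       # confirm the GPU is visible, then launch your code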

Users who belong to multiple groups or have different Slurm allocations (for example, regular memory, GPU, and bigmem) need to use the Slurm flag (#SBATCH -A account-name) to select the Slurm account they want to use. For example, to use a second PI's allocation (PI = johnDoe1):

#SBATCH -A johndoe1
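
For context, a minimal sketch of a complete batch script using that flag (job name, resources, and the final command are illustrative):

#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH -A johndoe1              # charge this job to the second PI's allocation
#SBATCH --ntasks=1
#SBATCH --time=00:30:00

srun hostname                    # replace with your actual command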

You will need to run the “sqme” command to find out the JobID and the node(s) where your job is running. For example: JobID=123456789, Node=c001.

Type at the prompt:

srun --jobid=123456789 -w c001 --pty /bin/bash    (note the double dashes before “jobid” and “pty”)

Help:

Submit a ticket to help@rockfish.jhu.edu. Include a detailed description of the problem, your userid, and a screenshot if possible.

Basic commands:

(Intel compilers)  ifort/icc   -xHOST -O3 -o code.x code.f90  [other flags]

(GNU compilers)  gcc/gfortran -O3 -march=native -mtune=native  [or -march=cascadelake on the compute nodes]
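
For example, a complete GNU compile line for the same source file used in the Intel example above:

gfortran -O3 -march=native -mtune=native -o code.x code.f90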

Windows 10 users can get a shell on Windows by installing the “Linux Bash Shell” (link).

 

ssh [-XY] login.rockfish.jhu.edu -l userid  [-p 22]   (items in brackets are optional)

Data Transfer:

DTNs (Data Transfer Nodes) are a set of dedicated nodes for file transfer. These servers are GlobusConnect endpoints and should be used to transfer large amounts of data (hundreds of GBs or more).

The Rockfish endpoint is “Rockfish User data”.

  • Use the GlobusConnect endpoint
  • Request a GlobusConnect account
  • Log in to your GlobusConnect account
  • Select the endpoints (for example, MARCC or Rockfish)
  • Authenticate to your endpoints
  • Select the file(s) to transfer
  • Start the file transfer
 
If you need to transfer many (thousands of) small files:
  • Compress them into a tar file of at least 100GB in size. This will give better performance and will not ‘break’ the data transfer node. For example: “tar -zcvf junk.tgz JUNK”. This command compresses all the files in directory JUNK into the compressed file junk.tgz.
  • Follow the same process as above

Please note that if you have terabytes of data to move, the DTNs will give better performance if you split the data into several chunks instead of one big file (see the sketch below).
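
A minimal sketch of chunking with the standard “split” utility (the file and directory names and the 200GB chunk size are illustrative):

tar -zcvf bigdata.tgz BIGDIR                  # archive the directory
split -b 200G bigdata.tgz bigdata.tgz.part-   # split into ~200GB chunks

# after transferring the chunks, reassemble and extract at the destination:
cat bigdata.tgz.part-* > bigdata.tgz
tar -zxvf bigdata.tgz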

  • Aspera can be used from any login node, but if you are planning to transfer large amounts of data we strongly recommend you use the Data Transfer Nodes (rfdtn1 or rfdtn2).
  • module load Aspera-Connect
  • ascp -T -l8G -i /data/apps/extern/Aspera-Connect/4.1.1/etc/asperaweb_id_dsa.openssh anonftp@ftp.ncbi.nlm.nih.gov:gene/DATA /scratch4/jcombar1
  1. -T: do not encrypt.
  2. -l8G: maximum transfer rate of 8G (8000 Mbps).
  • Download FileZilla (web search)
  • Install FileZilla (local machine)
  • Launch FileZilla. Your local machine's files and folders should be visible on the left side
  • Click on the top left “icon” or click File-> Site Manager. A new window pops up
  • Click on New site and name it “Rockfish”
  • Click on “General”
  • Host: rfdtn1.rockfish.jhu.edu Port 22   (or rfdtn2,  rfdtn3  for HORNet connectivity) 
  • Protocol: SFTP – SSH File Transfer Protocol   (select)
  • Logon Type: Interactive (select)
  • User: Your Rockfish userid  (for example: jdoe12345)  (Type)
  • Password: Leave blank (recommended)
  • Click on “Transfer Settings”
  • Select “Limit number of simultaneous connections” and set it to 1
  • Click on “Connect”
  • You should be connected. Rockfish files and folders should be visible on the right side
  • Select and drag files/folders
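
If you prefer the command line, the same SFTP service that FileZilla uses is reachable with the standard sftp client (the userid is an example):

sftp jdoe12345@rfdtn1.rockfish.jhu.edu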

Scripts:

Users and PIs may want to find out user and group utilization for the current quarter by using the script “user-sbalance”. This script will report the utilization for the individual user ($USER); if $USER is the PI, it will report utilization for the whole group. For example:
drcomb1> user-sbalance    ## the user in this case is drcomb1
Allocation:               Userid:
jaimecomb                        4.3 / 5000.0       0.1%
    drcomb1                      2.7                0.1%
jaimecomb_gpu                  212.3 / 50000.0      0.4%
    drcomb1                      0.8                0.0%
jaimecomb_bigmem                 0.0 / 10000.0      0.0%
    drcomb1                      0.0                0.0%
Optionally, “user-sbalance -g” will display information for all members of the group.

“quotas.py” can be used to find out how much data a group has on Rockfish file systems. It provides utilization and quotas as well as the number of files. If a user belongs to several groups, the information will be displayed per group.

Example:

drcomb1>  quotas.py

Home Directory Usage for user drcomb1:

Used      Quota      Percent   Files
5.34 GB   50.00 GB   10.68%    153,396

Quota Usage for Group jaimecomb:

FS          Used        Quota      Used %   Files     Files Quota   Files %
data        6.52 GB     1.00 TB    0%       4,061     409,600       0%
scratch4    470.68 GB   10.00 TB   4.00%    12,140    20,971,520    0%
scratch16   1.24 TB     10.00 TB   12.00%   75,121    10,485,760    0%