Best Practices

Guidelines to optimize resource utilization on Rockfish

One of the hardest tasks on HPC is estimating job parameters (memory, time limit, number of cores, etc.): request too much and resources are wasted; request too little and the job aborts when it hits a limit. This guide may help many new users avoid both.

1. Try to understand whether your application runs in serial mode (a single process) or uses multiple processes, either via threads or MPI libraries. This is important: OpenMP (i.e., threaded) applications run only within a single node, while MPI-based codes can use many nodes.
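One quick way to check is to inspect the libraries a binary links against (a sketch; the name my_app is a placeholder for your executable):

ldd $(which my_app) | grep -Ei 'mpi|gomp|iomp|pthread'

If MPI libraries show up, the code can run across nodes; matches only for OpenMP or pthread libraries suggest it is limited to a single node.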

2. If the application uses threads (for example, MATLAB or Gaussian), try to determine the best number of threads to use, as sketched below. If the application uses all available cores by default, resources may be wasted and the job may actually take longer to complete.
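For OpenMP codes, one way to pin the thread count to what the job actually requested (a minimal sketch; ./my_threaded_app is a hypothetical binary):

#SBATCH --cpus-per-task=8

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # use exactly the cores requested
./my_threaded_app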

3. Run some benchmarks, ideally short ones (1-2 hours) with a small number of processes/cores. We recommend using the "interact" command to request an interactive session. Keep in mind:

3.1 For the parallel queue, each node has 48 cores, and each core is associated with roughly 4 GB of memory.

3.2 The memory available to a job is determined by the number of tasks (--ntasks-per-node, "n") it requests. For example, if "n" is set to 1, the whole job will have a maximum of 4 GB of memory; if "n" is set to 2, the maximum is 8 GB.

3.3 If you have a serial job (one that creates just a single running process) but it needs more than 4 GB of memory, requesting 2 cores will double the amount of memory available to the job, as in the sketch below.
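A minimal batch header for that serial case (a sketch; ./my_serial_app is a placeholder):

#SBATCH -N 1
#SBATCH --ntasks-per-node=2

./my_serial_app   # still one process, but with roughly 8 GB available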

4. Check job performance: 

4.1 Users can connect to the nodes where their jobs are running and check on them with tools like top, or 'gpustat' for GPU jobs. You will need the job ID and the node where it is running; see the example below. If you are running a GPU code, "ml nvitop; nvitop" gives a good picture of GPU utilization while the job runs.
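For example (job ID 12345 and node c001 are placeholders):

# find the node(s) where your job is running
squeue -u $USER

# open a shell on that node inside the job's allocation
srun --jobid=12345 -w c001 --pty /bin/bash

# on the node: top for CPU jobs, or for GPU jobs:
ml nvitop
nvitop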

4.2 Even if the job completes without errors such as out-of-time or out-of-memory, we strongly recommend checking its performance with "seff JobID". For example:

seff 12345
Job ID: 12345
Cluster: slurm
User/Group: User/PI
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 8
CPU Utilized: 23:35:58
CPU Efficiency: 11.94% of 8-05:38:48 core-walltime
Job Wall-clock time: 1-00:42:21
Memory Utilized: 3.90 GB
Memory Efficiency: 12.99% of 30.00 GB

This job requested 8 cores but shows a CPU efficiency of only 12% and a memory efficiency of 13%: about 23.6 core-hours of CPU were used over a 24.7-hour wall clock, i.e., on average less than one core was busy. This user could request a maximum of 2 cores per job and free up roughly 75% of the allocated core-hours without slowing the job down.

5. Tips for GPU jobs
For the a100 partition, each GPU is mapped to 12 cores. If your job will be using ONE GPU, do not request more than 12 cores. Each core is also mapped to about 3.89 GB of memory, so the memory per job is determined by the number of cores it requests:

#SBATCH -N 1
#SBATCH --ntasks-per-node=12
#SBATCH --gres=gpu:1

This snippet will give you access to one GPU, 12 cores, and about 46 GB of memory.

Keep in mind that if you request more than 12 cores (for example, 13), your job will automatically be assigned TWO GPUs; if the second GPU is not in use, you will be wasting valuable resources. If your code can truly use two GPUs, request them explicitly, as below.
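A two-GPU request that keeps the 12-cores-per-GPU mapping (a sketch):

#SBATCH -p a100
#SBATCH -N 1
#SBATCH --ntasks-per-node=24
#SBATCH --gres=gpu:2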

 

Follow the same guidance for the ica100 partition, but note that there each GPU is mapped to 16 cores, as in the sketch below.
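A one-GPU request on ica100 would then look like (a sketch following the same pattern):

#SBATCH -p ica100
#SBATCH -N 1
#SBATCH --ntasks-per-node=16
#SBATCH --gres=gpu:1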