GM4 Cluster Overview

This section of the documentation provides an overview of how the GM4 cluster is organized.

Partitions

The GM4 cluster consists of 29 4-way GPU nodes and 4 CPU-only nodes, for a total of 33 nodes. The nodes are accessible through two partitions, gm4 and gm4-pmext. The gm4-pmext partition includes all 33 nodes and is accessible to PI groups affiliated with the Pritzker School of Molecular Engineering (PME). The gm4 partition contains a subset of 23 nodes (19 GPU nodes and the 4 CPU-only nodes) and is accessible to non-PME GM4 participants.

Partition Name   Users              Nodes                               Node List
gm4-pmext        PME users          29 GPU nodes and 4 CPU-only nodes   midway2-[0631-0663]
gm4              non-PME GM4 users  19 GPU nodes and 4 CPU-only nodes   midway2-[0641-0663]

All non-PME users submit jobs to the gm4 partition, whereas PME users submit jobs to the gm4-pmext partition, as illustrated in the sketch below.
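
As a concrete illustration, the sbatch header below sketches a single-GPU job submitted to the gm4 partition (PME users would substitute gm4-pmext). The job name, resource counts, module name, and executable are placeholders chosen for this example, not required values.

    #!/bin/bash
    #SBATCH --job-name=gm4-example        # placeholder job name
    #SBATCH --partition=gm4               # non-PME users; PME users use gm4-pmext
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=10            # illustrative CPU count
    #SBATCH --gres=gpu:1                  # request one GPU on a 4-way GPU node
    #SBATCH --time=12:00:00               # must stay within the QOS wall-time limit

    module load cuda                      # assumes a CUDA module is available on GM4
    srun ./my_gpu_application             # placeholder executable

Because no --qos flag is given, this job runs under the default gm4 QOS described in the next section.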

Slurm Quality of Service (QOS)

The fair-share use of GM4 resources is managed through the Slurm scheduler's Quality of Service (QOS) settings. The QOS determines the type of resource a job may request, GPU or CPU-only, and whether the job is a production or debug run. Three QOS are available, and their settings are identical for the gm4 and gm4-pmext partitions. The resource limits for each QOS are defined as follows:

QOS Name: gm4 (default if no QOS specified)

Per User Settings
  Max Wall Time:    1-12:00:00
  QOS Priority:     10000
  Max Running Jobs: 28
  Max Jobs Submit:  28
  Max CPUs:         320
  Max GPUs:         32

Per Account Settings
  Max Jobs Submit:  48
  Max CPUs:         320
  Max GPUs:         32

QOS Name: gm4-cpu

Per User Settings
  Max Wall Time:    1-12:00:00
  QOS Priority:     10000
  Max Running Jobs: 28
  Max Jobs Submit:  28
  Max CPUs:         320
  Max GPUs:         N/A

Per Account Settings
  Max Jobs Submit:  56
  Max CPUs:         320
  Max GPUs:         N/A

QOS Name: gm4-debug

Per User Settings
  Max Wall Time:    0-30:00:00
  QOS Priority:     100000
  Max Running Jobs: 1
  Max Jobs Submit:  1
  Max CPUs:         40
  Max GPUs:         4

Per Account Settings
  Max Jobs Submit:  2
  Max CPUs:         80
  Max GPUs:         8
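
Jobs select a QOS with the --qos flag; if the flag is omitted, the default gm4 QOS applies. The snippets below sketch how the gm4-cpu and gm4-debug QOS might be requested; the resource amounts and time limit shown are illustrative placeholders, not prescribed values.

    # CPU-only production job: pair either partition with the gm4-cpu QOS
    #SBATCH --partition=gm4               # or gm4-pmext for PME users
    #SBATCH --qos=gm4-cpu                 # CPU-only QOS; do not request GPUs
    #SBATCH --ntasks=28                   # illustrative; per-user limit is 320 CPUs

For a short, high-priority test, an interactive session under the gm4-debug QOS could look like:

    srun --partition=gm4 --qos=gm4-debug --gres=gpu:1 --time=00:30:00 --pty /bin/bash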