HPC Vocabulary
Using High Performance Computing (HPC) resources involves learning the language commonly used to describe experiment setups, resources, cluster configuration, etc. Here are some commonly used terms.
Experiment Terms
- Job: A single unit of work submitted to the job scheduler, for example a script or program run on one or more nodes.
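As a concrete illustration, the Python sketch below assembles the command that would submit a one-line workload as a Slurm job (Slurm being the scheduler UT Dallas largely uses). The helper name `build_sbatch_command`, the partition name `normal`, and the script `analyze.py` are hypothetical. The `--job-name`, `--partition`, `--nodes`, `--ntasks`, `--time`, and `--wrap` options are standard `sbatch` options, but which partitions and limits exist depends on the cluster. The sketch only prints the command, so it runs without access to a cluster.

```python
import shlex


def build_sbatch_command(command, job_name="example-job", partition=None,
                         nodes=1, ntasks=1, time_limit="00:10:00"):
    """Return the argv list that would submit `command` as a Slurm batch job.

    Hypothetical helper: the job is described entirely by sbatch options,
    and --wrap turns a single shell command into a batch script so no
    separate script file is needed.
    """
    argv = [
        "sbatch",
        f"--job-name={job_name}",    # label shown in the queue
        f"--nodes={nodes}",          # number of nodes requested
        f"--ntasks={ntasks}",        # number of tasks (processes) to launch
        f"--time={time_limit}",      # wall-clock limit, HH:MM:SS
    ]
    if partition is not None:
        # Partition names are site-specific; any name passed here is an assumption.
        argv.append(f"--partition={partition}")
    argv.append(f"--wrap={command}")
    return argv


if __name__ == "__main__":
    cmd = build_sbatch_command("python analyze.py input.csv", partition="normal")
    # Print a shell-quoted version instead of running it, so this works off-cluster.
    print(shlex.join(cmd))
```

On a cluster, passing the list to `subprocess.run(cmd, check=True)` would submit the job, and `sbatch` would report the ID assigned to it. For workloads longer than one command, a batch script file is usually clearer than `--wrap`.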
Resource Terms
- Cluster: A group of computers (called nodes) linked together with an internal network (called an interconnect).
- Condo: A purchasing model in which a researcher buys their own compute hardware, subject to approval from the cluster operator, and the operator installs and manages it as part of the cluster. Condo nodes are reserved for the purchaser and are generally available only to their research group.
- Interconnect: The internal network that allows cluster nodes to communicate with one another.
- Node: One of the computers that make up an HPC cluster. A job running on an HPC cluster can often use more than one node.
- Job Scheduler: The software that manages access to the cluster's computing resources, deciding when and where each job runs. UT Dallas largely uses Slurm.
- Partition: A group of nodes with constraints imposed on it (e.g., which users are allowed to run jobs there).
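- Processor: Often used interchangeably with core, although strictly speaking a single processor (CPU) chip usually contains several cores. Each core can execute instructions independently, which is what allows parallel programs to run multiple tasks at the same time.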
- GPU: Graphics Processing Unit. Dedicated hardware for highly parallel processing.
- Queue: The list of jobs in a particular partition that are currently running or waiting for resources.
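To show how jobs, nodes, partitions, and the queue relate, here is a deliberately simplified Python model of one partition. It is a toy, not how Slurm works: the class and method names (`Job`, `Partition`, `submit`, `schedule`, `finish`) are hypothetical, the only partition constraint modeled is a set of allowed users, and jobs start in strict first-in, first-out order whenever enough nodes are free.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: int
    user: str
    nodes_needed: int


@dataclass
class Partition:
    name: str
    total_nodes: int
    allowed_users: set[str]                                # the partition's constraint
    queue: deque[Job] = field(default_factory=deque)       # waiting jobs, oldest first
    running: list[Job] = field(default_factory=list)       # jobs currently holding nodes

    @property
    def free_nodes(self) -> int:
        return self.total_nodes - sum(j.nodes_needed for j in self.running)

    def submit(self, job: Job) -> None:
        """Add a job to the queue if the partition's constraints allow it."""
        if job.user not in self.allowed_users:
            raise PermissionError(f"{job.user} may not use partition {self.name}")
        if job.nodes_needed > self.total_nodes:
            raise ValueError(f"job {job.job_id} needs more nodes than {self.name} has")
        self.queue.append(job)

    def schedule(self) -> list[int]:
        """Start queued jobs in FIFO order while the job at the head fits."""
        started = []
        while self.queue and self.queue[0].nodes_needed <= self.free_nodes:
            job = self.queue.popleft()
            self.running.append(job)
            started.append(job.job_id)
        return started

    def finish(self, job_id: int) -> None:
        """Mark a running job as done, releasing its nodes."""
        for job in self.running:
            if job.job_id == job_id:
                self.running.remove(job)
                return
        raise KeyError(job_id)


if __name__ == "__main__":
    part = Partition("example", total_nodes=4, allowed_users={"alice", "bob"})
    part.submit(Job(1, "alice", 2))
    part.submit(Job(2, "bob", 3))
    part.submit(Job(3, "alice", 1))
    print(part.schedule())  # [1]: job 2 needs 3 nodes, only 2 are free, so it blocks job 3
    part.finish(1)
    print(part.schedule())  # [2, 3]
```

In the first round, job 3 would have fit on the two idle nodes but waits behind job 2. Production schedulers such as Slurm avoid this by ordering jobs by priority and backfilling smaller jobs into gaps when doing so does not delay higher-priority work.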