Titan
Titan, named after the moon of Saturn, is a Cyberinfrastructure Research Computing (CIRC) cluster built for providing access to GPUs.
Titan is currently slated for migration into Ganymede2, the next iteration of the Ganymede campus condo cluster. As such, we are no longer offering buy-ins to Titan; new buy-ins are offered through Ganymede2.
Titan node setup
Titan is configured so that multiple jobs can run on one node, depending on the GPUs and memory specified. The following nodes are available in the normal queue:
| Node Name | GPU Types | Number of GPUs | Cores | Memory |
| --- | --- | --- | --- | --- |
| compute-[01-02, 04-08] | NVIDIA GeForce RTX 3090 | 4 | 8 | 193 GB |
| compute-03 | NVIDIA GeForce RTX 3090 | 8 | 16 | 386 GB |
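If you want to confirm this layout from a login session, Slurm can report it directly. The sketch below assumes the normal queue corresponds to a partition named normal; adjust the partition name if it differs for your account.

# Show each node in the normal partition with its GPUs (GRES), CPU count, and memory (in MB)
sinfo -N -p normal -o "%N %G %c %m"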
Using Titan
Logging in
Titan is accessed via SSH. Once your account is activated, you can connect to Titan at titan.circ.utdallas.edu. For example, in a typical terminal client, run the command:
ssh <NetID>@titan.circ.utdallas.edu
More information on setting up SSH access to CIRC machines on your computer can be found here.
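If you connect often, you can also add a host entry to your local SSH configuration so you do not have to type the full hostname each time. This is a minimal sketch; the alias titan is illustrative, and <NetID> is a placeholder for your own NetID.

# Example ~/.ssh/config entry (replace <NetID> with your NetID)
Host titan
    HostName titan.circ.utdallas.edu
    User <NetID>

With this entry in place, ssh titan opens the same connection.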
Requesting resources for jobs
Titan, like most of the CIRC systems, uses Slurm for job submission and scheduling. However, Titan has a few special requirements due to its configuration.
GPUs are considered Slurm Generic Resources. Slurm defines several convenience settings for configuring GPU access for your jobs. For example, if your job needs 4 CPUs, 2 GPUs, and 16 GB of memory on one node, your Slurm batch script should include:
#SBATCH -N 1
#SBATCH -n 4
#SBATCH --gpus=2
#SBATCH --mem=16G
In order for jobs to share nodes, request only the cores, GPUs, and memory your job actually needs (as in the example above) rather than a whole node's worth of resources.
Similarly, for an interactive job, run
srun -N 1 -n 4 --gpus=2 --mem=16G --pty /bin/bash
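Putting these options together, a complete batch script might look like the sketch below. The job name, time limit, and script name are placeholders for illustration, not Titan requirements.

#!/bin/bash
#SBATCH -J gpu-example        # job name (placeholder)
#SBATCH -N 1                  # one node
#SBATCH -n 4                  # four CPU cores
#SBATCH --gpus=2              # two GPUs
#SBATCH --mem=16G             # 16 GB of memory
#SBATCH -t 01:00:00           # example one-hour time limit

# Replace with the commands your job actually runs
python <your_python_script.py>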
Using containers
Many GPU codes are distributed via Docker containers. Docker is not allowed on CIRC systems due to security issues. However, Docker containers can be used by running them with Apptainer/Singularity. For example, you can use the TensorFlow Docker container from DockerHub with the following commands:
# Loads the Singularity Module
module load singularity
# Pull the TensorFlow Docker container and transform
# it into a Singularity sandbox
singularity build --sandbox tensorflow_sandbox/ docker://tensorflow/tensorflow
# Run your Python script with the tensorflow container
singularity run -u --nv tensorflow_sandbox python <your_python_script.py>
Using Singularity on Titan requires passing the -u flag, which runs the container in a user namespace, and the --nv flag, which makes the node's GPUs visible inside the container.
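To run the same container non-interactively, the container commands can go inside a batch script. This is a sketch only; the resource amounts and script name are illustrative.

#!/bin/bash
#SBATCH -N 1
#SBATCH -n 4
#SBATCH --gpus=1
#SBATCH --mem=16G

# Load Singularity and run your script inside the sandbox built above
module load singularity
singularity run -u --nv tensorflow_sandbox python <your_python_script.py>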
Available software
You can view all available modules on Titan by running the command module spider. If you need new software installed or a different version than is provided, please contact circ-assist@utdallas.edu.
Troubleshooting on Titan
The following scenarios commonly come up on Titan. If your problem isn’t listed below, please contact circ-assist@utdallas.edu for help.
- My code isn’t using all requested GPUs

Some code requires MPI, with one GPU per task, in order to use multiple GPUs. If this matches your code's setup, the following Slurm batch script settings assign one GPU and 4 GB of memory per MPI task (a complete example script follows below):
#SBATCH -N 1
#SBATCH -n 2
#SBATCH --gpus-per-task=1
#SBATCH --mem-per-gpu=4G
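A minimal batch script built from those directives is sketched below; <your_mpi_program> is a placeholder for your own MPI-enabled executable.

#!/bin/bash
#SBATCH -N 1
#SBATCH -n 2                  # two MPI tasks
#SBATCH --gpus-per-task=1     # one GPU per task
#SBATCH --mem-per-gpu=4G      # 4 GB of memory per GPU

# srun launches one copy of the program per task, each with its own GPU
srun <your_mpi_program>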