Ganymede2

Ganymede2, the successor cluster to Ganymede, is (like its predecessor) named after Jupiter's largest satellite and built on the same condo model. Ganymede2 is presently in a mid-to-late "beta" stage and thus has only a handful of dedicated compute nodes available to all UT Dallas researchers; the large majority of the hardware that makes up Ganymede2 is owned by individual researchers.

For information about purchasing nodes to add to Ganymede2, email circ-assist@utdallas.edu.

Although Ganymede2 is primarily owned by individual researchers, the system has "preempt" queues (cpu-preempt and gpu-preempt) that accept job submissions from all Ganymede2 users. Preempt jobs are heavily de-prioritized relative to jobs from the queue's owner, so any workload submitted to these queues should be treated as volatile and should make heavy use of checkpointing.

When a preempt job is preempted, that job is killed immediately and forcefully. If data isn’t being constantly saved to an output file, DATA LOSS WILL OCCUR.
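As a sketch of how a preempt job can protect itself, the batch script below assumes Ganymede2 uses the Slurm scheduler and that a termination signal may be delivered before the kill (whether any grace period exists depends on the cluster's preemption settings). The names my_solver, state.tmp, and checkpoint.dat are hypothetical placeholders for your own application and its restart files.

#!/bin/bash
#SBATCH --job-name=preempt-example
#SBATCH --partition=cpu-preempt
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=7-00:00:00
#SBATCH --requeue                 # let the scheduler requeue the job after preemption
#SBATCH --signal=B:TERM@120       # request SIGTERM 120 s before the kill, if a grace period is granted

# If a termination signal arrives, flush one last checkpoint before exiting.
checkpoint_and_exit() {
    echo "Preemption signal received, writing final checkpoint..." >&2
    cp state.tmp checkpoint.dat   # hypothetical restart file
    exit 0
}
trap checkpoint_and_exit TERM

# Run the hypothetical solver in the background so the trap can fire promptly,
# and rely on its own periodic restart files as the primary safety net.
srun ./my_solver --restart-from checkpoint.dat &
wait

Because preempted jobs on Ganymede2 may be killed without any warning, periodic checkpointing from inside the application, not the signal trap, is the part to rely on.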

Ganymede2 node setup

Ganymede2, unlike its predecessor, allows multiple jobs per node. Nodes can be in a "mixed" state, which indicates that the node is currently processing multiple jobs at once. On GPU nodes with multiple GPUs, individual GPUs can be assigned to different jobs, or in some cases each GPU can run multiple jobs at the same time.
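As an illustration of node sharing, the sketch below (again assuming a Slurm scheduler with GPUs exposed as a gres resource) requests a single GPU and only part of the node's cores and memory, leaving the remaining GPUs and cores available to other jobs. The script name train.py and the specific core and memory counts are arbitrary examples.

#!/bin/bash
#SBATCH --job-name=single-gpu-example
#SBATCH --partition=gpu-preempt
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8   # only part of the node's cores; the rest stay schedulable
#SBATCH --mem=32G           # only part of the node's memory
#SBATCH --gres=gpu:1        # one GPU; the node's other GPUs can serve other jobs
#SBATCH --time=1-00:00:00
#SBATCH --requeue

srun python train.py        # hypothetical workload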

The following partitions are available to all users:

Queue Name | Number of nodes | Cores/Threads (CPU Architecture) | Memory | Time Limit ([d-]hh:mm:ss) | GPUs? | Use Case
dev | 2 | 64/128 (Ice Lake) | 256 GB | 2:00:00 | No | Code debugging, job submission testing
normal | 4 | 64/128 (Ice Lake) | 256 GB | 2-00:00:00 | No | Normal code runs, CPU only
cpu-preempt | 8 | Various | Various | 7-00:00:00 | No | Volatile CPU job submission
gpu-preempt | 6 | Various | Various | 7-00:00:00 | Yes (various types) | Volatile GPU job submission
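Assuming the scheduler is Slurm, the partitions above and their current limits can also be checked from the login node; the format string below prints the partition, node count, CPUs per node, memory, time limit, and GPUs (gres) for each queue.

sinfo -p dev,normal,cpu-preempt,gpu-preempt -o "%P %D %c %m %l %G"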

Ganymede2 storage

Ganymede2 has multiple user-writable storage directories, accessible from the login node (ganymede2.circ.utdallas.edu) and the compute nodes:

Directory | Filesystem Type | Network Speed | Filesystem Size | User Quota (Soft/Hard) | Backup Frequency
/home | NFS | 25 gigabit/s | 500 GB[1] | None[2] | None[3]
/mfs/io/groups | MooseFS | 10 gigabit/s | Varies | Varies | Nightly
/scratch | WekaFS | 200 gigabit/s | 20 TB[4] | None | None

/home

/home on Ganymede2 is not in its final form; it still needs to be expanded and to have quotas enforced. The CIRC team is presently working toward this goal and will update the documentation here to reflect the final details. As on the original Ganymede, /home is recommended for scripts, runfiles, and smaller output files. Please do not run jobs out of /home, as the filesystem and network can be easily saturated, degrading the experience for other users; MPI jobs read and write a lot of data, so even a single multi-node MPI job can slow the /home filesystem drastically.

Recall from the chart above that /home is NOT BACKED UP. This is temporary; a more robust backup solution will be implemented at a later date.
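A common pattern, sketched below under the assumptions that the scheduler is Slurm and that users may create their own directories under /scratch, is to keep scripts and small inputs in /home, do the heavy I/O in /scratch, and copy results to group storage when the job finishes. The names config.in, simulate, and the group directory are hypothetical placeholders.

#!/bin/bash
#SBATCH --job-name=scratch-staging
#SBATCH --partition=normal
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=1-00:00:00

# Stage into the fast scratch filesystem instead of running out of /home.
WORKDIR=/scratch/$USER/$SLURM_JOB_ID
mkdir -p "$WORKDIR"
cp ~/inputs/config.in "$WORKDIR"/         # hypothetical input kept in /home
cd "$WORKDIR"

srun ./simulate config.in > results.out   # hypothetical program

# Copy results to the nightly-backed-up group storage; the group path varies by lab.
cp results.out /mfs/io/groups/<your-group>/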

Using Ganymede2

Logging in

Ganymede2 is accessed via SSH. Once your account is activated, you can connect to Ganymede2 at ganymede2.circ.utdallas.edu. For example, in a typical terminal client, run the command:

ssh <NetID>@ganymede2.circ.utdallas.edu

More information on setting up SSH access to CIRC machines on your computer can be found here.
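As an optional convenience, an entry like the following in ~/.ssh/config on your own computer shortens the login command; replace <NetID> with your UT Dallas NetID.

Host ganymede2
    HostName ganymede2.circ.utdallas.edu
    User <NetID>

With that entry in place, logging in becomes:

ssh ganymede2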

Submitting jobs

Information on submitting jobs to Ganymede2 can be found here.
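Assuming Ganymede2 uses Slurm (as the sketches above do), the basic job lifecycle from the login node looks like the following; myjob.sh is a hypothetical batch script and <jobid> is the numeric ID printed by sbatch.

sbatch myjob.sh      # submit a batch script to the scheduler
squeue -u $USER      # list your pending and running jobs
scancel <jobid>      # cancel a job by its ID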


1. This filesystem was created alongside Ganymede2 and has not yet been scaled to full production size.
2. There are no quotas on /home; however, all Ganymede2 users are asked to keep their usage below 50 GB. A proper /home filesystem with quotas will be added at a later date.
3. This is not a mistake: /home on Ganymede2 is not backed up. It is highly recommended to keep important files off of the /home filesystem and on dedicated group storage.
4. The WekaFS was purchased to replace Petastore on Ganymede. Its addition to Ganymede2 is limited until the filesystem can be expanded.