Maths GPU clusters and servers

Currently there are a number of GPU facilities available to Maths users. Some are used by individual research groups and will not be detailed here, but there are four generally accessible facilities: two separate clusters plus two stand-alone GPU servers. Each cluster has its own host server; the host servers are blades installed in the same chassis and share the same PCI-express expansion chassis, which physically accommodates the GPU cards. Both clusters are identical except for the GPU cards installed.

Programs you run on the clusters may be either pre-compiled binaries that have been built and linked on another compatible GPU system, or programs you have written yourself (or obtained as source code from others) as CUDA source files and compiled with the nvcc compiler. By convention, CUDA source files have the suffix .cu but may contain a mix of C, C++ and CUDA statements; nvcc uses the system's gcc compiler to generate non-GPU object code when necessary, switching automatically to the nVidia PTX compiler for GPU object code.
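As a rough illustration of the paragraph above, the following shell sketch writes a tiny CUDA source file and compiles it with nvcc. The file name hello.cu, the scratch location and the kernel itself are made up for this example, and the compile step is simply skipped if nvcc is not on your PATH.

```shell
# Hypothetical scratch location; any writable directory will do.
src="${TMPDIR:-/tmp}/hello.cu"

# Write a small .cu file mixing ordinary C/C++ host code with a CUDA kernel.
cat > "$src" <<'EOF'
#include <cstdio>

// Device code: each thread doubles one element of the array.
__global__ void double_elements(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;
}

// Host code: nvcc passes this part to the system's gcc.
// Error checking of the CUDA calls is omitted to keep the sketch short.
int main()
{
    const int n = 8;
    float host[n];
    for (int i = 0; i < n; ++i)
        host[i] = (float)i;

    float *dev = NULL;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
    double_elements<<<1, n>>>(dev, n);
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; i < n; ++i)
        printf("%d -> %.1f\n", i, host[i]);
    return 0;
}
EOF

# Compile and run only if the CUDA toolkit is available on this machine.
if command -v nvcc >/dev/null 2>&1; then
    nvcc -o "${src%.cu}" "$src" && "${src%.cu}"
else
    echo "nvcc not found; log in to one of the GPU servers first"
fi
```

The single nvcc command is all that is needed here: the host and device parts of the file are separated and compiled behind the scenes.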

Getting started

Access to the GPU clusters and the stand-alone servers is remote, via ssh. To begin with, you need an account on one or both of the host servers; simply email Andy Thomas requesting an account. Once this is set up, the account details will be mailed to you. The password is randomly generated and you are strongly encouraged to change it when you log in for the first time, using the 'passwd' utility and following the prompts.

Before you start writing and compiling your own CUDA programs, you might want to look at some examples; you'll find a comprehensive selection of ready-to-compile programs in /usr/local/cuda/samples. A script is provided on nvidia3 and nvidia4 (and a differently named one on nvidia1 and nvidia2) to make a writable copy of these read-only examples in your own home directory so that you can compile and run your own versions. Running the script with the target directory as its argument, here ~/my_samples,

will copy the entire set of examples to a directory called my_samples/NVIDIA_CUDA-11.0_Samples in your home directory. Once you have done this, you can explore the examples and if you want to build and run the binary, just change into the directory containing your chosen example and type 'make'. For example, deviceQuery is a useful utility that displays the characteristics of each GPU card attached to the server so to compile and run your own copy of this, do the following:

cd ~/my_samples/NVIDIA_CUDA-11.0_Samples/1_Utilities/deviceQuery
make
./deviceQuery

The utility should report that it has found three GPUs on nvidia1 (two on nvidia2 and nvidia3, eight on nvidia4) and provide a detailed listing of the features of each of them.

nvcc does have a man page on the server but it's not very useful since it just lists the main nVidia CUDA utilities with very little information on their usage. You'll find a selection of nVidia documentation in PDF format right here on this server and you can also access nVidia's own online documentation for full information on the CUDA Toolkit.

Checking the status of the GPUs

If you want to find out what all the GPU cards are doing, use the nvidia-smi utility. Typing 'nvidia-smi' with no parameters produces a summary of their status as shown below:

[Screenshot: output from the nvidia-smi command]

which shows both GPUs in nvidia2 fully loaded although only using about 20% of the total available memory; the PIDs and names of the processes running on the host server are also listed and normal Linux utilities such as 'ps ax' can be used to find further information on these.
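To follow up on the processes that nvidia-smi lists, something along these lines may help. It is a sketch only: the helper name describe_pids is invented here, and it assumes the installed nvidia-smi supports the --query-compute-apps option and that ps is the usual Linux (procps) version.

```shell
# Hypothetical helper: read PIDs (one per line) on stdin and show the owner,
# elapsed running time and command line of each process.
describe_pids() {
    while read -r pid; do
        [ -n "$pid" ] || continue
        ps -o pid=,user=,etime=,args= -p "$pid" 2>/dev/null \
            || echo "$pid: no such process"
    done
}

# Feed it the PIDs of the processes currently using the GPUs,
# if nvidia-smi is present on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-compute-apps=pid --format=csv,noheader | describe_pids
fi
```

The helper can also be used by hand, for example by piping a single PID copied from the nvidia-smi summary into it with echo.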

Typing 'nvidia-smi -q' produces a very detailed status report for all GPUs in the system, but this can be limited to a given GPU of interest with the -i N option, where N is the GPU identifier (0, 1 or 2 for nvidia1, 0 or 1 for nvidia2, and so on). For example, the command

nvidia-smi -q -i 1

will show the full information for GPU 1 only. Unlike most other nVidia CUDA programs, nvidia-smi has extensive man page documentation although many of the available options are reserved for the root user since they affect the operation of the GPU card.
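Between the brief summary and the full -q report, nvidia-smi can also print selected fields as CSV, which is convenient in scripts. The sketch below assumes the --query-gpu and --format options of reasonably recent drivers; the helper name summarise_gpus and the wording of its output are made up for illustration.

```shell
# Hypothetical helper: turn CSV lines of the form
#   index, name, utilisation %, memory used MiB, memory total MiB
# into a one-line summary per GPU.
summarise_gpus() {
    awk -F', *' '{ printf "GPU %s (%s): %s%% busy, %s of %s MiB in use\n", $1, $2, $3, $4, $5 }'
}

# Query the real cards only if nvidia-smi is present on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total \
               --format=csv,noheader,nounits | summarise_gpus
fi
```

Running 'nvidia-smi --help-query-gpu' lists the field names that can be requested in this way.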

Are disk quotas imposed on the GPU cluster servers?

No, but as with all Maths systems, disk usage is continuously monitored. Those who have used a large proportion of the available home directory storage will be asked to move data to one of the silo storage servers or to clustor or clustor2, or to delete unwanted data.

Is user data on the cluster servers backed up?

Yes, all four servers are mirrored daily to our onsite backup servers which in turn are mirrored to the Maths offsite servers in Milton Keynes and Slough.

What about job scheduling and fair usage controls?

Job queueing and resource management are not used on the GPU clusters or the stand-alone servers at present because, unlike the Maths compute cluster in the past, fair usage and contention for resources have not been a problem with the GPU facilities. Also, it is very difficult to implement traditional HPC-style cluster job management on GPU cards because there is no low-level interface to the core and memory resources on any given GPU card, although it is possible to control use of entire GPU cards. With the present small-scale clusters used by a small group of regular users, it is currently not worth implementing any form of job control.
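In the absence of a scheduler, one courteous habit is to pick a lightly loaded card yourself and confine your program to it. The sketch below is an assumption about how you might do this, not a site policy: the helper name least_busy_gpu is invented, and the approach relies on the CUDA runtime honouring the CUDA_VISIBLE_DEVICES environment variable.

```shell
# Hypothetical helper: read "index, utilisation" CSV lines on stdin and print
# the index of the GPU with the lowest utilisation.
least_busy_gpu() {
    sort -t, -k2,2n | head -n 1 | cut -d, -f1
}

# Choose a card only if nvidia-smi is present on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
    gpu=$(nvidia-smi --query-gpu=index,utilization.gpu \
                     --format=csv,noheader,nounits | least_busy_gpu)
    # CUDA programs started from this shell will now see only that one card.
    export CUDA_VISIBLE_DEVICES="$gpu"
    echo "Using GPU $CUDA_VISIBLE_DEVICES"
fi
```

Utilisation is only a snapshot, so it is still worth checking nvidia-smi again once your job is running to confirm that it is not competing with someone else's.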

About the GPU clusters

The host servers nvidia1 and nvidia2 are blade servers fitted into a Dell C6100 chassis, with each server separately connected via iPASS links to a Dell C410x PCI-express expansion chassis which is capable of housing up to 16 GPU cards. The chassis is configured so that 8 GPU card bays connect to one server and the other 8 bays to the other server, although not all of the bays are populated with GPU cards. The servers each have two 2.67 GHz quad-core Xeon CPUs and 72 GB of main memory.

nvidia3 is a SuperMicro GR1027GR-72R2 GPU server that can accommodate up to 3 double-width GPU cards, although only two are fitted at present. Two 2.5 GHz quad-core CPUs are fitted and 64 GB of memory is available.

GPUs fitted into nvidia4

nvidia4 is a large Tyan FT77DB7109 server fitted with eight nVidia GeForce RTX 2080Ti GPUs (pictured), two 16-core 2.8 GHz Xeon CPUs, 1.5 TB of memory and 14 hard disks. Two of these are fast SAS disks arranged as a mirrored pair for the system, while the other 12 form an XFS disk pool with a total usable capacity of 22 TB, one disk being reserved as a 'hot spare'.

Andy Thomas

Research Computing Manager
Department of Mathematics

last updated: 9.08.2020