Current status of Maths computing facilities...

November 24th: nvidia1 out of service, all other systems and services operational

October 23rd: NextGen cluster upgrade

The major upgrade of the NextGen cluster is nearly complete. All nodes with hardware faults, with one exception, have now been fixed and are back in operation, and all but one of the compute nodes have been fully updated to Ubuntu 20.04 LTS along with the latest R packages, etc. One node is still running existing user jobs that have over 2 months left to run. It has been marked offline and will not accept any fresh jobs, so that it can be taken out of service when these jobs complete and then be updated as well. In addition, 8 nodes have been replaced with new servers, adding another 256 processors to the cluster and boosting its capacity by over 50%.

July 20th: known issues

Current known problems are listed below:

Although nvidia1 is now back up again, it can no longer communicate with the three GPUs it hosts owing to a low-level failure between the server and the GPU card cage. This is likely to take some time to resolve, but it is not a high-priority failure: because the server is fitted with 2nd generation GPU cards, it is now seldom used, so the fault will be investigated later.

March 20th, 2020: legacy known issues

As always, hardware faults can occur when systems that have been powered on for years are powered off and then back on again a short while later. The following system was a casualty of the scheduled power-downs on March 3rd last year:

  • fira: this elderly HP workstation in the Stats Linux cluster is awaiting two replacement disks. However, since the new large compute servers madul, midal and model, introduced in February and March, effectively supersede it, repairing this seldom-used system is not a high priority.

Andy Thomas

Research Computing Manager,
Department of Mathematics

last updated: 24.11.2021