Current status of Maths computing facilities...

October 15th: nvidia1 out of service, all other systems and services operational

There are currently no known problems with Maths systems or services affecting users, other than those reported below. A couple of servers need their RAID backup batteries replacing; these are still on order but have been considerably delayed by the current difficult supply conditions.

October 15th: NextGen cluster upgrade

The major upgrade of the NextGen cluster is nearly complete. All nodes with hardware faults have now been fixed (with one exception) and are back in operation, and all but 2 of the compute nodes have been fully updated to Ubuntu 20.04 LTS along with the latest R packages, etc. Of the remaining nodes, two are still running existing user jobs started long ago; they have been off-lined and will not run any fresh jobs, so that they can be taken out of service and updated as and when these jobs complete in the next few weeks or so. Another 8 nodes have been replaced with new servers, adding another 256 processors to the cluster and boosting its capacity by over 50%.

July 20th: known issues

Current known problems are listed below:

Although nvidia1 is now back up again, it can no longer communicate with the three GPUs it hosts, owing to some low-level failure between the server and the GPU card cage. This is likely to take some time to resolve. It is not a high-priority failure: the server is fitted with 2nd generation GPU cards and is now seldom used, so the fault will be investigated later.

March 20th, 2020: legacy known issues

As always, hardware faults can occur when systems that have been powered on for years are powered off and then back on again a short while later. The following systems were casualties of the scheduled power-downs on March 3rd last year:

  • fira: this elderly HP workstation in the Stats Linux cluster is awaiting two replacement disks, but since the new madul, midal and model large compute servers, which were introduced in February and March, effectively supersede it, repair of this seldom-used system is not a high priority.

Andy Thomas

Research Computing Manager,
Department of Mathematics

last updated: 15.10.2021