Current status of Maths computing facilities...

June 14th: all systems and services operational

An ongoing problem all week with the nvidia4 and forrest GPU servers crashing under load appears to be caused by user jobs overloading them.

The StatML GPU server forrest has run out of memory several times, but the planned introduction of a Torque/Maui job management system on a dedicated server to control resources on forrest is expected to resolve these issues.

May 21st: known issues

Current known problems are listed below:

  • Keaveny cluster: lack of on-server disk space on the gehrig system

March 20th, 2020: legacy known issues

As always, hardware faults can occur when systems that have been powered on for years are powered off and then back on again a short while later. The following systems were casualties of the scheduled power-downs on March 3rd last year:

  • fira: this elderly HP workstation in the Stats Linux cluster is awaiting two replacement disks, but since the new madul, midal and model large compute servers, which were introduced in February and March, effectively supersede it, repair of this seldom-used system is not a high priority.

Andy Thomas

Research Computing Manager,
Department of Mathematics

last updated: 14.6.2021