Current news...

2016: Late summer update

Quick summary: more power supplies and much better cooling in the server room, resumption of full services

Unfortunately there's nothing very exciting to report this time - summer has flown by, leaving behind a feeling that nothing substantial has been accomplished in advancing research IT facilities. Cooling problems in the Maths server room dominated much of spring and summer this year, with a lot of time spent continually juggling servers, services and power consumption to keep equipment temperatures at a safe level (we gave up trying to keep the room itself at sensible temperatures for mere humans). At times, this meant curtailing - or even completely halting - availability of some systems and services. However, with half of the compute cluster accommodated in the ICT data centre, we were able to minimise the impact of these shutdowns on the cluster, which has remained at 50% utilisation or higher throughout.

In September, at long last, more power supplies were installed for future use, along with two additional cooling systems. The latter are highly effective, with ducting designed to suck in hot air directly from behind the racks, chill it and dump a large mass of cold air into the centre of the room, from which the servers draw it in. We can now devote a lot more time to IT instead of watching thermometers and responding to overheat alerts!

Currently in October, work is under way on the following initiatives:

  • adding 2 TB of fast local storage to each GPU cluster

  • moving the job accounting/tracking databases used by the compute cluster website to a separate and much faster database server, which will speed up the website and possibly allow us to reintroduce some features we once had back in the early days, when there were only a few thousand jobs in the system

  • building more resilience into the server room infrastructure, by upgrading and adding mirrored/failover systems and better facilities for remote management

  • migrating the Stats Hadoop cluster internal network from 1 gigabit to 10 gigabit optical fibre

  • upgrading the cluster to cluster-backup link to 10 gigabit to speed up server mirroring

