2017: Early summer update

Hardware upgrades for the GPU cluster, replacement of the calculus server

Upgrades for nvidia1 and nvidia2

The memory available on the GPU cluster host servers nvidia1 and nvidia2 is no longer sufficient for the larger jobs now being run on them, so both servers fall back on disk swap space, which is much slower.

A rolling upgrade of both servers has started. nvidia1 is being upgraded first, from 24 GB to 48 GB of memory, and the two original 470 watt power supplies in the blade enclosure have been replaced with 1100 watt units to meet the increased power demand from the memory upgrades and from the additional storage disks that were added recently.

nvidia2 will be upgraded in the same way later this month.

calculus replacement

The "instant extra filespace" facility is provided by the calculus server, which has started to suffer from an intermittent hardware problem with its disk controller. The server will be replaced by a new, higher-spec machine in the week beginning June 4th.

Important information

Finally, we have started planning ahead for the eventual shutdown of the ICT data centre in the City & Guilds building over the next 18-24 months and its migration to a new facility in Slough. We have over 30 servers hosted in ICT. Of these, 24 are Maths compute cluster nodes, which will probably move to Huxley 616 later this year along with a non-rackmount storage server. The rest will move to Slough, with hardware upgrades where necessary to support full remote management, including "bare metal" installs (for example, the ability to install an operating system remotely from media in South Kensington without going anywhere near Slough).

Some private rackmount systems currently in Huxley 616 may have to move to Slough, although this is not certain. If it is not possible to install an operating system remotely, or if the owner is unwilling to pay for upgrades to allow this or to replace the server with one that has full remote media support, then that server will have to stay in the Maths server room. The main issue with accommodating more systems in the Maths server room is that no more college network connections can be installed on Huxley level 6, since the racks in the network wiring cupboard opposite the south lifts are now full. An influx of another 25 systems into Huxley 616 will therefore potentially be a problem.

For this reason we will have to move nearly all Maths compute cluster nodes onto a private dedicated network. Direct access to a particular node from the college network will then no longer be possible, although the node will still be accessible from a designated head node. This is the norm on HPC clusters and should not inconvenience users much. It also has the real advantage of faster cluster performance, since cluster network traffic will not be mixed with non-compute traffic on the general college network.
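As a rough illustration of what access through a head node could look like, the sketch below builds an ssh command that uses OpenSSH's -J (ProxyJump) option to reach a compute node in one step. The user name and both host names are hypothetical placeholders, and it is only an assumption that the head node will permit jumping through it in this way; the actual arrangements have not been decided.

```python
def ssh_via_head_node(user, head_node, compute_node):
    """Build an ssh command that reaches compute_node by jumping through head_node.

    Uses the -J (ProxyJump) option available in OpenSSH 7.3 and later.
    All names passed in are supplied by the caller; none are real Maths hosts.
    """
    return ["ssh", "-J", f"{user}@{head_node}", f"{user}@{compute_node}"]


# Hypothetical user and host names, for illustration only.
command = ssh_via_head_node("jbloggs", "headnode.example.org", "node07")
print(" ".join(command))
```

The command is built as a list of arguments so that it could be passed straight to subprocess.run without any shell quoting concerns.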

