Current news...

2017: mid-June update

Hardware upgrades for the GPU clusters, calculus replaced

nvidia3 upgraded

nvidia3 was upgraded this week with an additional 2 TB of local disk storage, an operating system update to Ubuntu 16.04 LTS, the latest NVIDIA CUDA 8.0 development toolkit, cuDNN 5.1 and the TensorFlow libraries. It is available for use now.

upgrades for nvidia1 and nvidia2

The memory available on the GPU cluster host servers nvidia1 and nvidia2 is no longer sufficient for the larger jobs currently being run on them, so both servers fall back on disk swap space, which is much slower.
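As a rough illustration of how one might check whether a Linux host has started swapping, the sketch below parses text in the format of /proc/meminfo and works out how much swap is in use. The function names are hypothetical and this is not part of any tooling installed on the cluster.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field name: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that do not look like "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def swap_used_kb(info):
    """Swap in use, in kB, taken as SwapTotal minus SwapFree."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def report(path="/proc/meminfo"):
    """Read the given meminfo file (Linux only) and return swap in use in kB."""
    with open(path) as f:
        return swap_used_kb(parse_meminfo(f.read()))
```

On a Linux host, calling report() returns the swap currently in use in kB; a value that stays well above zero while a job is running suggests the job does not fit in memory.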

A rolling upgrade of both servers has started, with nvidia1 upgraded first from 24 GB to 48 GB of memory. The two original 470 watt power supplies in the blade enclosure have also been replaced with 1100 watt units to meet the increased power demand from the memory upgrades and from the additional storage disks that were added recently.

nvidia2 will be upgraded in the same way later this month, after June 19th.

calculus replaced

The calculus "instant extra filespace" server has been replaced by a new HP MicroServer Gen8 machine with twice as much installed memory. The new machine uses the disk set from the old one, which was suffering from intermittent disk controller problems.

Important information

Finally, we have started planning ahead for the eventual shutdown of the ICT data centre in the City & Guilds building over the next 18-24 months and its migration to a new facility in Slough. We have over 30 servers hosted in ICT. Of these, 24 are Maths compute cluster nodes, which will probably move to Huxley 616 later this year along with a non-rackmount storage server. The rest will move to Slough, with hardware upgrades where necessary to support full remote management, including "bare metal" installs (for example, the ability to install an operating system remotely from media in South Kensington without going anywhere near Slough).

Some private rackmount systems currently in Huxley 616 may have to move to Slough, although this is not certain. If it is not possible to install an operating system on a server remotely, or if the owner is unwilling to pay for upgrades to allow this or to replace the server with one that has full remote media support, then that server will have to stay in the Maths server room. The main issue with accommodating more systems in the Maths server room is that no more college network connections can be installed on Huxley level 6, since the racks in the network wiring cupboard opposite the south lifts are now full. An influx of another 25 systems into Huxley 616 will therefore potentially be a problem.

For this reason we will have to move nearly all Maths compute cluster nodes onto a private dedicated network. Direct access to a particular node from the college network will then no longer be possible, although the node will still be accessible from a designated head node. This is the norm on HPC clusters and should not greatly inconvenience users. It also has the real advantage of faster cluster performance, since cluster network traffic will not be mixed with non-compute traffic on the general college network.
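To give an idea of what reaching a node through a head node could look like in practice, here is a minimal sketch that builds an ssh command using the ProxyJump option (-J) of OpenSSH 7.3 and later. The function name, user name and host names are hypothetical placeholders, and the actual head node and access method have not yet been announced.

```python
def ssh_via_head_node(user, head_node, compute_node):
    """Return an ssh argument list that reaches compute_node through head_node.

    Relies on OpenSSH's -J (ProxyJump) option, available in OpenSSH 7.3 and later.
    All names passed in are supplied by the caller; none are real cluster hosts.
    """
    jump = "%s@%s" % (user, head_node)
    target = "%s@%s" % (user, compute_node)
    return ["ssh", "-J", jump, target]
```

The returned list can be passed to subprocess.call(), for example subprocess.call(ssh_via_head_node("alice", "head.example", "node01")), which is equivalent to typing ssh -J alice@head.example alice@node01 at a shell prompt.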

Older news items:

June 2nd, 2017: Early summer update
April 20th, 2017: Spring update 2
March 22nd, 2017: Early spring update
March 10th, 2017: Winter update 2
February 22nd, 2017: Winter update
November 2nd, 2016: Autumn update
October 21st, 2016: Late summer update 2
October 14th, 2016: Late summer update
February 19th, 2016: Winter update
December 11th, 2015: Autumn update
September 14th, 2015: Late summer update 2
May 2nd, 2015: Spring update 2
April 26th, 2015: Spring update
November 11th, 2014: Autumn update
September 17th, 2014: Summer update 2
July 17th, 2014: Summer update
March 15th, 2014: Spring update
November 2nd, 2013: Summer update
May 24th, 2013: Spring update
January 23rd, 2013: Happy New Year!
November 22nd, 2012: No news is good news...
November 17th, 2011: A revamp for the Maths SSH gateways
September 7th, 2011: Failed systems under repair
August 14th, 2011: Introducing calculus, a new NFS home directory server for research users
July 19th, 2011: a new staging server for the compute cluster
July 19th, 2011: A new Matlab queue and improved queue documentation
June 30th, 2011: Updated laptop backup scripts
June 18th, 2011: More storage for the silo...
June 16th, 2011: Yet more storage for the SCAN...
June 10th, 2011: 3 new nodes added to the Maths compute cluster
May 21st, 2011: Announcing SCAN large storage and subversion (SVN) servers
May 26th, 2011: Reporting missing scratch disk on macomp01
May 21st, 2011: Announcing completion of silo upgrades
May 16th, 2011: Announcing upgrades for silo
April 14th, 2011: Goodbye SCAN 3, hello SCAN 4
March 26th, 2011: quickstart guide to using the Torque/Maui cluster job queueing system
March 9th, 2011: automatic laptop backup/sync service, new collaboration systems launched
May 20th, 2010: Scratch disks are now available on all macomp and mablad compute cluster systems
March 11th, 2010: Introducing job queueing on the Fünf Gruppe compute cluster
October 16th, 2008: Introducing the Fünf Gruppe compute cluster
June 18th, 2008: German City compute farm now expanded to 22 machines
February 7th, 2008: new applications on the Linux apps server, unclutter your desktop
November 13th, 2007: aragon and cathedral now general access computers, networked Linux Matlab installation upgraded to R2007a
September 14th, 2007: Problems with sending outgoing mail for UNIX & Linux users
July 23rd, 2007: SCAN available full-time over the summer vacation, closure of Imperial's Usenet news server
May 15th, 2007: Temporary SCAN suspension, closure of the Maths Physics computer room, new research computing facilities
January 14th, 2005: Exchange mail server upgrade, spam filtering with pine and various other enhancements

Andy Thomas

Research Computing Manager,
Department of Mathematics

last updated: 16.6.2017