Advance warning of scheduled systems work and downtime

There is a fair amount of internal work going on in the Maths server room at the moment, but every effort will be made to do this as transparently as possible to users. The only noticeable event might be the absence of silo2 while it is upgraded in December, but this is unlikely to affect many users and alternative servers will continue to be available.

November: Stats systems storage & network upgrades

More local storage is being added this month to the two main compute servers in the Statistics section (modal and medial), and a dedicated internal network is being introduced to link the other Stats compute systems to them. This will enable faster access to home directories on both modal and medial from all Stats compute systems. In addition, the existing small Netflow LAN is being extended to reach all Stats Linux systems.

Since these systems are heavily used, this work is being done in stages, working around user requirements to keep downtime to a minimum. medial has already had 10 TB of extra storage added, while modal is awaiting a suitable opportunity for its additional disks to be fitted.

November: extended internal backup LAN

With the introduction of silo3 and its mirror, and the recent addition of an LTO-6 tape drive to the replicant3 tape backup server, more systems are being connected to the latter for automated tape backups using the Bacula client-server backup system. Until now, remote client systems have been backed up via their college network connections, but this is set to change: backups will be done over an existing small-scale internal network, which will be considerably expanded to connect more systems. This will result in faster backups and also free up bandwidth on the college network connections.

The enhanced separate backup network will also make it easier to manage and monitor mirror servers which do not have (nor need) a college network connection of their own. This fits in with our policy of conserving relatively scarce college network connections in the Maths server room by connecting only those systems that really need to be directly accessible from the college network. It also reduces unnecessary traffic on the college network and improves security for sensitive data held on some systems.

November to January: silo2 and mirror updates

silo2 and its mirror silo2-backup have been in operation since 2012, and their disk pools still use 512-byte logical sectors even though the disks themselves are the newer AF (Advanced Format) type with 4096-byte physical sectors. Operating system support for native 4k sector sizes on AF disks was incomplete at the time this server was introduced, and as a result the read/write I/O performance is not as good as it could be. Changing the logical sector size of an existing ZFS pool is not trivial: it involves destroying the old pool, recreating it with a new configuration and then restoring all the data to it from a backup. It is the restore operation that takes the time, especially when a lot of data is involved: restoring 10 TB of data over a standard gigabit network connection takes 5-6 days.
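
As a rough check on that restore figure, the short Python sketch below works out the average throughput implied by moving 10 TB in 5-6 days and compares it with the raw line rate of gigabit Ethernet. It assumes decimal terabytes (10^12 bytes) and ignores protocol overhead; the function names are illustrative only and are not part of any tool used on these servers.

```python
TB = 10**12               # assumption: decimal terabytes, in bytes
SECONDS_PER_DAY = 86_400
GIGABIT_LINE_RATE_MB_S = 125.0  # 1 Gbit/s = 125 MB/s, before any protocol overhead


def effective_rate_mb_s(data_bytes, days):
    """Average throughput (MB/s) needed to move data_bytes in the given number of days."""
    return data_bytes / (days * SECONDS_PER_DAY) / 10**6


def transfer_days(data_bytes, rate_mb_s):
    """Days needed to move data_bytes at a sustained rate of rate_mb_s MB/s."""
    return data_bytes / (rate_mb_s * 10**6) / SECONDS_PER_DAY


slow = effective_rate_mb_s(10 * TB, 6)   # 10 TB restored in 6 days
fast = effective_rate_mb_s(10 * TB, 5)   # 10 TB restored in 5 days
best_case = transfer_days(10 * TB, GIGABIT_LINE_RATE_MB_S)

print(f"Implied restore rate: {slow:.1f} to {fast:.1f} MB/s")
print(f"Time at full gigabit line rate: {best_case:.2f} days")
```

The implied rate of roughly 19 to 23 MB/s is well below the 125 MB/s the link could carry in principle, which suggests that the restore is limited by something other than the network alone, such as disk I/O or per-file overhead. That interpretation is an inference from the arithmetic, not a measurement.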

With the introduction of a separate backup network and revised power distribution arrangements, the opportunity is now being taken to upgrade the operating systems on both silo2 and silo2-backup and, at the same time, to rebuild the disk pools to use native 4k sectors. silo2-backup has already been upgraded and its pool rebuilt using 4k sectors, but transferring around 9.5 TB of user data to it from silo2 will take some time. Once this has been completed, silo2 will be taken out of service and upgraded in the same way, and the user data will be transferred back to it from silo2-backup. The necessary outage for silo2 will be announced well in advance.
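
To illustrate why matching the pool's sector size to the disks matters, here is a simplified Python sketch. ZFS records a pool's sector size as the ashift property, the base-2 logarithm of the sector size in bytes, so 512-byte sectors correspond to ashift 9 and 4096-byte sectors to ashift 12. The functions below are hypothetical helpers written for this note, not part of ZFS, and the model ignores caching and write aggregation.

```python
def ashift_for(sector_bytes):
    """Return log2(sector_bytes), the value ZFS stores as ashift."""
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a power of two")
    return sector_bytes.bit_length() - 1


def physical_sectors_touched(offset, length, physical=4096):
    """Count the physical sectors overlapped by a write of `length` bytes at byte `offset`."""
    first = offset // physical
    last = (offset + length - 1) // physical
    return last - first + 1


def needs_read_modify_write(offset, length, physical=4096):
    """True if the write covers only part of at least one physical sector."""
    return offset % physical != 0 or length % physical != 0


print("ashift for 512-byte sectors: ", ashift_for(512))
print("ashift for 4096-byte sectors:", ashift_for(4096))

# A 4096-byte write placed on a 512-byte boundary straddles two physical sectors,
# so the disk has to read, modify and rewrite both of them.
print(physical_sectors_touched(512, 4096), needs_read_modify_write(512, 4096))

# The same write aligned to a 4096-byte boundary maps onto exactly one sector.
print(physical_sectors_touched(4096, 4096), needs_read_modify_write(4096, 4096))
```

With 512-byte logical sectors the pool is free to place writes on boundaries that do not line up with the 4096-byte physical sectors, which is the situation in the first example. A pool built with 4k sectors keeps its writes aligned, as in the second.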

No other work is scheduled for the foreseeable future.

Andy Thomas

Research Computing Manager,
Department of Mathematics

last updated: 3.1.2019