The Maths SCAN
LATEST NEWS: SCAN operating full-time on Saturday, May 19th in Huxley 410
SCAN will be operating continuously for 36 hours in Huxley 410 from 8pm Friday, May 18th until 8am on Sunday, May 20th 2018. Access to the room will not be possible during this time but the computers will reboot back to Windows at 8am Sunday and will once again be available for normal use. Alternative PC computing facilities will continue to be available in the Maths Learning Centre nearby, room 414.
What is the Maths SCAN?
The new servers introduced with SCAN 4
- The Maths SuperComputer At Night initiative harnesses the power of many individual PCs to form a supercomputer capable of carrying out very large computational tasks such as Monte Carlo simulation, climate change modelling, etc. If you can imagine a large computer containing not just one or two CPUs (Central Processing Unit, or processors) but 50, 100 or 200 of them plus a huge amount of memory, this is a very good approximation of the Maths supercomputer. Currently, all 25 PCs in Huxley 410 and the 40 PCs in Huxley 215 form this cluster outside of normal college hours - this gives us 65 x 64-bit quad-core CPUs, 260 CPU cores and 520 gigabytes of memory. As these computers would otherwise be idle at night and at weekends and during college holidays, this gives us many megaflops of raw processing power for no real cost.
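The cluster totals quoted above follow directly from the room sizes. The short check below uses only the figures given in the text; the per-PC memory figure is derived, and assumes that memory is spread evenly across all PCs, which the text does not state.

```python
# Figures taken from the paragraph above.
pcs_410 = 25           # PCs in Huxley 410
pcs_215 = 40           # PCs in Huxley 215
cores_per_pc = 4       # one quad-core CPU per PC
total_memory_gb = 520  # total memory quoted for the cluster

total_pcs = pcs_410 + pcs_215
total_cores = total_pcs * cores_per_pc

# Derived value, assuming (not stated in the text) identical memory in every PC.
memory_per_pc_gb = total_memory_gb / total_pcs

print(total_pcs, "PCs,", total_cores, "cores,", memory_per_pc_gb, "GB per PC")
```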
The technology behind the SCAN
- When the Huxley 215 and 410 computer rooms close at the end of the day, the PCs shut down and then reboot as diskless FreeBSD systems, loading and then booting FreeBSD over the network from a remote boot server. The hard disks in the PCs remain untouched and are not used in any way while the systems are running FreeBSD. There are a lot of advantages to setting up large scale computation facilities in this way; one is that it is very easy to control, maintain and upgrade since there is only one operating system image that is used by all of the clients. So instead of having to make changes to, upgrade or even rebuild each machine individually, this work is carried out only on the image served by the network boot server, and the next time the client nodes boot from this, they too will be running the new or changed image. It is of course possible to customise the image and the start-up scripts to some extent so that machines in one group - those in Huxley 215, say - load a different configuration at boot time, for example. And in the current SCAN 5 implementation, much of the booted PCs' live filesystem is hosted on a disk on the boot server, which makes it easy to make immediate 'hot' changes to the operating system that is running on all of the client PCs, tune the kernel while it is running, add new user accounts, etc - previously, a reboot would have been required to load a new operating system image.
- But the real beauty of the system is the almost infinite scalability and ease with which more nodes can be added to the SCAN; anything from a single computer to many thousands can be added simply by enabling booting from the network and adding them to a particular machine group that will access the SCAN boot server. Currently the system operates with 25 nodes in Maths but as many as 160 nodes have been operational in the past, encompassing teaching clusters in the departments of Physics and Chemistry for an experimental period. Unfortunately, the fact that the system does not fit easily into the college scheme of things has largely limited its use to Mathematics although a few non-Maths users have used it.
- Here is a diagrammatic representation of the SCAN (please note: this diagram is out of date as it depicts SCAN 3, not the current SCAN 5).
How does it work?
- All of these machines have Windows XP Professional installed on their local hard disks and operate as normal Windows PCs during the daytime, as required for departmental teaching purposes. At the end of the day when the room is closed to students, the machines shut down automatically and then boot FreeBSD UNIX from the network boot server, running UNIX entirely in RAM (memory) and leaving the machine's own hard disk untouched.
- Each system is essentially an autonomous node but they are all networked together and can communicate with each other and with the controller. So each system could be thought of as a CPU with its own memory attached and is linked to other CPUs and memories in the SCAN via the network.
- The user's compute job resides in the user's home directory on the nolichucky fileserver, and the programs have usually been written in such a way that they know how to divide up the tasks involved and distribute them to each PC in the SCAN for processing. Output from the computations is written to disk files, etc in the usual way in the user's home directory on nolichucky, not the PC's hard disk. There are various ways in which this parallel processing can be implemented - one is to use the MPI (Message Passing Interface) standard, which is fully supported on the SCAN, but some users have written their own low-level network stacks which offer higher performance as the code interacts directly with the network interface rather than through a multi-layer API (application programming interface).
- The computers in the SCAN do not have to be operated as a massively parallel cluster - they can be used individually too. Some tasks may be difficult to code for parallel computation, or it may simply not be worth the time and effort to make a program parallel-capable even though a lot of data needs to be processed as quickly as possible; you can then run multiple instances of that program on some or all of the CPU cores in the SCAN to achieve this.
- Early in the morning of the following day, after the SCAN has worked all night, the cluster shuts down automatically and reboots back into Windows, ready for the room's re-opening to students.
- If you want to learn more about the technical details of the SCAN, you may find this 2 part article published in the UK UNIX Users Group newsletter of some interest: part 1 and part 2.
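The idea of dividing a job into independent pieces, as described above, can be illustrated with a small Monte Carlo example. This is only a sketch and not code that runs on the SCAN: the function names (split_work, count_hits, estimate_pi) are invented for illustration, and the work units are run one after another on a single machine rather than being sent to separate cores or nodes.

```python
import random

def split_work(total_samples, n_workers):
    """Divide total_samples as evenly as possible among n_workers."""
    base, extra = divmod(total_samples, n_workers)
    return [base + (1 if i < extra else 0) for i in range(n_workers)]

def count_hits(n_samples, seed):
    """One independent work unit: count random points that fall inside
    the quarter of the unit circle lying in the unit square."""
    rng = random.Random(seed)  # each unit gets its own seeded generator
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def estimate_pi(total_samples, n_workers):
    """Split the job, run every work unit, then combine the partial results."""
    shares = split_work(total_samples, n_workers)
    # On a cluster each share would run on a different core or node;
    # in this sketch they simply run sequentially.
    hits = sum(count_hits(n, seed) for seed, n in enumerate(shares))
    return 4.0 * hits / total_samples

print(split_work(10, 4))
print(estimate_pi(200000, 8))
```

Because the work units share no data, the sequential loop in estimate_pi could be replaced by separate program instances (one per CPU core) or by an MPI scatter and reduce, without changing count_hits itself.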
At what times is the SCAN operational?
- The SCAN is operational in Huxley 410 every night from 8pm until 5.50am the following morning, except on Wednesday nights, when the systems shut down at midnight instead of at 5.50am on Thursday morning. In Huxley 215, it is operational every night from 11pm until 5.50am the following morning.
- During the Easter, summer and Christmas vacations, room 410 is closed altogether from the end of term and the cluster will be running full time as part of the SCAN over this period. We will get an awful lot of computing done!
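For anyone scheduling jobs, the term-time rules above can be written down as a small helper. This is an illustrative sketch only: the function name scan_is_running is invented here, it encodes just the nightly times stated above, and it does not model vacations (when room 410 runs full time) or one-off events.

```python
from datetime import datetime, time

# Nightly start times per room, as stated in the text above.
EVENING_START = {"410": time(20, 0), "215": time(23, 0)}
MORNING_STOP = time(5, 50)
THURSDAY = 3  # datetime.weekday() numbers Monday as 0

def scan_is_running(room, when):
    """Return True if the SCAN is expected to be running in the given
    room ("410" or "215") at the datetime 'when' during term time."""
    now = when.time()
    if now >= EVENING_START[room]:
        return True  # evening part of the session
    if now < MORNING_STOP:
        # Wednesday's session in 410 ends at midnight, so there is
        # no early-morning session there on Thursday.
        if room == "410" and when.weekday() == THURSDAY:
            return False
        return True
    return False  # daytime: the PCs are running Windows

# Wednesday, May 16th 2018 at 9pm in Huxley 410
print(scan_is_running("410", datetime(2018, 5, 16, 21, 0)))
```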
If room 410 is closed, are there any Windows PCs I can use?
- The Maths Learning Centre (MLC, room 414) houses 66 PCs running Windows 7 which are available to all users during MLC opening hours. In addition, room 409 has a number of PCs running Windows 7 which are available to postgraduates and year 4 undergraduates of the Mathematics department. Finally, the undergraduate common room 212 is home to 8 new HP PCs running Windows 7. These systems are accessible to all Maths users at any time.
- These computers should satisfy the requirement for undergraduate computing facilities during college vacations but do let me know if these reduced general computing facilities cause you any undue hardship or inconvenience.
How powerful is the SCAN?
Percolation code written by Dr Gunnar Pruessner, a researcher in Math Physics, and run on the Maths SCAN has broken several records that were previously set by a Cray MP3 supercomputer, completing simulations in a shorter time than this million+ dollar machine.
Why FreeBSD and not Linux?
- We are often asked why the SCAN runs FreeBSD UNIX and not the more popular Linux, so here's an explanation: when the project was first conceived in 2001, the only Linux distribution that had any support for diskless booting was SuSE Linux and early attempts to realise the SCAN were based on SuSE 7.1 using PCs fitted with 3Com's 3c590 and 3c905 network interface cards (NICs). These had 28-pin DIL chip sockets that allowed third-party boot ROMs or locally-programmed EPROMs to be fitted, making it possible to boot the system from a network boot server. However, at about this time PC technology was moving on and separate PCI and ISA bus NICs in desktop PCs were rapidly giving way to NICs embedded on the motherboard, with the boot ROMs being replaced by various implementations of Intel's PXE (Preboot eXecution Environment) standard.
- Support for these early on-board NICs and the PXE environment in Linux was lagging behind the new technology and we had a lot of problems getting PCs with embedded NICs to boot Linux from the boot server. But on the other hand FreeBSD supported both the on-board NICs and PXE literally 'out of the box'; historically, support for diskless booting has always been good in UNIX as many UNIX operating systems date from the days when hard disks were expensive items and it often made good financial sense to have a single file/boot server with one or more hard disks and then arrange to boot a large number of diskless workstations from this over the network. Linux on the other hand is a relative newcomer and arrived at a time when widespread adoption of IDE interface disks was driving down the cost of large hard disks so Linux has always been very much a disk-based system.
- Since one of the two developers of the SCAN, Gunnar Pruessner, uses FreeBSD as his main desktop operating system and is very familiar with it, the decision was taken to switch to FreeBSD and almost overnight, a working and fully-functional SCAN was born. FreeBSD also has other advantages over Linux - the codebase is more mature, it is demonstrably more stable (over 50% of the web servers in Netcraft's top 20 uptime league tables run FreeBSD) and it is also considerably more secure than Linux. It is sometimes pointed out that the range of commercial software available for FreeBSD is small compared with Linux but most Linux software can be run on FreeBSD systems if the kernel is compiled with Linux ABI (Application Binary Interface) support.
The SCAN 3 servers, potomac3 and nolichucky
- Introduced in October 2016, SCAN is now in its fifth generation and is known as SCAN 5; the previous SCAN 4 used the same HP DL380 servers and Dell PV220 disk array but ran FreeBSD 8.2. Even older was SCAN 3, based on potomac3, a 64-bit AMD server, and nolichucky, a 32-bit Pentium 4 tower server crammed full of SCSI disks. (potomac3 lives on as the desktop PC in the Maths server room but now runs Linux).
- There are a lot of Windows PCs sitting idle at night and at weekends in not just the Maths department but the college as a whole; the ongoing desktop PC renewal programme is putting increasingly powerful computers onto people's desktops, most of which are very under-utilised. There is a vast pool of unused compute resources sitting idle, all of which could be put into use with little or no effort, and with no changes made to the system's local hard disk installation. And above all, in these times of fiscal stringency, all of this costs nothing!
Older SCAN items:
- September 10th, 2011: Announcing the summer 2011 timetable
- June 18th, 2011: Additional directly-attached disk array added to the SCAN
- May 28th, 2011: New data storage facility added to the SCAN
- April 29th, 2011: Announcing SCAN 4 and the Easter 2011 timetable
- December 20th, 2010: Christmas 2010 timetable
- April 3rd, 2009: the SCAN goes 64-bit!
- July 2nd, 2007: announcing the summer vacation timetable
- May 15th, 2007: Suspension of SCAN in Huxley 410/411 for 6 weeks owing to student projects
- November 24th, 2006: operating system upgraded to FreeBSD 6.2
- March 24th, 2006: operating system upgraded to FreeBSD 6.0
- March 1st, 2004: SCAN news update
- July 21st, 2003: Introducing the SCAN
Research Computing Manager
Department of Mathematics
last updated: 18.05.2018