The Maths SCAN
Latest update: SCAN Christmas vacation timetable
- The Christmas 2010 timetable for the SCAN in the Huxley teaching cluster rooms is a little more restricted than last year's as the closure of Huxley 411 means there will be fewer systems available this winter. Here are the details:
|Room||Availability|
|215||continuously available from 11pm Friday, December 17th until 5:50am Monday, January 10th|
|410||continuously available from 11pm Friday, December 17th until 5:50am Monday, January 10th|
What is the Maths SCAN?
- The Maths SuperComputer At Night initiative harnesses the power of many individual PCs to form a supercomputer capable of carrying out very large computational tasks such as Monte Carlo simulation, climate change modelling, etc. If you can imagine a large computer containing not just one or two CPUs (Central Processing Units, or processors) but 50, 100 or 200 of them plus a huge amount of memory, this is a very good approximation of the Maths supercomputer. Currently, all 36 of the HP dc7900 PCs in Huxley room 215 plus the 43 HP dc7800 PCs in 410 and 411 form part of this cluster outside of normal college hours - this gives us 36 x 64-bit quad-core CPUs and 43 x 64-bit dual-core CPUs with a total of 230 CPU cores and 316 gigabytes of memory. As these computers would otherwise be idle at night, at weekends and during college holidays, this gives us many megaflops of raw processing power for no real cost.
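- As a quick check of the figures quoted above, the core count can be reproduced with a few lines of Python. This is only an illustrative sketch: the numbers come from the paragraph above, while the names (CLUSTER, total_cores and so on) are made up for this example and are not part of any SCAN software. The per-PC memory figure is simply the quoted total divided by the number of PCs, i.e. an average.

```python
# Illustrative only: hardware figures are taken from the text above;
# all names in this snippet are hypothetical.
CLUSTER = [
    {"rooms": "215", "pcs": 36, "cores_per_pc": 4},      # HP dc7900, quad-core
    {"rooms": "410/411", "pcs": 43, "cores_per_pc": 2},  # HP dc7800, dual-core
]
TOTAL_MEMORY_GB = 316  # total memory quoted in the text


def total_pcs(cluster):
    """Number of PCs across all groups."""
    return sum(group["pcs"] for group in cluster)


def total_cores(cluster):
    """Number of CPU cores across all groups."""
    return sum(group["pcs"] * group["cores_per_pc"] for group in cluster)


def mean_memory_gb(cluster, total_memory_gb):
    """Average memory per PC, derived from the quoted total."""
    return total_memory_gb / total_pcs(cluster)


if __name__ == "__main__":
    print("PCs:", total_pcs(CLUSTER))
    print("Cores:", total_cores(CLUSTER))
    print("Mean memory per PC (GB):", mean_memory_gb(CLUSTER, TOTAL_MEMORY_GB))
```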
The technology behind the SCAN
- When these computer rooms close at the end of the day, the PCs shut down and then reboot as diskless FreeBSD systems, loading and then booting FreeBSD over the network from a remote boot server. The hard disks in the PCs remain untouched and are not used in any way while the systems are running FreeBSD. There are a lot of advantages to setting up large scale computation facilities in this way; one is that it is very easy to control, maintain and upgrade since there is only one operating system image that is used by all of the clients. So instead of having to make changes to, upgrade or even rebuild each machine individually, this work is carried out only on the image served by the network boot server, and the next time the client nodes boot from this, they too will be running the new or changed image. It is of course possible to customise the image and the start-up scripts to some extent so that machines in one group - those in Huxley 215, say - load a different configuration at boot time. And in the current SCAN 3 implementation, much of each booted PC's live filesystem is hosted on a disk on the boot server, which makes it easy to make immediate 'hot' changes to the operating system that is running on all of the client PCs, tune the kernel while it is running, add new user accounts, etc - previously, a reboot would have been required to load a new operating system image.
- But the real beauty of the system is the almost infinite scalability and ease with which more nodes can be added to the SCAN; anything from a single computer to many thousands can be added simply by enabling booting from the network and adding them to a particular machine group that will access the SCAN boot server. Currently the system operates with 79 nodes in Maths but as many as 160 nodes have been operational in the past, encompassing teaching clusters in the departments of Physics and Chemistry. Unfortunately, political issues and the fact that the system does not fit easily into the ICT scheme of things have largely limited its use to Mathematics although a few non-Maths users have used it.
- Here is a diagrammatic representation of the SCAN - the computational portion of SCAN 3 is entirely 64-bit and runs FreeBSD 7.2 while the potomac3 controller runs FreeBSD 7.1. The 32-bit file server was recently upgraded to FreeBSD 8.0.
How does it work?
- All of these machines have Windows XP Professional installed on their local hard disks and operate as normal Windows PCs during the daytime, as required for departmental teaching purposes. At the end of the day when the room is closed to students, the machines shut down automatically and then boot FreeBSD UNIX from the network boot server, running UNIX entirely in RAM (memory) and leaving the machine's own hard disk untouched.
- Each system is essentially an autonomous node but they are all networked together and can communicate with each other and with the controller. So each system can be thought of as a CPU with its own memory attached, linked to the other CPUs and memories in the SCAN via the network.
- The user's compute job resides in the user's home directory on the nolichucky fileserver and the program has usually been written in such a way that it knows how to divide up the tasks involved and distribute them to each PC in the SCAN for processing. Output from the computations is written to disk files, etc in the usual way in the user's home directory on nolichucky, not the PC's hard disk. There are various ways in which this parallel processing can be implemented - one is to use the MPI (Message Passing Interface) standard, which is fully supported on the SCAN, but some users have written their own low-level network stacks which offer higher performance as the code interacts directly with the network interface rather than through a multi-layer API (application programming interface).
- The computers in the SCAN do not have to be operated as a massively parallel cluster - they can be used individually too. Some tasks may be difficult to code for parallel computation, or in some cases it may simply not be worth the time and effort to make a program parallel-capable even though a lot of data needs to be processed as quickly as possible; you can then run multiple instances of that program on some or all of the CPU cores in the SCAN to achieve this.
- Early in the morning of the following day, after the SCAN has worked all night, the cluster shuts down automatically and reboots back into Windows, ready for the room's re-opening to students.
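- To give a feel for how a job might "divide up the tasks involved" as described above, here is a minimal Python sketch that splits a number of independent work items as evenly as possible among a number of workers (nodes or CPU cores). It is not MPI and it is not taken from any user's SCAN code; the function name split_tasks and the task counts are hypothetical and chosen purely for illustration. The same idea applies when you simply run many independent instances of a serial program, each with its own slice of the input.

```python
def split_tasks(n_tasks, n_workers):
    """Split task indices 0..n_tasks-1 into n_workers contiguous blocks.

    Block sizes differ by at most one; the first (n_tasks % n_workers)
    workers receive the extra item. Hypothetical helper, for illustration.
    """
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    base, extra = divmod(n_tasks, n_workers)
    chunks = []
    start = 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


if __name__ == "__main__":
    # Example: 1000 independent simulation runs spread over 230 cores
    # (230 is the core count quoted earlier; 1000 is an arbitrary figure).
    chunks = split_tasks(1000, 230)
    sizes = [len(chunk) for chunk in chunks]
    print("largest block:", max(sizes), "smallest block:", min(sizes))
```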
At what times is the SCAN operational?
- As the SCAN is distributed over three different rooms with differing opening times, the number of CPUs available in the SCAN varies according to both the day of the week and the time of day. From Monday to Thursday the SCAN starts up with 18 CPUs in 411 at 6pm when the Maths library closes, gains another 25 CPUs in 410 at 8pm and reaches the full complement of 79 machines when 215 joins at 11pm; all three SCAN CPU groups then run until 6.50am the following morning. On Friday evenings, the SCAN groups in rooms 410 and 411 start up at the same times as on the other weekdays and continue to run over the weekend until 6.50am the following Monday. However, the PCs in room 215 continue to be available for student use throughout Saturday and Sunday between 6.50am and 11pm.
- During the Easter, summer and Christmas vacations, rooms 410 and 215 are closed altogether from the end of term and the clusters there will be running full time as part of the SCAN over this period. We will get an awful lot of computing done! In addition, the 18 PCs in the library computing room (room 411) will join the SCAN whenever the library is closed.
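- The term-time timetable above can be summarised in a small Python sketch that reports how many machines are in the SCAN at a given moment. This is an illustration only, not a tool that exists on the SCAN: the names are hypothetical, vacations and special closures are ignored, and it assumes that room 215 joins the SCAN from 11pm to 6.50am every night of the week, which is our reading of the text rather than something stated explicitly.

```python
from datetime import datetime, time

# Machine counts and start times as quoted in the text (term time only).
ROOM_PCS = {"411": 18, "410": 25, "215": 36}
START = {"411": time(18, 0), "410": time(20, 0), "215": time(23, 0)}
STOP = time(6, 50)  # all groups leave the SCAN at 6.50am


def room_in_scan(room, when):
    """Return True if the given room is part of the SCAN at 'when'."""
    # Rooms 410 and 411 run straight through Saturday and Sunday.
    if room != "215" and when.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return True
    # Otherwise a room is in the SCAN from its evening start time until
    # 6.50am the next morning. ASSUMPTION: this holds for 215 on every
    # night of the week, including weekends.
    t = when.time()
    return t >= START[room] or t < STOP


def machines_available(when):
    """Total number of PCs in the SCAN at 'when' (term time only)."""
    return sum(n for room, n in ROOM_PCS.items() if room_in_scan(room, when))


if __name__ == "__main__":
    # 10 November 2010 was a Wednesday.
    for hour in (12, 19, 21, 23):
        moment = datetime(2010, 11, 10, hour, 30)
        print(moment, machines_available(moment))
```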
If rooms 215 and 410 are closed, are there any Windows PCs I can use?
- The library computing room (room 411, inside the Maths Library) houses 18 PCs running Windows XP which are available to all users during library opening hours. During the night and at weekends, however, these machines reboot into FreeBSD UNIX and join the SCAN. In addition, room 409 has a number of PCs running Windows XP which are available to postgraduates and year 4 undergraduates of the Mathematics department. Finally, the undergraduate common room 212 is home to 8 new HP PCs running Windows XP. These systems are accessible to all Maths users at any time.
- These computers should satisfy the requirement for undergraduate computing facilities during college vacations but do let me know if these reduced general computing facilities cause you any undue hardship or inconvenience.
How powerful is the SCAN?
- Percolation code written by Dr Gunnar Pruessner, a researcher in Math Physics, and run on the Maths SCAN has broken several records that were previously set by a Cray MP3 supercomputer, completing simulations in a shorter time than this million+ dollar machine.
Why FreeBSD and not Linux?
- We are often asked why the SCAN runs FreeBSD UNIX and not the more popular Linux so here's an explanation: when the project was first conceived in 2001, the only Linux distribution that had any support for diskless booting was SuSE Linux and early attempts to realise the SCAN were based on SuSE 7.1 using PCs fitted with 3Com's 3c590 and 3c905 network interface cards (NICs). These had 28-pin DIL chip sockets that allowed third-party boot ROMs or locally-programmed EPROMs to be fitted, making it possible to boot the system from a network boot server. However, at about this time, PC technology was moving on and separate PCI and ISA bus NICs in desktop PCs were rapidly giving way to NICs embedded on the motherboard, with the boot ROMs being replaced by various implementations of Intel's PXE (Preboot eXecution Environment) standard.
- Support for these early on-board NICs and the PXE environment in Linux was lagging behind the new technology and we had a lot of problems getting PCs with embedded NICs to boot Linux from the boot server. FreeBSD, on the other hand, supported both the on-board NICs and PXE 'out of the box'; historically, support for diskless booting has always been good in UNIX as many UNIX operating systems date from the days when hard disks were expensive items and it often made good financial sense to have a single file/boot server with one or more hard disks and then arrange to boot a large number of diskless workstations from this over the network. Linux, by contrast, is a relative newcomer and arrived at a time when widespread adoption of IDE interface disks was driving down the cost of large hard disks, so Linux has always been very much a disk-based system.
- Since one of the two developers of the SCAN, Gunnar Pruessner, uses FreeBSD as his main desktop operating system and is very familiar with it, the decision was taken to switch to FreeBSD and almost overnight, a working and fully-functional SCAN was born. FreeBSD also has other advantages over Linux - the codebase is more mature, it is demonstrably more stable (over 50% of the web servers in Netcraft's top 20 uptime league tables run FreeBSD) and it is also considerably more secure than Linux. It is sometimes pointed out that the range of commercial software available for FreeBSD is small compared with Linux but most Linux software can be run on FreeBSD systems if the kernel is compiled with Linux ABI (Application Binary Interface) support.
- There are a lot of Windows PCs sitting idle at night and at weekends in not just the Maths department but the college as a whole; the ongoing desktop PC renewal programme is putting increasingly powerful computers onto people's desktops, most of which are very under-utilised. There is a vast pool of unused compute resources sitting idle, all of which could be put into use with little or no effort, and with no changes made to the system's local hard disk installation. And above all, in these times of fiscal stringency, all of this costs nothing!
Older SCAN items:
- April 3rd, 2009: the SCAN goes 64-bit!
- July 2nd, 2007: announcing the summer vacation timetable
- May 15th, 2007: Suspension of SCAN in Huxley 410/411 for 6 weeks owing to student projects
- November 24th, 2006: operating system upgraded to FreeBSD 6.2
- March 24th, 2006: operating system upgraded to FreeBSD 6.0
- March 1st, 2004: SCAN news update
- July 21st, 2003: Introducing the SCAN
Faculty of Natural Sciences
last updated: 20.12.2010