Stats section computing facilities


Following completion of extensive storage and network upgrades in December, four different types of compute facility are available to users in the Stats section:

  • general purpose compute systems

  • the Bazooka Hadoop cluster

  • compute servers for cyber security research

  • storage servers

Because many of the research projects undertaken in this section overlap to some extent and/or use overlapping or complementary computing technologies, these compute facilities are interlinked to varying degrees so that data stored in one research area can be made available to another. Computing demands in Stats can be very reactive, with a need to set up new facilities at short notice, so a flexible environment that supports this is required.

All of the compute systems (the general purpose systems, the Hadoop cluster and the compute servers) run the server edition of Ubuntu Linux and have a wide range of mathematical and statistical software packages installed in addition to the standard Linux applications. The storage servers, on the other hand, run FreeBSD UNIX and use ZFS as the disk storage pool technology.

General purpose compute systems

Five high performance general purpose compute systems are available to all users in the section:

fallas       12 CPU cores, 24 GB memory
festival      8 CPU cores, 32 GB memory
fiesta       12 CPU cores, 16 GB memory
fira          8 CPU cores, 32 GB memory
hustler      12 CPU cores, 24 GB memory

Users in the Stats research section can log into these remotely using ssh and run jobs such as R, Matlab, Maple and many other packages, as described for the Maths compute cluster. Note, however, that unlike the main compute cluster, the Stats compute systems do not use job scheduling or any kind of user/task control, so you are free to run whatever you wish, whenever you wish. A typical session is sketched below.
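
For example, a session on one of these systems might begin as follows (a sketch only: jbloggs is a placeholder username and the .ma hostname suffix is assumed from the bazooka.ma naming used later in this document):

    ssh jbloggs@fallas.ma       # log in remotely from your desktop
    R                           # start an interactive R session
    matlab -nodisplay           # or run Matlab without its graphical interface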

At present, home directories on these systems are by default on the college's ICNFS service. However, for reasons of performance and available storage space, the department's computing is gradually moving away from ICNFS to our own in-house fileservers, and all of the Maths general access servers - silos 1-4, calculus, clustor and clustor2 - are mounted on these compute systems and can be used instead. In addition, data from other parts of the Stats compute facilities is available to you on these systems as follows:

if you have an account on the Stats modal and medial compute servers, your local home directory on these systems will be available on fallas, festival, etc as follows:

  • for modal: /home/modal/username

  • for medial: /home/medial/username

if you have an account on the Stats Hadoop cluster, your Hadoop HDFS storage can be accessed at:

  • /home/hadoop

if you are working with Netflow data and are a member of the netflow group, you'll find the netflow data archive under /home/netflow_2013, /home/netflow_2014, etc where each folder contains the netflow data available for that year.
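
As a quick check from any of the general purpose compute systems, you can simply list these mount points. This is a sketch, with jbloggs standing in for your own username; you will only see the directories your account has been granted access to:

    ls /home/modal/jbloggs      # your home directory from modal, if you have an account there
    ls /home/hadoop             # your Hadoop HDFS storage, if you have a cluster account
    ls /home/netflow_2014       # one year of the Netflow archive (netflow group members only)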

Bazooka Hadoop cluster

A 15-node Hadoop cluster is available to Stats users - this is often called the Bazooka cluster since the head node which users log into to use the cluster is known as bazooka.ma (there are two other small Hadoop clusters used for test and development purposes, but these aren't generally available). For those interested in numbers(!), this cluster provides 448 cores of processor power, 524.5 GB of memory and 157.9 terabytes of storage.

If you want to use the Bazooka Hadoop cluster, just ask for an account; this will include both a conventional local home directory on bazooka and a Hadoop home directory whose storage is distributed throughout the entire cluster using the HDFS filesystem. Your Hadoop HDFS directory is available on all the general purpose compute systems as well as on the modal and medial compute servers (if you have an account on these) - you'll find this under /home/hadoop on all Stats systems. On request, your Hadoop home directory can also be mounted on your own desktop system(s) but for security reasons, these need to be named systems with wired Ethernet network connections and static IP addresses registered in the college's HDB (hosts database). Some typical HDFS commands are sketched below.
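
As a minimal sketch of day-to-day use, once logged into bazooka the standard Hadoop command-line tools move data between your conventional home directory and your HDFS home directory (results.csv is a placeholder file name):

    hdfs dfs -ls                # list the contents of your HDFS home directory
    hdfs dfs -put results.csv   # copy a local file from bazooka into HDFS
    hdfs dfs -cat results.csv   # read it back from HDFS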

Compute servers for cyber security research

Two compute servers, known as modal and medial, are available, each with 64 CPU cores (four 16-core AMD Opteron CPUs), 512 GB of memory and plenty of local disk storage (19 TB on modal, 10 TB on medial), together with both local and remote home directories. Accounts on these servers are set up on request.

Storage servers

Four dedicated fileservers are installed in the Stats section. Two have over 10 TB of capacity and are named fusion and enkidu; fusion can be used by any Stats user (accounts are set up on request) while enkidu is reserved for cyber security use. The other two servers, flowdata3 and an identical mirror server (flowdata3-backup), provide storage for Netflow data, each storing 60 TB.

fusion is attached to all of the other Stats compute systems and is accessible under /home/fusion. flowdata3 is similarly connected and contains several distinct Netflow archives named netflow_2013, netflow_2016, etc; these can be reached through /home/netflow_2013, /home/netflow_2016 and so on on the connected compute systems.

Note that not all data sources are accessible to all users; this is for security reasons, with access configured at both the individual user and group level on a "need to have" basis. A quick way to check your own access is sketched below.
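
Since access is granted per user and per group, the usual Unix tools show what your account can reach (netflow is the group name mentioned above; other datasets follow the same pattern):

    groups                      # list the groups your account belongs to
    ls -ld /home/netflow_2013   # group ownership and permissions show who has access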

Internal networks

Because of the size of the datasets now being worked with in the Stats section, bandwidth - that is, the speed of the interconnections between the Stats systems - is an important issue. This is addressed by three dedicated networks, one each for modal/medial storage, Hadoop cluster data and Netflow data. With the exception of hustler, all of these systems are accommodated in the Maths server room and, in addition to a normal college gigabit (1000 Mbits/second) network connection, each has a separate connection to each of these three additional gigabit networks to ensure quick data transfer between systems. (hustler is in a staff office on another floor, where it is not possible to connect systems to the server room networks.)
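
As a rough illustration, on one of the server room systems you would expect to see one network interface per attached network; a brief listing shows this (interface names and addresses will of course vary from system to system):

    ip -br addr                 # one line per network interface with its addresses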



Andy Thomas

Research Computing Manager,
Department of Mathematics

last updated: 8.12.2018