Clustor2: high-performance alternative home directories for cluster users
Bigger and faster storage for the compute cluster
- The old computing adage "you can never have too much storage" certainly holds true in Maths. When it was introduced in the summer of 2015, clustor was primarily intended to provide plenty of storage for compute cluster use far into the future. At about that time, however, a number of newcomers to the department were proposing, and then starting work on, new research projects with large datasets. There was nowhere to store this data except on clustor, so it wasn't long before the cluster's own storage needs were overtaken by these new projects and clustor became rather full.
- Based on clustor, clustor2 was introduced at the beginning of 2018, after the NextGen cluster was launched the previous summer, and was designed from the outset to be an integral part of it. It is linked to the cluster nodes by a separate internal network instead of the busy college network. It has twice the physical usable storage capacity of clustor at 60 TB and, by embodying storage enhancements developed since clustor was built, has an actual storage capacity of around 160 TB. It is also a lot faster in operation than clustor.
Where are the new clustor2 home directories?
- clustor2 home directories are in /home/clustor2/ma/u/username, alongside the existing clustor home directories in /home/clustor/ma/u/username, where the subdirectory 'u' is a letter from a to z corresponding to the first letter of the usernames it contains. Users' logical home directories are in /home/ma/u/username as before, with the physical ICNFS home directories in /home/icnfs/ma/u/username.
- So if your username is im304, then:
- your logical home directory is: /home/ma/i/im304
- your ICNFS home directory is: /home/icnfs/ma/i/im304
- your clustor home directory is: /home/clustor/ma/i/im304
- your clustor2 home directory is: /home/clustor2/ma/i/im304
- In all other respects, the way clustor2 is organised and used is identical to clustor and the clustor documentation largely applies to clustor2.
- To sum all this up: your logical home directory remains as it was before - for example, /home/ma/i/im304 - but it is linked to one of three physical home directories, either to your existing ICNFS home directory, your clustor home directory or your new clustor2 home directory.
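- The path layout described above can be sketched as a small shell function. This is only an illustration: the function name home_paths is made up for this example and is not a command installed on the cluster.

```shell
# Print the four home directory paths for a username, following the
# /home/<store>/ma/<first letter>/<username> layout described above.
# home_paths is a hypothetical helper, not an existing cluster command.
home_paths() {
    user=$1
    letter=$(printf '%s' "$user" | cut -c1)   # first letter of the username
    echo "logical:  /home/ma/$letter/$user"
    echo "ICNFS:    /home/icnfs/ma/$letter/$user"
    echo "clustor:  /home/clustor/ma/$letter/$user"
    echo "clustor2: /home/clustor2/ma/$letter/$user"
}

home_paths im304
```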
I want to switch my default home directory on the cluster to clustor2 - what do I do?
- Simply email Andy Thomas and ask for this to be done.
- Important: once this has been done, remember to copy your ~/.ssh folder and its contents from your ICNFS home directory to your clustor2 home directory. Otherwise you will have to re-create this folder, re-create your SSH keypair and run the update-ssh-known-hosts file again, as described here, before you can run jobs on the cluster.
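- One possible way of copying the ~/.ssh folder is sketched below. The function name copy_ssh_dir is invented for this example, and the paths in the usage comment are those of the example user im304, so substitute your own.

```shell
# Copy the .ssh folder from one physical home directory to another.
# copy_ssh_dir is a hypothetical helper written for this example.
copy_ssh_dir() {
    src_home=$1   # e.g. /home/icnfs/ma/i/im304
    dst_home=$2   # e.g. /home/clustor2/ma/i/im304
    # -a copies recursively and preserves permissions and timestamps,
    # so the private key keeps its restrictive file mode.
    cp -a "$src_home/.ssh" "$dst_home/"
}

# Usage for the example user im304:
#   copy_ssh_dir /home/icnfs/ma/i/im304 /home/clustor2/ma/i/im304
```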
How long will I have to wait for this to be done?
- Good question! It depends on what you are doing on the cluster at the time. The changes need to be made on each node of the cluster and, for obvious reasons, it is not a good idea to do this while you have one or more jobs running on the cluster or while you are logged into it. So if you're not using the cluster and are not logged into any of the nodes (including the test & development node macomp000 and the submission node macomp001), the changes will be made within a few minutes of receiving your email during normal working hours and within a few hours at other times. If you have any jobs running, you will have to wait until these have completed before your home directory is switched.
- Once a request for switching home directories is received, automated checks are made to ensure these conditions are met (no jobs running, no logins to the nodes) before the switch is carried out, so there is no need to delay making the request just because you have some jobs running, for example. After the switchover has been done, you will receive an email to confirm the changes have been made.
How can I check whether my physical home directory is on ICNFS, on clustor or on clustor2?
- From any node simply typing the command:
ls -l $HOME | cut -d ' ' -f11
- will show you your physical home directory. For example, if your cluster home directory is stored in your ICNFS home directory, this command will respond with:
/home/icnfs/ma/i/im304
- while a logical home directory physically on clustor2 will appear as:
/home/clustor2/ma/i/im304
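- As an alternative that does not depend on the column layout of the ls output, the same check can be sketched with readlink, assuming the logical home directory is a symbolic link to the physical one. The function names physical_home and home_store are made up for this example.

```shell
# Print the physical directory behind a (possibly symlinked) path;
# defaults to $HOME when no argument is given.
physical_home() {
    readlink -f "${1:-$HOME}"
}

# Name the storage server from the leading part of a physical path.
home_store() {
    case $1 in
        /home/icnfs/*)    echo ICNFS ;;
        /home/clustor2/*) echo clustor2 ;;
        /home/clustor/*)  echo clustor ;;
        *)                echo unknown ;;
    esac
}

home_store "$(physical_home)"
```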
Is my clustor2 home directory backed up?
- Yes, but not in the same way as ICNFS. One nice feature of the central ICNFS service is that it is backed up to tape nightly with incremental backups going back 4 months, so you can ask for a specific file or folder to be restored from any date over that period. With clustor2, it is impractical to offer the same kind of service because the amount of data is potentially vastly greater and we do not have the huge tape library systems that the central backup service has.
- However, clustor2 is mirrored to another identical backup server early each morning rather than continuously, so there is a deliberate delay (hysteresis) built into the system. This ensures we have a full backup in case of total failure of clustor2. The delay also means that if you delete or overwrite a file early in the morning and then realise the error later the same day, the file(s) or folder(s) can be restored from the mirror as long as this is requested before 07:00 the next day.
- If there is demand, we may be able to offer daily tape backups in the future for some users in the same way as we already do for individual users and some research servers but this will not be a standard feature of clustor2 home directories.
Why is my clustor2 home directory empty?
- It is up to you to put data into your clustor2 home directory - we are not replicating your ICNFS home directory to your clustor2 home directory for you. In fact, we can't do this as we have no access to your ICNFS data - only ICT can do this. You can use all the standard Linux methods on macomp001 to do this (cp, tar, rsync, etc.), including copying data directly from your ICNFS home directory into your clustor2 home directory.
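- As a sketch of one such method, the following uses rsync to copy the contents of one home directory into another. The function name sync_home is invented for this example, and the paths in the usage comment are those of the example user im304.

```shell
# Copy the contents of one physical home directory into another.
# sync_home is a hypothetical helper written for this example.
sync_home() {
    src=$1   # e.g. /home/icnfs/ma/i/im304
    dst=$2   # e.g. /home/clustor2/ma/i/im304
    # -a copies recursively and preserves permissions and timestamps.
    # The trailing slash on the source copies its contents rather than
    # creating a nested directory inside the destination.
    rsync -a "$src"/ "$dst"/
}

# Usage for the example user im304, run on macomp001:
#   sync_home /home/icnfs/ma/i/im304 /home/clustor2/ma/i/im304
```

- Running the same rsync command again later copies only files that are new or have changed since the previous run.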
What is my disk quota on the clustor2 home directory?
- There isn't one. We need to get away from old-fashioned restrictive disk quotas that hamper research, and Maths' policy of not formally imposing usage quotas of any kind has worked well on the other storage servers for years. Obviously disk usage will be monitored, and anyone using more than 2-3 terabytes will be encouraged to move data that is less essential to ongoing computational work to the other storage servers such as silo1 and silo2. So please store your holiday snaps, videos, MP3s, etc. somewhere else.
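- If you want to see how much space you are using yourself, a minimal sketch with the standard du command follows. The function name usage_kb is made up for this example and the figure it reports may differ from the one used for monitoring on the server.

```shell
# Report the total size of a directory tree in kilobytes.
# usage_kb is a hypothetical helper written for this example.
usage_kb() {
    # The trailing slash makes du descend into the directory even when
    # the path given is a symbolic link, as the logical home directory is.
    du -sk "$1"/ | cut -f1
}

# Human-readable total for your own home directory:
#   du -sh "$HOME"/
```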
About clustor2
- clustor2.ma is a Dell PowerEdge R740xd server with dual 12-core Intel Xeon CPUs and contains 14 disks. Currently two mirrored system disks are fitted, and another 9 x 8 TB disks plus a 'hot spare' make up a ZFS RAIDz1 pool (equivalent to hardware RAID 5 without its drawbacks, such as the infamous 'write hole') of 72 TB raw capacity. In addition, a 256 GB SSD is used for the ZFS intent log and a second SSD is used as the read cache - these features account for the fast performance of this server compared with clustor. The server runs FreeBSD 11.1. The backup mirror server, clustor2-backup.ma, is identical and has a separate dedicated point-to-point 10 Gbit/s fibre network link used solely for inter-server mirroring.
Any questions?
- As always, I'll be happy to answer any questions you may have.
Andy Thomas
Research Computing Manager,
Department of Mathematics
last updated: 06.12.2023