Clustor: alternative home directories for cluster users


Why we have introduced alternative home directories

Ever since it was first introduced, the Maths compute cluster has by default used college Linux home directories hosted on the central ICNFS service for user data storage. But this has had its problems: the space available is very limited by today's standards; disk usage quotas are enforced, which frequently causes problems for research users; there are sometimes incidents on ICNFS in which the storage goes off-line and becomes disconnected from Maths systems for extended periods, or shows unexplained read-only filesystem states or input/output errors; and, sometimes, user jobs running on the cluster actually cause problems for the ICNFS service owing to high data transfer rates into or out of user home directories.

To get around the storage capacity problem, local scratch disks, the calculus server and large data storage servers (the silos) with no usage quotas were introduced several years ago. But the scratch disks, although fast, are not intended for long-term data storage (although some users who like to live dangerously do use them for this!) and the silo servers are not intended for high speed real-time data storage from running jobs (again, some cluster users do use the silos in this way). And because none of these alternative storage schemes are in the same place as the user's default home directory, users must remember to clearly state the storage locations they want to use when submitting jobs to the Torque/Maui job management system.

Clearly, a better version of ICNFS would be the answer to these problems and we are now introducing alternative home directories for cluster users hosted on the new clustor.ma server. Clustor provides nearly 30 terabytes of storage to start with (nearly forty times the existing 770 GB capacity of ICNFS) with fast read/write performance over the network and with no per-user usage limits imposed. Increasing the storage capacity in the future is easy to achieve and backups are provided by a second identical server mirroring the clustor server.

How does the new scheme work?

When you log into the Maths cluster, your ICNFS home directory is at /home/ma/u/username where 'u' is the first letter of your username and 'username' is your own college username (sometimes also called 'login name'). So if your username happens to be im304, your home directory is /home/ma/i/im304 and this is the directory you will find yourself in immediately after logging in. If you are unsure where your home directory is, the command:

echo $HOME

on any Linux or UNIX-like system will tell you what it is; up until now this home directory has always been hosted on the ICNFS service.

With the new alternative home directory scheme, your home directory is logically still in the same place as before - for example, /home/ma/i/im304 - but you now have a choice of whether the physical backing storage providing this is ICNFS or the new Maths clustor home directory server. To begin with, under the new scheme, everyone's cluster storage is still on ICNFS but you can ask for your default home directory to be switched to clustor instead. Also, if you decide to stay with ICNFS as your default home directory on the cluster, you can still access your alternative clustor home directory at its parallel location, and vice versa - users who switch to clustor storage can still access their ICNFS home directory as well. You can copy or move files and directories (folders) between your ICNFS and clustor home directories whenever you wish.

Clustor should be regarded as an optional (but bigger and faster) parallel home directory for Maths compute cluster use - it does not replace your existing ICNFS storage, which is still available to you. You will still need to use ICNFS if you want to access your Linux data from Windows or if you have personal web pages hosted on the http://wwwf.imperial.ac.uk webserver farm.

To sum all this up: your logical home directory remains as it was before - for example, /home/ma/i/im304 - but it is linked to one of two physical home directories, either to your existing ICNFS home directory or your new clustor home directory.

Where are the new home directories?

As explained above, logical home directories are in /home/ma/u/username as before but the physical ICNFS home directories are now at /home/icnfs/ma/u/username and the new clustor-hosted home directories are in /home/clustor/ma/u/username.

So if your username is im304, then:

  • your logical home directory is: /home/ma/i/im304
  • your ICNFS home directory is: /home/icnfs/ma/i/im304
  • your clustor home directory is: /home/clustor/ma/i/im304
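The naming rule above can be sketched as a small shell function. This is only an illustration: home_paths is a hypothetical helper name and not a command provided on the cluster.

```shell
# Sketch: derive the three home directory paths for a college username.
# home_paths is a hypothetical helper, not a command installed on the cluster.
home_paths() {
    user=$1
    # 'u' in /home/ma/u/username is the first letter of the username
    initial=$(printf '%s' "$user" | cut -c1)
    echo "logical: /home/ma/$initial/$user"
    echo "ICNFS: /home/icnfs/ma/$initial/$user"
    echo "clustor: /home/clustor/ma/$initial/$user"
}

home_paths im304
```

For the username im304 this prints the same three paths as in the list above.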

I want to switch my default home directory to clustor - what do I do?

Simply email Andy Thomas and ask for this to be done.

How long will I have to wait for this to be done?

Good question! It depends on what you are doing on the cluster at the time - the changes need to be made on each node on the cluster and, for obvious reasons, it is not a good idea to do this when you have one or more jobs running on the cluster or when you are logged into one or more nodes. So if you're not using the cluster and are not logged into macomp001, the changes will be made within a few hours. But if you are logged into a cluster node, you'll have to log out before the switchover is done, otherwise you may lose data and your personal settings for some applications such as R and Matlab. And if you have any jobs running, you will have to wait until these have completed before switching your home directory.

Once a request for switching home directories is received, automated checks will be made to ensure these conditions are met (no jobs running, no logins to the nodes) before the switch is made, so there is no need for you to delay making the request just because you have some jobs running, for example. After the switchover has been done, you will receive an email to confirm the changes have been made.

How can I check whether my physical home directory is on ICNFS or clustor?

On any node, simply typing the command:

ls -l $HOME | cut -d ' ' -f11

will show you your physical home directory. For example, if your cluster home directory storage is using your ICNFS home directory, this command will respond with:

/home/icnfs/ma/i/im304

while a logical home directory physically on clustor will show as:

/home/clustor/ma/i/im304
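As an alternative to counting fields in the ls output, the link target can be read directly with readlink. The sketch below assumes the logical home directory is a symbolic link, as the ls -l output above suggests; which_storage is a hypothetical helper name.

```shell
# Sketch: report which physical storage a home directory link points to.
# Assumes the logical home directory is a symbolic link.
# which_storage is a hypothetical helper name, not a cluster command.
which_storage() {
    target=$(readlink "$1") || { echo "not a symbolic link: $1"; return 1; }
    case $target in
        /home/icnfs/*)   echo "ICNFS: $target" ;;
        /home/clustor/*) echo "clustor: $target" ;;
        *)               echo "unknown: $target" ;;
    esac
}

# Usage:
#   which_storage "$HOME"
```

Unlike cut with a single-space delimiter, this does not depend on how ls pads its columns.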

Is my clustor home directory backed up?

Yes, but not in the same way as ICNFS. One nice feature of the central ICNFS service is that it is backed up to tape nightly with incremental backups going back 4 months, so you can ask for a specific file or folder to be restored from any date over that period. With clustor, it is impractical to offer the same kind of service because the amount of data is potentially vastly greater and we do not have the huge tape library systems that the central backup service has.

However, clustor is mirrored to another identical backup server early each morning, but not continuously, so there is deliberate hysteresis built into the system. This ensures we have a full backup in case of total failure of clustor; the delay means that if you delete or over-write a file early in the morning and then realise the error later the same day, the file(s) or folder(s) can be replaced from the mirror as long as this is requested before 07:00 the next day.

If there is demand, we may be able to offer daily tape backups in the future for some users in the same way as we already do for individual users and some research servers but this will not be a standard feature of clustor home directories.

Why is my clustor home directory empty?

It is up to you to put data into your clustor home directory - we are not replicating your ICNFS home directory to your clustor home directory for you. In fact, we can't do this as we have no access to your ICNFS data - only ICT can do this. You can use all the standard Linux methods - cp, tar, rsync, etc. - on macomp001 to copy data from your ICNFS home directory directly into your clustor home directory.
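A minimal sketch of such a copy is shown below. copy_home is a hypothetical helper name, and the 'project' subdirectory in the usage comment is only an example; substitute your own username and directories.

```shell
# Sketch: copy the contents of one directory into another, preserving
# permissions and timestamps. copy_home is a hypothetical helper name.
copy_home() {
    src=$1
    dst=$2
    mkdir -p "$dst"
    # "$src"/. copies the directory's contents, including dot-files
    cp -a "$src"/. "$dst"/
    # Alternative with rsync, which can be re-run to pick up later changes:
    #   rsync -a "$src"/ "$dst"/
}

# Usage (example paths for the user im304, run on macomp001):
#   copy_home /home/icnfs/ma/i/im304/project /home/clustor/ma/i/im304/project
```

Copying one directory at a time, rather than the whole home directory, makes it easier to check each copy before removing anything from ICNFS.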

What is my disk quota on the clustor home directory?

There isn't one. We need to get away from old-fashioned restrictive disk quotas that hamper research and Maths' policy of not formally imposing usage quotas of any kind has worked well on the other storage servers for years. Obviously disk usage will be monitored and anyone using more than 2-3 terabytes will be encouraged to store data less essential to ongoing computational work on the other storage servers such as silo1 and silo2. So please store your holiday snaps, videos, MP3s, etc somewhere else.
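If you want to see how much space you are using, du will tell you. The sketch below wraps it in a hypothetical helper, dir_usage_kb, which prints the total for a directory in kilobytes; on a large directory tree it may take some time to run.

```shell
# Sketch: print the total space used by a directory, in kilobytes.
# dir_usage_kb is a hypothetical helper name, not a cluster command.
dir_usage_kb() {
    # du -sk prints "<kilobytes><TAB><path>"; keep only the number
    du -sk "$1" | cut -f1
}

# Usage (example path for the user im304):
#   dir_usage_kb /home/clustor/ma/i/im304
# For a human-readable figure instead, du -sh can be used on the same path.
```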

About clustor

clustor.ma is a Dell PowerEdge R515 server with dual 12-core AMD Opteron CPUs and contains up to 14 disks. Currently two mirrored system disks are fitted and another 9 x 4 TB disks plus an additional 'hot spare' disk make up a ZFS RAIDz1 pool (equivalent to hardware RAID 5 without its drawbacks, such as the infamous 'write hole') of 28 TB raw capacity. The server runs FreeBSD 10.1 - the backup mirror server, clustor-backup.ma, is identical and has a separate dedicated point-to-point network link used solely for inter-server mirroring.

Update: clustor2, introduced at the beginning of 2018, has from the outset used a much improved disk pooling scheme with much faster read/write performance than the one in use on clustor, so at some stage in the future these enhancements will be added to clustor and clustor-backup as well.

Any questions?

As always, I'll be happy to answer any questions you may have.



Andy Thomas

Research Computing Manager,
Department of Mathematics

last updated: 09.06.2018