Latest news for the Maths compute cluster

New matlab2018 queue

Most of the cluster has been upgraded to Matlab R2018b, with just 6 nodes still running the old R2016a version owing to existing long-running jobs. If your Matlab compute job needs to run under R2018b - for example, because you want to use the new features in R2018b - please submit it to the new matlab2018 queue. This guarantees your job will execute on a node that has been upgraded to R2018b. Submitting your job to any other queue, such as the default standard queue, will run it under whatever version of Matlab is installed on the node the cluster chooses for it.

To use the matlab2018 queue, just replace any existing PBS queue directive in your qsub submission script that starts with:

#PBS -q

with the line:

#PBS -q matlab2018

If you don't have a #PBS -q directive in your submission script (perhaps because you are using the default standard queue), simply add this line to your script.
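As an illustration, a complete submission script using the new queue might look like the sketch below. The script name, job name, resource request and Matlab script are all hypothetical placeholders - adapt them to your own job:

```shell
# Write a minimal example PBS submission script for the matlab2018 queue.
# matlab_job.pbs, my_matlab_job, the walltime and my_analysis.m are
# hypothetical - substitute your own names and resource requirements.
cat > matlab_job.pbs <<'EOF'
#!/bin/bash
#PBS -q matlab2018
#PBS -N my_matlab_job
#PBS -l walltime=24:00:00

# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"

# Run a Matlab script non-interactively
matlab -nodisplay -nosplash -r "run('my_analysis.m'); exit"
EOF
```

The script would then be submitted with 'qsub matlab_job.pbs'.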

Ongoing Matlab upgrade

The upgrade of Matlab from version R2016a to the latest R2018b commenced in December but, on a live cluster with many Matlab jobs already running, it has to be done piecemeal as and when nodes become free of Matlab jobs. There is therefore currently no guarantee which version your Matlab job will run under: to date, 28 of the 36 nodes have been upgraded, while the remaining nodes are still running jobs under R2016a. The last of these long-running jobs is due to complete on April 3rd, at which point the upgrade to R2018b will be finished.

Apologies if this long drawn-out upgrade has inconvenienced you. Unlike many other HPC clusters, we have a policy of allowing users to run jobs for weeks or months instead of limiting them to just 3 days. Unfortunately, this does make cluster management much more difficult, since "in-flight incidents" such as disk and memory failures, corrupted filesystems, etc. have to be contained and worked around while the affected node is still running user jobs, with the ailing node being nursed along until all remaining jobs have finished and it can finally be taken out of service and repaired. (Two nodes still running jobs started last year are technically faulty - one with a failed disk and the other with a corrupted root filesystem - but both remain capable of running user jobs until those jobs finish.)

Using the Linux du command on the cluster

Those of you using clustor2 for file storage may be mystified by the large discrepancy between the file and folder sizes reported by the 'ls -l' and 'du' commands. This is because clustor2 uses file compression internally to improve read/write performance, so the sizes reported by 'du' are the compressed sizes as stored in clustor2's ZFS disk pool, not the logical sizes seen by the operating system or by you!

To support ZFS compression-based storage systems such as that on clustor2, the version of du shipped with current Ubuntu Linux releases has the '--apparent-size' option, which reports the actual (logical) file size rather than the compressed size as stored on the clustor2 server. You can use this option in conjunction with the existing du options such as '-h', '-m', '-c', etc. Here is an example, using a file that is 3.2 GB in size:

ls -l reports the file size is 3.2 GB as expected:

andy@macomp001:~$ ls -l 3_2GB_test_file.tar 
-rw-r--r-- 1 andy root 3343513600 Feb  2  2018 3_2GB_test_file.tar

but du shows it as less than half this size:

andy@macomp001:~$ du 3_2GB_test_file.tar 
1434360 3_2GB_test_file.tar

while the '--apparent-size' option makes du report the size you would expect to see (in 1 KB units):

andy@macomp001:~$ du --apparent-size 3_2GB_test_file.tar 
3265150 3_2GB_test_file.tar

Using du to find sizes of files or folders on other servers attached to the compute cluster, for example silo2 or clustor, will show very similar sizes with or without the '--apparent-size' option since they do not use compression in their underlying storage systems.
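If you would like to see this effect for yourself without using clustor2, a sparse file reproduces the same discrepancy on any ordinary Linux filesystem: du reports only the disk blocks actually allocated, while '--apparent-size' reports the logical size. A small demonstration (the filename is arbitrary):

```shell
# Create a 1 MB sparse file: it has a logical size of 1 MB but
# occupies few or no disk blocks, much as a well-compressed file
# does on clustor2. The filename is arbitrary.
truncate -s 1M sparse_demo.bin

# Disk blocks actually allocated, in 1 KB units (typically 0 here)
du -k sparse_demo.bin

# Logical size as seen by ls and by applications: 1024 KB
du --apparent-size -k sparse_demo.bin
```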

Major R upgrade for Maths compute cluster completed

The core R installation on the cluster has been upgraded from version 3.4.3 to 3.4.4. Although this is a very minor upgrade, the large collection of additional R packages - mostly from the CRAN repository - has also been rebuilt from current sources at the same time, many for the first time since the cluster was introduced.

Any questions?

As always, I'll be happy to answer any questions you may have.

Andy Thomas

Research Computing Manager,
Department of Mathematics

last updated: 9.3.2019