Maths Compute Cluster job queueing system: a quickstart guide


Getting started as quickly as possible...

This is a brief guide for new users who want to start using the job queueing system as soon as possible. It assumes that you want to run your job(s) on the standard Red Hat Linux default queue, that you can already log into the cluster and get your program code into your ICNFS home directory, and that you are comfortable typing Linux commands in a standard bash shell and using a basic text editor such as nano. For information on accessing and using the cluster, please read 'Using the Maths Compute Cluster'.

If any of these assumptions does not apply to you, refer to the main job queueing documentation for more details. For example, if you use the tcsh shell instead of bash, you will need a different command to set up your path and, optionally, you might want to run your wrapper script under the C shell instead of bash.

  1. first, make sure that the program you wish to run is in your college Linux (ICNFS) home directory - you may have written it there in the first place but if not you will need to upload it there. Make a note of where your program is - in these notes, we assume you have created a subdirectory in your home directory called 'programs' and that you have an R program in there called 'test.r'

  2. next, using an SSH client, log into one of the compute cluster nodes (macomp02 to macomp11 and mablad01 to mablad10, inclusive), as in this example:

    ssh username@macomp02.ma.ic.ac.uk

    where 'username' is your own username.

  3. add the /usr/local_machine/bin subdirectory to your default path - this will make life a lot easier by shortening your typed-in commands and avoiding 'command not found' error messages:

    export PATH=$PATH:/usr/local_machine/bin

  4. make this path permanent so that you won't have to do this every time you log in:

    echo 'export PATH=$PATH:/usr/local_machine/bin' >> ~/.profile

  5. check to see if you have already created an SSH private/public key pair on the cluster - a quick way of doing this is to type the following at the shell prompt:

    ls -l ~/.ssh/id_rsa*

    and you should see the two files id_rsa and id_rsa.pub listed. If you don't have these, create your SSH key pair now.

  6. now type this command to make sure that both you and the queueing system will be able to access all systems (nodes) in the cluster without being prompted for a password every time:

    /usr/local_machine/bin/update-ssh-known-hosts

    Note: once you have done steps 3 to 6 inclusive, you will not have to do these again unless you delete the associated files from your home directory.

  7. you are all set to write your qsub wrapper script; using your favourite text editor (nano is recommended for those new to Linux as it's easy to use and includes plenty of online help) create the following file:

    
    #!/bin/bash
    #PBS -N R_job
    #PBS -m bea
    
    cd ${HOME}/programs
    /usr/local_machine/bin/R --vanilla < test.r > test_r.out

    This qsub script will run your program test.r under the job name R_job. It will send you an email as soon as the job starts to run, and another when it finishes or if it aborts for any reason. Any output from the program will be stored in the file test_r.out. (You should of course replace the directory and the test.r and test_r.out filenames with the ones you are actually using yourself.)

  8. now save the file, calling it anything you like - job1.script might be a good choice...

  9. finally, you can submit the job to the queueing system with:

    qsub job1.script
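
A more cautious variant of step 9 can be sketched as follows. It assumes the file names used in this guide (job1.script and ~/programs/test.r); check_job_files is a hypothetical helper name, not a command provided by the cluster. It only checks that the files are readable before qsub is called.

```shell
#!/bin/bash
# Sketch: verify the files this guide relies on before calling qsub.
# check_job_files is a hypothetical helper, not part of the queueing system.

check_job_files() {
    local script="$1"    # the wrapper script, e.g. job1.script
    local program="$2"   # the program it runs, e.g. ~/programs/test.r
    local missing=0
    local f
    for f in "$script" "$program"; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable: $f" >&2
            missing=1
        fi
    done
    # Return 0 only if every file was readable
    return "$missing"
}

# Example use on the cluster (paths as assumed in this guide):
#   check_job_files job1.script "$HOME/programs/test.r" && qsub job1.script
```

Because the check and the submission are joined with &&, qsub is only run when both files are found.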

That's it: you have submitted your job to the queue. Most of the time, if the cluster isn't too busy, it will be executed immediately and you will get an email from the system to let you know the job has started. You can follow the progress of your job online or use the command-line programs described in the main documentation.
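
Step 4 only needs to be done once. As a sketch of how to keep it safe to repeat, the following appends the PATH line to a profile file only if that exact line is not already present. The function name add_path_line is hypothetical, and the profile location is passed in as an argument rather than assumed.

```shell
#!/bin/bash
# Sketch: add the cluster's bin directory to a profile file at most once.
# add_path_line is a hypothetical helper name, not a cluster-provided command.

add_path_line() {
    local profile="$1"
    # Single quotes keep $PATH unexpanded, so it is evaluated at each login
    local line='export PATH=$PATH:/usr/local_machine/bin'
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF -- "$line" "$profile" 2>/dev/null; then
        printf '%s\n' "$line" >> "$profile"
    fi
}

# Example use on the cluster:
#   add_path_line "$HOME/.profile"
```

Running the function a second time leaves the file unchanged, so the profile does not accumulate duplicate lines.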



Andy Thomas

Research Computing Officer,
Department of Mathematics

last updated: 01.04.2014