The EECE clusters are only accessible via a batch scheduling system. The idea
is that tasks are submitted on chn (known as the head node) to a batch
queue and these tasks are then automatically run by the system as resources
become available.
Overview
Simple Linux Utility for Resource Management (SLURM) is an open-source resource
manager and job scheduling system. The entities managed by SLURM include nodes,
partitions (groups of nodes), jobs and job steps. Partitions can also be
thought of as job queues, each with its own set of constraints such as a job
size limit, time limit, etc. Submitting a job to the system requires you to
specify a partition. Under some circumstances a Quality of Service (QoS),
a classification that determines what kind of resources your job can use,
must also be specified. Jobs within a partition are then allocated to nodes
according to the scheduling policy, until all resources within the partition
are exhausted.
There are several basic commands you will need to know to submit jobs, cancel
jobs, and check status. These are:
sbatch - Submit a job to the batch queue system, e.g., sbatch myjob.slurm
squeue - Check the current jobs in the batch queue system, e.g., squeue
sinfo - View the current status of the queues, e.g., sinfo
scancel - Cancel a job, e.g., scancel 123
For more detailed information on SLURM please consult the SLURM
tutorials and
documentation.
Information on particular commands can also be obtained using the standard
Unix manual page system:
man sbatch
Usage
The general approach to using SLURM on the clusters is as follows:
Copy any needed files from your computer using WinSCP or a similar program.
Update the job specification/script file. In particular, take care that any output being generated on one node will not be overwritten by a job running concurrently on another node. There are a number of environment variables that are set when the job runs and these can be used to create subdirectories during the job start-up sequence (see the sketch after this list).
Submit the job using the "sbatch <scriptname>" command.
You can monitor your jobs using the "squeue -l -u <username>" command.
You can obtain more information about a particular job using the "squeue -l -j <jobnumber>" command.
Once the job has been completed, download the results or copy them back to your computer.
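As an illustration of the per-job subdirectory approach mentioned above, a job script could use the SLURM environment variables along the following lines. This is a minimal sketch; the directory layout and the my_program binary are assumptions, not site requirements:
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --output=my_sim_%j.log
# SLURM sets environment variables such as SLURM_JOB_ID when the job runs.
# Including the job id in the output path ensures that concurrent jobs on
# different nodes do not overwrite each other's results.
OUTDIR=/tmp/$USER/my_sim_$SLURM_JOB_ID
mkdir -p "$OUTDIR"
# Run the simulation as a job step, writing into the per-job directory.
srun ./my_program "$OUTDIR"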
SLURM requires the job specification/script file to be in Unix format. If your
files were edited on a Windows system you can do the conversion with:
fromdos <scriptname>
To edit the job specification/script file on the head node you can use one of
the installed terminal text editors: nano
(recommended for Windows users), jed,
vim. If logged in with X11 forwarding enabled,
the following GUI-based text editors are available: geany,
scite.
When the job is scheduled to run, SLURM will by default change to the directory
from which the job was originally submitted before running the job script. You
can also change the working directory as necessary in your job script (using
the standard Unix cd command).
If you need to cancel a job, for example if there is a problem with the
simulation parameters, this can be accomplished using the following command:
scancel <jobnumber>
Batch Jobs and Job Steps
A SLURM batch job is started as indicated above with the sbatch
command. The job specification/script file is actually a standard Unix shell
script (e.g. a BASH script file) with additional comments containing special tags
near the top of the file. These tags are identical to the command-line
options of the sbatch command; embedding them in the script file is
simply more compact. Each job script therefore typically consists of: SBATCH
command tags as comments, various setup shell operations, and the actual
simulation commands (i.e. the execution of a software binary on an input model
file that will run for some time).
To allow SLURM to track the progress of a simulation, and to allow correct
cancellation of jobs, each of the simulation steps should be initiated using
the srun command. Most simulations would therefore have the following
form:
#!/bin/bash
# Special tags that contain parameters which define the resources required
# by the simulation and which will be used by SLURM to allocate these
# resources.
#SBATCH --time=00:30:00
# Perform simulation setup. For example, create a temporary directory in the
# scratch area.
mkdir -p /tmp/$USERNAME/my_sim
# Now run my program (simulation) as a separate job step.
srun ./my_program
In these examples it is assumed that the ./my_program binary is
single-threaded and hence will use only one CPU core. Another possibility is a
simulation consisting of a number of sequential runs:
#!/bin/bash
#SBATCH --time=00:30:00
# Perform simulation setup. For example, create a temporary directory in the
# scratch area.
mkdir -p /tmp/$USERNAME/my_sim
# Now run my program (simulation) as a separate job step, one for each
# index (in this example 0 to 9).
for i in {0..9}
do
    srun -l ./my_program $i
done
A third possibility is a number of simulations run concurrently as separate job steps:
#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --time=00:30:00
# Perform simulation setup. For example, create a temporary directory in the
# scratch area.
mkdir -p /tmp/$USERNAME/my_sim
# Now run my program (simulation) as four concurrent job steps, each with a
# different parameter (index). Since only four CPUs have been allocated,
# only four simulations will be run concurrently.
for i in {0..3}
do
    srun -l ./my_program $i &
done
# Wait until all four concurrent simulations complete before the job ends.
wait
General SLURM Batch Parameters
<note warning>
To ensure your job starts as early as possible, the minimum amount of
resources required for your job should be requested. In particular, the
number of CPUs and the memory per CPU must be set correctly; see
Job Accounting for instructions on how to determine these from
past and current jobs.
</note>
The following parameters are typically used in SLURM scripts (an example header combining them is shown after this list):
--cpus-per-task=4: Advise the SLURM controller that ensuing job steps will require 4 processors. Without this option, the controller will just try to allocate one processor per task.
--time=<1-10:00:00>: the maximum time the job is expected to run. The job will automatically be cancelled after this time. Specified as DAYS-HOURS:MINUTES:SECONDS, for example 1-10:00:00 for 1 day and 10 hours. Note that shorter jobs are given preference by the scheduler.
--mail-user=<username@domain>: set to your email address in order to receive notifications of job status.
--mail-type=END: together with the previous option, will send a notification on job completion. To receive notifications on both job start and completion, use BEGIN,END. If you set this option you MUST also set a valid email address with the previous option.
--job-name=<MYJOB>: the name of the job as it will be shown in the queue list. Use a unique and descriptive name so that you can easily identify particular simulations.
--output=<filename.log>: the name of a log file into which all output and status messages for the job will be written. Note that any existing file will be truncated at job start.
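As an illustration, a job script header combining these common parameters could look as follows (the job name, log file name and email address are placeholders, not site-specific values):
#!/bin/bash
#SBATCH --job-name=MYJOB
#SBATCH --cpus-per-task=4
#SBATCH --time=10:00:00
#SBATCH --output=MYJOB.log
#SBATCH --mail-user=username@domain
#SBATCH --mail-type=END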
For most users it is not necessary to set the following:
--partition=<partition>: request a specific partition; only possible for some users as most users only have access to the default partition.
--account=<account>: request a specific account; only possible for some users as most users only have access to their default account.
Similarly, the following do not need to be set as the defaults will be correct for most simulations:
--begin=now+5hours: queue the job but delay the start for 5 hours. See the sbatch man page for time format options.
--nodes=<min-max>: the default is 1 node, which is the maximum that most users are currently allowed.
--ntasks=1: generally all jobs should consist of a single task (simulation), potentially running on multiple processors.
--workdir=<directory>: set the working directory for the batch script; the default is the directory from which the job was submitted.
SLURM Examples
The various commercial software packages available on the clusters have unique
requirements in terms of licenses and resources. Below you can find pages from
which templates for specific types of simulations can be downloaded. To download
a template, click on the link at the top of the embedded template.
SLURM keeps track of various statistics for each job and you can use
sacct to extract the information (see man page for usage instructions).
The most useful statistics are available with the custom smemio
command. The output is as follows:
Only completed jobs for the current user within the past week will be shown.
For a currently running job, use “sjmemio <jobid>” to get the current
stats. The first line gives the overall job parameters for the particular job
id. The line containing .batch represents the job script statistics,
excluding any job steps initiated with srun. In this example there is
one job step, indicated with .0, that was initiated with srun: a
FEKO simulation. The fields are as follows:
Timelimit: The requested job time limit.
Elapsed: The actual elapsed time for the job.
UserCPU: The amount of user CPU time used by the job or job step.
AllocCPUS: Total number of CPU cores allocated to the job.
ReqMem: Minimum required memory for the job, in MB. A 'c' at the end of the number represents Memory Per CPU, an 'n' represents Memory Per Node.
MaxVMSize: Maximum amount of virtual memory the simulation step requested (but may not actually have used).
MaxRSS: The maximum resident set size, the maximum amount of physical memory the simulation step actually used.
MaxDiskRead: Maximum number of bytes read from disk storage in job step.
MaxDiskWrite: Maximum number of bytes written to disk storage in job step.
The above example therefore illustrates a poorly specified batch job. A run
time of 20 hours was requested whilst the job only ran for about 6 hours.
Similarly, 16 GB of memory was requested (4 CPU cores with 4 GB per core) whilst
the job only actually used about 4 GB in total (but did attempt to allocate
about 6 GB total). For this job --mem-per-cpu should have been set to no
more than 2000.
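If the custom smemio/sjmemio commands are not available, similar statistics can be obtained directly from the standard sacct command; a sketch, assuming a hypothetical job id of 123456:
sacct -j 123456 --format=JobID,Timelimit,Elapsed,UserCPU,AllocCPUS,ReqMem,MaxVMSize,MaxRSS,MaxDiskRead,MaxDiskWrite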
High values in the last two columns indicate an I/O bound job. Such jobs
will most likely cause increased latency on the clusters if run from the NFS
home directory, and should rather copy the required files to a newly created
directory in the temporary scratch space area (/tmp). The following
SLURM template indicates how this could be done, for example, for the CST
software (which is particularly problematic in this regard). The example
assumes that the model has been uploaded to the user's NFS home directory as a
single ZIP file, together with the associated SLURM template. The SLURM script
will automatically create a temporary directory for the simulation and unpack
the ZIP file. Once the simulation completes the ZIP file in the user's home
directory will be updated to include all changed and new files. The user can
then directly download the resultant ZIP file for processing on their desktop.
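The essential structure of such a scratch-space workflow is sketched below; the archive name model.zip and the time limit are placeholders, and the actual solver invocation is omitted since it depends on the software-specific template:
#!/bin/bash
#SBATCH --time=10:00:00
# Create a temporary working directory in the local scratch area and unpack
# the uploaded model archive into it.
WORKDIR=/tmp/$USER/my_sim_$SLURM_JOB_ID
mkdir -p "$WORKDIR"
unzip -o "$HOME/model.zip" -d "$WORKDIR"
# Run the solver from the scratch directory (the solver command itself is
# taken from the software-specific template and is not shown here).
cd "$WORKDIR"
# srun <solver command>
# Update the archive in the home directory with all changed and new files,
# then remove the scratch directory.
zip -ru "$HOME/model.zip" .
rm -rf "$WORKDIR"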