University of Pretoria
Operational / Internal Site


computing:hpcc:batch [2019/08/27 18:41] (current)
====== Batch Scheduling ======
The EECE clusters are only accessible via a batch scheduling system. The idea
is that tasks are submitted on alpha1-1 (known as the head node) to a batch
queue and these tasks are then automatically run by the system as resources
become available.
===== Overview =====
Simple Linux Utility for Resource Management (SLURM) is an open-source resource
manager and job scheduling system. The entities managed by SLURM include nodes,
partitions (groups of nodes), jobs and job steps. The partitions can also be
considered job queues, each of which has a set of constraints such as a job
size limit, a time limit, etc. Submitting a job to the system requires you to
specify a partition. Under some circumstances, a Quality of Service (QoS),
which indicates a classification that determines what kind of resources your
job can use, is also expected. Jobs within a partition will then be allocated
to nodes based on the scheduling policy, until all resources within the
partition are exhausted.
There are several basic commands you will need to know to submit jobs, cancel
jobs, and check status. These are:
  * **''sbatch''** - Submit a job to the batch queue system, e.g., ''sbatch myjob.slurm''
  * **''squeue''** - Check the current jobs in the batch queue system, e.g., ''squeue''
  * **''sinfo''** - View the current status of the queues, e.g., ''sinfo''
  * **''scancel''** - Cancel a job, e.g., ''scancel 123''
For more detailed information on SLURM please consult the SLURM
[[http://tutorials.html|tutorials]]. Information on particular commands can
also be obtained using the standard Unix manual page system:
  man sbatch
===== Usage =====
The general approach to using SLURM on the clusters is as follows:
  - Log onto the queue management node by connecting to "**''''**" using [[:computing:putty:index]].
  - Copy any needed files from your computer using WinSCP or a similar program.
  - Update the job specification/script file. In particular, take care that any output being generated on one node will not be overwritten by a job running concurrently on another node. There are a number of environment variables that are set when the job runs and these can be used to create subdirectories during the job start-up sequence.
  - Submit the job using the "''**sbatch <scriptname>**''" command.
  - You can monitor your jobs using the "''**squeue -l -u <username>**''" command.
  - You can obtain more information about a particular job using the "''**squeue -l -j <jobnumber>**''" command.
  - Once the job has completed, download the results or copy them back to your computer.
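As an illustration of step 3, the ''SLURM_JOB_ID'' environment variable (set by SLURM when the job runs) can be used to create a unique output directory per job, so that concurrent jobs never write to the same files. This is only a sketch; the ''results'' directory layout is an assumption, not a site convention:

```shell
# Create a job-specific output directory. The :-local fallback allows the
# snippet to run outside the scheduler, where SLURM_JOB_ID is unset.
OUTDIR="results/job_${SLURM_JOB_ID:-local}"
mkdir -p "$OUTDIR"
echo "Job output will be written to $OUTDIR"
```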
SLURM requires the job specification/script file to be in Unix format. If your
files were edited on a Windows system you can do the conversion with:
  fromdos <scriptname>
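If ''fromdos'' is not available, the same carriage-return stripping can be done with standard Unix tools. A minimal sketch (the file name ''myjob.slurm'' is just a placeholder):

```shell
# Create a DOS-format (CRLF) file to illustrate the conversion.
printf 'line one\r\nline two\r\n' > myjob.slurm
# Strip the carriage returns to produce a Unix-format (LF) file.
tr -d '\r' < myjob.slurm > myjob.tmp && mv myjob.tmp myjob.slurm
```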
To edit the job specification/script file on the head node you can use one of
the installed terminal text editors: [[http://|nano]]
(recommended for Windows users), [[http://jed/|jed]], or
[[http://about.php|vim]]. If logged in with X11 forwarding enabled,
the following GUI-based text editors are also available: [[http://|geany]].
When the job is scheduled to run, by default SLURM will automatically change to
the directory from which you originally submitted the job file before running
the job script. You can also change the working directory as necessary in your
job script (using the standard Unix ''cd'' command).
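For example, a job script can switch to a node-local scratch directory before running the simulation. A sketch, where ''my_sim'' is a placeholder name:

```shell
# Create a scratch working directory on the node and change into it.
# The :-scratch fallback is only there so the snippet also runs where
# USER happens to be unset.
WORKDIR="/tmp/${USER:-scratch}/my_sim"
mkdir -p "$WORKDIR"
cd "$WORKDIR" || exit 1
echo "Now working in $PWD"
```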
If you need to cancel a job, for example if there is a problem with the
simulation parameters, this can be accomplished using the following command:
  scancel <jobnumber>
===== Batch Jobs and Job Steps =====
A SLURM batch job is started, as indicated above, with the **''sbatch''**
command. The job specification/script file is actually a standard Unix shell
script (e.g. a BASH script file) with additional comments containing special
tags near the top of the file. These tags are identical to the command-line
options of the **''sbatch''** command; embedding them in the script file is
just more compact. Each job script typically consists of: SBATCH command tags
as comments, various setup shell operations and the actual simulation commands
(i.e. the execution of a software binary on an input model file that will run
for some time).

To allow SLURM to track the progress of a simulation, and to allow correct
cancellation of jobs, each of the simulation steps should be initiated using
the **''srun''** command. Most simulations would therefore have the following
structure:
<file bash simple.slurm>
#!/bin/bash
# Special tags that contain parameters which define the resources required
# by the simulation and which will be used by SLURM to allocate these
# resources.
#SBATCH --time=00:30:00

# Perform simulation setup. For example, create a temporary directory in the
# scratch area.
mkdir -p /tmp/$USERNAME/my_sim

# Now run my program (simulation) as a separate job step.
srun ./my_program
</file>
In these examples it is assumed the **''./my_program''** binary is
single-threaded and hence will use only one CPU core. Another possibility is a
simulation with a number of sequential runs:
<file bash serial.slurm>
#!/bin/bash
#SBATCH --time=00:30:00

# Perform simulation setup. For example, create a temporary directory in the
# scratch area.
mkdir -p /tmp/$USERNAME/my_sim

# Now run my program (simulation) as a separate job step, one for each
# index (in this example 0 to 9).
for i in {0..9}
do
    srun -l ./my_program $i
done
</file>
And for the case of a parallel simulation:
<file bash parallel.slurm>
#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --time=00:30:00

# Perform simulation setup. For example, create a temporary directory in the
# scratch area.
mkdir -p /tmp/$USERNAME/my_sim

# Now run my program (simulation) as four concurrent job steps, each with a
# different parameter (index). Since only four CPUs have been allocated,
# only four simulations will run concurrently.
for i in {0..3}
do
    srun -l ./my_program $i &
done

# Wait until all four concurrent simulations complete before the job ends.
wait
</file>
===== General SLURM Batch Parameters =====
<note warning>
To ensure your job starts as early as possible, request the minimum amount of
resources required for your job. In particular the number of CPUs and the
memory per CPU must be set correctly; see [[batch#Job Accounting]] for
instructions on how to determine these from past and current jobs.
</note>
The following parameters are typically used in SLURM scripts:
  * **''--cpus-per-task=4''**: Advise the SLURM controller that ensuing job steps will require 4 processors. Without this option, the controller will just try to allocate one processor per task.
  * **''--time=<1-10:00:00>''**: the maximum time the job is expected to run. The job will automatically be cancelled after this time. Specified as ''DAYS-HOURS:MINUTES:SECONDS'', for example ''1-10:00:00'' for 1 day 10 hours. Note that shorter jobs are given preference by the scheduler.
  * **''--mail-user=<username@domain>''**: set to your email address in order to receive notification of job status.
  * **''--mail-type=END''**: together with the previous option, will send notification on job completion. To receive notification on both job start and completion, use **''BEGIN,END''**. If you set this option you MUST also set a valid email address with the previous option.
  * **''--job-name=<MYJOB>''**: the name of the job as it will be shown in the queue list. Use a unique and descriptive name so that you can easily identify particular simulations.
  * **''--output=<filename.log>''**: the name of a log file into which all output and status messages for the job will be written. Note that any existing file will be truncated at job start.
For most users it is not necessary to set the following:
  * **''--partition=<partition>''**: request a specific partition; only possible for some users as most users only have access to the default partition.
  * **''--account=<account>''**: request a specific account; only possible for some users as most users only have access to their default account.
Similarly, the following do not need to be set as the defaults will be correct for most simulations:
  * **''--begin=now+5hours''**: queue the job but delay the start for 5 hours. See the sbatch man page for time format options.
  * **''--nodes=<min-max>''**: the default is 1 node, which is the maximum most users are currently allowed.
  * **''--ntasks=1''**: Generally all jobs should consist of a single task (simulation), potentially running on multiple processors.
  * **''--workdir=<directory>''**: set the working directory for the batch script; the default is the directory from which the job was submitted.
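Putting the common parameters above together, the top of a typical job script might look as follows. All values (name, log file, time limit, email address) are placeholders to be adapted:

```shell
#!/bin/bash
#SBATCH --job-name=my_sim              # descriptive, unique name for the queue list
#SBATCH --output=my_sim.log            # log file (truncated at job start)
#SBATCH --time=10:00:00                # maximum expected run time
#SBATCH --cpus-per-task=4              # CPU cores for the job steps
#SBATCH --mail-type=END                # email notification on completion...
#SBATCH --mail-user=user@example.com   # ...sent to this (placeholder) address
```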
===== SLURM Examples =====
The various commercial software packages available on the clusters have unique
requirements in terms of licenses and resources. Below you can find pages where
templates for specific types of simulations can be downloaded. To download a
template, click on the link at the top of the embedded template.
  - [[FEKO]]
  - [[COMSOL]]
  - [[CST]]
  - [[MATLAB]]
  - [[Octave]]
===== Job Accounting =====
SLURM keeps track of various statistics for each job and you can use
**''sacct''** to extract the information (see the man page for usage
instructions). The most useful statistics are available with the custom
**''smemio''** command. The output is as follows:
         JobID    JobName  Timelimit    Elapsed   UserCPU  AllocCPUS     ReqMem  MaxVMSize     MaxRSS  MaxDiskRead MaxDiskWrite
  ------------ ---------- ---------- ---------- --------- ---------- ---------- ---------- ---------- ------------ ------------
  3174             lambda   20:00:00   05:45:04  23:00:14          4     4000Mc
  3174.batch        batch              05:45:04  23:00:14          4     4000Mc    263296K      9012K        0.40M        0.20M
  3174.0          runfeko              05:45:03  23:00:13          4     4000Mc   5419024K   3948592K         51M          27M
Only completed jobs for the current user within the past week will be shown.
For a currently running job, use "**''sjmemio <jobid>''**" to get the current
statistics. The first line gives the overall job parameters for the particular
job id. The line containing **''.batch''** represents the job script
statistics, excluding any job steps initiated with **''srun''**. In this
example there is one job step, indicated with **''.0''**, that was initiated
with **''srun''**: a FEKO simulation. The fields are as follows:
  * **Timelimit**: The requested job time limit.
  * **Elapsed**: The actual elapsed time for the job.
  * **UserCPU**: The amount of user CPU time used by the job or job step.
  * **AllocCPUS**: Total number of CPU cores allocated to the job.
  * **ReqMem**: Minimum required memory for the job, in MB. A ''c'' at the end of the number represents memory per CPU, an ''n'' represents memory per node.
  * **MaxVMSize**: Maximum amount of virtual memory the simulation step requested (but may not actually have used).
  * **MaxRSS**: The maximum resident set size, i.e. the maximum amount of physical memory the simulation step actually used.
  * **MaxDiskRead**: Maximum number of bytes read from disk storage in the job step.
  * **MaxDiskWrite**: Maximum number of bytes written to disk storage in the job step.
The above example therefore illustrates a poorly specified batch job. A run
time of 20 hours was requested whilst the job only ran for about 6 hours.
Similarly, 16 GB of memory was requested (4 CPU cores with 4 GB per core)
whilst the job only actually used about 4 GB in total (but did attempt to
allocate about 6 GB in total). For this job **''--mem-per-cpu''** should have
been set to no more than 2000.
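The arithmetic behind that recommendation can be sketched in the shell: take the MaxRSS reported for the heaviest job step, add some headroom, and divide by the number of allocated CPUs. The 25% headroom factor below is an assumption for illustration, not a site rule:

```shell
# Values taken from the example smemio output above.
maxrss_kb=3948592   # MaxRSS of the srun job step, in KB
cpus=4              # AllocCPUS
# Suggested --mem-per-cpu in MB, with roughly 25% headroom.
mem_per_cpu_mb=$(( maxrss_kb * 125 / 100 / 1024 / cpus ))
echo "--mem-per-cpu=${mem_per_cpu_mb}"   # prints --mem-per-cpu=1205
```

The result (about 1200 MB per CPU) is well under the 4000 MB that was actually requested, consistent with the recommendation above.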
High values in the last two columns would indicate an I/O-bound job. Such jobs
will most likely cause increased latency on the clusters if run from the NFS
home directory. Such jobs should rather copy the required files to a newly
created directory in the temporary scratch space area (**''/tmp''**). The
following SLURM template indicates how this could be done, for example, for
the CST software (which is particularly problematic in this regard). The
example assumes that the model has been uploaded to the user's NFS home
directory as a single ZIP file, together with the associated SLURM template.
The SLURM script will automatically create a temporary directory for the
simulation and unpack the ZIP file. Once the simulation completes, the ZIP
file in the user's home directory will be updated to include all changed and
new files. The user can then directly download the resultant ZIP file for
processing on their desktop.
<file bash cst_tmp.slurm>
#!/bin/bash
#SBATCH --output=<CHANGETHIS>.log
#SBATCH --job-name=<CHANGETHIS>
#SBATCH --cpus-per-task=12
#SBATCH --mem-per-cpu=<CHANGETHIS>
#SBATCH --licenses=cst:1
#SBATCH --mail-type=END
#SBATCH --mail-user=<CHANGETHIS>

# Define simulation base name
SIMNAME=<CHANGETHIS>

# Create temporary directory for CST simulation in scratch space
mkdir -p /tmp/$USERNAME/$SIMNAME
cd /tmp/$USERNAME/$SIMNAME
echo "Work directory: $PWD"

# Unpack CST simulation
ls -alp $HOME/$
srun unzip $HOME/$

# Run CST model simulation
srun /usr/local/CST/CST_STUDIO_SUITE/cst_design_environment --m --q --numthreads 12 $SIMNAME.cst

# Update the home directory ZIP file with the results
srun zip -9r $HOME/$ $SIMNAME.cst $SIMNAME
</file>