University of Pretoria
Operational / Internal Site

Differences

computing:hpcc:feko [2015/08/16 21:01] (current)
Line 1: Line 1:
 +====== FEKO simulations with SLURM ======
 +
 +A 24-process FEKO Gold license is available for use on the clusters.
 +
 +As the license restricts the number of parallel processes possible, simulation
 +parameters must be carefully specified in order to avoid queued jobs aborting
 +due to license issues. The FEKO Gold license seats are defined per physical CPU
 +and hence it is important to set the FEKO and SLURM parameters to match this
 +mode of operation. Furthermore, extra environment variables must be set as the
 +default CPU detection code of FEKO does not work well with the SLURM
 +cgroups-based CPU management.
 +
 +Below there are SLURM templates for four types of FEKO simulations. The first
 +two are primarily for undergraduate project students. The last two are only
 +usable by certain postgraduate students and staff (the SLURM system limits the
 +number of CPUs and memory available to different classes of users).
 +
 +** Please request the smallest feasible values for the simulation so that our
 +cluster usage can be optimised. ** Also note that FEKO may switch to
 +out-of-core memory if the memory required for your simulation exceeds the
 +allocated memory. This mode of operation is usually indicated early in the log
 +file. If you intend to run in this mode, set a low **''--mem-per-cpu''** value
 +so that the rest of the system's memory remains available for other users and
 +the system cache.
 +
 +Note: The templates contain various parameters that you should change to
 +reflect your simulation requirements. These parameters are indicated with
 +the ''<CHANGETHIS>'' tag. For example:
 +
 +<​code>​
 +#SBATCH --mail-user=<​CHANGETHIS>​
 +</​code>​
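 +
 +Before submitting a job, it can help to confirm that no placeholder was left
 +unfilled. The following is a minimal sketch, assuming a hypothetical helper
 +named ''check_placeholders'' (not part of the cluster tooling) that uses
 +''grep'' to look for the tag in a job script:
 +

```shell
#!/bin/bash
# Hypothetical helper (an illustration, not part of the cluster tooling):
# succeed only if the given job script has no remaining <CHANGETHIS> tags.
check_placeholders() {
    local script="$1"
    # grep -n prints each offending line with its line number
    if grep -n '<CHANGETHIS>' "$script"; then
        echo "Unfilled placeholders remain in $script" >&2
        return 1
    fi
    return 0
}

# Usage sketch: submit only when every placeholder has been replaced.
# check_placeholders feko_single.slurm && sbatch feko_single.slurm
```

 +
 +The function returns a non-zero status when a tag is still present, so it can
 +be chained in front of ''sbatch'' as shown in the final comment.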
 +
 +===== Cancelling FEKO simulations =====
 +
 +<note warning>
 +Cancelling a FEKO simulation should only be done as a last resort.
 +</​note>​
 +
 +Due to the way the FEKO Gold license works, if a FEKO simulation is not
 +cancelled using the correct method, the licenses will remain checked out even
 +though all the simulation processes have stopped. This will cause any queued
 +FEKO simulation to abort as the SLURM scheduler'​s tracking of the licenses will
 +be out of sync with that of the FEKO license manager.
 +
 +To ensure cancellation works correctly, the runfeko process must be started
 +using the **''​srun''​** command as shown in the templates below. This will
 +ensure that the FEKO simulation runs as a distinct job step and will receive
 +the required SIGTERM signal.
 +
 +===== Single CPU FEKO simulation: 1 process license =====
 +
 +This template, for a model with name ''​feko_single'',​ is suitable for many
 +relatively short running simulations,​ or simulations that require more memory
 +than is available on project lab computers. A single task is defined that
 +will be allocated a single CPU core. The template also specifies the license
 +requirement,​ in this case a single FEKO license.
 +
 +Typical values for ''--mem-per-cpu'' are 2000 (for 2G) or 4000 (for 4G) for
 +larger simulations. In the case of many small simulations, first try 1000 (1G).
 +
 +<file bash feko_single.slurm>​
 +#!/bin/bash
 +#SBATCH --output=<​CHANGETHIS>​.log
 +#SBATCH --job-name=<​CHANGETHIS>​
 +#SBATCH --cpus-per-task=1
 +#SBATCH --mem-per-cpu=<​CHANGETHIS>​
 +#SBATCH --licenses=feko:​1
 +#SBATCH --time=<​CHANGETHIS>​
 +#SBATCH --mail-type=END
 +#SBATCH --mail-user=<​CHANGETHIS>​
 +
 +# Load FEKO environment
 +source /​usr/​local/​feko/​bin/​initfeko
 +
 +# Run FEKO model simulation
 +srun runfeko feko_single
 +</​file>​
 +
 +===== Moderate size multi-CPU FEKO simulation: 1 process license =====
 +
 +This template, for a model with name ''​feko_multi'',​ is suitable for larger
 +simulations that require more memory than is available on project lab computers
 +and would typically run more than an hour on a lab computer. A single task is
 +defined that will be allocated 4 CPU cores. Additional flags are set to
 +ensure that the correct CPU core allocation is done and that the license use by
 +FEKO is minimised.
 +
 +Typical values for ''--mem-per-cpu'' are 2000 (for 2G) or 4000 (for 4G) for
 +larger simulations.
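 +
 +As a rough guide to what a given ''--mem-per-cpu'' value implies for the whole
 +job, the total request of a single-task job is the number of CPUs per task
 +multiplied by the per-CPU value. The sketch below only does that arithmetic;
 +the function name ''total_mem_mb'' is hypothetical and chosen for
 +illustration:
 +

```shell
#!/bin/bash
# Total memory (in MB) requested by a single-task job:
# CPUs per task multiplied by the --mem-per-cpu value.
# total_mem_mb is a hypothetical helper name used for illustration.
total_mem_mb() {
    local cpus="$1" mem_per_cpu_mb="$2"
    echo $(( cpus * mem_per_cpu_mb ))
}

total_mem_mb 4 2000    # 4 cores at 2000 MB each
total_mem_mb 4 4000    # 4 cores at 4000 MB each
```

 +
 +For the 4-core template this gives 8000 MB and 16000 MB respectively, which is
 +worth comparing against what the simulation actually needs.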
 +
 +<file bash feko_multi.slurm>​
 +#!/bin/bash
 +#SBATCH --output=<​CHANGETHIS>​.log
 +#SBATCH --job-name=<​CHANGETHIS>​
 +#SBATCH --cpus-per-task=4
 +#SBATCH --cores-per-socket=4
 +#SBATCH --mem-per-cpu=<​CHANGETHIS>​
 +#SBATCH --licenses=feko:​1
 +#SBATCH --time=<​CHANGETHIS>​
 +#SBATCH --mail-type=END
 +#SBATCH --mail-user=<​CHANGETHIS>​
 +
 +# Load FEKO environment
 +source /​usr/​local/​feko/​bin/​initfeko
 +
 +# Create a machines file based on the node list allocated
 +hostlist=$(scontrol show hostname $SLURM_JOB_NODELIST)
 +rm -f machines.feko
 +echo -n "​Target Nodes: "
 +for f in $hostlist
 +do
 +   echo $f':​4'​ >> machines.feko
 +   echo $f':​4'​
 +done
 +echo
 +export FEKO_MACHFILE="​machines.feko"​
 +
 +# Ensure that CPU detection (license use) is correct for cpuset allocation
 +export FEKO_SECFEKO_USE_FALLBACK_CPUDETECTION=1
 +export FEKO_CPU_PINNING=0
 +
 +# Run FEKO model simulation with CPU socket binding to minimise license use
 +srun --cpu_bind=verbose,​socket runfeko feko_multi -np 4
 +</​file>​
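 +
 +The machines file written by the loop in this template has one ''host:count''
 +line per allocated node. The same step can be sketched as a small function,
 +which makes the process count a single parameter. The name
 +''write_machfile'' and its argument order are assumptions made for this
 +illustration and are not part of FEKO or SLURM:
 +

```shell
#!/bin/bash
# Hypothetical helper: write one "host:count" line per host to a machines file.
# Arguments: <machines file> <processes per host> <host>...
write_machfile() {
    local machfile="$1" procs="$2"
    shift 2
    : > "$machfile"                  # create or truncate the file
    local host
    for host in "$@"; do
        printf '%s:%s\n' "$host" "$procs" >> "$machfile"
    done
}

# Inside a job script the host names would come from scontrol, as in the
# template above:
# write_machfile machines.feko 4 $(scontrol show hostname "$SLURM_JOB_NODELIST")
# export FEKO_MACHFILE="machines.feko"
```

 +
 +Truncating the file first replaces the ''rm -f'' step, so a rerun in the same
 +directory does not append to a stale machines file.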
 +
 +===== Large size multi-CPU FEKO simulation: 1 process license =====
 +
 +This template, for a model with name ''​feko_large'',​ is suitable for larger
 +simulations that would require a day or more to run on a desktop computer. A
 +single task is defined that will be allocated 6 CPU cores. Additional flags are
 +set to ensure that the correct CPU core allocation is done and that the license
 +use by FEKO is minimised.
 +
 +Typical values for ''--mem-per-cpu'' are 2000 (for 2G) or 4000 (for 4G) for
 +larger simulations.
 +
 +<file bash feko_large.slurm>​
 +#!/bin/bash
 +#SBATCH --output=<​CHANGETHIS>​.log
 +#SBATCH --job-name=<​CHANGETHIS>​
 +#SBATCH --cpus-per-task=6
 +#SBATCH --cores-per-socket=6
 +#SBATCH --mem-per-cpu=<​CHANGETHIS>​
 +#SBATCH --licenses=feko:​1
 +#SBATCH --time=<​CHANGETHIS>​
 +#SBATCH --mail-type=END
 +#SBATCH --mail-user=<​CHANGETHIS>​
 +
 +# Load FEKO environment
 +source /​usr/​local/​feko/​bin/​initfeko
 +
 +# Create a machines file based on the node list allocated
 +hostlist=$(scontrol show hostname $SLURM_JOB_NODELIST)
 +rm -f machines.feko
 +echo -n "​Target Nodes: "
 +for f in $hostlist
 +do
 +   echo $f':​6'​ >> machines.feko
 +   echo $f':​6'​
 +done
 +echo
 +export FEKO_MACHFILE="​machines.feko"​
 +
 +# Ensure that CPU detection (license use) is correct for cpuset allocation
 +export FEKO_SECFEKO_USE_FALLBACK_CPUDETECTION=1
 +export FEKO_CPU_PINNING=0
 +
 +# Run FEKO model simulation with CPU socket binding to minimise license use
 +srun --cpu_bind=verbose,​socket runfeko feko_large -np 6
 +</​file>​
 +
 +===== Maximum size multi-CPU FEKO simulation: 2 process licenses =====
 +
 +This template, for a model with name ''​feko_max'',​ is suitable for very large
 +simulations that would require a week or more to run on a desktop computer. A
 +single task is defined that will be allocated 12 CPU cores. Additional flags
 +are set to ensure that the correct CPU core allocation is done and that the
 +license use by FEKO is minimised.
 +
 +Typical values for ''--mem-per-cpu'' are 2000 (for 2G) or 2500 (for 2.5G) for
 +larger simulations.
 +
 +<file bash feko_max.slurm>​
 +#!/bin/bash
 +#SBATCH --output=<​CHANGETHIS>​.log
 +#SBATCH --job-name=<​CHANGETHIS>​
 +#SBATCH --cpus-per-task=12
 +#SBATCH --mem-per-cpu=2500
 +#SBATCH --licenses=feko:​2
 +#SBATCH --time=<​CHANGETHIS>​
 +#SBATCH --mail-type=END
 +#SBATCH --mail-user=<​CHANGETHIS>​
 +
 +# Load FEKO environment
 +source /​usr/​local/​feko/​bin/​initfeko
 +
 +# Create a machines file based on the node list allocated
 +hostlist=$(scontrol show hostname $SLURM_JOB_NODELIST)
 +rm -f machines.feko
 +echo -n "​Target Nodes: "
 +for f in $hostlist
 +do
 +   echo $f':​12'​ >> machines.feko
 +   echo $f':​12'​
 +done
 +echo
 +export FEKO_MACHFILE="​machines.feko"​
 +
 +# Ensure that CPU detection (license use) is correct for cpuset allocation
 +export FEKO_SECFEKO_USE_FALLBACK_CPUDETECTION=1
 +export FEKO_CPU_PINNING=0
 +
 +# Run FEKO model simulation
 +srun runfeko feko_max -np 12
 +</​file>​
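 +
 +Since license seats are counted per physical CPU, the ''--licenses'' value in
 +the templates can be read as the number of sockets the job spans. The sketch
 +below shows that arithmetic as a ceiling division. It assumes 6-core sockets
 +for the 12-core case, which is inferred from the 2 licenses requested above
 +rather than stated on this page, and ''licenses_needed'' is a hypothetical
 +name:
 +

```shell
#!/bin/bash
# Hypothetical helper: number of per-socket license seats a job would use,
# computed as ceil(cores / cores_per_socket) in integer arithmetic.
# The socket sizes passed in below are assumptions, not measured values.
licenses_needed() {
    local cores="$1" cores_per_socket="$2"
    echo $(( (cores + cores_per_socket - 1) / cores_per_socket ))
}

licenses_needed 4 4     # moderate template: 4 cores on one socket
licenses_needed 6 6     # large template: 6 cores on one socket
licenses_needed 12 6    # maximum template: 12 cores across two sockets
```

 +
 +Under those assumptions the three calls give 1, 1 and 2, matching the
 +''--licenses'' lines in the corresponding templates.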
 +
 +===== Interactive FEKO: 1 process license =====
 +
 +For the creation of large models it is sometimes necessary to run CADFEKO in
 +interactive mode on the cluster. To do so, simply run **''cadfeko''** after
 +logging in on the head node. This will start a single CPU session with 12GB of
 +memory. Note that it will take about 30 to 50 seconds for the session to be
 +allocated on the cluster and for CADFEKO to start. Note furthermore that the
 +session has a hard time limit of 4 hours and will also terminate after 20
 +minutes of inactivity.
 +
 +** Do not use the interactive session for simulations. **
 +
 +For the X11 graphics to be handled correctly, you must have an X11 server
 +installed on your computer and be logged in with X11 forwarding
 +enabled.
 +