Just like PBS (described in section Using PBS), SLURM is a job management system which is widely used on large supercomputing systems. Any HPX application can easily be run using SLURM. This section describes how this can be done.
The easiest way to run an HPX application using SLURM is to use the command line tool srun, which interacts with the SLURM batch scheduling system:
srun -p <partition> -N <number-of-nodes> hpx-application <application-arguments>
Here, <partition> is one of the node partitions existing
on the target machine (consult the machine's documentation to get a list
of existing partitions) and
<number-of-nodes> is the number of compute nodes you
want to use. By default, the HPX application is started with one locality
per node and uses all available cores on a node. You can change the number
of localities started per node (for example to account for NUMA effects)
by specifying the -n (--ntasks)
option of srun. The number of cores per locality can be set by -c (--cpus-per-task).
<application-arguments> are any application specific arguments
which need to be passed on to the application.
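As a concrete illustration, the following sketch assembles an srun command line that starts two localities per node. The partition name compute, the node and core counts, and the application argument --app-arg=1 are placeholders chosen for this example, not values from any particular machine. Because srun's -n (--ntasks) option gives the total number of tasks, it is computed here as nodes times localities per node. The script only prints the command, so it can be tried without a SLURM installation.

```shell
#!/bin/bash
# All values below are hypothetical; adjust them to your machine.
partition="compute"        # assumed partition name
nodes=4                    # passed to -N
localities_per_node=2      # for example, one locality per NUMA domain
cores_per_locality=8       # passed to -c

# srun's -n is the total task count across all nodes.
total_localities=$(( nodes * localities_per_node ))

cmd="srun -p $partition -N $nodes -n $total_localities -c $cores_per_locality ./hpx-application --app-arg=1"

# Dry run: print the command instead of executing it.
echo "$cmd"
```

With the values above, the printed command requests 4 nodes and 8 tasks, each with 8 cores. Remove the echo and run the command directly once the values match your system.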
There is no need to use any of the HPX command line options related to the number of localities, number of threads, or related to networking ports. All of this information is automatically extracted from the SLURM environment by the HPX startup code.
The srun documentation
explicitly states: "If
To get an interactive development shell on one of the nodes you can issue the following command:
srun -p <node-type> -N <number-of-nodes> --pty /bin/bash -l
After the shell has been opened, you can run your HPX application. By default,
it uses all available cores. Note that if you requested one node, you don't
need to use srun again.
However, if you requested more than one node and want to run your distributed
application, you can use srun
again to start up the distributed HPX application. It will use the resources
that have been requested for the interactive shell.
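The distinction between single-node and multi-node allocations can be sketched as a small helper for use inside the interactive shell. It relies on the SLURM_JOB_NUM_NODES environment variable, which SLURM sets inside an allocation. The function name launch_cmd and the application path are made up for this example, and the helper only prints the command line rather than running it.

```shell
#!/bin/bash
# Hypothetical helper: print the command line to use for an HPX application,
# prefixing it with srun only when the allocation spans more than one node.
launch_cmd() {
    # SLURM_JOB_NUM_NODES is set by SLURM inside an allocation; default to 1
    # so the helper also works outside of SLURM.
    local nodes="${SLURM_JOB_NUM_NODES:-1}"
    if [ "$nodes" -gt 1 ]; then
        echo "srun $*"
    else
        echo "$*"
    fi
}

# Dry run: show what would be executed for the current allocation.
launch_cmd ./hpx-application --app-arg=1
```

To actually start the application, the printed line can be executed instead of echoed.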
The above mentioned method of running HPX applications
is fine for development purposes. The disadvantage that comes with
srun is that it only returns once the
application is finished. This might not be appropriate for longer running
applications (for example benchmarks or larger scale simulations). In order
to cope with that limitation you can use the sbatch command. The
sbatch command expects
a script that it can run once the requested resources are available. In
order to request resources you need to add #SBATCH
comments in your script or provide the necessary parameters to
sbatch directly. The parameters are the
same as with srun. The
commands you need to execute are the same you would need to start your
application as if you were in an interactive shell.
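A minimal sketch of such a batch script is shown below. It writes a file named hpx_job.sh whose #SBATCH comments request the resources, followed by the same srun line one would type in an interactive shell. The file name, the partition name compute, the node count, the time limit, and the application argument are all assumptions made for this example.

```shell
#!/bin/bash
# Write a hypothetical batch script; all resource values are placeholders.
cat > hpx_job.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=compute
#SBATCH --nodes=4
#SBATCH --ntasks=4
#SBATCH --time=00:30:00
#SBATCH --output=hpx_job_%j.out

# Same command as in an interactive shell on a multi-node allocation.
srun ./hpx-application --app-arg=1
EOF

echo "Wrote hpx_job.sh; submit it with: sbatch hpx_job.sh"
```

sbatch returns as soon as the job has been queued, and the application's output ends up in the file named by --output, where %j is replaced by the job ID. The same resource requests could instead be given on the sbatch command line, for example sbatch -p compute -N 4 hpx_job.sh.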