Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.
[[_TOC_]]
# Scientific Computing
# Split your work into JOBs
To make good use of a supercomputer, you need to be able to tell it what to do in an automated way.
This is where [SLURM](https://slurm.schedmd.com/documentation.html) comes in: a workload manager for HPC clusters.
There is a quickstart guide [here](https://slurm.schedmd.com/quickstart.html), and the sections below show how to solve the most common challenges.
## First Job
```bash
squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
```
What does that mean? The queue is empty and waiting for your jobs.
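The creation of the job file is elided here, but a minimal `test.job` consistent with the queue output shown in this section might look like the following. Note this is a reconstruction: only the job name (`test_mpi`, visible in the `squeue` output) is taken from this page; the other directives are assumptions:

```shell
#!/bin/bash
#SBATCH --job-name=test_mpi   # the name that shows up in squeue
#SBATCH --output=test.log     # file where the job's output is collected
#SBATCH --ntasks=1            # a single task is enough for this test
#SBATCH --time=01:30          # MM:SS, i.e. 1 minute 30 seconds

# print the name of the node the job landed on
srun hostname
```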
Now we submit this job to the queue:
```bash
sbatch test.job
```
and look again at the queue:
```
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
211 compute test_mpi mblasche R 0:02 1 jet03
```
It is running on jet03, and about a minute later it should be finished and you should see:
```
jet03.jet.local
```
The hostname of the node running our test script.
## Lessons learnt
Of course that is a silly job, but there are still things to learn from this example:
1. Writing a `job` file and using `#SBATCH` directives to specify
   * `job-name`, a label that lets you recognize the job in the queue
   * `output`, the file your job's output is written to
# Writing a job file
There are a few things to consider when writing a job file:
```bash
#!/bin/bash
# SLURM-specific directives
#SBATCH --job-name=test
#SBATCH --output=test.log
#SBATCH --ntasks=1
#SBATCH --time=01:30   # MM:SS, i.e. 1 minute 30 seconds
# Your code below here
srun hostname
```
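With the job file saved, the typical lifecycle is: submit, watch, and, if necessary, cancel. `scancel` takes the job id that `sbatch` and `squeue` report (211 in the example above):

```shell
sbatch test.job    # submit the job; prints the assigned job id
squeue -u "$USER"  # list only your own jobs
scancel 211        # cancel job 211, using the id from sbatch/squeue
```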
# Interactive Sessions
It is possible to ask for an interactive session on a node. While your job runs you can also `ssh` to the node it is running on and check on things, although, since your file system is available everywhere, that is rarely necessary.