Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.
[[_TOC_]]
# Split your work into jobs
In order to make good use of a supercomputer, one needs to be able to tell it what should be done in an automated way.
This is where [SLURM](https://slurm.schedmd.com/documentation.html) comes in as the workload manager for HPC systems.
There is a quickstart guide [here](https://slurm.schedmd.com/quickstart.html), and the sections below show how to solve the most common challenges.
## First Job
```bash
squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
```
What does that mean? The queue is empty and is waiting for your jobs.
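For that we first need a job file, here called `test.job`. A minimal sketch could look like the following; the exact `#SBATCH` values are assumptions and the directives are explained in the sections below:

```bash
#!/bin/bash
#SBATCH --job-name=test_mpi   # name shown in the queue
#SBATCH --output=test.log     # file the output is written to
#SBATCH --ntasks=1            # a single task is enough here
#SBATCH --time=01:30          # assumed limit of 1 minute 30 seconds

# print the name of the node this job runs on
srun hostname
```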
Now we submit this job to the queue:
```bash
sbatch test.job
```
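`sbatch` confirms the submission and reports the new job ID, something like:

```
Submitted batch job 211
```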
Then we look again into the queue:
```
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
211 compute test_mpi mblasche R 0:02 1 jet03
```
It is running on jet03 and about a minute later it should be finished and you should see the following output:
```
jet03.jet.local
```
This is the hostname of the node running our test script.
## Lessons learnt
Of course that is a silly job. However, there are things to learn from this example:
1. Writing a `job` file and using the `#SBATCH` directives to specify
   * `job-name`, this is just for you
   * `output`, this is just for you
...
# Writing a job file
There are a few things to consider when writing a job file:
```bash
#!/bin/bash
# SLURM specific directives
#SBATCH --job-name=test      # name of the job as shown in the queue
#SBATCH --output=test.log    # file that stdout and stderr are written to
#SBATCH --ntasks=1           # number of tasks to start
#SBATCH --time=01:30         # time limit (here minutes:seconds)

# Your code below here
srun hostname                # replace hostname with your actual program
```
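Around such a job file the typical workflow uses the standard SLURM commands; the job ID below is just an example:

```bash
sbatch test.job   # submit the job, prints the new job ID
squeue -u $USER   # list only your own jobs in the queue
scancel 211       # cancel a job by its ID if something went wrong
```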
# Interactive Sessions
It is possible to ask for an interactive session on any node. While your job is running you can always `ssh` to that node and check on things, although, since your file system is available everywhere, there is usually no need for that.
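A common way to get such an interactive session is `srun` with a pseudo-terminal; the partition name here is taken from the example above and may differ on your cluster:

```bash
# request an interactive shell on a compute node
srun --partition=compute --ntasks=1 --pty bash
```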