# Scientific Computing

In order to make good use of a supercomputer, one needs to be able to tell it what should be done.
This is where [SLURM](https://slurm.schedmd.com/documentation.html) comes in as a workload manager for HPCs.
There is a quickstart guide [here](https://slurm.schedmd.com/quickstart.html), and the sections below give some information on how to solve the most common needs.

```bash
squeue
 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
```
What does that mean? The queue is empty and is waiting for your jobs.
So let's give the computer something to think about.
Open your favorite editor (Gedit, Emacs, vim, nano, ...), put the code below into a file, and save it as `test.job`:
```bash
#!/bin/bash
# SLURM specific commands
#SBATCH --job-name=test
#SBATCH --output=test.log
#SBATCH --ntasks=1
#SBATCH --time=01:30

# Your code below here
srun hostname
srun sleep 60
```
Now we submit this job to the queue:
```bash
sbatch test.job
```
and look again at the queue:
```
 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   211   compute     test mblasche  R       0:02      1 jet03
```
It is running on jet03, and about a minute later it should be finished. You should then see the log file `test.log` with something like this in it:
```
jet03.jet.local
```
That is the hostname of the node that ran our test script. Of course this is a silly job. However, there are things to learn from this example:
1. Write a `job` file and use the `#SBATCH` directives to specify:
   * `job-name`, this is just for you
   * `output`, this is just for you
   * `ntasks`, this tells SLURM how many parallel tasks your job is allowed to run
   * `time`, this tells SLURM how much time to give your job (here `01:30` means 1 minute and 30 seconds)
2. Submit the job with `sbatch`.
3. Check the queue and the status of your job with `squeue`.

## Writing a job file

There are a few things to consider when writing a job file, mainly how many tasks, how much memory, and how much time your job needs; a sketch of a more complete job file follows below.
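As a sketch, a fuller job file could look like the one below. The job name, the program, and the resource values are made-up placeholders; which partitions and limits are available depends on the cluster and can be checked with `sinfo`.
```bash
#!/bin/bash
#SBATCH --job-name=myjob        # shown in squeue, just for you
#SBATCH --output=myjob.%j.log   # %j is replaced by the job id
#SBATCH --ntasks=4              # number of parallel tasks
#SBATCH --mem=2G                # memory for the whole job
#SBATCH --time=01:00:00         # walltime limit (hh:mm:ss)
#SBATCH --partition=compute     # partition, see sinfo for the available ones

srun ./my_program               # my_program is a placeholder
```
Requesting only what the job actually needs helps the scheduler start it sooner.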
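Once a job is submitted, two more standard SLURM commands are useful: `scancel` removes a job from the queue, and `sacct` shows accounting information for jobs that have already finished (211 is the job id from the example above).
```bash
# cancel the job with id 211
scancel 211

# show what happened to a finished job
sacct -j 211 --format=JobID,JobName,State,Elapsed
```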
## Interactive Session

It is possible to ask for an interactive session on any node. While one of your jobs is running, you can also `ssh` to the node it runs on and check on things, although, since your file system is available everywhere, there is usually no need for that.

An interactive session can be started like this:
```bash
srun --pty bash -i
```

The following options also work with non-interactive jobs (`sbatch`, `salloc`, etc.).
To run on a specific node:
```bash
# use the -w / --nodelist option
srun -w jet05 --pty bash -i
```
To acquire a node exclusively:
```bash
# use the --exclusive option
srun -w jet05 --exclusive --pty bash -i
```

## MPI

A batch job for an MPI program can look like this (save it as `mpiexample.sh`):

```bash
#!/bin/bash
# Some example sbatch options.
# See also https://slurm.schedmd.com/sbatch.html
#SBATCH --job-name=testprog
#SBATCH --output=testprog.out
#SBATCH --ntasks=7

# Load the OpenMPI module.
module load openmpi/4.0.5-gcc-8.3.1-773ztsv

# Run the program with srun. Additional options are not
# required; srun gets the info (e.g., hosts) from SLURM.
srun ./testprog
```
Submit with
```bash
sbatch mpiexample.sh
```
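The script above assumes that a compiled MPI program called `testprog` already exists; where it comes from is not shown. Assuming a source file `testprog.c` (a made-up name), it could be compiled with the compiler wrapper that ships with the loaded OpenMPI module:
```bash
# load the same MPI module that the job script uses
module load openmpi/4.0.5-gcc-8.3.1-773ztsv

# mpicc is OpenMPI's C compiler wrapper; testprog.c is a placeholder name
mpicc -o testprog testprog.c
```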