diff --git a/README.md b/README.md
index eda045ea87702c404b4f5edf41f6be23fd48e2b4..febc1d1642d75cd427ada1b220e2aaa73d14e964 100644
--- a/README.md
+++ b/README.md
@@ -127,22 +127,87 @@ ask for more tasks
 # Interactive Sessions
-It is possible to ask for an interactive session on any node. When your job runs you can always `ssh` to that node and maybe check things, although your file system is available everywhere, there is no need for that.
+It is possible to ask for an interactive session on any node. When your normal job runs you can always `ssh` to the allocated nodes. However, the file system is global, so checking files can be done from the login nodes as well.
+
+
+## How to run an interactive job
 An interactive session can be started like this
-How to run an interactive job
+
+```bash
+# This will give you a full node
+$ srun --pty bash
+srun: job 18351 queued and waiting for resources
+srun: job 18351 has been allocated resources
+[user@n3502-045 ~]$
+```
+
+or using an allocation:
+
+```bash
+# Create an allocation for 2 hours (1Node)
+$ salloc --time=02:00:00
+salloc: Pending job allocation 18356
+salloc: job 18356 queued and waiting for resources
+salloc: job 18356 has been allocated resources
+salloc: Granted job allocation 18356
+salloc: Waiting for resource configuration
+salloc: Nodes n3501-009 are ready for job
+# Connect to that allocation by running
+$srun --pty bash
+[user@n3501-009 ~]$
+# or use ssh
+$ssh n3501-009
+[user@n3501-009 ~]$
+```
+
+*Please note the following:*
+
+Using `srun` will create a terminal session (pty) on the compute node and allow you to run programs interactively. Once you disconnect with `exit` or `CTRL+c` from the job, all running processes from that terminal session will be killed. It is not possible to put things into the background. When you connect with `ssh` it is possible to put processes into the background and exit from the ssh session.
+
+**Disconnecting from the login node will kill your salloc**
+
+### interactive long jobs
+
+If you need to have a development job, where you would like to run things and **connect and disconnect repeatedly** the above setup will fail. Use a terminal multiplexer for that. Follow these steps:
+
 ```bash
-srun --pty bash -i
+# connect to a login node on VSC or JET or ...
+# launch a terminal multiplexer: tmux, screen, ...
+$ tmux
+# now you are inside the Tmux session
+# allocate for 3 days
+$ salloc --time=3-00:00:00
+salloc: Pending job allocation 18356
+salloc: job 18356 queued and waiting for resources
+salloc: job 18356 has been allocated resources
+salloc: Granted job allocation 18356
+salloc: Waiting for resource configuration
+salloc: Nodes n3501-009 are ready for job
+# connect to the Node
+$ srun --pty bash
+[user@n3501-009 ~]$
+# or using ssh
+$ ssh n3501-009
+[user@n3501-009 ~]$
 ```
+When you disconnect from the compute Node
+
+TMux control shortcuts
+
+
+more information on [Terminal Multiplexer](https://opensource.com/article/21/5/linux-terminal-multiplexer).
+
+## How to run on a specific node
+
 The following options also work with non-interactive jobs (sbatch, salloc, etc.)
-How to run on a specific node
 ```bash
 # use the -w / --nodelist option
 srun -w jet05 --pty bash -i
 ```
-How to acquire each node exclusively
+How to acquire each node exclusively (only relevant for shared Nodes, e.g. Jet)
 ```bash
 # use the --exclusive option
@@ -175,10 +240,13 @@ sbatch hello.job
 # Useful things
 ## Information on the Job
-```
+```bash
 # show all completed jobs
 sacct -a -s cd
+# show all jobs from user
+sacct -a -u $USER
+
 # show info for that job
 sacct -j <JOBID>
@@ -190,8 +258,14 @@ sacct -j <JOBID> --format=JobID,JobName,ReqMem,MaxRSS,Elapsed
 ```
 ## Send an Email Notification of your job:
-ALL, BEGIN, END, FAIL
-```
+
+possible mail-types: `ALL, BEGIN, END, FAIL`
+
+```bash
 #SBATCH --mail-type=BEGIN # first have to state the type of event to occur
 #SBATCH --mail-user=<email@address.at> # and then your email address
 ```
+
+## seff
+
+TODO 🚧 **being edited** 🚧
\ No newline at end of file
diff --git a/docs/tmux.png b/docs/tmux.png
new file mode 100644
index 0000000000000000000000000000000000000000..7218712d2d9449d86728f602a338e72ae6b92430
Binary files /dev/null and b/docs/tmux.png differ