Commit 3d274ae3 authored by Michael Blaschek

updates

parent 94aa8897
> High Performance Computing available to Staff
> Austrian HPC effort
> part of EuroCC

![vsc](mkdocs/img/logo_vsc.png)
Links:

- [VSC](https://vsc.ac.at/home/)
- [VSC-Wiki](https://wiki.vsc.ac.at)
- [EuroCC - Austria](https://eurocc-austria.at)
- [VSC Account](https://service.vsc.ac.at/clusteruser/login/)
We have the privilege to be part of the VSC and have private nodes at VSC-5 (since 2022), VSC-4 (since 2020) and VSC-3 (since 2014), which was retired in 2022.
Access is primarily via SSH:

```bash title='ssh to VSC'
ssh user@vsc5.vsc.ac.at
ssh user@vsc4.vsc.ac.at
```
Please follow the connection instructions on the [wiki](https://wiki.vsc.ac.at); they are similar to those for our other servers (e.g. [SRVX1](Servers/SRVX1.md)).
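If you connect frequently, an SSH host alias saves typing. A minimal sketch (the alias `vsc5` and the user name `myuser` are placeholders, adjust them to your VSC account):

```bash title="Optional SSH host alias (sketch)"
# append a host alias to your SSH configuration
cat >> ~/.ssh/config <<'EOF'
Host vsc5
    HostName vsc5.vsc.ac.at
    User myuser
EOF
# afterwards the short name is enough
ssh vsc5
```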
If you want, you can use some shared shell scripts that provide information for users.
Please find the following commands available:

- `imgw-quota` shows the current quota on VSC for both HOME and DATA
- `imgw-container` singularity/apptainer container run script, see [below](#containers)
- `imgw-transfersh` Transfer-sh service on [wolke](https://transfersh.wolke.img.univie.ac.at), easily share small files.
- `imgw-cpuinfo` Show CPU information

Please find a shared folder in `/gpfs/data/fs71386/imgw/shared` and add data there that needs to be used by multiple people. Please make sure that things are removed again as soon as possible. Thanks.
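When you put data into the shared folder, make sure the group can actually read it. A minimal sketch (the subdirectory name is just an example):

```bash title="Share data with the group (sketch)"
# example subdirectory inside the shared folder
mkdir -p /gpfs/data/fs71386/imgw/shared/my_dataset
cp -r results/ /gpfs/data/fs71386/imgw/shared/my_dataset/
# grant the group read access and directory traversal
chmod -R g+rX /gpfs/data/fs71386/imgw/shared/my_dataset
```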
## Node Information VSC-5
There are usually two sockets per Node, which means 2 CPUs per Node.
```txt title='VSC-5 Compute Node'
CPU model: AMD EPYC 7713 64-Core Processor
2 CPU, 64 physical cores per CPU, total 256 logical CPU units
512 GB Memory
```
We have access to 11 private Nodes of that kind. We also have access to 1 GPU node.
```txt title='VSC-5 Quality of Service'
$ sqos
        qos name  type  total res  used res  free res     walltime  priority  total n*  used n*  free n*
================================================================================================================================
     p71386_0512   cpu       2816      2816         0  10-00:00:00    100000        11       11        0
 p71386_a100dual   gpu          2         0         2  10-00:00:00    100000         1        0        1

* node values do not always align with resource values since nodes can be partially allocated
```

## Storage on VSC-5
The HOME and DATA partitions are the same as on [VSC-4](#storage-on-vsc-4).

![JET and VSC-5 holding hands](./mkdocs/img/jet_and_vsc5.png)
The JET storage directories can be found on VSC-5 as well.
You can use these directories as well for direct writing. The performance is higher on VSC-5 storage. **This does not work on VSC-4.**

## Node Information VSC-4
We have access to 5 private Nodes of that kind. We also have access to the jupyterhub nodes.
```txt title='VSC-4 Quality of Service'
$ sqos
              qos name  type  total res  used res  free res     walltime  priority  total n*  used n*  free n*
================================================================================================================================
           p71386_0384   cpu        480       288       192  10-00:00:00    100000         5        3        2
  skylake_0096_jupyter   cpu        288        12       276   3-00:00:00      1000         3        1        2

* node values do not always align with resource values since nodes can be partially allocated
```

## Storage on VSC-4
Check quotas by running the following commands yourself, including your PROJECTID:

```bash
$ mmlsquota --block-size auto -j data_fs71386 data
                         Block Limits                                    |     File Limits
Filesystem type         blocks      quota      limit   in_doubt    grace |    files   quota    limit in_doubt    grace  Remarks
data       FILESET      66.35T     117.2T     117.2T     20.45G     none |  4597941 5000000  5000000     1632     none vsc-storage.vsc4.opa

$ mmlsquota --block-size auto -j home_fs71386 home
                         Block Limits                                    |     File Limits
Filesystem type         blocks      quota      limit   in_doubt    grace |    files   quota    limit in_doubt    grace  Remarks
home       FILESET      182.7G       200G       200G     921.6M     none |  1915938 2000000  2000000     1269     none vsc-storage.vsc4.opa
```
## Other Storage

We have access to the Earth Observation Data Center [EODC](https://eodc.eu/data/), where one can find primarily the following data sets:

- Sentinel-1, 2, 3

These datasets can be found directly via `/eodc/products/`.

We are given a private data storage location (`/eodc/private/uniwien`), where we can store up to 22 TB on VSC-4. However, that might change in the future.
## Run time limits and queues

VSC-5 queues and limits:

```bash title='VSC-5 Queues'
$ sacctmgr show qos format=name%20s,priority,grpnodes,maxwall,description%40s
                Name   Priority GrpNodes     MaxWall                                    Descr
-------------------- ---------- -------- ----------- ----------------------------------------
              normal          0           1-00:00:00 Normal QOS default
         p71386_0384     100000          10-00:00:00 private nodes haimberger
           idle_0512          1           1-00:00:00 vsc-5 idle nodes
           idle_1024          1           1-00:00:00 vsc5 idle nodes
           idle_2048          1           1-00:00:00 vsc5 idle nodes
     zen2_0256_a40x2       2000           3-00:00:00 24 x a40 nodes with 32 cores each
zen2_0256_a40x2_tra+    1000000           1-00:00:00 qos for training on a40 gpu nodes
    zen3_0512_a100x2       1000           3-00:00:00 public qos for a100 gpu nodes
zen3_0512_a100x2_tr+    1000000           1-00:00:00 qos for training on a100 gpu nodes
    cascadelake_0384       2000           3-00:00:00 intel cascadelake nodes on vsc-4
           zen3_0512       1000           3-00:00:00 vsc-5 regular cpu nodes with 512 gb of +
     zen3_0512_devel    5000000              00:10:00 fast short qos for dev jobs
           zen3_1024       1000           3-00:00:00 vsc-5 regular cpu nodes with 1024 gb of+
           zen3_2048       1000           3-00:00:00 vsc-5 regular cpu nodes with 2048 gb of+
```
The department has access to these partitions:
```sh title="VSC5 available partitions with QOS"
partition QOS
------------------------------------------------
cascadelake_0384 cascadelake_0384
zen2_0256_a40x2 zen2_0256_a40x2
zen3_0512_a100x2 zen3_0512_a100x2
zen3_0512 zen3_0512,zen3_0512_devel
zen3_1024 zen3_1024
zen3_2048 zen3_2048
```

VSC-4 queues and limits:

```bash title='VSC-4 Queues'
$ sacctmgr show qos format=name%20s,priority,grpnodes,maxwall,description%40s
                Name   Priority GrpNodes     MaxWall                                    Descr
-------------------- ---------- -------- ----------- ----------------------------------------
         p71386_0384     100000          10-00:00:00 private nodes haimberger
                long       1000          10-00:00:00 long running jobs on vsc-4
           fast_vsc4    1000000           3-00:00:00 high priority access
           idle_0096          1           1-00:00:00 vsc-4 idle nodes
           idle_0384          1           1-00:00:00 vsc-4 idle nodes
           idle_0768          1           1-00:00:00 vsc-4 idle nodes
            mem_0096       1000           3-00:00:00 vsc-4 regular nodes with 96 gb of memory
            mem_0384       1000           3-00:00:00 vsc-4 regular nodes with 384 gb of memo+
            mem_0768       1000           3-00:00:00 vsc-4 regular nodes with 768 gb of memo+
```
The department has access to these partitions:
```sh title="VSC-4 available partitions with QOS"
partition QOS
--------------------------
skylake_0096 skylake_0096,skylake_0096_devel
skylake_0384 skylake_0384
skylake_0768 skylake_0768
```
**Single/few core jobs are allocated to nodes n4901-0[01-72] and n4902-0[01-72].**
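To check which account/partition/QOS combinations your user is actually allowed to submit to, you can query SLURM directly (a sketch; the format fields and widths can be adjusted):

```bash title="Show your SLURM associations (sketch)"
sacctmgr show associations user=$USER format=account%16,partition%20,qos%40
```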
SLURM allows for setting a run time limit below the default QOS's run time limit. After the specified time is elapsed, the job is killed:

```bash title="slurm time limit"
#SBATCH --time=<time>
```

Acceptable time formats include `minutes`, `minutes:seconds`, `hours:minutes:seconds`, `days-hours`, `days-hours:minutes` and `days-hours:minutes:seconds`.
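For example, to request six hours instead of the QOS default (the values are illustrative):

```bash title="Requesting a shorter time limit (sketch)"
# either at submission time
sbatch --time=0-06:00:00 check.slrm
# or as a directive inside the job file
#SBATCH --time=0-06:00:00
```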
## Example Job

- [VSC Wiki Slurm](https://wiki.vsc.ac.at/doku.php?id=doku:slurm)
- [VSC Wiki private Queue](https://wiki.vsc.ac.at/doku.php?id=doku:vsc3_queue)

### Example Job on VSC

We have to use the following keywords to make sure that the correct partitions are used:

- `--partition=mem_xxxx` (per email)
- `--qos=xxxxxx` (see below)
- `--account=xxxxxx` (see below)
Put this in the Job file (e.g. VSC-5 Nodes); the script ends by launching the program, and a complete sketch is shown after the option list below:

```bash
<mpirun -np 32 a.out>
```
- **-J** job name
- **-N** number of nodes requested (see the node information above for the number of cores per node)
- **-n, --ntasks=<number>** specifies the number of tasks to run
- **--ntasks-per-node** number of processes run in parallel on a single node
- **--ntasks-per-core** number of tasks a single core should work on
- **srun** is an alternative command to **mpirun**. It provides direct access to SLURM inherent variables and settings.
- **-l** adds task-specific labels to the beginning of all output lines.
- **--mail-type** sends an email at specific events. The SLURM doku lists the following valid mail-type values: _"BEGIN, END, FAIL, REQUEUE, ALL (equivalent to BEGIN, END, FAIL and REQUEUE), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80 percent of time limit), and TIME_LIMIT_50 (reached 50 percent of time limit). Multiple type values may be specified in a comma separated list."_ [cited from the SLURM doku](http://slurm.schedmd.com)
- **--mail-user** sends an email to this address
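Putting the keywords and options above together, a minimal sketch of a complete job file could look like this (partition, QOS and the placeholder values are examples; adjust them to your project as described above):

```bash title="Sketch of a VSC-5 job file"
#!/bin/bash
#SBATCH -J myjob                   # job name
#SBATCH -N 2                       # number of nodes
#SBATCH --ntasks-per-node=16       # MPI tasks per node (2 x 16 = 32 tasks)
#SBATCH --partition=zen3_0512      # partition, see the tables above
#SBATCH --qos=p71386_0512          # QOS, see the tables above
#SBATCH --account=xxxxxx           # project account, see above
#SBATCH --mail-type=ALL
#SBATCH --mail-user=user@example.com

mpirun -np 32 a.out
```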
```bash title="slurm basic commands" ```bash title="slurm basic commands"
sbatch check.slrm # to submit the job sbatch check.slrm # to submit the job
...@@ -248,6 +268,39 @@ scancel JOBID # for premature removal, where JOBID ...@@ -248,6 +268,39 @@ scancel JOBID # for premature removal, where JOBID
# is obtained from the previous command # is obtained from the previous command
``` ```
### Example of multiple simulations inside one job
Sample job for running multiple MPI jobs concurrently on one VSC-4 node.

Note: `mem_per_task` should be set such that

`mem_per_task * mytasks < mem_per_node - 2 GB`

The roughly 2 GB reduction in available memory accounts for the operating system held in memory. For a standard node with 96 GB of memory this would be, e.g.:

`23 GB * 4 = 92 GB < 94 GB`
```bash title="VSC-4 example concurrent job"
#!/bin/bash
#SBATCH -J many
#SBATCH -N 1
# ... other slurm directives
# disable resource consumption by subsequent srun calls
export SLURM_STEP_GRES=none
mytasks=4
cmd="stress -c 24"
mem_per_task=10G
for i in `seq 1 $mytasks`
do
srun --mem=$mem_per_task --cpus-per-task=2 --ntasks=1 $cmd &
done
wait
```
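After such a job has finished, you can verify that each `srun` step received its own resources, e.g.:

```bash title="Check the job steps (sketch)"
# JOBID is a placeholder; one output line per srun step
sacct -j JOBID --units=G --format=JobID,AllocCPUS,MaxRSS,Elapsed,State
```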
## Software

The VSC uses the same software system as Jet and has environment modules available to the user.
```python
import sys, site
sys.path.append(site.getusersitepackages())
# This will add the correct path.
```
Then you will be able to load all packages that are located in the user site.
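Packages end up in the user site when they are installed with pip's `--user` flag. A minimal sketch (the module name and the package are only examples):

```bash title="Install into the Python user site (sketch)"
module load anaconda3          # example module name; pick the python module you actually use
pip install --user xarray      # installs into the user site under ~/.local
python -m site --user-site     # print the user site-packages path
```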
## Containers

We can use complex software that is contained in [singularity](https://singularity.hpcng.org/) containers [(doc)](https://singularity.hpcng.org/user-docs/master/) and can be executed on VSC-4. Please consider using one of the following containers:

- `py3centos7anaconda3-2020-07-dev`
  located in the `$DATA` directory of IMGW: `/gpfs/data/fs71386/imgw`

### How to use?

Currently there is only one container with a run script.

```bash
# The directory of the containers
/gpfs/data/fs71386/imgw/run.sh [arguments]
```
It is necessary to set `SINGULARITY_BIND` because the `$HOME` and `$DATA` or `$BINFS` paths are not standard Linux paths, so the Linux inside the container does not know about them and accessing files from within the container is not possible. If you have problems accessing other paths in the future, adding them to `SINGULARITY_BIND` might fix the issue.
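A minimal sketch of such a bind setup, using the `$HOME` and `$DATA` variables and the container from above:

```bash title="Bind non-standard paths into the container (sketch)"
# comma-separated list of paths to make visible inside the container
export SINGULARITY_BIND="$HOME,$DATA"
singularity exec /gpfs/data/fs71386/imgw/py3centos7anaconda3-2020-07-dev.sif ls "$DATA"
```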
In principle one can execute the container like this:
```bash
# check if the module is loaded
$ module load singularity
# just run the container initiating the builtin runscript (running ipython):
$ /gpfs/data/fs71386/imgw/py3centos7anaconda3-2020-07-dev.sif
Python 3.8.3 (default, Jul 2 2020, 16:21:59)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]:

In [2]: %env DATA
Out[2]: '/gpfs/data/fs71386/USER'

In [3]: ls /gpfs/data/fs71386/USER
ls: cannot access /gpfs/data/fs71386/USER: No such file or directory
# Please note here that the path is not available, because we did not use the SINGULARITY_BIND
```
### What is inside the container?

In principle you can check what is inside by using
```bash title="Inspect a Singularity/Apptainer container"
$ module load singularity
$ singularity inspect py3centos7anaconda3-2020-07-dev.sif
author: M.Blaschek
dist: anaconda2020.07
glibc: 2.17
org.label-schema.usage.singularity.version: 3.8.1-1.el8
os: centos7
python: 3.8
```
which shows you some information on the container, e.g. that CentOS 7 is installed, Python 3.8, and glibc 2.17.

But you can also check the applications inside:
```bash title="Execute commands inside a container"
# List all executables inside the container
$ py3centos7anaconda3-2020-07-dev.sif ls /opt/view/bin
# or using conda for the environment
$ py3centos7anaconda3-2020-07-dev.sif conda info
# for the package list
$ py3centos7anaconda3-2020-07-dev.sif conda list
```

which shows something like this:

```txt
...
zstd                      1.5.0                ha95c52a_0    conda-forge
```
## Debugging on VSC-4

Currently (6.2021) there is no development queue on VSC-4 and the support team suggested the following:

```bash title="Debugging on VSC-4"
# Request resources from slurm (-N 1, a full Node)
$ salloc -N 1 -p mem_0384 --qos p71386_0384 --no-shell
# Once the node is assigned / job is running
# Check with
$ squeue -u $USER
# connect to the Node with ssh
$ ssh [Node]
# test and debug the model there.
```
Otherwise you can access one of the `*_devel` queues/partitions and submit short test jobs to check your setup.
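For a quick check on VSC-5, the `zen3_0512_devel` QOS from the table above can be used for a 10-minute test job, e.g.:

```bash title="Short test job on a devel QOS (sketch)"
sbatch --partition=zen3_0512 --qos=zen3_0512_devel --time=00:10:00 check.slrm
```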
New images added with this commit:

- `mkdocs/img/cluster-computing.png` (17.4 KiB)
- `mkdocs/img/data.png` (23.2 KiB)
- `mkdocs/img/gpu.png` (35.7 KiB)
class: title, center, middle

![](../img/logo_fgga2.svg)

18.03.2024 - M. Blaschek - PhD Seminar @ IMGW

---

# What can you expect from this presentation?

- Asking Questions (anytime, please interrupt) - This presentation is for you!
--
- Hardware
- Data and Storage

--

- Services
- Teachinghub via Moodle
- Rules

--
- Updates
- VSCode
--
---
# Bare metal @ Department
.left-column[
### SRVX1
.right[<img src="../img/cpu.png" width="50px"><img src="../img/cpu.png" width="50px"><img src="../img/cpu.png" width="50px"><img src="../img/cpu.png" width="50px">]
Arsenal, **Development & Teaching Node**
- Storage (800 TB)
### JET
.right[<img src="../img/cpu.png" width="50px"><img src="../img/cpu.png" width="50px"><br><img src="../img/cluster-computing.png" width="100px">]
Arsenal, **Computing Cluster**
- 2x Login (JET01 (vnc),JET02 (hub))
- 7x Compute
- Storage (3.5 PB)
- SLURM
- JET2VSC
]
.right-column[
### AURORA
.right[<img src="../img/cpu.png" width="50px"><img src="../img/cpu.png" width="50px">]
Arsenal, **Development & Visual (VNC) Node**
### VSC
.right[<img src="../img/cpu.png" width="50px"><img src="../img/cpu.png" width="50px"><br><img src="../img/cluster-computing.png" width="100px"><br><img src="../img/gpu.png" width="100px">]
<img src="../img/logo_vsc.png" width="50px">
<br>Arsenal, **HPC Cluster**
- VSC4 5x Nodes
- **VSC5 11 Nodes**
- VSC5 1x GPU
- Shared HOME (200GB)
- Shared DATA (100TB)
- **# of files!!!**
- SLURM
- Projects
]
---
h1, h2, h3 {font-weight: normal; font-weight: 400; margin-bottom: 0;}

  color: hsl(330, 75%, 100%);
}
.right {
text-align: right;
float: right;
}
.left {
text-align: left;
float: left;
}
/* Two-column layouts */
.left-column { width: 49%; float: left; }
.right-column { width: 49%; float: right; }