What ECMWF says: [here](https://www.ecmwf.int/en/forecasts/access-forecasts/acce
- using the [MARS web API](../Python/QA-012-Mars-Requests.ipynb) in Python
- using the [ECMWF MARS web interface](https://apps.ecmwf.int/archive-catalogue/?class=od), i.e. the archive catalogue
## General Regularly distributed Information in Binary form (GRIB)
GRIB is a binary format, and the data is packed to increase storage efficiency. GRIB messages are often concatenated together to form a GRIB file. GRIB files usually have the extension .grib, .grb or .gb.
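Since a GRIB file is just a sequence of messages, files can be combined with plain shell tools. A minimal sketch (file names are placeholders; `grib_count` is part of ecCodes):
```sh title="Concatenate GRIB files"
# concatenating GRIB files yields a valid GRIB file again
cat part1.grb part2.grb > combined.grb
# count the messages in the result
grib_count combined.grb
```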
...
...
```sh title="Inspect and split GRIB files"
# list the contents of a GRIB file
grib_ls input.grb
# split by model level type
grib_copy input.grb output_[typeOfLevel].grb
```
More information on converting GRIB to netCDF can be found at [ECMWF](https://confluence.ecmwf.int/display/OIFS/How+to+convert+GRIB+to+netCDF).
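For fields on a regular grid, one common route is the ecCodes tool `grib_to_netcdf`. A minimal sketch (file names are placeholders):
```sh title="Convert GRIB to netCDF"
# grib_to_netcdf only works for regular lat-lon or regular Gaussian grids
grib_to_netcdf -o output.nc input.grb
```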
## Example of an efficient MARS request
It is not easy to write good MARS requests, because there are many parameters.
A few things are important (a sketch of a request follows the list):
1. NetCDF does not handle different times and steps in one file well, so keep them consistent.
2. Retrieval should loop over experiment, date and time, and fetch as much as possible per request.
3. Check the catalogue first and do an "estimate download size" check. Look at the number of tapes: if it is one tape, then you are fine.
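Below is a minimal sketch of such a request, assuming the `mars` command-line client available on ECMWF systems; class, dates, parameters and the target name are placeholders to adapt:
```sh title="Efficient MARS request (sketch)"
# one retrieve covering a whole month; the [date] key in target
# makes MARS split the output into one GRIB file per date
cat > req.mars <<'EOF'
retrieve,
  class    = od,
  stream   = oper,
  expver   = 1,
  type     = an,
  levtype  = pl,
  levelist = 500/850,
  param    = t/z,
  date     = 2020-01-01/to/2020-01-31,
  time     = 00/12,
  target   = "oper_an_[date].grb"
EOF
mars req.mars
```
Covering all dates and times of a month in a single retrieve keeps the number of tape reads low, which is exactly what the "estimate download size" check in the catalogue tells you about.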
...
There is some more information about how to use slurm.
### Job efficiency reports
Since 2024 there is a new feature that allows you to check how well your jobs ran and to get information on the efficiency of the resources used. This can be helpful to optimize your workflow and to make room for other users to use the cluster simultaneously. The report is available once the job has finished.
```sh title="Job efficiency report"
# get a jobs efficiency report
seff [jobid]
# example showing only 3% memory and 45% cpu efficiency!
| data | location | notes |
| --- | --- | --- |
| source code | HOME | use a git repo for source control |
| personal info | HOME | nobody but you should have access, perm: `drwx------.` (see the sketch below) |
| model output | SCRATCH | small and large files that do not need a backup |
| important results | HOME | within your quota limits |
| input data | SCRATCH | if only you use this input data; otherwise see SHARED |
| input data | SHARED | `/jetfs/shared-data` or `/srvfs/shared` |
| important data | DATA | `/srvfs/data` is backed up daily |
| collaboration data | WEBDATA | `/srvfs/webdata`, accessible via [webdata.wolke](https://webdata.wolke.img.univie.ac.at) |
**Remember: all data needs to be re-evaluated after some time and removed when it is no longer needed.**
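As a quick check for the permission mentioned in the table above, a minimal sketch (the directory name is a placeholder):
```sh title="Keep personal data private"
# restrict a directory in HOME to your user only
chmod 700 "$HOME/private"
# verify: should show drwx------. (the trailing dot marks an SELinux context)
ls -ld "$HOME/private"
```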
## Long term storage
The ZID of the University of Vienna offers an archive system, where data can be stored for at least 3 years. If you have data that needs to be kept for some time but does not need to be easily accessible, you can request that the data be sent to the archive:
```sh title="Request data to be archived"
# Only the admin can issue the transfer, but you can create the request
# and add some documentation.
# You can add a note on whether the data can be deleted after the 3 years
# or should be downloaded again.
userservices archive -h
# Create an archive request
```
## Publishing data
There are various data hubs that can store your data following the [FAIR principles](https://www.go-fair.org/fair-principles/) and based on your [data management plan](https://zid.univie.ac.at/en/research/planning-data-management/).
External Hubs:
- [Zenodo (up to 50-200 GB / 100 files)](https://zenodo.org)
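Uploads to Zenodo can also be scripted against its REST API; a minimal sketch, assuming a personal access token stored in `ZENODO_TOKEN` (endpoint as documented in the Zenodo developer docs):
```sh title="Create a Zenodo deposition (sketch)"
# create an empty deposition; the response contains the deposition id
# needed for the subsequent file upload and publish steps
curl -X POST "https://zenodo.org/api/deposit/depositions?access_token=$ZENODO_TOKEN" \
     -H "Content-Type: application/json" -d '{}'
```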
The University of Vienna does not yet offer a comparable service that can host large data sets on longer time scales.
The Department of Meteorology and Geophysics has established a collaboration, [Cloud4Geo](https://www.digitaluniversityhub.eu/dx-initiativen/alle-initiativen/in-forschung/cloud4geo), to allow such long-term storage of research data and to share it with the scientific community. The data is made available via the Earth Observation Data Centre ([EODC](https://eodc.eu)).