Update Slurm.md and JET.md with job efficiency report examples
What ECMWF says: [here](https://www.ecmwf.int/en/forecasts/access-forecasts/acce)
- using [MARS web api](../Python/QA-012-Mars-Requests.ipynb) in Python
- [using ECMWF mars web interface](https://apps.ecmwf.int/archive-catalogue/?class=od) using the archive catalogue.
## General Regularly distributed Information in Binary form (GRIB)
GRIB is a binary format, and the data is packed to increase storage efficiency. GRIB messages are often concatenated together to form a GRIB file. GRIB files usually have the extension .grib, .grb or .gb.
```sh title="GRIB tools"
# list the content of a GRIB file
grib_ls input.grb
# split by model level type
grib_copy input.grb output_[typeOfLevel].grb
```
More information can be found at [ECMWF](https://confluence.ecmwf.int/display/OIFS/How+to+convert+GRIB+to+netCDF).
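If netCDF output is needed, the ecCodes tool `grib_to_netcdf` can do the conversion directly (a minimal sketch; the file names are placeholders):

```sh title="Convert GRIB to netCDF"
# works for regular lat-lon and regular Gaussian grids only
grib_to_netcdf -o output.nc input.grb
```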
## Example of an efficient MARS request
It is not easy to write good MARS requests, because there are many parameters. A few things are important:

1. NetCDF does not handle mixed times and steps well.
2. Retrievals should loop over experiment, date and time, and fetch as much as possible per request.
3. Check the catalogue first and run an "estimate download size" check. Look at the number of tapes: if it is one tape, you are fine (see the cost-estimate sketch after the retrieval script below).

[MARS keywords](https://confluence.ecmwf.int/display/UDOC/Keywords+in+MARS+and+Dissemination+requests)

[HRES Guide](https://confluence.ecmwf.int/display/UDOC/HRES%3A+Atmospheric+%28oper%29%2C+Model+level+%28ml%29%2C+Forecast+%28fc%29%3A+Guidelines+to+write+efficient+MARS+requests)
```sh title="MARS retrieval of HRES operational forecast"
#!/bin/bash
# this example will filter the area of Europe (N/W/S/E) and interpolate the final fields to
# a 0.5x0.5 regular lat-lon grid (GRID=0.5/0.5)
AREA="73.5/-27/33/45"
GRID="0.5/0.5"
# fixed selection from the same block
PARAMS="130/131/132"
LEVELIST="127/128/129/130/131/132/133/134/135/136/137"
STEP="0/to/90/by/1"
TIMES="0000 1200"
YEAR="2017"
MONTH="04"
# date loop
for y in ${YEAR}; do
    for m in ${MONTH}; do
        # get the number of days for this particular month/year
        days_per_month=$(cal ${m} ${y} | awk 'NF {DAYS = $NF}; END {print DAYS}')
        for my_date in $(seq -w 1 ${days_per_month}); do
            my_date=${YEAR}${m}${my_date}
            # time loop
            for my_time in ${TIMES}; do
                # write the request file (the heredoc delimiter EOF must stay at the start of the line)
                cat << EOF > my_request_${my_date}_${my_time}.mars
RETRIEVE,
    CLASS    = OD,
    TYPE     = FC,
    STREAM   = OPER,
    EXPVER   = 0001,
    LEVTYPE  = ML,
    GRID     = ${GRID},
    AREA     = ${AREA},
    LEVELIST = ${LEVELIST},
    PARAM    = ${PARAMS},
    DATE     = ${my_date},
    TIME     = ${my_time},
    STEP     = ${STEP},
    TARGET   = "oper_ml_${my_date}_${my_time}.grib"
EOF
                # retrieve all steps and levels for this date and time in one request
                mars my_request_${my_date}_${my_time}.mars
                # remove the request file only if the retrieval succeeded
                if [ $? -eq 0 ]; then
                    rm -f my_request_${my_date}_${my_time}.mars
                fi
            done
        done
    done
done
```
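To check a request before retrieving it (point 3 above), the same selection can be sent as a `LIST` request with `OUTPUT = COST`, which reports the number of fields, the data volume and the number of tapes without transferring any data. A minimal sketch reusing the selection from the script above (not an official template):

```sh title="Estimate the size of a MARS request"
cat << EOF > my_request_cost.mars
LIST,
    CLASS    = OD,
    TYPE     = FC,
    STREAM   = OPER,
    EXPVER   = 0001,
    LEVTYPE  = ML,
    LEVELIST = 127/128/129/130/131/132/133/134/135/136/137,
    PARAM    = 130/131/132,
    DATE     = 20170401,
    TIME     = 0000,
    STEP     = 0/to/90/by/1,
    OUTPUT   = COST
EOF
mars my_request_cost.mars
```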
```sh title="Job efficiency report"
# get the accounting information for a job
sacct -j [jobid]
# get a job's efficiency report
seff [jobid]
# example showing only 3% memory and 41% CPU efficiency!
seff 2614735
Job ID: 2614735
Cluster: cluster
User/Group: /vscusers
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 30
CPU Utilized: 01:00:33
CPU Efficiency: 41.05% of 02:27:30 core-walltime
Job Wall-clock time: 00:04:55
Memory Utilized: 596.54 MB
Memory Efficiency: 2.91% of 20.00 GB
```
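If a report like the one above shows low efficiency, the next submission should request resources closer to what the job actually used. A hypothetical adjustment for the example above (roughly 600 MB of 20 GB used, 30 cores at about 41%); the values are only illustrative:

```sh title="Adjusting resource requests (illustrative)"
#!/bin/bash
#SBATCH --job-name=myjob       # hypothetical job name
#SBATCH --ntasks=16            # fewer cores, since 30 cores were only ~41% busy
#SBATCH --mem=2G               # well above the ~600 MB used, far below the 20 GB requested before
#SBATCH --time=00:15:00        # the job finished in under 5 minutes
srun ./my_program              # placeholder for the actual program
```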
There is a helpful [script](seff-array.py) that can report job efficiency for job arrays too.
??? note "seff-array.py"
    ``` sh title="seff-array.py"
    --8<-- "seff-array.py"
    ```
One can use it to get more detailed information on a job array:
```sh title="Job efficiency report for a job array"
# install the script's dependencies first (numpy, pandas, termplotlib), then run it on a job array ID
python3 -m pip install --user numpy pandas termplotlib
python3 seff-array.py [jobid]
```
```python title="seff-array.py"
#!/usr/bin/env python3
import argparse
import subprocess
import sys
import numpy as np
import pandas as pd
from io import StringIO
import os
import termplotlib as tpl

__version__ = 0.4
debug = False


def time_to_float(time):
    """ converts [dd-[hh:]]mm:ss time to seconds """
    if isinstance(time, float):
        return time
    days, hours = 0, 0
    if "-" in time:
        days = int(time.split("-")[0]) * 86400
        time = time.split("-")[1]
    time = time.split(":")
    if len(time) > 2:
        hours = int(time[0]) * 3600
    mins = int(time[-2]) * 60
    secs = float(time[-1])
    return days + hours + mins + secs


#@profile
def job_eff(job_id=0, cluster=os.getenv('SLURM_CLUSTER_NAME')):
    if job_id == 0:
        # test mode: read cached sacct output
        df_short = pd.read_csv('seff_test_oneline.csv', sep='|')
        df_long = pd.read_csv('seff_test.csv', sep='|')
    else:
        # one line per job (allocation only)
        fmt = '--format=JobID,JobName,Elapsed,ReqMem,ReqCPUS,Timelimit,State,TotalCPU,NNodes,User,Group,Cluster'
        if cluster is not None:
            q = f'sacct -X --units=G -P {fmt} -j {job_id} --cluster {cluster}'
        else:
            q = f'sacct -X --units=G -P {fmt} -j {job_id}'
        res = subprocess.check_output([q], shell=True)
        res = str(res, 'utf-8')
        df_short = pd.read_csv(StringIO(res), sep='|')
        # all job steps, including memory usage
        fmt = '--format=JobID,JobName,Elapsed,ReqMem,ReqCPUS,Timelimit,State,TotalCPU,NNodes,User,Group,Cluster,MaxVMSize'
        if cluster is not None:
            q = f'sacct --units=G -P {fmt} -j {job_id} --cluster {cluster}'
        else:
            q = f'sacct --units=G -P {fmt} -j {job_id}'
        res = subprocess.check_output([q], shell=True)
        res = str(res, 'utf-8')
        df_long = pd.read_csv(StringIO(res), sep='|')

    # filter out pending and running jobs
    finished_state = ['COMPLETED', 'FAILED', 'OUT_OF_MEMORY', 'TIMEOUT', 'PREEMPTED']
    df_long_finished = df_long[df_long.State.isin(finished_state)]
    if len(df_long_finished) == 0:
        print(f"No jobs in {job_id} have completed.")
        return -1

    # cleaning
    df_short = df_short.fillna(0.)
    df_long = df_long.fillna(0.)
    df_long['JobID'] = df_long.JobID.map(lambda x: x.split('.')[0])
    df_long['MaxVMSize'] = df_long.MaxVMSize.str.replace('G', '').astype('float')
    df_long['ReqMem'] = df_long.ReqMem.str.replace('G', '').astype('float')
    df_long['TotalCPU'] = df_long.TotalCPU.map(lambda x: time_to_float(x))
    df_long['Elapsed'] = df_long.Elapsed.map(lambda x: time_to_float(x))
    df_long['Timelimit'] = df_long.Timelimit.map(lambda x: time_to_float(x))

    # job info
    if isinstance(df_short['JobID'][0], np.int64):
        job_id = df_short['JobID'][0]
        array_job = False
    else:
        job_id = df_short['JobID'][0].split('_')[0]
        array_job = True
    job_name = df_short['JobName'][0]
    cluster = df_short['Cluster'][0]
    user = df_short['User'][0]
    group = df_short['Group'][0]
    nodes = df_short['NNodes'][0]
    cores = df_short['ReqCPUS'][0]
    req_mem = df_short['ReqMem'][0]
    req_time = df_short['Timelimit'][0]

    print("--------------------------------------------------------")
    print("Job Information")
    print(f"ID: {job_id}")
    print(f"Name: {job_name}")
    print(f"Cluster: {cluster}")
    print(f"User/Group: {user}/{group}")
    print(f"Requested CPUs: {cores} cores on {nodes} node(s)")
    print(f"Requested Memory: {req_mem}")
    print(f"Requested Time: {req_time}")
    print("--------------------------------------------------------")
    print("Job Status")
    states = np.unique(df_short['State'])
    for s in states:
        print(f"{s}: {len(df_short[df_short.State == s])}")
    print("--------------------------------------------------------")

    # filter out pending and running jobs
    finished_state = ['COMPLETED', 'FAILED', 'OUT_OF_MEMORY', 'TIMEOUT', 'PREEMPTED']
    df_long_finished = df_long[df_long.State.isin(finished_state)]
    if len(df_long_finished) == 0:
        print(f"No jobs in {job_id} have completed.")
        return -1

    # per job: maximum CPU time, elapsed time and memory over all steps
    cpu_use = df_long_finished.TotalCPU.loc[df_long_finished.groupby('JobID')['TotalCPU'].idxmax()]
    time_use = df_long_finished.Elapsed.loc[df_long_finished.groupby('JobID')['Elapsed'].idxmax()]
    mem_use = df_long_finished.MaxVMSize.loc[df_long_finished.groupby('JobID')['MaxVMSize'].idxmax()]
    cpu_eff = np.divide(np.divide(cpu_use.to_numpy(), time_use.to_numpy()), cores)

    print("--------------------------------------------------------")
    print("Finished Job Statistics")
    print("(excludes pending, running, and cancelled jobs)")
    print(f"Average CPU Efficiency {cpu_eff.mean()*100:.2f}%")
    print(f"Average Memory Usage {mem_use.mean():.2f}G")
    print(f"Average Run-time {time_use.mean():.2f}s")
    print("---------------------")

    if array_job:
        # histograms over all array tasks
        print('\nCPU Efficiency (%)\n---------------------')
        fig = tpl.figure()
        h, bin_edges = np.histogram(cpu_eff * 100, bins=np.linspace(0, 100, num=11))
        fig.hist(h, bin_edges, orientation='horizontal')
        fig.show()

        print('\nMemory Efficiency (%)\n---------------------')
        fig = tpl.figure()
        h, bin_edges = np.histogram(mem_use * 100 / float(req_mem[0:-1]), bins=np.linspace(0, 100, num=11))
        fig.hist(h, bin_edges, orientation='horizontal')
        fig.show()

        print('\nTime Efficiency (%)\n---------------------')
        fig = tpl.figure()
        h, bin_edges = np.histogram(time_use * 100 / time_to_float(req_time), bins=np.linspace(0, 100, num=11))
        fig.hist(h, bin_edges, orientation='horizontal')
        fig.show()

    print("--------------------------------------------------------")


if __name__ == "__main__":
    desc = (
        """
        seff-array v%s
        https://github.com/ycrc/seff-array
        ---------------
        An extension of the Slurm command 'seff' designed to handle job arrays and display information in a histogram.
        To use seff-array on the job array with ID '12345678', simply run 'seff-array 12345678'.
        Other things can go here in the future.
        -----------------
        """
        % __version__
    )

    parser = argparse.ArgumentParser(
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description=desc,
    )
    parser.add_argument("jobid")
    parser.add_argument("-c", "--cluster", action="store", dest="cluster")
    parser.add_argument('--version', action='version',
                        version='%(prog)s {version}'.format(version=__version__))
    args = parser.parse_args()

    job_eff(args.jobid, args.cluster)
```
### Job efficiency reports
Since 2024 there is a new feature to check how well your jobs ran and to get information on the efficiency of the resources used. This can help you optimize your workflow and leave room for other users to use the cluster simultaneously. The report is available once the job has finished.
```sh title="Job efficiency report"
# get a job's efficiency report
seff [jobid]
# example showing only 3% memory and 41% CPU efficiency!
seff 2614735
Job ID: 2614735
Cluster: cluster
User/Group: /vscusers
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 30
CPU Utilized: 01:00:33
CPU Efficiency: 41.05% of 02:27:30 core-walltime
Job Wall-clock time: 00:04:55
Memory Utilized: 596.54 MB
Memory Efficiency: 2.91% of 20.00 GB
```
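For more detail than `seff` provides, `sacct` can print selected accounting fields for a finished job; the job ID below is the example from above and the field list is just one possible choice:

```sh title="Detailed accounting of a finished job"
sacct -j 2614735 --units=G \
    --format=JobID,JobName,Elapsed,Timelimit,ReqCPUS,TotalCPU,ReqMem,MaxRSS,State
```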
# Storage
Everybody needs data and produces results. *But where should all these different data go?*
## Limits
| System | Path | Quota | Feature | Note |
| ------ | ---------------- | -------------------- | ------------ | --------------------------------------- |
| AURORA | `/srvfs/home`    | 200 GB / 1000k files | daily Backup | users other than staff get 100 GB / 100k files |
| JET    | `/jetfs/home`    | 100 GB / 500k files  | daily Backup |                                                |
| JET    | `/jetfs/scratch` | none                 | no Backup    |                                                |
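To check how much of these limits you are currently using, the standard tools give a rough overview (a minimal sketch; the cluster may also provide its own quota report):

```sh title="Check your current usage"
# total size of your home directory (can take a while with many files)
du -sh $HOME
# free space on the scratch file system
df -h /jetfs/scratch
```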
## Where can data be?
| Data Type | Location | Note |
| ------------------ | -------- | ---------------------------------------------------------------------------------------- |
| source code | HOME | use git repo for source control |
| personal info | HOME | nobody but you should have access. perm: `drwx------.` |
| model output       | SCRATCH  | small and large files that do not need a backup                                            |
| important results  | HOME     | within your quota limits                                                                   |
| input data         | SCRATCH  | if only you use this input data, otherwise put it in SHARED                                |
| input data | SHARED | `/jetfs/shared-data` or `/srvfs/shared` |
| important data | DATA | `/srvfs/data` is backed up, daily. |
| collaboration data | WEBDATA | `/srvfs/webdata`, accessible via [webdata.wolke](https://webdata.wolke.img.univie.ac.at) |
**Remember: All data needs to be re-evaluated after some time and removed when no longer needed.**
## Long term storage
The ZID of the University of Vienna offers an archive system, where data can be stored for at least 3 years. If you have data that needs to be kept for some time but does not need to be quickly accessible, you can request that it be sent to the archive:
```sh title="Request data to be archived"
# Only the admin can issue the transfer, but you can create the request
# and add some documentation.
# You can add a notification if the data can be deleted after the 3 years
# or should be downloaded again.
userservices archive -h
# Create an archive request (see the help output above for the available options)
```
## Publishing data
There are various data hubs that can store your data following the [FAIR principles](https://www.go-fair.org/fair-principles/) and according to your [data management plan](https://zid.univie.ac.at/en/research/planning-data-management/).
External Hubs:
- [Zenodo (up to 50-200 GB/100 files)](https://zenodo.org)
The University of Vienna does not yet offer a comparable service that can host large data sets on longer time scales.
The department of Meteorology and Geophysics has established a collaboration, [Cloud4Geo](https://www.digitaluniversityhub.eu/dx-initiativen/alle-initiativen/in-forschung/cloud4geo), to allow such long term storage of research data and share it with the scientific community. The data is made available via the Earth Observation Data Centre [EODC](https://eodc.eu).