- How to obtain climatological datasets that match the format of ÖKS15 data?
- Introduction
- Overview
- Variables and naming convention
- Accessing datasets and variables
- ÖKS15
- SPARTACUS
- EURO-CORDEX
- E-OBS
- Destination Earth
- ERA5
- Interpolation of climatological data (Remapping)
- Process to match the spatial extent of ÖKS15 data (Cutting)
- highrescut
- Repository structure
- Availability and installation
- Authors
- Acknowledgements
How to obtain climatological datasets that match the format of ÖKS15 data?
Introduction
Within the HighResLearn project, our goal is to write and publish a subsetting function, a so-called data recipe, that easily accesses and processes several climatological datasets, including reanalysis, high-resolution model output and observational data. Besides enabling the comparison of global and regional model results on the same domain, these developments form the basis for innovative approaches to investigating km-scale climate models using machine learning algorithms.
Overview
To this end, the following six datasets will be considered:
Name | Category | Spatial res. | Temporal res. | Domain |
---|---|---|---|---|
ÖKS15 | model | 1 km | daily | Austria |
SPARTACUS | observational | 1 km | daily | Austria |
EURO-CORDEX | model | 12.5 km | daily | Europe |
E-OBS | observational | 11 km | daily | Europe |
Destination Earth | model | 5 km | hourly | global |
ERA5(-Land) | reanalysis | 30 km (9 km) | hourly | global (land only) |
Table 1: Selection of datasets used in these data recipes, including their approximate resolution and covered domain.
Variables and naming convention
For the purposes of the project we are primarily interested in temperature and precipitation, although other atmospheric variables can be considered depending on the needs of the community. The datasets use different naming conventions for these variables. The deviating variable names and units are addressed and resolved in the recipes by a standardized adaptation to the ÖKS15 format:
Variable | ÖKS15 | SPARTACUS |
---|---|---|
Temperature | tas [°C] | (TN+TX)/2 [°C] |
Precipitation | pr [kg m-2] | daily precipitation sum (RR) [kg m-2] |
Variable | EURO-CORDEX | E-OBS |
---|---|---|
Temperature | 2m_air_temperature (tas) [K] | daily mean temperature (tg) [°C] |
Precipitation | precipitation flux (pr) [kg m-2 s-1] | daily precipitation sum (rr) [mm] |
Variable | Destination Earth | ERA5(-Land) |
---|---|---|
Temperature | 2 metre temperature (t2m) [K] | 2m_temperature (t2m) [K] |
Precipitation | Total precipitation rate (tprate) [kg m-2 s-1] | total precipitation (tp) [m/h] |
Table 2: Variable names and units of the used datasets.
Note: Hourly data is generally available for ERA5 and Destination Earth.
To convert Kelvin (K) to degrees Celsius (°C), we use the formula:
tas_{\text{°C}} = tas_{\text{K}} - 273.15
To convert ERA5-Land hourly total precipitation from meters into the daily total precipitation (mm), note that the accumulation restarts at 00 UTC each day, so the value at 00 UTC of the following day (d+1) contains the full daily sum: pr_{\text{daily}} = pr_{\text{d+1, 00UTC}} \times 1000
To convert a precipitation flux (precipitation rate) into daily precipitation, we follow these steps:
1. Understand the units:
- Precipitation flux is given in kg per square meter per second [kg m-2 s-1].
- Daily precipitation is required in kg per square meter per day [kg m-2 d-1].
- Since 1 mm precipitation = 1 kg m-2, daily precipitation in kg m-2 is numerically the same as daily precipitation in mm.
2. Convert seconds to a day: there are 86,400 seconds in a day (24 hours x 60 minutes x 60 seconds).
3. Perform the conversion: pr_{\text{daily}} = pr_{\text{flux}} \times 86400
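Applied with xarray, both conversions are one-liners. The snippet below is a minimal sketch assuming a hypothetical EURO-CORDEX example file with the variable names from Table 2:
import xarray as xr
# Hypothetical example file; variable names follow Table 2
ds = xr.open_dataset("eurocordex_exampledata.nc")
tas_degc = ds["tas"] - 273.15   # near-surface temperature: Kelvin -> degrees Celsius
pr_daily = ds["pr"] * 86400     # precipitation flux [kg m-2 s-1] -> daily sum [kg m-2] = [mm]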
Accessing datasets and variables
Note that Destination Earth and ERA5 data are available after registration via the respective platforms, while all other datasets are freely accessible.
ÖKS15
To analyze regional climate change in Austria, the latest generation of regional climate models can be used. As part of EURO-CORDEX, the European branch of the Coordinated Regional Climate Downscaling Experiment, 13 regional climate simulations are available for the greenhouse gas scenarios RCP4.5 and RCP8.5.
Nr. | Global Model | Regional Model |
---|---|---|
1 | CNRM-CERFACS-CNRM-CM5 | CLMcom-CCLM4-8-17 |
2 | CNRM-CERFACS-CNRM-CM5 | CNRM-ALADIN53 |
3 | CNRM-CERFACS-CNRM-CM5 | SMHI-RCA4 |
4 | ICHEC-EC-EARTH | CLMcom-CCLM4-8-17 |
5 | ICHEC-EC-EARTH | SMHI-RCA4 |
6 | ICHEC-EC-EARTH | KNMI-RACMO22E |
7 | ICHEC-EC-EARTH | DMI-HIRHAM5 |
8 | IPSL-IPSL-CM5A-MR | IPSL-INERIS-WRF331F |
9 | IPSL-IPSL-CM5A-MR | SMHI-RCA4 |
10 | MOHC-HadGEM2-ES | CLMcom-CCLM4-8-17 |
11 | MOHC-HadGEM2-ES | SMHI-RCA4 |
12 | MPI-M-MPI-ESM-LR | CLMcom-CCLM4-8-17 |
13 | MPI-M-MPI-ESM-LR | SMHI-RCA4 |
Table 3: Combination of global (GCMs) and regional models (RCMs) from EURO-CORDEX used by ÖKS15.
Data is available at the Datahub of Geosphere Austria and can be downloaded either via an HTTP file list or via the THREDDS Data Server (TDS). In addition to daily mean near-surface temperature (tas) and daily precipitation (pr), the daily minimum (tasmin) and maximum (tasmax) temperature as well as solar shortwave radiation (rsds) are available. Each dataset spans the time period 1951-2100.
In ÖKS15, the data of the RCMs from the EURO-CORDEX initiative (12.5 km) is interpolated to the grid of the observations (1 km), whereby the transition to the fine high-resolution grid (1 km) is accomplished using statistical methods.
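Once downloaded, a first look at the content of an ÖKS15 file can be taken with xarray; the file name below refers to the example data used later in the cutting section:
import xarray as xr
oeks15 = xr.open_dataset("oeks15_exampledata.nc")
print(oeks15.data_vars)   # e.g. tas or pr, depending on the downloaded file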
SPARTACUS
The gridded dataset describes the spatial distribution of observed air temperature (minimum temperature TN and maximum temperature TX), precipitation (RR) and absolute sunshine duration (SA) on a daily basis since 1961 at a horizontal resolution of 1 km over Austria.
The dataset is available for download on the Datahub of Geosphere Austria and can be obtained in different ways (spatial subset download, file archive, API).
A gridded dataset of the daily mean temperature is calculated by averaging the daily minimum and daily maximum temperature datasets. This yields a continuous dataset of the daily mean temperature from 1961 to the present over Austria, which will be used in the course of the project especially for the evaluation of climate model data over the Austrian domain.
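A minimal sketch of this averaging with xarray, with hypothetical SPARTACUS file names and the variable names TN and TX from Table 2:
import xarray as xr
# Hypothetical file names; TN/TX follow the SPARTACUS naming in Table 2
tn = xr.open_dataset("spartacus_TN.nc")["TN"]   # daily minimum temperature [°C]
tx = xr.open_dataset("spartacus_TX.nc")["TX"]   # daily maximum temperature [°C]
tmean = (tn + tx) / 2                           # daily mean temperature [°C]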
EURO-CORDEX
CORDEX (Coordinated Regional Climate Downscaling Experiment) is a global initiative led by the World Climate Research Program (WCRP). Its goal is to improve regional climate projections by downscaling global climate models (GCMs) to higher resolutions, whereby EURO-CORDEX is the European branch of this initiative, focusing specifically on the European region.
The data is available either via ESGF (Earth System Grid Federation system) or via the Copernicus Climate Data Store.
Within these data recipes, CORDEX data is accessed via a dedicated Python interface: the ESGF PyClient, a Python package designed for interacting with the ESGF system. It is used to find the corresponding download links; the Python package wget is then used to download the files for further processing.
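A minimal search-and-download sketch with the ESGF PyClient; the search node URL and facet values are illustrative assumptions and can be adapted to your needs:
from pyesgf.search import SearchConnection
import wget
# Illustrative search node and facets (assumptions)
conn = SearchConnection("https://esgf-data.dkrz.de/esg-search", distrib=True)
ctx = conn.new_context(project="CORDEX", domain="EUR-11", experiment="rcp85",
                       variable="tas", time_frequency="day")
result = ctx.search()[0]                   # first matching dataset
files = result.file_context().search()     # list the files of that dataset
for f in files:
    wget.download(f.download_url)          # download for further processing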
EURO-CORDEX provides a wide range of meteorological and climatological variables, including daily mean, maximum and minimum temperature, precipitation rate, longwave and shortwave radiation and many others. Additionally, the simulations are conducted at two different spatial resolutions: the general CORDEX resolution of 0.44 degrees (EUR-44, ~50 km) and the finer resolution of 0.11 degrees (EUR-11, ~12.5 km).
As for ÖKS15, simulations are also available for different climate scenarios:
- Historical (1950-2005)
- Future Scenarios (2006–2100, from CMIP5)
- RCP2.6 (Low emissions)
- RCP4.5 (Medium emissions)
- RCP8.5 (High emissions)
E-OBS
E-OBS is a daily gridded observational dataset that provides comprehensive information on various meteorological variables across Europe. It includes data on precipitation, temperature (mean, minimum, and maximum), sea level pressure, global radiation, wind speed, and relative humidity. The dataset spans from January 1950 to the present.
To access and download the E-OBS dataset, visit the official data access page. The data is available in NetCDF-4 format on regular latitude-longitude grids with spatial resolutions of 0.1° x 0.1° and 0.25° x 0.25°.
Destination Earth
Destination Earth is a flagship initiative of the European Commission to develop a highly accurate digital model of the Earth (a digital twin of the Earth) to model, monitor and simulate natural phenomena, hazards and the related human activities. This initiative represents the first ever attempt to operationalise the production of global multi-decadal climate projections at km-scale resolutions of 5 to 10 km. To access the datasets from the Earth Data Hub you need to register on the Destination Earth Platform. In order to get full access to the data, one has to upgrade the access by selecting the appropriate user category (e.g. Academia & research). Your request will be reviewed and you will be notified upon acceptance.
Note:
- Data is provided on a hierarchical HEALPix grid for both models (ICON & IFS-FESOM/IFS-NEMO).
- Data is provided for various levtype values and different parameters: 2 metre temperature has parameter ID 167, total precipitation rate has parameter ID 260048.
The easiest way to access the Destination Earth DT data is via the Polytope web service hosted on the LUMI databridge.
- The polytope-client can be installed from PyPI:
pip install --upgrade polytope-client
- Retrieve a token from the Destination Earth Service Platform (DESP) by running the script included in the repository:
python desp-authentication.py
You will then be prompted to enter your username and password which were set during the registration process.
You will also need some dependencies to run the script, which can be installed using pip:
pip install --upgrade lxml conflator
The script automatically places your token in ~/.polytopeapirc, where the client will pick it up. The token is a long-lived ("offline_access") token.
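With the token in place, a retrieval can be sketched as follows. The request keys and values below are illustrative assumptions and need to be adapted to the chosen digital twin and variable (parameter IDs as in the note above); consult the Earth Data Hub documentation for valid combinations:
from polytope.api import Client
client = Client(address="polytope.lumi.apps.dte.destination-earth.eu")
# Illustrative Climate DT request; keys and values are assumptions
request = {
    "class": "d1",
    "dataset": "climate-dt",
    "activity": "scenariomip",
    "experiment": "ssp3-7.0",
    "model": "ifs-nemo",
    "levtype": "sfc",
    "param": "167",          # 2 metre temperature (see note above)
    "date": "20200901",
    "time": "0000",
}
client.retrieve("destination-earth", request, "destine_t2m.grib")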
ERA5
ERA5 is the fifth generation ECMWF reanalysis of the global climate and weather for the past 8 decades. It spans atmospheric, land and ocean variables and includes hourly data with global coverage at 30 km resolution.
Data is available at the Copernicus Climate Data Store and can be downloaded via the CDS API. In addition, ERA5-Land hourly data from 1950 to present is available at a resolution of 9 km.
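A minimal CDS API sketch for hourly 2 m temperature; the request fields are illustrative, and valid credentials in ~/.cdsapirc are required:
import cdsapi
c = cdsapi.Client()   # reads credentials from ~/.cdsapirc
c.retrieve(
    "reanalysis-era5-single-levels",
    {
        "product_type": "reanalysis",
        "variable": "2m_temperature",
        "year": "2020",
        "month": "09",
        "day": ["01"],
        "time": [f"{h:02d}:00" for h in range(24)],
        "format": "netcdf",
    },
    "era5_t2m_20200901.nc",
)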
Interpolation of climatological data (Remapping)
To interpolate data from an existing horizontal field to a finer or coarser grid or another grid type, CDO provides a set of interpolation operators. Within these data recipes we offer four different methods for regridding climatological datasets:
- nearest neighbor: remapnn
- distance-weighted: remapdis
- conservative: remapcon
- bilinear: remapbil
These are the four operators that are also available for the interpolation of HEALPix data. Note that the interpolation with remapcon is slightly inaccurate, because the edges of the HEALPix grid cannot be calculated 100% correctly.
The aim of this section is
- to provide a tool that can remap different grids to the same target grid.
- to try to understand the different interpolation methods and their effect on atmospheric variables.
Our target grid has the following specifications:
Bounding box: 46.374817 - 49.017044°N, 9.531588 - 17.162361°E
Grid mapping name: lambert_conformal_conic
Grid size: x = 575, y = 297
Conveniently, CDO has the great feature of direct remapping from one grid to another. Using nearest neighbor interpolation, this can be achieved with the following command:
module load cdo
cdo remapnn,<target_grid> <input_file> <output_file>
This transforms any source grid into your target projection, which in our case matches the format of the ÖKS15 data. The same methodology can be applied to all datasets described above with any of the interpolation methods mentioned in this section. These methods have the following properties, among others:
Nearest neighbour regridding:
- takes the value of the nearest grid cell of the source grid and writes it into the target grid cell
- can be used for unstructured grids (like HEALPix grid)
- simple method that works most of the time, even when other methods do not
Distance weighted regridding:
- inverse distance weighted average remapping of the four (default number, can be changed) nearest neighbor values
- smoother grid, gradients less steep than with nearest neighbour
- no need to provide source grid cell corner coordinates
Conservative regridding:
- ensures that a conserved quantity (e.g. mass, energy) is preserved during interpolation
- uses weighted averaging based on overlapping areas of source and target grids
- need to provide source grid cell corner coordinates
Bilinear regridding:
- weighted average of the four nearest grid points in a 2D grid
- smooth interpolation producing continuous results
- can introduce discontinuities at cell boundaries
There is no rule set in stone as to which interpolation method should be used, so study your data and goals beforehand. One general piece of advice is to always be careful when applying interpolation/regridding methods, especially for non-continuous fields like precipitation!
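The same remapping can also be scripted via the cdo Python bindings (listed in the acknowledgements); a minimal sketch, with file names matching the cutting example in the next section:
from cdo import Cdo
cdo = Cdo()
# Remap a source file onto the ÖKS15 target grid (file names are placeholders)
cdo.remapnn("oeks15_exampledata.nc", input="destine_exampledata.nc",
            output="destine_exampledata_remapped.nc")
# The other operators work analogously: cdo.remapdis, cdo.remapcon, cdo.remapbil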
Process to match the spatial extent of ÖKS15 data (Cutting)
Our target format is that of the ÖKS15 dataset. A simple processing step is required to ensure that all other datasets correspond to the spatial extent of the ÖKS15 data. In the ÖKS15 data, only grid points that lie within the domain of Austria have a value; all other grid points contain NaN values.
One can cut the remapped datasets based on the grid of the ÖKS15 data. To do so, load one of the downloaded ÖKS15 datasets and the dataset you want to cut (e.g. remapped Destination Earth data):
import xarray as xr
oeks15 = xr.open_dataset("oeks15_exampledata.nc")
destine = xr.open_dataset("destine_exampledata_remapped.nc")
To find the locations with NaN values on the ÖKS15 grid use:
nan_locations = oeks15.isnull()
Using xarray's built-in where function, the NaN locations are applied to the remapped dataset to retain the Austrian domain defined by the ÖKS15 data:
import numpy as np
destine_cut = destine.where(~nan_locations, np.nan)  # keep values where ÖKS15 has data
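Note that Dataset.where aligns a Dataset-valued mask by variable name; if the variable names of the two datasets differ, it is safer to build the mask from a single ÖKS15 variable, e.g. nan_locations = oeks15["tas"].isnull(), which then broadcasts across all variables of the remapped dataset.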
highrescut
To answer the question posed at the very beginning, we have set ourselves the task of developing a dedicated Python package for this purpose. Highrescut is a Python library for downloading, remapping and cutting climatological datasets to match the format of ÖKS15 data, including download and preprocessing scripts as well as example data and a corresponding documentation.

Repository structure
The repository is organized as follows:
- highrescut contains the actual Python code. __init__.py acts as the control center: the individual download functions can be called here, whereby the path of the respective download is returned by the function. The remapping function (regrid.py) requires the path of the target grid (ÖKS15) as well as the path of the input file to be regridded. The cutting function (cut.py) requires again the path of the target grid and the path of the previously regridded file. Note that all functions can also be used independently of each other.
- example_data contains example data (September 2020) for every dataset provided in Table 1.
- example_notebooks contains example notebooks that illustrate the use of highrescut (for processing temperature and precipitation data). These notebooks can be adapted to the needs of the user, but provide the general procedure for regridding and cutting the data as well as a plotting function for a first comparison between the different datasets.
Note: Depending on the time period and hence the size of the data to be processed, using the package within a Jupyter notebook is not recommended for large amounts of data.
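A hypothetical end-to-end workflow could look as follows; the module names follow the repository description above, but the exact function names and signatures are assumptions:
# Hypothetical sketch: function names and signatures are assumptions
from highrescut import regrid, cut
target = "example_data/oeks15_exampledata.nc"    # ÖKS15 target grid
infile = "example_data/destine_exampledata.nc"   # previously downloaded data
remapped = regrid.regrid(target, infile)         # returns path of remapped file
cut_file = cut.cut(target, remapped)             # returns path of cut file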
Availability and installation
Like all results of the HighResLearn project, the data recipes are made publicly available. For this reason, we use version-controlled distribution options such as GitHub in combination with convenient installation options (such as Python's PyPI).
Highrescut is hosted at the GitLab service of Phaidra at the University of Vienna, Austria.
Highrescut will be available from PyPI and can in the future be installed via pip:
pip install highrescut
To run the code and the notebooks, you can use the provided environment.yml file to create a suitable conda environment. The following commands create the environment and register an ipykernel called highrescut that can be selected in notebooks.
envname=highrescut
conda create -n $envname -c conda-forge -y python=3.10
conda env update -n $envname -f environment.yml
conda activate $envname
# set highrescut environment to the default used by ipykernels
python3 -m ipykernel install --user --name=$envname
Authors
Highrescut has been developed by:
- Maximilian Meindl (University of Vienna, Austria)
- Luiza Sabchuk (University of Vienna, Austria)
- Aiko Voigt (University of Vienna, Austria)
- Lukas Brunner (University of Hamburg, Germany)
Acknowledgements
Highrescut uses a couple of other Python libraries, and we are very grateful to the communities of developers and maintainers of these libraries. These libraries are:
- cdsapi, https://pypi.org/project/cdsapi/
- ESGF PyClient, https://esgf-pyclient.readthedocs.io/en/latest/index.html
- numpy, https://numpy.org/
- cdo, https://code.mpimet.mpg.de/projects/cdo/wiki/Cdo%7Brbpy%7D
- polytope-client, https://github.com/ecmwf/polytope-client
- wget, https://pypi.org/project/wget/
- xarray, http://xarray.pydata.org/