Quick Start
===========

``flex_extract`` is a command-line tool. In the first versions, it was started via a Korn shell script; since version 6, the entry point has been a Python script. From version 7.1 on, a bash shell script calls ``flex_extract`` with the command-line parameters.
To submit an extraction job, change the working directory to the subdirectory ``Run`` (directly under the ``flex_extract_vX.X`` root directory, where ``X.X`` is the version number)::

    cd <path-to-flex_extract_vX.X>/Run
Within this directory you can find everything necessary to modify and run ``flex_extract``. The following tree shows a shortened list of directories and important files. The ``*`` serves as a wildcard; the brackets ``[]`` indicate that the file is present only in certain modes of application. ::

    Run
    ├── Control
    │   ├── CONTROL_*
    ├── Jobscripts
    │   ├── compilejob.ksh
    │   ├── job.ksh
    │   ├── [joboper.ksh]
    ├── Workspace
    │   ├── CERA_example
    │   │   ├── CE000908*
    ├── [ECMWF_ENV]
    ├── run_local.sh
    └── run.sh
The ``Jobscripts`` directory is used to store the Korn shell job scripts generated by a ``flex_extract`` run in the remote or gateway mode. They are used to submit the setup information to the ECMWF server and to start the jobs in ECMWF's batch mode. Typical users do not need to touch these files. They are generated from template files stored in the ``Templates`` directory under ``flex_extract_vX.X``. Usually there will be a ``compilejob.ksh`` and a ``job.ksh`` script, which are explained in the section :doc:`Documentation/input`. In the rare case of operational data extraction, there will also be a ``joboper.ksh``, which reads time information from environment variables on the ECMWF servers.
The ``Control`` directory contains a number of sample ``CONTROL`` files. They cover the current range of possible kinds of extraction. Some parameters in the ``CONTROL`` files can be adapted, while others should not be changed. In this :doc:`quick_start` guide, we explain how an extraction with ``flex_extract`` can be started in the different :doc:`Documentation/Overview/app_modes` and point out some specifics of each dataset and ``CONTROL`` file.
Directly under ``Run`` you find the files ``run.sh`` and ``run_local.sh``, and, depending on the selected :doc:`Documentation/Overview/app_modes`, there might also be a file named ``ECMWF_ENV`` holding the user credentials for quick and automatic access to the ECMWF servers.

From version 7.1 on, the ``run.sh`` (or ``run_local.sh``) script is the main entry point to ``flex_extract``.
.. note::

    Experienced users (and users of older versions) can still start ``flex_extract`` directly via the ``submit.py`` script in the directory ``flex_extract_vX.X/Source/Python``.
Job preparation
---------------
To actually start a job with ``flex_extract``, it is sufficient to run either ``run.sh`` or ``run_local.sh``. Datasets and access modes are selected in the ``CONTROL`` files and within the user section of the ``run`` scripts. Select one of the sample ``CONTROL`` files. The following sections describe the differences between the application modes and where the results will be stored.
Remote and gateway modes
~~~~~~~~~~~~~~~~~~~~~~~~

For member-state users, the remote or gateway mode is recommended, especially for more demanding tasks: the data are retrieved and converted on ECMWF machines, and only the final output files are transferred to the local host.
Remote mode
    The only difference between the two modes is the user's working location. In the remote mode, you log in to the ECMWF server and then change to the ``Run`` directory as shown above. On the ECMWF servers, ``flex_extract`` is installed in the ``$HOME`` directory. However, to be able to start the program, you first have to load the ``Python3`` environment via the module system. ::

        # Remote mode
        ssh -X <ecuid>@ecaccess.ecmwf.int

        # On ECMWF server
        [<ecuid>@ecgb11 ~]$ module load python3
        [<ecuid>@ecgb11 ~]$ cd flex_extract_vX.X/Run
Gateway mode
    For the gateway mode, you log in on the gateway server and change to the ``Run`` directory of ``flex_extract``::

        # Gateway mode
        ssh <user>@<gatewayserver>
        cd <path-to-flex_extract_vX.X>/Run
From here on, the working process is the same for both modes.

For your first submission, you should use one of the example ``CONTROL`` files stored in the ``Control`` directory. We recommend extracting CERA-20C data, since these retrievals usually finish quickly and are well suited for testing.

Open the ``run.sh`` file and modify the parameter block marked in the file as shown below:
::

    # -----------------------------------------------------------------
    # AVAILABLE COMMANDLINE ARGUMENTS TO SET
    #
    # THE USER HAS TO SPECIFY THESE PARAMETERS:

    QUEUE='ecgate'
    START_DATE=None
    END_DATE=None
    DATE_CHUNK=None
    JOB_CHUNK=3
    BASETIME=None
    STEP=None
    LEVELIST=None
    AREA=None
    INPUTDIR=None
    OUTPUTDIR=None
    PP_ID=None
    JOB_TEMPLATE='job.temp'
    CONTROLFILE='CONTROL_CERA'
    DEBUG=0
    REQUEST=2
    PUBLIC=0
This retrieves a one-day (8 September 2000) CERA-20C dataset with 3-hourly temporal resolution and a small 1° domain over Europe. Since the ``ectrans`` parameter is set to ``1``, the resulting output files will be transferred to the local gateway, into the path stored in the ``destination`` (see the installation instructions). The parameters listed in the ``run.sh`` file overwrite the corresponding settings in the ``CONTROL`` file.
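For instance, to extend the retrieval period, the date parameters in ``run.sh`` could be overridden as in the following sketch (the dates and chunk size are illustrative values, not recommendations):

```shell
# Illustrative run.sh overrides: retrieve 1-10 September 2000 in 5-day chunks.
# Any value other than None replaces the corresponding CONTROL file setting.
START_DATE=20000901
END_DATE=20000910
DATE_CHUNK=5               # days retrieved per MARS request
CONTROLFILE='CONTROL_CERA'
```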
To start the retrieval, simply run the script::

    ./run.sh
``flex_extract`` will print some information about the job. If the submission to the ECMWF server succeeds, you will see something like this::

    ---- On-demand mode! ----
    The job id is: 10627807
    You should get an email per job with subject flex.hostname.pid
    FLEX_EXTRACT JOB SCRIPT IS SUBMITED!
Once submitted, you can check the progress of the job using ``ecaccess-job-list``. After the job has finished, you should receive an email with a detailed protocol of what was done.

If the job fails, you will receive an email with the subject ``ERROR!`` and the job name. You can then check the email for information, or look for debugging information on the ECMWF server in the ``$SCRATCH`` directory.
::

    cd $SCRATCH
    ls -rthl
The last command lists the most recent logs and temporary retrieval directories (usually ``pythonXXXXX``, where ``XXXXX`` is the process id). Within such a ``pythonXXXXX`` directory, a copy of the ``CONTROL`` file is stored under the name ``CONTROL``, the protocol is stored in the file ``prot``, and the temporary as well as the resulting files are stored in a directory ``work``. The original name of the ``CONTROL`` file is stored in this copy under the parameter ``controlfile``.
If the job was submitted to the HPC (``queue=cca`` or ``queue=ccb``), you may log in to the HPC and look into the directory ``/scratch/ms/ECGID/ECUID/.ecaccess_do_not_remove`` for job logs. The working directories are deleted after a job failure and thus normally cannot be accessed.

To check whether the resulting files were transferred to the local gateway server, you can use the command ``ecaccess-ectrans-list`` or check the destination path for the resulting files on your local gateway server.
Local mode
~~~~~~~~~~
To get to know the working process and to start your first submission, you can use one of the example ``CONTROL`` files stored in the ``Control`` directory as they are. For quick results and for testing purposes, it is recommended to extract CERA-20C data.

Open the ``run_local.sh`` file and modify the parameter block marked in the file. There is one variant of the parameter block for member-state users and another one for public users; take the one that matches your access mode.
This retrieves a one-day (8 September 2000) CERA-20C dataset with 3-hourly temporal resolution and a small 1° domain over Europe. The destination for this retrieval is the ``Workspace`` directory within ``Run``; this can be changed to any path you like. The parameters listed in ``run_local.sh`` overwrite the corresponding settings in the ``CONTROL`` file.
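As an illustration, the local-mode overrides might look like the following sketch (the paths are hypothetical examples; any value other than ``None`` replaces the corresponding ``CONTROL`` file setting):

```shell
# Hypothetical run_local.sh settings; the paths are examples only.
CONTROLFILE='CONTROL_CERA'
START_DATE=None                       # None: take the date from the CONTROL file
INPUTDIR='./Workspace/CERA_example'   # working directory for temporary MARS data
OUTPUTDIR='./Workspace/CERA_example'  # destination of the final output files
```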
To start the retrieval, run the script::

    ./run_local.sh
While job submission on the local host is convenient and easy to monitor (on standard output), there are a few caveats with this option:

- There is a maximum size of 20 GB for a single retrieval via the ECMWF Web API. Normally this is not a problem, but for global fields with T1279 resolution and hourly time steps, the limit may already apply.
- If the retrieved MARS files are large but the resulting files are relatively small (e.g., a small local domain), retrieval to the local host may be inefficient, since all data must be transferred via the Internet. This applies most notably if ``etadot`` has to be calculated via the continuity equation, as this requires global fields even if the domain is local. In this case, job submission via ecgate might be the better choice. It depends on the usage patterns and also on the speed of the Internet connection.
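To get a feeling for the first caveat, a rough back-of-the-envelope estimate of the data volume can help. The grid dimensions, level count, and bytes per value below are illustrative assumptions, not exact MARS figures:

```shell
# Rough volume estimate: global 0.1-degree grid, 137 model levels,
# 24 hourly time steps, 4 bytes per value, a single variable and day.
nx=3600; ny=1801; nlev=137; nsteps=24; bytes_per_val=4
total_bytes=$(( nx * ny * nlev * nsteps * bytes_per_val ))
echo "approx. $(( total_bytes / 1024 / 1024 / 1024 )) GiB"
```

Even under these simplified assumptions, a single variable already exceeds the 20 GB limit, which illustrates why high-resolution global retrievals are better run on the ECMWF side.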
Selection and adjustment of CONTROL files
-----------------------------------------
This section describes how to work with the ``CONTROL`` files. A detailed explanation of the ``CONTROL`` file parameters and of the naming conventions can be found here. The more accurately a ``CONTROL`` file describes the required retrieval, the fewer command-line parameters need to be set in the ``run`` scripts. As of version 7.1, all ``CONTROL`` file parameters have default values; they can be found in the section on CONTROL parameters or in the ``CONTROL.documentation`` file within the ``Control`` directory. Only those parameters which need to be changed for a particular dataset retrieval have to be set in a ``CONTROL`` file!
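For orientation, a ``CONTROL`` file is a plain-text list of ``KEYWORD value`` pairs, one per line. The following fragment is a purely illustrative sketch of the kind of entries such a file contains; for the exact keywords and values, consult the sample files and the CONTROL parameters section:

```
START_DATE 20000908
DTIME 3
CLASS EP
STREAM ENDA
GRID 1.0
LEFT -15.
LOWER 30.
RIGHT 45.
UPPER 75.
```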
Limiting the dataset to be retrieved should be done very carefully. The datasets can differ in many ways and vary over time in resolution and parameterisation methods; in particular, the operational model cycles have gone through many changes over time. If you are not familiar with the data, it might be useful or even necessary to check the availability of data in ECMWF's MARS archive:

- Public users can check and list available data via the Public datasets web interface.
- Member-state users can check the availability of data online in the MARS catalogue.
There you can select, step by step, the data that suit your needs. This is the most straightforward way of checking for available data and therefore reduces the chance of a failed ``flex_extract`` run. The following figure gives an example of what the web interface looks like:

Additionally, you can find many helpful links to dataset documentation, direct links to specific dataset web catalogues, and further general information in the link collection of the ECMWF data section.

``flex_extract`` is specialised to retrieve a limited number of datasets, namely ERA-Interim, CERA-20C, ERA5, and HRES (operational data), as well as ENS (operational data, 15-day forecast). The limitation relates mainly to the dataset itself, the stream (the kind of forecast or the subset of the dataset), and the experiment number. In most cases, the experiment number is ``1``, indicating that the current version of the dataset should be used.
The next level of differentiation is the field type, the level type, and the time period. ``flex_extract`` currently supports only the main streams for the re-analysis datasets, but provides the extraction of different streams for the operational dataset. The possible combinations of dataset and stream are represented by the current set of example ``CONTROL`` files, as reflected in their names:
The main differences and features in the datasets are listed in the table shown below:

A common problem for beginners retrieving ECMWF datasets is a mismatch in the choice of values for these parameters. For example, if you try to retrieve operational data for 24 June 2013 or earlier and set the maximum level to 137, you will get an error, because this number of levels was only introduced on 25 June 2013. Thus, be careful to choose a consistent combination of spatial and temporal resolution as well as field types.
.. note::

    Sometimes it might not be clear in which format specific parameters in the ``CONTROL`` file must be set. Please consult the description of the parameters in the section on CONTROL parameters, or have a look at the ECMWF user documentation for MARS keywords.
In the following, we briefly discuss the typical retrievals for the different datasets and point to the respective ``CONTROL`` files.