ecmwf_models
Readers and converters for data from the ECMWF reanalysis models. Written in Python.
Works great in combination with pytesmo.
Citation
If you use the software in a publication then please cite it using the Zenodo DOI. Be aware that this badge links to the latest package version.
Please select your specific version at https://doi.org/10.5281/zenodo.593533 to get the DOI of that version. You should normally always use the DOI for the specific version of your record in citations. This is to ensure that other researchers can access the exact research artefact you used for reproducibility.
You can find additional information regarding DOI versioning at http://help.zenodo.org/#versioning
Installation
Install the required C libraries via conda. For installation we recommend Miniconda, so please install it according to the official installation instructions. As soon as you have the conda command in your shell you can continue:
conda install -c conda-forge pandas pygrib netcdf4 pyresample xarray
The following command will download and install all the needed pip packages as well as the ecmwf_models package itself.
pip install ecmwf_models
To create a full development environment with conda, the yml files inside the folder environment/ in this repository can be used. Both environments should work. The file latest should install the newest version of most dependencies. The file pinned is a fallback option and should always work.
git clone --recursive git@github.com:TUW-GEO/ecmwf_models.git ecmwf_models
cd ecmwf_models
conda env create -f environment/latest.yml
source activate ecmwf_models
python setup.py develop
pytest
Supported Products
At the moment this package supports
ERA Interim (deprecated)
ERA5
ERA5-Land
reanalysis data in grib and netcdf format (download, reading, time series creation) with a default spatial sampling of 0.75 degrees (ERA Interim), 0.25 degrees (ERA5), and 0.1 degrees (ERA5-Land). It should be easy to extend the package to support other ECMWF reanalysis products. This will be done as the need arises.
Contribute
We are happy if you want to contribute. Please raise an issue explaining what is missing or if you find a bug. Please take a look at the developers guide.
Downloading ERA5 Data
ERA5 (and ERA5-Land) data can be downloaded manually from the Copernicus Data Store (CDS) or automatically via the CDS API, as done in the download module (era5_download). Before you can use this, you have to set up an account at the CDS and set up the CDS key.
Then you can use the program era5_download to download ERA5 images between a passed start and end date.
era5_download --help
will show additional information on using the command.
For example, the following command in your terminal would download ERA5 images for all available layers of soil moisture in netcdf format, between January 1st and February 1st 2000, into /path/to/storage. The data will be stored in subfolders of the format YYYY/jjj. The temporal resolution of the images is 6 hours by default.
era5_download /path/to/storage -s 2000-01-01 -e 2000-02-01 --variables swvl1 swvl2 swvl3 swvl4
The names of the variables to download can be their long names, their short names (as in the example) or their parameter IDs. We use the era5_lut.csv file to look up the right name for the CDS API.
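The YYYY/jjj subfolder layout described above can be reproduced with standard Python date formatting, which is handy when locating the files of a given day by hand. This is only a minimal sketch: the helper name image_subfolder is hypothetical and not part of ecmwf_models, and only the folder part of the path is built, not the file name.

```python
from datetime import datetime
from pathlib import PurePosixPath


def image_subfolder(root, timestamp):
    """Return the YYYY/jjj subfolder below root for an image time stamp.

    Hypothetical helper for illustration, not part of ecmwf_models.
    """
    # %Y is the 4-digit year, %j the zero-padded day of the year (001-366)
    return PurePosixPath(root) / timestamp.strftime("%Y") / timestamp.strftime("%j")


# February 1st is the 32nd day of the year
print(image_subfolder("/path/to/storage", datetime(2000, 2, 1)))
# prints /path/to/storage/2000/032
```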
Other flags that can be activated in era5_download are:
-h (--help) : shows the help text for the download function
-p (--product) : specify the ERA5 product to download. Choose either ERA5 or ERA5-Land. Default is ERA5.
-keep (--keep_original) : keeps the originally downloaded files as well. We split the downloaded, monthly stacks into single images and discard the original files by default.
-grb (--as_grib) : download the data in grib format instead of the default nc4 format (grib reading is not supported on Windows OS).
--h_steps : full hours for which images are downloaded (e.g. --h_steps 0 would download only data at 00:00 UTC). By default we use 0, 6, 12 and 18.
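To make the effect of the h_steps setting concrete, the following sketch lists the image time stamps that a given selection of full hours corresponds to. It does not call era5_download. The function name image_timestamps is hypothetical, and treating the end date as inclusive is an assumption made for this illustration.

```python
from datetime import datetime, timedelta


def image_timestamps(start, end, h_steps=(0, 6, 12, 18)):
    """List the time stamps selected by h_steps between two dates.

    Hypothetical helper for illustration.
    Assumption: both the start day and the end day are included.
    """
    stamps = []
    # Start at midnight of the first day
    day = datetime(start.year, start.month, start.day)
    while day <= end:
        for hour in sorted(h_steps):
            stamps.append(day + timedelta(hours=hour))
        day += timedelta(days=1)
    return stamps


# Default 6-hourly sampling: 4 images per day
default_stamps = image_timestamps(datetime(2000, 1, 1), datetime(2000, 1, 2))
# Equivalent of "--h_steps 0": only the 00:00 UTC image of each day
midnight_only = image_timestamps(datetime(2000, 1, 1), datetime(2000, 1, 2), h_steps=(0,))
print(len(default_stamps), len(midnight_only))
```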
Downloading ERA Interim Data
ERA-Interim has been decommissioned. Use ERA5 instead.
ERA-Interim data can be downloaded manually from the ECMWF servers. It can also be done automatically using the ECMWF API. To use the ECMWF API you have to be registered, install the ecmwf-api Python package and set up the ECMWF API key. A guide for this is provided by ECMWF.
After that you can use the command line program eraint_download to download images with a temporal resolution of 6 hours between a passed start and end date.
eraint_download --help
will show additional information on using the command.
For example, the following command in your terminal would download ERA Interim soil moisture images of all available layers (see the Variable DB) in netcdf format on the default Gaussian grid for ERA-Interim (0.75°x0.75°) into the folder /path/to/storage between January 1st and February 1st 2000. The data will be stored in subfolders of the format YYYY/jjj, where YYYY describes the year and jjj the day of the year for the downloaded files.
eraint_download /path/to/storage -s 2000-01-01 -e 2000-02-01 --variables swvl1 swvl2 swvl3 swvl4
Additional optional parameters allow downloading images in netcdf format, and in a different spatial resolution (see the --help function and the descriptions for downloading ERA5 data).
Reading data
After downloading the data for ERA Interim or ERA5 via eraint_download or era5_download, images can be read with the ERA5GrbDs and ERA5NcDs classes (for grib and netcdf image stacks) or with the ERA5GrbImg and ERA5NcImg classes (for single grib and netcdf images). The respective functions for reading images are defined in ecmwf_models.erainterim.interface and ecmwf_models.era5.interface.
The following examples are shown for ERA5 data, but work the same way with the respective ERA Interim functions.
For example, you can read the image for a single variable at a specific date. In this case for a stack of downloaded image files:
# Script to load a stack of downloaded netcdf images
# and read a variable for a single date.
from datetime import datetime
from ecmwf_models.era5.interface import ERA5NcDs
root_path = "/path/to/netcdf_storage"
ds = ERA5NcDs(root_path, parameter='swvl1')
data = ds.read(datetime(2010, 1, 1, 0))
# Script to load a stack of downloaded grib images
# and read a variable for a single date.
from datetime import datetime
from ecmwf_models.era5.interface import ERA5GrbDs
root_path = "/path/to/grib_storage"
ds = ERA5GrbDs(root_path, parameter='swvl1')
data = ds.read(datetime(2010, 1, 1, 0))
You can also read multiple variables at a specific date by passing a list of parameters. In this case for a set of netcdf files:
# Script to load a stack of downloaded netcdf images
# and read two variables for a single date.
from datetime import datetime
from ecmwf_models.era5.interface import ERA5NcDs
root_path = "/path/to/storage"
ds = ERA5NcDs(root_path, parameter=['swvl1', 'swvl2'])
data = ds.read(datetime(2000, 1, 1, 0))
All images between two given dates can be read using the iter_images methods of all the image stack reader classes.
Conversion to time series format
For many applications it is favorable to convert the image-based format into a format that is optimized for fast time series retrieval, as is often needed for validation studies, for example. This can be done by stacking the images into a netCDF file with suitable chunk sizes, or in many other ways. We have chosen to do it in the following way:
Store only the reduced Gaussian grid points (for grib data) since that saves space.
Store the time series in netCDF4 in the Climate and Forecast convention Orthogonal multidimensional array representation.
Store the time series in 5x5 degree cells. This means there will be 2566 cell files and a file called grid.nc which contains the information about which grid point is stored in which file. This allows us to read a whole 5x5 degree area into memory and iterate over the time series quickly.
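The idea of grouping grid points into 5x5 degree cells can be sketched as a simple mapping from a coordinate to a cell number. The numbering below (columns from west to east, rows from south to north within each column) is an assumption for illustration, and cell_number is a hypothetical helper. The mapping that is actually used for a converted data set is the one stored in grid.nc.

```python
import math


def cell_number(lon, lat, cellsize=5.0):
    """Number of the cellsize x cellsize degree cell that contains a point.

    Hypothetical helper for illustration.
    Assumption: cells are counted column by column, starting at
    lon=-180, lat=-90 and running south to north within each column.
    """
    cells_per_column = int(180 / cellsize)
    columns = int(360 / cellsize)
    # Clamp so that lon=180 and lat=90 fall into the last column and row
    column = min(int(math.floor((lon + 180.0) / cellsize)), columns - 1)
    row = min(int(math.floor((lat + 90.0) / cellsize)), cells_per_column - 1)
    return column * cells_per_column + row


# All points inside the same 5x5 degree box share one cell number,
# so their time series would end up in the same cell file.
print(cell_number(15.0, 45.0), cell_number(19.9, 49.9))
```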
This conversion can be performed using the era5_reshuffle (respectively eraint_reshuffle) command line program. An example would be:
era5_reshuffle /era_data /timeseries/data 2000-01-01 2001-01-01 swvl1 swvl2
This would take 6-hourly ERA5 images stored in /era_data from January 1st 2000 to January 1st 2001 and store the parameters "swvl1" and "swvl2" as time series in the folder /timeseries/data. If your time series should have a different resolution than 6H, use the h_steps flag here accordingly (images to use for time series generation have to be in the downloaded raw data). The passed names have to correspond with the names in the downloaded file, i.e. use the variable short names here.
Other flags that can be used in era5_reshuffle are:
-h (--help) : shows the help text for the reshuffle function
--land_points : reshuffle and store only data over land points.
--h_steps : full hours for which images are reshuffled (e.g. --h_steps 0 would reshuffle only data at 00:00 UTC). By default we use 0, 6, 12 and 18.
--imgbuffer : the number of images that are read into memory before converting them into time series. Bigger numbers make the conversion faster but consume more memory.
Conversion to time series is performed by the repurpose package in the background. For custom settings or other options see the repurpose documentation and the code in ecmwf_models.reshuffle.
Reading converted time series data
For reading the time series data that the era5_reshuffle and eraint_reshuffle commands produce, the class ERATs can be used. Optional arguments that are passed to the parent class (OrthoMultiTs, as defined in pynetcf.time_series) can be passed as well:
from ecmwf_models import ERATs
# Example path: the output folder that was passed to era5_reshuffle
ts_path = "/timeseries/data"
# read_bulk reads full files into memory
# read_ts takes either lon, lat coordinates to perform a nearest neighbour search
# or a grid point index (from the grid.nc file) and returns a pandas.DataFrame.
ds = ERATs(ts_path, ioclass_kws={'read_bulk': True})
ts = ds.read_ts(45, 15)
Bulk reading speeds up reading multiple points from a cell file by storing the file in memory for subsequent calls. Either longitude and latitude can be passed to perform a nearest neighbour search on the data grid (grid.nc in the time series path), or the grid point index (GPI) can be passed directly.