This notebook presents two approaches to accessing TRACMIP data in the Pangeo cloud from your own computer. Both approaches access Zarr-based data in the Google Cloud. In the example provided here, each results in a dictionary of monthly-mean precipitation in the aquaControl simulation from all TRACMIP models.
As a proof of concept, the notebook closes by plotting the time-averaged precipitation for one of the TRACMIP models using the dictionaries generated by both approaches. Naturally, and in fact necessarily, the plot is the same for both dictionaries.
Based on "Way 2: Do the same thing with the Google Cloud Zarr-based data (still from your laptop)" described in Ryan Abernathey's blog post "CMIP6 in the Cloud Five Ways", https://medium.com/pangeo/cmip6-in-the-cloud-five-ways-96b177abe396
import numpy as np
import pandas as pd
import xarray as xr
import zarr
import gcsfs
Browse Catalog: The data catalog is stored as a CSV file. Here we read it with Pandas.
df = pd.read_csv('https://storage.googleapis.com/cmip6/tracmip.csv')
df.head()
For the purpose of this example, we filter the data to find monthly precipitation for the aquaControl simulation.
df_pr = df.query("frequency == 'Amon' & variable == 'pr' & experiment == 'aquaControl'")
We check the content of the dataframe. As desired, it contains monthly precipitation for the 14 TRACMIP models. The entries in the source column point to the data location in the Google Cloud and are needed to read the data.
df_pr
We are now going to read the data into a dictionary of xarray datasets.
# this only needs to be created once
gcs = gcsfs.GCSFileSystem(token='anon')
# initialize an empty dictionary
ds_dict1=dict()
To this end, we loop over the source values to read the data of the individual models and fill the dictionary. The model names serve as the dictionary keys.
for zstore in df_pr.source.values:
    mapper = gcs.get_mapper(zstore)
    ds = xr.open_zarr(mapper, consolidated=True)
    ds_dict1[ds.attrs['model_id']] = ds
For the sake of demonstration, we print the dictionary content for the ECHAM61 model.
ds_dict1['ECHAM61']
This approach was developed with help from Charles Christopher Blackmon Luca.
from intake import open_catalog
# get the entire Pangeo catalogue ...
cat = open_catalog("https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/climate.yaml")
# ... and select TRACMIP collection
col = cat.tracmip()
For illustration, we look at some basic information about the TRACMIP collection. As expected, there are 3 output frequencies (monthly-mean, daily-mean, 3-hr snapshots), 11 experiments (6 are due to the CALTECH model with changed atmospheric opacity), and 47 variables.
col
To familiarize ourselves with the collection, we print its first and last rows. They look the same as in approach 1, as they must.
col.df.head()
col.df.tail()
Now we load the monthly-mean precipitation data for the aquaControl experiment into a dictionary, analogous to what we did for approach 1.
Note: the option "zarr_kwargs={'consolidated': True}" for to_dataset_dict does not seem necessary but is still included here.
ds_dict2 = col.search(frequency="Amon", experiment="aquaControl",
                      variable="pr").to_dataset_dict(zarr_kwargs={'consolidated': True})
For the sake of demonstration, we print the dictionary content for the ECHAM61 model. Note that the keys are now 'model.experiment.frequency'.
ds_dict2['ECHAM61.aquaControl.Amon']
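Because the two dictionaries use different keys, it can be convenient to translate between them. The following is a minimal sketch that assumes the key layout 'model.experiment.frequency' noted above and that experiment and frequency names contain no dots. The helper names intake_key and model_from_key are hypothetical and not part of intake or the Pangeo catalog.

```python
def intake_key(model, experiment="aquaControl", frequency="Amon"):
    """Build an approach-2 key of the form 'model.experiment.frequency'."""
    return ".".join([model, experiment, frequency])


def model_from_key(key):
    """Recover the model name from an approach-2 key.

    Splitting from the right keeps the model name intact even if it
    contains a dot (assumes experiment and frequency contain none).
    """
    return key.rsplit(".", 2)[0]


# Possible use with the dictionary built above: re-key by model name so that
# it can be indexed like the dictionary from approach 1.
# ds_dict2_by_model = {model_from_key(k): ds for k, ds in ds_dict2.items()}
```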
import matplotlib.pyplot as plt
plt.plot(ds_dict1['ECHAM61'].lat,
         ds_dict1['ECHAM61']['pr'].isel(time=slice(120,360)).mean(['lon', 'time'])*86400,
         'b', linewidth=3, label='approach 1')
plt.plot(ds_dict2['ECHAM61.aquaControl.Amon'].lat,
         ds_dict2['ECHAM61.aquaControl.Amon']['pr'].isel(time=slice(120,360)).mean(['lon', 'time'])*86400,
         'r--', linewidth=3, label='approach 2')
plt.xlabel('degree latitude')
plt.ylabel('precipitation (mm/day)')
plt.title('ECHAM61, aquaControl');
plt.legend();
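The factor 86400 in the plotting code converts the precipitation flux to mm/day. The sketch below spells out that conversion, assuming that 'pr' is stored in kg m-2 s-1 as in the CMIP convention. This can be checked via the units attribute of the variable. The function name flux_to_mm_per_day is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400 s


def flux_to_mm_per_day(pr_flux):
    """Convert a precipitation flux from kg m-2 s-1 to mm/day.

    1 kg of water spread over 1 m2 forms a layer 1 mm deep (water density
    of 1000 kg m-3), so kg m-2 s-1 equals mm/s, and multiplying by the
    number of seconds per day gives mm/day.
    """
    return pr_flux * SECONDS_PER_DAY


# Works on plain numbers as well as on array-like objects that support
# multiplication by a scalar, e.g. the time- and zonal-mean 'pr' plotted above.
```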
The above hopefully provides a helpful and clear recipe for accessing TRACMIP data from the Pangeo Cloud. It should be straightforward to condense the approaches into wrapper functions.
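As one possible starting point for such a wrapper, the loop of approach 1 can be condensed as sketched below. The function name build_dataset_dict and its parameters are hypothetical. The function that opens a store is passed in as an argument, so the sketch makes no assumption about how the data are read.

```python
def build_dataset_dict(sources, open_store, key_attr="model_id"):
    """Open every store in `sources` and key the results by a global attribute.

    sources    : iterable of store locations, e.g. df_pr.source.values
    open_store : callable that maps one location to an opened dataset
    key_attr   : global attribute used as dictionary key ('model_id' as in
                 approach 1)
    """
    datasets = {}
    for zstore in sources:
        ds = open_store(zstore)
        datasets[ds.attrs[key_attr]] = ds
    return datasets


# Possible use with the objects defined for approach 1 (needs network access):
# ds_dict1 = build_dataset_dict(
#     df_pr.source.values,
#     lambda zstore: xr.open_zarr(gcs.get_mapper(zstore), consolidated=True),
# )
```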