CMIP5

The CMIP5 module provides tools for searching through the CMIP5 data stored on NCI’s /g/data filesystem.

Getting Started

The ARCCSSive library is available as a module on Raijin. Load it using:

module use ~access/modules
module load pythonlib/ARCCSSive

To use the CMIP5 catalog you first need to connect to it:

>>> from ARCCSSive import CMIP5
>>> cmip5 = CMIP5.connect() 

The session object allows you to run queries on the catalog. There are a number of helper functions for common operations, for instance searching through the model outputs:

>>> outputs = cmip5.outputs(
...     experiment = 'rcp45',
...     variable   = 'tas',
...     mip        = 'day',
...     ensemble   = 'r1i1p1')

You can then loop over the search results in normal Python fashion:

>>> for o in outputs.filter_by(model='ACCESS1.3'):
...     (o.model, o.filenames())
('ACCESS1.3', ['tas_day_ACCESS1-3_rcp45_r1i1p1_20310101-20551231.nc'])
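The file names follow the CMIP5 naming convention <variable>_<mip>_<model>_<experiment>_<ensemble>[_<time range>].nc. If you need those components from a file name rather than from the catalog, a minimal plain-Python sketch can split them apart. `parse_cmip5_filename` and `Cmip5Name` are hypothetical names, not part of ARCCSSive, and optional geographical suffixes are not handled.

```python
from typing import NamedTuple, Optional


class Cmip5Name(NamedTuple):
    """Components of a CMIP5 data file name (sketch; not part of ARCCSSive)."""
    variable: str
    mip: str
    model: str
    experiment: str
    ensemble: str
    time_range: Optional[str]


def parse_cmip5_filename(name: str) -> Cmip5Name:
    """Split '<variable>_<mip>_<model>_<experiment>_<ensemble>[_<time range>].nc'.

    Hypothetical helper; optional geographical suffixes are not handled.
    """
    if not name.endswith(".nc"):
        raise ValueError(f"not a netCDF file name: {name!r}")
    parts = name[: -len(".nc")].split("_")
    if len(parts) == 5:
        # Time-invariant fields (e.g. the fx table) carry no time range
        return Cmip5Name(*parts, time_range=None)
    if len(parts) == 6:
        return Cmip5Name(*parts)
    raise ValueError(f"unexpected CMIP5 file name: {name!r}")


info = parse_cmip5_filename("tas_day_ACCESS1-3_rcp45_r1i1p1_20310101-20551231.nc")
print(info.model, info.time_range)  # ACCESS1-3 20310101-20551231
```

Model IDs may contain hyphens but not underscores, which is why splitting on "_" is enough for this convention.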

Examples

Get files from a single model variable

>>> outputs = cmip5.outputs(
...     experiment = 'rcp45',
...     variable   = 'tas',
...     mip        = 'day',
...     model      = 'ACCESS1.3',
...     ensemble   = 'r1i1p1')

>>> for f in outputs.first().filenames():
...     f
'tas_day_ACCESS1-3_rcp45_r1i1p1_20310101-20551231.nc'

Get files from all models for a specific variable

>>> outputs = cmip5.outputs(
...     experiment = 'rcp45',
...     variable   = 'tas',
...     mip        = 'day',
...     ensemble   = 'r1i1p1')

>>> for m in outputs:
...     model = m.model
...     files = m.filenames()
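The loop above overwrites `model` and `files` on each iteration. To keep the results, a small sketch can collect the file names per model. It assumes only what the examples show (each result has a `.model` attribute and a `.filenames()` method); `files_by_model` is a hypothetical helper, and the `SimpleNamespace` objects stand in for real query results so the sketch runs without the catalog.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, Iterable, List


def files_by_model(outputs: Iterable) -> Dict[str, List[str]]:
    """Collect file names per model from any iterable of results.

    Assumes each result has a .model attribute and a .filenames() method.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for result in outputs:
        # Several results (e.g. ensemble members) may share a model
        grouped[result.model].extend(result.filenames())
    return dict(grouped)


# Stand-ins for catalog results, for demonstration only
demo = [
    SimpleNamespace(model="ACCESS1-3",
                    filenames=lambda: ["tas_day_ACCESS1-3_rcp45_r1i1p1_20060101-20301231.nc"]),
    SimpleNamespace(model="ACCESS1-0",
                    filenames=lambda: ["tas_day_ACCESS1-0_rcp45_r1i1p1_20060101-20301231.nc"]),
]
print(files_by_model(demo))
```

With the catalog, the same helper would be called as `files_by_model(outputs)` on the query from the example above.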

Choose more than one variable at a time

More complex queries on the Session.outputs() results can be performed using SQLAlchemy’s filter():

>>> from ARCCSSive.CMIP5.Model import *
>>> from sqlalchemy import *

>>> outputs = cmip5.outputs(
...     experiment = 'rcp45',
...     model      = 'ACCESS1-3',
...     mip        = 'Amon',) \
...     .filter(Instance.variable.in_(['tas','pr']))
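`Instance.variable.in_([...])` is rendered as an SQL `IN` clause. As a rough illustration of what that filter selects, here is a comparable query in plain `sqlite3` against a toy in-memory table. The table and column names are hypothetical, not ARCCSSive's schema, and the SQL that SQLAlchemy actually emits may differ.

```python
import sqlite3

# Toy in-memory table standing in for the catalog (hypothetical schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (model TEXT, experiment TEXT, mip TEXT, variable TEXT)")
conn.executemany(
    "INSERT INTO instances VALUES (?, ?, ?, ?)",
    [
        ("ACCESS1-3", "rcp45", "Amon", "tas"),
        ("ACCESS1-3", "rcp45", "Amon", "pr"),
        ("ACCESS1-3", "rcp45", "Amon", "psl"),
    ],
)

wanted = ["tas", "pr"]
placeholders = ", ".join("?" for _ in wanted)  # one '?' per requested variable
rows = conn.execute(
    "SELECT variable FROM instances "
    "WHERE model = ? AND experiment = ? AND mip = ? "
    f"AND variable IN ({placeholders}) ORDER BY variable",
    ["ACCESS1-3", "rcp45", "Amon", *wanted],
).fetchall()
print(rows)  # [('pr',), ('tas',)]
```

Only the placeholder count is built with string formatting; the values themselves are bound as parameters.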

Get results from a specific output version

Querying a specific version currently has to go through the Session.query() function; this will be simplified in a future version of ARCCSSive:

>>> from ARCCSSive.CMIP5.Model import *

>>> res = cmip5.query(Version) \
...         .join(Instance) \
...         .filter(
...     Version.version     == 'v20120413',
...     Instance.model      == 'ACCESS1-3',
...     Instance.experiment == 'rcp45',
...     Instance.mip        == 'Amon',
...     Instance.ensemble   == 'r1i1p1')

>>> # This returns a sequence of Version objects; get the dataset
>>> # information from each version's .variable relationship
>>> for o in res:
...     o.variable.model, o.variable.variable, o.filenames()
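Version identifiers such as 'v20120413' encode a date. If you need to compare identifiers yourself, for example to pick the newest of several versions returned by a query, a sketch assuming every identifier has the form vYYYYMMDD can parse them into dates. The helper names are hypothetical, and identifiers that do not follow this pattern raise `ValueError`.

```python
import re
from datetime import date
from typing import Iterable

# Assumed identifier format: 'v' followed by an 8-digit date
_VERSION_RE = re.compile(r"v(\d{4})(\d{2})(\d{2})")


def version_date(version: str) -> date:
    """Convert a 'vYYYYMMDD' identifier into a date."""
    match = _VERSION_RE.fullmatch(version)
    if match is None:
        raise ValueError(f"unexpected version identifier: {version!r}")
    return date(*map(int, match.groups()))


def newest(versions: Iterable[str]) -> str:
    """Return the identifier with the latest date."""
    return max(versions, key=version_date)


print(version_date("v20120413"))                          # 2012-04-13
print(newest(["v20120413", "v20111124", "v20130101"]))    # v20130101
```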

Compare model results between two experiments

Link two sets of outputs together using joins:

>>> from ARCCSSive.CMIP5.Model import *
>>> from sqlalchemy.orm import aliased
>>> from sqlalchemy import *

>>> # Create aliases for the historical and rcp variables, so we can
>>> # distinguish them in the query
>>> histInstance = aliased(Instance)
>>> rcpInstance  = aliased(Instance)
>>> rcp_hist  = cmip5.query(rcpInstance, histInstance).join(
...         histInstance, and_(
...             histInstance.variable == rcpInstance.variable,
...             histInstance.model    == rcpInstance.model,
...             histInstance.mip      == rcpInstance.mip,
...             histInstance.ensemble == rcpInstance.ensemble,
...         )).filter(
...             rcpInstance.experiment  == 'rcp45',
...             histInstance.experiment == 'historicalNat',
...         )

>>> for r, h in rcp_hist:
...     r.versions[-1].path, h.versions[-1].path
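The same self-join, written as plain SQL against a toy `sqlite3` table, may make the role of `aliased()` clearer: `rcp` and `hist` are two names for one table, and the `ON` clause pairs rows that agree on variable, model, mip and ensemble. The schema and paths are hypothetical; in the documented model, paths live on Version objects rather than on Instance.

```python
import sqlite3

# Toy table standing in for the catalog (hypothetical schema and paths)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instances (variable TEXT, model TEXT, mip TEXT, "
    "ensemble TEXT, experiment TEXT, path TEXT)"
)
conn.executemany(
    "INSERT INTO instances VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("tas", "ACCESS1-3", "Amon", "r1i1p1", "rcp45", "/data/rcp45/tas"),
        ("tas", "ACCESS1-3", "Amon", "r1i1p1", "historicalNat", "/data/histNat/tas"),
        ("pr", "ACCESS1-3", "Amon", "r1i1p1", "rcp45", "/data/rcp45/pr"),  # no partner
    ],
)

# 'rcp' and 'hist' alias the same table, like aliased(Instance) above
pairs = conn.execute(
    """
    SELECT rcp.path, hist.path
    FROM instances AS rcp
    JOIN instances AS hist
      ON hist.variable = rcp.variable
     AND hist.model    = rcp.model
     AND hist.mip      = rcp.mip
     AND hist.ensemble = rcp.ensemble
    WHERE rcp.experiment = 'rcp45' AND hist.experiment = 'historicalNat'
    """
).fetchall()
print(pairs)  # [('/data/rcp45/tas', '/data/histNat/tas')]
```

Because this is an inner join, the 'pr' row has no historicalNat partner and is dropped from the result.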

API

connect()

ARCCSSive.CMIP5.connect()

Connect to the CMIP5 catalog

Returns: A new Session

Example:

>>> from ARCCSSive import CMIP5
>>> cmip5   = CMIP5.connect()
>>> outputs = cmip5.query()

Session

The session object has a number of helper functions for getting information out of the catalog, e.g. Session.models() gets a list of all available models.

class ARCCSSive.CMIP5.Session

Holds a connection to the catalog

Create using ARCCSSive.CMIP5.connect()

experiments()

Get the list of all experiments in the dataset

Returns: A list of strings

files(**kwargs)

Query the list of files

Returns a list of files that match the arguments

Parameters: kwargs – Match any attribute in Model.Instance, e.g. model = 'ACCESS1-3'
Returns: An iterable returning Model.File matching the search query

mips()

Get the list of all MIP tables in the dataset

Returns: A list of strings

models()

Get the list of all models in the dataset

Returns: A list of strings

outputs(**kwargs)

Get the most recent instances matching a query

All arguments are optional; any that are given restrict the results to matching outputs.

Parameters:
  • variable – CMIP variable name
  • experiment – CMIP experiment
  • mip – MIP table
  • model – Model used to generate the dataset
  • ensemble – Ensemble member
Returns: An iterable sequence of ARCCSSive.CMIP5.Model.Instance

query(*args, **kwargs)

Query the CMIP5 catalog

Allows you to filter the full list of CMIP5 outputs using SQLAlchemy commands

Returns: An SQLAlchemy query object

variables()

Get the list of all variables in the dataset

Returns: A list of strings
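Helpers such as files(**kwargs) and outputs(**kwargs) take keyword arguments naming Instance attributes, and helpers such as models() return distinct values. One common way to implement keyword filters, sketched here with `sqlite3` rather than SQLAlchemy and with a hypothetical table, is to whitelist the column names and bind the values as parameters. `build_where` and `ALLOWED` are illustrative names, not ARCCSSive internals.

```python
import sqlite3
from typing import List, Tuple

# Columns that may be used as filters (hypothetical; mirrors the
# Instance attributes documented in the Model section)
ALLOWED = {"variable", "experiment", "mip", "model", "ensemble", "realm"}


def build_where(**kwargs: str) -> Tuple[str, List[str]]:
    """Build a WHERE clause from keyword filters, rejecting unknown names."""
    unknown = set(kwargs) - ALLOWED
    if unknown:
        raise TypeError(f"unknown filter(s): {sorted(unknown)}")
    if not kwargs:
        return "", []
    names = sorted(kwargs)
    # Column names come from the whitelist; values are bound as parameters
    clause = " AND ".join(f"{name} = ?" for name in names)
    return " WHERE " + clause, [kwargs[name] for name in names]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (variable, experiment, mip, model, ensemble, realm)")
conn.executemany(
    "INSERT INTO instances VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("tas", "rcp45", "day", "ACCESS1-3", "r1i1p1", "atmos"),
        ("tas", "rcp45", "day", "ACCESS1-0", "r1i1p1", "atmos"),
        ("tos", "rcp45", "Omon", "ACCESS1-3", "r1i1p1", "ocean"),
    ],
)

# Distinct model names among daily 'tas' entries, in the spirit of models()
where, params = build_where(variable="tas", mip="day")
models = [row[0] for row in conn.execute(
    "SELECT DISTINCT model FROM instances" + where + " ORDER BY model", params)]
print(models)  # ['ACCESS1-0', 'ACCESS1-3']
```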

Model

The model classes hold catalog information for a single entry. Each model run variable can have several data versions, as the publisher releases corrected data, and each version can consist of several files that together cover the time series.

Each model class has a number of relationships, which can be used in a query to efficiently return linked data e.g.:

>>> q = (cmip5.query(Instance, VersionFile)
...         .join(Instance.latest_version)
...         .join(Version.files))

This query returns an iterator of (Instance, ARCCSSive.model.cmip5.File) pairs and only needs to query the database once, whereas using a loop requires a database query for each iteration.
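The cost difference between a loop and a join can be seen with a toy `sqlite3` example that counts the statements it issues. The tables, columns and the `run` helper are hypothetical and far simpler than the catalog; the point is only that the loop issues one query per parent row, while the join fetches every pair at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE versions (id INTEGER PRIMARY KEY, variable TEXT);
    CREATE TABLE files (version_id INTEGER, filename TEXT);
    INSERT INTO versions VALUES (1, 'tas'), (2, 'pr');
    INSERT INTO files VALUES (1, 'tas_a.nc'), (1, 'tas_b.nc'), (2, 'pr_a.nc');
    """
)

executed = []  # records every statement issued through run()


def run(sql, params=()):
    """Execute a query, record it, and return all rows."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


# Loop: one query for the versions, then one more query per version
loop_pairs = []
for version_id, variable in run("SELECT id, variable FROM versions ORDER BY id"):
    for (filename,) in run(
        "SELECT filename FROM files WHERE version_id = ? ORDER BY filename",
        (version_id,),
    ):
        loop_pairs.append((variable, filename))
loop_count = len(executed)

# Join: all (variable, file) pairs in a single query
executed.clear()
join_pairs = run(
    "SELECT v.variable, f.filename FROM versions AS v "
    "JOIN files AS f ON f.version_id = v.id ORDER BY v.id, f.filename"
)
join_count = len(executed)
print(loop_count, join_count)  # 3 1
```

Both approaches yield the same pairs; with N versions the loop needs N + 1 queries, which adds up quickly against a remote database.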

class ARCCSSive.CMIP5.Model.Instance(**kwargs)

A combination of a CMIP5 Dataset and a single variable

Relationships:

versions

list[Version]: List of all available versions of this dataset

latest_version

Version: The most recent version of this dataset

files

list[ARCCSSive.model.cmip5.File]: All files belonging to this dataset and variable, regardless of version

Attributes:

variable

Variable name

experiment

CMIP experiment

mip

MIP table specifying output frequency and realm

model

Model that generated the dataset

ensemble

Ensemble member

realm

Realm, e.g. atmos or ocean

filenames()

Returns the file names from the latest version of this variable

Returns: List of file names

drstree_path()

Returns the drstree path for the latest version of this instance

class ARCCSSive.CMIP5.Model.Version(**kwargs)

A version of a model run’s variable

Relationships:

variable

Instance: Dataset and variable this version is attached to

warnings

list[ARCCSSive.model.cmip5.Warning]: Warnings attached to this dataset version

files

list[ARCCSSive.model.cmip5.File]: Files belonging to this dataset version

Attributes:

version

Version identifier

path

Path to the output directory

Example:

>>> instance = cmip5.query(Instance).filter_by(dataset_id = 'c6d75f4c-793b-5bcc-28ab-1af81e4b679d', variable='tas').one()
>>> version = instance.latest_version
>>> version = instance.versions[-1]

glob()

Get the glob string matching the CMIP5 filename

>>> print(version.glob())
tas_day_ACCESS1.3_rcp45_r1i1p1*.nc
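A glob string like this can be applied to a list of names with the standard library's `fnmatch`, which uses the same shell-style wildcards as `glob`. The pattern and names below are illustrative (the last name is a hypothetical monthly file). Outside the wildcards the match is literal, so the model string in the pattern must be spelled exactly as in the file names, here ACCESS1-3.

```python
import fnmatch

names = [
    "tas_day_ACCESS1-3_rcp45_r1i1p1_20060101-20301231.nc",
    "tas_day_ACCESS1-3_rcp45_r1i1p1_20310101-20551231.nc",
    "tas_Amon_ACCESS1-3_rcp45_r1i1p1_200601-210012.nc",  # hypothetical monthly file
]

# Shell-style pattern in the same shape as the glob() output above
pattern = "tas_day_ACCESS1-3_rcp45_r1i1p1*.nc"
matched = fnmatch.filter(names, pattern)
print(matched)
```

To match files on disk instead, `glob.glob(os.path.join(version.path, version.glob()))` would do the same search, assuming `version.path` is the directory that holds the files.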

build_filepaths()

Returns the list of files matching this version

Returns: List of file paths
>>> import pprint
>>> pprint.pprint(version.build_filepaths())
['/g/data1/ua6/unofficial-ESG-replica/tmp/tree/pcmdi9.llnl.gov/thredds/fileServer/cmip5_css02_data/cmip5/output1/CSIRO-BOM/ACCESS1-3/rcp45/day/atmos/day/r1i1p1/tas/1/tas_day_ACCESS1-3_rcp45_r1i1p1_20060101-20301231.nc',
 '/g/data1/ua6/unofficial-ESG-replica/tmp/tree/pcmdi9.llnl.gov/thredds/fileServer/cmip5_css02_data/cmip5/output1/CSIRO-BOM/ACCESS1-3/rcp45/day/atmos/day/r1i1p1/tas/1/tas_day_ACCESS1-3_rcp45_r1i1p1_20310101-20551231.nc',
 '/g/data1/ua6/unofficial-ESG-replica/tmp/tree/pcmdi9.llnl.gov/thredds/fileServer/cmip5_css02_data/cmip5/output1/CSIRO-BOM/ACCESS1-3/rcp45/day/atmos/day/r1i1p1/tas/1/tas_day_ACCESS1-3_rcp45_r1i1p1_20560101-20801231.nc',
 '/g/data1/ua6/unofficial-ESG-replica/tmp/tree/pcmdi9.llnl.gov/thredds/fileServer/cmip5_css02_data/cmip5/output1/CSIRO-BOM/ACCESS1-3/rcp45/day/atmos/day/r1i1p1/tas/1/tas_day_ACCESS1-3_rcp45_r1i1p1_20810101-21001231.nc']

filenames()

Returns the list of filenames for this version

Returns: List of file names
>>> sorted(version.filenames())
['tas_day_ACCESS1-3_rcp45_r1i1p1_20060101-20301231.nc', 'tas_day_ACCESS1-3_rcp45_r1i1p1_20310101-20551231.nc', 'tas_day_ACCESS1-3_rcp45_r1i1p1_20560101-20801231.nc', 'tas_day_ACCESS1-3_rcp45_r1i1p1_20810101-21001231.nc']
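The four files above split 2006 to 2100 into consecutive segments. A sketch that checks such a list for gaps, assuming daily files whose names end in a YYYYMMDD-YYYYMMDD range and a standard (Gregorian) calendar; models using a 360-day calendar would need calendar-aware arithmetic instead. The helper names are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple


def _to_date(stamp: str) -> date:
    """Convert 'YYYYMMDD' into a date."""
    return date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:8]))


def time_range(filename: str) -> Tuple[date, date]:
    """Parse the trailing 'YYYYMMDD-YYYYMMDD' range of a daily CMIP5 file name."""
    stamp = filename[: -len(".nc")].rsplit("_", 1)[1]
    start, end = stamp.split("-")
    return _to_date(start), _to_date(end)


def find_gaps(filenames: List[str]) -> List[Tuple[date, date]]:
    """Return (end, next_start) pairs where consecutive files do not abut."""
    ranges = sorted(time_range(name) for name in filenames)
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(ranges, ranges[1:])
        if next_start != prev_end + timedelta(days=1)
    ]


files = [
    "tas_day_ACCESS1-3_rcp45_r1i1p1_20060101-20301231.nc",
    "tas_day_ACCESS1-3_rcp45_r1i1p1_20310101-20551231.nc",
    "tas_day_ACCESS1-3_rcp45_r1i1p1_20560101-20801231.nc",
    "tas_day_ACCESS1-3_rcp45_r1i1p1_20810101-21001231.nc",
]
print(find_gaps(files))                  # []
print(find_gaps([files[0], files[2]]))   # [(datetime.date(2030, 12, 31), datetime.date(2056, 1, 1))]
```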

tracking_ids()

Returns the list of tracking_ids for files in this version

Returns: List of tracking_ids
>>> sorted(version.tracking_ids())
['54779e2d-41fb-4671-bbdf-2170385afa3b', '800713b7-c303-4618-aef9-f72548d5ada6', 'd2813685-9c7c-4527-8186-44a8f19d31dd', 'f810f58d-329e-4934-bb1c-28c5c314e073']

drstree_path()

Returns the drstree path for this particular version

class ARCCSSive.model.cmip5.File(**kwargs)

A CMIP5 output file’s attributes

Relationships:

dataset

Dataset: The dataset this file is part of

version

Version: This file’s dataset version

warnings

list[Warning]: Warnings associated with this file

timeseries

Timeseries holding all files in the dataset with the same variables

Attributes:

experiment_id
frequency
institute_id
model_id
modeling_realm
product
table_id
tracking_id
version_number
realization
initialization_method
physics_version
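In CMIP5, an ensemble member such as r1i1p1 packs three of these attributes into one string: r gives the realization, i the initialization_method and p the physics_version. A hedged sketch that recovers them as integers; `parse_ensemble` and `Ensemble` are hypothetical names, and how ARCCSSive stores these attributes internally is not assumed.

```python
import re
from typing import NamedTuple

# CMIP5 ensemble member notation: r<realization>i<initialization>p<physics>
_RIP = re.compile(r"r(\d+)i(\d+)p(\d+)")


class Ensemble(NamedTuple):
    realization: int
    initialization_method: int
    physics_version: int


def parse_ensemble(member: str) -> Ensemble:
    """Split an 'rNiNpN' ensemble member into its three integer parts."""
    match = _RIP.fullmatch(member)
    if match is None:
        raise ValueError(f"not an rNiNpN ensemble member: {member!r}")
    return Ensemble(*map(int, match.groups()))


print(parse_ensemble("r1i1p1"))
# Ensemble(realization=1, initialization_method=1, physics_version=1)
```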