CMIP5¶
The CMIP5 module provides tools for searching through the CMIP5 data stored on NCI’s /g/data filesystem.
Getting Started:¶
The ARCCSSive library is available as a module on Raijin. Load it using:
module use ~access/modules
module load pythonlib/ARCCSSive
To use the CMIP5 catalog you first need to connect to it:
>>> from ARCCSSive import CMIP5
>>> cmip5 = CMIP5.connect()
The session object allows you to run queries on the catalog. There are a number of helper functions for common operations, for instance searching through the model outputs:
>>> outputs = cmip5.outputs(
... experiment = 'rcp45',
... variable = 'tas',
... mip = 'day',
... ensemble = 'r1i1p1')
You can then loop over the search results in normal Python fashion:
>>> for o in outputs.filter_by(model='ACCESS1.3'):
... (o.model, o.filenames())
('ACCESS1.3', ['tas_day_ACCESS1-3_rcp45_r1i1p1_20310101-20551231.nc'])
Examples¶
Get files from a single model variable¶
>>> outputs = cmip5.outputs(
... experiment = 'rcp45',
... variable = 'tas',
... mip = 'day',
... model = 'ACCESS1.3',
... ensemble = 'r1i1p1')
>>> for f in outputs.first().filenames():
... f
'tas_day_ACCESS1-3_rcp45_r1i1p1_20310101-20551231.nc'
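The filenames returned above follow the CMIP5 naming convention: variable, MIP table, model, experiment and ensemble member, then an optional time range, all joined by underscores. As a minimal sketch, a filename can be split back into those components using only the standard library. The `parse_cmip5_filename` helper and the `CMIP5Name` tuple are hypothetical names for illustration and are not part of ARCCSSive.

```python
import re
from collections import namedtuple

# Hypothetical container for the parts of a CMIP5 filename (not an ARCCSSive class)
CMIP5Name = namedtuple(
    "CMIP5Name", "variable mip model experiment ensemble start end")

# <variable>_<mip>_<model>_<experiment>_<ensemble>[_<start>-<end>].nc
_PATTERN = re.compile(
    r"^(?P<variable>[^_]+)_(?P<mip>[^_]+)_(?P<model>[^_]+)_"
    r"(?P<experiment>[^_]+)_(?P<ensemble>r\d+i\d+p\d+)"
    r"(?:_(?P<start>\d+)-(?P<end>\d+))?\.nc$"
)


def parse_cmip5_filename(filename):
    """Split a CMIP5-style filename into its named components."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError("not a CMIP5-style filename: %r" % filename)
    return CMIP5Name(**match.groupdict())


info = parse_cmip5_filename(
    "tas_day_ACCESS1-3_rcp45_r1i1p1_20310101-20551231.nc")
print(info.variable, info.model, info.start, info.end)
```

Files without a time range (for example fixed fields) are parsed with `start` and `end` set to `None`.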
Get files from all models for a specific variable¶
>>> outputs = cmip5.outputs(
... experiment = 'rcp45',
... variable = 'tas',
... mip = 'day',
... ensemble = 'r1i1p1')
>>> for m in outputs:
... model = m.model
... files = m.filenames()
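The loop above overwrites `model` and `files` on every iteration, so in practice the results are usually collected into a container. The sketch below groups filenames by model. It relies only on the `.model` attribute and `.filenames()` method shown above; the `FakeOutput` class is a hypothetical stand-in so the example runs without the catalog, and the `MODEL-B` entry is invented placeholder data.

```python
from collections import defaultdict


class FakeOutput:
    """Hypothetical stand-in for one search result (not part of ARCCSSive)."""

    def __init__(self, model, files):
        self.model = model
        self._files = files

    def filenames(self):
        return list(self._files)


def files_by_model(outputs):
    """Return a dict mapping each model name to a sorted list of its files."""
    grouped = defaultdict(list)
    for o in outputs:
        grouped[o.model].extend(o.filenames())
    return {model: sorted(files) for model, files in grouped.items()}


# Placeholder results; with the real catalog this would be cmip5.outputs(...)
demo = [
    FakeOutput("ACCESS1.3",
               ["tas_day_ACCESS1-3_rcp45_r1i1p1_20310101-20551231.nc"]),
    FakeOutput("ACCESS1.3",
               ["tas_day_ACCESS1-3_rcp45_r1i1p1_20060101-20301231.nc"]),
    FakeOutput("MODEL-B",  # invented model name, for illustration only
               ["tas_day_MODEL-B_rcp45_r1i1p1_20060101-21001231.nc"]),
]

grouped = files_by_model(demo)
for model in sorted(grouped):
    print(model, len(grouped[model]))
```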
Choose more than one variable at a time¶
More complex queries on the Session.outputs() results can be performed using SQLAlchemy’s filter():
>>> from ARCCSSive.CMIP5.Model import Instance
>>> outputs = cmip5.outputs(
... experiment = 'rcp45',
... model = 'ACCESS1-3',
... mip = 'Amon',) \
... .filter(Instance.variable.in_(['tas','pr']))
Get results from a specific output version¶
Querying specific versions currently needs to go through the Session.query() function; this will be simplified in a future version of ARCCSSive:
>>> from ARCCSSive.CMIP5.Model import Instance, Version
>>> res = cmip5.query(Version) \
... .join(Instance) \
... .filter(
... Version.version == 'v20120413',
... Instance.model == 'ACCESS1-3',
... Instance.experiment == 'rcp45',
... Instance.mip == 'Amon',
... Instance.ensemble == 'r1i1p1')
>>> # This returns a sequence of Version objects; get the variable
>>> # information from each one's .variable property
>>> for o in res:
... o.variable.model, o.variable.variable, o.filenames()
Compare model results between two experiments¶
Link two sets of outputs together using joins:
>>> from ARCCSSive.CMIP5.Model import Instance
>>> from sqlalchemy.orm import aliased
>>> from sqlalchemy import and_
>>> # Create aliases for the historical and rcp variables, so we can
>>> # distinguish them in the query
>>> histInstance = aliased(Instance)
>>> rcpInstance = aliased(Instance)
>>> rcp_hist = cmip5.query(rcpInstance, histInstance).join(
... histInstance, and_(
... histInstance.variable == rcpInstance.variable,
... histInstance.model == rcpInstance.model,
... histInstance.mip == rcpInstance.mip,
... histInstance.ensemble == rcpInstance.ensemble,
... )).filter(
... rcpInstance.experiment == 'rcp45',
... histInstance.experiment == 'historicalNat',
... )
>>> for r, h in rcp_hist:
... r.versions[-1].path, h.versions[-1].path
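The aliased join above is a self-join: the same table is matched against itself under two names. The sketch below shows the equivalent idea in plain SQL using the standard library's `sqlite3` module. The `instances` table, its columns and its rows are assumptions made up for this illustration and do not reflect the real ARCCSSive database schema.

```python
import sqlite3

# In-memory database with a hypothetical, simplified 'instances' table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instances ("
    " id INTEGER PRIMARY KEY, variable TEXT, experiment TEXT,"
    " mip TEXT, model TEXT, ensemble TEXT)")
rows = [
    ("tas", "rcp45", "Amon", "ACCESS1-3", "r1i1p1"),
    ("tas", "historicalNat", "Amon", "ACCESS1-3", "r1i1p1"),
    ("pr", "rcp45", "Amon", "ACCESS1-3", "r1i1p1"),  # no historicalNat partner
]
conn.executemany(
    "INSERT INTO instances (variable, experiment, mip, model, ensemble)"
    " VALUES (?, ?, ?, ?, ?)", rows)

# 'rcp' and 'hist' play the role of the two aliased(Instance) objects
pairs = conn.execute(
    """
    SELECT rcp.id, hist.id, rcp.variable
    FROM instances AS rcp
    JOIN instances AS hist
      ON hist.variable = rcp.variable
     AND hist.model = rcp.model
     AND hist.mip = rcp.mip
     AND hist.ensemble = rcp.ensemble
    WHERE rcp.experiment = ? AND hist.experiment = ?
    """,
    ("rcp45", "historicalNat"),
).fetchall()
print(pairs)
conn.close()
```

Because this is an inner join, the `pr` row is dropped: only outputs that exist in both experiments are returned.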
API¶
connect()¶
Session¶
The session object has a number of helper functions for getting information out of the catalog, e.g. Session.models() gets a list of all available models.
class ARCCSSive.CMIP5.Session
    Holds a connection to the catalog
    Create using ARCCSSive.CMIP5.connect()

    files(**kwargs)
        Query the list of files
        Returns a list of files that match the arguments
        Parameters: kwargs – Match any attribute in Model.Instance, e.g. model = ‘ACCESS1-3’
        Returns: An iterable returning Model.File matching the search query

    outputs(**kwargs)
        Get the most recent instances matching a query
        Arguments are optional; using them will select only matching outputs
        Parameters:
            - variable – CMIP variable name
            - experiment – CMIP experiment
            - mip – MIP table
            - model – Model used to generate the dataset
            - ensemble – Ensemble member
        Returns: An iterable sequence of ARCCSSive.CMIP5.Model.Instance

    query(*args, **kwargs)
        Query the CMIP5 catalog
        Allows you to filter the full list of CMIP5 outputs using SQLAlchemy commands
        Returns: A SQLAlchemy query object
Model¶
The model classes hold catalog information for a single entry. Each model run variable can have a number of different data versions, as errors get corrected by the publisher, and each version can consist of a number of files split into a time sequence.
Each model class has a number of relationships, which can be used in a query to efficiently return linked data e.g.:
>>> q = (cmip5.query(Instance, VersionFile)
... .join(Instance.latest_version)
... .join(Version.files))
This query returns an iterator of (Instance, ARCCSSive.model.cmip5.File) pairs and only needs to query the database once, whereas using a loop requires a database query for each iteration.
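Since each version can consist of several files split into a time sequence, it can be useful to check that the files cover the period without gaps. The sketch below is one way to do that from the filenames alone. The `date_range` and `find_gaps` helpers are hypothetical and not part of ARCCSSive, and the code assumes daily files with a `YYYYMMDD-YYYYMMDD` suffix and dates that are valid in the standard calendar.

```python
from datetime import datetime, timedelta


def date_range(filename):
    """Return (start, end) datetimes parsed from a CMIP5 daily filename."""
    stem = filename[:-3] if filename.endswith(".nc") else filename
    start, end = stem.rsplit("_", 1)[1].split("-")
    return (datetime.strptime(start, "%Y%m%d"),
            datetime.strptime(end, "%Y%m%d"))


def find_gaps(filenames):
    """Return (previous_end, next_start) pairs separated by more than a day."""
    ranges = sorted(date_range(f) for f in filenames)
    gaps = []
    for (_, prev_end), (next_start, _) in zip(ranges, ranges[1:]):
        if next_start - prev_end > timedelta(days=1):
            gaps.append((prev_end, next_start))
    return gaps


names = [
    "tas_day_ACCESS1-3_rcp45_r1i1p1_20060101-20301231.nc",
    "tas_day_ACCESS1-3_rcp45_r1i1p1_20310101-20551231.nc",
    "tas_day_ACCESS1-3_rcp45_r1i1p1_20560101-20801231.nc",
    "tas_day_ACCESS1-3_rcp45_r1i1p1_20810101-21001231.nc",
]
gaps = find_gaps(names)                           # complete sequence
gaps_missing = find_gaps([names[0], names[2]])    # second file left out
print(gaps, gaps_missing)
```

Models that use non-standard calendars (for example 360-day years) can produce dates that `datetime` rejects, so this check would need a calendar-aware date type in those cases.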
class ARCCSSive.CMIP5.Model.Instance(**kwargs)
    A combination of a CMIP5 Dataset and a single variable

    Relationships:
        files
            list[ARCCSSive.model.cmip5.File]: All files belonging to this dataset and variable, regardless of version

    Attributes:
        variable
            Variable name
        experiment
            CMIP experiment
        mip
            MIP table specifying output frequency and realm
        model
            Model that generated the dataset
        ensemble
            Ensemble member
        realm
            Realm, e.g. atmos, ocean
class ARCCSSive.CMIP5.Model.Version(**kwargs)
    A version of a model run’s variable

    Relationships:
        warnings
            [ARCCSSive.model.cmip5.Warning]: Warnings attached to this dataset version
        files
            [ARCCSSive.model.cmip5.File]: Files belonging to this dataset version

    Attributes:
        version
            Version identifier
        path
            Path to the output directory

    >>> instance = cmip5.query(Instance).filter_by(dataset_id = 'c6d75f4c-793b-5bcc-28ab-1af81e4b679d', variable='tas').one()
    >>> version = instance.latest()
    >>> version = instance.versions[-1]

    glob()
        Get the glob string matching the CMIP5 filename
        >>> six.print_(version.glob())
        tas_day_ACCESS1.3_rcp45_r1i1p1*.nc

    build_filepaths()
        Returns the list of files matching this version
        Returns: List of file names
        >>> pprint.pprint(version.build_filepaths())
        ['/g/data1/ua6/unofficial-ESG-replica/tmp/tree/pcmdi9.llnl.gov/thredds/fileServer/cmip5_css02_data/cmip5/output1/CSIRO-BOM/ACCESS1-3/rcp45/day/atmos/day/r1i1p1/tas/1/tas_day_ACCESS1-3_rcp45_r1i1p1_20060101-20301231.nc',
         '/g/data1/ua6/unofficial-ESG-replica/tmp/tree/pcmdi9.llnl.gov/thredds/fileServer/cmip5_css02_data/cmip5/output1/CSIRO-BOM/ACCESS1-3/rcp45/day/atmos/day/r1i1p1/tas/1/tas_day_ACCESS1-3_rcp45_r1i1p1_20310101-20551231.nc',
         '/g/data1/ua6/unofficial-ESG-replica/tmp/tree/pcmdi9.llnl.gov/thredds/fileServer/cmip5_css02_data/cmip5/output1/CSIRO-BOM/ACCESS1-3/rcp45/day/atmos/day/r1i1p1/tas/1/tas_day_ACCESS1-3_rcp45_r1i1p1_20560101-20801231.nc',
         '/g/data1/ua6/unofficial-ESG-replica/tmp/tree/pcmdi9.llnl.gov/thredds/fileServer/cmip5_css02_data/cmip5/output1/CSIRO-BOM/ACCESS1-3/rcp45/day/atmos/day/r1i1p1/tas/1/tas_day_ACCESS1-3_rcp45_r1i1p1_20810101-21001231.nc']

    filenames()
        Returns the list of filenames for this version
        Returns: List of file names
        >>> sorted(version.filenames())
        ['tas_day_ACCESS1-3_rcp45_r1i1p1_20060101-20301231.nc', 'tas_day_ACCESS1-3_rcp45_r1i1p1_20310101-20551231.nc', 'tas_day_ACCESS1-3_rcp45_r1i1p1_20560101-20801231.nc', 'tas_day_ACCESS1-3_rcp45_r1i1p1_20810101-21001231.nc']

    tracking_ids()
        Returns the list of tracking_ids for files in this version
        Returns: List of tracking_ids
        >>> sorted(version.tracking_ids())
        ['54779e2d-41fb-4671-bbdf-2170385afa3b', '800713b7-c303-4618-aef9-f72548d5ada6', 'd2813685-9c7c-4527-8186-44a8f19d31dd', 'f810f58d-329e-4934-bb1c-28c5c314e073']
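To illustrate how a glob string like the one returned by glob() selects files, the sketch below builds a similar pattern from the dataset attributes and applies it with the standard library's `fnmatch` module. The `cmip5_glob` helper is hypothetical and is not how ARCCSSive builds its pattern; the model name is passed as it is spelled in the filenames, and the `pr` entry is an invented example.

```python
import fnmatch


def cmip5_glob(variable, mip, model, experiment, ensemble):
    """Build a glob pattern matching every time slice of one dataset variable."""
    return "{}_{}_{}_{}_{}*.nc".format(
        variable, mip, model, experiment, ensemble)


pattern = cmip5_glob("tas", "day", "ACCESS1-3", "rcp45", "r1i1p1")

candidates = [
    "tas_day_ACCESS1-3_rcp45_r1i1p1_20060101-20301231.nc",
    "tas_day_ACCESS1-3_rcp45_r1i1p1_20310101-20551231.nc",
    "pr_day_ACCESS1-3_rcp45_r1i1p1_20060101-20301231.nc",  # invented example
]

# fnmatchcase compares case-sensitively on every platform
matched = [name for name in candidates if fnmatch.fnmatchcase(name, pattern)]
print(pattern)
print(matched)
```

Note that a trailing `*` directly after the ensemble member would also match longer ensemble names such as `r1i1p10`, so a stricter pattern may be needed when those exist.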
class ARCCSSive.model.cmip5.File(**kwargs)
    A CMIP5 output file’s attributes

    Relationships:
        dataset
            Dataset: The dataset this file is part of
        version
            Version: This file’s dataset version
        warnings
            [Warning]: Warnings associated with this file
        timeseries
            Timeseries holding all files in the dataset with the same variables

    Attributes:
        experiment_id
        frequency
        institute_id
        model_id
        modeling_realm
        product
        table_id
        tracking_id
        version_number
        realization
        initialization_method
        physics_version