Reading and Processing Climate Data
in Network Common Data Form (netCDF) Using R
This use case demonstrates the use of R language scripts to read and process long-range 'reanalysis' climate data (relative humidity) delivered in the industry-standard Network Common Data Form (netCDF).
This use case demonstrates how to:
- Browse netCDF file headers with HDF Explorer
- Read netCDF files with the R netCDF library routines
- Process long-range climate data read from netCDF files with R scripts
- Produce multiple-plot 'plot matrix' diagrams using R graphics commands
Many atmospheric and Earth science laboratories around the world produce and distribute gridded environmental data sets, in which the values represent measurements or estimates of conditions within the cells of a regularly spaced grid. Examples of gridded data include Earth satellite imagery and the output of global or regional climate models that produce simulated images or weather forecast time series.
The Network Common Data Form (netCDF) is a machine-independent binary data format widely used by scientists to store and distribute gridded climate data sets, including multidimensional data sets such as time series data. As the use of the netCDF format has grown, most standard data analysis software packages have added the capability to read and analyze data in this format.
Click here for a list of organizations distributing netCDF data.
Case Study: Relative Humidity
Time Series for Lake Baikal
In the current case, an NCEAS scientist required long-term Relative Humidity (RH) estimates for a 58-year period (1948 - 2006) for the Lake Baikal region of Siberia. After a brief search, the scientist identified the NCEP/NCAR Reanalysis 1 data distributed by the NOAA Earth System Research Laboratory / Physical Sciences Division (ESRL/PSD). These data sets are available online and are delivered in netCDF format. In the following paragraphs, we document the data search, acquisition, and exploration process, including the open-source software tools available to explore and analyze the data sets.
Step 1: Search and acquire data set
We will use the ESRL/PSD Gridded Climate Data search site to identify and download relative humidity time series data. A series of four screen images shows the search and retrieval process for this use case. Image 1 and Image 2 show the search parameters that we will use: pressure level (i.e., data reported at altitudes of constant barometric pressure) daily mean relative humidity estimates, based on measurements at some locations. Image 3 shows the geographic and date ranges specified for the search: 1948 - 2006, for a region of three grid cells (longitude) by four grid cells (latitude) covering Lake Baikal. Image 4 confirms that the search request has delivered the requested data set.
The output from the online search is a netCDF file containing the relative humidity time series, along with descriptive metadata.
Step 2: Review metadata with HDF Explorer
We use the HDF Explorer software, which reads netCDF files as well as the more complex HDF files, to review the time series data file's metadata. HDF Explorer's 'file tree viewer' interface, similar to Windows Explorer, enables users to drill down into the netCDF file's metadata. Click here to see the HDF Explorer view of the sample Relative Humidity file's metadata, which has three major levels: Dimensions, Variables, and Attributes. The Dimensions section depicts the nesting order for the four dimensions contained in the data set: longitude, latitude, pressure level (one only in this case), and time. Within each level, items with a corresponding yellow page icon can be displayed by clicking the icon. Click here to see the latitude (50 - 57.5 degrees), longitude (105 - 110 degrees), and several of the parameters displayed for the 'rhum' variable contained in the file.
From reviewing the metadata and the ESRL/PSD Gridded Data web page, we see that the relative humidity data is organized as a 3-dimensional 'data cube' in which four latitudes (50, 52.5, 55, 57.5) are rows, three longitudes (105, 107.5, 110) are columns, and more than 21,000 days (moving forward from January 1, 1948) form the depth or 'z' dimension. Extracting all of the 'z' values along a constant longitude/column and latitude/row produces a daily relative humidity time series.
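The 'data cube' idea can be illustrated with a small base-R sketch. It is not taken from the case study scripts: the array is filled with random stand-in values rather than real humidity data, only ten days are simulated, and the names cube, row, col, and ts_df are made up for this illustration.

```r
# Grid coordinates as described in the metadata.
lats <- c(50, 52.5, 55, 57.5)   # rows
lons <- c(105, 107.5, 110)      # columns
n_days <- 10                    # the real files hold more than 21,000 days

# Synthetic stand-in cube: rows = latitudes, columns = longitudes, depth = days.
# Values are random numbers, NOT real relative humidity estimates.
set.seed(42)
cube <- array(runif(length(lats) * length(lons) * n_days, min = 40, max = 90),
              dim = c(length(lats), length(lons), n_days))

# Fixing one row and one column returns every value along the 'z' (time) axis.
row <- match(52.5, lats)
col <- match(107.5, lons)
series <- cube[row, col, ]

# Attach calendar dates, counting forward from January 1, 1948.
dates <- seq(as.Date("1948-01-01"), by = "day", length.out = n_days)
ts_df <- data.frame(date = dates, rhum = series)
print(head(ts_df))
```

Leaving the third index empty is what selects the whole time axis; the same indexing pattern applies to any of the twelve grid cells.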
Step 3: Read and explore Relative
Humidity time series with R Statistical Environment
To keep file sizes manageable, data for the complete 1948 - 2006 time series was downloaded in three separate files: '.42.nc' (1948 - 1963), '48.nc' (1963 - 1986), and '.24.nc' (1986 - 2006). We developed two R programming language scripts to extract and explore the humidity time series from the netCDF files. The first script is ReadPlotRhum.r.
Here is an outline of this script:
- Reads the three netCDF files into R data objects,
- Extracts the time series for the user-specified latitude/longitude cell,
- Merges the three time series into one covering the entire time range,
- Generates a plot of the time series (click here to view the plot),
- Writes the time series as a two-column, comma-separated-values (.csv) file (click here to view the file)
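The outline above could be sketched roughly as follows. This is not the ReadPlotRhum.r script itself. It assumes the ncdf4 package for reading (the original script may have used a different netCDF library), assumes the coordinate variables are named "lat" and "lon", and uses hypothetical helper names (read_cell_series, merge_series, make_piece). So that the sketch runs without the data files, the merge and write steps are demonstrated on random stand-in values with made-up split dates.

```r
# Read the full time series for the grid cell nearest (lat, lon).
# Assumption: the 'ncdf4' package is installed and the file uses
# coordinate variables named "lat" and "lon".
read_cell_series <- function(path, lat, lon, varname = "rhum") {
  nc <- ncdf4::nc_open(path)
  on.exit(ncdf4::nc_close(nc))
  lats <- ncdf4::ncvar_get(nc, "lat")
  lons <- ncdf4::ncvar_get(nc, "lon")
  dim_names <- vapply(nc$var[[varname]]$dim, function(d) d$name, character(1))
  start <- rep(1, length(dim_names))
  count <- rep(-1, length(dim_names))   # -1 reads a whole dimension
  start[dim_names == "lat"] <- which.min(abs(lats - lat))
  start[dim_names == "lon"] <- which.min(abs(lons - lon))
  count[dim_names %in% c("lat", "lon")] <- 1
  as.vector(ncdf4::ncvar_get(nc, varname, start = start, count = count))
}

# Combine per-file series into one data frame, dropping repeated dates.
merge_series <- function(pieces) {
  merged <- do.call(rbind, pieces)
  merged <- merged[!duplicated(merged$date), ]
  merged[order(merged$date), ]
}

# Stand-ins for the three per-file series: random values, hypothetical
# split dates that overlap by one day at each boundary.
set.seed(1)
make_piece <- function(from, to) {
  dates <- seq(as.Date(from), as.Date(to), by = "day")
  data.frame(date = dates, rhum = runif(length(dates), min = 40, max = 90))
}
pieces <- list(
  make_piece("1948-01-01", "1963-12-31"),
  make_piece("1963-12-31", "1986-12-31"),
  make_piece("1986-12-31", "2006-12-31")
)
series <- merge_series(pieces)

# Write the merged series as a two-column .csv file.
csv_path <- file.path(tempdir(), "rhum_series_sketch.csv")
write.csv(series, csv_path, row.names = FALSE)
```

With the real files in hand, each stand-in piece would instead be built from a call such as read_cell_series(path, lat = 52.5, lon = 107.5), paired with the dates covered by that file.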
To validate that the time series has been constructed correctly, we decided to compare all of the possible time series that could be extracted from the 4-row, 3-column grid. The R script CreateTsPlotMatrix.r produces this time series matrix plot (click here to view the plot).
The time series matrix plot shows similar trends over time, and similar standard deviations for each time series. Note that the single time series plotted by ReadPlotRhum.r resides in row 2, column 2 of the plot matrix.
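A plot matrix of this kind can be built with base R graphics by setting par(mfrow = ...). The sketch below is not the CreateTsPlotMatrix.r script: it uses random stand-in values for one year of data, writes to a temporary PDF with a made-up file name, and collects each cell's standard deviation in a hypothetical matrix cell_sd so the series can also be compared numerically.

```r
lats <- c(50, 52.5, 55, 57.5)
lons <- c(105, 107.5, 110)
n_days <- 365

# Synthetic stand-in cube (rows = latitudes, columns = longitudes, depth = days).
set.seed(7)
cube <- array(runif(length(lats) * length(lons) * n_days, min = 40, max = 90),
              dim = c(length(lats), length(lons), n_days))
dates <- seq(as.Date("1948-01-01"), by = "day", length.out = n_days)

# Draw a 4 x 3 grid of panels, one per latitude/longitude cell.
out_file <- file.path(tempdir(), "RhPlotMatrixSketch.pdf")
pdf(out_file, width = 9, height = 10)
old_par <- par(mfrow = c(length(lats), length(lons)), mar = c(3, 3, 2, 1))
cell_sd <- matrix(NA_real_, nrow = length(lats), ncol = length(lons))
for (i in seq_along(lats)) {
  for (j in seq_along(lons)) {
    series <- cube[i, j, ]
    cell_sd[i, j] <- sd(series)
    plot(dates, series, type = "l", xlab = "", ylab = "RH (%)",
         main = sprintf("lat %.1f, lon %.1f", lats[i], lons[j]))
  }
}
par(old_par)
invisible(dev.off())

# Standard deviation of each cell's series, in the same layout as the plot.
print(round(cell_sd, 2))
```

Because mfrow fills panels row by row, the loop order (latitude outside, longitude inside) places each panel at the same row and column as its cell in the grid.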
Here are the files discussed in this case study:
ReadPlotRhum.r / CreateTSPlotMatrix.r / FirstPlot.jpg / RhPlotMatrix.jpg / YearlyMeanRH.csv
Click here to download a .zip archive containing all files (including netCDF files) discussed in this example.
Learning More:
netCDF File Format: netCDF (Network Common Data Form) file format home page
HDF Explorer: Space Research Software (distributor of HDF Explorer) home page
R Programming Environment: The R Project (R Programming Environment Info Center) home page
Point of Contact for this Use Case: Rick Reeves, NCEAS Scientific Programmer reeves@nceas.ucsb.edu
This Use Case compiled May, 2007