This use case demonstrates how an R language script can read and process long-term 'reanalysis' climate data (relative humidity) delivered in the widely used Network Common Data Form (netCDF) format.
This use case demonstrates:
- Browsing netCDF file headers with HDF Explorer
- Reading netCDF files with the R netCDF Library routines
- Processing long-range climate data read from netCDF files with R scripts
- Producing multiple-plot 'plot matrix' diagrams using R graphics commands
Many atmospheric and Earth science laboratories around the world produce and distribute gridded environmental data sets in which the values represent measurements or estimates of conditions within the cells of a regularly spaced grid. Examples of gridded data include Earth satellite imagery and the output of global or regional climate forecasting models that produce simulated imagery or weather forecasting time series.
The Network Common Data Form (netCDF) is a machine-independent binary data format widely used by scientists to store and distribute gridded climate data sets, including multidimensional data sets such as time series. As use of the netCDF format has grown, most standard data analysis software packages have added the capability to read and analyze data in this format. Click here for a list of organizations distributing netCDF data.
Case Study: Relative Humidity Time Series for Lake Baikal
In the current case, an NCEAS scientist required long-term Relative Humidity (RH) estimates for a 58-year period (1948 - 2006) for the Lake Baikal region of Siberia. After a brief search, the scientist identified the NCEP/NCAR Reanalysis 1 data distributed by the NOAA Earth System Research Laboratory / Physical Sciences Division (ESRL/PSD). These data sets are available online and are delivered in netCDF format. In the following paragraphs, we document the data search, acquisition, and exploration process, including the open-source software tools available to explore and analyze the data sets.
Step 1: Search and acquire data set
We will use the ESRL/PSD Gridded Climate Data search site to identify and download relative humidity time series data. A series of four screen images shows the search and retrieval process for this use case. Image 1 and Image 2 show the search parameters that we will use: pressure level (i.e., data reported at altitudes of constant barometric pressure) daily mean relative humidity estimates, based on measurements at some locations. Image 3 shows the geographic and date ranges specified for the search: 1948 - 2006, for a grid of 3 longitudes by 4 latitudes covering Lake Baikal. Image 4 confirms that the search request has delivered the requested data set.
The output from the online search is a netCDF file containing the relative humidity time series, along with descriptive metadata.
Step 2: Review metadata with HDF Explorer
We use the HDF Explorer software, which reads netCDF files as well as the more complex HDF files, to review the time series data file's metadata. HDF Explorer's 'file tree viewer' interface, similar to Microsoft Windows Explorer, enables users to drill down into the netCDF file's metadata. Click here to see the HDF Explorer view of the sample Relative Humidity file's metadata, which has three major levels: Dimensions, Variables, and Attributes. The Dimensions section depicts the nesting order of the four dimensions contained in the data set: longitude, latitude, pressure level (only one in this case), and time. Within each level, items with a corresponding yellow page icon can be displayed by clicking the icon. Click here to see the latitude (50 - 57.5 degrees) and longitude (105 - 110 degrees) values, and several of the attributes displayed for the 'rhum' variable contained in the file.
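For readers who prefer to stay inside R, the same header information can be listed programmatically. The following is a minimal sketch, assuming the ncdf4 package is installed (the source does not name the R netCDF library that was used). The helper name `summarize_nc_header` and the file name `rhum_sample.nc` are hypothetical; substitute the path of a downloaded file.

```r
# Hypothetical helper: list the Dimensions, Variables, and global Attributes
# of a netCDF file. Assumes the ncdf4 package (not named in the source).
summarize_nc_header <- function(path) {
  nc <- ncdf4::nc_open(path)
  on.exit(ncdf4::nc_close(nc))          # always release the file handle
  list(
    dimensions        = sapply(nc$dim, function(d) d$len),  # name -> length
    variables         = names(nc$var),                      # e.g. "rhum"
    global_attributes = names(ncdf4::ncatt_get(nc, 0))      # varid 0 = global
  )
}

nc_path <- "rhum_sample.nc"  # hypothetical file name
if (requireNamespace("ncdf4", quietly = TRUE) && file.exists(nc_path)) {
  str(summarize_nc_header(nc_path))
} else {
  message("Skipping: ncdf4 or ", nc_path, " is not available.")
}
```

Calling `print()` on the object returned by `ncdf4::nc_open()` gives a fuller text dump of the header, comparable to what the tree viewer displays.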
From reviewing the metadata and the ESRL/PSD Gridded Data web page, we see that the relative humidity data is organized as a 3-dimensional 'data cube' in which four latitudes (50, 52.5, 55, 57.5) are rows, three longitudes (105, 107.5, 110) are columns, and more than 21,000 days (moving forward from January 1, 1948) form the depth or 'z' dimension. Extracting all of the 'z' values along a constant longitude/column and latitude/row produces a daily relative humidity time series.
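The 'data cube' idea can be illustrated with a small base R array. This is a sketch on synthetic random values, not the reanalysis data; the 10-day depth and the variable names are assumptions chosen for illustration. Note that R netCDF readers may return the dimensions in a different order (for example longitude first), so the index positions should be checked against the file header.

```r
# Synthetic stand-in for the relative humidity cube: 4 latitudes (rows),
# 3 longitudes (columns), 10 days (depth). Values are random, not real data.
lats   <- c(50, 52.5, 55, 57.5)
lons   <- c(105, 107.5, 110)
n_days <- 10
set.seed(1)
rhum <- array(runif(length(lats) * length(lons) * n_days, min = 20, max = 100),
              dim = c(length(lats), length(lons), n_days))

# Daily dates counted forward from January 1, 1948.
dates <- seq(as.Date("1948-01-01"), by = "day", length.out = n_days)

# Fix one row (latitude) and one column (longitude); keep every 'z' value.
row <- which(lats == 52.5)
col <- which(lons == 107.5)
series <- data.frame(date = dates, rhum = rhum[row, col, ])
print(head(series, 3))
```

Leaving the third index empty in `rhum[row, col, ]` is what selects the full depth of the cube for that cell.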
Step 3: Read and explore Relative Humidity time series with R Statistical Environment
To keep file sizes manageable, data for the complete 1948 - 2006 time series was downloaded in three separate files: '.42.nc' (1948 - 1963), '48.nc' (1963 - 1986), and '.24.nc' (1986 - 2006). We developed two R programming language scripts to extract and explore the humidity time series from the netCDF files. Here is an outline of the first script, ReadPlotRhum.r:
- Reads the three netCDF files into R data objects,
- Extracts the time series for the user-specified latitude/longitude cell,
- Merges the three time series into one covering the entire time range,
- Generates a plot of the time series (click here to view the plot),
- Writes the time series as a two-column, comma-separated-values (.csv) file (click here to view the file)
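The outline above can be sketched in R as follows. This is not the original ReadPlotRhum.r; it is a minimal illustration under stated assumptions: the ncdf4 package is used for reading, the file's variables are named 'lon', 'lat', 'time', and 'rhum' (only 'rhum' is confirmed by the text), and the helper names `read_cell_series` and `merge_series` are hypothetical. The demonstration at the bottom runs on synthetic pieces so that it does not depend on the downloaded files.

```r
# Hypothetical reader: extract one grid cell's series from one netCDF file.
# Assumes ncdf4 and variables named 'lon', 'lat', 'time', 'rhum'.
read_cell_series <- function(path, lon, lat) {
  nc <- ncdf4::nc_open(path)
  on.exit(ncdf4::nc_close(nc))
  lons <- ncdf4::ncvar_get(nc, "lon")
  lats <- ncdf4::ncvar_get(nc, "lat")
  i <- which(lons == lon)
  j <- which(lats == lat)
  stopifnot(length(i) == 1, length(j) == 1)   # cell must exist in the grid
  # By default ncvar_get drops the single pressure level: lon x lat x time.
  rhum <- ncdf4::ncvar_get(nc, "rhum")
  data.frame(time = as.vector(ncdf4::ncvar_get(nc, "time")),
             rhum = rhum[i, j, ])
}

# Merge per-file pieces into one series: drop overlaps, sort by time.
merge_series <- function(pieces) {
  merged <- do.call(rbind, pieces)
  merged <- merged[!duplicated(merged$time), ]
  merged[order(merged$time), ]
}

# Demonstration with synthetic pieces standing in for the three files.
set.seed(42)
make_piece <- function(from, to) {
  days <- seq(as.Date(from), as.Date(to), by = "day")
  data.frame(time = days, rhum = runif(length(days), 20, 100))
}
pieces <- list(make_piece("1948-01-01", "1948-01-10"),
               make_piece("1948-01-10", "1948-01-20"),  # overlaps by one day
               make_piece("1948-01-21", "1948-01-31"))
merged <- merge_series(pieces)

# Plot the merged series to a PDF in the session's temporary directory.
pdf(file.path(tempdir(), "rhum_series.pdf"))
plot(merged$time, merged$rhum, type = "l",
     xlab = "Date", ylab = "Relative humidity (%)")
invisible(dev.off())

# Write the two-column CSV file.
csv_path <- file.path(tempdir(), "rhum_series.csv")
write.csv(merged, csv_path, row.names = FALSE)
```

With real files, each element of `pieces` would come from a call such as `read_cell_series(path, lon = 107.5, lat = 52.5)`, and the raw 'time' values would still need converting to dates according to the units attribute recorded in the file header.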
To validate that the time series has been constructed correctly, we decided to compare all of the possible time series that could be extracted from the 4-row, 3-column grid. The R script CreateTsPlotMatrix.r produces this time series matrix plot (click here to view the plot). The time series matrix plot shows similar trends over time, and similar standard deviations, for each time series. Note that the single time series plotted by ReadPlotRhum.r resides in row 2, column 2 of the plot matrix.
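A 'plot matrix' of this kind can be built with base R graphics by setting `par(mfrow = ...)`. The sketch below is not the original CreateTsPlotMatrix.r; it uses a synthetic cube of random values, and the one-year length, panel titles, and output file name are assumptions made for illustration.

```r
# Synthetic cube: 4 latitudes (rows) x 3 longitudes (columns) x 365 days.
lats   <- c(50, 52.5, 55, 57.5)
lons   <- c(105, 107.5, 110)
n_days <- 365
set.seed(7)
cube  <- array(runif(length(lats) * length(lons) * n_days, 20, 100),
               dim = c(length(lats), length(lons), n_days))
dates <- seq(as.Date("1948-01-01"), by = "day", length.out = n_days)

# Draw one panel per grid cell; mfrow fills the 4 x 3 layout row by row.
plot_path <- file.path(tempdir(), "rhum_plot_matrix.pdf")
pdf(plot_path, width = 9, height = 10)
old_par <- par(mfrow = c(4, 3), mar = c(3, 3, 2, 1))
for (i in seq_along(lats)) {
  for (j in seq_along(lons)) {
    plot(dates, cube[i, j, ], type = "l", xlab = "", ylab = "",
         ylim = c(0, 100),   # shared axis makes panels directly comparable
         main = sprintf("lat %.1f, lon %.1f", lats[i], lons[j]))
  }
}
par(old_par)
invisible(dev.off())

# One standard deviation per cell, for a numeric check alongside the plots.
cell_sd <- apply(cube, c(1, 2), sd)
print(round(cell_sd, 1))
```

Using the same `ylim` in every panel is a design choice that makes differences in level and spread between neighbouring cells visible at a glance.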
Here are the files discussed in this case study:
Click here to download a .zip archive containing all files (including netCDF files) discussed in this example.
netCDF File Format: Network Common Data Form file format home page
HDF Explorer Data Viewer: Space Research Software (distributor of HDF Explorer) home page
R Programming Environment: The R Project (R Programming Environment Info Center) home page
Point of Contact for this Use Case: reeves [at] nceas.ucsb.edu (Rick Reeves), NCEAS Scientific Programmer
This Use Case compiled May, 2007