Compile Multi-Layer Geospatial Data Base for Speciation Rate Study

Download data and R Code for this example

Project Overview:

An NCEAS scientist has obtained sighting point locations (including latitude/longitude coordinates) for several North American species from the Global Biodiversity Information Facility (GBIF) data portal. She plans to perform Exploratory Data Analysis to evaluate relationships between propagation of these species and six physical environmental parameters:

  • Terrain Elevation
  • Terrain Slope and Aspect Angle
  • Climate Parameters (e.g., Temperature, Precipitation)
  • Soil Health: Percent Organic Content and Soil pH
  • Surface Cover
  • Vegetation Density

To perform the analysis, the scientist must assign an estimate for each of the six parameters to each species sighting point location. This assignment is performed by 1) acquiring spatially registered raster image layers for each parameter, 2) assembling the individual layers into a spatial data base in which they are precisely aligned and cover exactly the same area, and 3) overlaying the sighting points on the data base and assigning the coincident raster values from each layer to each point.
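This workflow can be sketched compactly in R with the raster and rgdal packages. The file and layer names below are placeholders, not the actual project files; the full production versions appear in the scripts later in this case study.

# Minimal sketch of the acquire / align / overlay workflow (hypothetical file names).
library(raster)   # raster data classes, stack(), extract()
library(rgdal)    # readOGR() for the point Shape File

elev   <- raster("elevation_1km.img")       # one aligned raster layer per parameter
precip <- raster("precipitation_1km.img")
layers <- stack(elev, precip)               # layers must share extent and resolution

sightings <- readOGR(".", "sightings")          # GBIF sighting point locations
attrs     <- extract(layers, sightings)         # raster values coincident with each point
sightings@data <- cbind(sightings@data, attrs)  # attach as new point attributes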

Once the spatial data base has been created, it becomes the basis for an open-ended series of EDA 'steps', in which each analytical 'step' potentially generates a new data base 'layer'.

Exploratory spatial data analysis is a compute-intensive process. Fortunately, several open-source Spatial Analysis tools for managing and processing geospatial data are available. The R statistical computing environment is well suited to this project, as it combines a rich set of statistical methods with well-developed spatial data classes. Spatial data processing utilities, such as the GDAL Utilities, can be used within programming scripts to efficiently transform groups of (large) spatial data files. And GIS 'viewer' software such as Quantum GIS can display multiple spatial data layers simultaneously, enabling interactive exploration of relationships between layers. The scientist will use all of these tools during this project.

This case study demonstrates each step in this scientist's spatial EDA project.

Step 1: Acquire Data Layers

GTOPO30 Global Digital Elevation Mosaic

Study Area Polygons (North America Shape File)

AVHRR-Derived Land Cover (North America)

Advanced remote sensing satellites, coupled with advances in scientific computing and Web-enabled data repositories, make an abundance of digital Earth Systems data available to scientists. The challenge is selecting the most appropriate data for a project from the vast population of candidate data sets. Briefly, the scientist must narrow the alternatives to data sets that 1) measure the desired phenomena (or a valid surrogate); 2) consistently cover the study area(s); 3) at (or across all of) the required date(s) and time(s); and 4) at an appropriate spatial resolution that is consistent throughout all study areas.

The scientist considered the study area size and the distribution of species points, and after consulting with NCEAS scientific programmers, selected a spatial resolution of 1 square kilometer for the raster data sets. The species sighting location coordinates are in unprojected geographic (degree) coordinates; to minimize data preprocessing, the data grids are also kept in unprojected geographic space.

The NCEAS Scientific Computing Web Site Ecological and Spatial Data Sources Portal contains links to high-quality data portals for 12 data categories of interest to Ecologists. After reviewing the data sources on the NCEAS site (and elsewhere), the scientist selected these sources for the environmental parameters:

With the exception of the MODIS NDVI data sets, all selected data layers use unprojected geographic coordinate space at 1 kilometer spatial resolution. The MODIS data uses the Albers Equal Area projection and a spatial resolution of 250 meters. Therefore, this layer must be resampled to 1 km resolution and transformed to unprojected geographic space.

Cell values in five of the six layers contain actual measurements (e.g., temperature); however, the Harmonized World Soil Database (HWSD) is more complicated. It consists of a raster image file and a relational database. The image cell values are relational indexes into the database; each cell is thus linked to quantitative (e.g., soil pH) and categorical (e.g., soil texture class) soil attributes (click here for comprehensive documentation). For these data, we use an R script to generate Soil Organic Content and Soil pH data layer 'maps' from the HWSD grid and relational database.
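The core of that lookup, using each grid cell value as a key into the attribute table, can be sketched briefly in R. The file and output names below are illustrative only, and the attribute table is assumed to have been exported from the MS Access database to a CSV file; Script 4 below gives the full production version, which computes a SHARE-weighted mean over all matching records.

# Sketch: map HWSD grid cell keys (MU_GLOBAL) to a topsoil pH attribute (illustrative names).
library(raster)

hwsdGrid  <- raster("hwsd_clip.img")        # cell values are MU_GLOBAL keys
soilTable <- read.csv("HWSD_DATA.csv")      # attribute table exported from the Access database

# Simplification: keep the first record per key (the production script blends all records).
lut <- soilTable[!duplicated(soilTable$MU_GLOBAL), c("MU_GLOBAL", "T_PH_H2O")]

soilPh <- subs(hwsdGrid, lut, by = "MU_GLOBAL", which = "T_PH_H2O")   # substitute key -> pH
writeRaster(soilPh, "SoilPh_sketch.img", format = "HFA", overwrite = TRUE)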

Downloading data sets: In the case of single-file 'global' coverage data sets (e.g., the SRTM Global Mosaic), the scientist downloads the entire file and then extracts the subset corresponding to her study region. Many geospatial data servers also provide a map- or grid-based search interface with which the user specifies the coordinate boundaries (and sometimes the observation times) for the desired data.

Data servers typically 'deliver' requested data products to the local computer in a compressed, archived format (e.g., a .zip or .rar file). The user 'unpacks' these archives and then reviews the data using a GIS 'viewer' (Quantum GIS). Alternatively, the scientist can review the accompanying spatial metadata using command-line or script-based tools (GDAL Utilities), as sketched below.
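For example, the spatial metadata of a downloaded raster or Shape File can be reviewed from within R via the rgdal package, a quick alternative to the gdalinfo and ogrinfo command-line utilities. The file and layer names here are placeholders:

# Review spatial metadata of downloaded files from within R (placeholder names).
library(rgdal)

GDALinfo("gtopo30_mosaic.tif")           # raster: dimensions, resolution, projection, NoData value
ogrInfo(".", "NorthAmericaWGS84")        # vector: feature count, extent, coordinate system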

Step 2: Compile the Spatial Data Base

Typically, collections of spatial layers from multiple sources require alignment so that all layers represent the identical geographic region at exactly the same spatial resolution.

Three raster processing operations are used to align data layers:

Clipping: Extracts an image subset using a vector polygon 'template'.

Resampling: Redistributes image cells into a new spatial grid with different-sized cells.

Projection: Transforms locations on the curved Earth surface onto a flat surface, using standard map projection formulas, for mapping and analysis purposes. In this example, we will transform one map layer into the same coordinate space used by the other data layers.

Often, two or more of these operations (e.g., projection and resampling) are combined in a single geoprocessing step. Once all layers have been aligned, they are usually combined into a spatial data structure that facilitates other spatial operations.
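In R, the raster package provides a function for each of these operations. The following is a minimal sketch under assumed layer and file names (they are not the project files); the production scripts below perform the same operations on the actual data:

# Sketch of the three alignment operations with the raster package (placeholder names).
library(raster)
library(rgdal)

boundary <- readOGR(".", "StudyAreaBoundary")        # clipping polygon (geographic / WGS84)
coarse   <- raster("elevation_1km.tif")              # 1 km reference layer, geographic coordinates
fine     <- raster("ndvi_250m_albers.tif")           # 250 m layer in the Albers projection

clipped     <- crop(coarse, extent(boundary))                        # Clipping
reprojected <- projectRaster(fine, crs = projection(coarse))         # Projection
aligned     <- resample(reprojected, clipped, method = "bilinear")   # Resampling

layers <- stack(clipped, aligned)   # combine the aligned layers into one spatial data structure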

The R raster package

The six selected data layers have a high degree of initial alignment. All the layers are 'global' in scope; they cover the entire Earth surface. Five of the layers employ the same (unprojected Geographic) coordinate space; four of the layers have the same spatial resolution (1 Km).

Here is a summary of the required alignment processing:

All layers: Clip to the North America region

WORLDCLIM Parameters: Generate a separate Temperature or Precipitation 'map' (image) for each month in the time series (and manage the potentially large volume of image files).

Slope/Aspect and MODIS NDVI: Resample to 1 Kilometer spatial resolution

MODIS NDVI: Project from Albers Equal Area to unprojected Geographic coordinate space.

Compiling this data base requires the development of programming scripts for each 'production' step; separate scripts perform the Exploratory Data Analysis. The most important data base development scripts follow. These and other scripts are provided in the download for this example.

Script 1: Extract a subset from a raster layer matching a vector polygon Shape File; assign the specified map projection to the output


#!/bin/bash
###################################################################################
# ClipRasterWShpGdalProj.sh: Bourne Shell script uses GDAL Library
#  to clip large raster file to the outline of an ESRI Polygon Shape File
# 
# Derived from: http://linfiniti.com/2009/09/clipping-rasters-with-gdal-using-polygons/
#
# Inputs: 1) Large raster file to be 'clipped':  .tif, .img., or any other GDAL-compatible raster file
#         2) Polygon shape file group (with one file having the .shp suffix) to use as clipping polygon.
#         3) Map projection string (PROJ.4 format), assigned to 'gdal_translate' command output.
#            ***IMPORTANT: this string is NOT VALIDATED by script!
# Output: 1) Clipped raster subimage, currently in .img format. 
#                 The format can be changed by changing the string '.img' to the suffix
#                 of another GDAL-compatible file format.
#            *** The output raster file name is derived from the input raster file name by
#                inserting the string: '_shpclip' before the file suffix.
#
# To make this script executable run the command:  chmod +x ~/bin/ClipRasterWShpGdalProj.sh
# To run the script from command line:
# $~/bin/ClipRasterWShpGdalProj.sh myshapefile.shp mybigraster.img (or .tif, etc) "PROJ.4 projection string"
# Author: Rick Reeves, National Center for Ecological Analysis and Synthesis
# January 25, 2011. 
# Notes: This script created and tested under the Bourne shell on the NCEAS 'Eos' server.
#        No input validity checking yet. This script will NOT OVERWRITE an existing output file
#        with the same name, but will print a GDAL error message and stop.
###################################################################################
SHPFILE=$1
RASTERFILE=$2
PROJSTRING=$3
BASE=`basename $SHPFILE .shp`
echo "shapefile is $SHPFILE"
echo "basename is $BASE"
echo "projection string is $PROJSTRING"
# +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs
# This script produces a valid .img raster file in which all pixels 
# outside of the clipping shapefile are set to ZERO.
EXTENT=`ogrinfo -so $SHPFILE $BASE | grep Extent |
                sed 's/Extent: //g' | sed 's/(//g' | sed 's/)//g' | sed 's/ - /, /g'`
echo " EXTENT is $EXTENT"
EXTENT=`echo $EXTENT | awk -F ',' '{print $1 " " $4 " " $3 " " $2}'`
echo "gdal_translate Clipping to coordinate rectangle: $EXTENT"
gdal_translate -projwin $EXTENT -of GTiff -a_srs "$PROJSTRING" \
               $RASTERFILE ./`basename $RASTERFILE .tif`_bbclip.tif
echo "gdalwarp called..."
# note: use 'near' resampling for land cover maps, 'lanczos' for continuous data such as elevation.
gdalwarp -co COMPRESSED=YES -of HFA -r near -cutline $SHPFILE `basename $RASTERFILE .tif`_bbclip.tif ./`basename $RASTERFILE .tif`_shpclipProj.img

Script 2: Transform MODIS NDVI (United States only) maps to Study Area reference frame

The scientist decided to evaluate NDVI as a measure of vegetation density using the best available existing data set: MODIS-derived NDVI data maps for the continental United States obtained from the University of Maryland Global Land Cover Facility (GLCF). This data set provides a six-year time series (2001 - 2006) NDVI record, at 250 meter spatial resolution, in the Albers Equal Area map projection.

The scientist used the GDAL Utilities to transform a subset of the maps (Spring - Fall, 2006) into the same spatial 'reference frame' (1 km spatial resolution / Geographic coordinate system) as the other layers. Following is a sample of the GDAL command script that performs the transformation:


# GDAL command line utility command used to project (to Geographic coordinates) 
# and resample (to 1 Kilometer  or .008333 degrees spatial resolution) the MODIS NDVI time series.
gdalwarp -s_srs "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=0 +lon_0=-96 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs" \
         -t_srs "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0" \
         -tr .008333 .008333 US.Albers.2006321.ndvi.tif USA_ndvi_Jd2006321_GP.tif

Click here to view notes describing selection and transformation of the NDVI files.

Script 3: Create Monthly Climate Parameter Maps, WORLDCLIM data

The following script reads the as-downloaded WORLDCLIM climate grids, and generates a set of monthly estimate raster maps in the desired spatial reference frame:


###################################################################################
#
# CreateWorldClimDataLayers.r
#
# R script reads raw (.bil format) WorldClim climate grids, then
# clips each grid to an ESRI Shape File boundary polygon and writes the
# result as a 16-bit integer IMAGINE (.img) file in unprojected geographic
# coordinates.
#
# Information about WorldClim climate data layers: www.worldclim.org
# 
# Inputs: 
# 
#   1) A collection of WorldClim .bil format climate grid files covering 
#      a specific part of the Earth. 
#      Note: The grid cells contained in the files used to date are
#      in unprojected Geographic lat/long coordinates.
#   2) ESRI Shape file containing the study area boundary polygon in
#      unprojected geographic coordinate space.
#
# Outputs: 
# 
#   1) A collection of IMAGINE .img image files, one per input .bil file,
#      each containing the climate parameter grid data 'clipped' to the boundary 
#      shape file. 
#      Grid cell values are 16-bit integer format
#
# Note: This script written to produce the climate parameter data layer
# for Leslie Lancaster.
#
# Author: Rick Reeves / NCEAS / January 20, 2011
###################################################################################
#
CreateWorldClimNorthAmLayers <- function()
{
   require(raster) # also loads sp package
   require(rgdal)

# WorldClim data are monthly averages, one physical file per month.
# This program generates one .img file per month, clipped to the
# North America study area in unprojected geographic coordinates.

   curDir <- getwd()   # remember the original working directory, restored at the end
   setwd("~/LancasterGBIF/SpeciesDataLayers")

# Build the list of input WorldClim .bil grid files, read the North America
# boundary polygon, and set the geographic (WGS84) projection string.
# (The file pattern and boundary layer name below are assumed; adjust to local copies.)

   inFiles <- list.files(".", pattern="\\.bil$")
   nFiles  <- length(inFiles)
   NaGeogProj <- readOGR(".", "NorthAmericaWGS84")   # clipping boundary (geographic / WGS84)
   NaGeogProjString <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"

   for (iCtr in 1 : nFiles)
   {
       climateRasterIn <- raster(inFiles[[iCtr]])
       message(sprintf("processing WorldClim file: %s....",inFiles[[iCtr]]))

# Open a connection to the same file, read it into a binary integer vector
   
      rawBinFp <- file(inFiles[[iCtr]],"rb") 
      nGridCells <- ncell(climateRasterIn)   
      rawBinVec <- readBin(rawBinFp, integer(),n=nGridCells,size=2)
      close(rawBinFp)

# WorldClim data uses -9999 as a missing value. We reset this to the R-friendly 'NA'
   
      flags = (rawBinVec == -9999)   
      subset = rawBinVec[!flags]
      rawBinVec[flags] = NA      
   
      climateRasterNew <- setValues(climateRasterIn,rawBinVec,1)
      climateRasterNew <- setMinMax(climateRasterNew) 

# Update other key parameters with raster package interface routines      
        
      climateRasterNew@file@datanotation <- "INT2S"         
      projection(climateRasterNew) <- NaGeogProjString
 
# Clip the preprocessed raster to the North America boundary and write an .img file.

      message("Cropping raster to North America boundary...")
   
      cropImageEx <- extent(as.matrix(NaGeogProj@bbox))   
      climateStackNaCrop <- crop(climateRasterNew,cropImageEx,
                                                      filename="./NaCropTemp.img",
                                                      overwrite=TRUE)
   
      sVec <- strsplit(inFiles[[iCtr]],"\\.")
      sOutFile <- sprintf("%s_NaGeog.img",sVec[[1]][1])
      message("...writing output file: ",sOutFile) 
      writeRaster(climateStackNaCrop,filename=sOutFile,
                          format="HFA",datatype="INT2S",
                          overwrite=TRUE)
   }
message(sprintf("WorldClim files processed: %d",iCtr))
   setwd(curDir)
}
# R script ends

Script 4: Create Soil Parameter Maps from Harmonized World Soil Database (HWSD).

This script creates soil parameter raster maps, in the correct spatial reference frame, using the HWSD relational data base and image raster:


###################################################################################
#
# createSoilsRasterMap.r 
# 
# R script generates NEW raster soil parameter image maps using the raster and 
# relational database components of the Harmonized World Soil Database (HWSD). 
#
# More information and download of the HWSD is available from:
#
#   http://www.iiasa.ac.at/Research/LUC/External-World-soil-database/HTML/index.html
# 
# Overview: The HWSD consists of a raster grid which covers the Earth surface
# at a spatial resolution of 30 arc seconds (nominal 1 Km), and a set of relational
# data base (Microsoft Access) files containing a collection of terrain and soil
# attributes. Each HWSD raster grid cell value is an index to the relational 
# database record(s) (rows) whose attributes (columns) describe physical conditions
# within that cell. 
#
# HWSD users create 'attribute maps' for grid cell collections by creating 
# custom relational queries against the MS Access database files. Query results 
# can be collected in a (new) raster image object and written to a disk file.
#
# This R script uses the raster and RODBC packages to process incoming and 
# outgoing raster files and to create / process queries against the HWSD 
# data base files.
# 
# Inputs: 
# 
#   1) The HWSD raster grid map, possibly 'clipped' to a sub-global ROI 
#      Note: The grid cells contained in the files used to date are
#      in unprojected Geographic lat/long coordinates.
#   2) One (currently) or more (TBA, future version) HWSD relational data base
#      files (MS Access format) containing physical attributes keyed to the 
#      HWSD raster grid.
#   3) One (for first version) user-specified attribute name from the HWSD data
#      base file. 
#
# Outputs: 
# 
#   1) A new raster image file, in user specified format (e.g., GeoTIFF or HFA),
#      containing the computed soil quality attribute for each input raster cell.
#
# Note: This script was written to produce the soil parameter data layers
#       for Leslie Lancaster.
#       First version, currently works ONLY for numeric HWSD soil attributes for
#       whom computing a weighted arithmetic mean makes sense. Later versions will
#       (also) produce valid 'combined' values for qualitative factors such as soil
#       type names or density category measurements.
#       NOTE: the results of this script have not been validated and are subject
#       to change.
# 
# Author: Rick Reeves / NCEAS / February 13, 2010
###################################################################################
#
createSoilsRasterMap <- function()
{
   require(RODBC)  # NOTE: Under Windows 7, RODBC connections works only under 32 bit R 2.12. 
                   # Trying odbcConnect under Win 7 / 64 bit gives error: 
                   # "DSN contains an architecture mismatch between the Driver and Application"
   require(raster)
   setwd("f:/Projects/LancasterGBIF/HWSD")   

# read the Access database
   
   inRasterName <- "hwsdGPClipNorthAmerica.img"
   outFileName <- "SoilMapT_PH_H20.img"
   
   soilMapRaster <- raster(inRasterName)
   
# Refer to the HWSD database documentation to select a numeric parameter to extract.
   
   sSoilAttrToGet <-"T_PH_H2O"
   sRowWeightAttr <-"SHARE"
   
# Set up an RODBC connection, and read the main
# data base table, get a list of tables 

# This DB connection created under Windows 7 using ControlPanel/ODBC Administrator
# Need another technique for other platforms - 

   myCon <- odbcConnect(dsn="HWSD_MDB") 
   sTableList <- sqlTables(myCon)  
   
# Use sqlFetch to read the entire table into a data frame....
# Fastest technique: fetch the entire table at once, then use 
# R subset commands to extract row and column subsets.....

   myTableFetch <- sqlFetch(myCon,"HWSD_DATA")   
   
# OR Use a SQL Query to read (a portion of) table into a data frame

#   myTableQuery <- sqlQuery(myCon,"SELECT * from HWSD_DATA")   
       
# generate a NEW raster map by using input map cell values as indexes
# into the table and extracting the corresponding selected 'hard parameter'
# value
# Note: the raster package documentation for writeValues supplied most of the method
#       used here.
   
   procBlocks <- blockSize(soilMapRaster)
   outRaster <- raster(soilMapRaster)                                                                              
   outRaster@file@datanotation <- "FLT4S"
   outRaster <- writeStart(outRaster,filename=outFileName,overwrite=TRUE)
   for (iCtr in 1 : procBlocks$n)
   {
      message(sprintf("processing: block %d of %d...",iCtr,procBlocks$n))
      vBlock <- getValuesBlock(soilMapRaster, row=procBlocks$row[iCtr], nrows=procBlocks$nrows[iCtr])
      outVec <- vBlock   # seed the output block with the MU_GLOBAL keys; zero cells stay zero

# Loop over cell values in Vblock to compute a soil map value from one selected
# attribute in HWSD_DATA table. Note: Each raster grid cell is comprised of => 1 Soil Unit type.
# Thus we may need to read > 1 HWSD_DATA records (with same keyed MU_GLOBAL attributes,
# and generate a 'blended' (interpolated) value from the group of records.
# Note: when the user desires one of the "coded" soil values, this code section
# will change.
# At present, the soil map pixel values are the weighted mean of the matching row(s)
# attribute values. 

      nonZeros <- which(outVec != 0,arr.ind=TRUE)       
      for (jCtr in nonZeros) # modify only the non-zero raster cells
      {
         matchRows <- which(myTableFetch[,"MU_GLOBAL"] == outVec[jCtr],arr.ind=TRUE)  
         outVec[jCtr] = weighted.mean(myTableFetch[matchRows,sSoilAttrToGet],
                                                         myTableFetch[matchRows,sRowWeightAttr])
      }
      
# use contents of 'vBlock' to extract the desired physical soil type parameter for the cell
# then write the entire modified block to the new raster.

      outRaster <- writeValues(outRaster, outVec, procBlocks$row[iCtr]) 
      message(sprintf("...processed: block %d of %d",iCtr,procBlocks$n))      
   }

# Write the new soilValue raster.

   message("Writing output file....")
   outRaster <- writeStop(outRaster)
   message("...Output file is written!")
}

Step 3: Overlay Sighting Points on Raster Layers / Assign Values

The next step: Combine the aligned data set layers into an R raster spatial data structure. Then, overlay the sighting points on the layers and assign the spatially corresponding values to each point. This creates the attribute list used for Exploratory Data Analysis.

Script 5: Create spatial data structure; overlay sighting point locations on raster data base; assign layer values to points.


###################################################################################
#
# AssignLayerAttrsToShapePoints: R function assigns values from raster 
# data layers to a set of point locations within the raster(s) boundaries. 
#
# Inputs:
# 1) A set of raster data layers, all in the same (projected) coordinate system,
#    the same spatial resolution, and the same spatial extent. One way to assign
#    all layers the same extent is to have 'clipped' each layer using the same
#    polygon shape file. 
#    The layers are read from a set of raster image files, in a GDAL-compatible
#    format (e.g., TIF or IMG), stored on disk.
#
# 2) A set of point locations within the boundaries of all the rasters and in
#    the same (projected) coordinate space, stored in an ESRI Point Shape File.
#   
# Output:
# 1) An ESRI Point Shape File containing the input point locations, but with new
#    columns, containing the raster layer values extracted from the raster layers,
#    added to the attribute (dbf) file. 
#
# Notes: The user MUST validate that all input spatial files have the same (projected)
#        coordinate system and that all raster layers have the same spatial resolution;
#        this script does not perform input validation.
#
#        This script also demonstrates the use of the rasterStack object to assign 
#        a group of data layers with IDENTICAL spatial extent to a set of points.
#
# Written by Rick Reeves, NCEAS Scientific Programmer, for postdoc Lesley Lancaster.
# February, 2011.
# Copyright 2011 National Center for Ecological Analysis and Synthesis
#
###################################################################################
#
assignLayersToPoints <- function()
{
   require(raster)  # loads sp, gstat, several others
   require(rgdal)   # Shape File I/O

   curDir <- getwd()
   setwd("/data/computer/reeves/LancasterGBIF/SpeciesDataLayers")
   
   outPointFileName <- "SpecPointsWithAttrs"

# Read in the point location and the 'clipping polygon' shape files. We will use both.

   studyAreaBoundary <- readOGR("./NorthAmericaWGS84.shp","NorthAmericaWGS84")
   speciesPointLocs  <- readOGR("./GBIFSpeciesPoints/SampleGBIFPoints.shp","SampleGBIFPoints")
                     
# Here is the list of raster data layer file names (seven layers). 
# All layers are in the Geographic / WGS84 coordinate system; the first three have 1 km spatial resolution.

   layerFileNames = c("./SRTM_GTOPO_u30_mosaic_bbclip.tif", # Digital Elevation Model
                      "./glcfAVHRR1KmLandGlobal.tif_shpclip.img",       # Land Cover
                      "./SoilMapT_PH_H20.img",         # HWSD-derived Soil PH 
                      "./NDVIUsa1KmExpanded.img", # MODIS-derived NDVI (2006)
                      "./prec_6_NaGeog.img",            # WORLDCLIM June precip
                      "./tmax_6_NaGeog.img",           # WORLDCLIM June max T
                      "./tmean_6_NaGeog.img")        # WORLDCLIM June mean T 
                                    
# Prepare the reference layer: create an empty raster with correct extent/coordinate space 
# and spatial resolution, and 'implant' the input data into the 'blank' layer: 
#
# 1) Read the data into a layer,
# 2) Create a new raster layer with the correct extent, projection, and spatial resolution,
# 3) Resample the 'different' data into the blank layer, creating a correct layer with some missing data,
# 4) Add this new layer onto the raster stack.
    
   elevLayer <- raster(layerFileNames[1])
   lcoverLayer <- raster(layerFileNames[2])
   soilPhLayer <- raster(layerFileNames[3])
   ndviLayer <- raster(layerFileNames[4])   
   precipLayer <- raster(layerFileNames[5])
   tMaxLayer <- raster(layerFileNames[6])
   tMeanLayer <- raster(layerFileNames[7])      
   
# The last layer (data for continental USA only) needs to be implanted
# into a larger extent with 1 km resolution.
    
   dataLayerExtent <- extent(as.matrix(studyAreaBoundary@bbox))

# Make the .dbf component of the output point Shape File more useful
# for future analysis: add point Latitude/Longitude as attributes.

   speciesPointLocs@data <- cbind(speciesPointLocs@data,coordinates(speciesPointLocs))
   colnames(speciesPointLocs@data) <- c("PointID","Longitude","Latitude")   
        
   newCol <- extract(elevLayer,as(speciesPointLocs,"SpatialPoints"))
   elevMeters <- as.vector(newCol)  # vector names will be the @data component column names 
   speciesPointLocs@data <- cbind(speciesPointLocs@data,elevMeters)
    
   newCol <- extract(lcoverLayer,as(speciesPointLocs,"SpatialPoints"))
   landCover <- as.vector(newCol)
   speciesPointLocs@data <- cbind(speciesPointLocs@data,landCover)    
   
   newCol <- extract(soilPhLayer,as(speciesPointLocs,"SpatialPoints"))
   soilPh <- as.vector(newCol)
   speciesPointLocs@data <- cbind(speciesPointLocs@data,soilPh)    
    
   newCol <- extract(ndviLayer,as(speciesPointLocs,"SpatialPoints"))
   ndvi <- as.vector(newCol)
   speciesPointLocs@data <- cbind(speciesPointLocs@data,ndvi)
 
# Working with a rasterStack: 

# The three WorldClim layers have IDENTICAL sizes and extents,
# so they can be placed in a rasterStack without getting the R error: 
# "Error in compare(x, r) : Different bounding box"
# Aggregate the climate layers into a rasterStack:

   climateStack <- stack(precipLayer,tMaxLayer,tMeanLayer)

# Extract ALL climate attributes for input points in one extract() call
# Then give the attributes/ columns new names, and append them to the 
# point shape file.

   climateAttrs <- extract(climateStack,as(speciesPointLocs,"SpatialPoints"))
   colnames(climateAttrs) <- c("precip","tMax","tMean")
   speciesPointLocs@data <- cbind(speciesPointLocs@data,climateAttrs)
   
# Write the point locations, with their newly assigned layer attributes, to an ESRI Point Shape File

   message(sprintf("Writing point shape file with new attributes: %s",outPointFileName))
   writeOGR(speciesPointLocs,".",outPointFileName,driver="ESRI Shapefile")
   setwd(curDir)                                         
}   

Resulting Point File / Augmented Attribute Table


Attributes Assigned to Sample Points

Step 4: Exploratory Data Analysis: One Example


Gain insight into the information contained in the data layer collection

We have assembled a collection of data layers containing structure that we propose influences speciation rate within the study area. Now, we need insight into the information within each layer as well as the relationships between the layers. We will use Exploratory Data Analysis to help develop this insight for four of the data layers: Terrain Slope, Vegetation Index (NDVI), Soil pH, and Precipitation.

First, review the statistical range of each layer: use R to compute the standard five-number summary and a box plot for each layer, as in the sketch below.

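A minimal sketch of this step, assuming the aligned layers are read into a rasterStack; the slope layer file name below is hypothetical, while the others follow Script 5:

# Five-number summary and box plot for each data layer (assumes aligned layers on disk).
library(raster)

envStack <- stack("./slope_NaGeog.img",          # hypothetical slope layer file name
                  "./NDVIUsa1KmExpanded.img",
                  "./SoilMapT_PH_H20.img",
                  "./prec_6_NaGeog.img")
names(envStack) <- c("slope", "ndvi", "soilPh", "precip")

# Draw a regular sample of cells so very large layers remain manageable.
layerSample <- sampleRegular(envStack, size = 100000)

summary(layerSample)       # minimum, quartiles, median, maximum for each layer
boxplot(layerSample, main = "Distribution of environmental data layers")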

Next: look for variation in speciation within the study area over the different time periods. Are there time periods and/or locations that exhibit increased (or decreased) speciation rates?

Next: determine the variation in the layer values actually assigned to each species sighting point (from the Point Overlay operation).

Does structure exist that may explain the variation in speciation rates?

Finally: explore the correlation between each data layer and the speciation rate 'surface' created from the sighting point locations. Within each data layer, does geographic structure correlate spatially with variation in speciation (over time)? A sketch of this step follows.

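A sketch of this last step, assuming the attribute table produced in Step 3 (SpecPointsWithAttrs) has been augmented with a hypothetical speciation-rate estimate column named specRate:

# Explore correlations between the assigned layer values and a speciation-rate estimate.
library(rgdal)

pts   <- readOGR(".", "SpecPointsWithAttrs")       # Step 3 output point Shape File
attrs <- pts@data[, c("elevMeters", "soilPh", "ndvi", "precip", "specRate")]  # specRate is hypothetical

cor(attrs, use = "complete.obs")   # pairwise Pearson correlations among layers and speciation rate
pairs(attrs, pch = ".",            # scatterplot matrix to look for structure visually
      main = "Environmental attributes vs. speciation rate")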

Learning More:

 

National Institute of Standards and Technology (NIST) Engineering Statistics Handbook Chapter: Exploratory Data Analysis

The spatial-analyst.net Wiki: a discussion site for users interested in advanced applications of geocomputational tools
