Polygon Overlay Analysis

Download data and R Code for this example

Project Requirement:

Polygon Overlay operations determine the spatial coincidence (if any) of two polygon data layers, or between polygon and point layer, usually creating a new data layer in the process. Polygon overlay techniques are often used by field scientists to explore the relationships between spatial attributes, stored as layers in a geophysical data model. Examples of polygon data are: species geographic range ranges, biomes, and watersheds. Examples of point data are: species sightings, breeding pair nest locations, or measurement stations.

Three useful (and widely used) polygon overlay operations are:

Intersection (logical AND): The common or shared area between two overlapping polygons.

Union (logical OR): The combined areas of two possibly overlapping polygons.

Point-in-Polygon (logical AND): Between a point and polygon layer, the subset of points located within the polygon boundary.

Here, we demonstrate overlay operations using a collection of point and polygon species range data sets collected in South America, and methods from the PBSmapping package. We answer four questions about the study area covered by our geospatial data:

1) What is the area of each Species Range?
2) What is the total area of South America covered by (one or more) species ranges?
3) What portion of Brazil is covered by Armadillo species range?
4) Do all of the Armadillo species sightings occur inside the Armadillo species range?

Input Data / Format:

Point File: Mammalian Species Sightings (ESRI Point Shape File) from NatureServe data set.

Base Map: DIVA-GIS Global Administrative Boundaries. Polygon Files: Mammalian Species Range maps (ESRI Polygon Shape Files) from NatureServe Digital Distribution Maps of Western Hemisphere Mammals data set.

Workflow:

1) Read input ESRI Shape files into R Spatial Data Frame objects.

2) Re-Project Data Frames into Albers Equal Area projection, to preserve areas and use meters for area calculations.

3) Translate Data Frame objects into PBSmapping-compatible PolySet and PointSet objects,
required PBSmapping package overlay methods.

4) Use PBSmapping methods to answer each of the four questions.

5) For each question, report the results, displaying appropriate summary maps.

Discussion:

To review and run the example:

1) Download the Zip file archive, unpack into a folder.

2) Be sure that your R (version 2.9.2 or later) environment includes sp, maptools, rgdal packages.

3) Start R, set working directory to the folder containing the downloaded and unpacked example.

4) At the R command prompt, enter 'source("./SpeciesRangeLayerOverlay.r")'

5) Enter the command PolygonOverlayAnalysis(). Four questions are answered in sqeuence, and summary plots are displayed in four separate graphics windows.'

Results:

All Layers Armadillo Range Union
All Data Layers Union / Intersection: Armadillo Ranges
Sightings In/Out of Range  
Armadillo Sightings In/Out of Range  
R Script Output
R Overlay Script Output

R Script:

library(maptools)   # used here to read in the spatial data
library(PBSmapping) # for GIS-like geoprocessing operations
library(rgdal)      # required in this script for spTransform()
   
# This example features habitat range data or signtings for three species:
#
#   1) Southern Armadillo
#   2) Greater Armadillo
#   3) Vesper Mouse (species sightings)
#
# Answer four questions about the species ranges:
#
#  1) What is the area of each Range?
#  2) What is the total area of South America covered by (one or more) species ranges?
#  3) What portion of Brazil is covered by Armadillo species range?
#  4) Are all of the Armadillo species sightings inside the Armadillo species range?

# Use Maptools routine to ingest the polygon shapefiles: South America country boundaries,
# the mammal species range polygon files, and the mammal signting location point file.
# All four polygon files were previously placed into the same map projection.

# Read in South America boundaries
s.america <- readShapeSpatial("SouthAmericaStates")
   
# Read in Southern Armadillo range
southern <- readShapeSpatial("SouthernArmadilloPoly")

# Read in Greater Armadillo range
greater <- readShapeSpatial("GreaterArmadilloPoly")

# Read in Greater Armadillo individual sightings
greater.pts <- readShapeSpatial("GreaterArmadilloPts")

# The input data all reside in Geographic coordinate space, in units of
# degrees (latitude and longitude). To calculate meaningful polygon
# areas, we transform each layer into an 'equal area' projection, with
# units of kilometers.

# Explicitly define the input projections using PROJ4 syntax

proj.string.latlon <- CRS("+proj=longlat +datum=WGS84")
proj4string(s.america) <- proj.string.latlon
proj4string(southern) <- proj.string.latlon
proj4string(greater) <- proj.string.latlon
proj4string(greater.pts) <- proj.string.latlon

# Reproject the input polygon data sets to Albers Equal Area, with
# parameters set for South America

proj.string.alb <- CRS("+proj=aea +lat_1=-5 +lat_2=-42 +lat_0=-32
                               +lon_0=-60 +x_0=0 +y_0=0 +datum=WGS84 +units=km +no_defs")
s.america.alb <- spTransform(s.america, CRS=proj.string.alb)
southern.alb <- spTransform(southern, CRS=proj.string.alb)
greater.alb <- spTransform(greater, CRS=proj.string.alb)
greater.pts.alb <- spTransform(greater.pts, CRS=proj.string.alb)

# Convert the spatial objects to PBSmapping objects (PolySet for
# polygons, EventData for points); for polygons, combinePolys is used to
# aggregate interior polygons into a single polygon

s.america.ps <- combinePolys(SpatialPolygons2PolySet(s.america.alb))
southern.ps <- combinePolys(SpatialPolygons2PolySet(southern.alb))
greater.ps <- combinePolys(SpatialPolygons2PolySet(greater.alb))
greater.sightings <- as.EventData(data.frame(
                           EID = seq_len(nrow(greater.pts.alb)),
                             X = coordinates(greater.pts.alb)[, 1],
                             Y = coordinates(greater.pts.alb)[, 2]))

# QUESTION 1: What is the area of each range?

area.south.america <- sum(calcArea(s.america.ps, rollup=1)$area)
area.southern <- sum(calcArea(southern.ps, rollup=1)$area)
area.greater <- sum(calcArea(greater.ps, rollup=1)$area)

# Note that polygon area can also be obtained from the SPDF object
all.equal(area.southern, sum(sapply(slot(southern.alb, "polygons"),slot, "area")))
             
# QUESTION 2: What is the total area of South America covered by at
# least one of the two species ranges? [UNION]

union.armadillo <- combinePolys(joinPolys(southern.ps, greater.ps,
                                            "UNION", maxVert = 1.0e+06))
area.union.armadillo <- sum(calcArea(union.armadillo, rollup=1)$area)

# QUESTION 3: What is the total area of South America covered by *both*
# of the Armadillo ranges? [INTERSECTION]

int.armadillo <- combinePolys(joinPolys(southern.ps, greater.ps,
                                        "INT", maxVert = 1.0e+06))
area.int.armadillo <- sum(calcArea(int.armadillo, rollup=1)$area)

# QUESTION 4: Which the Greater Armadillo sightings fall inside the
# Armadillo species range, and which fall outside? [POINT-IN-POLYGON]

matches <- findPolys(greater.sightings, union.armadillo)$EID
greater.sightings.in <- subset(greater.sightings, EID %in% matches)
greater.sightings.out <- subset(greater.sightings, !EID %in% matches)

# PLOTS - each within a new graphics window (call dev.new() to get one)
# Note: On maps displayed, distance units are eastings and northings (in
# kilometers) from the origins defined for the Albers projection.

# Plot all of the separate ranges

plotPolys(s.america.ps, proj = TRUE, col="wheat1",xlab="Easting",ylab="Northing")
addPolys(southern.ps, col="red", density=10, angle=135)
addPolys(greater.ps, col="blue" , density=10, angle=45)
addPoints(greater.sightings, col="green", pch=19)   

# Plot Armadillo range union and intersection

dev.new()
plotPolys(s.america.ps, proj = TRUE, col="wheat1",xlab="Easting",ylab="Northing")
addPolys(union.armadillo, col="orange")
addPolys(int.armadillo, border="blue", lwd=2)
   
# Plot species sightings against the range union, using red for points outside the range

dev.new()
plotPolys(s.america.ps, proj = TRUE, col="wheat1",xlab="Easting",ylab="Northing")
addPolys(union.armadillo, col="orange")
addPoints(greater.sightings.in, pch=19)
addPoints(greater.sightings.out, col="red", pch=19)
   
# Reporting areas / counts (downloadable version only)

message(sprintf("South America area                           : %.2f km**2.",
             area.south.america))
message(sprintf("Southern Armadillo Range area (Red Lines)    : %.3f km**2.",
            area.southern))
message(sprintf("Greater Armadillo Range area (Blue Lines)    : %.3f km**2.",
             area.greater))
message(sprintf("Armadillo Range UNION area (Orange Area)     : %.3f km**2.",
             area.union.armadillo))
message(sprintf("Armadillo Range INTERSECT area (Blue Outline): %.3f km**2.",
            area.int.armadillo))
message(sprintf("Greater Armadillo sightings IN Range (Black): %.0f/OUT of Range (Red): %.0f",
                       length(greater.sightings.in$X),length(greater.sightings.out$X)))

Learning More:

The GIS Primer: The Nature of Spatial Information

Overview: Polygon Overlay Analysis

Overview: Point-In-Polygon (PIP) Analysis

scicomp: