Read and write ESRI Shapefiles with R

The ESRI Shapefile is a widely used file format for storing vector-based geopatial data (i.e., points, lines, and polygons). This example demonstrates use of several different R packages that provide functions for reading and/or writing shapefiles. The first two approaches shown here use packages that depend on the sp package, which defines a set of spatial classes that have become the de facto standard spatial data types in R. The third approach creates R data objects that are less generally useful, but necessary when calling other analytical functions defined in the package.

Overview

Using rgdal:

The rgdal package provides an interface to the GDAL/OGR library, which powers the data import/export capabilities of many geospatially aware software applications. The package includes functions readOGR and writeOGR for reading and writing not only shapefiles, but numerous other vector-based file formats. In addition, the ogrInfo function is useful for retrieving details about the file without reading in the full dataset. These functions are all capable of automatically reading and writing projection information if available. Provided you are able to install the separate GDAL/OGR library - which may be tricky on some systems - it is worth learning how to use this package if you frequently work with shapefiles and/or other spatial data formats, including not just vector formats but raster formats as well.

Using maptools:

The maptools package includes a number of useful functions for reading, writing, converting, and otherwise handling spatial objects in R. The general functions for reading and writing shapefiles are readShapeSpatial and writeSpatialShape, respectively. In both cases, the function automatically determines whether the shapefile (or R object) contains points, lines, or polygons, and will then read in (or write out) the data using a more specialized function of the particular type. These specialized functions, such as readShapeLines for reading lines, can also be called directly. One advantage of doing so is that it will complain if you inadvertently use it on the wrong data type, helping you to catch errors sooner. Unlike their rgdal counterparts, the maptools functions neither read nor write projection information, leaving it up to you to manage these details manually.

Using PBSmapping:

The PBSmapping package can also read (but not write) shapefiles. However, note that PBSmapping uses its own custom-defined spatial data types that are optimized to work with various specialized package functions. This makes it harder to take advantage of functions defined in the numerous packages that are built on sp, although the maptools package does provide functions that convert between the different formats.

Map of Points, Lines, and Polygons

R code

rgdal
library(rgdal)
 
# for shapefiles, first argument of the read/write/info functions is the
# directory location, and the second is the file name without suffix
 
# optionally report shapefile details
ogrInfo(".", "nw-rivers")
# Source: ".", layer: "nw-rivers"
# Driver: ESRI Shapefile number of rows 12 
# Feature type: wkbLineString with 2 dimensions
# +proj=longlat +datum=WGS84 +no_defs  
# Number of fields: 2 
#     name type length typeName
#     1   NAME    4     80   String
#     2 SYSTEM    4     80   String
 
# read in shapefiles
centroids.rg <- readOGR(".", "nw-centroids")
rivers.rg <- readOGR(".", "nw-rivers")
counties.rg <- readOGR(".", "nw-counties")
 
# note that readOGR will read the .prj file if it exists
print(proj4string(counties.rg))
# [1] " +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
 
# generate a simple map showing all three layers
plot(counties.rg, axes=TRUE, border="gray")
points(centroids.rg, pch=20, cex=0.8)
lines(rivers.rg, col="blue", lwd=2.0)
 
# write out a new shapefile (including .prj component)
writeOGR(counties.rg, ".", "counties-rgdal", driver="ESRI Shapefile")
maptools
library(maptools)
 
# read in shapefiles; here we use the specialized readShape* functions,
# but readShapeSpatial would produce identical output in all three cases
centroids.mp <- readShapePoints("nw-centroids")
rivers.mp <- readShapeLines("nw-rivers")
counties.mp <- readShapePoly("nw-counties")
 
# note that readShape* does _not_ read the shapefile's .prj file
print(proj4string(counties.mp))
## [1] NA
 
# specifying projection information is not strictly necessary for
# plotting, but does yield nicer default axis labels and aspect ratio in
# the case of geographic data
proj4string(counties.mp) <- "+proj=longlat +datum=WGS84"
 
# generate a simple map showing all three layers
plot(counties.mp, axes=TRUE, border="gray")
points(centroids.mp, pch=20, cex=0.8)
lines(rivers.mp, col="blue", lwd=2.0)
 
# write out a new shapefile (but without .prj); the more general
# writeSpatialShape would produce equivalent output
writePolyShape(counties.mp, "counties-maptools")
PBSmapping
library(PBSmapping)
 
# read in shapefiles
centroids.pb <- importShapefile("nw-centroids")
rivers.pb <- importShapefile("nw-rivers")
counties.pb <- importShapefile("nw-counties")
 
# note that importShapefile reads the .prj file if it exists, but it
# does not adopt the proj4 format used by the above approaches
proj.abbr <- attr(counties.pb, "projection") # abbreviated projection info
proj.full <- attr(counties.pb, "prj") # full projection info
print(proj.abbr)
# [1] "LL"
 
# generate map using PBSmapping plotting functions
plotPolys(counties.pb, projection=proj.abbr, border="gray",
    xlab="Longitude", ylab="Latitude")
addPoints(centroids.pb, pch=20, cex=0.8)
addLines(rivers.pb, col="blue", lwd=2.0)

Download R script and data

Learning More:

scicomp: