Integrating C Language Functions into R: Create an R Package
This use case demonstrates, step by step, how to create, test, and distribute a complete R package that integrates C language source code into the R programming environment.
Getting Started:
Many scientists who use R would like to bring often-used algorithms written in high-level programming languages (e.g., C or Fortran) into the R environment, so that they can combine them with R's data management, computational, and graphical capabilities.
These R users have two options. The first is to re-code the algorithm as an R script (implemented as an R function), and call the function from the R command line. However, substantial time and effort may be required to re-code the algorithm in R, and then validate the answers given by the new R script. The second option is to encapsulate the high-level language 'legacy' code within a new R package, and then add the new package to their R programming environment.
Assume that you have a C function that you wish to call from an R script. Specifically, you wish to pass data from R into the C function, and have the function return the results to the R environment through the function call. For example:
> Results = ComputeFunctionR(ScalarParam, InputVector)
Where:
ScalarParam: A single input parameter
InputVector: A one-dimensional array
Results: Data generated by the C function, and returned to the R environment.
This function passes its two arguments into the C function, receives the results that the C function returns, and hands them back through the R command line, where they are stored in Results. Note that Results could be a scalar (single) number, a vector/matrix, or a character string.
The best way to do this is to extend your R programming environment by building and installing an R package. Once developed in accordance with the specifications built into the R software architecture, this package can be installed in the R environment of any compatible computer. Two features of the R programming environment support this: 1) an Application Programming Interface (API) that enables R functions to call functions in C / C++ and Fortran 77; 2) software development tools for creating modular R packages, for multiple computing platforms, that incorporate the high-level language components.
This Use Case creates an R function that encapsulates a C-language function, and then demonstrates how to produce installable R packages for the Linux and Microsoft Windows XP computing platforms.
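As a rough illustration of what the C side of such a call could look like, here is a minimal sketch. The name ComputeFunction, its argument names, and the placeholder computation (scaling each element) are assumptions made for illustration only; the one firm point is that a C routine called this way takes pointers and writes its results into a buffer supplied by the caller.

```c
/*
 * Hypothetical C counterpart of ComputeFunctionR(ScalarParam, InputVector).
 * Every argument is a pointer, and the function returns void: results
 * travel back through the caller-supplied 'results' buffer.
 * The scaling computation is only a placeholder.
 */
void ComputeFunction(const double *scalarParam,
                     const double *inputVector,
                     const int *inputLen,
                     double *results)
{
    int i;

    for (i = 0; i < *inputLen; i++)
        results[i] = (*scalarParam) * inputVector[i];
}
```

The R wrapper would be responsible for allocating a results vector of the right length before the call, which is the pattern followed in the tutorial below.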
Two R documents contain information that will help you expand this Use Case into an implementation of your own high-level language function as an R package:
Writing R Extensions describes in detail the development of installable R packages; in addition, it describes the R language interface to other functions written in C / C++ and Fortran.
R Installation and Administration describes the compilation and installation of the R environment, R help documentation, and R packages on different computing platforms. It includes detailed instructions for configuring the R-compatible software development environments required to construct R packages for both Linux and Microsoft Windows platforms.
Creating a New R Package: Four Steps
- Create the 'R-callable' version of your high-level language function by creating a new argument list for the function.
- Prepare the required components of an installable R package: /src and /R folders containing the C code and R script developed in Step 1, a /man folder containing a standard 'help' file for each R function in the package, and the DESCRIPTION and NAMESPACE files used by the R package-building software.
- Configure your computer with the appropriate R package software development tools.
- Construct the R package using the R commands R CMD check and R CMD build (R CMD INSTALL --build on Windows XP platforms).
Once the package is constructed, you can install and test it within R on compatible computer platforms using the R install.packages() function.
Downloadable examples from this use case:
Click here to download the complete R package source and directory structure for this Use Case (Windows and Linux).
Download the installable R package demonstrated here: Linux OR Windows XP platforms.
Tutorial: Construct and Use an R package interface to a C function
Here are the steps in the R package development process, using a C language function and the R development tools under the Ubuntu Linux environment. The process should be similar on most UNIX environments. To create this package on your own Linux/Unix system, you will need to install the R programming environment, a software development toolkit, and the toolkit's supporting documentation.
On Linux systems, the GNU Compiler Collection (GCC) and its supporting software development tools are the correct choice for constructing R packages.
To create Windows-compatible R packages under the Windows XP environment, you will need to install the appropriate software development tools. Consult Writing R Extensions, which recommends the Rtools development package. To create Windows-compatible R packages under the Linux environment, consider installing the Linux version of the Rtools toolkit provided here.
Here are the four steps:
1) Create the high-level language function and the interface to the R Programming Environment
The C function and its make file
Following is the C function CalcMatrixAvg() that we wish to call from the R command line:
#include <stdio.h>
#include <stdlib.h>
//
// this is the routine that we will call from the R program
//
void CalcMatrixAvg(int *iNRow,
int *iNRowLen,
int *iNCol,
int *iNColLen,
double *dAverage,
int *idAvgLen,
const double *InMat,
int *iInMatLen,
double *RetOutMat)
{
double dSum = 0.0,
dAvg = 0.0;
int iCtr = 0,jCtr = 0;
double *OutMat = NULL;
//
// dynamically allocate a local results matrix, making it
// the correct size for this specific call. This is why
// we use a one-dimensional vector, and two-dimensional
// pointer notation to simulate a two-dimensional matrix.
//
OutMat = calloc((*iNRow) * (*iNCol),sizeof (double));
if (OutMat == NULL)
return; // allocation failed: leave the caller's results vector untouched
//
// initialize the local matrix to zero.
//
for (iCtr = 0; iCtr < *iInMatLen; iCtr++)
*(OutMat + iCtr) = (double)0.0;
for (iCtr = 0; iCtr < *iNRow; iCtr++)
{
for (jCtr = 0; jCtr < *iNCol; jCtr++)
{
printf("%6.2lf ",*(InMat + ((iCtr * *iNCol) + jCtr)) );
dSum = dSum + *(InMat + ((iCtr * *iNCol) + jCtr));
}
printf("\n");
}
*dAverage = dSum / (*iInMatLen);
//
// replace each element with the difference between the element and the matrix average.
//
for (iCtr = 0; iCtr < *iNRow; iCtr++)
{
for (jCtr = 0; jCtr < *iNCol; jCtr++)
{
*((OutMat) + ((iCtr * *iNCol) + jCtr)) = *(InMat + ((iCtr * *iNCol) + jCtr)) - *dAverage;
}
}
for (iCtr = 0; iCtr < *iNRow; iCtr++)
{
for (jCtr = 0; jCtr < *iNCol; jCtr++)
{
printf("%6.2lf ",*(OutMat + ((iCtr * *iNCol) + jCtr)) );
}
printf("\n");
}
//
// Copy the output matrix from the local, dynamic results vector
// to the vector located in the calling (R) function memory space.
//
for (iCtr = 0; iCtr < *iInMatLen; iCtr++)
*(RetOutMat + iCtr) = *(OutMat + iCtr);
//
// de-allocate the local results vector.
//
free(OutMat);
//
}
The last argument in the C function, RetOutMat, transfers data back to the R environment through the .C() interface (described below).
Here is the makefile that produces the shared library loaded by R, written for the GNU C compiler (gcc); the makefile is included in the package source for this use case.
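The function treats a one-dimensional vector as a two-dimensional matrix by computing an offset from a row and a column. The sketch below isolates that arithmetic; the helper names are hypothetical and are not part of the package. In a row-major layout the row index is multiplied by the number of columns, which for the square 25 x 25 test matrix used later happens to equal the number of rows. For comparison, R itself stores matrices in column-major order, so a matrix flattened by R arrives with its columns laid end to end.

```c
/* Row-major layout: the elements of one row sit next to each other. */
int RowMajorIndex(int row, int col, int nCol)
{
    return (row * nCol) + col;
}

/* Column-major layout: the order R uses when it stores a matrix. */
int ColMajorIndex(int row, int col, int nRow)
{
    return (col * nRow) + row;
}

/* Read element (row, col) of a row-major matrix held in a flat vector. */
double GetCell(const double *flat, int row, int col, int nCol)
{
    return *(flat + RowMajorIndex(row, col, nCol));
}
```

Because CalcMatrixAvg() only sums the cells and subtracts the average from each one, the result does not depend on which layout is assumed, but the printed rows do.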
# Makefile for building the C language shared library for the CalcMatrixAvg demonstration package.
# Note: the command lines under each target must begin with a tab character.
C = gcc
OPTS = -c -fPIC
LOADER = gcc
OBJECTS = CalcMatrixAvg.o
CalcMatrixAvg.so: $(OBJECTS)
	R CMD SHLIB -o CalcMatrixAvg.so $(OBJECTS)
.c.o: ; $(C) $(OPTS) $<
clean:
	-rm *.o *.so
The R function interface to the CalcMatrixAvg() function
The .C() function is R's interface 'wrapper' to external C functions. Let's focus on the C and R function call interfaces for this example:
C function prototype:
//
// This is the routine that we will call in the R program
//
void CalcMatrixAvg(int *iNRow,
int *iNRowLen,
int *iNCol,
int *iNColLen,
double *dAverage,
int *idAvgLen,
const double *InMat,
int *iInMatLen,
double *RetOutMat)
R language .C() function interface:
#
# 'Wrapping' the C function call in the R language .C interface function
#
RetVec2 = .C("CalcMatrixAvg",
as.integer(iMatRow),
as.integer((intLen)),
as.integer(iMatCol),
as.integer((intLen)),
as.numeric(dAvg),
as.integer((numericLen)),
as.numeric(iInMat),
as.integer(iInMatLen),
iRetVec = numeric(iInMatLen))$iRetVec
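Note that .C() hands the C function only pointers to the data, so the C function cannot determine the length of a vector argument by itself. This example therefore follows the convention of passing the length of each original C function argument immediately after the argument.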
Note three characteristics of the C function prototype:
- All of the C function arguments are passed by reference, using pointers.
- The parameters *iNRowLen, *iNColLen, *idAvgLen, and *iInMatLen have been added to the C function; each carries the length of the argument that precedes it.
- The last C function argument, double *RetOutMat, passes information (in this case, a vector of floating point numbers) back into the calling R function.
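Because the interface uses nothing but pointers, the C side can be exercised on its own before any R packaging is attempted. The following is a sketch under stated assumptions: CalcMatrixAvgSketch is a condensed stand-in with the same nine-argument list (no printing, no local buffer), and CheckCalcMatrixAvgSketch is a hypothetical driver that sets up the buffers roughly the way the R wrapper does. Neither function is part of the package source.

```c
/*
 * Condensed stand-in for CalcMatrixAvg(): same nine-argument list,
 * no printing, and results written directly into RetOutMat.
 */
void CalcMatrixAvgSketch(int *iNRow, int *iNRowLen,
                         int *iNCol, int *iNColLen,
                         double *dAverage, int *idAvgLen,
                         const double *InMat, int *iInMatLen,
                         double *RetOutMat)
{
    double dSum = 0.0;
    int iCtr;

    /* The dimension and scalar-length arguments are unused in this sketch. */
    (void)iNRow; (void)iNRowLen; (void)iNCol; (void)iNColLen; (void)idAvgLen;

    for (iCtr = 0; iCtr < *iInMatLen; iCtr++)
        dSum += InMat[iCtr];
    *dAverage = dSum / (*iInMatLen);

    for (iCtr = 0; iCtr < *iInMatLen; iCtr++)
        RetOutMat[iCtr] = InMat[iCtr] - *dAverage;
}

/*
 * Hypothetical driver: scalars are passed as pointers to single values,
 * the output vector is owned by the caller. Returns 1 if the average
 * and the centered values match the hand-computed ones, 0 otherwise.
 */
int CheckCalcMatrixAvgSketch(void)
{
    int nRow = 2, nCol = 3, one = 1, len = 6;
    double avg = -999.0;
    double in[6] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
    double out[6];
    double expected[6] = {-2.5, -1.5, -0.5, 0.5, 1.5, 2.5};
    int i;

    CalcMatrixAvgSketch(&nRow, &one, &nCol, &one, &avg, &one, in, &len, out);

    if (avg != 3.5)
        return 0;
    for (i = 0; i < len; i++)
        if (out[i] != expected[i])
            return 0;
    return 1;
}
```

One design point worth noting: .C() returns a list holding the (possibly modified) copies of all its arguments, so a value written through dAverage is visible to R only if the wrapper extracts that list element, in the same way that $iRetVec is extracted for the results vector.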
Here is the complete R script that calls the C routine.
CalcMatrixAvgR <- function(iMatRow,iMatCol,dAvg,iInMat,iInMatLen)
{
#
# Wrapper/interface to CalcMatrixAvg C function
# that computes the average of a two-dimensional matrix
# passed in as a one-dimensional vector, then creates
# a new matrix containing as entries the difference
# between the input cell value and the matrix average.
# Simple routine for use case demonstration.
#
# Programmer: Rick Reeves, NCEAS Scientific Programmer
# August 1, 2007
#
# Arguments:
# iMatRow : Number of Rows, incoming matrix
# iMatCol : Number of Cols, incoming matrix
# dAvg : Average of the incoming matrix
# iInMat : Input Matrix
# iInMatLen : Number of cells (rows * cols) in the input matrix.
#
# note: .C() passes only pointers to the C function, so, following
# the convention used in this example, we pass the length of each
# parameter into the C call immediately following each parameter.
#
iRetVec = vector(mode="numeric",length = iInMatLen)
#
numericLen = 1
intLen = 1
#
# Set up a call to the C routine CalcMatrixAvg() using
# the R .C() interface function. The C routine dynamically
# allocates a local results vector sized to match the
# dimensions (iMatRow, iMatCol) specified here.
#
RetVec2 = .C("CalcMatrixAvg",
as.integer(iMatRow),
as.integer((intLen)),
as.integer(iMatCol),
as.integer((intLen)),
as.numeric(dAvg),
as.integer((numericLen)),
as.numeric(iInMat),
as.integer(iInMatLen),
iRetVec = numeric(iInMatLen))$iRetVec
#
# the answer is contained in vector RetVec2.
# copy the answer back into arg list parameter iRetVec
#
iRetVec[1:iInMatLen] <- RetVec2[1:iInMatLen]
print("back from CalcMatAvg C function call. Hit key....")
return(iRetVec)
}
Here is a short test R function that calls CalcMatrixAvgR():
# Test of CalcMatAvg
#
# global matrix dimension parameters: the number of rows and
# columns of the test matrix passed to the C function.
#
iMaxMatRow = 25
iMaxMatCol = 25
#
TestWrapper <- function()
{
#
library(CalcMatrixAvg)
#
iOutMatLen = iMaxMatRow * iMaxMatCol
#
# generate vectors that we will treat as a two dimensional array,
# initialize the variables to recognizable values for debugging.
#
iInMat = vector(mode="numeric",length = iOutMatLen)
iInMat[1:iOutMatLen] = seq(1,iOutMatLen)
iOutMat = vector(mode="numeric",length = iOutMatLen)
iOutMat[1:iOutMatLen] = -1.99
dAverage = -999.0
#
# Call the wrapper function that manages communications with the C function.
#
iOutMat = CalcMatrixAvgR(iMaxMatRow,iMaxMatCol,dAverage,iInMat,iOutMatLen)
#
print("Back from CalcMatrixAverage! hit key...")
browser()
#
# done.
#
}
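As a cross-check on what TestWrapper() should return, the expected values can be worked out by hand. The input vector holds 1, 2, ..., n with n = 25 * 25 = 625; its sum is n(n+1)/2, so the average is (n+1)/2 = 313, and the centered values run from -312 to 312. The helper names in this small sketch are hypothetical and serve only to spell out that arithmetic.

```c
/*
 * Expected average of the test vector 1, 2, ..., n:
 * the sum is n(n+1)/2, so the average is (n+1)/2.
 */
double ExpectedAverage(int n)
{
    return (n + 1) / 2.0;
}

/* Expected centered value for the k-th element (k counted from 1). */
double ExpectedCentered(int k, int n)
{
    return (double)k - ExpectedAverage(n);
}
```

If the vector returned by the wrapper does not begin at -312 and end at 312, the C function or the argument order in the .C() call is the first place to look.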
2) Prepare the required components for the installable R package
Chapter 1 of Writing R Extensions (Creating R packages) describes the structure of R packages, and we recommend that you read this chapter for more details regarding the file components of R packages. Note that the specific files and folders comprising R packages will vary; however, all packages must have a top level folder with the same name as the package, and a DESCRIPTION file.
Here is a brief description of the file structure of the CalcMatrixAvg package presented here. It consists of a top-level folder named CalcMatrixAvg (the same as the package name); within this top folder are three folders and two files:
- man: Contains one standard 'help' file for each function in the package; these are displayed when the user enters help(function_name) or ?function_name at the R command line. Click here to see the man page file for the CalcMatrixAvg function in the CalcMatrixAvg package.
- R: The R script containing the R 'wrapper' function and any other functions added by the package.
- src: The C source code file(s) and makefile.
- DESCRIPTION: Contains basic package information in a standardized format.
- NAMESPACE: Specifies which package variables are exported to make them available to package users, and which are imported from other packages.
The R package documentation and description files
Once the R and C package components are complete, create the NAMESPACE and DESCRIPTION files.
Here is the DESCRIPTION file for the package described here:
Package: CalcMatrixAvg
Version: 0.0
Date: 2007-08-06
Title: Sample R package demonstrates the R / C Language interface
Author: Rick Reeves (reeves@nceas.ucsb.edu)
Maintainer: Rick Reeves (reeves@nceas.ucsb.edu)
Depends: R (>= 2.0.1)
Description: CalcMatrixAvg passes a two-dimensional matrix
    into a C function via the .C() interface function, where
    the average of the matrix is calculated.
    This case demonstrates the details of the R / C function interface.
License: GPL (version 2 or later)
Here is the NAMESPACE file:
useDynLib(CalcMatrixAvg)
export(TestWrapper)
export(CalcMatrixAvgR)
The useDynLib statement tells R to load the package's shared library, CalcMatrixAvg, which contains the compiled C function CalcMatrixAvg(), when the package is loaded.
The export statements expose the R functions TestWrapper and CalcMatrixAvgR to the R command interface.
3) Configure your computer with the appropriate R software development tools
In the process of preparing this use case, we configured both (Ubuntu) Linux and Windows XP platforms for constructing R packages:
Ubuntu Linux package development: We used the Synaptic Package Manager, part of the Ubuntu Linux Systems Administration utilities, to install the GCC software development tools.
Windows XP package development: On the Windows XP platform, we downloaded and installed the Rtools development tools, distributed as a standard Windows installation package. On the Ubuntu Linux platform, we downloaded and installed the Linux version of the Rtools development tools.
4) Construct, install, and test the new R package
Use the R commands R CMD check and R CMD build, and the R function install.packages(), to create, install, and test the CalcMatrixAvg R package.
R CMD commands are executed from the UNIX command line on systems that have the R software installed.
- R CMD REMOVE Removes an installed R package from the local R configuration.
- R CMD check validates all of the components within the R package for internal consistency with the R system.
- R CMD build constructs the installable R package archive (for packages validated by R CMD check) in the form of a compressed '.tar.gz' archive.
- The install.packages() R function installs the package in the user's local R environment. Once installed, the package functions are available to future R sessions on the local machine after the package is loaded using the library() function.
Here are the required UNIX and R commands:
% R CMD REMOVE CalcMatrixAvg
% R CMD check CalcMatrixAvg
% R CMD build CalcMatrixAvg
% R
> install.packages("CalcMatrixAvg_0.0.tar.gz", repos=NULL)
> library(CalcMatrixAvg)
> debug(TestWrapper)
> TestWrapper()
The function is now operating in browser mode: use the 'n' command to step through the function, evaluate variables, etc.
Enter 'Q' to exit browser mode.