Integrate C Language Functions into R With an R Package

This case demonstrates, step by step, how to create, test, and distribute a complete R package that integrates C language source code into the R programming environment.

Download examples for this use case:

Click here to download the complete R package source and directory structure for this Use Case (Windows and Linux)

Download the installable R package demonstrated here: Linux OR Windows XP platforms.

Getting Started:

Many scientists using R software would like to bring often-used algorithms written in high-level programming languages (e.g., C or Fortran) into the R environment, so that they can combine them with R's data management, computational, and graphical capabilities.

These R users have two options. The first is to re-code the algorithm as an R script (implemented as an R function), and call the function from the R command line. However, substantial time and effort may be required to re-code the algorithm in R, and then validate the answers given by the new R script. The second option is to encapsulate the high-level language 'legacy' code within a new R package, and then add the new package to their R programming environment.

Assume that you have a C function that you wish to call from an R script. Specifically, you wish to pass data from R into the C function, and have the function return the results to the R environment through the function call. For example:

> Results = ComputeFunctionR(ScalarParam, InputVector)

Where:

ScalarParam: A single input parameter
InputVector: A one-dimensional array
Results: Data generated by the C function, and returned to the R environment.

This function passes the two argument list parameters into the C function, receives the results matrix returned, and passes it back through the R command line, storing the returned matrix in Results. Note that Results could be a scalar (single) number, a vector/matrix, or a character string.
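As a minimal sketch of what the C side of such a call might look like, consider the function below. The name ComputeFunction, the extra length argument, and the scaling operation are all assumptions made for illustration; they are not part of the package built in this Use Case.

```c
/*
 * Hypothetical C counterpart of ComputeFunctionR(ScalarParam, InputVector).
 * Every argument is a pointer and the return type is void, which is the
 * shape that the R .C() interface expects. The computation (multiplying
 * each element by the scalar) is only a placeholder.
 */
void ComputeFunction(const double *scalarParam,
                     const double *inputVector,
                     const int *inputLen,
                     double *results)
{
    int i;

    /* Write one result per input element into the caller's vector. */
    for (i = 0; i < *inputLen; i++)
        results[i] = (*scalarParam) * inputVector[i];
}
```

An R wrapper for this sketch would pass as.numeric(ScalarParam), as.numeric(InputVector), as.integer(length(InputVector)), and a numeric vector of the same length to receive the results.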

The best way to do this is to extend your R programming environment by building and installing an R package. Once developed in accordance with the specifications built into the R software architecture, this package can be installed in the R environment of any compatible computer. Two R programming environment features support this: 1) an Application Programming Interface (API) that enables R functions to call functions in C / C++ and Fortran 77; 2) software development tools for creating modular R packages for multiple computing platforms that incorporate the high-level language components.

This Use Case creates an R function that encapsulates a C-language function, and then demonstrates how to produce installable R packages for the Linux and Microsoft Windows XP computing platforms.

Two R documents contain information that will help you expand this Use Case into an implementation of your own high-level language function as an R package:

Writing R Extensions describes in detail the development of installable R packages; in addition, it describes the R language interface to other functions written in C / C++ and Fortran.

R Installation and Administration describes the compilation and installation of the R environment, R help documentation, and R packages on different computing platforms. It includes detailed instructions for configuring the R-compatible software development environments required to construct R packages for both Linux and Microsoft Windows platforms.

Creating a New R Package: Four Steps

  1. Create the 'R-callable' version of your high-level language function by creating a new argument list for the function, and write the R script that calls it.
  2. Prepare the required components of an installable R package: /src and /R folders containing the C code and R script developed in Step 1,
    a /man folder containing a standard 'help' file for each R function in the package, and DESCRIPTION and NAMESPACE files used by the
    R package-building software.
  3. Configure your computer with the appropriate R package software development tools.
  4. Construct the R package using the R commands R CMD check and R CMD build (R CMD INSTALL --build on Windows XP platforms).

Once the package is constructed, you can install and test it within R on compatible computer platforms using the R install.packages() function.

Tutorial: Construct and Use an R package interface to a C function

Here are the steps in the R package development process, using a C language function and the R development tools under the Ubuntu Linux environment. The process steps should be similar on most UNIX environments. In order to create this package on your own Linux/Unix system, you will need to install the R programming environment, a software development toolkit, and the toolkit's supporting documentation.

On Linux systems, the GNU Compiler Collection (GCC) and its supporting software development tools are the correct choice for constructing R packages.

To create Windows-compatible R packages under the Windows XP environment, you will need to install the appropriate software development tools. Consult Writing R Extensions, which recommends the Rtools development package. To create Windows-compatible R packages under the Linux environment, consider installing the Linux version of the Rtools toolkit provided here.

Here are the four steps:

1) Create high-level language function and interface to the R Programming Environment

The C function and its make file

Following is the C function CalcMatrixAvg() that we wish to call from the R command line:

#include <stdio.h>
#include <stdlib.h>

//
// This is the routine that we will call from the R program.
//
void CalcMatrixAvg(int *iNRow,
                   int *iNRowLen,
                   int *iNCol,
                   int *iNColLen,
                   double *dAverage,
                   int *idAvgLen,
                   const double *InMat,
                   int *iInMatLen,
                   double *RetOutMat)
{
    double dSum = 0.0;
    int iCtr = 0, jCtr = 0;

    double *OutMat = NULL;
//
// Dynamically allocate a local results matrix, making it
// the correct size for this specific call. This is why
// we use a one-dimensional vector and pointer arithmetic
// to simulate a two-dimensional matrix: element (i, j)
// is stored at offset (i * number of columns) + j.
// calloc() also initializes the local matrix to zero.
//
    OutMat = calloc((*iNRow) * (*iNCol), sizeof(double));
    if (OutMat == NULL)
        return;   // allocation failed: leave RetOutMat untouched
//
// Print the input matrix and accumulate the sum of its elements.
//
    for (iCtr = 0; iCtr < *iNRow; iCtr++)
    {
        for (jCtr = 0; jCtr < *iNCol; jCtr++)
        {
            printf("%6.2lf ", *(InMat + ((iCtr * *iNCol) + jCtr)));
            dSum = dSum + *(InMat + ((iCtr * *iNCol) + jCtr));
        }
        printf("\n");
    }
    *dAverage = dSum / (*iInMatLen);
//
// Replace each element with the difference between the element
// and the matrix average.
//
    for (iCtr = 0; iCtr < *iNRow; iCtr++)
    {
        for (jCtr = 0; jCtr < *iNCol; jCtr++)
        {
            *(OutMat + ((iCtr * *iNCol) + jCtr)) = *(InMat + ((iCtr * *iNCol) + jCtr)) - *dAverage;
        }
    }
//
// Print the results matrix.
//
    for (iCtr = 0; iCtr < *iNRow; iCtr++)
    {
        for (jCtr = 0; jCtr < *iNCol; jCtr++)
        {
            printf("%6.2lf ", *(OutMat + ((iCtr * *iNCol) + jCtr)));
        }
        printf("\n");
    }
//
// Copy the output matrix from the local, dynamic results vector
// to the vector located in the calling (R) function memory space.
//
    for (iCtr = 0; iCtr < *iInMatLen; iCtr++)
        *(RetOutMat + iCtr) = *(OutMat + iCtr);
//
// De-allocate the local results vector.
//
    free(OutMat);
}

The last argument in the C function, RetOutMat, transfers data back to the R environment through the .C() interface (described later in this step).
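Because the results vector is allocated on the R side before the call, the temporary buffer inside the C function is optional. The following is a minimal sketch of a variant that writes directly into the output vector. The name CenterMatrix and its shorter argument list are assumptions made for illustration and do not appear in the package source.

```c
/*
 * Hypothetical variant of CalcMatrixAvg() without the temporary buffer.
 * The matrix is stored as a one-dimensional vector; element (i, j) of an
 * nRow x nCol matrix lives at offset (i * nCol) + j.
 */
void CenterMatrix(const int *nRow,
                  const int *nCol,
                  const double *inMat,
                  double *average,
                  double *outMat)
{
    int i, j;
    int nCells = (*nRow) * (*nCol);
    double sum = 0.0;

    /* Nothing to average: report zero and leave outMat untouched. */
    if (nCells <= 0) {
        *average = 0.0;
        return;
    }

    /* First pass: accumulate the sum of all cells. */
    for (i = 0; i < *nRow; i++)
        for (j = 0; j < *nCol; j++)
            sum += inMat[(i * *nCol) + j];

    *average = sum / nCells;

    /* Second pass: store each cell's difference from the average. */
    for (i = 0; i < *nRow; i++)
        for (j = 0; j < *nCol; j++)
            outMat[(i * *nCol) + j] = inMat[(i * *nCol) + j] - *average;
}
```

Keeping a local buffer, as the function in this Use Case does, is equally valid; the direct version simply has fewer steps that can fail.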

Here is the makefile that produces the R shared library, compatible with the GNU cc compiler; the makefile is included in the package source for this use case.

# Makefile for building the C language shared library for the CalcMatrixAvg demonstration package.
C = gcc
OPTS = -fPIC
LOADER = gcc

OBJECTS = CalcMatrixAvg.o

CalcMatrixAvg.so: $(OBJECTS)
	R CMD SHLIB -o CalcMatrixAvg.so $(OBJECTS)

.c.o: ; $(C) $(OPTS) -c $<

clean:
	-rm *.o *.so

The R function interface to the CalcMatrixAvg() function

The .C() function is R's interface 'wrapper' to external C functions. Let's compare the C and R function call interfaces for this example.

C function prototype:

//
// This is the routine that we will call in the R program
//
void CalcMatrixAvg(int *iNRow,
                   int *iNRowLen,
                   int *iNCol,
                   int *iNColLen,
                   double *dAverage,
                   int *idAvgLen,
                   const double *InMat,
                   int *iInMatLen,
                   double *RetOutMat)

R language .C() function interface:

#
# 'Wrapping' the C function call in the R language .C interface function
#
RetVec2 = .C("CalcMatrixAvg",
             as.integer(iMatRow),
             as.integer((intLen)),
             as.integer(iMatCol),
             as.integer((intLen)),
             as.numeric(dAvg),
             as.integer((numericLen)),
             as.numeric(iInMat),
             as.integer(iInMatLen),
             iRetVec = numeric(iInMatLen))$iRetVec

Note that the C function receives each argument as a bare pointer and cannot determine its length, so in this example the length of each original C function argument is passed immediately following the argument.

Note three characteristics of the C function prototype:

  1. All of the C function arguments are passed by reference, using pointers.
  2. The length parameters *iNRowLen, *iNColLen, *idAvgLen, and *iInMatLen have been added to the C function's argument list.
  3. The last C function argument, double *RetOutMat, passes information (in this case, a vector of floating point numbers)
    back into the calling R function.

Here is the complete R script that calls the C routine.

CalcMatrixAvgR <- function(iMatRow,iMatCol,dAvg,iInMat,iInMatLen)
{
#
# Wrapper/interface to CalcMatrixAvg C function
# that computes the average of a two-dimensional matrix
# passed in as a one-dimensional vector, then creates
# a new matrix containing as entries the difference
# between the input cell value and the matrix average.
# Simple routine for use case demonstration.
#
# Programmer: Rick Reeves, NCEAS Scientific Programmer
# August 1, 2007
#
# Arguments: 
# iMatRow : Number of Rows, incoming matrix
# iMatCol : Number of Cols, incoming matrix
# dAvg : Average of the incoming matrix
# iInMat : Input Matrix
# iInMatLen : Number of cells (rows * cols) in the input matrix.
#
# note: to conform to the rules of the R .C 'C' language function 
# API, we will pass the length of each parameter into the C call, 
# immediately following each parameter.
#
 iRetVec = vector(mode="numeric",length = iInMatLen)
#
 numericLen = 1
 intLen = 1
#
# Set up a call to the C routine CalcMatrixAvg() using 
# the R .C() interface function. The C routine dynamically
# allocates a local results vector sized to match the
# dimensions (iMatRow, iMatCol) specified here. 
#
 RetVec2 = .C("CalcMatrixAvg",
 as.integer(iMatRow),
 as.integer((intLen)),
 as.integer(iMatCol),
 as.integer((intLen)),
 as.numeric(dAvg),
 as.integer((numericLen)),
 as.numeric(iInMat),
 as.integer(iInMatLen),
 iRetVec = numeric(iInMatLen))$iRetVec
#
# the answer is contained in vector RetVec2. 
# copy the answer back into arg list parameter iRetVec
#
 iRetVec[1:iInMatLen] <- RetVec2[1:iInMatLen]
 print("back from CalcMatAvg C function call. Hit key....")
 return(iRetVec)
}

Here is a short test R function that calls CalcMatrixAvgR():

# Test of CalcMatAvg
#
# global vector dimension parameters
# these must match the maximum dimensions MAX_ROW and MAX_COLUMN 
# in the trialswapR.c file.
#
 iMaxMatRow = 25
 iMaxMatCol = 25
#
TestWrapper <- function()
{
#
 library(CalcMatrixAvg)
#
 iOutMatLen = iMaxMatRow * iMaxMatCol
#
# generate vectors that we will treat as a two dimensional array, 
# initialize the variables to recognizable values for debugging.
#
 iInMat = vector(mode="numeric",length = iOutMatLen)
 iInMat[1:iOutMatLen] = seq(1,iOutMatLen)
 iOutMat = vector(mode="numeric",length = iOutMatLen)
 iOutMat[1:iOutMatLen] = -1.99
 dAverage = -999.0
#
# Call the wrapper function that manages communications with the C function.
#
 iOutMat = CalcMatrixAvgR(iMaxMatRow,iMaxMatCol,dAverage,iInMat,iOutMatLen)
#
 print("Back from CalcMatrixAverage! hit key...")
 browser()
#
# done. 
#
}

2) Prepare the required components for the installable R package

Chapter 1 of Writing R Extensions (Creating R packages) describes the structure of R packages, and we recommend that you read this chapter for more details regarding the file components of R packages. Note that the specific files and folders comprising R packages will vary; however, all packages must have a top level folder with the same name as the package, and a DESCRIPTION file.

The file structure for this particular package consists of a top-level folder named CalcMatrixAvg (the same as the package name); within this top folder are two files and three folders:

  • man: Contains one standard 'help' file for each separate function in the package; the file is displayed when the user enters help(function_name) or ?function_name at the R command line. Click here to see the man page file for the CalcMatrixAvg function in the CalcMatrixAvg package.

  • R: The R script containing the R 'wrapper' function and any other functions added by the package.

  • src: The C source code file(s) and makefile.

  • DESCRIPTION: Contains basic package information in a standardized format.

  • NAMESPACE: The contents of this file specify which package variables are exported to make them available to package users,
    and which are imported from other packages.

The R package documentation and description files

Once the R and C package components are complete, create the NAMESPACE and DESCRIPTION files.
Here is the DESCRIPTION file for the package described here:

Package: CalcMatrixAvg
Version: 0.0
Date: 2007-08-06
Title: Sample R package demonstrates the R / C Language interface
Author: Rick Reeves (reeves [at] nceas [dot] ucsb [dot] edu)
Maintainer: Rick Reeves (reeves [at] nceas [dot] ucsb [dot] edu)
Depends: R (>= 2.0.1)
Description: CalcMatrixAverage passes a two-dimensional matrix
    into a C function via the .C() interface function, where
    the average of the matrix is calculated.
    This case demonstrates the details of the R / C function interface.
License: GPL (version 2 or later)

Here is the NAMESPACE file:

useDynLib(CalcMatrixAvg)
export(TestWrapper)
export(CalcMatrixAvgR)

The useDynLib statement directs R to load the package's compiled shared object (CalcMatrixAvg), which contains the C function CalcMatrixAvg(), when the package is loaded.
The export statements expose the R functions TestWrapper and CalcMatrixAvgR to the R command interface.

3) Configure your computer with the appropriate R software development tools

In the process of preparing this use case, we configured both (Ubuntu) Linux and Windows XP platforms for constructing R packages:

Ubuntu Linux package development: We used the Synaptic Package Manager, part of the Ubuntu Linux Systems Administration utilities, to install the GCC software development tools.

Windows XP package development: On the Windows XP platform, we downloaded and installed the Rtools development tools, distributed as a standard Windows installation package.

4) Construct, install, and test the new R package

Use the R commands R CMD check and R CMD build, and the R function install.packages(), to create, install, and test the CalcMatrixAvg R package.
R CMD commands are executed from the UNIX command line on systems that have the R software installed.

  • R CMD REMOVE Removes an installed R package from the local R configuration.

  • R CMD check validates all of the components within the R package for internal consistency with the R system.

  • R CMD build constructs the installable R package archive (for packages validated by R CMD check) in the form of a compressed '.tar.gz' archive.

  • NOTE: On MS Windows platforms, use the command R CMD INSTALL --build to install the package and construct a Windows-compatible compressed '.zip' installation archive.

  • The install.packages() R function installs the package in the user's local R environment. Once installed, the package
    functions are available in future R sessions on the local machine after the package is loaded with the library() function.

Here are the required command line and R commands for the UNIX operating system:
(within the MS Windows environment, open an MS/DOS Command Window and enter these commands, referring to the above NOTE:)

% cd (to the parent directory of the 'CalcMatrixAvg' folder containing the R package components); 
% R CMD REMOVE CalcMatrixAvg 
% R CMD check CalcMatrixAvg
% R CMD build CalcMatrixAvg
% R
> install.packages("CalcMatrixAvg_0.0.tar.gz", repos=NULL)
> library(CalcMatrixAvg)
> debug(TestWrapper)
> TestWrapper()

The function is now operating in browser mode: use the 'n' command to step through the function, evaluate variables, etc.
Enter 'Q' to exit browser mode.

Learning More:

C language programming techniques : Creating dynamic two-dimensional arrays

Interfacing high-level programming language routines with R: Using External Compilers with R

Building Microsoft Windows Versions of R and R packages under Intel Linux: A Package Developer's Tool

Point of Contact for this Use Case: reeves [at] nceas [dot] ucsb [dot] edu (Rick Reeves), NCEAS Scientific Programmer
This Use Case was compiled March, 2008.