NCEAS Scientific Computing: Solutions Center Use Case

Integrating C Language Functions into R: Create an R Package

This case demonstrates, step by step, how to create, test, and distribute a complete R package that integrates C language source code into the R programming environment.

Getting Started:

Many scientists who use R would like to bring often-used algorithms written in high-level programming languages (e.g., C or Fortran) into the R environment, so that they can combine them with R's data management, computational, and graphical capabilities.

These R users have two options. The first is to re-code the algorithm as an R script (implemented as an R function), and call the function from the R command line. However, substantial time and effort may be required to re-code the algorithm in R, and then validate the answers given by the new R script. The second option is to encapsulate the high-level language 'legacy' code within a new R package, and then add the new package to their R programming environment.

Assume that you have a C function that you wish to call from an R script. Specifically, you wish to pass data from R into the C function, and have the function return the results to the R environment through the function call. For example:

> Results = ComputeFunctionR(ScalarParam, InputVector)

Where:

ScalarParam: A single input parameter
InputVector: A one-dimensional array
Results: Data generated by the C function, and returned to the R environment.

This function passes the two argument list parameters into the C function, receives the results matrix returned, and passes it back through the R command line, storing the returned matrix in Results. Note that Results could be a scalar (single) number, a vector/matrix, or a character string.

The best way to do this is to extend your R programming environment by building and installing an R package. Once developed in accordance with the specifications built into the R software architecture, this package can be installed in the R environment on any compatible computer. Two features of the R programming environment support this: 1) an Application Programming Interface (API) that enables R functions to call functions in C / C++ and Fortran 77; 2) software development tools for creating modular R packages for multiple computing platforms that incorporate the high-level language components.

This Use Case creates an R function that encapsulates a C-language function, and then demonstrates how to produce installable R packages for the Linux and Microsoft Windows XP computing platforms.

Two R documents contain information that will help you expand this Use Case into an implementation of your own high-level language function as an R package:

Writing R Extensions describes in detail the development of installable R packages; in addition, it describes the R language interface to other functions written in C / C++ and Fortran.

R Installation and Administration describes the compilation and installation of the R environment, R help documentation, and R packages on different computing platforms. It includes detailed instructions for configuring the R-compatible software development environments required to construct R packages for both Linux and Microsoft Windows platforms.

Creating a New R Package: Four Steps

  1. Create the 'R-callable' version of your high-level language function by creating a new argument list for the function.

  2. Prepare the required components of an installable R package: /src and /R folders containing the C code and R script developed in Step 1,
    a /man folder containing a standard 'help' file for each R function in the package, and DESCRIPTION and NAMESPACE files used by the
    R package-building software.

  3. Configure your computer with the appropriate R package software development tools.

  4. Construct the R package using the R commands R CMD check and R CMD build (R CMD INSTALL --build on Windows XP platforms).

Once the package is constructed, you can install and test it within R on compatible computer platforms using the R install.packages() function.

Downloadable examples from this use case:

Click here to download the complete R package source and directory structure for this Use Case (Windows and Linux).

Download the installable R package demonstrated here: Linux OR Windows XP platforms.

Tutorial: Construct and Use an R package interface to a C function

Here are the steps in the R package development process, using a C language function and the R development tools under the Ubuntu Linux environment. The process steps should be similar on most UNIX environments. To create this package on your own Linux/Unix system, you will need to install the R programming environment, a software development toolkit, and the toolkit's supporting documentation.

On Linux systems, the GNU Compiler Collection (GCC) and its supporting software development tools are the correct choice for constructing R packages.

To create Windows-compatible R packages under the Windows XP environment, you will need to install the appropriate software development tools. Consult Writing R Extensions, which recommends the Rtools development package. To create Windows-compatible R packages under the Linux environment, consider installing the Linux version of the Rtools toolkit provided here.

Here are the four steps:

1) Create the high-level language function and the interface to the R Programming Environment

The C function and its make file

Following is the C function CalcMatrixAvg() that we wish to call from the R command line:

#include <stdio.h>   // printf
#include <stdlib.h>  // calloc, free
//
// this is the routine that we will call from the R program
//
void CalcMatrixAvg(int *iNRow, 
                   int *iNRowLen,
                   int *iNCol,
                   int *iNColLen,
                   double *dAverage,
                   int *idAvgLen,
                   const double *InMat,
                   int *iInMatLen,
                   double *RetOutMat)
{
   double dSum = 0.0,
          dAvg = 0.0;
   int iCtr = 0,jCtr = 0;

   double *OutMat = NULL;
//
// dynamically allocate a local results matrix, making it
// the correct size for this specific call. This is why
// we use a one-dimensional vector, and two-dimensional
// pointer notation to simulate a two-dimensional matrix.
//
   OutMat = calloc((*iNRow) * (*iNCol),sizeof (double));
//
// initialize the local matrix to zero.
// 
   for (iCtr = 0; iCtr < *iInMatLen; iCtr++)
      *(OutMat + iCtr) = (double)0.0;

   for (iCtr = 0; iCtr < *iNRow; iCtr++)
   {
      for (jCtr = 0; jCtr < *iNCol; jCtr++)
      {
         printf("%6.2lf ",*(InMat + ((iCtr * *iNCol) + jCtr)) );
         dSum = dSum + *(InMat + ((iCtr * *iNCol) + jCtr));
      }
      printf("\n");
   } 
   *dAverage = dSum / (*iInMatLen);
//
// replace each element with the difference between the element and the matrix average.
//
   for (iCtr = 0; iCtr < *iNRow; iCtr++)
   {
      for (jCtr = 0; jCtr < *iNCol; jCtr++)
      {
         *((OutMat) + ((iCtr * *iNCol) + jCtr)) = *(InMat + ((iCtr * *iNCol) + jCtr)) - *dAverage;
      }
   } 
   for (iCtr = 0; iCtr < *iNRow; iCtr++)
   {
      for (jCtr = 0; jCtr < *iNCol; jCtr++)
      {
         printf("%6.2lf ",*(OutMat + ((iCtr * *iNCol) + jCtr)) );
      }
      printf("\n");
   } 
// 
// Copy the output matrix from the local, dynamic  results vector
// to the vector located in the calling (R) function memory space.
//
   for (iCtr = 0; iCtr < *iInMatLen; iCtr++)
      *(RetOutMat + iCtr) = *(OutMat + iCtr);
//
// de-allocate the local results vector.
//
   free(OutMat);
//
}

The last argument in the C function, RetOutMat, transfers data back to the R environment through the .C() interface (described below).
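The routine above allocates its local results matrix with calloc() but does not check whether the allocation succeeded. The following is a minimal sketch of a more defensive allocation, not part of the package source: alloc_matrix() and demo_alloc() are hypothetical names introduced only for this illustration. It assumes a platform where all-zero bytes represent the floating point value 0.0 (true of IEEE 754 systems), in which case the memory returned by calloc() needs no separate initialization loop.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not in the package source): allocate an
   nrow-by-ncol matrix of doubles as one contiguous, zero-filled block.
   Returns NULL if a dimension is not positive or the allocation fails. */
double *alloc_matrix(int nrow, int ncol)
{
    if (nrow <= 0 || ncol <= 0)
        return NULL;
    return calloc((size_t)nrow * (size_t)ncol, sizeof(double));
}

/* Hypothetical driver: report a failed allocation instead of writing
   through a NULL pointer, and release the block when finished. */
int demo_alloc(void)
{
    double *m = alloc_matrix(25, 25);

    if (m == NULL) {
        fprintf(stderr, "could not allocate the results matrix\n");
        return 1;
    }
    printf("first cell = %.1f\n", m[0]);
    free(m);
    return 0;
}
```

A caller would test the returned pointer before using it, as demo_alloc() does, and call free() exactly once when the matrix is no longer needed.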

Here is the makefile that produces the R shared library, compatible with the GNU cc compiler; the makefile is included in the package source for this use case.

# Makefile for building the C language shared library for the CalcMatrixAvg demonstration package.
C = gcc
OPTS    = -c -fPIC
LOADER  = gcc

OBJECTS =   CalcMatrixAvg.o

 CalcMatrixAvg.so:  $(OBJECTS)
	R CMD SHLIB -o  CalcMatrixAvg.so $(OBJECTS)

.c.o: ; $(C) $(OPTS) -c $<

clean:
	-rm *.o *.so

The R function interface to the CalcMatrixAvg() function

The .C() function is R's interface 'wrapper' to external C functions. Let's focus on the C and R function call interfaces for this example:

C function prototype:

//
// This is the routine that we will call in the R program
//
void CalcMatrixAvg(int *iNRow, 
                   int *iNRowLen,
                   int *iNCol,
                   int *iNColLen,
                   double *dAverage,
                   int *idAvgLen,
                   const double *InMat,
                   int *iInMatLen,
                   double *RetOutMat)

R language .C() function interface:

#
# 'Wrapping' the C function call in the R language .C interface function 
#
   RetVec2 = .C("CalcMatrixAvg",
                as.integer(iMatRow),
                as.integer((intLen)),
                as.integer(iMatCol),
                as.integer((intLen)),
                as.numeric(dAvg),
                as.integer((numericLen)),
                as.numeric(iInMat),
                as.integer(iInMatLen),
                iRetVec = numeric(iInMatLen))$iRetVec

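Note that .C() passes every argument to the C function as a pointer and does not tell the C function how long each vector is. For that reason, this example passes the length of each original C function argument immediately following the argument.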


Note three characteristics of the C function prototype:

  1. All of the C function arguments are passed by reference, using pointers.
  2. The length parameters *iNRowLen, *iNColLen, *idAvgLen, and *iInMatLen have been added to the C function.
  3. The last C function argument, double *RetOutMat, passes information (in this case, a vector of floating point numbers)
    back into the calling R function.

Here is the complete R script that calls the C routine.

CalcMatrixAvgR <- function(iMatRow,iMatCol,dAvg,iInMat,iInMatLen)
{
#
# Wrapper/interface to CalcMatrixAvg C function
# that computes the average of a two-dimensional matrix
# passed in as a one-dimensional vector, then creates
# a new matrix containing as entries the difference
# between the input cell value and the matrix average.
# Simple routine for use case demonstration.
#
# Programmer: Rick Reeves, NCEAS Scientific Programmer
# August 1, 2007
#
# Arguments: 
#          iMatRow   : Number of Rows, incoming matrix
#          iMatCol   : Number of Cols, incoming matrix
#          dAvg      : Average of the incoming matrix
#          iInMat    : Input Matrix
#          iInMatLen : Number of cells (rows * cols) in the input matrix.
#
# note: the R .C() interface passes every argument to C as a pointer,
# with no length information, so we pass the length of each parameter
# into the C call, immediately following each parameter.
#
   iRetVec = vector(mode="numeric",length = iInMatLen)
#
   numericLen = 1
   intLen = 1
#
# Set up a call to the C routine CalcMatrixAvg() using 
# the R .C() interface function. The C routine dynamically
# allocates a local results vector sized to match the
# dimensions (iMatRow, iMatCol) specified here. 
#
   RetVec2 = .C("CalcMatrixAvg",
                as.integer(iMatRow),
                as.integer((intLen)),
                as.integer(iMatCol),
                as.integer((intLen)),
                as.numeric(dAvg),
                as.integer((numericLen)),
                as.numeric(iInMat),
                as.integer(iInMatLen),
                iRetVec = numeric(iInMatLen))$iRetVec
#
# the answer is contained in vector RetVec2. 
# copy the answer back into arg list parameter iRetVec
#
  iRetVec[1:iInMatLen] <- RetVec2[1:iInMatLen]
  print("back from CalcMatAvg C function call. Hit key....")
  return(iRetVec)
}
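The wrapper hands the matrix to C as a one-dimensional vector, so the order of the cells in that vector matters. R stores a matrix column by column (column-major order), whereas the C routine is written with row-by-row (row-major) indexing in mind. For this use case the difference is harmless, because the average and the cell-by-cell differences do not depend on cell order, but a routine that addresses individual rows or columns would need to account for it. The sketch below shows the two offset calculations; row_major_index(), col_major_index(), and demo_index() are hypothetical helper names, not part of the package.

```c
#include <stdio.h>

/* Offset of zero-based element (row, col) when rows are stored
   one after another (row-major order). */
int row_major_index(int row, int col, int ncol)
{
    return row * ncol + col;
}

/* Offset of the same element when columns are stored one after
   another (column-major order, the layout R uses for matrices). */
int col_major_index(int row, int col, int nrow)
{
    return col * nrow + row;
}

/* Hypothetical driver: print both offsets for every cell of a
   2-row by 3-column matrix. */
int demo_index(void)
{
    int nrow = 2, ncol = 3, r, c;

    for (r = 0; r < nrow; r++)
        for (c = 0; c < ncol; c++)
            printf("(%d,%d): row-major %d, column-major %d\n",
                   r, c,
                   row_major_index(r, c, ncol),
                   col_major_index(r, c, nrow));
    return 0;
}
```

Only the first and last cells of a matrix have the same offset under both layouts; every other cell moves unless the matrix has a single row or a single column.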

Here is a short test R function that calls CalcMatrixAvgR():

# Test of CalcMatAvg
#
# global matrix dimension parameters: the number of rows and
# columns in the test matrix. The C function sizes its local
# results matrix dynamically to match these dimensions.
#
   iMaxMatRow = 25
   iMaxMatCol = 25
#
TestWrapper <- function()
{
#
   library(CalcMatrixAvg)
#
   iOutMatLen = iMaxMatRow * iMaxMatCol
#
# generate vectors that we will treat as a two dimensional array, 
# initialize the variables to recognizable values for debugging.
#
   iInMat = vector(mode="numeric",length = iOutMatLen)
   iInMat[1:iOutMatLen] = seq(1,iOutMatLen)
   iOutMat = vector(mode="numeric",length = iOutMatLen)
   iOutMat[1:iOutMatLen] = -1.99
   dAverage = -999.0
#
# Call the wrapper function that manages communications with the C function.
#
   iOutMat = CalcMatrixAvgR(iMaxMatRow,iMaxMatCol,dAverage,iInMat,iOutMatLen)
#
   print("Back from CalcMatrixAverage! hit key...")
   browser()
#
# done. 
#
}
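It helps to know in advance what the test should produce. TestWrapper() fills the 25 x 25 input with the values 1 through 625, so the matrix average should be 313, the first output cell should be -312, and the last should be 312. The sketch below reproduces that arithmetic in C so the values printed by the package can be checked independently; mean_of_sequence() and demo_expected() are hypothetical names used only for this illustration.

```c
#include <stdio.h>

/* Mean of the sequence 1, 2, ..., n, computed the way the C routine
   computes its average: sum every cell, then divide by the cell count. */
double mean_of_sequence(int n)
{
    double sum = 0.0;
    int i;

    for (i = 1; i <= n; i++)
        sum += (double)i;
    return sum / n;
}

/* Hypothetical driver: print the values to compare against the
   output of the test function for a 25 x 25 matrix. */
int demo_expected(void)
{
    int n = 25 * 25;
    double avg = mean_of_sequence(n);

    printf("expected average: %.1f\n", avg);
    printf("expected first and last output cells: %.1f, %.1f\n",
           1.0 - avg, (double)n - avg);
    return 0;
}
```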

2) Prepare the required components for the installable R package

Chapter 1 of Writing R Extensions (Creating R packages) describes the structure of R packages, and we recommend that you read this chapter for more details regarding the file components of R packages. Note that the specific files and folders comprising R packages will vary; however, all packages must have a top level folder with the same name as the package, and a DESCRIPTION file.

Here is a brief description of the file structure of the CalcMatrixAvg package presented here:

The file structure for this particular package consists of a top-level folder named CalcMatrixAvg (the same as the package name); within this top folder are two files and three folders:

  • man: Contains one standard 'help' file for each function in the package; the file is displayed when the user enters help(function_name) or ?function_name at the R command line. Click here to see the man page file for the CalcMatrixAvg function in the CalcMatrixAvg package.

  • R: The R script containing the R 'wrapper' function and any other functions added by the package.

  • src: The C source code file(s) and makefile.

  • DESCRIPTION: Contains basic package information in a standardized format.

  • NAMESPACE: Specifies which package variables should be exported to make them available to package users,
    and which should be imported from other packages.

The R package documentation and description files

Once the R and C package components are complete, create the NAMESPACE and DESCRIPTION files.
Here is the DESCRIPTION file for the package described here:

Package: CalcMatrixAvg
Version: 0.0
Date: 2007-08-06
Title: Sample R package demonstrates the R / C Language interface
Author: Rick Reeves (reeves@nceas.ucsb.edu)
Maintainer: Rick Reeves <reeves@nceas.ucsb.edu>
Depends: R (>= 2.0.1)
Description: CalcMatrixAvg passes a two-dimensional matrix
    into a C function via the .C() interface function, where
    the average of the matrix is calculated.
    This case demonstrates the details of the R / C function interface.
License: GPL (version 2 or later)

Here is the NAMESPACE file:

useDynLib(CalcMatrixAvg)
export(TestWrapper)
export(CalcMatrixAvgR)

The useDynLib statement tells R to load the package's compiled shared object, which contains the C function CalcMatrixAvg(), when the package is loaded.
The export statements expose the R functions TestWrapper and CalcMatrixAvgR to the R command interface.

3) Configure your computer with the appropriate R software development tools

In the process of preparing this use case, we configured both Ubuntu Linux and Windows XP platforms for constructing R packages:

Ubuntu Linux package development: We used the Synaptic Package Manager, part of the Ubuntu Linux system administration utilities, to install the GCC software development tools.

Windows XP package development: On the Windows XP platform, we downloaded and installed the Rtools development tools, distributed as a standard Windows installation package. On the Ubuntu Linux platform, we downloaded and installed the Linux version of the Rtools development tools.

4) Construct, install, and test the new R package

Use the R commands R CMD check, R CMD build, and install.packages() to create, install, and test the CalcMatrixAvg R package.
R CMD commands are executed from the UNIX command line on systems that have the R software installed.

  • R CMD REMOVE Removes an installed R package from the local R configuration.

  • R CMD check validates all of the components within the R package for internal consistency with the R system.

  • R CMD build constructs the installable R package archive (for packages validated by R CMD check) in the form of a compressed '.tar.gz' archive.

  • The install.packages() R function installs the package in the user's local R environment. Once installed, the package
    functions are available to future R sessions on the local machine after the package is loaded using the library() function.

Here are the required UNIX and R commands:

% R CMD REMOVE CalcMatrixAvg
% R CMD check CalcMatrixAvg
% R CMD build CalcMatrixAvg
% R
> install.packages("CalcMatrixAvg_0.0.tar.gz", repos=NULL)
> library(CalcMatrixAvg)
> debug(TestWrapper)
> TestWrapper()
# The function is now running in browser mode: use the 'n' command to step through the function, evaluate variables, etc.
# Enter 'Q' to exit browser mode.

Learning More:

C language programming techniques : Creating dynamic two-dimensional arrays

Interfacing high-level programming language routines with R: Using External Compilers with R

Building Microsoft Windows Versions of R and R packages under Intel Linux: A Package Developer's Tool

Point of Contact for this Use Case: Rick Reeves, NCEAS Scientific Programmer reeves@nceas.ucsb.edu
This Use Case was compiled March, 2008.
