This case demonstrates, step by step, how to create, test, and distribute a complete R package that integrates C language source code into the R programming environment.
Download examples for this use case:
Click here to download the complete R package source and directory structure for this Use Case (Windows and Linux)
Download the installable R package demonstrated here: Linux OR Windows XP platforms.
Getting Started:
Many scientists using R software would like to bring often-used algorithms written in high-level programming languages (e.g., C or Fortran) into the R environment, so that they can combine them with R's data management, computational, and graphical capabilities.
These R users have two options: The first is to re-code the algorithm as an R script (implemented as an R function), and call the function from the R command line. However, substantial time and effort may be required to re-code the algorithm in R, and then validate the answers given by the new R script. The second option is to encapsulate the high-level language 'legacy' code within a new R package, and then add the new package to their R programming environment.
Assume that you have a C function that you wish to call from an R script. Specifically, you wish to pass data from R into the C function, and have the function return the results to the R environment through the function call. For example:
> Results = ComputeFunctionR(ScalarParam, InputVector)
Where:
ScalarParam: A single input parameter
InputVector: A one-dimensional array
Results: Data generated by the C function, and returned to the R environment.
This function passes the two argument list parameters into the C function, receives the results matrix returned, and passes it back through the R command line, storing the returned matrix in Results. Note that Results could be a scalar (single) number, a vector/matrix, or a character string.
The best way to do this is to extend your R Programming Environment by building and installing an R Package. Once developed in accordance with the specifications built into the R software architecture, this package can be installed on any computer's R environment. Two R programming environment features support this: 1) An Applications Programming Interface (API) that enables R functions to call functions in C / C++ and Fortran 77; 2) software development tools for creating modular R packages for multiple computing platforms that incorporate the high-level language components.
This Use Case creates an R function that encapsulates a C-language function, and then demonstrates how to produce installable R packages for the Linux and Microsoft Windows XP computing platforms.
Two R documents contain information that will help you expand this Use Case into an implementation of your own high-level language function as an R package:
Writing R Extensions describes in detail the development of installable R packages; in addition, it describes the R language interface to other functions written in C / C++ and Fortran.
R Installation and Administration describes the compilation and installation of the R environment, R help documentation, and R packages on different computing platforms. It includes detailed instructions for configuring the R-compatible software development environments required to construct R Packages for both Linux and Microsoft Windows platforms.
Creating a New R Package: Four Steps
- Create the 'R-callable' version of your high-level language function by creating a new argument list for the function.
- Prepare the required components of an installable R package: /src and /R folders containing the C code and R script developed in Step 1, a /man folder containing a standard 'help' file for each R function in the package, and DESCRIPTION and NAMESPACE files used by the R package-building software.
- Configure your computer with the appropriate R package software development tools.
- Construct the R package using the R commands R CMD check and R CMD build (R CMD INSTALL --build on Windows XP platforms).
Once the package is constructed, you can install and test it within R on compatible computer platforms using the R install.packages() function.
Tutorial: Construct and Use an R package interface to a C function
Here are the steps in the R package development process, using a C language function and the R development tools under the Ubuntu Linux environment. The process steps should be similar on most UNIX environments. In order to create this package on your own Linux/Unix system, you will need to install the R programming environment, a software development toolkit, and the toolkit's supporting documentation.
On Linux systems, the GNU Compiler Collection (GCC) and its supporting software development tools are the correct choice for constructing R packages.
To create Windows-compatible R packages under the Windows XP environment, you will need to install the appropriate software development tools. Consult Writing R Extensions, which recommends the Rtools development package. To create Windows-compatible R packages under the Linux environment, consider installing the Linux version of the Rtools toolkit provided here.
Here are the four steps:
1) Create high-level language function and interface to the R Programming Environment
The C function and its make file
Following is the C function CalcMatrixAvg() that we wish to call from the R command line:
#include <stdio.h>
#include <stdlib.h>

//
// this is the routine that we will call from the R program
//
void CalcMatrixAvg(int *iNRow, int *iNRowLen, int *iNCol, int *iNColLen,
                   double *dAverage, int *idAvgLen,
                   const double *InMat, int *iInMatLen, double *RetOutMat)
{
    double dSum = 0.0;
    int iCtr = 0, jCtr = 0;
    double *OutMat = NULL;
    //
    // dynamically allocate a local results matrix, making it
    // the correct size for this specific call. This is why
    // we use a one-dimensional vector, and two-dimensional
    // pointer notation to simulate a two-dimensional matrix.
    // calloc() also initializes the local matrix to zero.
    //
    OutMat = calloc((*iNRow) * (*iNCol), sizeof(double));
    if (OutMat == NULL)
        return;
    //
    // print the input matrix and accumulate the sum of its cells.
    // element (row, col) lives at offset (row * number of columns) + col.
    //
    for (iCtr = 0; iCtr < *iNRow; iCtr++)
    {
        for (jCtr = 0; jCtr < *iNCol; jCtr++)
        {
            printf("%6.2lf ", *(InMat + ((iCtr * *iNCol) + jCtr)));
            dSum = dSum + *(InMat + ((iCtr * *iNCol) + jCtr));
        }
        printf("\n");
    }
    *dAverage = dSum / (*iInMatLen);
    //
    // replace each element with the difference between the element and the average.
    //
    for (iCtr = 0; iCtr < *iNRow; iCtr++)
    {
        for (jCtr = 0; jCtr < *iNCol; jCtr++)
        {
            *(OutMat + ((iCtr * *iNCol) + jCtr)) =
                *(InMat + ((iCtr * *iNCol) + jCtr)) - *dAverage;
        }
    }
    for (iCtr = 0; iCtr < *iNRow; iCtr++)
    {
        for (jCtr = 0; jCtr < *iNCol; jCtr++)
        {
            printf("%6.2lf ", *(OutMat + ((iCtr * *iNCol) + jCtr)));
        }
        printf("\n");
    }
    //
    // Copy the output matrix from the local, dynamic results vector
    // to the vector located in the calling (R) function memory space.
    //
    for (iCtr = 0; iCtr < *iInMatLen; iCtr++)
        *(RetOutMat + iCtr) = *(OutMat + iCtr);
    //
    // de-allocate the local results vector.
    //
    free(OutMat);
}
The last argument in the C function, RetOutMat, transfers data back to the R environment through the .C() interface (see Step 3).
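The C function above stores its matrix in a one-dimensional vector and computes each element's offset by hand. The following is a minimal sketch of that indexing, assuming the row-by-row (row-major) layout used in the listing. The helpers cell_index and matrix_mean are hypothetical names introduced here for illustration; they are not part of the CalcMatrixAvg package.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not in the original package): offset of
   element (row, col) in a flat vector that stores a matrix row by
   row. The stride is the number of COLUMNS, not the number of rows. */
size_t cell_index(size_t row, size_t col, size_t ncol)
{
    assert(col < ncol);
    return row * ncol + col;
}

/* Hypothetical helper: mean of all nrow * ncol cells of a flat,
   row-major matrix. Assumes the matrix has at least one cell. */
double matrix_mean(const double *mat, size_t nrow, size_t ncol)
{
    double sum = 0.0;

    assert(mat != NULL && nrow > 0 && ncol > 0);
    for (size_t i = 0; i < nrow; i++)
        for (size_t j = 0; j < ncol; j++)
            sum += mat[cell_index(i, j, ncol)];
    return sum / (double)(nrow * ncol);
}
```

For a square matrix, such as the 25 x 25 test case in this use case, using the row count or the column count as the stride gives the same offsets; the two differ as soon as the matrix is not square. Note also that R stores its own matrix objects column by column, so the row-major convention here applies to the plain vector that this use case passes in, not to an R matrix converted with as.numeric().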
Here is the makefile that produces the R shared library, compatible with the GNU cc compiler; the makefile is included in the package source for this use case.
# Makefile for building the C language shared library for the CalcMatrixAvg demonstration package.
C = gcc
OPTS = -c -fPIC
LOADER = gcc
OBJECTS = CalcMatrixAvg.o

CalcMatrixAvg.so: $(OBJECTS)
	R CMD SHLIB -o CalcMatrixAvg.so $(OBJECTS)

.c.o: ; $(C) $(OPTS) -c $<

clean:
	-rm *.o *.so
The R function interface to the CalcMatrixAvg() function
The .C() function is R's interface 'wrapper' to external C functions. Let's focus on the C and R function call interfaces for this example:
C function prototype:

//
// This is the routine that we will call in the R program
//
void CalcMatrixAvg(int *iNRow, int *iNRowLen, int *iNCol, int *iNColLen,
                   double *dAverage, int *idAvgLen,
                   const double *InMat, int *iInMatLen, double *RetOutMat)

R language .C() function interface:

#
# 'Wrapping' the C function call in the R language .C interface function
#
RetVec2 = .C("CalcMatrixAvg",
             as.integer(iMatRow), as.integer(intLen),
             as.integer(iMatCol), as.integer(intLen),
             as.numeric(dAvg), as.integer(numericLen),
             as.numeric(iInMat), as.integer(iInMatLen),
             iRetVec = numeric(iInMatLen))$iRetVec
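Note that in this example the length of each original C function argument is passed immediately following the argument. The C function receives only a pointer to each argument, so it cannot determine the length of a vector on its own; the length has to be supplied by the caller.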
Note three characteristics of the C function prototype:
- All of the C function arguments are passed by reference, using pointers.
- The addition to the C function of parameters *iNRowLen, *iNColLen, *idAvgLen, and *iInMatLen.
- The last C function argument, double *RetOutMat, passes information (in this case, a vector of floating point numbers) back into the calling R function.
Here is the complete R script that calls the C routine.
CalcMatrixAvgR <- function(iMatRow, iMatCol, dAvg, iInMat, iInMatLen)
{
#
# Wrapper/interface to CalcMatrixAvg C function
# that computes the average of a two-dimensional matrix
# passed in as a one-dimensional vector, then creates
# a new matrix containing as entries the difference
# between the input cell value and the matrix average.
# Simple routine for use case demonstration.
#
# Programmer: Rick Reeves, NCEAS Scientific Programmer
# August 1, 2007
#
# Arguments:
#   iMatRow   : Number of Rows, incoming matrix
#   iMatCol   : Number of Cols, incoming matrix
#   dAvg      : Average of the incoming matrix
#   iInMat    : Input Matrix
#   iInMatLen : Number of cells (rows * cols) in the input matrix.
#
# note: following the convention used in this use case, we pass
# the length of each parameter into the C call,
# immediately following each parameter.
#
  iRetVec = vector(mode="numeric", length = iInMatLen)
#
  numericLen = 1
  intLen = 1
#
# Set up a call to the C routine CalcMatrixAvg() using
# the R .C() interface function. The C routine dynamically
# allocates a local results vector sized to match the
# dimensions (iMatRow, iMatCol) specified here.
#
  RetVec2 = .C("CalcMatrixAvg",
               as.integer(iMatRow), as.integer(intLen),
               as.integer(iMatCol), as.integer(intLen),
               as.numeric(dAvg), as.integer(numericLen),
               as.numeric(iInMat), as.integer(iInMatLen),
               iRetVec = numeric(iInMatLen))$iRetVec
#
# the answer is contained in vector RetVec2.
# copy the answer back into the local vector iRetVec
#
  iRetVec[1:iInMatLen] <- RetVec2[1:iInMatLen]
  print("back from CalcMatAvg C function call. Hit key....")
  return(iRetVec)
}
Here is a short test R function that calls CalcMatrixAvgR():
# Test of CalcMatAvg
#
# global vector dimension parameters
# these must match the maximum dimensions MAX_ROW and MAX_COLUMN
# in the trialswapR.c file.
#
iMaxMatRow = 25
iMaxMatCol = 25
#
TestWrapper <- function()
{
#
  library(CalcMatrixAvg)
#
  iOutMatLen = iMaxMatRow * iMaxMatCol
#
# generate vectors that we will treat as a two dimensional array,
# initialize the variables to recognizable values for debugging.
#
  iInMat = vector(mode="numeric", length = iOutMatLen)
  iInMat[1:iOutMatLen] = seq(1, iOutMatLen)
  iOutMat = vector(mode="numeric", length = iOutMatLen)
  iOutMat[1:iOutMatLen] = -1.99
  dAverage = -999.0
#
# Call the wrapper function that manages communications with the C function.
#
  iOutMat = CalcMatrixAvgR(iMaxMatRow, iMaxMatCol, dAverage, iInMat, iOutMatLen)
#
  print("Back from CalcMatrixAverage! hit key...")
  browser()
#
# done.
#
}
2) Prepare the required components for the installable R package
Chapter 1 of Writing R Extensions (Creating R packages) describes the structure of R packages, and we recommend that you read this chapter for more details regarding the file components of R packages. Note that the specific files and folders comprising R packages will vary; however, all packages must have a top level folder with the same name as the package, and a DESCRIPTION file.
The file structure for this particular package consists of a top-level folder named CalcMatrixAvg (the same as the package name); within this top folder are two files and three folders:
- man: Contains one standard 'help' file for each separate function in the package; the file is displayed when the user enters help(function_name) or ?function_name at the R command line. Click here to see the man page file for the CalcMatrixAvg function in the CalcMatrixAvg package.
- R: The R script containing the R 'wrapper' function and any other functions added by the package.
- src: The C source code file(s) and makefile.
- DESCRIPTION: Contains basic package information in a standardized format.
- NAMESPACE: Specifies which package variables should be exported to make them available to package users, and which should be imported from other packages.
The R package documentation and description files
Once the R and C package components are complete, create the NAMESPACE and DESCRIPTION files.
Here is the DESCRIPTION file for the package described here:
Package: CalcMatrixAvg
Version: 0.0
Date: 2007-08-06
Title: Sample R package demonstrates the R / C Language interface
Author: Rick Reeves (reeves [at] nceas.ucsb.edu)
Maintainer: Rick Reeves (reeves [at] nceas.ucsb.edu)
Depends: R (>= 2.0.1)
Description: CalcMatrixAverage passes a two-dimensional matrix into a C function
    via the .C() interface function, where the average of the matrix is calculated.
    This case demonstrates the details of the R / C function interface.
License: GPL (version 2 or later)
Here is the NAMESPACE file:
useDynLib(CalcMatrixAvg)
export(TestWrapper)
export(CalcMatrixAvgR)
The useDynLib statement tells R to load the package's dynamically loaded shared object, which contains the compiled C function CalcMatrixAvg(), when the package is loaded.
The export statements expose the R functions TestWrapper and CalcMatrixAvgR to the R command interface.
3) Configure your computer with the appropriate R software development tools
In the process of preparing this use case, we configured both (Ubuntu) Linux and Windows XP platforms for constructing R packages:
Ubuntu Linux package development: We used the Synaptic Package Manager, part of the Ubuntu Linux Systems Administration utilities, to install the GCC software development tools.
Windows XP package development: On the Windows XP platform, we downloaded and installed the Rtools development tools, distributed as a standard Windows installation package.
4) Construct, install, and test the new R package
Use the R commands R CMD check, R CMD build, and install.packages() to create, install, and test the CalcMatrixAvg R package.
The R command R CMD REMOVE removes an installed R package from the active R environment.
R CMD commands are executed from the UNIX command line on systems that have the R software installed.
- R CMD REMOVE Removes an installed R package from the local R configuration.
- R CMD check validates all of the components within the R package for internal consistency with the R system.
- R CMD build constructs the installable R package archive (for packages validated by R CMD check) in the form of a compressed '.tar.gz' archive.
- NOTE: On MS Windows platforms, use the command: R CMD INSTALL --build to construct and install a Windows-compatible compressed '.zip' installation archive.
- The install.packages() R function installs the package in the user's local R environment. Once installed, the package functions are available to future R sessions on the local machine after the package is loaded using the library() function.
Here are the required command line and R commands for the UNIX operating system:
(within the MS Windows environment, open an MS/DOS Command Window and enter these commands, referring to the above NOTE:)
% cd (to the parent directory of the 'CalcMatrixAvg' folder containing the R package components)
% R CMD REMOVE CalcMatrixAvg
% R CMD check CalcMatrixAvg
% R CMD build CalcMatrixAvg
% R
> install.packages(repos=NULL,"CalcMatrixAvg_0.0.tar.gz")
> library(CalcMatrixAvg)
> debug(TestWrapper)
> TestWrapper()
The function is now operating in browse mode: use the 'n' command to step through the function, evaluate variables, etc.
Enter 'Q' to exit browse mode.
Learning More:
C language programming techniques : Creating dynamic two-dimensional arrays
Interfacing high-level programming language routines with R: Using External Compilers with R
Building Microsoft Windows Versions of R and R packages under Intel Linux: A Package Developer's Tool
Point of Contact for this Use Case: reeves [at] nceas.ucsb.edu (Rick Reeves), NCEAS Scientific Programmer
This Use Case was compiled March, 2008.