Sunday, January 11, 2009

Powering statistical applications with R

R is a software environment for statistical computing and graphics. I am currently using (and still exploring) R with an ad-serving product I work on, and it has been fabulously useful. Our main application is written in Java, so we needed to integrate the two, that is, call R from within Java. Here are the options I evaluated and my experience with them.

Options
  • Rserve: Use R as a remote server over TCP/IP. Available from RForge.
  • JRI: Run R inside Java programs using JNI linkages.
  • SJava: Similar to JRI; it also runs R inside Java programs using JNI linkages. However, the library is old and no longer maintained. I gave up midway through building the packages.
  • Rweb: Use R over HTTP. Rweb is a set of HTTP CGI scripts in Perl that take a set of R commands and execute them in an R instance, redirecting graphical output to a PostScript file using the R postscript command. When execution completes, it generates an HTML response containing the text output of R and the PostScript files converted to images. Though it was not of much use for my current purpose, the idea of using R to power nice web applications was interesting.

Using Rserve

Rserve sets up a listening TCP server that hosts the R engine. Applications can connect to it and issue commands using client libraries available in different languages. More details at http://www.rforge.net/Rserve/doc.html.

On Unix, Rserve forks a new process for every connection, so multiple connections can be processed in parallel. Unfortunately, on Windows Rserve does not fork a new process per connection, so it cannot serve connections in parallel. Suggested workarounds include starting multiple Rserve processes on different ports, but the cleanest implementation would be to use CreateProcess to start a new process for every connection and WSADuplicateSocket to pass the accepted socket to the new process. I'll try to implement it when I get some spare time.
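
As a rough illustration of the multiple-ports workaround, a Java client could rotate over a fixed list of ports, assuming one Rserve process has been started on each. The class name RservePortPool and the extra ports 6313 and 6314 are made up for this sketch; only the rotation logic is shown, not the connection itself.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hands out Rserve ports in round-robin order; one Rserve process is assumed per port. */
final class RservePortPool {
    private final int[] ports;
    private final AtomicInteger next = new AtomicInteger();

    RservePortPool(int... ports) {
        if (ports.length == 0) {
            throw new IllegalArgumentException("at least one port is required");
        }
        this.ports = ports.clone();
    }

    /** Returns the port the next connection should use; safe to call from several threads. */
    int nextPort() {
        // floorMod keeps the index valid even after the counter overflows.
        int i = Math.floorMod(next.getAndIncrement(), ports.length);
        return ports[i];
    }

    public static void main(String[] args) {
        // Hypothetical setup: three Rserve processes started on consecutive ports.
        RservePortPool pool = new RservePortPool(6312, 6313, 6314);
        for (int i = 0; i < 5; i++) {
            System.out.println("next connection goes to port " + pool.nextPort());
        }
    }
}
```

Each worker thread would ask the pool for a port before opening its connection, so the load spreads evenly across the Rserve processes.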

To set up Rserve:
  • Install the Rserve package in R.
  • Create a config file as below:
    workdir E:\Libraries\R\R-2.8.0\RServWork
    pwdfile E:\Libraries\R\R-2.8.0\bin\rserv_pwd.txt
    remote disable
    auth required
    plaintext enable
    fileio disable
    port 6312
    maxinbuf 262144
    maxsendbuf 262144
  • Setting auth to required necessitates a login, and a password file must be specified with pwdfile. The plaintext parameter tells Rserve whether clients may send the password in plain text or must crypt it first. Rserve passes this information (the authorization requirement and the accepted password format) to the client upon connection. The client application should check the authentication-required flag and, if it is set, call the login method with a user ID and password. The login method looks at the plaintext attribute to decide whether to pass the plain password or crypt it before sending it to the server.
  • Setting remote to disable does not prevent Rserve from listening on all interfaces, but it rejects any connection that does not come from the same machine.
  • Build the REngine client from sources.
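
To make the authentication notes above more concrete, here is a small sketch of how a client might read the flags that Rserve sends on connection. It assumes the header layout as I understand it from the Rserve protocol documentation: a 32-byte ID string starting with "Rsrv", followed by a 4-character version, a 4-character protocol name, and then 4-character attributes such as "ARpt" (authorization required, plain text accepted) and "ARuc" (authorization required, unix crypt accepted). The class name RserveHandshake is hypothetical, and the layout should be checked against the Rserve version in use; the REngine client already does this parsing internally.

```java
/** Parses the 32-byte ID string that Rserve is assumed to send right after a client connects. */
final class RserveHandshake {
    final String version;       // e.g. "0103"
    final String protocol;      // e.g. "QAP1"
    final boolean authRequired; // any "AR.." attribute was present
    final boolean plainTextOk;  // "ARpt": plain-text password accepted
    final boolean cryptOk;      // "ARuc": unix-crypt password accepted

    private RserveHandshake(String version, String protocol,
                            boolean authRequired, boolean plainTextOk, boolean cryptOk) {
        this.version = version;
        this.protocol = protocol;
        this.authRequired = authRequired;
        this.plainTextOk = plainTextOk;
        this.cryptOk = cryptOk;
    }

    static RserveHandshake parse(String id) {
        if (id == null || id.length() != 32 || !id.startsWith("Rsrv")) {
            throw new IllegalArgumentException("not an Rserve ID string");
        }
        boolean auth = false;
        boolean plain = false;
        boolean crypt = false;
        // Attributes follow the 12-character prefix in 4-character groups;
        // groups this sketch does not know about are ignored.
        for (int i = 12; i < 32; i += 4) {
            String attr = id.substring(i, i + 4);
            if (attr.startsWith("AR")) auth = true;
            if (attr.equals("ARpt")) plain = true;
            if (attr.equals("ARuc")) crypt = true;
        }
        return new RserveHandshake(id.substring(4, 8), id.substring(8, 12), auth, plain, crypt);
    }

    public static void main(String[] args) {
        // Hand-built sample header, padded to 32 characters.
        String sample = "Rsrv0103QAP1" + "ARpt" + "----" + "----" + "----" + "----";
        RserveHandshake h = parse(sample);
        System.out.println("auth required: " + h.authRequired
                + ", plain text accepted: " + h.plainTextOk);
    }
}
```

With the config shown earlier (auth required, plaintext enable), a client following this logic would log in and could send the password as plain text.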

Using JRI


There are two ways to get the libraries for use from Java:
  • Install the rJava package in R from CRAN. The JRI libraries are then available in the library/rJava/jri folder of the R installation.
  • Build JRI from source.
To build JRI from source, you need to set up the R build environment. On Windows this meant Cygwin, MinGW and Rtools. Once the environment is set up:
  • Download the JRI sources and follow the instructions in the JRI/README file to build JRI.jar and jri.dll
  • Run the test files in JRI/examples to verify
For simple stand-alone applications, the JRI model might be the easier option. However, one should be cautious when using native libraries in Java, and R itself is single-threaded, so calls into the embedded engine cannot run concurrently. For my purpose, Rserve seemed to be the cleanest option.
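
One way to respect the threading caveat when embedding R is to funnel every evaluation through a single dedicated thread. The sketch below shows only that pattern; the class name SingleThreadedR is hypothetical, and the Function passed to it is a stand-in for whatever call actually evaluates a command in the embedded engine, not the JRI API itself.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Serializes all evaluations onto one worker thread, whichever thread the caller is on. */
final class SingleThreadedR implements AutoCloseable {
    private final ExecutorService rThread = Executors.newSingleThreadExecutor();
    private final Function<String, String> engine; // stand-in for the real evaluation call

    SingleThreadedR(Function<String, String> engine) {
        this.engine = engine;
    }

    /** Runs the command on the dedicated thread and blocks until the result is ready. */
    String eval(String command) {
        Callable<String> task = () -> engine.apply(command);
        try {
            return rThread.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for R", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("evaluation failed", e.getCause());
        }
    }

    @Override
    public void close() {
        rThread.shutdown();
    }

    public static void main(String[] args) {
        // Fake engine that only reports which thread it ran on.
        try (SingleThreadedR r = new SingleThreadedR(
                cmd -> cmd + " ran on " + Thread.currentThread().getName())) {
            System.out.println(r.eval("mean(c(1, 2, 3))"));
            System.out.println(r.eval("sd(c(1, 2, 3))"));
        }
    }
}
```

Callers on any thread can use eval safely, at the cost of evaluations running one at a time, which is part of why a separate Rserve process per connection looked more attractive for a parallel workload.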

