R6 Generator Maven Plugin: Key features

Features

The FeatureTest Java class is designed to showcase the main aspects of the R6 Generator Maven Plugin, and serves as a quick guide for Java programmers wishing to use the plugin. The source of the FeatureTest class is shown below. The Java annotations @RClass and @RMethod tag a class, and specific methods in that class, for use in R. The code structure, parameters and return values of the tagged classes and methods are used to create an equivalent R6 class structure in an R library. In general, Javadoc comments and tags are used to document the library. Where there are no applicable Javadoc tags, specific fields in the @RClass and @RMethod annotations can be used to specify needed imports and suggested R package dependencies, and to provide example and test code if needed.

/**
 * A test of the R6 generator templating
 * 
 * The feature test should allow mathjax in javadoc
 * 
 * $$e = mc^2$$
 * 
 * 
 * this is a details comment 
 * @author Rob Challen rob@terminological.co.uk 0000-0002-5504-7768
 * 
 */
@RClass(
        imports = {"ggplot2","readr","dplyr","tibble"},
        suggests = {"roxygen2","devtools","here","tidyverse"},
        exampleSetup = {
                "J = JavaApi$get()"
        },
        testSetup = {
                "J = JavaApi$get()"
        }
    )
public class FeatureTest {

    String message;
    static Logger log = LoggerFactory.getLogger(FeatureTest.class);
    
    /**
     * A maximum of one constructor of any signature can be used. <br>
     * 
     * If different constructors are needed then they may be used but not 
     * included in the R Api (i.e. not annotated with @RMethod.) <br>
     * 
     * Static factory methods can be used instead.
     * @param logMessage - a message which will be logged
     */
    @RMethod(examples = {
            "minExample = J$FeatureTest$new('Hello from Java constructor!')"
        })
    public FeatureTest(String logMessage) {
        log.info(logMessage);
        this.message = logMessage;
    }
    
...
    
    @RFinalize
    public void close() {
        log.info("The FeatureTest finalizer is called when the R6 object goes out of scope");
        throw new RuntimeException("Errors from the finalizer are ignored");
    }
    
    @RMethod
    public static RCharacter collider(RCharacter message1, RCharacter message2) {
        return RConverter.convert("feature test: "+message1.toString()+message2.toString());
    }
}
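The tagging mechanism can be illustrated with a small reflection-based sketch. It is hypothetical: RClassSketch and RMethodSketch are stand-ins defined here (not the plugin's @RClass and @RMethod), and the plugin's own code generation is not shown in this document and may work quite differently. The sketch only lists the public methods of a tagged class that carry the method annotation, which is the kind of information a generator needs in order to emit matching R6 methods.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class AnnotationScanner {
    /** Returns the sorted names of public @RMethodSketch methods, or an empty list if the class is untagged. */
    static List<String> taggedMethods(Class<?> cls) {
        List<String> names = new ArrayList<>();
        if (!cls.isAnnotationPresent(RClassSketch.class)) return names;
        for (Method m : cls.getDeclaredMethods()) {
            if (Modifier.isPublic(m.getModifiers()) && m.isAnnotationPresent(RMethodSketch.class)) {
                names.add(m.getName());
            }
        }
        // reflection does not guarantee method order, so sort for a stable result
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(taggedMethods(SketchFeature.class)); // [doHelloWorld, doSum]
    }
}

// Hypothetical stand-ins for @RClass and @RMethod; RUNTIME retention makes them visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface RClassSketch {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface RMethodSketch {}

@RClassSketch
class SketchFeature {
    @RMethodSketch public String doHelloWorld() { return "Hello world from Java!"; }
    @RMethodSketch public double doSum(double a, double b) { return a + b; }
    public void notExported() {}  // untagged, so not part of the R API
}
```

Only the tagged methods are reported; notExported() is skipped, just as untagged methods stay out of the generated R API.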

The packaging of this class into an R library is configured through Maven. The package name (in this case testRapi), the directory of the library (in this example ~/Git/r6-generator-maven-plugin-test/r-library/) and other metadata such as author and license details are defined in the Maven plugin configuration (in a file named pom.xml), which is described in detail elsewhere. For the purposes of this guide we assume the Java code has been compiled, generating the testRapi R package, which is ready for installation.

Installation and instantiation

The generated R package can be installed into R in more or less the same way as any other R library, depending on how it is deployed. Typical scenarios are pushing the whole Java project to GitHub and installing it in R with devtools::install_github(), installing directly from the local filesystem with devtools::install(), or submitting the R library sub-directory as a project to CRAN and installing from there with install.packages().

# not run
# remove installed versions
try(detach("package:testRapi", unload = TRUE),silent = TRUE)
remove.packages("testRapi")
rm(list = ls())

Restarting R may also be required if a Java VM was already running.

# locally compiled
devtools::install("~/Git/r6-generator-docs", upgrade = "never")
# pushed to github
# devtools::install_github("terminological/r6-generator-docs", upgrade = "never")
# submitted to CRAN
# install.packages("testRapi")

The R6 API to the Java classes requires a running instance of the Java Virtual Machine and the JNI bridge provided by rJava. It also requires the Java classpath dependencies to be loaded and application logging to be initialised. This is all managed by a specific generated R6 class called JavaApi, and creating a singleton instance of it is the first step to using the library in R. In these examples the singleton instance J is referred to as the “root” of the API, as all the functions of the API stem from it.

J = testRapi::JavaApi$get(logLevel = "WARN")
J$changeLogLevel("DEBUG")
J$.log$debug("prove the logger is working and outputting debug statements...")
J$printMessages()

## prove the logger is working and outputting debug statements...
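The root object is an R6 singleton, but the pattern itself is easy to show in Java. The sketch below is a hypothetical illustration only (ApiRootSketch, registerClass and lookup are invented names, not part of the generated JavaApi): every call to get() returns the same lazily created instance, so everything registered on it is reachable from one place, much as J$FeatureTest is reached through J.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ApiRootSketch {
    // Holder idiom: the JVM creates INSTANCE lazily and thread-safely on first access to Holder.
    private static final class Holder {
        static final ApiRootSketch INSTANCE = new ApiRootSketch();
    }

    private final Map<String, Class<?>> classes = new ConcurrentHashMap<>();

    private ApiRootSketch() {}  // no public constructor: callers must use get()

    static ApiRootSketch get() {
        return Holder.INSTANCE;
    }

    /** Registers a class under a name, loosely mimicking access such as J$FeatureTest. */
    void registerClass(String name, Class<?> cls) {
        classes.put(name, cls);
    }

    Class<?> lookup(String name) {
        return classes.get(name);
    }

    public static void main(String[] args) {
        ApiRootSketch j = ApiRootSketch.get();
        j.registerClass("StringBuilder", StringBuilder.class);
        // a second get() sees the same registry because it returns the same object
        System.out.println(ApiRootSketch.get().lookup("StringBuilder").getSimpleName()); // StringBuilder
    }
}
```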

Using the FeatureTest class above requires creating a new instance of the class. This is done through the root of the API as follows; the FeatureTest constructor simply logs the value of the logMessage parameter.

feat1 = J$FeatureTest$new(logMessage = "Hello world. Creating a new object")

## Hello world. Creating a new object

Predictable data type conversion

    
    /**
     * A hello world function
     * 
     * More detailed description
     * 
     * @return this java method returns a String
     */
    @RMethod(examples = {
            "minExample = J$FeatureTest$new('Hello, R World!')",
            "minExample$doHelloWorld()"
        })
    public RCharacter doHelloWorld() {
        return RConverter.convert("Hello world from Java!");
    }
        
    /**
     * Add some numbers (1).
     * 
     * The doSum function description = it adds two numerics
     * @param a the A parameter, can be NA
     * @param b the B parameter
     * @return A+B of course, NAs in inputs are converted to null in Java. This catches the resulting NPE in java idiom and returns an explicit NA. 
     * This only matters if you care about the difference between NA_real_ and NaN in R. 
     */
    @RMethod(tests = {
            "minExample = J$FeatureTest$new('Hello from Java constructor!')",
            "result = minExample$doSum(2,7.5)",
            "testthat::expect_equal(result,9.5)"
    })
    public RNumeric doSum(RNumeric a, RNumeric b) {
        try {
            return RConverter.convert(a.get()+b.get());
        } catch (NullPointerException e) {
            log.info("Java threw a NPE - could have had a NA input?");
            return RNumeric.NA;
        }
    }
    
    
    /**
     * Adds some numbers
     * 
     * Do sum 2 uses native ints rather than RNumerics
     * It should throw an error if given something that cannot be coerced to an integer. 
     * This also demonstrates the use of the `@RDefault` annotation
     * @param a the A parameter
     * @param b the B parameter
     * @return A+B of course
     */
    @RMethod
    public int doSum2(int a, @RDefault(rCode = "10") int b) {
        return a+b;
    }
    
    /**
     * Static methods are also supported. 
     * 
     * These are accessed through the
     * root of the R api, and as a functional interface
     * 
     * @param message a message
     */
    @RMethod(examples = {
            "J = JavaApi$get()",
            "J$FeatureTest$demoStatic('Ola, el mundo')",
            "demo_static('Bonjour, le monde')"
    })
    public static void demoStatic(String message) {
        log.info(message);
    }

The FeatureTest.doHelloWorld() method takes no arguments and returns a value to R. A detailed discussion of R and Java data types is to be found elsewhere, but our approach has involved developing a specific set of Java datatypes that have close relationships to the native R datatypes. This enables loss-less round tripping of data from R to Java and back again, but requires the mapping of Java data types to R. This is handled by the uk.co.terminological.rjava.RConverter class, which provides a range of datatype transformers, and the uk.co.terminological.rjava.types.* classes, which specify Java equivalents to R data types. These are needed because R’s dynamic datatypes contain concepts which are not readily represented in the primitive Java datatypes that are transferred across the JNI. Some marshalling is therefore required on both sides to ensure translation is 100% accurate, for example converting R logical vectors containing NA values to Java List<Boolean> via JNI primitive arrays, or supporting typed NA values (e.g. NA_integer_ versus NA_logical_).
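As an illustration of the kind of marshalling involved, the sketch below decodes an R logical vector that is assumed to arrive in Java as a primitive int[]. It relies on R's internal encoding, in which TRUE is 1, FALSE is 0 and NA is stored as the minimum 32-bit integer (the same value as NA_integer_). LogicalCodec and its methods are hypothetical names; this is not the RConverter implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LogicalCodec {
    // R stores NA_logical_ (like NA_integer_) as the minimum 32-bit integer.
    static final int R_NA_INT = Integer.MIN_VALUE;

    /** Decodes R logical codes into a List<Boolean>, mapping NA to null. */
    static List<Boolean> decode(int[] codes) {
        List<Boolean> out = new ArrayList<>(codes.length);
        for (int c : codes) {
            out.add(c == R_NA_INT ? null : c != 0);
        }
        return out;
    }

    /** Encodes back to primitive codes, mapping null to NA, so the round trip is loss-less. */
    static int[] encode(List<Boolean> values) {
        int[] out = new int[values.size()];
        for (int i = 0; i < out.length; i++) {
            Boolean v = values.get(i);
            out[i] = v == null ? R_NA_INT : (v ? 1 : 0);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] fromR = {1, 0, R_NA_INT};                           // c(TRUE, FALSE, NA)
        List<Boolean> decoded = decode(fromR);
        System.out.println(decoded);                              // [true, false, null]
        System.out.println(Arrays.equals(fromR, encode(decoded))); // true
    }
}
```

Using a boxed Boolean with null for NA is what makes the round trip loss-less; a primitive boolean[] would have nowhere to put the third state.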

The doHelloWorld() function returns a character vector. The doSum() function expects two R numeric values and seamlessly handles both datatype coercion and NA values.

feat1$doHelloWorld()

## [1] "Hello world from Java!"

class(feat1$doHelloWorld())

## [1] "character"

feat1$doSum(3L, 4.1)

## [1] 7.1

class(feat1$doSum(3L, 4.1))

## [1] "numeric"

feat1$doSum(3.0, NA_real_)

## Java threw a NPE - could have had a NA input?

## [1] NA

class(feat1$doSum(3.0, NA_real_))

## Java threw a NPE - could have had a NA input?

## [1] "numeric"

Wrapping and unwrapping every datatype is inconvenient for the Java programmer, so some single-valued primitive types are supported as parameters and return types of Java functions, particularly int, char, double, boolean, and java.lang.String. These come with constraints on use, particularly around NA values in R and use in asynchronous code.

feat1$doSum2(3L, 4L)

## [1] 7

class(feat1$doSum2(3L, 4L))

## [1] "integer"

# casts inputs to integer
feat1$doSum2(3.0,4.0)

## [1] 7

class(feat1$doSum2(3.0,4.0))

## [1] "integer"

# fails as expects an integer
try(feat1$doSum2(3.0,4.5))

## Error in self$.api$.toJava$int(b) : not an integer

# fails as NA values are not supported by primitive types
try(feat1$doSum2(3L,NA_integer_))

## Error in self$.api$.toJava$int(b) : cant use NA as input to java int
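The two failures above come from checks made on the R side before the value reaches Java. The same contract can be written in Java; the sketch below is hypothetical (toJavaInt is not part of the generated code) and models an R NA as a null Double. A value is accepted only if it is not NA, is a whole number, and fits in a Java int; the error messages are copied from the R output above.

```java
final class PrimitiveIntCheck {
    /** Converts an R numeric (null standing in for NA) to a Java int, or throws. */
    static int toJavaInt(Double x) {
        if (x == null || x.isNaN()) {
            throw new IllegalArgumentException("cant use NA as input to java int");
        }
        if (x != Math.rint(x) || x < Integer.MIN_VALUE || x > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("not an integer");
        }
        return x.intValue();
    }

    public static void main(String[] args) {
        System.out.println(toJavaInt(3.0) + toJavaInt(4.0)); // 7
        try {
            toJavaInt(4.5);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // not an integer
        }
    }
}
```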

Default values in R are demonstrated here with the @RDefault annotation, which takes a string of valid R code producing the value to use as the default when this method is called from R. Any valid R code that produces a value that can be coerced to the correct type is allowed, but string values must be double quoted, and escape sequences double escaped, if need be. For example, the R string hello...<newline>...world would be written "hello...\n...world" in R, so in the annotation it must be given as @RDefault(rCode = "\"hello...\\n...world\"").
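The double escaping can be checked directly in Java. In this small sketch (RDefaultEscaping is a hypothetical class, used only to show the escaping) the constant holds the same string that would be passed as rCode. Printing it shows the R code exactly as R would receive it: a quoted string containing a backslash followed by n, which R then parses as a newline.

```java
final class RDefaultEscaping {
    // Java source: quotes escaped as \" and the backslash doubled as \\
    static final String R_CODE = "\"hello...\\n...world\"";

    public static void main(String[] args) {
        // Prints the R code as R will see it: "hello...\n...world"
        System.out.println(R_CODE);
        // The Java string holds the two characters '\' and 'n', not a real line break
        System.out.println(R_CODE.indexOf('\n') == -1); // true
    }
}
```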

Static Java methods are also supported. R6 does not have a concept of static methods, so to get the same look and feel as the object interface in Java, we use the root of the JavaApi as a place to hold the static methods. This enables auto-completion for static methods. In this example the static method demoStatic returns nothing (an R NULL is invisibly returned), but logs its input.

J$FeatureTest$demoStatic("Hello, static world, in a Java-like interface.")

## Hello, static world, in a Java-like interface.

As static methods are stateless they can also be implemented as more regular package functions, with exactly the same functionality as the format above. For this to work all the static functions declared in the API must have different names. At the moment it is up to the developer to ensure this is the case, although at some point I will add a check for it. To differentiate the object style of call above from the function style more common in R packages, static Java method names are converted from camel case to snake case. Both functional and object-oriented interfaces are generated for all static methods, so the exact same call as above in the functional style is as follows:

testRapi::demo_static("Hello, static world, in a more regular R-like interface.")

## Hello, static world, in a more regular R-like interface.
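The name conversion and the uniqueness requirement can be sketched together. The conversion rule here (an underscore before each upper-case letter, then lower-casing) is an assumption that reproduces demoStatic to demo_static; the plugin's exact rule, for example for acronyms or digits, may differ. The collisions method is a hypothetical version of the check that, as noted above, is not yet performed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StaticNameMapper {
    /** Converts a camelCase Java method name to snake_case, e.g. demoStatic to demo_static. */
    static String toSnakeCase(String camel) {
        StringBuilder sb = new StringBuilder();
        for (char ch : camel.toCharArray()) {
            if (Character.isUpperCase(ch)) {
                if (sb.length() > 0) sb.append('_');
                sb.append(Character.toLowerCase(ch));
            } else {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    /** Returns the snake_case names that more than one static method would map to. */
    static List<String> collisions(List<String> staticMethodNames) {
        Map<String, Integer> counts = new HashMap<>();
        for (String name : staticMethodNames) {
            counts.merge(toSnakeCase(name), 1, Integer::sum);
        }
        List<String> clashes = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) clashes.add(e.getKey());
        }
        return clashes;
    }

    public static void main(String[] args) {
        System.out.println(toSnakeCase("demoStatic")); // demo_static
        // demoStatic declared in two different classes would generate the same R function name
        System.out.println(collisions(List.of("demoStatic", "diamonds", "demoStatic"))); // [demo_static]
    }
}
```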

More complex objects

The generated API supports the loss-less bi-directional transfer of a range of R data types into Java and back to R. Extensive tests are available elsewhere, but in general support for vectors, dataframes and lists, including factors, is mostly complete; matrices and arrays are not yet implemented, and dataframes with named rows are also not yet supported. Dataframes, as well as other objects, can be serialised in Java and de-serialised. This serialisation has been done for the ggplot2::diamonds data set, and the resulting de-serialisation is shown here. Factor levels and ordering are preserved when the factor is part of a vector or dataframe.

    
    /**
     * Consumes a data frame and logs its length
     * @param dataframe a dataframe
     */
    @RMethod
    public void doSomethingWithDataFrame(RDataframe dataframe) {
        log.info("dataframe length: "+dataframe.nrow());
    }
    
    /**
     * Creates a basic dataframe and returns it
     * @return a dataframe
     */
    @RMethod
    public RDataframe generateDataFrame() {
        RDataframe out = new RDataframe();
        for (int i=0; i<10; i++) {
            Map<String,Object> tmp = new LinkedHashMap<String,Object>();
            tmp.put("index", i);
            tmp.put("value", 10-i);
            out.addRow(tmp);
        }
        return out;
        
    }
    
    /**
     * The ggplot2::diamonds dataframe 
     * 
     * A copy serialised into java, using
     * RObject.writeRDS, saved within the jar file of the package, and exposed here
     * using RObject.readRDS. 
     * @return the ggplot2::diamonds dataframe
     * @throws IOException if the serialised data file could not be found 
     */
    @RMethod(
        examples = {
            "dplyr::glimpse( diamonds() )"
        },
        tests = {
            "testthat::expect_equal(diamonds(), ggplot2::diamonds)"
        }
    )
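    public static RDataframe diamonds() throws IOException {
        // try-with-resources closes the stream once the dataframe has been read;
        // a null resource (file not found) is permitted and is simply not closed
        try (InputStream is = FeatureTest.class.getResourceAsStream("/diamonds.ser")) {
            if(is==null) throw new IOException("Could not locate /diamonds.ser");
            return RObject.readRDS(RDataframe.class, is);
        }
    }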

The basic smoke tests of this are as follows:

library(dplyr) # provides %>% and glimpse()
feat1$doSomethingWithDataFrame(ggplot2::diamonds)

## dataframe length: 53940

feat1$generateDataFrame() %>% glimpse()

## Rows: 10
## Columns: 2
## $ index <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
## $ value <int> 10, 9, 8, 7, 6, 5, 4, 3, 2, 1

J$FeatureTest$diamonds() %>% glimpse()

## Rows: 53,940
## Columns: 10
## $ carat   <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23, 0.…
## $ cut     <ord> Ideal, Premium, Good, Premium, Good, Very Good, Very Good, Ver…
## $ color   <ord> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J, J, J, I,…
## $ clarity <ord> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, SI1, VS1, …
## $ depth   <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4, 64…
## $ table   <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54, 62, 58…
## $ price   <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339, 340, 34…
## $ x       <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, 4.00, 4.…
## $ y       <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, 4.05, 4.…
## $ z       <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, 2.39, 2.…

if (identical(J$FeatureTest$diamonds(), ggplot2::diamonds)) {
  message("PASS: round tripping ggplot2::diamonds including java serialisation and deserialisation works")
} else {
  stop("FAIL: serialised diamonds from Java should be identical to the ggplot source")
}

## PASS: round tripping ggplot2::diamonds including java serialisation and deserialisation works

Objects, fluent apis and factory methods

The generated R6 code can handle the return of Java objects to R, as long as they are part of the API and annotated with @RClass. A common use case for this is fluent APIs, where the Java object is manipulated by a method and returns itself.

    
    /**
     * Get the message
     * 
     * message description
     * @return The message (previously set by the constructor)
     */
    @RMethod
    public RCharacter getMessage() {
        return RConverter.convert(message);
    }   
    
    /**
     * Set a message in a fluent way
     * 
     * A fluent method which updates the message in this object, returning the
     * same object. This is differentiated from factory methods which produce a new
     * instance of the same class by checking to see if the returned Java object is equal
     * to the calling Java object.  
     * @param message the message is a string
     * @return this should return exactly the same R6 object.
     */
    @RMethod
    public FeatureTest fluentSetMessage(@RDefault(rCode = "\"hello\nworld\"") RCharacter message) {
        this.message = message.toString();
        return this;
    }

The JavaApi root manages R’s perspective on the identity of objects in Java. This allows for fluent api methods, and method chaining. This is not flawless but should work for most common scenarios. It is possible that complex edge cases may appear equal in Java but not identical in R, so true equality should rely on the Java equals() method.

feat1$getMessage()

## [1] "Hello world. Creating a new object"

feat2 = feat1$fluentSetMessage("Hello world. updating message.")
feat2$getMessage()

## [1] "Hello world. updating message."

if(identical(feat1,feat2)) {
  message("PASS: the return value of a fluent setter returns the same object as the original")
} else {
  print(feat1$.jobj)
  print(feat2$.jobj)
  print(feat1$.jobj$equals(feat2$.jobj))
  stop("FAIL: these should have been identical")
}

## PASS: the return value of a fluent setter returns the same object as the original

if (feat1$equals(feat2)) {
  message("PASS: java based equality detection is supported")
} else {
  stop("FAIL: these should have been equal")
}

## PASS: java based equality detection is supported

feat1$getMessage()

## [1] "Hello world. updating message."

# Operations on feat2 are occurring on feat1 as they are the same underlying object
feat2$fluentSetMessage("Hello world. updating message again.")
feat1$getMessage()

## [1] "Hello world. updating message again."
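One way to get this behaviour is to cache wrappers by Java object identity, so that a method returning this maps back to the wrapper that made the call, while a factory returning a new object gets a new wrapper. The sketch below is a hypothetical Java model of that idea (WrapperRegistry and Wrapper stand in for the R6 side); the generated code may track identity differently.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class WrapperRegistry {
    /** Stand-in for an R6 object holding a reference to its Java object. */
    static final class Wrapper {
        final Object jobj;
        Wrapper(Object jobj) { this.jobj = jobj; }
    }

    /** A fluent setter in the style of FeatureTest.fluentSetMessage: mutates and returns itself. */
    static final class Message {
        String text;
        Message(String text) { this.text = text; }
        Message fluentSet(String t) { this.text = t; return this; }
    }

    // Keyed on reference identity (==), not equals(), so distinct but equal objects get distinct wrappers.
    // Strong references here keep objects alive; a real registry would need weak references.
    private final Map<Object, Wrapper> byIdentity = new IdentityHashMap<>();

    /** Returns the existing wrapper for this exact object, or creates one. */
    Wrapper wrap(Object jobj) {
        return byIdentity.computeIfAbsent(jobj, Wrapper::new);
    }

    public static void main(String[] args) {
        WrapperRegistry registry = new WrapperRegistry();
        Message m = new Message("Hello world.");
        Wrapper feat1 = registry.wrap(m);
        Wrapper feat2 = registry.wrap(m.fluentSet("updated"));  // fluent call returns the same object
        Wrapper other = registry.wrap(new Message("updated"));  // factory-style call returns a new object
        System.out.println(feat1 == feat2); // true
        System.out.println(feat1 == other); // false
    }
}
```

Keying on identity rather than equals() matches the distinction drawn above: identical() in R reflects the same Java object, while the Java equals() method remains the test of equality.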

Factory methods allow Java methods to create and return Java objects. This is supported as long as the objects are part of the API and annotated with @RClass. Arbitrary Java objects are not supported as return types, and Java code that tries to return such objects will throw an exception during the Maven packaging phase. This is by design, to enforce formal contracts between the Java code and the R API. If you want dynamic manipulation of Java objects then the jsr223 plugin is more appropriate.

    
    /**
     * A factory or builder method which constructs an object of another class from some parameters 
     * @param a the first parameter
     * @param b the second parameter
     * @return A MoreFeatureTest R6 reference
     */
    @RMethod
    public MoreFeatureTest factoryMethod(RCharacter a, @RDefault(rCode = "as.character(Sys.Date())") RCharacter b) {
        return new MoreFeatureTest(a,b);
    }
    
    @RMethod
    public String objectAsParameter(MoreFeatureTest otherObj) {
        return otherObj.toString();
    }

This Java code refers to another class, MoreFeatureTest, which has the following basic structure:

/**
 * This has no documentation
 */
@RClass
public class MoreFeatureTest {

    String message1;
    String message2;
    
    static Logger log = LoggerFactory.getLogger(MoreFeatureTest.class); 
    
    /**
     * the first constructor is used if there are none annotated 
     * @param message1 - the message to be printed 
     * @param message2 - will be used for toString
     */
    public MoreFeatureTest(RCharacter message1, RCharacter message2) {
        this.message1 = message1.toString();
        this.message2 = message2.toString();
        log.info("constuctor: {}, {}",this.message1, this.message2);
    }
    
    /** A static object constructor
     * @param message1 - the message to be printed 
     * @param message2 - will be used for toString
     * @return A MoreFeatureTest R6 object
     */
    @RMethod(examples = {
        "J = JavaApi$get()",
        "J$MoreFeatureTest$create('Hello,',' World')"
    })
    public static MoreFeatureTest create(RCharacter message1, RCharacter message2) {
        return new MoreFeatureTest(message1,message2);
    }
    
    public String toString() {
        return "toString: "+message2;
    }
...
}

The FeatureTest.factoryMethod(a,b) method allows us to construct instances of another class. This enables builder patterns in the R api. The MoreFeatureTest.create(message1,message2) method demonstrates static factory methods, which return instances of the same class. Static methods are implemented as methods in the JavaApi root, as demonstrated here, and accessed through the root object J:

# factory method from builder class
moreFeat1 = feat1$factoryMethod("Hello","World")

## constuctor: Hello, World

# static factory method accessed through the root of the API
moreFeat2 = J$MoreFeatureTest$create("Ola","El Mundo")

## constuctor: Ola, El Mundo

# either of these can be passed as a parameter
feat1$objectAsParameter(moreFeat1)

## [1] "toString: World"
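The packaging-time restriction on return types can be pictured as a reflective scan of each exported method. The sketch below is hypothetical: ExportedSketch stands in for @RClass, only public methods are checked (the plugin checks annotated ones), and the allowed set is a small illustrative subset (void, a few primitives and String), not the plugin's actual list, which also covers the uk.co.terminological.rjava.types classes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Set;

final class ReturnTypeContract {
    // Illustrative subset of types allowed to cross into R.
    static final Set<Class<?>> ALLOWED = Set.of(void.class, int.class, double.class, boolean.class, String.class);

    /** Throws if a public method returns a type that is neither allowed nor an @ExportedSketch class. */
    static void check(Class<?> cls) {
        for (Method m : cls.getDeclaredMethods()) {
            if (!Modifier.isPublic(m.getModifiers())) continue;
            Class<?> rt = m.getReturnType();
            if (!ALLOWED.contains(rt) && !rt.isAnnotationPresent(ExportedSketch.class)) {
                throw new IllegalStateException(cls.getSimpleName() + "." + m.getName()
                        + " returns unsupported type " + rt.getName());
            }
        }
    }

    public static void main(String[] args) {
        check(ContractBuilder.class);  // passes: returns String or an @ExportedSketch class
        try {
            check(ContractLeaky.class);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // ContractLeaky.raw returns unsupported type java.lang.Object
        }
    }
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ExportedSketch {}

@ExportedSketch
class ContractProduct {}

@ExportedSketch
class ContractBuilder {
    public ContractProduct factoryMethod() { return new ContractProduct(); }
    public String describe() { return "builder"; }
}

class ContractLeaky {
    public Object raw() { return new Object(); }
}
```

Failing at build time, rather than when R first calls the method, is what turns the return type into an enforced contract between the Java code and the R API.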

Logging, printing and exceptions

The logging sub-system is based on slf4j with a log4j2 implementation. These are specified in the r6-generator-runtime dependency pom.xml, so any project that imports it will have them as transitive dependencies. They are needed because dynamic alteration of the logging level from R depends on implementation details of log4j. It may be possible to remove this dependency in the future.

Exceptions thrown from Java are handled in the same way as in rJava, and printed messages are seen on the R console as expected. However, rJava handles messages from System.out in a way that means they do not appear in knitr output. To resolve this, an unsightly workaround (hack) has been adopted that collects messages from System.out and prints them after the Java method has completed. This has the potential to cause all sorts of issues, which I think I have mostly resolved, but it is best described as a work in progress.
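The general technique of collecting console output and replaying it later can be sketched with System.setOut. This is a hypothetical illustration (captureStdout is not part of the runtime library) and says nothing about how the actual workaround is implemented. Because System.out is global, this simple form is also not safe when several threads print at once.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

final class StdoutCapture {
    /** Runs the action with System.out redirected to a buffer and returns everything it printed. */
    static String captureStdout(Runnable action) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintStream capture = new PrintStream(buffer, true, StandardCharsets.UTF_8)) {
            System.setOut(capture);
            action.run();
        } finally {
            System.setOut(original);  // always restore, even if the action throws
        }
        return buffer.toString(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String printed = captureStdout(() -> System.out.print("Printed in java: Hello World"));
        // the captured text can now be replayed, e.g. after the Java method has returned
        System.out.println("[replayed] " + printed);
    }
}
```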

The logging level can be controlled at runtime by a function in the JavaApi root. Logging can also be reconfigured dynamically with a log4j properties file (not shown), for example to enable file-based logging.

    
    @RMethod 
    public void testLogging() {
        log.error("An error");
        log.warn("A warning");
        log.info("A info");
        log.debug("A debug");
        log.trace("A trace");
    }
            
    @RMethod
    public RCharacter throwCatchable() throws Exception {
        throw new Exception("A catchable exception has been thrown");
    }
    
    @RMethod
    public void printMessage() {
        System.out.println("Printed in java: "+message1+" "+message2);
    }
    
    @RMethod
    public RCharacter throwRuntime() {
        throw new RuntimeException("A runtime exception has been thrown");
    }

# System.out printing
moreFeat1$printMessage()

## Printed in java: Hello World

# Testing logging levels
J$changeLogLevel("ALL")
moreFeat1$testLogging()

## An error
## A warning
## A info
## A debug
## A trace

# Suppressing errors
try(moreFeat1$throwCatchable(),silent = TRUE)

# Handling errors
tryCatch(
  {
    moreFeat1$throwRuntime()
  }, 
  error = function(e) {
    message("the error object has a set of classes: ",paste0(class(e),collapse=";"))
    warning("error: ",e$message)
    # the e$jobj entry gives native access to the throwable java object thanks to rJava.
    e$jobj$printStackTrace()
  }, 
  finally = print("finally")
)

## the error object has a set of classes: RuntimeException;Exception;Throwable;Object;error;condition

## Warning in value[[3L]](cond): error: java.lang.RuntimeException: A runtime
## exception has been thrown

## [1] "finally"

J$changeLogLevel("ERROR")
moreFeat1$testLogging()

## An error

# J$reconfigureLog("/path/to/log4j.prop")

Finalising and clean up

The Java objects bound to R instances stay in memory while they are needed. When they go out of scope they should automatically be garbage collected, as a native feature of rJava. R6 object finalizers are also generated for Java methods annotated with @RFinalize; these are triggered during release of the Java objects, and may call any closing code needed in the Java library (e.g. closing input streams).

feat1 = J$FeatureTest$new(logMessage = "Hello world. Creating a new object")
feat1$doHelloWorld()

## [1] "Hello world from Java!"

When an object goes out of scope its finalizer will be called, but this can happen much later than expected, and any errors thrown by the finalizer code at that point could cause issues. For this reason, unchecked exceptions thrown in finalizers are ignored and converted to logged errors.

feat1 = NULL
gc()

##           used (Mb) gc trigger (Mb) max used  (Mb)
## Ncells 1303624 69.7    2582580  138  2582580 138.0
## Vcells 2674625 20.5    8388608   64  7558231  57.7

Here gc() forces garbage collection immediately. Without it, the finalizer is still called implicitly once R garbage-collects the R6 object that has gone out of scope.
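The guard around finalizer code can be sketched as a wrapper that logs unchecked exceptions instead of propagating them. SafeFinalizer.runQuietly is a hypothetical name, and java.util.logging is used only to keep the sketch free of dependencies; the generated code logs through slf4j.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class SafeFinalizer {
    private static final Logger LOG = Logger.getLogger(SafeFinalizer.class.getName());

    /** Runs clean-up code, converting any unchecked exception into a logged error. Returns true on success. */
    static boolean runQuietly(Runnable finalizer) {
        try {
            finalizer.run();
            return true;
        } catch (RuntimeException e) {
            // finalizers may run long after the object was used, so there is no caller to rethrow to
            LOG.log(Level.SEVERE, "Error in finalizer (ignored): " + e.getMessage(), e);
            return false;
        }
    }

    public static void main(String[] args) {
        // mirrors FeatureTest.close(), which deliberately throws
        boolean ok = runQuietly(() -> {
            throw new RuntimeException("Errors from the finalizer are ignored");
        });
        System.out.println("finalizer succeeded: " + ok); // finalizer succeeded: false
    }
}
```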

Support for debugging

Debugging compiled Java code running in the context of R is not for the faint-hearted. It definitely makes sense to test and debug the Java code in Java first. To make this possible it is useful to be able to serialise some test data in the exact format in which it will arrive in Java from R. To that end all the supported Java structures can be serialised and de-serialised for testing purposes. The testRapi library presented here has a set of functions that facilitate this as static methods of J$Serialiser.

    
    @RMethod
    public static void serialiseDataframe(RDataframe dataframe, String filename) throws IOException {
        // try-with-resources closes the file even if writing fails
        try (FileOutputStream fos = new FileOutputStream(filename)) {
            dataframe.writeRDS(fos);
        }
        log.info("dataframe written to: "+filename);
    }
    
    @RMethod
    public static RDataframe deserialiseDataframe(String filename) throws IOException {
        // Files.newInputStream never returns null: a missing file raises
        // NoSuchFileException (an IOException), so no null check is needed
        try (InputStream is = Files.newInputStream(Paths.get(filename))) {
            return RObject.readRDS(RDataframe.class, is);
        }
    }

s = tempfile(pattern = "diamonds", fileext = ".ser")
J$Serialiser$serialiseDataframe(dataframe = ggplot2::diamonds, filename = s)
J$Serialiser$deserialiseDataframe(filename=s) %>% glimpse()

## Rows: 53,940
## Columns: 10
## $ carat   <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23, 0.…
## $ cut     <ord> Ideal, Premium, Good, Premium, Good, Very Good, Very Good, Ver…
## $ color   <ord> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J, J, J, I,…
## $ clarity <ord> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, SI1, VS1, …
## $ depth   <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4, 64…
## $ table   <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54, 62, 58…
## $ price   <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339, 340, 34…
## $ x       <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, 4.00, 4.…
## $ y       <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, 4.05, 4.…
## $ z       <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, 2.39, 2.…
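The workflow of writing data to a file once and reloading it in a plain Java unit test can be sketched with standard Java object serialisation. This is a substitute technique for illustration only: it serialises a hypothetical column-oriented Table class with ObjectOutputStream, rather than using RDataframe.writeRDS and RObject.readRDS, whose file format is not described here.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;

final class TestDataRoundTrip {
    /** Hypothetical column-oriented table: column name to values (null standing in for NA). */
    static final class Table implements Serializable {
        private static final long serialVersionUID = 1L;
        final LinkedHashMap<String, ArrayList<Double>> columns = new LinkedHashMap<>();

        @Override public boolean equals(Object o) {
            return o instanceof Table && ((Table) o).columns.equals(columns);
        }
        @Override public int hashCode() { return columns.hashCode(); }
    }

    static void write(Table t, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(t);
        }
    }

    static Table read(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (Table) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Table t = new Table();
        t.columns.put("carat", new ArrayList<>(List.of(0.23, 0.21)));
        ArrayList<Double> depth = new ArrayList<>();
        depth.add(61.5);
        depth.add(null);  // an NA value survives the round trip
        t.columns.put("depth", depth);

        Path file = Files.createTempFile("testdata", ".ser");
        write(t, file);
        Table back = read(file);
        Files.delete(file);
        System.out.println(back.equals(t)); // true
    }
}
```

In a real project the file would be written once from R and kept as a test fixture, so the Java tests run without starting R at all.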

With serialised test data (dataframes, lists or named lists), Java functions and their unit tests can be developed in Java alone, checking that they output values of the correct RObject datatype. Correct packaging and integration with R is then a question of running mvn install to compile the Java into a jar file and generate the R library code, then using devtools::install to install the generated R library. While iterating on development I have found it necessary to reinstall the package and restart the R session for R to pick up changes in the compiled Java files. There is probably a cleaner way to do this but I haven’t found it yet.

# compile Java code and package R library using `mvn install` command
cd ~/Git/r6-generator-docs
mvn install

setwd("~/Git/r6-generator-docs")
# remove previously installed versions
try(detach("package:testRapi", unload = TRUE),silent = TRUE)
remove.packages("testRapi")
# rm(list = ls()) may be required to clear old versions of the library code
# Restarting R may also be required if a Java VM was already running, otherwise changes to the jars on the classpath are not picked up.
# install locally compiled R library:
devtools::install("~/Git/r6-generator-docs", upgrade = "never")
# N.B. devtools::load_all() does not appear to always successfully pick up changes in the compiled java code

For initial integration testing there is a debug flag in the Maven pom.xml that enables remote Java debugging to be initialised when the library is first loaded in R. When set to true, a Java debugging session is opened on port 8998, which can be connected to as a remote Java application. This allows breakpoints to be set on Java code and the state of the JVM to be inspected when Java code is executed from R. However, Java code changes cannot be hot-swapped into the running JVM, so debugging integration issues is relatively limited. For more details see the Maven configuration vignette.

There are other limitations to enabling Java debugging, not least the potential for port conflicts with multiple running instances of the development library, and caching issues between running and loaded versions of the Java code. Whilst not too painful (compared to the alternatives), this is definitely not a REPL experience and is best reserved for the last stage of debugging. Part of the purpose of strongly enforcing a datatype conversion contract between Java and R, and of the extensive use of code generation, is to decouple Java and R development as much as possible (N.B. do as I say, not as I do).

Asynchronous and long running code

Java code that takes a long time to complete or requires interaction from the user creates a problem for rJava as the program control is passed completely to Java during the code execution. This can lock the R session until the Java code is finished. The fact that the R session is blocked pending the result from Java means there is no obvious way to terminate a running Java process from within R, and if a Java process goes rogue then the R session hangs.

We have approached this by creating an RFuture class, which is bundled in any R package built with r6-generator-maven-plugin, and some Java infrastructure that allows a Java method call initiated by R to run in its own thread. The thread is monitored using the R6 RFuture class. This allows the Java call to return instantly while it executes asynchronously in the background, freeing up the R process to continue. The RFuture class has functions to cancel() a thread, to check whether it is complete (isDone()) or cancelled (isCanceled()), or to wait for the result and get() it.
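In Java terms this life cycle closely resembles the standard java.util.concurrent.Future interface. The sketch below uses an ExecutorService to show the same steps: immediate return, isDone(), get() and cancel(). It is an assumption that RFuture is built on something similar, and startCountdown is a hypothetical name. A latch replaces the sleeps of the real example so that the behaviour is deterministic.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class AsyncSketch {
    /** Submits work that blocks until 'release' is counted down, returning immediately with a Future. */
    static Future<String> startCountdown(ExecutorService pool, CountDownLatch release) {
        return pool.submit(() -> {
            release.await();  // stands in for long-running work; interruptible, so cancel(true) can stop it
            return "countdown completed.";
        });
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CountDownLatch release = new CountDownLatch(1);
            Future<String> f = startCountdown(pool, release);
            System.out.println("control returned immediately, done = " + f.isDone()); // done = false
            release.countDown();
            System.out.println(f.get()); // blocks until complete, then prints: countdown completed.

            Future<String> stuck = startCountdown(pool, new CountDownLatch(1));
            stuck.cancel(true);  // interrupts the task if it is running, or stops it from starting
            System.out.println("cancelled = " + stuck.isCancelled()); // cancelled = true
        } finally {
            pool.shutdownNow();
        }
    }
}
```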

The RFuture thread wrapper is used for Java methods annotated with @RAsync instead of @RMethod.

    
    int invocation = 0;
    int timer = 10;
    
    @RAsync(synchronise = true)
    public RCharacter asyncCountdown() throws InterruptedException {
        invocation = invocation + 1;
        timer = 10;
        String label = "Async and run thread safe "+invocation;
        // This example deliberately uses a not thread
        // safe design. However the synchronise=true annotation
        // forces it to be synchronised on the feature test class.
        
        // Progress in this thread can be recorded and displayed in R
        // when `get()` is called on a result in progress. The total is 
        // not actually required
        RProgressMonitor.setTotal(timer);
        
        while (timer > 0) {
            System.out.println(label+" ... "+timer);
            Thread.sleep(1000);
            timer--;
            
            // This static method is keyed off the thread id so can be placed
            // anywhere in code.
            RProgressMonitor.increment();
        }
        
        RProgressMonitor.complete();
        
        return RCharacter.from(label+" completed.");
    }
    
    @RAsync
    public RCharacter asyncRaceCountdown() throws InterruptedException {
        invocation = invocation + 1;
        timer = 10;
        String label = "Async and not thread safe "+invocation;
        // This example deliberately uses a design that is not
        // thread safe, to demonstrate race conditions. Avoiding these
        // is the responsibility of the Java programmer.
        
        RProgressMonitor.setTotal(timer);
        while (timer > 0) {
            System.out.println(label+" ... "+timer);
            Thread.sleep(1000);
            timer--;
            RProgressMonitor.increment();
        }
        RProgressMonitor.complete();
        return RCharacter.from(label+" completed.");
    }
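The comments above note that the static `RProgressMonitor` methods are keyed off the thread id. A minimal sketch of how such a static, thread-keyed monitor could work follows; `ThreadProgress` and its methods are hypothetical names, and this is not the plugin's source. The worker calls static methods that look up its own entry by the current thread id, while a monitoring thread can read any entry by id.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a progress monitor keyed off the current thread id.
// It is not the plugin's RProgressMonitor implementation.
final class ThreadProgress {

    // Progress of one worker thread. Only the owning thread writes its entry,
    // so volatile fields are enough for a monitoring thread to read them.
    static final class Progress {
        volatile int count = 0;
        volatile int total = -1; // -1 means no total was set
        volatile boolean complete = false;
    }

    private static final Map<Long, Progress> BY_THREAD = new ConcurrentHashMap<>();

    // Finds (or creates) the entry for whichever thread is calling.
    private static Progress current() {
        return BY_THREAD.computeIfAbsent(Thread.currentThread().getId(), id -> new Progress());
    }

    // Static methods, so they can be called from anywhere in the worker's code.
    static void setTotal(int total) { current().total = total; }
    static void increment() { current().count++; }
    static void complete() { current().complete = true; }

    // Called by a monitoring thread to report on a worker thread by its id.
    static String describe(long threadId) {
        Progress p = BY_THREAD.get(threadId);
        if (p == null) return "unknown";
        if (p.complete) return "complete";
        if (p.total < 0) return p.count + " steps";
        return p.count + "/" + p.total;
    }
}
```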

A basic test of this follows, which starts a 10 second countdown in Java. The call returns control to R immediately while the countdown runs in the background:

# J = testRapi::JavaApi$get(logLevel = "WARN")
featAsyn = J$FeatureTest$new("Async testing")
# The asyncCountdown resets a timer in the FeatureTest class
tmp = featAsyn$asyncCountdown()
message("Control returned immediately.")

## Control returned immediately.

Sys.sleep(4)
# The countdown is not finished
if (tmp$isDone()){
  stop("FAIL: Too soon for the countdown to have finished..!")
} else {
  message("PASS: 4 seconds later the countdown is still running.")
}

## PASS: 4 seconds later the countdown is still running.

Sys.sleep(8)
if (!tmp$isDone()) {
  stop("FAIL: It should have been finished by now!")
} else {
  message("PASS: the countdown is finished.")
  # getting the result returns the value from the Java method
  # and also triggers printing of the cached Java output.
}

## PASS: the countdown is finished.

System output from asynchronous code can be very confusing if it appears out of sequence with other output. The system output of Java code running asynchronously is therefore cached, and only displayed when the result is retrieved via get():

tmp$get()

## Async and run thread safe 1 ... 10
## Async and run thread safe 1 ... 9
## Async and run thread safe 1 ... 8
## Async and run thread safe 1 ... 7
## Async and run thread safe 1 ... 6
## Async and run thread safe 1 ... 5
## Async and run thread safe 1 ... 4
## Async and run thread safe 1 ... 3
## Async and run thread safe 1 ... 2
## Async and run thread safe 1 ... 1

## [1] "Async and run thread safe 1 completed."
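One way to get this kind of deferred output is sketched below, under the assumption that each task writes to its own buffer rather than directly to `System.out`. `BufferedTask` is a hypothetical class, not part of the plugin, and the background thread is left out for brevity: the captured text is kept alongside the result and printed only when `get()` is called.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical sketch: each task gets a private PrintStream whose contents
// are released together with the result. Not the plugin's mechanism.
final class BufferedTask<T> {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final PrintStream out = new PrintStream(buffer, true, StandardCharsets.UTF_8);
    private T result;

    // Runs the task body, which writes to the supplied stream instead of System.out.
    BufferedTask<T> run(Function<PrintStream, T> body) {
        result = body.apply(out);
        out.flush();
        return this;
    }

    // The cached output, held back until the result is requested.
    String output() {
        return buffer.toString(StandardCharsets.UTF_8);
    }

    // Prints the cached output, then hands back the result.
    T get() {
        System.out.print(output());
        return result;
    }
}
```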

RFuture does not ensure thread safety, which in general is the responsibility of the Java programmer. However, where a class that is not thread safe is used in an @RAsync annotated method, the synchronise = true option provides a basic locking mechanism that prevents concurrent calls of the same method on the same object.

# The potential race condition is prevented by the synchronise=true annotation
tmp = featAsyn$asyncCountdown()
tmp2 = featAsyn$asyncCountdown()
Sys.sleep(5)
if (tmp$cancel()) print("First counter cancelled.")

## [1] "First counter cancelled."

Although both countdowns were started at the same time, the second one is waiting to obtain the lock. In this example the first call was cancelled after 5 seconds, so retrieving its result prints the output produced up to that point, followed by an error:

system.time({
  try(tmp$get())
})

## Async and run thread safe 2 ... 10
## Async and run thread safe 2 ... 9
## Async and run thread safe 2 ... 8
## Async and run thread safe 2 ... 7
## Async and run thread safe 2 ... 6
## Async and run thread safe 2 ... 5

## Error in tmp$get() : 
##   background call to asyncCountdown(...) was cancelled.

##    user  system elapsed 
##   0.001   0.001   0.002

Once the first call has been cancelled, the second call obtains the lock and starts processing. If you are running this interactively you will notice that a progress indicator appears while get() waits for the result.

system.time({
  tmp2$get()
})

## Async and run thread safe 3 ... 10
## Async and run thread safe 3 ... 9
## Async and run thread safe 3 ... 8
## Async and run thread safe 3 ... 7
## Async and run thread safe 3 ... 6
## Async and run thread safe 3 ... 5
## Async and run thread safe 3 ... 4
## Async and run thread safe 3 ... 3
## Async and run thread safe 3 ... 2
## Async and run thread safe 3 ... 1

##    user  system elapsed 
##   2.046   0.068   9.826
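The locking seen above behaves much like Java's `synchronized` keyword. In this hypothetical sketch (`LockedCountdown` is not generated code), two threads call the same synchronized method on one shared object. The second call blocks until the first releases the object's monitor, so the non thread safe `timer` field is never decremented by both at once and each call logs a complete countdown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a synchronized method on a shared object serialises
// concurrent calls, in the spirit of @RAsync(synchronise = true).
class LockedCountdown {
    private int timer; // shared state that is not thread safe on its own
    private final List<String> log = new ArrayList<>(); // only touched while holding the lock

    synchronized void countdown(String label, int start) {
        timer = start;
        while (timer > 0) {
            log.add(label + " ... " + timer);
            Thread.yield(); // give the other thread a chance; the lock still keeps it out
            timer--;
        }
    }

    // Starts two calls at the same time on one object and waits for both.
    static List<String> runTwo(int start) {
        LockedCountdown shared = new LockedCountdown();
        Thread a = new Thread(() -> shared.countdown("A", start));
        Thread b = new Thread(() -> shared.countdown("B", start));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(shared.log);
    }
}
```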

If the default @RAsync(synchronise=false) is used, race conditions may occur when the Java method changes shared state. This is demonstrated here, where both calls alter the same underlying timer field, so their decrements alternate. As before, the output is only displayed when the result is requested:

# asyncRaceCountdown is not synchronised, so the two calls race on the shared timer
system.time({
  tmp = featAsyn$asyncRaceCountdown()
  tmp2 = featAsyn$asyncRaceCountdown()
  tmp$get()
})

## Async and not thread safe 4 ... 10
## Async and not thread safe 4 ... 9
## Async and not thread safe 4 ... 7
## Async and not thread safe 4 ... 5
## Async and not thread safe 4 ... 3
## Async and not thread safe 4 ... 1

##    user  system elapsed 
##   1.290   0.039   6.008

In this case the execution takes far less than 10 seconds, as both countdowns run in parallel and share the same timer. The output from the second call shows the other half of the interleaved countdown:

system.time({
  tmp2$get()
})

## Async and not thread safe 5 ... 10
## Async and not thread safe 5 ... 8
## Async and not thread safe 5 ... 6
## Async and not thread safe 5 ... 4
## Async and not thread safe 5 ... 2

##    user  system elapsed 
##   0.002   0.000   0.002
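In asyncRaceCountdown each call also resets the shared timer to 10, which is why both outputs start at 10, and the plain `int` decrements then interleave. If sharing a counter between threads is actually intended, the Java programmer can make each step atomic. In this hypothetical sketch (`SharedTimerSketch` is illustrative only), two threads share one `AtomicInteger` that is initialised once, and `getAndDecrement()` guarantees that every timer value is claimed by exactly one thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: two threads deliberately share one timer, but each
// decrement is atomic, so no value is claimed by both threads.
class SharedTimerSketch {

    static List<List<Integer>> race(int start) {
        AtomicInteger timer = new AtomicInteger(start); // initialised once, not per call
        List<Integer> seenA = new ArrayList<>();        // each list is used by one thread only
        List<Integer> seenB = new ArrayList<>();
        Thread a = new Thread(() -> countDown(timer, seenA));
        Thread b = new Thread(() -> countDown(timer, seenB));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return List.of(seenA, seenB);
    }

    private static void countDown(AtomicInteger timer, List<Integer> seen) {
        int value;
        // getAndDecrement reads and updates in one atomic step.
        while ((value = timer.getAndDecrement()) > 0) {
            seen.add(value);
            Thread.yield();
        }
    }
}
```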

The RFuture class is also useful for preventing lock-ups caused by Java code entering an infinite loop or waiting for external input that never arrives. Sometimes blocking the R session is desirable, provided that the Java code can be terminated when R is interrupted, so that we can be sure no Java code is left running. This is supported by the @RBlocking annotation, which places the Java method call in a thread that can be cleanly interrupted from R, but otherwise makes R wait for Java to finish.

    
    @RBlocking
    public RCharacter blockingCountdown() throws InterruptedException {
        invocation = invocation + 1;
        timer = 10;
        String label = "Blocking "+invocation;
        RProgressMonitor.setTotal(timer);
        
        while (timer > 0) {
            System.out.println(label+" ... "+timer);
            Thread.sleep(1000);
            timer--;
            RProgressMonitor.increment();
        }
        
        RProgressMonitor.complete();
        return RCharacter.from(label+" completed.");
    }

tmp = featAsyn$blockingCountdown()

## Blocking 6 ... 10
## Blocking 6 ... 9
## Blocking 6 ... 8
## Blocking 6 ... 7
## Blocking 6 ... 6
## Blocking 6 ... 5
## Blocking 6 ... 4
## Blocking 6 ... 3
## Blocking 6 ... 2
## Blocking 6 ... 1
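A rough sketch of the blocking-but-interruptible idea using plain JDK classes follows; `callBlocking` is a hypothetical helper, not the plugin's API. The work runs on a separate thread and the caller waits for it, but if the waiting thread is interrupted (as when the user interrupts R) the worker is cancelled as well instead of being left running.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of a blocking call that can still be interrupted cleanly.
class BlockingCallSketch {

    static <T> T callBlocking(Callable<T> task) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Future<T> fut = worker.submit(task);
        try {
            return fut.get(); // the caller waits, much as R waits for an @RBlocking method
        } catch (InterruptedException e) {
            // The waiting thread was interrupted: stop the Java work too.
            fut.cancel(true);
            Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
            throw new CancellationException("blocking call was interrupted");
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            worker.shutdownNow(); // never leave the worker thread behind
        }
    }
}
```

This relies on Java's cooperative interruption: `cancel(true)` and `shutdownNow()` only interrupt the worker thread, so code that never sleeps, waits or checks `Thread.interrupted()` (for example a tight infinite loop) would keep running after the call has been abandoned.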

Static methods that only use local variables are more likely to be thread safe. Async methods can be static, in which case, as long as they do not modify shared static state, there is no potential for race conditions and we don’t need to guard against them.

    
     
    @RAsync
    public static RCharacter asyncStaticCountdown(RCharacter label, @RDefault(rCode = "10") RInteger rtimer) throws InterruptedException {
        // N.B. inputs to async methods cannot be Java primitives
        int timer = rtimer.javaPrimitive();
        RProgressMonitor.setTotal(timer);
        while (timer > 0) {
            System.out.println(label.get()+" ... "+timer);
            Thread.sleep(1000);
            timer--;
            RProgressMonitor.increment();
        }
        RProgressMonitor.complete();
        return RCharacter.from(label+" completed.");
    }
    
    @RAsync
    public static FactoryTest asyncFactory() throws InterruptedException {
        Thread.sleep(5000);
        return new FactoryTest();
    }

# debug(J$FeatureTest$asyncStaticCountdown)
tmp = J$FeatureTest$asyncStaticCountdown("hello 1",4)
tmp2 = J$FeatureTest$asyncStaticCountdown("hello 2",4)
Sys.sleep(5)
tmp$get()

## hello 1 ... 4
## hello 1 ... 3
## hello 1 ... 2
## hello 1 ... 1

## [1] "hello 1 completed."

tmp2$get()

## hello 2 ... 4
## hello 2 ... 3
## hello 2 ... 2
## hello 2 ... 1

## [1] "hello 2 completed."
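Static methods avoid races here because all of their state lives in local variables, and each thread has its own copy of those. The following hypothetical sketch (`StaticCountdownSketch` is illustrative only) runs two calls of such a method concurrently on a thread pool, and each returns its own complete countdown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: a static method whose state is purely local can be
// called from several threads at once without any locking.
class StaticCountdownSketch {

    static List<String> countdown(String label, int timer) {
        List<String> lines = new ArrayList<>(); // local: each call has its own list
        while (timer > 0) {                     // timer is a local copy too
            lines.add(label + " ... " + timer);
            timer--;
        }
        lines.add(label + " completed.");
        return lines;
    }

    // Runs one countdown per label concurrently and collects the results in order.
    static List<List<String>> runConcurrently(int timer, String... labels) {
        ExecutorService pool = Executors.newFixedThreadPool(labels.length);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String label : labels) {
                futures.add(pool.submit(() -> countdown(label, timer)));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```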

Parameters and return types in asynchronous methods

Async and blocking methods are handled slightly differently internally. When writing such a Java method you cannot use primitive parameters: all parameters must be subtypes of RObject, such as RInteger rather than the primitive equivalent int. This is a result of the dynamic type checking, using reflection, that is performed when calling the Java method, and may be addressed in the future. Async methods can, however, return Java objects annotated with @RClass, which are passed to R wrapped in the appropriate R6 class.

tmp3 = J$FeatureTest$asyncFactory()
result = tmp3$get()
result$generateFactorVec()

## [1] ONE   THREE <NA>  TWO  
## Levels: ONE < TWO < THREE
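The text does not spell out exactly how the plugin's reflective call works, so the following is only a hypothetical illustration of one common way primitives cause trouble in reflective dispatch. When a method is looked up using the runtime classes of the supplied arguments, a boxed value reports `Integer.class`, which does not match a parameter declared as `int`.

```java
import java.lang.reflect.Method;

// Hypothetical illustration of a general reflection pitfall, not the plugin's code.
class ReflectionPrimitiveSketch {

    public static String withPrimitive(int timer) { return "int " + timer; }

    public static String withBoxed(Integer timer) { return "Integer " + timer; }

    // Looks a method up from the runtime classes of its arguments, which is all
    // a dynamic dispatcher has once the values arrive as Objects.
    static Method find(String name, Object... args) {
        Class<?>[] types = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            types[i] = args[i].getClass(); // an int argument arrives boxed: Integer.class
        }
        try {
            return ReflectionPrimitiveSketch.class.getMethod(name, types);
        } catch (NoSuchMethodException e) {
            return null; // withPrimitive(int) does not match Integer.class
        }
    }
}
```

`Method.invoke` would itself unbox an `Integer` for an `int` parameter; it is the lookup by runtime type that fails, so a dispatcher working this way has to special-case primitives or require boxed (or, in the plugin's case, RObject) parameter types.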

Monitoring the status of long running operations

Because long running jobs execute in the background, it may be necessary to query the status of all of them. The status may be “cancelled”, “in progress” (shown as a progress count), “result ready” or, if the result has already been retrieved by get(), “result processed”.

tmp = featAsyn$asyncCountdown()
status = testRapi::.background_status()
status

##    id                                       status
## 1  22       asyncCountdown(...) [result processed]
## 2  23              asyncCountdown(...) [cancelled]
## 3  24       asyncCountdown(...) [result processed]
## 4  25   asyncRaceCountdown(...) [result processed]
## 5  26   asyncRaceCountdown(...) [result processed]
## 6  27    blockingCountdown(...) [result processed]
## 7  28 asyncStaticCountdown(...) [result processed]
## 8  29 asyncStaticCountdown(...) [result processed]
## 9  30         asyncFactory(...) [result processed]
## 10 31              asyncCountdown(...) [0/10 (0%)]

Previous results can be retrieved from this list using the id.

oldFut = testRapi::.background_get_by_id(status$id[1])
oldFut$get()

## [1] "Async and run thread safe 1 completed."

Releasing old results may be necessary if memory is an issue. The tidy-up function clears all processed and cancelled background tasks and frees the associated JVM memory.

testRapi::.background_tidy_up()
testRapi::.background_status()

##   id                          status
## 1 31 asyncCountdown(...) [0/10 (0%)]
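The status, retrieval by id and tidy-up functions above behave like a registry of futures keyed by an integer id. The following hypothetical sketch (`BackgroundRegistry` is not the plugin's implementation, and its status strings simply mirror those listed earlier) shows one way such bookkeeping could work: a status is derived from the state of each `Future`, a retrieved result is marked as processed, and tidying up drops processed and cancelled entries so their results can be garbage collected.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical registry of background calls keyed by id. It loosely mirrors
// .background_status(), .background_get_by_id() and .background_tidy_up(),
// but is not how the plugin implements them.
final class BackgroundRegistry {

    private static final class Entry {
        final Future<?> future;
        boolean processed = false; // true once the result has been retrieved
        Entry(Future<?> future) { this.future = future; }
    }

    private final Map<Integer, Entry> entries = new LinkedHashMap<>();
    private int nextId = 1;

    synchronized int register(Future<?> future) {
        int id = nextId++;
        entries.put(id, new Entry(future));
        return id;
    }

    // Derives a status string from the state of the underlying Future.
    synchronized String status(int id) {
        Entry e = entries.get(id);
        if (e == null) return "unknown";
        if (e.future.isCancelled()) return "cancelled";
        if (!e.future.isDone()) return "in progress";
        return e.processed ? "result processed" : "result ready";
    }

    // Returns a finished result by id and marks it as processed.
    synchronized Object getResult(int id) {
        Entry e = entries.get(id);
        if (e == null || e.future.isCancelled() || !e.future.isDone()) {
            throw new IllegalStateException("no result available for task " + id);
        }
        try {
            Object value = e.future.get(); // does not block: the future is already done
            e.processed = true;
            return value;
        } catch (ExecutionException ex) {
            throw new IllegalStateException(ex.getCause());
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
    }

    // Drops processed and cancelled tasks so their results can be garbage collected.
    synchronized void tidyUp() {
        entries.values().removeIf(e -> e.processed || e.future.isCancelled());
    }

    synchronized int size() {
        return entries.size();
    }
}
```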

Summary

The r6-generator-maven-plugin can be used to generate an R package with R6 classes that expose selected Java methods to R. Given enough detail in the Java code, the generated R package can be quite feature rich and set up in a format ready to deploy to r-universe. The aim is to make the process of creating R clients for Java libraries easy and maintainable.

Rob Challen

19/10/2020