15 R6

This chapter describes the R6 object system. Unlike S3 and S4, it provides encapsulated OO, which means that:

  • R6 methods belong to objects, not generics.

  • R6 objects are mutable: the usual copy-on-modify semantics do not apply.

These properties make R6 objects behave more like objects in programming languages such as Python, Ruby and Java. This does not mean that R6 is good, and S3 and S4 are bad, it just means that R has a different heritage than most modern mainstream programming languages.

R6 is very similar to a built-in OO system called reference classes, or RC for short. I’m going to teach you R6 instead of RC for four reasons:

  • R6 is much simpler. Both R6 and RC are built on top of environments, but while R6 uses S3, RC uses S4. R6 is only ~500 lines of R code (and ~1700 lines of tests!). We’re not going to discuss the implementation in depth here, but if you’ve mastered the contents of this book, you should be able to read the source code and figure out how it works.

  • RC mingles variables and fields in the same stack of environments so that you get (field) and set fields (field <<- value) like regular values. R6 puts fields in a separate environment so you get (self$field) and set (self$field <- value) with a prefix. The R6 approach is more verbose but is worth the tradeoff because it makes code easier to understand. It also makes inheritance across packages simpler and more robust.

  • R6 is much faster than RC. Generally, the speed of method dispatch is not important outside of microbenchmarks but R6 is substantially better than RC. Switching from RC to R6 yielded substantial performance in shiny. vignette("Performance", "R6") provides more details on the performance.

  • Because the ideas that underlie R6 and RC are similar, it will only require a small amount of additional effort to learn RC if you need to.

Because R6 is not built into base R, you’ll need to install and load a package in order to use it:

library(R6)

If you’d like to learn more about R6 after reading this chapter, the best place to start is the vignettes included in the package. You can list them by calling browseVignettes(package = "R6").

15.1 Classes and methods

R6 only needs a single function call to create both the class and its methods: R6::R6Class(). And this is the only function from the package that you’ll ever use! The following example shows the two most important arguments:

  • The first argument is the classname. It’s not strictly needed, but it improves error messages and makes it possible to also use R6 objects with S3 generics. By convention, R6 classes use UpperCamelCase.

  • The second argument, public, supplies a list of methods (functions) and fields (anything else) that make up the public interface of the object. By convention, methods and fields use snake_case. Methods can access the methods and fields of the current object via self$.

Accumulator <- R6Class("Accumulator", list(
  sum = 0,
  add = function(x = 1) {
    self$sum <- self$sum + x 
    invisible(self)
  })
)

You should always assign the result of R6Class() into a variable with the same name as the class. This creates an R6 object that defines the R6 class:

Accumulator
#> <Accumulator> object generator
#>   Public:
#>     sum: 0
#>     add: function (x = 1) 
#>     clone: function (deep = FALSE) 
#>   Parent env: <environment: R_GlobalEnv>
#>   Locked objects: TRUE
#>   Locked class: FALSE
#>   Portable: TRUE

You construct a new object from the class by calling the new() method. Methods belong to R6 objects so you use $ to access new():

x <- Accumulator$new() 

You can then call methods and access fields with $:

x$add(4) 
x$sum
#> [1] 4

In this class, the fields and methods are public which means that you can get or set the value of any field. Later, we’ll see how to use private fields and methods to prevent casual access to the internals of your class.

To make it clear when we’re talking about fields and methods as opposed to variables and functions, when referring to them in text, we’ll prefix with $. For example, the Accumulate class has field $sum and method $add().

15.1.1 Method chaining

$add() is called primarily for its side-effect of updating $sum.

Accumulator <- R6Class("Accumulator", list(
  sum = 0,
  add = function(x = 1) {
    self$sum <- self$sum + x 
    invisible(self)
  })
)

Side-effect R6 methods should always return self invisibly. This returns the “current” object and makes it possible to chain together multiple method calls:

x$add(10)$add(10)$sum
#> [1] 24

Alternatively, for long chains, you can spread the call over multiple lines:

x$
  add(10)$
  add(10)$
  sum
#> [1] 44

This technique is called method chaining and is commonly used in encapsulated OO languages (like Python and JavaScript) to create fluent interfaces. Method chaining is deeply related to the pipe, and we’ll discuss the pros and cons of each approach in pipe vs message-chaining tradeoffs.

15.1.2 Important methods

There are two important methods that will be defined for most classes: $initialize() and $print(). You don’t have to provide them, but it’s a good idea to do so because they will make your class easier to use.

$initialize() overrides the default behaviour of $new(). For example, the following code defines an R6 Person class, similar to the S4 equivalent in S4. Unlike S4, R6 provides no checks for object type by default. $initialize() is a good place to check that name and age are the correct types.

Person <- R6Class("Person", list(
  name = NULL,
  age = NA,
  initialize = function(name, age = NA) {
    stopifnot(is.character(name), length(name) == 1)
    stopifnot(is.numeric(age), length(age) == 1)
    
    self$name <- name
    self$age <- age
  }
))

hadley <- Person$new("Hadley", age = 37)

If you have more expensive validation requirements, implement them in a separate $validate() and only call when needed.

Defining $print() allows you to override the default printing behaviour. As with any R6 method called for its side effects, $print() should return invisible(self).

Person <- R6Class("Person", list(
  name = NULL,
  age = NA,
  initialize = function(name, age = NA) {
    self$name <- name
    self$age <- age
  },
  print = function(...) {
    cat("Person: \n")
    cat("  Name: ", self$name, "\n", sep = "")
    cat("  Age:  ", self$age, "\n", sep = "")
    invisible(self)
  }
))

hadley2 <- Person$new("Hadley")
hadley2
#> Person: 
#>   Name: Hadley
#>   Age:  NA

This code illustrates an important aspect of R6. Because methods are bound to individual objects, the previously created hadley does not get this new method:

hadley
#> <Person>
#>   Public:
#>     age: 37
#>     clone: function (deep = FALSE) 
#>     initialize: function (name, age = NA) 
#>     name: Hadley

Indeed, from the perspective of R6, there is no relationship between hadley and hadley2. This can make interactive experimentation with R6 confusing. If you’re changing the code and can’t figure out why the results of method calls aren’t changed, make sure you’ve re-constructed R6 objects with the new class.

There’s a useful alternative to $print(): implement $format(), which should return a character vector. This will automatically be used by both print() and format() S3 generics.

Person <- R6Class("Person", list(
  age = NA,
  name = NULL,
  initialize = function(name, age = NA) {
    self$name <- name
    self$age <- age
  },
  format = function(...) {
    # The first `paste0()` is not necessary but it lines up
    # with the subsequent lines making it easier to see how
    # it will print
    c(
      paste0("Person:"),
      paste0("  Name: ", self$name),
      paste0("  Age:  ", self$age)
    )
  }
))

hadley3 <- Person$new("Hadley")
format(hadley3)
#> [1] "Person:"        "  Name: Hadley" "  Age:  NA"
hadley3
#> Person:
#>   Name: Hadley
#>   Age:  NA

15.1.3 Adding methods after creation

Instead of continuously creating new classes, it’s also possible to modify the methods of an existing class. This is useful when exploring interactively, and when you have a class with many functions that you’d like to break up into pieces.

Once the class has been defined, you can add elements to it with $set(), supplying the visibility (more on that below), the name, and the component.

Accumulator <- R6Class("Accumulator")
Accumulator$set("public", "sum", 0)
Accumulator$set("public", "add", function(x = 1) {
  self$sum <- self$sum + x 
  invisible(self)
})

$set() will not overwrite an existing method unless you explicitly ask for it:

Accumulator$set("public", "sum", 1)
#> Error in Accumulator$set("public", "sum", 1): Can't add sum because it already present in Accumulator generator.
Accumulator$set("public", "sum", 1, overwrite = TRUE)

Also note that adding methods will only affect new objects generated from the class. It does not retrospectively apply to existing objects:

x1 <- Accumulator$new()
Accumulator$set("public", "hello", function() message("Hi!"))
x1$hello()
#> Error in eval(expr, envir, enclos): attempt to apply non-function

x2 <- Accumulator$new()
x2$hello()
#> Hi!

15.1.4 Inheritance

To inherit behaviour from an existing class, provide the class object to the inherit argument:

AccumulatorChatty <- R6Class("AccumulatorChatty", 
  inherit = Accumulator,
  public = list(
    add = function(x = 1) {
      cat("Adding ", x, "\n", sep = "")
      super$add(x = x)
    }
  )
)

x2 <- AccumulatorChatty$new()
x2$add(10)$add(1)$sum
#> Adding 10
#> Adding 1
#> [1] 12

Note that $add() overrides the implementation in the superclass, but we can access the previous implementation through super$. Any methods which are overridden will automatically call the implementation in the parent class.

Like S3, R6 only supports single inheritance: you cannot supply a vector of classes to inherit.

15.1.5 Introspection

Every R6 object has an S3 class that reflects the hierarchy of R6 classes. This means that the easiest way to determine the class (and all classes it inherits from) is to use class():

class(hadley3)
#> [1] "Person" "R6"

The S3 hierarchy includes the base “R6” class. This provides common behaviour, including an print.R6() method which calls $print() or $format(), as described above.

You can list all methods and fields with names():

names(hadley3)
#> [1] ".__enclos_env__" "name"            "age"             "clone"          
#> [5] "format"          "initialize"

There’s one method that we haven’t defined: $clone(). It’s provided by R6 and we’ll come back to it in reference semantics.

15.1.6 Exercises

  1. Can subclasses access private fields/methods from their parent? Perform an experiment to find out.

15.2 Controlling access

R6Class() has two other arguments that work similarly to public: private and active. private allows you to create components that the user can not easily access, and active allows you to use accessor functions to define dynamic, or active, fields.

15.2.1 Privacy

With R6 you can define private fields and methods, elements that can only be accessed from within the class, not from the outside. There are two things that you need to know to take advantage of private elements:

  • The private argument works in the same way as the public argument: you give it a named list of methods (functions) and fields (everything else).

  • Fields and methods defined in private are available within the methods with private$ instead of self$. You cannot access private fields or methods outside of the class.

To make this concrete, we could make $age and $name fields of the Person class private. With this definition of Person we can only set $age and $name during object creation, and we cannot access their values from outside of the class.

Person <- R6Class("Person", 
  public = list(
    initialize = function(name, age = NA) {
      private$name <- name
      private$age <- age
    },
    print = function(...) {
      cat("Person: \n")
      cat("  Name: ", private$name, "\n", sep = "")
      cat("  Age:  ", private$age, "\n", sep = "")
    }
  ),
  private = list(
    age = NA,
    name = NULL
  )
)

hadley4 <- Person$new("Hadley")
hadley4$name
#> NULL

The distinction between public and private fields is important when you create complex networks of classes, and you want to make it as clear as possible what it’s ok for others to access. Anything that’s private can be more easily refactored because you know others aren’t relying on it. Private methods tend to be more important in other programming languages compared to R because the object hierarchies in R tend to be simpler.

15.2.2 Active fields

Active fields make allow you to define components that look like fields from the outside, but are defined with functions, like methods. For example, we can define an active field x that returns a different value every time you access it:

Rando <- R6::R6Class("Rando", active = list(
  random = function(value) {
    runif(1)
  }
))
x <- Rando$new()
x$random
#> [1] 0.0808
x$random
#> [1] 0.834
x$random
#> [1] 0.601

Active fields are particularly useful in conjunction with privacy, because they make it possible to implement components that work like fields from the outside but provide additional checks. For example, you can use them to implement read-only fields or fields that validate their inputs.

Active fields are implemented using active bindings from base R. Each active binding is a function that takes a single argument: value. If the argument is missing(), the value is being retrieved; otherwise it’s being modified. We can use that idea to make a read-only age field, and to ensure that name is a length 1 character vector.

Person <- R6Class("Person", 
  private = list(
    .age = NA,
    .name = NULL
  ),
  active = list(
    age = function(value) {
      if (missing(value)) {
        private$.age
      } else {
        stop("`$age` is read only", call. = FALSE)
      }
    },
    name = function(value) {
      if (missing(value)) {
        private$.name
      } else {
        stopifnot(is.character(name), length(name) == 1)
        private$.name <- value
        self
      }
    }
  ),
  public = list(
    initialize = function(name, age = NA) {
      private$.name <- name
      private$.age <- age
    }
  )
)

hadley5 <- Person$new("Hadley")
hadley5$name
#> [1] "Hadley"
hadley5$name <- 10
#> Error in stopifnot(is.character(name), length(name) == 1): object 'name' not found

hadley5$age
#> [1] NA
hadley5$age <- 20
#> Error: `$age` is read only

15.2.3 Exercises

  1. How would you define a write-only field?

15.3 Reference semantics

One of the big differences between R6 and most other objects in R is that they have reference semantics. This is because they are S3 objects built on top of environments:

typeof(x2)
#> [1] "environment"

The main consequence of reference semantics is that objects are not copied when modified:

y1 <- Accumulator$new() 
y2 <- y1

y1$add(10)
c(y1 = y1$sum, y2 = y2$sum)
#> y1 y2 
#> 11 11

Instead, if you want a copy, you’ll need to explicitly $clone() the object:

y1 <- Accumulator$new() 
y2 <- y1$clone()

y1$add(10)
c(y1 = y1$sum, y2 = y2$sum)
#> y1 y2 
#> 11  1

(Note that $clone() does not recursively clone nested R6 objects. If you want that, you’ll need to use $clone(deep = TRUE). Note that this only clones R6 objects: if you have other fields with reference semantics (e.g. environments) you’ll need to define your own $clone().)

There are three other less obvious consequences:

  • It is harder to reason about code that uses R6 objects because you need to understand more context.

  • It makes sense to think about when an R6 object is deleted, and you can write a finalizer() to complement the initializer().

  • If one of the fields is an R6 class, you must call $new() inside $initialize() not inside R6Class().

These are described in more detail below.

15.3.1 Reasoning

Generally, reference semantics makes code harder to reason about. Take this very simple example:

x <- list(a = 1)
y <- list(b = 2)

z <- f(x, y)

For the vast majority of functions, you know that the final line only modifies z.

Take a similar equivalent that uses an imaginary List reference class:

x <- List$new(a = 1)
y <- List$new(b = 2)

z <- f(x, y)

The final line is much harder to reason about - it’s completely possible that f() calls methods of x or y, modifying them in place. This is the biggest potential downside of R6. The best way to ameliorate this problem is to avoid writing functions that both return a value and modify R6 inputs.

That said, modifying R6 inputs can lead to substantially simpler code in some cases. One challenge of working with immutable data is known as threading state: if you want to return a value that’s modified in a deeply nested function, you need to return the modified value up through every function. This can complicate code, particularly if you need to modify multiple values. For example, ggplot2 uses R6 objects for scales. Scales are complex because they need to combine data across every facet and every layer. Using R6 makes the code substantially simpler, at the cost of introducing subtle bugs. Fixing those bugs required careful placement of calls to $clone() to ensure that independent plots didn’t accidentally share scale data. We’ll come back to this idea in [oo-tradeoffs].

15.3.2 Finalizer

One useful property of reference semantics is that it makes sense to think about when an R6 object is finalised, i.e. when it’s deleted. This doesn’t make sense for S3 and S4 objects because copy-on-modify semantics mean that there may be many transient versions of an object. For example, in the following code, there are actually two factor objects: the second is created when the levels are modified, leaving the first to be destroyed at the next garbage collection.

x <- factor(c("a", "b", "c"))
levels(x) <- c("c", "b", "a")

Since R6 objects are not copied-on-modify they will only get deleted once, and it makes sense to think about $finalize() as a complement to $initialize(). Finalizers usually play a similar role to on.exit(), cleaning up any resources created by the initializer. For example, the following class wraps up a temporary file, automatically deleting it when the class is finalised.

TemporaryFile <- R6Class("TemporaryFile", list(
  path = NULL,
  initialize = function() {
    self$path <- tempfile()
  },
  finalize = function() {
    message("Cleaning up ", self$path)
    unlink(self$path)
  }
))

tf <- TemporaryFile$new()

The finalise method will be run when R exits, or by the first garbage collection after the object has been removed. Generally, this will happen when it happens, but it can occasionally be useful to force a run with an explicit call to gc().

rm(tf)
invisible(gc())

15.3.3 R6 fields

A final consequence of reference semantics can crop up where you don’t expect it. Beware of setting a default value to an R6 class: it will be shared across all instances of the object. This is because the child object is only initialized once, when you defined the class, not each time you call new.

TemporaryDatabase <- R6Class("TemporaryDatabase", list(
  con = NULL,
  file = TemporaryFile$new(),
  initialize = function() {
    DBI::dbConnect(RSQLite::SQLite(), path = file$path)
  }
))

db_a <- TemporaryDatabase$new()
db_b <- TemporaryDatabase$new()

db_a$file$path == db_b$file$path
#> [1] TRUE

You can fix this by creating the object in $initialize():

TemporaryDatabase <- R6Class("TemporaryDatabase", list(
  con = NULL,
  file = NULL,
  initialize = function() {
    self$file <- TemporaryFile$new()
    DBI::dbConnect(RSQLite::SQLite(), path = file$path)
  }
))

db_a <- TemporaryDatabase$new()
db_b <- TemporaryDatabase$new()

db_a$file$path == db_b$file$path
#> [1] FALSE

15.3.4 Exercises