13 S3

13.1 Introduction

S3 is R’s first and simplest OO system. S3 is informal and ad hoc, but there is a certain elegance in its minimalism: you can’t take away any part of it and still have a useful OO system. For these reasons, you should use it, unless you have a compelling reason to do otherwise. S3 is the only OO system used in the base and stats packages, and it’s the most commonly used system in CRAN packages.

S3 is very flexible, which means it allows you to do things that are quite ill-advised. If you’re coming from a strict environment like Java this will seem pretty frightening, but it gives R programmers a tremendous amount of freedom. It may be very difficult to prevent people from doing something you don’t want them to do, but your users will never be held back because there is something you haven’t implemented yet. Since S3 has few built-in constraints, the key to its successful use is applying the constraints yourself. This chapter will therefore teach you the conventions you should (almost) always follow.

The goal of this chapter is to show you how the S3 system works, not how to use it effectively to create new classes and generics. I’d recommend coupling the theoretical knowledge from this chapter with the practical knowledge encoded in the vctrs package.

Outline

  • Section 13.2 gives a rapid overview of all the main components of S3: classes, generics, and methods. You’ll also learn about sloop::s3_dispatch(), which we’ll use throughout the chapter to explore how S3 works.

  • Section 13.3 goes into the details of creating a new S3 class, including the three functions that should accompany most classes: a constructor, a helper, and a validator.

  • Section 13.4 describes how S3 generics and methods work, including the basics of method dispatch.

  • Section 13.5 discusses the four main styles of S3 objects: vector, record, data frame, and scalar.

  • Section 13.6 demonstrates how inheritance works in S3, and shows you what you need to make a class “subclassable”.

  • Section 13.7 concludes the chapter with a discussion of the finer details of method dispatch including base types, internal generics, group generics, and double dispatch.

Prerequisites

S3 classes are implemented using attributes, so make sure you’re familiar with the details described in Section 3.3. We’ll use existing base S3 vectors for examples and exploration, so make sure that you’re familiar with the factor, Date, difftime, POSIXct, and POSIXlt classes described in Section 3.4.

We’ll use the sloop package for its interactive helpers.

13.2 Basics

An S3 object is a base type with at least a class attribute (other attributes may be used to store other data). For example, take the factor. Its base type is the integer vector, it has a class attribute of “factor”, and a levels attribute that stores the possible levels:

f <- factor(c("a", "b", "c"))

typeof(f)
#> [1] "integer"
attributes(f)
#> $levels
#> [1] "a" "b" "c"
#> 
#> $class
#> [1] "factor"

You can get the underlying base type by unclass()ing it, which strips the class attribute, causing it to lose its special behaviour:

unclass(f)
#> [1] 1 2 3
#> attr(,"levels")
#> [1] "a" "b" "c"

An S3 object behaves differently from its underlying base type whenever it’s passed to a generic (short for generic function). The easiest way to tell if a function is a generic is to use sloop::ftype() and look for “generic” in the output:

ftype(print)
#> [1] "S3"      "generic"
ftype(str)
#> [1] "S3"      "generic"
ftype(unclass)
#> [1] "primitive"

A generic function defines an interface, which uses a different implementation depending on the class of an argument (almost always the first argument). Many base R functions are generic, including the important print():

print(f)
#> [1] a b c
#> Levels: a b c

# stripping class reverts to integer behaviour
print(unclass(f))
#> [1] 1 2 3
#> attr(,"levels")
#> [1] "a" "b" "c"

Beware that str() is generic, and some S3 classes use that generic to hide the internal details. For example, the POSIXlt class used to represent date-time data is actually built on top of a list, a fact which is hidden by its str() method:

time <- strptime(c("2017-01-01", "2020-05-04 03:21"), "%Y-%m-%d")
str(time)
#>  POSIXlt[1:2], format: "2017-01-01" "2020-05-04"

str(unclass(time))
#> List of 9
#>  $ sec  : num [1:2] 0 0
#>  $ min  : int [1:2] 0 0
#>  $ hour : int [1:2] 0 0
#>  $ mday : int [1:2] 1 4
#>  $ mon  : int [1:2] 0 4
#>  $ year : int [1:2] 117 120
#>  $ wday : int [1:2] 0 1
#>  $ yday : int [1:2] 0 124
#>  $ isdst: int [1:2] 0 0
#>  - attr(*, "tzone")= chr "UTC"

The generic is a middleman: its job is to define the interface (i.e. the arguments) then find the right implementation for the job. The implementation for a specific class is called a method, and the generic finds that method by performing method dispatch.

You can use sloop::s3_dispatch() to see the process of method dispatch:

s3_dispatch(print(f))
#> => print.factor
#>  * print.default

We’ll come back to the details of dispatch in Section 13.4.1, for now note that S3 methods are functions with a special naming scheme, generic.class(). For example, the factor method for the print() generic is called print.factor(). You should never call the method directly, but instead rely on the generic to find it for you.

Generally, you can identify a method by the presence of . in the function name, but there are a number of important functions in base R that were written before S3, and hence use . to join words. If you’re unsure, check with sloop::ftype():

ftype(t.test)
#> [1] "S3"      "generic"
ftype(t.data.frame)
#> [1] "S3"     "method"

Unlike most functions, you can’t see the source code for most S3 methods68 just by typing their names. That’s because S3 methods are not usually exported: they live only inside the package, and are not available from the global environment. Instead, you can use sloop::s3_get_method(), which will work regardless of where the method lives:

weighted.mean.Date
#> Error in eval(expr, envir, enclos): object 'weighted.mean.Date' not found

s3_get_method(weighted.mean.Date)
#> function (x, w, ...) 
#> .Date(weighted.mean(unclass(x), w, ...))
#> <bytecode: 0x7f9682f700b8>
#> <environment: namespace:stats>

13.2.1 Exercises

  1. Describe the difference between t.test() and t.data.frame(). When is each function called?

  2. Make a list of commonly used base R functions that contain . in their name but are not S3 methods.

  3. What does the as.data.frame.data.frame() method do? Why is it confusing? How could you avoid this confusion in your own code?

  4. Describe the difference in behaviour in these two calls.

    set.seed(1014)
    some_days <- as.Date("2017-01-31") + sample(10, 5)
    
    mean(some_days)
    #> [1] "2017-02-06"
    mean(unclass(some_days))
    #> [1] 17203
  5. What class of object does the following code return? What base type is it built on? What attributes does it use?

    x <- ecdf(rpois(100, 10))
    x
    #> Empirical CDF 
    #> Call: ecdf(rpois(100, 10))
    #>  x[1:18] =  2,  3,  4,  ..., 2e+01, 2e+01
  6. What class of object does the following code return? What base type is it built on? What attributes does it use?

    x <- table(rpois(100, 5))
    x
    #> 
    #>  1  2  3  4  5  6  7  8  9 10 
    #>  7  5 18 14 15 15 14  4  5  3

13.3 Classes

If you have done object-oriented programming in other languages, you may be surprised to learn that S3 has no formal definition of a class: to make an object an instance of a class, you simply set the class attribute. You can do that during creation with structure(), or after the fact with class<-():

# Create and assign class in one step
x <- structure(list(), class = "my_class")

# Create, then set class
x <- list()
class(x) <- "my_class"

You can determine the class of an S3 object with class(x), and see if an object is an instance of a class using inherits(x, "classname").

class(x)
#> [1] "my_class"
inherits(x, "my_class")
#> [1] TRUE
inherits(x, "your_class")
#> [1] FALSE

The class name can be any string, but I recommend using only letters and _. Avoid . because (as mentioned earlier) it can be confused with the . separator between a generic name and a class name. When using a class in a package, I recommend including the package name in the class name. That ensures you won’t accidentally clash with a class defined by another package.

S3 has no checks for correctness which means you can change the class of existing objects:

# Create a linear model
mod <- lm(log(mpg) ~ log(disp), data = mtcars)
class(mod)
#> [1] "lm"
print(mod)
#> 
#> Call:
#> lm(formula = log(mpg) ~ log(disp), data = mtcars)
#> 
#> Coefficients:
#> (Intercept)    log(disp)  
#>       5.381       -0.459

# Turn it into a date (?!)
class(mod) <- "Date"

# Unsurprisingly this doesn't work very well
print(mod)
#> Error in as.POSIXlt.Date(x): 'list' object cannot be coerced to type 'double'

If you’ve used other OO languages, this might make you feel queasy, but in practice this flexibility causes few problems. R doesn’t stop you from shooting yourself in the foot, but as long as you don’t aim the gun at your toes and pull the trigger, you won’t have a problem.

To avoid foot-bullet intersections when creating your own class, I recommend that you usually provide three functions:

  • A low-level constructor, new_myclass(), that efficiently creates new objects with the correct structure.

  • A validator, validate_myclass(), that performs more computationally expensive checks to ensure that the object has correct values.

  • A user-friendly helper, myclass(), that provides a convenient way for others to create objects of your class.

You don’t need a validator for very simple classes, and you can skip the helper if the class is for internal use only, but you should always provide a constructor.

13.3.1 Constructors

S3 doesn’t provide a formal definition of a class, so it has no built-in way to ensure that all objects of a given class have the same structure (i.e. the same base type and the same attributes with the same types). Instead, you must enforce a consistent structure by using a constructor.

The constructor should follow three principles:

  • Be called new_myclass().

  • Have one argument for the base object, and one for each attribute.

  • Check the type of the base object and the types of each attribute.

I’ll illustrate these ideas by creating constructors for base classes69 that you’re already familiar with. To start, lets make a constructor for the simplest S3 class: Date. A Date is just a double with a single attribute: its class is “Date”. This makes for a very simple constructor:

new_Date <- function(x = double()) {
  stopifnot(is.double(x))
  structure(x, class = "Date")
}

new_Date(c(-1, 0, 1))
#> [1] "1969-12-31" "1970-01-01" "1970-01-02"

The purpose of constructors is to help you, the developer. That means you can keep them simple, and you don’t need to optimise error messages for public consumption. If you expect users to also create objects, you should create a friendly helper function, called class_name(), which I’ll describe shortly.

A slightly more complicated constructor is that for difftime, which is used to represent time differences. It is again built on a double, but has a units attribute that must take one of a small set of values:

new_difftime <- function(x = double(), units = "secs") {
  stopifnot(is.double(x))
  units <- match.arg(units, c("secs", "mins", "hours", "days", "weeks"))

  structure(x,
    class = "difftime",
    units = units
  )
}

new_difftime(c(1, 10, 3600), "secs")
#> Time differences in secs
#> [1]    1   10 3600
new_difftime(52, "weeks")
#> Time difference of 52 weeks

The constructor is a developer function: it will be called in many places, by an experienced user. That means it’s OK to trade a little safety in return for performance, and you should avoid potentially time-consuming checks in the constructor.

13.3.2 Validators

More complicated classes require more complicated checks for validity. Take factors, for example. A constructor only checks that types are correct, making it possible to create malformed factors:

new_factor <- function(x = integer(), levels = character()) {
  stopifnot(is.integer(x))
  stopifnot(is.character(levels))

  structure(
    x,
    levels = levels,
    class = "factor"
  )
}

new_factor(1:5, "a")
#> Error in as.character.factor(x): malformed factor
new_factor(0:1, "a")
#> Error in as.character.factor(x): malformed factor

Rather than encumbering the constructor with complicated checks, it’s better to put them in a separate function. Doing so allows you to cheaply create new objects when you know that the values are correct, and easily re-use the checks in other places.

validate_factor <- function(x) {
  values <- unclass(x)
  levels <- attr(x, "levels")

  if (!all(!is.na(values) & values > 0)) {
    stop(
      "All `x` values must be non-missing and greater than zero",
      call. = FALSE
    )
  }

  if (length(levels) < max(values)) {
    stop(
      "There must be at least as many `levels` as possible values in `x`",
      call. = FALSE
    )
  }

  x
}

validate_factor(new_factor(1:5, "a"))
#> Error: There must be at least as many `levels` as possible values in `x`
validate_factor(new_factor(0:1, "a"))
#> Error: All `x` values must be non-missing and greater than zero

This validator function is called primarily for its side-effects (throwing an error if the object is invalid) so you’d expect it to invisibly return its primary input (as described in Section 6.7.2). However, it’s useful for validation methods to return visibly, as we’ll see next.

13.3.3 Helpers

If you want users to construct objects from your class, you should also provide a helper method that makes their life as easy as possible. A helper should always:

  • Have the same name as the class, e.g. myclass().

  • Finish by calling the constructor, and the validator, if it exists.

  • Create carefully crafted error messages tailored towards an end-user.

  • Have a thoughtfully crafted user interface with carefully chosen default values and useful conversions.

The last bullet is the trickiest, and it’s hard to give general advice. However, there are three common patterns:

  • Sometimes all the helper needs to do is coerce its inputs to the desired type. For example, new_difftime() is very strict, and violates the usual convention that you can use an integer vector wherever you can use a double vector:

    new_difftime(1:10)
    #> Error in new_difftime(1:10): is.double(x) is not TRUE

    It’s not the job of the constructor to be flexible, so here we create a helper that just coerces the input to a double.

    difftime <- function(x = double(), units = "secs") {
      x <- as.double(x)
      new_difftime(x, units = units)
    }
    
    difftime(1:10)
    #> Time differences in secs
    #>  [1]  1  2  3  4  5  6  7  8  9 10
  • Often, the most natural representation of a complex object is a string. For example, it’s very convenient to specify factors with a character vector. The code below shows a simple version of factor(): it takes a character vector, and guesses that the levels should be the unique values. This is not always correct (since some levels might not be seen in the data), but it’s a useful default.

    factor <- function(x = character(), levels = unique(x)) {
      ind <- match(x, levels)
      validate_factor(new_factor(ind, levels))
    }
    
    factor(c("a", "a", "b"))
    #> [1] a a b
    #> Levels: a b
  • Some complex objects are most naturally specified by multiple simple
    components. For example, I think it’s natural to construct a date-time by supplying the individual components (year, month, day etc). That leads me to this POSIXct() helper that resembles the existing ISODatetime() function70:

    POSIXct <- function(year = integer(), 
                        month = integer(), 
                        day = integer(), 
                        hour = 0L, 
                        minute = 0L, 
                        sec = 0, 
                        tzone = "") {
      ISOdatetime(year, month, day, hour, minute, sec, tz = tzone)
    }
    
    POSIXct(2020, 1, 1, tzone = "America/New_York")
    #> [1] "2020-01-01 EST"

For more complicated classes, you should feel free to go beyond these patterns to make life as easy as possible for your users.

13.3.4 Exercises

  1. Write a constructor for data.frame objects. What base type is a data frame built on? What attributes does it use? What are the restrictions placed on the individual elements? What about the names?

  2. Enhance my factor() helper to have better behaviour when one or more values is not found in levels. What does base::factor() do in this situation?

  3. Carefully read the source code of factor(). What does it do that my constructor does not?

  4. Factors have an optional “contrasts” attribute. Read the help for C(), and briefly describe the purpose of the attribute. What type should it have? Rewrite the new_factor() constructor to include this attribute.

  5. Read the documentation for utils::as.roman(). How would you write a constructor for this class? Does it need a validator? What might a helper do?

13.4 Generics and methods

The job of an S3 generic is to perform method dispatch, i.e. find the specific implementation for a class. Method dispatch is performed by UseMethod(), which every generic calls71. UseMethod() takes two arguments: the name of the generic function (required), and the argument to use for method dispatch (optional). If you omit the second argument, it will dispatch based on the first argument, which is almost always what is desired.

Most generics are very simple, and consist of only a call to UseMethod(). Take mean() for example:

mean
#> function (x, ...) 
#> UseMethod("mean")
#> <bytecode: 0x7f9682af1668>
#> <environment: namespace:base>

Creating your own generic is similarly simple:

my_new_generic <- function(x) {
  UseMethod("my_new_generic")
}

(If you wonder why we have to repeat my_new_generic twice, think back to Section 6.2.3.)

You don’t pass any of the arguments of the generic to UseMethod(); it uses deep magic to pass to the method automatically. The precise process is complicated and frequently surprising, so you should avoid doing any computation in a generic. To learn the full details, carefully read the Technical Details section in ?UseMethod.

13.4.1 Method dispatch

How does UseMethod() work? It basically creates a vector of method names, paste0("generic", ".", c(class(x), "default")), and then looks for each potential method in turn. We can see this in action with sloop::s3_dispatch(). You give it a call to an S3 generic, and it lists all the possible methods. For example, what method is called when you print a Date object?

x <- Sys.Date()
s3_dispatch(print(x))
#> => print.Date
#>  * print.default

The output here is simple:

The “default” class is a special pseudo-class. This is not a real class, but is included to make it possible to define a standard fallback that is found whenever a class-specific method is not available.

The essence of method dispatch is quite simple, but as the chapter proceeds you’ll see it get progressively more complicated to encompass inheritance, base types, internal generics, and group generics. The code below shows a couple of more complicated cases which we’ll come back to in Sections 14.2.4 and 13.7.

x <- matrix(1:10, nrow = 2)
s3_dispatch(mean(x))
#>    mean.matrix
#>    mean.integer
#>    mean.numeric
#> => mean.default

s3_dispatch(sum(Sys.time()))
#>    sum.POSIXct
#>    sum.POSIXt
#>    sum.default
#> => Summary.POSIXct
#>    Summary.POSIXt
#>    Summary.default
#> -> sum (internal)

13.4.2 Finding methods

sloop::s3_dispatch() lets you find the specific method used for a single call. What if you want to find all methods defined for a generic or associated with a class? That’s the job of sloop::s3_methods_generic() and sloop::s3_methods_class():

s3_methods_generic("mean")
#> # A tibble: 7 x 4
#>   generic class      visible source             
#>   <chr>   <chr>      <lgl>   <chr>              
#> 1 mean    Date       TRUE    base               
#> 2 mean    default    TRUE    base               
#> 3 mean    difftime   TRUE    base               
#> 4 mean    POSIXct    TRUE    base               
#> 5 mean    POSIXlt    TRUE    base               
#> 6 mean    quosure    FALSE   registered S3method
#> 7 mean    vctrs_vctr FALSE   registered S3method

s3_methods_class("ordered")
#> # A tibble: 4 x 4
#>   generic       class   visible source             
#>   <chr>         <chr>   <lgl>   <chr>              
#> 1 as.data.frame ordered TRUE    base               
#> 2 Ops           ordered TRUE    base               
#> 3 relevel       ordered FALSE   registered S3method
#> 4 Summary       ordered TRUE    base

13.4.3 Creating methods

There are two wrinkles to be aware of when you create a new method:

  • First, you should only ever write a method if you own the generic or the class. R will allow you to define a method even if you don’t, but it is exceedingly bad manners. Instead, work with the author of either the generic or the class to add the method in their code.

  • A method must have the same arguments as its generic. This is enforced in packages by R CMD check, but it’s good practice even if you’re not creating a package.

    There is one exception to this rule: if the generic has ..., the method can contain a superset of the arguments. This allows methods to take arbitrary additional arguments. The downside of using ..., however, is that any misspelled arguments will be silently swallowed72, as mentioned in Section 6.6.

13.4.4 Exercises

  1. Read the source code for t() and t.test() and confirm that t.test() is an S3 generic and not an S3 method. What happens if you create an object with class test and call t() with it? Why?

    x <- structure(1:10, class = "test")
    t(x)
  2. What generics does the table class have methods for?

  3. What generics does the ecdf class have methods for?

  4. Which base generic has the greatest number of defined methods?

  5. Carefully read the documentation for UseMethod() and explain why the following code returns the results that it does. What two usual rules of function evaluation does UseMethod() violate?

    g <- function(x) {
      x <- 10
      y <- 10
      UseMethod("g")
    }
    g.default <- function(x) c(x = x, y = y)
    
    x <- 1
    y <- 1
    g(x)
    #>  x  y 
    #>  1 10
  6. What are the arguments to [? Why is this a hard question to answer?

13.5 Object styles

So far I’ve focussed on vector style classes like Date and factor. These have the key property that length(x) represents the number of observations in the vector. There are three variants that do not have this property:

  • Record style objects use a list of equal-length vectors to represent individual components of the object. The best example of this is POSIXlt, which underneath the hood is a list of 11 date-time components like year, month, and day. Record style classes override length() and subsetting methods to conceal this implementation detail.

    x <- as.POSIXlt(ISOdatetime(2020, 1, 1, 0, 0, 1:3))
    x
    #> [1] "2020-01-01 00:00:01 UTC" "2020-01-01 00:00:02 UTC"
    #> [3] "2020-01-01 00:00:03 UTC"
    
    length(x)
    #> [1] 3
    length(unclass(x))
    #> [1] 9
    
    x[[1]] # the first date time
    #> [1] "2020-01-01 00:00:01 UTC"
    unclass(x)[[1]] # the first component, the number of seconds
    #> [1] 1 2 3
  • Data frames are similar to record style objects in that both use lists of equal length vectors. However, data frames are conceptually two dimensional, and the individual components are readily exposed to the user. The number of observations is the number of rows, not the length:

    x <- data.frame(x = 1:100, y = 1:100)
    length(x)
    #> [1] 2
    nrow(x)
    #> [1] 100
  • Scalar objects typically use a list to represent a single thing. For example, an lm object is a list of length 12 but it represents one model.

    mod <- lm(mpg ~ wt, data = mtcars)
    length(mod)
    #> [1] 12

    Scalar objects can also be built on top of functions, calls, and environments73. This is less generally useful, but you can see applications in stats::ecdf(), R6 (Chapter 14), and rlang::quo() (Chapter 19).

Unfortunately, describing the appropriate use of each of these object styles is beyond the scope of this book. However, you can learn more from the documentation of the vctrs package (https://vctrs.r-lib.org); the package also provides constructors and helpers that make implementation of the different styles easier.

13.5.1 Exercises

  1. Categorise the objects returned by lm(), factor(), table(), as.Date(), as.POSIXct() ecdf(), ordered(), I() into the styles described above.

  2. What would a constructor function for lm objects, new_lm(), look like? Use ?lm and experimentation to figure out the required fields and their types.

13.6 Inheritance

S3 classes can share behaviour through a mechanism called inheritance. Inheritance is powered by three ideas:

  • The class can be a character vector. For example, the ordered and POSIXct classes have two components in their class:

    class(ordered("x"))
    #> [1] "ordered" "factor"
    class(Sys.time())
    #> [1] "POSIXct" "POSIXt"
  • If a method is not found for the class in the first element of the vector, R looks for a method for the second class (and so on):

    s3_dispatch(print(ordered("x")))
    #>    print.ordered
    #> => print.factor
    #>  * print.default
    s3_dispatch(print(Sys.time()))
    #> => print.POSIXct
    #>    print.POSIXt
    #>  * print.default
  • A method can delegate work by calling NextMethod(). We’ll come back to that very shortly; for now, note that s3_dispatch() reports delegation with ->.

    s3_dispatch(ordered("x")[1])
    #>    [.ordered
    #> => [.factor
    #>    [.default
    #> -> [ (internal)
    s3_dispatch(Sys.time()[1])
    #> => [.POSIXct
    #>    [.POSIXt
    #>    [.default
    #> -> [ (internal)

Before we continue we need a bit of vocabulary to describe the relationship between the classes that appear together in a class vector. We’ll say that ordered is a subclass of factor because it always appears before it in the class vector, and, conversely, we’ll say factor is a superclass of ordered.

S3 imposes no restrictions on the relationship between sub- and superclasses but your life will be easier if you impose some. I recommend that you adhere to two simple principles when creating a subclass:

  • The base type of the subclass should be that same as the superclass.

  • The attributes of the subclass should be a superset of the attributes of the superclass.

POSIXt does not adhere to these principles because POSIXct has type double, and POSIXlt has type list. This means that POSIXt is not a superclass, and illustrates that it’s quite possible to use the S3 inheritance system to implement other styles of code sharing (here POSIXt plays a role more like an interface), but you’ll need to figure out safe conventions yourself.

13.6.1 NextMethod()

NextMethod() is the hardest part of inheritance to understand, so we’ll start with a concrete example for the most common use case: [. We’ll start by creating a simple toy class: a secret class that hides its output when printed:

new_secret <- function(x = double()) {
  stopifnot(is.double(x))
  structure(x, class = "secret")
}

print.secret <- function(x, ...) {
  print(strrep("x", nchar(x)))
  invisible(x)
}

x <- new_secret(c(15, 1, 456))
x
#> [1] "xx"  "x"   "xxx"

This works, but the default [ method doesn’t preserve the class:

s3_dispatch(x[1])
#>    [.secret
#>    [.default
#> => [ (internal)
x[1]
#> [1] 15

To fix this, we need to provide a [.secret method. How could we implement this method? The naive approach won’t work because we’ll get stuck in an infinite loop:

`[.secret` <- function(x, i) {
  new_secret(x[i])
}

Instead, we need some way to call the underlying [ code, i.e. the implementation that would get called if we didn’t have a [.secret method. One approach would be to unclass() the object:

`[.secret` <- function(x, i) {
  x <- unclass(x)
  new_secret(x[i])
}
x[1]
#> [1] "xx"

This works, but is inefficient because it creates a copy of x. A better approach is to use NextMethod(), which concisely solves the problem of delegating to the method that would have been called if [.secret didn’t exist:

`[.secret` <- function(x, i) {
  new_secret(NextMethod())
}
x[1]
#> [1] "xx"

We can see what’s going on with sloop::s3_dispatch():

s3_dispatch(x[1])
#> => [.secret
#>    [.default
#> -> [ (internal)

The => indicates that [.secret is called, but that NextMethod() delegates work to the underlying internal [ method, as shown by the ->.

As with UseMethod(), the precise semantics of NextMethod() are complex. In particular, it tracks the list of potential next methods with a special variable, which means that modifying the object that’s being dispatched upon will have no impact on which method gets called next.

13.6.2 Allowing subclassing

When you create a class, you need to decide if you want to allow subclasses, because it requires some changes to the constructor and careful thought in your methods.

To allow subclasses, the parent constructor needs to have ... and class arguments:

new_secret <- function(x, ..., class = character()) {
  stopifnot(is.double(x))

  structure(
    x,
    ...,
    class = c(class, "secret")
  )
}

Then the subclass constructor can just call to the parent class constructor with additional arguments as needed. For example, imagine we want to create a supersecret class which also hides the number of characters:

new_supersecret <- function(x) {
  new_secret(x, class = "supersecret")
}

print.supersecret <- function(x, ...) {
  print(rep("xxxxx", length(x)))
  invisible(x)
}

x2 <- new_supersecret(c(15, 1, 456))
x2
#> [1] "xxxxx" "xxxxx" "xxxxx"

To allow inheritance, you also need to think carefully about your methods, as you can no longer use the constructor. If you do, the method will always return the same class, regardless of the input. This forces whoever makes a subclass to do a lot of extra work.

Concretely, this means we need to revise the [.secret method. Currently it always returns a secret(), even when given a supersecret:

`[.secret` <- function(x, ...) {
  new_secret(NextMethod())
}

x2[1:3]
#> [1] "xx"  "x"   "xxx"

We want to make sure that [.secret returns the same class as x even if it’s a subclass. As far as I can tell, there is no way to solve this problem using base R alone. Instead, you’ll need to use the vctrs package, which provides a solution in the form of the vctrs::vec_restore() generic. This generic takes two inputs: an object which has lost subclass information, and a template object to use for restoration.

Typically vec_restore() methods are quite simple: you just call the constructor with appropriate arguments:

vec_restore.secret <- function(x, to, ...) new_secret(x)
vec_restore.supersecret <- function(x, to, ...) new_supersecret(x)

(If your class has attributes, you’ll need to pass them from to into the constructor.)

Now we can use vec_restore() in the [.secret method:

`[.secret` <- function(x, ...) {
  vctrs::vec_restore(NextMethod(), x)
}
x2[1:3]
#> [1] "xxxxx" "xxxxx" "xxxxx"

(I only fully understood this issue quite recently, so at time of writing it is not used in the tidyverse. Hopefully by the time you’re reading this, it will have rolled out, making it much easier to (e.g.) subclass tibbles.)

If you build your class using the tools provided by the vctrs package, [ will gain this behaviour automatically. You will only need to provide your own [ method if you use attributes that depend on the data or want non-standard subsetting behaviour. See ?vctrs::new_vctr for details.

13.6.3 Exercises

  1. How does [.Date support subclasses? How does it fail to support subclasses?

  2. R has two classes for representing date time data, POSIXct and POSIXlt, which both inherit from POSIXt. Which generics have different behaviours for the two classes? Which generics share the same behaviour?

  3. What do you expect this code to return? What does it actually return? Why?

    generic2 <- function(x) UseMethod("generic2")
    generic2.a1 <- function(x) "a1"
    generic2.a2 <- function(x) "a2"
    generic2.b <- function(x) {
      class(x) <- "a1"
      NextMethod()
    }
    
    generic2(structure(list(), class = c("b", "a2")))

13.7 Dispatch details

This chapter concludes with a few additional details about method dispatch. It is safe to skip these details if you’re new to S3.

13.7.1 S3 and base types

What happens when you call an S3 generic with a base object, i.e. an object with no class? You might think it would dispatch on what class() returns:

class(matrix(1:5))
#> [1] "matrix" "array"

But unfortunately dispatch actually occurs on the implicit class, which has three components:

  • The string “array” or “matrix” if the object has dimensions
  • The result of typeof() with a few minor tweaks
  • The string “numeric” if object is “integer” or “double”

There is no base function that will compute the implicit class, but you can use sloop::s3_class()

s3_class(matrix(1:5))
#> [1] "matrix"  "integer" "numeric"

This is used by s3_dispatch():

s3_dispatch(print(matrix(1:5)))
#>    print.matrix
#>    print.integer
#>    print.numeric
#> => print.default

This means that the class() of an object does not uniquely determine its dispatch:

x1 <- 1:5
class(x1)
#> [1] "integer"
s3_dispatch(mean(x1))
#>    mean.integer
#>    mean.numeric
#> => mean.default

x2 <- structure(x1, class = "integer")
class(x2)
#> [1] "integer"
s3_dispatch(mean(x2))
#>    mean.integer
#> => mean.default

13.7.2 Internal generics

Some base functions, like [, sum(), and cbind(), are called internal generics because they don’t call UseMethod() but instead call the C functions DispatchGroup() or DispatchOrEval(). s3_dispatch() shows internal generics by including the name of the generic followed by (internal):

s3_dispatch(Sys.time()[1])
#> => [.POSIXct
#>    [.POSIXt
#>    [.default
#> -> [ (internal)

For performance reasons, internal generics do not dispatch to methods unless the class attribute has been set, which means that internal generics do not use the implicit class. Again, if you’re ever confused about method dispatch, you can rely on s3_dispatch().

13.7.3 Group generics

Group generics are the most complicated part of S3 method dispatch because they involve both NextMethod() and internal generics. Like internal generics, they only exist in base R, and you cannot define your own group generic.

There are four group generics:

Defining a single group generic for your class overrides the default behaviour for all of the members of the group. Methods for group generics are looked for only if the methods for the specific generic do not exist:

s3_dispatch(sum(Sys.time()))
#>    sum.POSIXct
#>    sum.POSIXt
#>    sum.default
#> => Summary.POSIXct
#>    Summary.POSIXt
#>    Summary.default
#> -> sum (internal)

Most group generics involve a call to NextMethod(). For example, take difftime() objects. If you look at the method dispatch for abs(), you’ll see there’s a Math group generic defined.

y <- as.difftime(10, units = "mins")
s3_dispatch(abs(y))
#>    abs.difftime
#>    abs.default
#> => Math.difftime
#>    Math.default
#> -> abs (internal)

Math.difftime basically looks like this:

Math.difftime <- function(x, ...) {
  new_difftime(NextMethod(), units = attr(x, "units"))
}

It dispatches to the next method, here the internal default, to perform the actual computation, then restore the class and attributes. (To better support subclasses of difftime this would need to call vec_restore(), as described in Section 13.6.2.)

Inside a group generic function a special variable .Generic provides the actual generic function called. This can be useful when producing error messages, and can sometimes be useful if you need to manually re-call the generic with different arguments.

13.7.4 Double dispatch

Generics in the Ops group, which includes the two-argument arithmetic and Boolean operators like - and &, implement a special type of method dispatch. They dispatch on the type of both of the arguments, which is called double dispatch. This is necessary to preserve the commutative property of many operators, i.e. a + b should equal b + a. Take the following simple example:

date <- as.Date("2017-01-01")
integer <- 1L

date + integer
#> [1] "2017-01-02"
integer + date
#> [1] "2017-01-02"

If + dispatched only on the first argument, it would return different values for the two cases. To overcome this problem, generics in the Ops group use a slightly different strategy from usual. Rather than doing a single method dispatch, they do two, one for each input. There are three possible outcomes of this lookup:

  • The methods are the same, so it doesn’t matter which method is used.

  • The methods are different, and R falls back to the internal method with a warning.

  • One method is internal, in which case R calls the other method.

This approach is error prone so if you want to implement robust double dispatch for algebraic operators, I recommend using the vctrs package. See ?vctrs::vec_arith for details.

13.7.5 Exercises

  1. Explain the differences in dispatch below:

    length.integer <- function(x) 10
    
    x1 <- 1:5
    class(x1)
    #> [1] "integer"
    s3_dispatch(length(x1))
    #>  * length.integer
    #>    length.numeric
    #>    length.default
    #> => length (internal)
    
    x2 <- structure(x1, class = "integer")
    class(x2)
    #> [1] "integer"
    s3_dispatch(length(x2))
    #> => length.integer
    #>    length.default
    #>  * length (internal)
  2. What classes have a method for the Math group generic in base R? Read the source code. How do the methods work?

  3. Math.difftime() is more complicated than I described. Why?