13 S3

S3 is R’s first and simplest OO system. S3 is informal and ad hoc, but it has a certain elegance in its minimalism: you can’t take away any part of it and still have a useful OO system. For these reasons, S3 should be your default choice for OO programming: you should use it unless you have a compelling reason otherwise. S3 is the only OO system used in the base and stats packages, and it’s the most commonly used system in CRAN packages.

S3 is a very flexible system: it allows you to do a lot of things that are quite ill-advised. If you’re coming from a strict environment like Java, this will seem pretty frightening (and it is!) but it does give R programmers a tremendous amount of freedom. While it’s very difficult to prevent someone from doing something you don’t want them to do, your users will never be held back because there is something you haven’t implemented yet. Since S3 has few built-in constraints, the key to its successful use is applying the constraints yourself. This chapter will teach you the conventions you should (almost) always adhere to in order to use S3 safely.

We’ll use the sloop package to fill in some missing pieces when it comes to S3.

# devtools::install_github("hadley/sloop")
library(sloop)

13.1 Basics

An S3 object is built on top of a base type with the “class” attribute set. The base type is typically a vector, although we will see later that it’s possible to build classes on other base types. For example, take the factor. It is built on top of an integer vector, and the value of the class attribute is “factor”. It stores information about the “levels” in another attribute.

f <- factor("a")

typeof(f)
#> [1] "integer"
attributes(f)
#> $levels
#> [1] "a"
#> 
#> $class
#> [1] "factor"

An S3 object behaves differently from its underlying base type because of generic functions, or generics for short. A generic executes different code depending on the class of one of its arguments, almost always the first. You can see this difference with the most important generic function: print().

print(f)
#> [1] a
#> Levels: a
print(unclass(f))
#> [1] 1
#> attr(,"levels")
#> [1] "a"

unclass() strips the class attribute from its input, so it is a useful tool for seeing what special behaviour an S3 class adds.

str() shows the internal structure of S3 objects. Be careful when using str(): some S3 classes provide a custom str() method which can hide the underlying details. For example, take the POSIXlt class, which is one of the two classes used to represent date-time data:

time <- strptime("2017-01-01", "%Y-%m-%d")
str(time)
#>  POSIXlt[1:1], format: "2017-01-01"
str(unclass(time), list.len = 5)
#> List of 9
#>  $ sec  : num 0
#>  $ min  : int 0
#>  $ hour : int 0
#>  $ mday : int 1
#>  $ mon  : int 0
#>   [list output truncated]
#>  - attr(*, "tzone")= chr "UTC"

A generic and its methods are functions that operate on classes. The role of a generic is to find the right method for the arguments that it is provided, the process of method dispatch. A method is a function that implements the generic behaviour for a specific class. In other words the job of the generic is to find the right method; the job of the method is to do the work.

S3 methods are functions with a special naming scheme, generic.class(). For example, the Date method for the mean() generic is called mean.Date(), and the factor method for print() is called print.factor(). This is the reason that most modern style guides discourage the use of . in function names: it makes them look like S3 methods. For example, is t.test() the t method for test objects?

You can find some S3 methods (those in the base package and those that you’ve created) by typing their names. However, this will not work with most packages because S3 methods are not exported: they live only inside the package, and are not available from the global environment. Instead, you can use getS3method(), which will work regardless of where the method lives:

# Only works because the method is in the base package
mean.Date
#> function (x, ...) 
#> structure(mean(unclass(x), ...), class = "Date")
#> <bytecode: 0x4e635a8>
#> <environment: namespace:base>

# Always works
getS3method("mean", "Date")
#> function (x, ...) 
#> structure(mean(unclass(x), ...), class = "Date")
#> <bytecode: 0x4e635a8>
#> <environment: namespace:base>

13.1.1 Exercises

  1. The most important S3 objects in base R are factors, data frames, and date/times (Dates, POSIXct, POSIXlt). You’ve already seen the attributes and base type that factors are built on. What base types and attributes are the others built on?

  2. Describe the difference in behaviour in these two calls.

    set.seed(1014)
    some_days <- as.Date("2017-01-31") + sample(10, 5)
    
    mean(some_days)
    #> [1] "2017-02-05"
    mean(unclass(some_days))
    #> [1] 17202

  3. Draw a Venn diagram illustrating the relationships between functions, generics, and methods.

  4. What does the as.data.frame.data.frame() method do? Why is it confusing? How should you avoid this confusion in your own code?

  5. What does the following code return? What base type is it built on? What attributes does it use?

    x <- ecdf(rpois(100, 10))
    x
    #> Empirical CDF 
    #> Call: ecdf(rpois(100, 10))
    #>  x[1:18] =  2,  3,  4,  ..., 2e+01, 2e+01

13.2 Classes

S3 is a simple and ad hoc system, and has no formal definition of a class. To make an object an instance of a class, you simply take an existing object and set the class attribute. You can do that during creation with structure(), or after the fact with class<-():

# Create and assign class in one step
foo <- structure(list(), class = "foo")

# Create, then set class
foo <- list()
class(foo) <- "foo"

You can determine the class of any object using class(x), and see if an object inherits from a specific class using inherits(x, "classname").

class(foo)
#> [1] "foo"
inherits(foo, "foo")
#> [1] TRUE

The class name can be any string, but I recommend using only letters and underscores (_). Avoid dots (.), which make class names hard to tell apart from the generic.class() naming scheme of methods. Opinion is mixed whether to use underscores (my_class) or CamelCase (MyClass) for multi-word class names. Pick one convention and stick with it.

It’s possible to provide a vector of class names, which allows S3 to implement a basic style of inheritance. This allows you to reduce your workload by allowing classes to share code where possible. We’ll come back to this idea in inheritance.
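As a brief sketch of what a class vector looks like in practice, the following uses two made-up class names, child and parent (purely illustrative, not from any package). inherits() matches any element of the class vector, and its which argument reports where each name sits in that vector.

```r
# Hypothetical classes: "child" comes first, so it is the more specific class
x <- structure(list(), class = c("child", "parent"))

class(x)
#> [1] "child"  "parent"
inherits(x, "parent")
#> [1] TRUE

# Position of each name within class(x); 0 means "not present"
pos <- inherits(x, c("parent", "child", "other"), which = TRUE)
pos
#> [1] 2 1 0
```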

S3 has no checks for correctness. This means you can change the class of existing objects:

# Create a linear model
mod <- lm(log(mpg) ~ log(disp), data = mtcars)
class(mod)
#> [1] "lm"
print(mod)
#> 
#> Call:
#> lm(formula = log(mpg) ~ log(disp), data = mtcars)
#> 
#> Coefficients:
#> (Intercept)    log(disp)  
#>       5.381       -0.459

# Turn it into a data frame (?!)
class(mod) <- "data.frame"

# Unsurprisingly this doesn't work very well
print(mod)
#>  [1] coefficients  residuals     effects       rank          fitted.values
#>  [6] assign        qr            df.residual   xlevels       call         
#> [11] terms         model        
#> <0 rows> (or 0-length row.names)

If you’ve used other OO languages, this might make you feel queasy. But surprisingly, this flexibility causes few problems: while you can change the class of an object, you never should. R doesn’t protect you from yourself: you can easily shoot yourself in the foot. As long as you don’t aim the gun at your foot and pull the trigger, you won’t have a problem.

To avoid foot-bullet intersections when creating your own class, you should always provide:

  • A constructor, new_x(), that efficiently creates new objects with the correct structure.

For more complicated classes, you may also want to provide:

  • A validator, validate_x(), that performs more expensive checks that the object has correct values.

  • A helper, x(), that provides a convenient and neatly parameterised way for others to construct and validate (create) objects of this class.

13.2.1 Constructors

S3 doesn’t provide a formal definition of a class, so it has no built-in way to ensure that all objects of a given class have the same structure (i.e. same attributes with the same types). Instead, you should enforce a consistent structure yourself by using a constructor. A constructor is a function whose job is to create objects of a given class, ensuring that they always have the same structure.

There are three rules that a constructor should follow. It should:

  1. Be called new_class_name().
  2. Have one argument for the base object, and one for each attribute. (More if the class can be subclassed, see inheritance.)
  3. Check the types of the base object and each attribute.

Base R generally does not provide constructors (three exceptions are the internal .difftime(), .POSIXct(), and .POSIXlt()) so we’ll demonstrate constructors by filling in some missing pieces in base. (If you want to use these constructors in your own code, you can use the versions exported by the sloop package, which complete a few details that we skip here in order to focus on the core issues.)

We’ll start with one of the simplest S3 classes in base R: Date, which is just a double with a class attribute. The constructor rules lead to the slightly awkward name new_Date(), because the existing base class uses a capital letter. I recommend using lower case class names to avoid this problem.

new_Date <- function(x) {
  stopifnot(is.double(x))
  structure(x, class = "Date")
}

new_Date(c(-1, 0, 1))
#> [1] "1969-12-31" "1970-01-01" "1970-01-02"

You can use the new_s3_*() helpers provided by the sloop package to make this even simpler. They are wrappers around structure() that require a class argument, and check the base type of x.

new_Date <- function(x) {
  sloop::new_s3_dbl(x, class = "Date")
}

The purpose of the constructor is to help the developer (you). That means you can keep them simple, and you don’t need to optimise the error messages for user friendliness. If you expect others to create your objects, you should also create a friendly helper function, called class_name(), that we’ll describe shortly.

A slightly more complicated example is POSIXct, which is used to represent date-times. It is again built on a double, but has an attribute that specifies the time zone, a length 1 character vector. R defaults to using the local time zone, which is represented by the empty string. To create the constructor, we need to make sure each attribute of the class gets an argument to the constructor. This gives us:

new_POSIXct <- function(x, tzone = "") {
  stopifnot(is.double(x))
  stopifnot(is.character(tzone), length(tzone) == 1)
  
  structure(x, 
    class = c("POSIXct", "POSIXt"),
    tzone = tzone
  )
}

new_POSIXct(1)
#> [1] "1970-01-01 00:00:01 UTC"
new_POSIXct(1, tzone = "UTC")
#> [1] "1970-01-01 00:00:01 UTC"

The constructor checks that x is a double, and that tzone is a length 1 character vector. We use stopifnot() here because the constructor is a developer-focussed function, so error messages don’t need to be that friendly. Note that POSIXct uses a class vector; we’ll come back to what that means in inheritance.

Generally, the constructor should not check that the values are valid because such checks are often expensive. For example, our new_POSIXct() constructor does not check that tzone is a valid value, and we get a warning when the object is printed.

x <- new_POSIXct(1, "Auckland NZ")
x
#> [1] "1970-01-01 00:00:01 Auckland"

13.2.2 Validators

More complicated classes will require more complicated checks for validity. Take factors, for example. The constructor function only checks that the structure is correct:

new_factor <- function(x, levels) {
  stopifnot(is.integer(x))
  stopifnot(is.character(levels))
  
  structure(
    x,
    levels = levels,
    class = "factor"
  )
}

So it’s possible to use this to create invalid factors:

new_factor(1:5, "a")
#> Error in as.character.factor(x): malformed factor
new_factor(0:1, "a")
#> Error in as.character.factor(x): malformed factor

Rather than encumbering the constructor with complicated checks, it’s better to put them in a separate function. This is a good idea because it allows you to cheaply create new objects when you know that the values are correct, and to re-use the checks in other places.

validate_factor <- function(x) {
  values <- unclass(x)
  levels <- attr(x, "levels")
  
  if (!all(!is.na(values) & values > 0)) {
    stop(
      "All `x` values must be non-missing and greater than zero",
      call. = FALSE
    )
  }
  
  if (length(levels) < max(values)) {
    stop(
      "There must be at least as many `levels` as possible values in `x`",
      call. = FALSE
    )
  }
  
  x
}

validate_factor(new_factor(1:5, "a"))
#> Error: There must be at least as many `levels` as possible values in `x`
validate_factor(new_factor(0:1, "a"))
#> Error: All `x` values must be non-missing and greater than zero

This function is called primarily for its side-effects (throwing an error if the object is invalid) so you’d expect it to invisibly return its primary input. However, unlike most functions called for their side effects, it’s useful for validators to return visibly, as we’ll see next.
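The same split between constructor and validator could be applied to the new_POSIXct() constructor from the previous section. The sketch below is an illustration only: validate_POSIXct() is a hypothetical name, and it assumes that the set of acceptable time zones is the empty string (local time) plus whatever the base function OlsonNames() reports on the current system.

```r
# Constructor repeated from above so that the example is self-contained
new_POSIXct <- function(x, tzone = "") {
  stopifnot(is.double(x))
  stopifnot(is.character(tzone), length(tzone) == 1)
  structure(x, class = c("POSIXct", "POSIXt"), tzone = tzone)
}

# Hypothetical validator: the value check lives here, not in the constructor
validate_POSIXct <- function(x) {
  tzone <- attr(x, "tzone")
  if (!tzone %in% c("", OlsonNames())) {
    stop("`tzone` must be \"\" or a name listed by OlsonNames()", call. = FALSE)
  }
  x
}

# Capture the error message rather than stopping the script
res <- tryCatch(
  validate_POSIXct(new_POSIXct(1, "Auckland NZ")),
  error = function(e) conditionMessage(e)
)
res
```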

13.2.3 Helpers

If you want others to construct objects from your class, you should also provide a helper function that makes their life as easy as possible. This should have the same name as the class, and should be parameterised in a convenient way. factor() is a good example of this as well: you want to automatically derive the internal representation from a vector. The simplest possible implementation looks something like this:

factor <- function(x, levels = unique(x)) {
  ind <- match(x, levels)
  validate_factor(new_factor(ind, levels))
}
factor(c("a", "a", "b"))
#> [1] a a b
#> Levels: a b

The validator prevents the construction of invalid objects, but for a real helper you’d spend more time creating user friendly error messages.

factor(c("a", "a", "b"), levels = "a")
#> Error: All `x` values must be non-missing and greater than zero

In base R, neither Date nor POSIXct has a helper function. Instead there are two ways to construct them:

  • By coercing from another type with as.Date() and as.POSIXct(). These functions should be S3 generics, so we’ll come back to them in coercion.

  • With a helper function that either parses a string (strptime()) or creates a date from individual components (ISOdatetime()).

These missing helpers mean that there’s no obvious default way to create a date or date-time in R. We can fill in those missing pieces with a couple of helpers:

Date <- function(year, month, day) {
  as.Date(ISOdate(year, month, day, tz = ""))
}

POSIXct <- function(year, month, day, hour, minute, sec, tzone = "") {
  ISOdatetime(year, month, day, hour, minute, sec, tz = tzone)
}

These helpers fill a useful role, but are not computationally efficient: behind the scenes ISOdatetime() works by pasting the components into a string and then using strptime(). More efficient equivalents are lubridate::make_datetime() and lubridate::make_date().

13.2.4 Object styles

S3 gives you the freedom to build a new class on top of any existing base type. So far, we’ve focussed on vector-style where you take an existing vector type and add some attributes. Importantly, a single vector-style object represents multiple values. There are two other important styles: scalar-style and data-frame-style.

Each scalar-style object represents a single “value”, and is built on top of a named list. This is the style that you are most likely to use in practice. The constructor for the scalar style is slightly different because the arguments become named elements of the list, rather than attributes.

new_scalar_class <- function(x, y, z) {
  structure(
    list(
      x = x,
      y = y,
      z = z
    ),
    class = "scalar_class"
  )
}

(For a real constructor, you’d also check that the x, y, and z fields are the types that you expect.)
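A sketch of what those checks might look like follows. The field types are assumptions made up for illustration (x a single number, y a single string, z a single logical); a real class would check whatever its fields actually require.

```r
new_scalar_class <- function(x, y, z) {
  # Assumed field types, chosen only for this example
  stopifnot(is.numeric(x), length(x) == 1)
  stopifnot(is.character(y), length(y) == 1)
  stopifnot(is.logical(z), length(z) == 1)

  structure(
    list(x = x, y = y, z = z),
    class = "scalar_class"
  )
}

obj <- new_scalar_class(1, "a", TRUE)
obj$y
#> [1] "a"
```

Because the object is a named list, fields are read with $ rather than attr().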

In base R, the most important example of this style is lm, the class returned when you fit a linear model:

mod <- lm(mpg ~ wt, data = mtcars)
typeof(mod)
#> [1] "list"
names(mod)
#>  [1] "coefficients"  "residuals"     "effects"       "rank"         
#>  [5] "fitted.values" "assign"        "qr"            "df.residual"  
#>  [9] "xlevels"       "call"          "terms"         "model"

The data-frame-style builds on top of a data frame (a named list where each element is a vector of the same length), and adds additional attributes to store important metadata. A data-frame-style constructor looks like:

new_df_class <- function(df, attr1, attr2) {
  stopifnot(is.data.frame(df))
  
  structure(
    df, 
    attr1 = attr1,
    attr2 = attr2,
    class = c("df_class", "data.frame")
  )
}

The most common data-frame-style class is the tibble, a modern reimagining of the data frame provided by the tibble package, and used extensively within the tidyverse.

Collectively, we’ll call the attributes of a vector-style or data-frame-style class and the names of a scalar-style class the fields of an object.

When creating your own classes, you should pick the vector style if your class closely resembles an existing vector type. Otherwise, use a scalar (list) style. The scalar style is generally easier to work with because implementing a full range of convenient vectorised methods is usually a lot of work. It’s typically obvious when you need to use a data-frame-style.

13.2.5 Exercises

  1. Categorise the objects returned by lm(), factor(), table(), as.Date(), ecdf(), ordered(), I() into “vector”, “scalar”, and “other”.

  2. Write a constructor for difftime objects. What base type are they built on? What attributes do they use? You’ll need to consult the documentation, read some code, and perform some experiments.

  3. Write a constructor for data.frame objects. What base type is a data frame built on? What attributes does it use? What are the restrictions placed on the individual elements? What about the names?

  4. Enhance our factor() helper to have better behaviour when one or more values is not found in levels. What does base::factor() do in this situation?

  5. Carefully read the source code of factor(). What does it do that our constructor does not?

  6. What would a constructor function for lm objects, new_lm(), look like? Why is a constructor function less useful for linear models?

13.3 Generics and methods

The job of an S3 generic is to perform method dispatch, i.e. find the function designed to work specifically for the given class. S3 generics have a simple structure: they call UseMethod(), which then calls the right method. UseMethod() takes two arguments: the name of the generic function (required), and the argument to use for method dispatch (optional). If you omit the second argument it will dispatch based on the first argument, which is what I generally advise.

# Dispatches on x
generic <- function(x, y, ...) {
  UseMethod("generic")
}

# Dispatches on y
generic2 <- function(x, y, ...) {
  UseMethod("generic2", y)
}

Note that you don’t pass any of the arguments of the generic to UseMethod(); it uses black magic to pass them on automatically. Generally, you should avoid doing any computation in a generic, because the semantics are complicated and few people know the details. In general, any modifications to the arguments of the generic will be undone, leading to much confusion.
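To see what “undone” means, here is a small sketch with a hypothetical generic, double_first(), that (inadvisably) reassigns its argument before dispatching. The method still receives the value that the caller supplied.

```r
# Hypothetical generic that modifies its argument before dispatch
double_first <- function(x) {
  x <- x * 2
  UseMethod("double_first")
}

# The method simply returns what it was given
double_first.default <- function(x) x

res <- double_first(1)
res
#> [1] 1
```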

A generic isn’t useful without some methods, which are just functions that follow a naming scheme (generic.class). Because a method is just a function with a special name, you can call methods directly, but you generally shouldn’t. The main reason to call the method directly is that it sometimes leads to considerable performance improvements. See performance for an example.

generic.foo <- function(x, y, ...) {
  message("foo method")
}

generic(new_s3_scalar(class = "foo"))
#> foo method

You can see all the methods defined for a generic with s3_methods_generic():

s3_methods_generic("generic")
#> # A tibble: 2 x 4
#>   generic class    visible source    
#>   <chr>   <chr>    <lgl>   <chr>     
#> 1 generic foo      T       .GlobalEnv
#> 2 generic skeleton T       methods

Note the false positive: generic.skeleton() is not a method for our generic but an existing function in the methods package. It’s picked up because method definition relies only on a naming convention. This is another reason that you should avoid using . in non-method function names.

Remember that apart from methods that you’ve created, and those defined in the base package, most S3 methods will not be directly accessible. You’ll need to use getS3method("generic", "class") to see their source code.
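For instance, the default method of t.test() lives in the stats namespace and is not exported, so it has to be looked up by generic and class. This sketch uses only functions from base R and utils; the class name no_such_class is made up to show what happens when no method exists.

```r
# Look the method up by generic and class rather than by name
m <- getS3method("t.test", "default")
is.function(m)
#> [1] TRUE
environmentName(environment(m))
#> [1] "stats"

# optional = TRUE returns NULL instead of throwing an error
missing_method <- getS3method("t.test", "no_such_class", optional = TRUE)
is.null(missing_method)
#> [1] TRUE
```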

13.3.1 Coercion

Many S3 objects can be naturally created from an existing object through coercion. If this is the case for your class, you should provide a coercion function, an S3 generic called as_class_name. Base R generally does not follow this convention, which can cause problems as illustrated by as.factor():

  • The name is confusing, since as.factor() is not the factor method of the as() generic.

  • as.factor() is not a generic, which means that if you create a new class that could be usefully converted to a factor, you cannot extend as.factor().

We can fix these issues by creating a new generic coercion function and providing it with some methods:

as_factor <- function(x, ...) {
  UseMethod("as_factor")
}

Every as_y() generic should have a y method that returns its input unchanged:

as_factor.factor <- function(x, ...) x

This ensures that as_factor() works if the input is already a factor.

Two useful methods would be for character and integer vectors.

as_factor.character <- function(x, ...) {
  factor(x, levels = unique(x))
}
as_factor.integer <- function(x, ...) {
  factor(x, levels = as.character(unique(x)))
}

Typically the coercion methods will either call the constructor or the helper; pick the function that makes the code simpler. Here the helper is simplest. If you use the constructor, remember to also call the validator function.
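As a sketch of the second route, here is a possible method for logical vectors that goes through the constructor and then the validator. The method itself and the choice of "FALSE" and "TRUE" as levels are assumptions for illustration; the constructor and a condensed validator are repeated so the example runs on its own.

```r
new_factor <- function(x, levels) {
  stopifnot(is.integer(x))
  stopifnot(is.character(levels))
  structure(x, levels = levels, class = "factor")
}

# Condensed version of the validator from earlier in the chapter
validate_factor <- function(x) {
  values <- unclass(x)
  if (!all(!is.na(values) & values > 0)) {
    stop("All `x` values must be non-missing and greater than zero", call. = FALSE)
  }
  if (length(attr(x, "levels")) < max(values)) {
    stop("There must be at least as many `levels` as possible values in `x`", call. = FALSE)
  }
  x
}

as_factor <- function(x, ...) {
  UseMethod("as_factor")
}

# Hypothetical method: build the integer codes by hand, then validate
as_factor.logical <- function(x, ...) {
  lv <- c("FALSE", "TRUE")
  validate_factor(new_factor(match(as.character(x), lv), lv))
}

f <- as_factor(c(TRUE, FALSE, TRUE))
f
#> [1] TRUE  FALSE TRUE 
#> Levels: FALSE TRUE
```

Because the validator runs last, a missing value in the input is rejected instead of producing a malformed factor.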

If you think your coercion function will be frequently used, it’s worth providing a default method that gives a better error message. Default methods are called when no other method is appropriate, and are discussed in more detail in inheritance.

as_factor(1)
#> Error in UseMethod("as_factor"): no applicable method for 'as_factor' applied to an object of class "c('double', 'numeric')"

as_factor.default <- function(x, ...) {
  stop(
    "Don't know how to coerce object of class ", 
    paste(class(x), collapse = "/"), " into a factor", 
    call. = FALSE
  )
}
as_factor(1)
#> Error: Don't know how to coerce object of class numeric into a factor

13.3.2 Arguments

Methods should always have the same arguments as their generics. This is not usually enforced, but it is good practice because it will avoid confusing behaviour. If you do eventually turn your code into a package, R CMD check will enforce it, so it’s good to get into the habit now.

There is one exception to this rule: if the generic has ..., the method must still have all the same arguments (including ...), but can also have its own additional arguments. This allows methods to take additional arguments, which is important because you don’t know what additional arguments a method for someone else’s class might need. The downside of using ..., however, is that any misspelled arguments will be silently swallowed.
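Both halves of this trade-off can be seen in a small sketch. The generic centre() and its digits argument are hypothetical names invented for the example. Because digits comes after ..., it must be matched exactly, so a misspelling ends up in ... and is ignored.

```r
# Hypothetical generic with ... so that methods can add their own arguments
centre <- function(x, ...) {
  UseMethod("centre")
}

# The method keeps x and ..., and adds a digits argument of its own
centre.numeric <- function(x, ..., digits = 2) {
  round(mean(x), digits)
}

ok <- centre(c(1.234, 2.345), digits = 1)
ok
#> [1] 1.8

# Misspelled argument: swallowed by ..., so the default digits = 2 is used
typo <- centre(c(1.234, 2.345), digit = 1)
typo
#> [1] 1.79
```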

13.3.3 Exercises

  1. Read the source code for t() and t.test() and confirm that t.test() is an S3 generic and not an S3 method. What happens if you create an object with class test and call t() with it? Why?

    x <- structure(1:10, class = "test")
    t(x)
    #> 
    #>  One Sample t-test
    #> 
    #> data:  x
    #> t = 6, df = 9, p-value = 3e-04
    #> alternative hypothesis: true mean is not equal to 0
    #> 95 percent confidence interval:
    #>  3.33 7.67
    #> sample estimates:
    #> mean of x 
    #>       5.5

  2. Carefully read the documentation for UseMethod() and explain why the following code returns the results that it does. What two usual rules of function evaluation does UseMethod() violate?

    g <- function(x) {
      x <- 10
      y <- 10
      UseMethod("g")
    }
    g.default <- function(x) c(x = x, y = y)
    
    x <- 1
    y <- 1
    g(x)
    #>  x  y 
    #>  1 10

13.4 Method dispatch

At a high-level, S3 method dispatch is simple, and revolves around two functions, UseMethod() and NextMethod(). You’ll learn about these two functions below, and then we’ll come back to some of the additional wrinkles in dispatch details.

13.4.1 UseMethod()

The purpose of UseMethod() is to find the appropriate method to call given a generic and a class. It does this by creating a vector of function names, paste0("generic", ".", c(class(x), "default")), and looking for each method in turn. As soon as it finds a matching method, it calls it. If no matching method is found, it throws an error. To explore dispatch, we’ll use sloop::s3_dispatch(). You give it a call to an S3 generic, and it lists all the possible methods, noting which ones exist. For example, what happens when you try and print a POSIXct object?

x <- Sys.time()
s3_dispatch(print(x))
#> -> print.POSIXct
#>    print.POSIXt
#>  * print.default

print() will look for three possible methods, of which two exist, and one, print.POSIXct(), will be called. The last method is always the “default” method. This doesn’t correspond to a specific class, so is a useful catch all.
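If sloop is not available, the candidate list can be approximated in base R. This is only a rough sketch of the idea, not how s3_dispatch() is implemented: it builds the names as described above and asks getS3method() whether each one exists.

```r
x <- Sys.time()

# The method names that UseMethod() would try, in order
candidates <- paste0("print", ".", c(class(x), "default"))
candidates
#> [1] "print.POSIXct" "print.POSIXt"  "print.default"

# Which candidates exist? optional = TRUE gives NULL for missing methods
has_method <- vapply(
  c(class(x), "default"),
  function(cls) !is.null(getS3method("print", cls, optional = TRUE)),
  logical(1)
)
has_method
```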

13.4.2 NextMethod()

Method dispatch usually terminates as soon as a matching method is found. However, methods can explicitly choose to call the next available method using NextMethod(). This is useful because it allows you to rely on code that others have already written, which we’ll come back to in inheritance. Let’s make NextMethod() concrete with an example. Here, I define a new generic (“showoff”) with three methods. Each method signals that it’s been called, and then calls the “next” method:

showoff <- function(x) {
  UseMethod("showoff")
}
showoff.default <- function(x) {
  message("showoff.default")
  TRUE
}
showoff.a <- function(x) {
  message("showoff.a")
  NextMethod()
}
showoff.b <- function(x) {
  message("showoff.b")
  NextMethod()
}

Let’s create a dummy object with classes “b” and “a”. s3_dispatch() shows that all three potential methods are available:

x <- new_s3_scalar(class = c("b", "a"))
s3_dispatch(showoff(x))
#> -> showoff.b
#>  * showoff.a
#>  * showoff.default

When you call NextMethod() it finds and calls the next available method in the dispatch list. When we call showoff(), the method for b forwards to the method for a, which forwards to the default method.

showoff(x)
#> showoff.b
#> showoff.a
#> showoff.default
#> [1] TRUE

Like UseMethod(), the precise semantics of NextMethod() are complex. It doesn’t actually work with the class attribute of the object, but instead uses a special global variable (.Class) to keep track of which method to call next. This means that modifying the argument that is dispatched upon has no impact, and you should avoid modifying the object that is being dispatched on.

Generally, you call NextMethod() without any arguments. However, if you do give arguments, they are passed on to the next method, as if they’d been supplied to the generic.
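A minimal sketch of passing an extra argument follows. The generic label(), the class tagged, and the prefix argument are all hypothetical names. The tagged method supplies prefix to NextMethod(), and the default method receives it as if the caller had written it in the original call.

```r
label <- function(x, ...) {
  UseMethod("label")
}

label.default <- function(x, ..., prefix = "") {
  paste0(prefix, unclass(x))
}

# Forward to the next method, adding a named argument on the way
label.tagged <- function(x, ...) {
  NextMethod(prefix = "tag: ")
}

res <- label(structure("a", class = "tagged"))
res
#> [1] "tag: a"
```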

13.4.3 Exercises

  1. Which base generic has the greatest number of defined methods?

  2. Explain what is happening in the following code.

    generic2 <- function(x) UseMethod("generic2")
    generic2.a1 <- function(x) "a1"
    generic2.a2 <- function(x) "a2"
    generic2.b <- function(x) {
      class(x) <- "a1"
      NextMethod()
    }
    
    generic2(new_s3_scalar(class = c("b", "a2")))
    #> [1] "a2"

13.5 Inheritance

The class attribute is not limited to a single string, but can be a character vector. This, along with S3 method dispatch and NextMethod(), gives a surprising amount of flexibility that can be used creatively to reduce code duplication. However, this flexibility can also lead to code that is hard to understand or reason about, so you are best constraining yourself to simple styles of inheritance. Here we will focus on defining subclasses that inherit their fields, and some behaviour, from a parent class.

Subclasses use a character vector for the class attribute. There are two examples of subclasses that you might have come across in base R:

  • Generalised linear models are a generalisation of linear models that allow the error term to belong to a richer set of distributions, not just the normal distribution like the linear model. This is a natural case for the use of inheritance and indeed, in R, glm() returns objects of class c("glm", "lm").

  • Ordered factors are used when the levels of a factor have some intrinsic ordering, like c("Good", "Better", "Best"). Ordered factors are produced by ordered() which returns an object with class c("ordered", "factor").

You can think of the glm class “inheriting” behaviour from the lm class, and the ordered class inheriting behaviour from the factor class because of the way method dispatch works. If there is a method available for the subclass, R will use it, otherwise it will fall back to the “parent” class. For example, if you “plot” a glm object, it falls back to the lm method, but if you compute the ANOVA, it uses a glm-specific method.

mod1 <- glm(mpg ~ wt, data = mtcars)

s3_dispatch(plot(mod1))
#>    plot.glm
#> -> plot.lm
#>  * plot.default
s3_dispatch(anova(mod1))
#> -> anova.glm
#>  * anova.lm
#>    anova.default

13.5.1 Constructors

There are three principles to adhere to when creating a subclass:

  • A subclass should be built on the same base type as a parent.

  • The class() of the subclass should be of the form c(subclass, parent_class)

  • The fields of the subclass should include the fields of the parent.

And these properties should be enforced by the constructor.

When you create a class, you need to decide if you want to allow subclasses, because it requires changes to the constructor and careful thought in your methods. To allow subclasses, the parent constructor needs to have ... and subclass arguments:

new_my_class <- function(x, y, ..., subclass = NULL) {
  stopifnot(is.numeric(x))
  stopifnot(is.logical(y))
  
  structure(
    x,
    y = y,
    ...,
    class = c(subclass, "my_class")
  )
}

Then the implementation of the subclass constructor is simple: it checks the types of the new fields, then calls the parent constructor.

new_subclass <- function(x, y, z) {
  stopifnot(is.character(z))
  # Pass z by name so that structure() stores it as the "z" attribute
  new_my_class(x, y, z = z, subclass = "subclass")
}

If you wanted to allow this subclass to be further subclassed, you’d need to include ... and subclass arguments:

new_subclass <- function(x, y, z, ..., subclass = NULL) {
  stopifnot(is.character(z))
  
  new_my_class(x, y, z = z, ..., subclass = c(subclass, "subclass"))
}

If your subclass is more complicated, you’d also provide validator and helper functions, as described previously.
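To check that the three principles hold, here is a self-contained sketch that repeats the two constructors and inspects the result. Note that the new field z is forwarded by name (z = z), since structure() needs a name for every attribute it stores.

```r
new_my_class <- function(x, y, ..., subclass = NULL) {
  stopifnot(is.numeric(x))
  stopifnot(is.logical(y))
  structure(x, y = y, ..., class = c(subclass, "my_class"))
}

new_subclass <- function(x, y, z, ..., subclass = NULL) {
  stopifnot(is.character(z))
  # Forward z by name so that it becomes the "z" attribute
  new_my_class(x, y, z = z, ..., subclass = c(subclass, "subclass"))
}

obj <- new_subclass(1:3, TRUE, "note")

class(obj)
#> [1] "subclass" "my_class"
attr(obj, "y")
#> [1] TRUE
attr(obj, "z")
#> [1] "note"
inherits(obj, "my_class")
#> [1] TRUE
```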

13.5.2 Coercion

You also need to make sure that there’s some way to convert the subclass back to the parent class. The best way to do that is to add a method to the coercion generic. Generally, this method should call the parent constructor:

as_my_class.subclass <- function(x, ...) {
  # The base object is x itself; as.vector() drops the subclass attributes
  new_my_class(as.vector(x), attr(x, "y"))
}

13.5.3 Methods

The goal of creating a subclass is to reuse as much code as possible from the parent class. This means that you should not have to define every method that the parent class provides (if you do, reconsider if you actually need a subclass!). Generally, defining new methods is straightforward: you simply create a new method (generic.subclass) whenever the parent method doesn’t do quite the right thing. In many cases, the new method will be able to call NextMethod() in order to take advantage of the computation done in the parent.

One wrinkle arises when you have methods that return the same type of object as the primary input. For example, dplyr has many functions (arrange(), summarise(), mutate(), …) that input a data frame (or data frame-like object) and output a modified version of that data frame. Imagine you want to store the provenance of each data frame, i.e. who created it and when. To do so, you might create a data frame subclass called provenance:

new_provenance <- function(data, author, date = Sys.Date()) {
  stopifnot(is.data.frame(data))
  stopifnot(is.character(author), length(author) == 1)
  stopifnot(inherits(date, "Date"), length(date) == 1)
  
  structure(
    data,
    author = author, 
    date = date,
    class = c("provenance", "data.frame")
  )
}

And now you want to make this class work with dplyr. The class doesn’t change any of the computation related to the data frame, it just needs to preserve the attributes, which dplyr doesn’t know anything about. That means you need to provide a method for each dplyr generic. The computation is unchanged, so you can use NextMethod() to do all the hard work, but you need to manually reconstruct the provenance object.

arrange.provenance <- function(.data, ...) {
  new_provenance(
    NextMethod(), 
    author = attr(.data, "author"),
    date = attr(.data, "date")
  )
}

mutate.provenance <- function(.data, ...) {
  new_provenance(
    NextMethod(), 
    author = attr(.data, "author"),
    date = attr(.data, "date")
  )
}

To do this for all the dplyr generics would require a lot of copying and pasting. Let’s reduce some of that duplication by taking advantage of sloop::reconstruct(). reconstruct() is a generic function designed to reconstruct a subclass from an instance of the parent class, typically created by NextMethod(), and the original subclass. In other words, the job of a reconstructor is to take an object from a parent class, and copy over attributes from the subclass. (Note that reconstruct() is unusual in that it dispatches on the second argument. This allows a more natural specification.)

reconstruct.provenance <- function(new, old) {
  new_provenance(
    new, 
    author = attr(old, "author"),
    date = attr(old, "date")
  )
}

Now we can rewrite the methods to minimise the amount of duplicated code:

arrange.provenance <- function(.data, ...) {
  reconstruct(NextMethod(), .data)
}

mutate.provenance <- function(.data, ...) {
  reconstruct(NextMethod(), .data)
}

This duplicated code could be avoided completely if arrange.data.frame(), provided by dplyr, called reconstruct() for you. And indeed, a future version of that function will.

When designing a class that can be subclassed, you need to carefully think through these issues. Generally, whenever you implement a method that returns the same type of object as the primary input, you should call reconstruct() to ensure that it also works for subclasses. That way implementors of a subclass will only need to provide methods when the computation is actually different.
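
This advice can be tried without dplyr or sloop. The following is a minimal base R sketch: reconstruct() here is a locally defined stand-in (not the sloop implementation), and first_rows() is a hypothetical generic whose data frame method builds a fresh data frame and then calls reconstruct() itself.

```r
new_provenance <- function(data, author, date = Sys.Date()) {
  stopifnot(is.data.frame(data))
  stopifnot(is.character(author), length(author) == 1)
  stopifnot(inherits(date, "Date"), length(date) == 1)
  structure(data, author = author, date = date,
            class = c("provenance", "data.frame"))
}

# Stand-in reconstructor: note the dispatch on the second argument
reconstruct <- function(new, old) UseMethod("reconstruct", old)
reconstruct.default <- function(new, old) new
reconstruct.provenance <- function(new, old) {
  new_provenance(new, author = attr(old, "author"), date = attr(old, "date"))
}

# Hypothetical generic that returns the same kind of object as its input
first_rows <- function(x, n = 2) UseMethod("first_rows")
first_rows.data.frame <- function(x, n = 2) {
  out <- data.frame(lapply(x, head, n))  # a plain data frame, attributes lost
  reconstruct(out, x)                    # restore whatever subclass `x` had
}

p <- new_provenance(data.frame(a = 1:3, b = c(2.5, 3.5, 4.5)), author = "someone")
q <- first_rows(p)
class(q)
#> [1] "provenance" "data.frame"
```

No first_rows.provenance() method is needed: the parent method does the computation and reconstruct() restores the subclass. For an ordinary data frame, reconstruct.default() returns the result untouched.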

13.5.4 Exercises

  1. The ordered class is a subclass of factor, but it’s implemented in a very ad hoc way in base R. Implement it in a principled way by building a constructor and an as_ordered generic.

    f1 <- factor("a", c("a", "b"))
    as.factor(f1)  
    #> [1] a
    #> Levels: a b
    as.ordered(f1) # loses levels
    #> [1] a
    #> Levels: a
  2. What classes have a method for the Math group generic in base R? Read the source code. How do the methods work?

  3. R has two classes for representing date time data, POSIXct and POSIXlt, which both inherit from POSIXt. Which generics have different behaviours for the two classes? Which generics share the same behaviour?

13.6 Dispatch details

This chapter concludes with a few additional details about method dispatch that are not well documented elsewhere. It is safe to skip these details if you’re new to S3.

13.6.1 Environments and namespaces

The precise rules for where a generic looks for the methods are a little complicated because there are two paths for discovery:

  1. In the calling environment of the function that called the generic.

  2. In the special .__S3MethodsTable__. object in the function environment of the generic. Every package has a .__S3MethodsTable__. object, which lists all the S3 methods exported by the package.

These details are not usually important, but are necessary in order for S3 generics to find the correct method when the generic and method are in different packages.
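
The first lookup path is easy to observe directly. In this small sketch (greet() and the visitor class are hypothetical names), a method that exists only inside a function is found when the generic is called from that function, but not when it is called from the top level.

```r
greet <- function(x) UseMethod("greet")
greet.default <- function(x) "default method"

call_with_local_method <- function() {
  # This method lives only in the environment of this function call
  greet.visitor <- function(x) "local visitor method"
  greet(structure(list(), class = "visitor"))
}

from_function <- call_with_local_method()
from_function
#> [1] "local visitor method"

# At the top level, greet.visitor() is not visible, so the default is used
from_top_level <- greet(structure(list(), class = "visitor"))
from_top_level
#> [1] "default method"
```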

13.6.2 S3 and base types

What happens when you call an S3 generic with a non-S3 object, i.e. an object that doesn’t have the class attribute set? You might think it would dispatch on what class() returns:

class(matrix(1:5))
#> [1] "matrix"

But unfortunately dispatch actually occurs on the implicit class, which has three components:

  • “array” or “matrix” (if the object has dimensions).
  • typeof() (with a few minor tweaks).
  • “numeric”, if the type is “integer” or “double”.

There is no base function that will compute the implicit class, but you can use a helper from the sloop package:

s3_class(matrix(1:5))
#> [1] "matrix"  "integer" "numeric"

s3_dispatch() knows about the implicit class, so use it if you’re ever in doubt about method dispatch:

s3_dispatch(print(matrix(1:5)))
#>    print.matrix
#>    print.integer
#>    print.numeric
#> -> print.default

Note that this can lead to different dispatch for objects that look similar:

x1 <- 1:5
class(x1)
#> [1] "integer"
s3_dispatch(mean(x1))
#>    mean.integer
#>    mean.numeric
#> -> mean.default

x2 <- structure(x1, class = "integer")
class(x2)
#> [1] "integer"
s3_dispatch(mean(x2))
#>    mean.integer
#> -> mean.default
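
The same difference can be seen without sloop by writing a small generic of your own. In this sketch, describe() is a hypothetical generic with only a numeric method and a default method.

```r
describe <- function(x) UseMethod("describe")
describe.numeric <- function(x) "numeric method"
describe.default <- function(x) "default method"

x1 <- 1:5
x2 <- structure(1:5, class = "integer")

# No class attribute: dispatch uses the implicit class, which includes "numeric"
describe(x1)
#> [1] "numeric method"

# Class attribute set to "integer": only describe.integer() and
# describe.default() are candidates, so the default is used
describe(x2)
#> [1] "default method"
```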

13.6.3 Internal generics

Some S3 generics, like [, sum(), and cbind(), don’t call UseMethod() because they are implemented in C. Instead, they call the C functions DispatchGroup() or DispatchOrEval(). Such generics are called internal generics, because they do dispatch internally, in C code. Internal generics only exist in base R, so you cannot create an internal generic in a package.

s3_dispatch() shows internal generics by including the name of the generic at the bottom of the method list. If this method is called, all the work happens in C code, typically using a switch statement.

s3_dispatch(Sys.time()[1])
#> -> [.POSIXct
#>    [.POSIXt
#>    [.default
#>  * [

For performance reasons, internal generics do not dispatch to methods unless the class attribute has been set (is.object() is true). This means that internal generics do not use the implicit class. Again, if you’re confused, rely on s3_dispatch() to show you the difference.

x <- sample(10)
class(x)
#> [1] "integer"
s3_dispatch(x[1])
#>    [.integer
#>    [.numeric
#>    [.default
#> -> [

class(mtcars)
#> [1] "data.frame"
s3_dispatch(mtcars[1])
#> -> [.data.frame
#>    [.default
#>  * [
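
To check this behaviour without sloop, you can temporarily define a method for an internal generic. The sketch below uses length(), which is an internal generic, and a deliberately silly length.integer() method that exists only for this illustration.

```r
# Hypothetical method, defined only to make dispatch visible
length.integer <- function(x) 99L

x <- 1:3
n_plain <- length(x)            # no class attribute: the C code runs directly
n_plain
#> [1] 3

x_classed <- structure(1:3, class = "integer")
n_classed <- length(x_classed)  # class attribute set: the method is called
n_classed
#> [1] 99

rm(length.integer)              # remove the illustrative method again
```

Even though the implicit class of x includes "integer", the method is ignored until the class attribute is actually set.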

13.6.4 Group generics

Group generics are the most complicated part of S3 method dispatch because they involve both NextMethod() and internal generics. Group generics are worth learning about, however, because they allow you to implement a whole swath of methods with one function. Like internal generics, they only exist in base R, and you cannot define your own group generic.

Base R has four group generics, which are made up of the following generics:

  • Math: abs, sign, sqrt, floor, cos, sin, log, exp, …

  • Ops: +, -, *, /, ^, %%, %/%, &, |, !, ==, !=, <, <=, >=, >

  • Summary: all, any, sum, prod, min, max, range

  • Complex: Arg, Conj, Im, Mod, Re

Defining a single group generic for your class overrides the default behaviour for all of the members of the group. Methods for group generics are looked for only if the methods for the specific generic do not exist:

s3_dispatch(sum(Sys.time()))
#>    sum.POSIXct
#>    sum.POSIXt
#>    sum.default
#> -> Summary.POSIXct
#>    Summary.POSIXt
#>    Summary.default
#>  * sum

Most group generics involve a call to NextMethod(). For example, take difftime() objects. If you look at the method dispatch for abs(), you’ll see there’s a Math group generic defined.

y <- as.difftime(10, units = "mins")
s3_dispatch(abs(y))
#>    abs.difftime
#>    abs.default
#> -> Math.difftime
#>    Math.default
#>  * abs

Math.difftime basically looks like this:

Math.difftime <- function(x, ...) {
  new_difftime(NextMethod(), units = attr(x, "units"))
}

It dispatches to the next method, here the internal default, to perform the actual computation, then copies back over the class and attributes.

Note that inside a group generic function a special variable .Generic provides the actual generic function called. This can be useful when producing error messages, and can sometimes be useful if you need to manually re-call the generic with different arguments.
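
Here is a small sketch of both ideas for a hypothetical meters class: one Math method covers every member of the group, and .Generic is used to reject the functions that make little sense for a length. The class and the choice of rejected functions are illustrative assumptions.

```r
new_meters <- function(x) {
  stopifnot(is.numeric(x))
  structure(x, class = "meters")
}

Math.meters <- function(x, ...) {
  # .Generic holds the name of the function that was actually called
  if (.Generic %in% c("exp", "log", "sin", "cos", "tan")) {
    stop(.Generic, "() is not meaningful for <meters> objects", call. = FALSE)
  }
  out <- NextMethod()  # let the internal default do the computation
  new_meters(out)      # then make sure the result is a meters object
}

d <- new_meters(c(-1.5, 2.25))
unclass(abs(d))
#> [1] 1.50 2.25

err <- tryCatch(exp(d), error = function(e) conditionMessage(e))
err
#> [1] "exp() is not meaningful for <meters> objects"
```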

13.6.5 Double dispatch

Generics in the “Ops” group, which includes the two-argument mathematical and logical operators like - and &, implement a special type of method dispatch. They dispatch on the type of both of the arguments, so-called double dispatch. This is necessary to preserve the commutative property of many operators, i.e. a + b should equal b + a. Take the following simple example:

date <- as.Date("2017-01-01")
integer <- 1L

date + integer
#> [1] "2017-01-02"
integer + date
#> [1] "2017-01-02"

If + dispatched only on the first argument, it would return different values for the two cases. To overcome this problem, generics in the Ops group use a slightly different strategy from usual. Rather than doing a single method dispatch, they do two, one for each input. There are three possible outcomes of this lookup:

  • The methods are the same, so it doesn’t matter which method is used.

  • The methods are different, and R falls back to the internal method with a warning.

  • One method is internal, in which case R calls the other method.

For the example above, we can look at the possible methods for each argument, taking advantage of the fact that we can call + with a single argument. In this case, the second argument would dispatch to the internal + function, so R will call +.Date.

s3_dispatch(+date)
#> -> +.Date
#>    +.default
#>  * Ops.Date
#>    Ops.default
#>  * +
s3_dispatch(+integer)
#>    +.integer
#>    +.numeric
#>    +.default
#>    Ops.integer
#>    Ops.numeric
#>    Ops.default
#> -> +
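
The same outcomes can be reproduced with classes of your own. In this sketch, money and weight are hypothetical classes, each with its own Ops method, and the methods are written only for the two-argument form of the operators.

```r
new_money <- function(x) structure(x, class = "money")

Ops.money <- function(e1, e2) {
  # Compute on the underlying numbers, then restore the class for arithmetic
  out <- get(.Generic)(unclass(e1), unclass(e2))
  if (.Generic %in% c("+", "-", "*", "/")) new_money(out) else out
}

# One argument has a method and the other is internal: Ops.money() is
# called regardless of the order of the arguments
left  <- new_money(5) + 1
right <- 1 + new_money(5)
identical(left, right)
#> [1] TRUE

# A second hypothetical class with a different Ops method
Ops.weight <- function(e1, e2) {
  structure(get(.Generic)(unclass(e1), unclass(e2)), class = "weight")
}

# Two different methods: R warns and falls back to the internal +
msg <- tryCatch(
  new_money(5) + structure(1, class = "weight"),
  warning = function(w) conditionMessage(w)
)
grepl("Incompatible methods", msg)
#> [1] TRUE
```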

Let’s take a look at another case. What happens if you try to add a date to a factor? There is no method in common, so R calls the internal + method (which preserves the attributes of the LHS), with a warning.

factor <- factor("a")
s3_dispatch(+factor)
#>    +.factor
#>    +.default
#> -> Ops.factor
#>    Ops.default
#>  * +

date + factor
#> Warning: Incompatible methods ("+.Date", "Ops.factor") for "+"
#> [1] "2017-01-02"
factor + date
#> Warning: Incompatible methods ("Ops.factor", "+.Date") for "+"
#> Error in as.character.factor(x): malformed factor

Finally, what happens if we try to subtract a POSIXct from a POSIXlt? A common -.POSIXt method is found and called.

dt1 <- as.POSIXct(date)
dt2 <- as.POSIXlt(date)

s3_dispatch(-dt1)
#>    -.POSIXct
#> -> -.POSIXt
#>    -.default
#>    Ops.POSIXct
#>  * Ops.POSIXt
#>    Ops.default
#>  * -
s3_dispatch(-dt2)
#>    -.POSIXlt
#> -> -.POSIXt
#>    -.default
#>    Ops.POSIXlt
#>  * Ops.POSIXt
#>    Ops.default
#>  * -

dt1 - dt2
#> Time difference of 0 secs

13.6.6 Exercises

  1. Math.difftime() is more complicated than I described. Why?