12 S3

12.1 Introduction

S3 is R’s first and simplest OO system. S3 is informal and ad hoc, but there is a certain elegance in its minimalism: you can’t take away any part of it and still have a useful OO system. For these reasons, you should use it, unless you have a compelling reason to do otherwise. S3 is the only OO system used in the base and stats packages, and it’s the most commonly used system in CRAN packages.

S3 is very flexible, which means it allows you to do things that are quite ill-advised. If you’re coming from a strict environment like Java this will seem pretty frightening, but it gives R programmers a tremendous amount of freedom. It may be very difficult to prevent someone from doing something you don’t want them to do, but your users will never be held back because there is something you haven’t implemented yet. Since S3 has few built-in constraints, the key to its successful use is applying the constraints yourself. This chapter will therefore teach you the conventions you should (almost) always adhere to.

The goal of this chapter is to show you how the S3 system works, not how to use it effectively to create new classes and generics. I’d recommend coupling the theoretical knowledge from this chapter with the practical knowledge encoded in the vctrs package.

Outline

  • Section 12.2 gives a rapid overview of all the main components of S3: classes, generics, and methods. You’ll also learn about sloop::s3_dispatch(), which we’ll use throughout the chapter to explore how S3 works.

  • Section 12.3 goes into the details of creating a new S3 class, including the three functions that should accompany most classes: a constructor, a helper, and a validator.

  • Section 12.4 describes how S3 generics and methods work, including the basics of method dispatch.

  • Section 12.5 discusses the four main styles of S3 objects: vector, record, data frame, and scalar.

  • Section 12.6 demonstrates how inheritance works in S3, and shows you what you need to make a class “subclassable”.

  • Section 12.7 concludes the chapter with a discussion of the finer details of method dispatch including base types, internal generics, group generics, and double dispatch.

Prerequisites

S3 classes are implemented using attributes, so make sure you’re familiar with the details described in Section 3.3. We’ll use existing base S3 vectors for examples and exploration, so make sure that you’re familiar with the factor, Date, difftime, POSIXct, and POSIXlt classes described in Section 3.4.

We’ll use the sloop package for interactive helpers, and the vctrs package for some niceties for creating new S3 classes.

12.2 Basics

An S3 object is a base type with at least a “class” attribute (other attributes may be used to store other data). For example, take the factor. Its base type is the integer vector, it has a class attribute of “factor”, and a levels attribute that stores the possible levels:

You can get the “underlying” base type by unclass()ing it, which strips the class attribute, causing it to lose its special behaviour:
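Here is a minimal sketch of the factor example using only base R. The object name f is illustrative:

```r
f <- factor(c("a", "b", "c"))

typeof(f)        # the base type underneath
#> [1] "integer"
attributes(f)    # a levels attribute plus the "factor" class attribute
#> $levels
#> [1] "a" "b" "c"
#>
#> $class
#> [1] "factor"

x <- unclass(f)  # strip the class attribute (the levels attribute remains)
class(x)         # now reported as an ordinary integer vector
#> [1] "integer"
```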

An S3 object behaves differently from its underlying base type whenever it’s passed to a generic (short for generic function). The easiest way to tell if a function is a generic is to use sloop::ftype() and look for “generic” in the output:

A generic function defines an interface, which uses a different implementation depending on the class of an argument (almost always the first argument). Many base R functions are generic, including the important print():
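As a rough check in base R, look at the body of print(). It consists of a single UseMethod() call, which is the hallmark of an ordinary S3 generic. Internal generics, covered in Section 12.7.2, look different. The sketch below also prints the same factor with and without its class:

```r
# The whole body of print() is a call to UseMethod()
body(print)
#> UseMethod("print")

f <- factor(c("a", "b", "c"))
print(f)           # dispatches to the factor method
#> [1] a b c
#> Levels: a b c
print(unclass(f))  # the integer vector underneath
#> [1] 1 2 3
#> attr(,"levels")
#> [1] "a" "b" "c"
```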

Beware that str() is generic, and some S3 classes use that generic to hide the internal details. For example, the POSIXlt class used to represent date-time data is actually built on top of a list, a fact which is hidden by its str() method:
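Here is a small base-R illustration. The date-time value is arbitrary:

```r
x <- as.POSIXlt("2020-01-01 12:30:00", tz = "UTC")

str(x)           # the str() method presents a single date-time...
typeof(x)        # ...but the object is a list underneath
#> [1] "list"
str(unclass(x))  # without the class, the list components (sec, min, hour, ...) appear
```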

The generic is a middleman: its job is to define the interface (i.e. the arguments) then find the right implementation for the job. The implementation for a specific class is called a method, and the generic finds that method by performing method dispatch.

You can use sloop::s3_dispatch() to see the process of method dispatch:

We’ll come back to the details of dispatch in Section 12.4.1. For now, note that S3 methods are functions with a special naming scheme, generic.class(). For example, the factor method for the print() generic is called print.factor(). You should never call the method directly, but instead rely on the generic to find it for you.

Generally, you can identify a method by the presence of . in the function name, but there are a number of important functions in base R that were written before S3, and hence use . to join words. If you’re unsure, check with sloop::ftype():

Unlike most functions, you can’t see the source code for most S3 methods[1] just by typing their names. That’s because S3 methods are not usually exported: they live only inside the package, and are not available from the global environment. Instead, you can use sloop::s3_get_method(), which will work regardless of where the method lives:
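If you'd rather not depend on sloop, utils::getS3method() is a base-R alternative. The sketch below uses the Date method for weighted.mean(). On a standard R installation, the stats package registers this method but does not export it:

```r
# Not visible by name from the global environment...
exists("weighted.mean.Date")
#> [1] FALSE

# ...but utils::getS3method() finds the registered method wherever it lives
m <- utils::getS3method("weighted.mean", "Date")
is.function(m)
#> [1] TRUE
```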

12.2.1 Exercises

  1. What is the difference between t.test() and t.data.frame()? When is each function called?

  2. Make a list of commonly used base R functions that contain . in their name but are not S3 methods.

  3. What does the as.data.frame.data.frame() method do? Why is it confusing? How could you avoid this confusion in your own code?

  4. Describe the difference in behaviour in these two calls.

  5. What class of object does the following code return? What base type is it built on? What attributes does it use?

  6. What class of object does the following code return? What base type is it built on? What attributes does it use?

12.3 Classes

If you have done object-oriented programming in other languages, you may be surprised to learn that S3 has no formal definition of a class: to make an object an instance of a class, you simply set the class attribute. You can do that during creation with structure(), or after the fact with class<-():

You can determine the class of an S3 object with class(x), and see if an object is an instance of a class using inherits(x, "classname").
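Here are both ways of setting the class, sketched with an illustrative class name, my_class:

```r
# Set the class when the object is created...
x <- structure(list(), class = "my_class")

# ...or afterwards with class<-()
y <- list()
class(y) <- "my_class"

class(x)
#> [1] "my_class"
inherits(y, "my_class")
#> [1] TRUE
```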

The class name can be any string, but I recommend using only letters and _. Avoid . because (as mentioned earlier) it can be confused with the . separator between a generic name and a class name. When using a class in a package, I recommend including the package name in the class name. That ensures you won’t accidentally clash with a class defined by another package.

S3 has no checks for correctness, which means you can change the class of existing objects:
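For example, you can turn a linear model into a "Date". The model itself is arbitrary, and mtcars ships with R. Nothing complains until a Date method tries to use the object:

```r
mod <- lm(log(mpg) ~ log(disp), data = mtcars)
class(mod)
#> [1] "lm"

# Nothing stops you from claiming that a linear model is a Date...
class(mod) <- "Date"

# ...but Date methods then receive a list they cannot handle, so this errors
try(print(mod))
```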

If you’ve used other OO languages, this might make you feel queasy, but in practice this flexibility causes few problems. R doesn’t stop you from shooting yourself in the foot, but as long as you don’t aim the gun at your toes and pull the trigger, you won’t have a problem.

To avoid foot-bullet intersections when creating your own class, I recommend that you usually provide three functions:

  • A low-level constructor, new_myclass(), that efficiently creates new objects with the correct structure.

  • A validator, validate_myclass(), that performs more expensive checks to ensure that the object has correct values.

  • A user-friendly helper, myclass(), that provides a convenient way for others to create objects of your class.

You don’t need a validator for very simple classes, and you can skip the helper if the class is for internal use only, but you should always provide a constructor.

12.3.1 Constructors

S3 doesn’t provide a formal definition of a class, so it has no built-in way to ensure that all objects of a given class have the same structure (i.e. the same base type and the same attributes with the same types). Instead, you must enforce a consistent structure yourself by using a constructor.

The constructor should follow three principles:

  • Be called new_myclass().

  • Have one argument for the base object, and one for each attribute.

  • Check the type of the base object and the types of each attribute.

I’ll illustrate these ideas by creating constructors for base classes[2] that you’re already familiar with. To start, let’s make a constructor for the simplest S3 class: Date. A Date is just a double with a “Date” class attribute, and no additional attributes. This makes for a very simple constructor:
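Here is a sketch of such a constructor. It only checks the type of the base object:

```r
new_Date <- function(x = double()) {
  stopifnot(is.double(x))          # a Date is built on a double
  structure(x, class = "Date")     # ...with nothing but a class attribute
}

new_Date(c(-1, 0, 1))
#> [1] "1969-12-31" "1970-01-01" "1970-01-02"
```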

The purpose of constructors is to help you, the developer. That means you can keep them simple, and you don’t need to optimise error messages for public consumption. If you expect users to also create objects, you should create a friendly helper function with the same name as the class (e.g. myclass()), which I’ll describe shortly.

A slightly more complicated constructor is that for difftime, which is used to represent time differences. It is again built on a double, but has a units attribute that must take one of a small set of values:
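One way to write it is to check units with match.arg(). The set of allowed units below matches the units a difftime can store, but treat this as a sketch rather than a full reimplementation of base::difftime():

```r
new_difftime <- function(x = double(), units = "secs") {
  stopifnot(is.double(x))
  # match.arg() errors unless units is one of the allowed values
  units <- match.arg(units, c("secs", "mins", "hours", "days", "weeks"))
  structure(x, class = "difftime", units = units)
}

new_difftime(c(1, 10, 3600), "secs")
new_difftime(52, "weeks")
```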

The constructor is a developer function: it will be called in many places, by an experienced user. That means it’s ok to trade a little safety in return for performance, and you should avoid potentially time-consuming checks in the constructor.

12.3.2 Validators

More complicated classes require more complicated checks for validity. Take factors, for example. A constructor only checks that types are correct, making it possible to create malformed factors:
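For example, here is a sketch of a constructor that checks only types. It happily accepts integer codes that point past the end of levels:

```r
new_factor <- function(x = integer(), levels = character()) {
  stopifnot(is.integer(x))
  stopifnot(is.character(levels))
  structure(x, levels = levels, class = "factor")
}

# The types are right, so the constructor is satisfied...
bad <- new_factor(1:5, "a")

# ...but codes 2 to 5 refer to levels that don't exist
max(unclass(bad)) > length(levels(bad))
#> [1] TRUE
```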

Rather than encumbering the constructor with complicated checks, it’s better to put them in a separate function. Doing so allows you to cheaply create new objects when you know that the values are correct, and easily re-use the checks in other places.

This validator function is called primarily for its side-effects (throwing an error if the object is invalid), so you’d expect it to invisibly return its primary input (see the section on invisible values in the Functions chapter). However, it’s useful for validation methods to return visibly, as we’ll see next.
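Here is a sketch of a validator for factors. It comes with a compact copy of the type-checking constructor so the block runs on its own, and the error messages are illustrative:

```r
new_factor <- function(x = integer(), levels = character()) {
  stopifnot(is.integer(x), is.character(levels))
  structure(x, levels = levels, class = "factor")
}

validate_factor <- function(x) {
  values <- unclass(x)
  levels <- attr(x, "levels")

  if (!all(!is.na(values) & values > 0)) {
    stop("All `x` values must be non-missing and greater than zero", call. = FALSE)
  }
  if (length(values) > 0 && length(levels) < max(values)) {
    stop("There must be at least as many `levels` as possible values in `x`", call. = FALSE)
  }

  # Return x visibly, so a helper can end with a call to the validator
  x
}

validate_factor(new_factor(1:2, c("a", "b")))  # passes and returns its input
try(validate_factor(new_factor(1:5, "a")))     # throws an error
```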

12.3.3 Helpers

If you want users to construct objects from your class, you should also provide a helper function that makes their life as easy as possible. A helper should always:

  • Have the same name as the class, e.g. myclass().

  • Finish by calling the constructor, and the validator, if it exists.

  • Create carefully crafted error messages tailored towards an end-user.

  • Have a thoughtfully crafted user interface with carefully chosen default values and useful conversions.

The last bullet is the trickiest, and it’s hard to give general advice. However, there are three common patterns:

For more complicated classes, you should feel free to go beyond these patterns to make life as easy as possible for your users.
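As one illustration, here is a helper for factors that takes a character vector and defaults levels to its unique values. It redefines a compact constructor and validator so the block is self-contained. Note that naming the helper factor() masks base::factor() in the global environment:

```r
# Compact copies of the constructor and validator, so this block runs on its own
new_factor <- function(x = integer(), levels = character()) {
  stopifnot(is.integer(x), is.character(levels))
  structure(x, levels = levels, class = "factor")
}

validate_factor <- function(x) {
  values <- unclass(x)
  if (anyNA(values) || any(values <= 0)) {
    stop("All `x` values must be non-missing and greater than zero", call. = FALSE)
  }
  if (length(values) > 0 && max(values) > length(attr(x, "levels"))) {
    stop("There must be at least as many `levels` as possible values in `x`", call. = FALSE)
  }
  x
}

# Helper: same name as the class; levels default to the unique values of x
factor <- function(x = character(), levels = unique(x)) {
  codes <- match(x, levels)  # NA for any value not found in levels
  validate_factor(new_factor(codes, levels))
}

factor(c("a", "a", "b"))
#> [1] a a b
#> Levels: a b
```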

12.3.4 Exercises

  1. Write a constructor for data.frame objects. What base type is a data frame built on? What attributes does it use? What are the restrictions placed on the individual elements? What about the names?

  2. Enhance my factor() helper to have better behaviour when one or more values is not found in levels. What does base::factor() do in this situation?

  3. Carefully read the source code of factor(). What does it do that my constructor does not?

  4. Factors have an optional “contrasts” attribute. Read the help for C(), and briefly describe the purpose of the attribute. What type should it have? Rewrite the new_factor() constructor to include this attribute.

  5. Read the documentation for utils::as.roman(). How would you write a constructor for this class? Does it need a validator? What might a helper do?

12.4 Generics and methods

The job of an S3 generic is to perform method dispatch, i.e. find the specific implementation for a class. Method dispatch is performed by UseMethod(), which every generic calls[4]. UseMethod() takes two arguments: the name of the generic function (required), and the argument to use for method dispatch (optional). If you omit the second argument, it will dispatch based on the first argument, which is almost always what is desired.

Most generics are very simple, and consist of only a call to UseMethod(). Take mean() for example:

Creating your own generic is similarly simple:
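Here is a sketch of both points. It shows the body of mean(), then a new generic, my_new_generic. The generic gets two illustrative methods so that dispatch has somewhere to go:

```r
# mean() is nothing more than a call to UseMethod()
body(mean)
#> UseMethod("mean")

# A new generic: UseMethod() is given the generic's own name
my_new_generic <- function(x) {
  UseMethod("my_new_generic")
}

# Illustrative methods: a fallback and one for a made-up class
my_new_generic.default <- function(x) "default method"
my_new_generic.my_class <- function(x) "my_class method"

my_new_generic(1:3)
#> [1] "default method"
my_new_generic(structure(list(), class = "my_class"))
#> [1] "my_class method"
```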

(If you wonder why we have to write my_new_generic twice, think back to Section 5.2.1.)

Note that you don’t pass any of the arguments of the generic to UseMethod(); it uses deep magic to pass them on to the method automatically. The precise process is complicated and frequently surprising, so you should avoid doing any computation in a generic. To learn the full details, carefully read the “technical details” section in ?UseMethod.

12.4.1 Method dispatch

How does UseMethod() work? It basically creates a vector of method names, paste0("generic", ".", c(class(x), "default")), and then looks for each potential method in turn. We can see this in action with sloop::s3_dispatch(). You give it a call to an S3 generic, and it lists all the possible methods. For example, what method is called when you print a Date object?

The output here is simple:

  • => indicates the method that is called, here print.Date()
  • * indicates a method that is defined, but not called, here print.default().

The “default” class is a special pseudo-class. This is not a real class, but is included to make it possible to define a standard fallback that is found whenever a class-specific method is not available.
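The lookup can be approximated in base R. find_s3_method() below is a hypothetical helper, not part of any package. It walks c(class(x), "default") and asks utils::getS3method() for each candidate, so it ignores implicit classes and the other subtleties of real dispatch:

```r
# Hypothetical helper: returns the name of the first method found, or NULL
find_s3_method <- function(generic, x) {
  for (cls in c(class(x), "default")) {
    method <- utils::getS3method(generic, cls, optional = TRUE)
    if (!is.null(method)) {
      return(paste0(generic, ".", cls))
    }
  }
  NULL
}

find_s3_method("print", Sys.Date())
#> [1] "print.Date"
find_s3_method("print", structure(1, class = "unknown_class"))
#> [1] "print.default"
```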

The essence of method dispatch is quite simple, but as the chapter proceeds you’ll see it get progressively more complicated to encompass inheritance, base types, internal generics, and group generics. The code below shows a couple of more complicated cases which we’ll come back to in Sections 13.2.4 and 12.7.

12.4.3 Creating methods

There are two wrinkles to be aware of when you create a new method:

  • First, you should only ever write a method if you own the generic or the class. R will allow you to define a method even if you don’t, but it is exceedingly bad manners. Instead, work with the author of either the generic or the class to add the method in their code.

  • A method must have the same arguments as its generic. This is enforced in packages by R CMD check, but it’s good practice even if you’re not creating a package.

    There is one exception to this rule: if the generic has ..., the method can contain a superset of the arguments. This allows methods to take arbitrary additional arguments. The downside of using ..., however, is that any misspelled arguments will be silently swallowed[5], as mentioned in Section 5.6.
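Here is a sketch using a hypothetical describe() generic and temperature class. The method keeps x and ... from the generic and adds its own digits argument. A misspelled digits slips silently into ...:

```r
# Hypothetical generic, used only for illustration
describe <- function(x, ...) {
  UseMethod("describe")
}

describe.default <- function(x, ...) {
  "an object"
}

# Same arguments as the generic (x, ...), plus an extra `digits`
describe.temperature <- function(x, digits = 1, ...) {
  paste(round(unclass(x), digits), "degrees")
}

temp <- structure(21.48, class = "temperature")
describe(temp, digits = 2)
#> [1] "21.48 degrees"

# The typo `digts` is absorbed by ..., so the default digits = 1 is used
describe(temp, digts = 2)
#> [1] "21.5 degrees"
```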

12.4.4 Exercises

  1. Read the source code for t() and t.test() and confirm that t.test() is an S3 generic and not an S3 method. What happens if you create an object with class test and call t() with it? Why?

  2. What generics does the table class have methods for?

  3. What generics does the ecdf class have methods for?

  4. Which base generic has the greatest number of defined methods?

  5. Carefully read the documentation for UseMethod() and explain why the following code returns the results that it does. What two usual rules of function evaluation does UseMethod() violate?

  6. What are the arguments to [? Why is this a hard question to answer?

12.5 Object styles

So far I’ve focussed on “vector style” classes like Date and factor. These have the key property that length(x) represents the number of observations in the vector. There are three variants that do not have this property:
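Two base-R objects make the distinction concrete. For a Date vector, length() counts observations, whereas for a data frame it counts columns:

```r
# Vector style: one element per observation
x <- as.Date(c("2020-01-01", "2020-06-01"))
length(x)
#> [1] 2

# Data frame: length() counts columns, nrow() counts observations
length(mtcars)
#> [1] 11
nrow(mtcars)
#> [1] 32
```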

Unfortunately, describing the appropriate use of each of these object styles is beyond the scope of this book. However, you can learn more from the documentation of the vctrs package (https://vctrs.r-lib.org); the package also provides constructors and helpers that make implementation of the different styles easier.

12.5.1 Exercises

  1. Categorise the objects returned by lm(), factor(), table(), as.Date(), as.POSIXct(), ecdf(), ordered(), and I() into the styles described above.

  2. What would a constructor function for lm objects, new_lm(), look like? Why is a constructor function less useful for linear models? (Think about what functions would call new_lm().)

12.6 Inheritance

S3 classes can share behaviour through a mechanism called inheritance. Inheritance is powered by three ideas:

Before we continue we need a bit of vocabulary to describe the relationship between the classes that appear together in a class vector. We’ll say that ordered is a subclass of factor because it always appears before it in the class vector, and, conversely, we’ll say factor is a superclass of ordered.
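The class vector of an ordered factor shows both classes. kind() below is a hypothetical generic with a method only for factor, so an ordered factor falls back to that method:

```r
x <- ordered(c("low", "high"), levels = c("low", "high"))
class(x)   # subclass first, then superclass
#> [1] "ordered" "factor"

# Hypothetical generic with no "ordered" method
kind <- function(x) UseMethod("kind")
kind.factor <- function(x) "factor method"

kind(x)    # no kind.ordered(), so dispatch moves on to kind.factor()
#> [1] "factor method"
```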

S3 imposes no restrictions on the relationship between sub- and superclasses, but your life will be easier if you impose some yourself. I recommend that you adhere to two simple principles when creating a subclass:

  • The base type of the subclass should be the same as the superclass.

  • The attributes of the subclass should be a superset of the attributes of the superclass.

Note that POSIXt does not adhere to these principles because POSIXct has type double, and POSIXlt has type list. This means that POSIXt is not a superclass, and illustrates that it’s quite possible to use the S3 inheritance system to implement other styles of code sharing (here POSIXt plays a role more like an interface), but you’ll need to figure out safe conventions yourself.
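You can confirm the different base types directly:

```r
now <- Sys.time()
class(now)
#> [1] "POSIXct" "POSIXt"
typeof(now)
#> [1] "double"

lt <- as.POSIXlt(now)
class(lt)
#> [1] "POSIXlt" "POSIXt"
typeof(lt)
#> [1] "list"
```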

12.6.1 NextMethod()

NextMethod() is the hardest part of inheritance to understand, so we’ll start with a concrete example for the most common use case: [. We’ll start by creating a simple toy class: a secret class that hides its output when printed:
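Here is a sketch of the secret class. It consists of a constructor for doubles and a print() method that replaces each value with one x per character:

```r
new_secret <- function(x = double()) {
  stopifnot(is.double(x))
  structure(x, class = "secret")
}

# Print one "x" per character instead of the actual values
print.secret <- function(x, ...) {
  print(strrep("x", nchar(x)))
  invisible(x)
}

x <- new_secret(c(15, 1, 456))
x
#> [1] "xx"  "x"   "xxx"
```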

This works, but the default [ method doesn’t preserve the class:

To fix this, we need to provide a [.secret method. How could we implement this method? The naive approach won’t work because we’ll get stuck in an infinite loop:

Instead, we need some way to call the underlying [ code, i.e. the implementation that would get called if we didn’t have a [.secret method. One approach would be to unclass() the object:

This works, but is inefficient because it creates a copy of x. A better approach is to use NextMethod(), which concisely solves the problem by delegating to the method that would have been called if [.secret didn’t exist:
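Here are the steps in one self-contained sketch. The class is lost without a method and is restored by a [.secret method built on NextMethod(). The unclass() alternative is left as a comment:

```r
new_secret <- function(x = double()) {
  stopifnot(is.double(x))
  structure(x, class = "secret")
}
x <- new_secret(c(15, 1, 456))

# Without a `[` method, the internal default drops the class
class(x[1])
#> [1] "numeric"

# Works, but copies x first:
# `[.secret` <- function(x, i) new_secret(unclass(x)[i])

# NextMethod() calls the method that would have run without `[.secret`
# (here the internal default), then the constructor restores the class
`[.secret` <- function(x, i) {
  new_secret(NextMethod())
}

class(x[1])
#> [1] "secret"
```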

We can see what’s going on with sloop::s3_dispatch():

The => indicates that [.secret is called, but that NextMethod() delegates work to the underlying internal [ method, as shown by the ->.

As with UseMethod(), the precise semantics of NextMethod() are complex. In particular, it tracks the list of potential next methods with a special variable, which means that modifying the object that’s being dispatched upon will have no impact on which method gets called next.

12.6.2 Allowing subclassing

When you create a class, you need to decide if you want to allow subclasses, because it requires some changes to the constructor and careful thought in your methods.

To allow subclasses, the parent constructor needs to have ... and class arguments:

Then the subclass constructor can just call the parent class constructor with additional arguments as needed. For example, imagine we want to create a supersecret class which also hides the number of characters:
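Here is a sketch of both constructors. new_secret() gains ... and class arguments, and new_supersecret() simply calls it with an extra class:

```r
# Parent constructor: extra attributes via ..., extra classes via `class`
new_secret <- function(x, ..., class = character()) {
  stopifnot(is.double(x))
  structure(x, ..., class = c(class, "secret"))
}

# Subclass constructor delegates to the parent
new_supersecret <- function(x) {
  new_secret(x, class = "supersecret")
}

# Hide the number of characters as well
print.supersecret <- function(x, ...) {
  print(rep("xxxxx", length(x)))
  invisible(x)
}

x2 <- new_supersecret(c(15, 1, 456))
class(x2)
#> [1] "supersecret" "secret"
x2
#> [1] "xxxxx" "xxxxx" "xxxxx"
```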

To allow inheritance, you also need to think carefully about your methods, as you can no longer use the constructor. If you do, the method will always return the same class, regardless of the input. This forces whoever makes a subclass to do a lot of extra work.

Concretely, this means we need to revise the [.secret method. Currently it always returns a secret(), even when given a supersecret:
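Here is a self-contained sketch of the problem. Because [.secret always calls new_secret(), subsetting a supersecret drops the subclass:

```r
new_secret <- function(x, ..., class = character()) {
  stopifnot(is.double(x))
  structure(x, ..., class = c(class, "secret"))
}
new_supersecret <- function(x) {
  new_secret(x, class = "supersecret")
}

# Always rebuilds the result with the parent constructor
`[.secret` <- function(x, i) {
  new_secret(NextMethod())
}

x2 <- new_supersecret(c(15, 1, 456))
class(x2[1:2])  # the "supersecret" class has been lost
#> [1] "secret"
```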

We want to make sure that [.secret returns the same class as x even if it’s a subclass. As far as I can tell, there is no way to solve this problem using base R alone. Instead, you’ll need to use the vctrs package, which provides a solution in the form of the vctrs::vec_restore() generic. This generic takes two inputs: an object which has lost subclass information, and a template object to use for restoration.

Typically vec_restore() methods are quite simple: you just call the constructor with appropriate arguments:

(If your class has attributes, you’ll need to copy them from the template, to, into the constructor call.)

Now we can use vec_restore() in the [.secret method:

(I only fully understood this issue quite recently, so at the time of writing it is not used in the tidyverse. Hopefully by the time you’re reading this, it will have rolled out, making it much easier to (e.g.) subclass tibbles.)

If you build your class using the tools provided by the vctrs package, [ will gain this behaviour automatically. You will only need to provide your own [ method if you have attributes that depend on the data or want non-standard subsetting behaviour. See ?vctrs::new_vctr for details.

12.6.3 Exercises

  1. How does [.Date support subclasses? How does it fail to support subclasses?

  2. R has two classes for representing date time data, POSIXct and POSIXlt, which both inherit from POSIXt. Which generics have different behaviours for the two classes? Which generics share the same behaviour?

  3. What do you expect this code to return? What does it actually return? Why?

12.7 Dispatch details

This chapter concludes with a few additional details about method dispatch. It is safe to skip these details if you’re new to S3.

12.7.1 S3 and base types

What happens when you call an S3 generic with a base object, i.e. an object with no class? You might think it would dispatch on what class() returns:

But unfortunately dispatch actually occurs on the implicit class, which has three components:

  • “array” or “matrix” (if the object has dimensions).
  • typeof() (with a few minor tweaks).
  • If it’s “integer” or “double”, “numeric”.

Base R had no function that computed the implicit class until R 4.0.0 added .class2(); you can also use sloop::s3_class():

This is used by s3_dispatch():

Note that this means that the class() of an object does not uniquely determine its dispatch:
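The base-R function .class2() (R 4.0.0 or later) returns the class vector that S3 dispatch actually uses, so it can stand in for sloop here. what() is a hypothetical generic with methods for the implicit class "double" and for "numeric":

```r
m <- matrix(1:6, nrow = 2)
class(m)
#> [1] "matrix" "array"
.class2(m)
#> [1] "matrix"  "array"   "integer" "numeric"

# Same class(), different dispatch classes
x1 <- 1
x2 <- structure(1, class = "numeric")
class(x1)
#> [1] "numeric"
class(x2)
#> [1] "numeric"
.class2(x1)
#> [1] "double"  "numeric"
.class2(x2)
#> [1] "numeric"

# Hypothetical generic showing the consequence
what <- function(x) UseMethod("what")
what.double <- function(x) "double method"
what.numeric <- function(x) "numeric method"
what(x1)
#> [1] "double method"
what(x2)
#> [1] "numeric method"
```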

12.7.2 Internal generics

Some base functions, like [, sum(), and cbind(), are called internal generics because they don’t call UseMethod() but instead call the C functions DispatchGroup() or DispatchOrEval(). s3_dispatch() shows internal generics by including the name of the generic followed by (internal):

For performance reasons, internal generics do not dispatch to methods unless the class attribute has been set, which means that internal generics do not use the implicit class. Again, if you’re ever confused about method dispatch, you can rely on s3_dispatch().
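Here is a sketch that makes this visible. length() is an internal generic, and length.integer() is a hypothetical method defined only for this demonstration and removed at the end. It is used only when the class attribute is explicitly set:

```r
# Hypothetical method for the class "integer"
length.integer <- function(x) 99L

plain <- 1:3                                  # implicit class only
classed <- structure(1:3, class = "integer")  # class attribute explicitly set

results <- c(
  plain = length(plain),     # no class attribute: no dispatch
  classed = length(classed)  # class attribute: length.integer() is found
)
results
#>   plain classed
#>       3      99

rm(length.integer)  # clean up the global environment
```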

12.7.3 Group generics

Group generics are the most complicated part of S3 method dispatch because they involve both NextMethod() and internal generics. Like internal generics, they only exist in base R, and you cannot define your own group generic.

There are four group generics:

  • Math: abs(), sign(), sqrt(), floor(), cos(), sin(), log(), and more (see ?Math for the complete list).

  • Ops: +, -, *, /, ^, %%, %/%, &, |, !, ==, !=, <, <=, >=, and >.

  • Summary: all(), any(), sum(), prod(), min(), max(), and range().

  • Complex: Arg(), Conj(), Im(), Mod(), Re().

Defining a single group generic for your class overrides the default behaviour for all of the members of the group. Methods for group generics are looked for only if the methods for the specific generic do not exist:
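Here is a sketch with a hypothetical my_num class. One Math method covers the whole group and reports .Generic, but a specific sqrt() method takes precedence over it:

```r
new_my_num <- function(x = double()) {
  stopifnot(is.double(x))
  structure(x, class = "my_num")
}

# One method for the whole Math group; .Generic names the function called
Math.my_num <- function(x, ...) {
  paste("Math group method called for", .Generic)
}

# A method for the specific generic is found before the group method
sqrt.my_num <- function(x) {
  "specific sqrt() method"
}

x <- new_my_num(4)
abs(x)
#> [1] "Math group method called for abs"
sqrt(x)
#> [1] "specific sqrt() method"
```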

Most methods for group generics involve a call to NextMethod(). For example, take difftime objects. If you look at the method dispatch for abs(), you’ll see there’s a Math group method defined.

Math.difftime basically looks like this:
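Redefining Math.difftime itself would mask the base method, so this simplified sketch applies the same idea to a hypothetical duration class. Like difftime, duration stores a units attribute:

```r
new_duration <- function(x = double(), units = "secs") {
  stopifnot(is.double(x))
  units <- match.arg(units, c("secs", "mins", "hours", "days", "weeks"))
  structure(x, class = "duration", units = units)
}

# Let the next method (the internal default) do the computation, then
# rebuild the result with the constructor, copying units from the input
Math.duration <- function(x, ...) {
  new_duration(NextMethod(), units = attr(x, "units"))
}

d <- new_duration(c(-90, 30), units = "mins")
abs(d)
```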

It dispatches to the next method, here the internal default, to perform the actual computation, then restores the class and attributes. (To better support subclasses of difftime this would need to call vec_restore(), as described in Section 12.6.2.)

Note that inside a group generic function a special variable .Generic provides the actual generic function called. This can be useful when producing error messages, and can sometimes be useful if you need to manually re-call the generic with different arguments.

12.7.4 Double dispatch

Generics in the “Ops” group, which includes the two-argument arithmetic and boolean operators like - and &, implement a special type of method dispatch. They dispatch on the type of both of the arguments, which is called double dispatch. This is necessary to preserve the commutative property of many operators, i.e. a + b should equal b + a. Take the following simple example:
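For instance, adding an integer to a Date gives the same result in either order:

```r
date <- as.Date("2017-01-01")
one <- 1L

date + one
#> [1] "2017-01-02"
one + date
#> [1] "2017-01-02"
```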

If + dispatched only on the first argument, it would return different values for the two cases. To overcome this problem, generics in the Ops group use a slightly different strategy from usual. Rather than doing a single method dispatch, they do two, one for each input. There are three possible outcomes of this lookup:

  • The methods are the same, so it doesn’t matter which method is used.

  • The methods are different, and R falls back to the internal method with a warning.

  • One method is internal, in which case R calls the other method.
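The three outcomes can be reproduced with two hypothetical classes whose + methods disagree. tryCatch() is used here only to capture the warning message:

```r
a <- structure(1, class = "class_a")
b <- structure(2, class = "class_b")
`+.class_a` <- function(e1, e2) "class_a method"
`+.class_b` <- function(e1, e2) "class_b method"

a + a  # same method on both sides
#> [1] "class_a method"
1 + b  # the left side has no method, so the right side's method is used
#> [1] "class_b method"

# Different methods: R warns and falls back to the internal `+`
res <- tryCatch(a + b, warning = function(w) conditionMessage(w))
res
```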

This approach is error-prone, so if you want to implement robust double dispatch for algebraic operators, I recommend using the vctrs package. See ?vctrs::vec_arith for details.

12.7.5 Exercises

  1. Explain the differences in dispatch below:

  2. What classes have a method for the Math group generic in base R? Read the source code. How do the methods work?

  3. Math.difftime() is more complicated than I described. Why?


  1. The exceptions are methods found in the base package, like t.data.frame, and methods that you’ve created.

  2. Recent versions of R have .Date(), .difftime(), .POSIXct(), and .POSIXlt() constructors but they are internal, not well documented, and do not follow the principles that I recommend.

  3. Note that this helper is not efficient: behind the scenes ISODatetime() works by pasting the components into a string and then using strptime(). A more efficient equivalent is available in lubridate::make_datetime().

  4. The exception is internal generics, which are implemented in C, and are the topic of Section 12.7.2.

  5. See https://github.com/hadley/ellipsis for an experimental way of warning when methods fail to use all the arguments in ..., providing a potential resolution of this issue.

  6. You can also build an object on top of a pairlist, but I have yet to find a good reason to do so.