Introduction
In the following five chapters you’ll learn about object-oriented programming (OOP). OOP is a little more challenging in R than in other languages because:
There are multiple OOP systems to choose from. In this book, I’ll focus on the three that I believe are most important: S3, R6, and S4. S3 and S4 are provided by base R. R6 is provided by the R6 package, and is similar to the Reference Classes, or RC for short, from base R.
There is disagreement about the relative importance of the OOP systems. I think S3 is most important, followed by R6, then S4. Others believe that S4 is most important, followed by RC, and that S3 should be avoided. This means that different R communities use different systems.
S3 and S4 use generic function OOP, which is rather different from the encapsulated OOP used by most languages popular today. We’ll come back to precisely what those terms mean shortly, but basically, while the underlying ideas of OOP are the same across languages, their expressions are rather different. This means that you can’t immediately transfer your existing OOP skills to R.
Generally in R, functional programming is much more important than object-oriented programming, because you typically solve complex problems by decomposing them into simple functions, not simple objects. Nevertheless, there are important reasons to learn each of the three systems:
S3 allows your functions to return rich results with user-friendly display and programmer-friendly internals. S3 is used throughout base R, so it’s important to master if you want to extend base R functions to work with new types of input.
R6 provides a standardised way to escape R’s copy-on-modify semantics. This is particularly important if you want to model objects that exist independently of R. Today, a common need for R6 is to model data that comes from a web API, and where changes come from inside or outside of R.
S4 is a rigorous system that forces you to think carefully about program design. It’s particularly well-suited for building large systems that evolve over time and will receive contributions from many programmers. This is why it is used by the Bioconductor project, so another reason to learn S4 is to equip you to contribute to that project.
The goal of this brief introductory chapter is to give you some important vocabulary and some tools to identify OOP systems in the wild. The following chapters then dive into the details of R’s OOP systems:
Chapter 12 details the base types which form the foundation underlying all other OO systems.
Chapter 13 introduces S3, the simplest and most commonly used OO system.
Chapter 14 discusses R6, an encapsulated OO system built on top of environments.
Chapter 15 introduces S4, which is similar to S3 but more formal and more strict.
Chapter 16 compares these three main OO systems. By understanding the trade-offs of each system you can appreciate when to use one or the other.
This book focusses on the mechanics of OOP, not its effective use, and it may be challenging to fully understand if you have not done object-oriented programming before. You might wonder why I chose not to provide more immediately useful coverage. I have focused on mechanics here because they need to be well described somewhere (writing these chapters required a considerable amount of reading, exploration, and synthesis on my part), and using OOP effectively is sufficiently complex to require a book-length treatment; there’s simply not enough room in Advanced R to cover it in the depth required.
OOP systems
Different people use OOP terms in different ways, so this section provides a quick overview of important vocabulary. The explanations are necessarily compressed, but we will come back to these ideas multiple times.
The main reason to use OOP is polymorphism (literally: many shapes). Polymorphism means that a developer can consider a function’s interface separately from its implementation, making it possible to use the same function form for different types of input. This is closely related to the idea of encapsulation: the user doesn’t need to worry about details of an object because they are encapsulated behind a standard interface.
To be concrete, polymorphism is what allows summary() to produce different outputs for numeric and factor variables:
diamonds <- ggplot2::diamonds
summary(diamonds$carat)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.20 0.40 0.70 0.80 1.04 5.01
summary(diamonds$cut)
#> Fair Good Very Good Premium Ideal
#> 1610 4906 12082 13791 21551
You could imagine summary() containing a series of if-else statements, but that would mean only the original author could add new implementations. An OOP system makes it possible for any developer to extend the interface with implementations for new types of input.
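As a rough sketch of that kind of extension, the code below adds a summary() implementation for a new type without touching summary() itself. The class name temperature and the constructor new_temperature() are hypothetical, invented only for this illustration.

```r
# Hypothetical constructor for a new class, "temperature".
new_temperature <- function(celsius) {
  structure(list(celsius = celsius), class = "temperature")
}

# An implementation of summary() for the new class. It is found by its
# name, summary.<class>, so the original summary() does not change.
summary.temperature <- function(object, ...) {
  c(min = min(object$celsius), max = max(object$celsius))
}

temps <- new_temperature(c(12.5, 18, 21.5))
summary(temps)
#>  min  max
#> 12.5 21.5
```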
To be more precise, OO systems call the type of an object its class, and an implementation for a specific class is called a method. Roughly speaking, a class defines what an object is and methods describe what that object can do. The class defines the fields, the data possessed by every instance of that class. Classes are organised in a hierarchy so that if a method does not exist for one class, its parent’s method is used, and the child is said to inherit behaviour. For example, in R, an ordered factor inherits from a regular factor, and a generalised linear model inherits from a linear model. The process of finding the correct method given a class is called method dispatch.
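The vocabulary above can be explored with a few base R calls. This is a small sketch rather than a full treatment: the generic describe() and the classes child and parent are hypothetical names used only to show a method being inherited.

```r
# An ordered factor carries its own class and its parent's class.
sizes <- factor(c("small", "large"), levels = c("small", "large"),
                ordered = TRUE)
class(sizes)
#> [1] "ordered" "factor"
inherits(sizes, "factor")
#> [1] TRUE

# A generalised linear model inherits from a linear model.
fit <- glm(mpg ~ wt, data = mtcars)
class(fit)
#> [1] "glm" "lm"

# Method dispatch falls back to the parent when the child has no method.
describe <- function(x, ...) UseMethod("describe")
describe.parent <- function(x, ...) "parent method"

obj <- structure(list(), class = c("child", "parent"))
describe(obj)
#> [1] "parent method"
```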
There are two main paradigms of object-oriented programming which differ in how methods and classes are related. In this book, we’ll borrow the terminology of Extending R and call these paradigms encapsulated and functional:
In encapsulated OOP, methods belong to objects or classes, and method calls typically look like object.method(arg1, arg2). This is called encapsulated because the object encapsulates both data (with fields) and behaviour (with methods), and is the paradigm found in most popular languages.
In functional OOP, methods belong to generic functions, and method calls look like ordinary function calls: generic(object, arg2, arg3). This is called functional because from the outside it looks like a regular function call, and internally the components are also functions.
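To make the contrast between the two call styles concrete, here is a minimal sketch that computes the same thing in both paradigms using only base R. The generic area(), the S3 class circle, and the reference class Circle are hypothetical names chosen for illustration; in R the encapsulated call uses $ where other languages use a dot.

```r
library(methods)

# Functional OOP: the method belongs to the generic area().
area <- function(shape, ...) UseMethod("area")
area.circle <- function(shape, ...) pi * shape$r^2

circ <- structure(list(r = 1), class = "circle")
area(circ)        # looks like an ordinary function call

# Encapsulated OOP: the method belongs to the class and is reached
# through the object.
Circle <- setRefClass("Circle",
  fields = list(r = "numeric"),
  methods = list(
    area = function() pi * r^2
  )
)

circ_rc <- Circle$new(r = 1)
circ_rc$area()    # object$method() style
```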
With this terminology in hand, we can now talk precisely about the different OO systems available in R.
OOP in R
Base R provides three OOP systems: S3, S4, and reference classes (RC):
S3 is R’s first OOP system, and is described in Statistical Models in S. S3 is an informal implementation of functional OOP and relies on common conventions rather than ironclad guarantees. This makes it easy to get started with, providing a low cost way of solving many simple problems.
S4 is a formal and rigorous rewrite of S3, and was introduced in Programming with Data. It requires more upfront work than S3, but in return provides more guarantees and greater encapsulation. S4 is implemented in the base methods package, which is always installed with R.
(You might wonder if S1 and S2 exist. They don’t: S3 and S4 were named according to the versions of S that they accompanied. The first two versions of S didn’t have any OOP framework.)
RC implements encapsulated OO. RC objects are a special type of S4 objects that are also mutable, i.e., instead of using R’s usual copy-on-modify semantics, they can be modified in place. This makes them harder to reason about, but allows them to solve problems that are difficult to solve in the functional OOP style of S3 and S4.
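The differences between the three base systems can be sketched in a few lines. This is only an illustration under simple assumptions: the S4 class Person and the reference class Counter are hypothetical and defined here for the example.

```r
library(methods)

# S3 relies on convention: nothing checks that this is really a model.
fake <- structure(list(), class = "lm")
inherits(fake, "lm")
#> [1] TRUE

# S4 checks slot types when an object is created.
setClass("Person", representation(name = "character", age = "numeric"))
result <- tryCatch(
  new("Person", name = "Ann", age = "forty"),
  error = function(e) "rejected"
)
result
#> [1] "rejected"

# RC objects are modified in place, so both names see the change.
Counter <- setRefClass("Counter",
  fields = list(n = "numeric"),
  methods = list(
    add = function() {
      n <<- n + 1
      invisible(.self)
    }
  )
)
a <- Counter$new(n = 0)
b <- a
b$add()
a$n
#> [1] 1

# An ordinary list follows copy-on-modify, so the original is unchanged.
x <- list(n = 0)
y <- x
y$n <- 1
x$n
#> [1] 0
```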
A number of other OOP systems are provided by CRAN packages:
R6 implements encapsulated OOP like RC, but resolves some important issues. In this book, you’ll learn about R6 instead of RC, for reasons described in Section 14.5.
R.oo provides some formalism on top of S3, and makes it possible to have mutable S3 objects.
proto implements another style of OOP based on the idea of prototypes, which blur the distinctions between classes and instances of classes (objects). I was briefly enamoured with prototype based programming and used it in ggplot2, but now think it’s better to stick with the standard forms.
Apart from R6, which is widely used, these systems are primarily of theoretical interest. They do have their strengths, but few R users know and understand them, so it is hard for others to read and contribute to your code.
sloop
Before we go on I want to introduce the sloop package:
library(sloop)
The sloop package (think “sail the seas of OOP”) provides a number of helpers that fill in missing pieces in base R. The first of these is sloop::otype(). It makes it easy to figure out the OOP system used by a wild-caught object:
otype(1:10)
#> [1] "base"
otype(mtcars)
#> [1] "S3"
mle_obj <- stats4::mle(function(x = 1) (x - 2) ^ 2)
otype(mle_obj)
#> [1] "S4"
Use this function to figure out which chapter to read to understand how to work with an existing object.
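If sloop is not installed, a similar classification can be approximated with base R predicates. The helper guess_otype() below is a hypothetical sketch written for this illustration; it is not sloop's implementation and may disagree with otype() on unusual objects. The classes Animal and Account are likewise made up for the example.

```r
library(methods)

# Hypothetical helper: guess the OOP system of an object using base R only.
guess_otype <- function(x) {
  if (!is.object(x)) {
    "base"                                # no class attribute at all
  } else if (!isS4(x)) {
    if (inherits(x, "R6")) "R6" else "S3"
  } else if (is(x, "envRefClass")) {
    "RC"                                  # reference classes extend envRefClass
  } else {
    "S4"
  }
}

guess_otype(1:10)
#> [1] "base"
guess_otype(mtcars)
#> [1] "S3"

setClass("Animal", representation(name = "character"))
guess_otype(new("Animal", name = "Rex"))
#> [1] "S4"

Account <- setRefClass("Account", fields = list(balance = "numeric"))
guess_otype(Account$new(balance = 10))
#> [1] "RC"
```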