Preface

Welcome to the 2nd edition of Advanced R. I had three main goals for this edition:

  • Improve coverage of important concepts that I fully understood only after the publication of the 1st edition.

  • Reduce coverage of topics time has shown to be less useful, or that I think are really exciting but turn out not to be that practical.

  • Generally make the material easier to understand with better text, clearer code, and many more diagrams.

If you’re familiar with the 1st edition, this preface describes the major changes so that you can focus your reading on the new areas (but I really do encourage you to skim throughout the book).

If you’re reading a printed version of this book you’ll notice one big change very quickly: Advanced R is now in colour! This has considerably improved the syntax highlighting of code chunks, and made it much easier to create helpful diagrams. I have taken advantage of this and included over 100 new diagrams throughout the book.

Another big change in this version is the use of packages, particularly rlang, which provides a clean interface to low-level data structures and operations. The 1st edition used base R functions almost exclusively, which created some pedagogical challenges because many functions evolved independently over multiple years, making it hard to see the big underlying ideas hidden amongst the incidental variations in function names and arguments. I continue to show base equivalents in sidebars, footnotes, and where needed, in individual sections, but if you want to see the purest base R expression of the ideas in this book, I recommend reading the 1st edition, which you can find online at http://adv-r.had.co.nz.

The foundations of R have not changed in the five years since the 1st edition, but my understanding of them certainly has. Thus, the overall structure of “Foundations” has remained roughly the same, but many of the individual chapters have been considerably improved:

  • Chapter 2, “Names and values”, is a brand new chapter that helps you understand the difference between objects and names of objects. This helps you more accurately predict when R will make a copy of a data structure, and lays important groundwork to understand functional programming.

  • Chapter 3, “Vectors” (previously called data structures), and has been rewritten to focus on vector types like integers, factors, and data frames. It contains more details of important S3 vectors (like dates and date-times), and discusses the data frame variation provided by the tibble package.

  • Chapter 4, “Subsetting”, now distinguishes between [ and [[ by their intention: [ extracts many values and [[ extracts a single value (previously they were characterised by whether they “simplified” or “preserved”). Section 4.3 draws the “train” to help you understand how [[ works with lists, and introduces the new purrr::pluck() and purrr::chuck() functions that provide more consistent behaviour for out-of-bounds indices.

  • Chapter 5, “Functions”, has an improved ordering, introduces the pipe (%>%) as a third way to compose functions (Section 5.3), and has considerably improved coverage of function forms (Section 5.8).

  • Chapter 6, “Environments”, has a reorganised treatment of special environments (Section 6.4), and a much improved discussion of the call stack (Section 6.5).

  • Chapter 7, “Conditions”, contains material previously in “Exceptions and debugging”, and much new content on how R’s condition system works. It also shows you how to create your own custom condition classes (Section 7.5).

The chapters following foundations have been re-organised around the three most important programming paradigms in R: functional programming, object oriented programming, and metaprogramming.

  • Functional programming is now more cleanly divided into the three main techniques: “Functionals” (Chapter 8), “Function factories” (Chapter 9), and “Function operators” (Chapter 10). I’ve focussed in on ideas that have practical applications in data science and reduced the amount of pure theory.

    These chapters now use functions provided by the purrr package, which allow me to focus more on the underlying ideas and less on the incidental variations. This also lead to a considerable simplification of the function operators chapter since a major use was to work around the absence of ... in base functionals.

  • Object oriented programming (OOP) now forms a major section of the book with completely new chapters on base types (Chapter 11), S3 (Chapter 12), S4 (Chapter 14), R6 (Chapter 13), and the tradeoffs between the systems (Chapter 15).

    These chapters focus on how the different object systems work, not how to use them effectively. This is unfortunate, but necessary, because many of the technical details are not described elsewhere, and effective use of OOP needs a whole book of its own.

  • Metaprogramming (previously called “computing on the language”) describes the suite of tools that you can use to generate code with code. Compared to the 1st edition this material has been substantially expanded and now focusses on “tidy evaluation”, a set of ideas and theory that that make metaprogramming safe, well-principled, and accessible to many more R programmers. Chapter 16, “Expressions”, describes the underlying data structures; Chapter 17, “Quasiquotation”, quoting and unquoting; Chapter 18, “Evaluation”, evaluating code in special environments; and Chapter 19, “Translations”, pulls all the themes together to show how you might translate from one (programming) language to another.

The final section of the book collects chapters on programming techniques, including debugging, profiling, improving performance, and connecting R and C++.

While the 2nd edition has mostly expanded coverage of existing material, there were four chapters that have been removed:

  • The vocabulary chapter has been removed because it was always a bit of an odd duck, and there are more effective ways to present vocabulary lists than in a book chapter.

  • The style chapter has been replaced with an online style guide, http://style.tidyverse.org/. The style guide is paired with the new styler package which can automatically apply many of the rules.

  • The C chapter has been moved to a new repo, https://github.com/hadley/r-internals, which, over time, will provide a guide to writing C code that work’s with R’s data structures.

  • The memory chapter has been removed. Much of the material has been integrated into Chapter 2 and the remainder felt excessively technical and not that important to understand.