Welcome to the 2nd edition of Advanced R. I had three main goals for this edition:
Improve coverage of important concepts that I fully understood only after the publication of the 1st edition.
Reduce coverage of topics time has shown to be less useful, or that I think are really exciting but turn out not to be that practical.
Generally make the material easier to understand with better text, clearer code, and many more diagrams.
If you’re familiar with the 1st edition, this preface describes the major changes so that you can focus your reading on the new areas. If you’re reading a printed version of this book you’ll notice one big change very quickly: Advanced R is now in colour! This has considerably improved the syntax highlighting of code chunks, and made it much easier to create helpful diagrams. I have taken advantage of this and included over 100 new diagrams throughout the book.
Another big change in this version is the use of packages, particularly rlang, which provides a clean interface to low-level data structures and operations. The 1st edition used base R functions almost exclusively, which created some pedagogical challenges because many functions evolved independently over multiple years, making it hard to see the big underlying ideas hidden amongst the incidental variations in function names and arguments. I continue to show base equivalents in sidebars, footnotes, and where needed, in individual sections, but if you want to see the purest base R expression of the ideas in this book, I recommend reading the 1st edition, which you can find online at http://adv-r.had.co.nz.
The foundations of R have not changed in the five years since the 1st edition, but my understanding of them certainly has. Thus, the overall structure of “Foundations” has remained roughly the same, but many of the individual chapters have been considerably improved:
Chapter 2, “Names and values”, is a brand new chapter that helps you understand the difference between objects and names of objects. This helps you more accurately predict when R will make a copy of a data structure, and lays important groundwork to understand functional programming.
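As a minimal sketch of the distinction between a name and the object it refers to (the variable names here are illustrative, not taken from the chapter), the following base R snippet shows the copy-on-modify behaviour described above:

```r
x <- c(1, 2, 3)
y <- x        # y is a second name bound to the same values; nothing is copied yet

y[[3]] <- 4   # modifying through y makes R copy the vector first

x             # still 1 2 3: the original object is unchanged
y             # 1 2 4: y now refers to a modified copy
```

Assigning `x` to `y` only creates a new binding; the copy happens when one of the names is used to modify the values.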
Chapter 3, “Vectors” (previously called data structures), has been rewritten to focus on vector types like integers, factors, and data frames. It contains more details of important S3 vectors (like dates and date-times), discusses the data frame variation provided by the tibble, and generally reflects my improved understanding of vector data types.
Chapter 4, “Subsetting”, now distinguishes between [ and [[ by their intention: [ extracts many values and [[ extracts a single value (previously they were characterised by whether they “simplified” or “preserved”). Section 4.3 draws the “train” to help you understand how [[ works with lists, and introduces new functions that provide more consistent behaviour for out-of-bounds indices.
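A small base R illustration of that intention-based distinction, using a hypothetical list `x` (the names `many` and `single` are assumptions made for this sketch):

```r
x <- list(a = 1:3, b = "text")

many <- x["a"]      # [ keeps the container: a list holding one element
single <- x[["a"]]  # [[ pulls out the element itself: the integer vector

is.list(many)       # TRUE
is.list(single)     # FALSE
```

With a list, `[` always returns a smaller list, while `[[` returns whatever is stored at that position.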
Chapter 5, “Control flow”, is a new chapter: somehow I previously managed to forget about such important tools like
Chapter 8, “Conditions”, contains material previously in “Exceptions and debugging”, and much new content on how R’s condition system works. It also shows you how to create your own custom condition classes (Section 8.5).
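To give a rough idea of what a custom condition class looks like, here is a minimal base R sketch. The constructor `invalid_input_error()`, the helper `check_positive()`, and the `value` field are hypothetical names chosen for illustration; the chapter itself may build conditions differently.

```r
# Hypothetical constructor: a condition is a list with a class vector
# that ends in "condition"; adding "error" lets stop() treat it as an error.
invalid_input_error <- function(message, value) {
  structure(
    list(message = message, call = NULL, value = value),
    class = c("invalid_input_error", "error", "condition")
  )
}

check_positive <- function(x) {
  if (x <= 0) {
    stop(invalid_input_error("`x` must be positive", value = x))
  }
  x
}

# A handler named after the custom class catches only that kind of error
# and can read the extra data stored on the condition object.
result <- tryCatch(
  check_positive(-1),
  invalid_input_error = function(cnd) cnd$value
)
```

Because the handler is matched on the class, other errors raised inside `tryCatch()` would still propagate as usual.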
The chapters following foundations have been re-organised around the three most important programming paradigms in R: functional programming, object oriented programming, and metaprogramming.
Functional programming is now more cleanly divided into the three main techniques: “Functionals” (Chapter 9), “Function factories” (Chapter 10), and “Function operators” (Chapter 11). I’ve focussed in on ideas that have practical applications in data science and reduced the amount of pure theory.
These chapters now use functions provided by the purrr package, which allow me to focus more on the underlying ideas and less on the incidental details. This also led to a considerable simplification of the function operators chapter, since a major use was to work around the absence of ellipsis (...) in base functionals.
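The three techniques can be sketched side by side. This example deliberately uses only base R rather than purrr, so that it runs without extra packages; `power()`, `or_na()`, and the other names are hypothetical helpers invented for this sketch.

```r
# Function factory: a function that returns a new function.
power <- function(exponent) {
  force(exponent)              # evaluate the argument now, not lazily
  function(x) x^exponent
}
square <- power(2)

# Functional: a function that takes another function as input.
# vapply() applies `square` to each element and returns a numeric vector.
squares <- vapply(1:3, square, numeric(1))

# Function operator: takes a function and returns a modified function.
# This one turns errors into NA instead of stopping.
or_na <- function(f) {
  function(...) tryCatch(f(...), error = function(e) NA_real_)
}
square_or_na <- or_na(square)

square_or_na(3)     # behaves like square()
square_or_na("a")   # would normally be an error; returns NA instead
```

The function operator forwards its arguments with `...`, which is what lets it wrap a function without knowing its signature.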
Object oriented programming (OOP) now forms a major section of the book with completely new chapters on base types (Chapter 12), S3 (Chapter 13), S4 (Chapter 15), R6 (Chapter 14), and the tradeoffs between the systems (Chapter 16).
These chapters focus on how the different object systems work, not how to use them effectively. This is unfortunate, but necessary, because many of the technical details are not described elsewhere, and effective use of OOP needs a whole book of its own.
Metaprogramming (previously called “computing on the language”) describes the suite of tools that you can use to generate code with code. Compared to the 1st edition this material has been substantially expanded and now focusses on “tidy evaluation”, a set of ideas and theory that make metaprogramming safe, well-principled, and accessible to many more R programmers. Chapter 18, “Expressions”, describes the underlying data structures; Chapter 19, “Quasiquotation”, covers quoting and unquoting; Chapter 20, “Evaluation”, covers evaluating code in special environments; and Chapter 21, “Translations”, pulls all the themes together to show how you might translate from one (programming) language to another.
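For readers new to these terms, the following sketch shows quoting, unquoting, and evaluation using base R equivalents (`quote()`, `bquote()`, and `eval()`) rather than the tidy evaluation tools the chapters focus on. The variable names are assumptions made for illustration.

```r
# Quoting: capture code as a data structure instead of running it
expr <- quote(x + y)
is.call(expr)                      # TRUE: the expression is a call object

# Unquoting: insert a computed value into otherwise quoted code
y_value <- 2
expr2 <- bquote(x + .(y_value))    # the expression x + 2

# Evaluation: run the code with `x` supplied by a list, not the global scope
result <- eval(expr2, list(x = 10))
```

Here `x` does not need to exist when the expression is built; it is only looked up when `eval()` runs the code against the supplied list.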
The final section of the book pulls together the chapters on programming techniques: profiling, measuring and improving performance, and Rcpp. The contents are very similar to the first edition, although the organisation is a little different. I have made light updates throughout these chapters, particularly to use newer packages (microbenchmark -> bench, lineprof -> profvis), but the majority of the text is the same.
While the 2nd edition has mostly expanded coverage of existing material, five chapters have been removed:
The vocabulary chapter has been removed because it was always a bit of an odd duck, and there are more effective ways to present vocabulary lists than in a book chapter.
The C chapter has been moved to a new repo, https://github.com/hadley/r-internals, which, over time, will provide a guide to writing C code that works with R’s data structures.
The memory chapter has been removed. Much of the material has been integrated into Chapter 2 and the remainder felt excessively technical and not that important to understand.
The chapter on R’s performance as a language was removed. It delivered few actionable insights, and easily becomes out of date as R itself changes.