“Flexibility in syntax, if it does not lead to ambiguity, would seem a reasonable thing to ask of an interactive programming language.”
— Kent Pitman
One of the most surprising things about R is its capability for metaprogramming: the ability of code to inspect and modify other code. In R, functions that use metaprogramming are commonly said to use non-standard evaluation, or NSE for short. That’s because they evaluate one (or more) of their arguments in a non-standard way. As you might guess, defining these tools by what they are not (standard evaluation) is challenging, so you’ll learn more precise vocabulary as you work through these chapters.
Additionally, implementation of the underlying ideas has occurred piecemeal over the last twenty years. These two forces tend to make base R metaprogramming code harder to understand than it could be: the key ideas are obscured by unimportant details. To focus on the main ideas, the following chapters will start with functions from the rlang package, which have been developed more recently with an eye for consistency. Once you have grasped the basic ideas with rlang, I’ll show you the equivalent with base R so you can use your knowledge to understand existing code.
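To make the idea of non-standard evaluation concrete, here is a minimal sketch in base R (the chapters themselves start with rlang). The two functions below are hypothetical and differ only in how they treat their argument: standard() evaluates it as usual, while nonstandard() uses substitute() to return the code the caller typed instead of its value.

```r
# Hypothetical helpers, for illustration only
standard <- function(x) x                  # evaluates the argument as usual
nonstandard <- function(x) substitute(x)   # returns the unevaluated expression

a <- 1
b <- 2
standard(a + b)
#> [1] 3
nonstandard(a + b)
#> a + b
```

The second function never computes a + b; it sees the expression itself, which is the starting point for every technique in the following chapters.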
Metaprogramming is particularly important in R because it is well suited to facilitating interactive data analysis. There are two primary uses of metaprogramming that you have probably already seen:
It makes it possible to trade precision for concision in functions like dplyr::filter() that make interactive data exploration faster at the cost of introducing some ambiguity.
It makes it possible to build domain specific languages (DSLs) that tailor R’s semantics to specific problem domains like visualisation or data manipulation.
We’ll briefly illustrate these important concepts before diving into the details of how they work in the subsequent chapters.
19.0.1 Trading precision for concision
A common use of metaprogramming is to allow you to use the names of variables in a data frame as if they were objects in the environment. This makes interactive exploration more fluid at the cost of introducing some minor ambiguity. For example, take base::subset(). It lets you select rows from a data frame with a condition that refers to its columns by name:
data("diamonds", package = "ggplot2")
subset(diamonds, x == 0 & y == 0 & z == 0)
#> # A tibble: 7 x 10
#>   carat cut       color clarity depth table price     x     y     z
#>   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 1.00  Very Good H     VS2      63.3   53.  5139    0.    0.    0.
#> 2 1.14  Fair      G     VS1      57.5   67.  6381    0.    0.    0.
#> 3 1.56  Ideal     G     VS2      62.2   54. 12800    0.    0.    0.
#> 4 1.20  Premium   D     VVS1     62.1   59. 15686    0.    0.    0.
#> 5 2.25  Premium   H     SI2      62.8   59. 18034    0.    0.    0.
#> 6 0.710 Good      F     SI2      64.1   60.  2130    0.    0.    0.
#> 7 0.710 Good      F     SI2      64.1   60.  2130    0.    0.    0.
(Base R functions like transform() inspired the development of dplyr.)
subset() is considerably shorter than the equivalent code using $ because you only need to provide the name of the data frame once.
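For comparison, the equivalent code using $ repeats the data frame name in every condition. The sketch below uses mtcars, which ships with R, rather than diamonds so that it runs without ggplot2; the diamonds version is shown as a comment. The second half illustrates the minor ambiguity mentioned earlier, using a small made-up data frame: when a name exists both as a column and as a variable in the environment, subset() uses the column.

```r
# The diamonds version would be:
#   diamonds[diamonds$x == 0 & diamonds$y == 0 & diamonds$z == 0, ]
via_subset <- subset(mtcars, cyl == 4 & gear == 5)          # name given once
via_dollar <- mtcars[mtcars$cyl == 4 & mtcars$gear == 5, ]  # name repeated
all.equal(via_subset, via_dollar)
#> [1] TRUE

# Ambiguity: `b` is both a column of df and a global variable
df <- data.frame(a = 1:5, b = 5:1)
b <- 0
res <- subset(df, a > b)  # compares against the column df$b, not the global b
nrow(res)
#> [1] 2
```

Had the global b been used, all five rows would have matched; the data frame silently takes precedence.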
19.1 Domain specific languages
More extensive use of metaprogramming leads to DSLs like ggplot2 and dplyr. DSLs are particularly useful because they make it possible to translate R code into another language. For example, one of the headline features of dplyr is that you can write R code that is automatically translated into SQL:
library(dplyr)
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
mtcars_db <- copy_to(con, mtcars)

mtcars_db %>%
  filter(cyl > 2) %>%
  select(mpg:hp) %>%
  head(10) %>%
  show_query()
#> <SQL>
#> SELECT `mpg`, `cyl`, `disp`, `hp`
#> FROM `mtcars`
#> WHERE (`cyl` > 2.0)
#> LIMIT 10

DBI::dbDisconnect(con)
This is a useful technique because it makes it possible to retrieve data from a database without paying the high cognitive overhead of switching between R and SQL.
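dplyr’s real translation is handled by the dbplyr package and is far more complete. As a rough illustration of the underlying idea only, the hypothetical to_sql() below walks a captured R expression and emits a SQL condition. It is an assumption-laden sketch: it supports just column names, numbers, and a handful of operators, and it is not how dbplyr is implemented.

```r
# Hypothetical translator: R expression -> SQL condition string
to_sql <- function(x) {
  if (is.name(x)) {
    return(paste0("`", as.character(x), "`"))  # symbol -> quoted column name
  }
  if (is.numeric(x)) {
    return(format(x))                          # numeric constant
  }
  if (is.call(x) && is.name(x[[1]])) {
    op <- as.character(x[[1]])
    if (op %in% c(">", "<", "==", "&", "|")) {
      # Map R operators to their SQL spellings; > and < are unchanged
      sql_op <- switch(op, "==" = "=", "&" = "AND", "|" = "OR", op)
      lhs <- to_sql(x[[2]])                    # recurse into both operands
      rhs <- to_sql(x[[3]])
      return(paste0("(", lhs, " ", sql_op, " ", rhs, ")"))
    }
  }
  stop("unsupported expression: ", deparse(x))
}

# Capture the condition unevaluated, then translate it
where_clause <- function(cond) paste("WHERE", to_sql(substitute(cond)))

where_clause(cyl > 2 & mpg > 20)
#> [1] "WHERE ((`cyl` > 2) AND (`mpg` > 20))"
```

Because the condition is never evaluated in R, cyl and mpg do not need to exist as R objects; they only need to be meaningful in the target language.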
ggplot2 and dplyr are known as embedded DSLs, because they take advantage of R’s parsing and execution framework, but tailor R’s semantics for specific tasks. If you’re interested in learning more, I highly recommend Domain Specific Languages by Martin Fowler. It discusses many options for creating a DSL and provides many examples of different languages.
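An embedded DSL reuses R’s parser but changes what names mean when the code is evaluated. A minimal sketch, assuming hypothetical tag functions b() and i() that exist only inside a hypothetical with_html(): the code is captured with substitute() and evaluated in an environment where those names produce HTML strings, while every other name (such as who below) still resolves in the caller’s environment.

```r
with_html <- function(code) {
  # Hypothetical tag functions, visible only while evaluating `code`
  tags <- list(
    b = function(...) paste0("<b>", ..., "</b>"),
    i = function(...) paste0("<i>", ..., "</i>")
  )
  # Child of the caller's environment, so ordinary variables still work
  env <- list2env(tags, parent = parent.frame())
  eval(substitute(code), env)
}

who <- "world"
with_html(b("Hello ", i(who)))
#> [1] "<b>Hello <i>world</i></b>"
```

The syntax is plain R; only the semantics of b() and i() are tailored to the task, which is what makes the DSL "embedded".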
In the following chapters, you’ll learn about the three big ideas that underpin metaprogramming:
In Expressions, you’ll learn that all R code forms a tree. You’ll learn how to visualise that tree, how the rules of R’s grammar convert linear sequences of characters into a tree, and how to use recursive functions to work with code trees.
In Quotation, you’ll learn to use tools from rlang to capture (“quote”) unevaluated function arguments. You’ll also learn about quasiquotation, which provides a set of techniques for “unquoting” input that makes it possible to easily generate new trees from code fragments.
In Evaluation, you’ll learn about the inverse of quotation: evaluation. Here you’ll learn about an important data structure, the quosure, which ensures correct evaluation by capturing both the code to evaluate and the environment in which to evaluate it. This chapter will show you how to put all the pieces together to understand how NSE in base R works, and how to write your own functions that work like subset().
Finally, in Translating R code, you’ll see how to combine first class environments, lexical scoping, and metaprogramming to translate R code into other languages, namely HTML and LaTeX.
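The first three ideas can be previewed with a few lines of base R (the chapters introduce the rlang equivalents first). The sketch below treats a quoted call as a tree, walks it with a hypothetical recursive function count_symbols(), and then combines quotation and evaluation in a hypothetical my_subset() modelled on base::subset(). It is a simplification: it omits the checks base::subset() performs, and unlike a quosure it captures only the code, relying on parent.frame() to supply the environment.

```r
# Expressions: quote() returns code as a tree (a call object)
e <- quote(f(a, b + 1))
e[[1]]  # the function being called: the symbol f
e[[3]]  # the second argument, itself a call: b + 1

# A recursive function over the tree: counts every symbol,
# including function names such as f and +
count_symbols <- function(x) {
  if (is.name(x)) return(1L)
  if (is.call(x)) {
    return(sum(vapply(as.list(x), count_symbols, integer(1))))
  }
  0L  # constants such as 1 contribute nothing
}
count_symbols(e)
#> [1] 4

# Quotation + evaluation in a subset()-like function
my_subset <- function(df, condition) {
  cond <- substitute(condition)           # capture the unevaluated argument
  rows <- eval(cond, df, parent.frame())  # columns first, then the caller's env
  df[rows & !is.na(rows), , drop = FALSE] # treat NA as "not selected"
}

threshold <- 25
my_subset(mtcars, mpg > threshold)
```

Here mpg is found as a column of mtcars, while threshold is found in the caller’s environment, the same lookup order base::subset() uses.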
Each chapter follows the same basic structure. You’ll get the lay of the land in the introduction, then see a motivating example. Next you’ll learn the big ideas using functions from rlang, and then we’ll circle back to talk about how those ideas are expressed in base R. Each chapter finishes with a case study, using the ideas to solve a bigger problem.