# 20 Quasiquotation

## 20.1 Introduction

Now that you understand the tree structure of R code, it’s time to come back to one of the fundamental ideas that make expr() and ast() work: quasiquotation. There are two sides to quasiquotation:

• Quotation allows you to capture the AST associated with an argument. As a function author, this gives you a lot of power to influence how expressions are evaluated.

• Unquotation allows you to selectively evaluate parts of a quoted expression. This is a powerful tool that makes it easy to build up a complex AST from simpler fragments.

The combination of these two ideas makes it easy to compose expressions that are mixtures of direct and indirect specification, and helps to solve a wide variety of challenging problems.

Quoting functions have deep connections to Lisp macros. But macros are usually run at compile-time, which doesn’t have any meaning in R, and they always input and output ASTs. (Lumley (2001) shows one way you might implement them in R.) Quoting functions are more closely related to Lisp fexprs, functions where all arguments are quoted by default. These terms are useful to know when looking for related techniques in other programming languages.

### Prerequisites

Make sure you’re familiar with the tree structure of code described in Abstract syntax trees.

You’ll also need the development version of rlang:

```r
if (packageVersion("rlang") < "0.2.0") {
  stop("This chapter requires rlang 0.2.0", call. = FALSE)
}
library(rlang)
```

## 20.2 Motivation

We’ll start with a simple and concrete example that helps motivate the need for unquoting, and hence quasiquotation. Imagine you’re creating a lot of strings by joining together words:

```r
paste("Good", "morning", "Hadley")
#> [1] "Good morning Hadley"
paste("Good", "afternoon", "Alice")
#> [1] "Good afternoon Alice"
```

You are sick and tired of writing all those quotes, and instead you just want to use bare words. To that end, you’ve managed to write the following function:

```r
cement <- function(...) {
  dots <- exprs(...)
  paste(purrr::map(dots, expr_name), collapse = " ")
}

cement(Good, afternoon, Alice)
#> [1] "Good afternoon Alice"
```

(You’ll learn what exprs() does shortly; for now just look at the results.)

Formally, this function quotes all of its arguments (everything passed through ...). You can think of it as automatically putting quotation marks around each argument. That’s not precisely true, as the intermediate objects it generates are expressions, not strings, but it’s a useful approximation for now.

This function is nice because we no longer need to type quotes. The problem, however, comes when we want to use variables. It’s easy to use variables with paste() as we just don’t surround them with quotes:

```r
name <- "Hadley"
time <- "morning"

paste("Good", time, name)
#> [1] "Good morning Hadley"
```

Obviously this doesn’t work with cement() because every input is automatically quoted:

```r
cement(Good, time, name)
#> [1] "Good time name"
```

We need some way to explicitly unquote the input, to tell cement() to remove the automatic quote marks. Here we need time and name to be treated differently to Good. Quasiquotation gives us a standard tool to do so: !!, called “unquote”, and pronounced bang-bang. !! tells a quoting function to drop the implicit quotes:

```r
cement(Good, !!time, !!name)
#> [1] "Good morning Hadley"
```

It’s useful to compare cement() and paste() directly. paste() evaluates its arguments, so we need to quote where needed; cement() quotes its arguments, so we need to unquote where needed.

```r
paste("Good", time, name)
cement(Good, !!time, !!name)
```
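To see why the unquoted version works, it can help to look at what cement() actually receives. The sketch below is a minimal illustration that assumes rlang is attached and reuses the time and name values from above. It captures the same arguments with exprs() directly: the unquoted arguments arrive as the strings stored in the variables, not as symbols, so expr_name() passes them through unchanged.

```r
library(rlang)

time <- "morning"
name <- "Hadley"

# exprs() quotes every argument, but !! evaluates `time` and `name`
# first, so their values are inlined in place of the symbols
dots <- exprs(Good, !!time, !!name)
dots[[1]]  # still a symbol: Good
dots[[2]]  # now a string: "morning"

# expr_name() converts a symbol to its name and returns a string as-is,
# which is all cement() needs to paste the pieces together
words <- vapply(dots, expr_name, character(1))
paste(words, collapse = " ")
```

In other words, !! does not change how cement() works internally. It only changes what the quoted list contains before cement() sees it.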

### 20.2.1 Vocabulary

The distinction between quoted and evaluated arguments is important:

• An evaluated argument obeys R’s usual evaluation rules.

• A quoted argument is captured by the function and processed in some non-standard way.

If you’re ever unsure about whether an argument is quoted or evaluated, try executing the code outside of the function. If it doesn’t work, then that argument is quoted. For example, you can use this technique to determine that the first argument to library() is quoted:

```r
# works
library(MASS)

# fails
MASS
#> Error in eval(expr, envir, enclos): object 'MASS' not found
```

Talking about whether an argument is quoted or evaluated is a more precise way of stating whether or not a function uses NSE. I will sometimes use “quoting function” as short-hand for a “function that quotes one or more arguments”, but generally, I’ll refer to quoted arguments since that is the level at which the difference occurs.

### 20.2.2 Theory

Now that you’ve seen the basic idea, it’s time to talk a little bit about the theory. The idea of quasiquotation is an old one. It was first developed by a philosopher, Willard van Orman Quine¹, in the early 1940s. It’s needed in philosophy because it helps to precisely delineate the use and the mention of a word, i.e. to distinguish between the object and the words we use to refer to that object.

Quasiquotation was first used in a programming language, LISP, in the mid-1970s (Bawden 1999). LISP has one quoting function, the backquote, and uses the comma for unquoting. Most languages with a LISP heritage behave similarly. For example, racket (backquote and @), clojure (backquote and ~), and julia (: and @) all have quasiquotation tools that differ only slightly from LISP.

Quasiquotation has only come to R recently (2017). Despite its newness, I teach it in this book because it is a rich and powerful theory that makes many hard problems much easier. Quasiquotation in R is a little different to LISP and its descendants. In LISP there is only one function that does quasiquotation (the quote function), and you must call it explicitly when needed. This makes these languages less ambiguous (because there’s a clear code signal that something odd is happening), but it is less appropriate for R because quasiquotation is such an important part of DSLs for data analysis.

### 20.2.3 Exercises

1. For each function in the following base R code, identify which arguments are quoted and which are evaluated.

```r
library(MASS)

mtcars2 <- subset(mtcars, cyl == 4)

with(mtcars2, sum(vs))
sum(mtcars2$am)

rm(mtcars2)
```

2. For each function in the following tidyverse code, identify which arguments are quoted and which are evaluated.

```r
library(dplyr)
library(ggplot2)

by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean = mean(mpg))

ggplot(by_cyl, aes(cyl, mean)) + geom_point()
```

## 20.3 Quotation

The first part of quasiquotation is quotation: capturing an AST without evaluating it. There are two components to this: capturing an expression directly, and capturing an expression from a lazily-evaluated function argument. We’ll discuss two sets of tools for these two ways of capturing: those provided by rlang, and those provided by base R.

### 20.3.1 With rlang

There are four important quoting functions, broken down by whether they capture one or many expressions, and whether they capture the developer’s or the user’s expression:

|      | Developer | User      |
|------|-----------|-----------|
| One  | expr()    | enexpr()  |
| Many | exprs()   | enexprs() |

For interactive exploration, the most important quoting function is expr(). It captures its argument exactly as provided:

```r
expr(x + y)
#> x + y
expr(1 / 2 / 3)
#> 1/2/3
```

(Remember that white space and comments are not part of the AST, so they will not be captured by a quoting function.)

expr() is great for interactive exploration, because it captures what you, the developer, typed. It’s not useful inside a function:

```r
f1 <- function(x) expr(x)
f1(a + b + c)
#> x
```

Instead, we need another function: enexpr(). This captures what the user supplies to the function by looking at the internal promise object that powers lazy evaluation.

```r
f2 <- function(x) enexpr(x)
f2(a + b + c)
#> a + b + c
```

(Occasionally you just want to capture symbols, and throw an error for other types of input. In that case you can use ensym(). In the next chapter, you’ll learn about enquo(), which also captures the environment and is needed for tidy evaluation.)

To capture multiple arguments, use enexprs():

```r
f <- function(...) enexprs(...)
f(x = 1, y = 10 * z)
#> $x
#> [1] 1
#>
#> $y
#> 10 * z
```

Finally, exprs() is useful interactively to make a list of expressions:

```r
exprs(x = x ^ 2, y = y ^ 3, z = z ^ 4)
#> $x
#> x^2
#>
#> $y
#> y^3
#>
#> $z
#> z^4

# shorthand for
# list(x = expr(x ^ 2), y = expr(y ^ 3), z = expr(z ^ 4))
```
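As a quick check of the developer/user split in the table above, the sketch below compares expr() and enexpr() inside a function. It assumes rlang is attached, and the function names are made up for illustration. It then captures the same arguments once through a function with enexprs() and once interactively with exprs().

```r
library(rlang)

# expr() always returns what was typed *here*, so inside a function
# it only ever sees the argument name
quote_here <- function(x) expr(x)
# enexpr() looks through the promise to what the caller typed
quote_caller <- function(x) enexpr(x)

quote_here(a + b + c)    # the symbol x
quote_caller(a + b + c)  # the call a + b + c

# Plural forms: enexprs() captures the caller's arguments...
capture_dots <- function(...) enexprs(...)
from_user <- capture_dots(x = 1, y = 10 * z)
# ...and exprs() captures expressions typed interactively
typed <- exprs(x = 1, y = 10 * z)
```

Both plural forms give the same named list here. The only difference between enexprs() and exprs() is where the expressions come from.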

Note that exprs() can return missing arguments:

```r
val <- exprs(x = )
```

There are two other important base quoting functions that we’ll cover elsewhere:

• bquote() provides a limited form of quasiquotation, and is discussed in unquoting with base R.

• ~, the formula, is a quoting function that also captures the environment. It’s the inspiration for quosures, the topic of the next chapter, and is discussed in [formulas].

### 20.3.3 Exercises

1. What happens if you try to use enexpr() with an expression? What happens if you try to use enexpr() with a missing argument?

2. Compare and contrast the following two functions. Can you predict the output before running them?

```r
f1 <- function(x, y) {
  exprs(x = x, y = y)
}
f2 <- function(x, y) {
  enexprs(x = x, y = y)
}
f1(a + b, c + d)
#> $x
#> x
#>
#> $y
#> y
f2(a + b, c + d)
#> $x
#> a + b
#>
#> $y
#> c + d
```

3. How are exprs(a) and exprs(a = ) different? Think about both the input and the output.

4. What does the following command return? What information is lost? Why?

```r
expr({
  x + y # comment
})
```

5. The documentation for substitute() says:

> Substitution takes place by examining each component of the parse tree as follows: If it is not a bound symbol in env, it is unchanged. If it is a promise object, i.e., a formal argument to a function or explicitly created using delayedAssign(), the expression slot of the promise replaces the symbol. If it is an ordinary variable, its value is substituted, unless env is .GlobalEnv in which case the symbol is left unchanged.

Create four examples that illustrate each of the different cases.

## 20.4 Evaluation

Typically you have quoted a function argument for one of two reasons:

• You want to operate on the AST using the techniques described in the previous chapter.

• You want to run, or evaluate, the code in a special context, as described in depth in the next chapter.

Evaluation is a rich topic, so we’ll cover it in depth in the next chapter. Here I’ll just illustrate the most important ideas.

The most important base R function is base::eval(). Its first argument is the expression to evaluate:

```r
ru5 <- expr(runif(5))
ru5
#> runif(5)

eval(ru5)
#> [1] 0.0808 0.8343 0.6008 0.1572 0.0074
eval(ru5)
#> [1] 0.466 0.498 0.290 0.733 0.773
```

Note that every time we evaluate this expression we get a different result.

The second argument to eval() is the environment in which the expression is evaluated. Manipulating this environment gives us amazing power to control the execution of R code. This is the basic technique that gives dbplyr the ability to turn R code into SQL.
```r
x <- 9
fx <- expr(f(x))

eval(fx, env(f = function(x) x * 10))
#> [1] 90
eval(fx, env(f = function(x) x ^ 2))
#> [1] 81
```

## 20.5 Unquotation

Evaluation is a developer tool: in combination with quoting, it allows the author of a function to capture an argument and evaluate it in a special way. Unquoting is related to evaluation, but it’s a user tool: it allows the person calling the function to selectively evaluate parts of the expression that would otherwise be quoted.

### 20.5.1 With rlang

All quoting functions in rlang (expr(), enexpr(), and friends) support unquoting with !! (called “unquote”, and pronounced bang-bang) and !!! (called “unquote-splice”, and pronounced bang-bang-bang). They both replace nodes in the AST.

!! is a one-to-one replacement. It takes a single expression and inlines the AST at the location of the !!.

```r
x <- expr(a + b + c)
expr(f(!!x, y))
#> f(a + b + c, y)
```

!!! is a one-to-many replacement. It takes a list of expressions and inserts them at the location of the !!!:

```r
x <- exprs(1, 2, 3, y = 10)
expr(f(!!!x, z = z))
#> f(1, 2, 3, y = 10, z = z)
```

### 20.5.2 The polite fiction of !!

So far we have acted as if !! and !!! are regular prefix operators like +, -, and !. They’re not. Instead, from R’s perspective, !! and !!! are simply the repeated application of !:

```r
!!TRUE
#> [1] TRUE
!!!TRUE
#> [1] FALSE
```

!! and !!! have special behaviour inside all quoting functions powered by rlang, and the unquoting operators are given precedence similar to + and -, not !. We do this because the operator precedence for ! is surprisingly low: it has lower precedence than that of the binary arithmetic and comparison operators. Most of the time this doesn’t matter as it is unusual to mix ! and binary operators (e.g. you typically would not write !x + y or !x > y). However, expressions like !!x + !!y are not uncommon when unquoting, and requiring explicit parentheses, (!!x) + (!!y), feels onerous. For this reason, rlang manipulates the AST to give the unquoting operators a higher, more natural, precedence.

You might wonder why rlang does not use a regular function call. Indeed, early versions of rlang provided UQ() and UQS() as alternatives to !! and !!!. However, these looked like regular function calls, rather than special syntactic operators, and evoked a misleading mental model, which made them harder to use correctly. In particular, function calls only happen (lazily) at evaluation time; unquoting always happens at quotation time. We adopted !! and !!! as the best compromise: they are strong visual symbols, don’t look like existing syntax, and take over a rarely used piece of syntax. (And if for some reason you do need to doubly negate a value in a quasiquoting function, you can just add parentheses: !(!x).)

One place where the illusion currently breaks down is base::deparse():

```r
x <- quote(!!x + !!y)
deparse(x)
#> [1] "!(!x + (!(!y)))"
```

Although the R parser can distinguish between !(x) and !x, the deparser currently does not. You are most likely to see this when printing the source for a function in another package, where the source references have been lost. rlang::expr_deparse() works around this problem if you need to manually deparse an expression, but often this does not help because the deparsing occurs outside of your control, as during debugging.

```r
expr_deparse(x)
#> [1] "!!x + (!!y)"
```

Hopefully this will be resolved in a future version of R, but for now, you’ll need to watch out for this problem.

### 20.5.3 With base R

Base R has one function that implements quasiquotation: bquote(). It uses .() for unquoting:

```r
xyz <- bquote((x + y + z))
bquote(-.(xyz) / 2)
#> -(x + y + z)/2
```

bquote() is a neat function, but is not used by any other function in base R. Instead, functions that quote an argument use some other technique to allow indirect specification. There are four basic forms seen in base R:

• A pair of quoting and non-quoting functions.
For example, $ has two arguments, and the second argument is quoted. This is easier to see if you write it in prefix form: mtcars$cyl is equivalent to `` `$`(mtcars, cyl) ``. If you want to refer to a variable indirectly, you use [[, as it takes the name of a variable as a string.

```r
x <- list(var = 1, y = 2)
var <- "y"

x$var
#> [1] 1
x[[var]]
#> [1] 2
```

Note the name of the first argument: .data. This is a standard convention throughout the tidyverse because you don’t need to explicitly name this argument (because it’s always used), and it avoids potential clashes with argument names in ....

### 20.6.3 Slicing an array

One occasionally useful tool that’s missing from base R is the ability to extract a slice of an array given a dimension and an index. For example, we’d like to write slice(x, 2, 1) to extract the first slice along the second dimension, which you can write as x[, 1, ]. We’ll need to generate a call with multiple missing arguments. Fortunately, that is easy with rep() and missing_arg(). Once we have those arguments, we can unquote-splice them into a call:

```r
indices <- rep(list(missing_arg()), 3)
expr(x[!!!indices])
#> x[, , ]
```

We then wrap this into a function, using subset-assignment to insert the index in the desired position:

```r
slice <- function(x, along, index) {
  stopifnot(length(index) == 1)

  nd <- length(dim(x))
  indices <- rep(list(missing_arg()), nd)
  indices[along] <- index

  expr(x[!!!indices])
}

x <- array(sample(30), c(5, 2, 3))
slice(x, 1, 3)
#> x[3, , ]
slice(x, 2, 2)
#> x[, 2, ]
slice(x, 3, 1)
#> x[, , 1]
```

A real slice() would evaluate the generated call, but here I think it’s more illuminating to see the code that’s generated, as that’s the hard part of the challenge.

### 20.6.4 Creating functions

Another powerful function to use in combination with unquoting is rlang::new_function(): it allows us to create a function by supplying the arguments, the body, and (optionally) the environment:

```r
new_function(
  exprs(x = , y = ),
  expr({x + y})
)
#> function (x, y)
#> {
#>     x + y
#> }
```

One application is to create functions that work like graphics::curve(). curve() allows you to plot a mathematical expression, without creating a function:

```r
curve(sin(exp(4 * x)), n = 1000)
```

Here x is a pronoun. As with . in pipelines and .x and .y in purrr functionals, x doesn’t represent a single concrete value, but is instead a placeholder that varies over the range of the plot. Functions, like curve(), that use an expression containing a pronoun are known as anaphoric functions².

One way to implement curve() is to turn the expression into a function with a single argument, then call that function:

```r
curve2 <- function(expr, xlim = c(0, 1), n = 100) {
  expr <- enexpr(expr)
  f <- new_function(exprs(x = ), expr)

  x <- seq(xlim[1], xlim[2], length = n)
  y <- f(x)

  plot(x, y, type = "l", ylab = expr_text(expr))
}
curve2(sin(exp(4 * x)), n = 1000)
```

Another use for new_function() is as an alternative to simple function factories and function operators. The primary advantage is that the generated functions have readable source code:

```r
negate1 <- function(f) {
  force(f)
  function(...) !f(...)
}
negate1(is.null)
#> function(...) !f(...)
#> <environment: 0x4a3f2a0>

negate2 <- function(f) {
  f <- enexpr(f)
  new_function(exprs(... = ), expr(!(!!f)(...)), caller_env())
}
negate2(is.null)
#> function (...)
#> !is.null(...)
```

Note that this approach works best when the arguments to the higher-order function are expressions, such as a function name: inlining more complex objects into the AST can yield confusing source code.

### 20.6.5 Exercises

1. Implement arrange_desc(), a variant of dplyr::arrange() that sorts in descending order by default.

2. Implement filter_or(), a variant of dplyr::filter() that combines multiple arguments using | instead of &.

3. Implement partition_rows() which, like partition_cols(), returns two data frames, one containing the selected rows, and the other containing the rows that weren’t selected.

4. Add error handling to slice(). Give clear error messages if either along or index have invalid values (i.e. not numeric, not length 1, too small, or too big).

5. Re-implement the Box-Cox transform defined below using unquoting and new_function():

```r
bc <- function(lambda) {
  if (lambda == 0) {
    function(x) log(x)
  } else {
    function(x) (x ^ lambda - 1) / lambda
  }
}
```

6. Re-implement the simple compose() defined below using quasiquotation and new_function():

```r
compose <- function(f, g) {
  function(...) f(g(...))
}
```

## 20.7 Dot-dot-dot (...)

Quasiquotation ensures that every quoted argument has an escape hatch that allows the user to unquote, or evaluate, selected components, if needed. A similar and related need arises with functions that take arbitrary additional arguments with .... Take the following two motivating problems:

• What do you do if the elements you want to put in ... are already stored in a list? For example, imagine you have a list of data frames that you want to rbind() together:

```r
dfs <- list(
  a = data.frame(x = 1, y = 2),
  b = data.frame(x = 3, y = 4)
)
```

You could solve this specific case with rbind(dfs$a, dfs$b), but how do you generalise that solution to a list of arbitrary length?

• What do you do if you want to supply the argument name indirectly? For example, imagine you want to create a single column data frame where the name of the column is specified in a variable:

```r
var <- "x"
val <- c(4, 3, 9)
```

In this case, you could create a data frame and then change names (i.e. setNames(data.frame(val), var)), but this feels inelegant. How can we do better?

### 20.7.1 do.call()

Base R provides a swiss-army knife to solve these problems: do.call(). do.call() has two main arguments. The first argument, what, gives a function to call. The second argument, args, is a list of arguments to pass to that function, and so do.call("f", list(x, y, z)) is equivalent to f(x, y, z).

• do.call() gives a straightforward solution to rbind()ing together many data frames:

```r
do.call("rbind", dfs)
#>   x y
#> a 1 2
#> b 3 4
```

• With a little more work, we can use do.call() to solve the second problem. We first create a list of arguments, then name that, then use do.call():

```r
args <- list(val)
names(args) <- var

do.call("data.frame", args)
#>   x
#> 1 4
#> 2 3
#> 3 9
```
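The equivalence between do.call("f", list(x, y, z)) and f(x, y, z) also covers named list elements, which become argument names. A small sketch follows. paste() is used purely as an illustrative target, and the variable names are arbitrary.

```r
# Unnamed elements become positional arguments; named elements become
# named arguments, so this is the same as paste("a", "b", "c", sep = "-")
pieces <- list("a", "b", "c", sep = "-")
joined <- do.call("paste", pieces)

# The indirect-name problem, solved by naming the list before calling
var <- "x"
val <- c(4, 3, 9)
arg_list <- setNames(list(val), var)
df <- do.call("data.frame", arg_list)
```

Naming the list is the only step that handles the indirect name. do.call() itself does nothing special with names beyond passing them on.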

### 20.7.2 The tidyverse approach

The tidyverse solves these problems in a different way to base R, by drawing a parallel with quasiquotation:

• Row-binding multiple data frames is like unquote-splicing: we want to inline individual elements of the list into the call:

```r
dplyr::bind_rows(!!!dfs)
#>   x y
#> 1 1 2
#> 2 3 4
```

When used in this context, the behaviour of !!! is known as splatting in Ruby, Go, PHP, and Julia. It is closely related to *args (star-args) and **kwargs (star-star-kwargs) in Python, which are sometimes called argument unpacking.

• The second problem is like unquoting on the LHS of =: rather than interpreting var literally, we want to use the value stored in the variable called var:

```r
tibble::tibble(!!var := val)
#> # A tibble: 3 x 1
#>       x
#>   <dbl>
#> 1    4.
#> 2    3.
#> 3    9.
```

Note the use of := (pronounced colon-equals) rather than =. Unfortunately we need this new operator because R’s grammar does not allow expressions as argument names:

```r
tibble::tibble(!!var = val)
#> Error: unexpected '=' in "tibble::tibble(!!var ="
```

:= is like a vestigial organ: it’s recognised by R’s parser, but it doesn’t have any code associated with it. It looks like an = but allows expressions on either side, making it a more flexible alternative to =. It is used in data.table for similar reasons.
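The same two operations are not limited to tibble() and bind_rows(). In the rough sketch below, which assumes rlang is attached, rlang::list2() accepts both !!! and !!var := val, so you can inspect exactly what each one produces.

```r
library(rlang)

dfs <- list(
  a = data.frame(x = 1, y = 2),
  b = data.frame(x = 3, y = 4)
)
var <- "x"
val <- c(4, 3, 9)

# !!! splices each element of dfs in as a separate (named) argument
spliced <- list2(!!!dfs)
names(spliced)

# !!var := val uses the *value* of var, "x", as the argument name
named <- list2(!!var := val)
names(named)
```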

### 20.7.3 list2()

Both dplyr::bind_rows() and tibble::tibble() are powered by rlang::list2(...). This function is very similar to list(...), but it understands !!! and !!. If you want to take advantage of this behaviour in your own function, all you need to do is use list2() in your own code. For example, imagine you want to make a version of structure() that understands !!! and !!. We’ll call it set_attr():

```r
library(magrittr) # for %>%

set_attr <- function(.x, ...) {
  attr <- rlang::list2(...)
  attributes(.x) <- attr
  .x
}

attrs <- list(x = 1, y = 2)
attr_name <- "z"

1:10 %>%
  set_attr(w = 0, !!!attrs, !!attr_name := 3) %>%
  str()
#>  atomic [1:10] 1 2 3 4 5 6 7 8 9 10
#>  - attr(*, "w")= num 0
#>  - attr(*, "x")= num 1
#>  - attr(*, "y")= num 2
#>  - attr(*, "z")= num 3
```

(rlang also provides a set_attr() function with a few extra conveniences, but the essence is the same.)

Note that we call the first argument .x: whenever you use ... to take arbitrary data, it’s good practice to give the other argument names a . prefix. This eliminates any ambiguity about who owns the argument, and in this case makes it possible to set the x attribute.
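To make the naming point concrete, the sketch below repeats the set_attr() definition from above. It loads rlang so that the block is self-contained. The function then sets an attribute that is literally called x. Because the data argument is .x, the name x = 10 can only be matched by ..., so it becomes an attribute.

```r
library(rlang)

set_attr <- function(.x, ...) {
  attr <- list2(...)
  attributes(.x) <- attr
  .x
}

# `x = 10` cannot be confused with the data argument, which is `.x`
out <- set_attr(1:3, x = 10, !!!list(y = 20))
attributes(out)
```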

list2() provides one other handy feature: by default it will ignore any empty arguments at the end. This is useful in functions like tibble::tibble() because it means that you can easily change the order of variables without worrying about the final comma:

```r
# Can easily move x to first entry:
tibble::tibble(
  y = 1:5,
  z = 3:-1,
  x = 5:1,
)

# Need to remove comma from z and add comma to x
data.frame(
  y = 1:5,
  z = 3:-1,
  x = 5:1
)
```

As well as list2(), rlang also provides lgl(), int(), dbl(), and chr() which create atomic vectors in the same way.
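The trailing-comma behaviour is easy to check directly. In the sketch below, which assumes rlang is attached, list2() silently drops the empty final argument. Base list() treats the same thing as an error, and tryCatch() turns that error into a string so the result can be inspected.

```r
library(rlang)

# The empty argument after `x = 5:1,` is ignored by list2()
with_comma <- list2(y = 1:5, z = 3:-1, x = 5:1, )
names(with_comma)

# Base list() refuses an empty argument
base_result <- tryCatch(list(1, 2, ), error = function(e) "error")
```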

### 20.7.4 Application: invoke() and lang()

One useful application of list2() is invoke():

```r
invoke <- function(.f, ...) {
  do.call(.f, list2(...), envir = parent.frame())
}
```

(At time of writing, both purrr::invoke() and rlang::invoke() have somewhat different definitions because they were written before we understood how quasiquotation syntax and ... intersected.)

As a wrapper around do.call(), invoke() gives powerful ways to call functions with arguments supplied directly (in ...) or indirectly (in a list):

```r
invoke("mean", x = 1:10, na.rm = TRUE)

# Equivalent to
x <- list(x = 1:10, na.rm = TRUE)
invoke("mean", !!!x)
```

It also allows us to specify argument names indirectly:

```r
arg_name <- "na.rm"
arg_val <- TRUE
invoke("mean", 1:10, !!arg_name := arg_val)
```
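The invoke() snippets above can be run end to end as a check. The sketch below restates invoke() as defined earlier and loads rlang so that list2() is available. It then confirms that the direct, spliced, and indirectly named calls agree. In the last call, the input is swapped for a vector containing NA so that the indirectly supplied na.rm actually changes the result.

```r
library(rlang)

invoke <- function(.f, ...) {
  # list2() processes !!! and := in the forwarded dots
  do.call(.f, list2(...), envir = parent.frame())
}

# Arguments supplied directly
direct <- invoke("mean", x = 1:10, na.rm = TRUE)

# The same arguments spliced in from a list
x <- list(x = 1:10, na.rm = TRUE)
spliced <- invoke("mean", !!!x)

# Argument name supplied indirectly; the NA makes na.rm matter
arg_name <- "na.rm"
arg_val <- TRUE
indirect <- invoke("mean", c(1, NA, 3), !!arg_name := arg_val)
```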

Closely related to invoke() is rlang::call2(). It constructs a call from its components:

```r
call2("mean", 1:10, !!arg_name := arg_val)
#> mean(1:10, na.rm = TRUE)
```

The chief advantage of call2() over expr() is that it can use :=.
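Unlike invoke(), call2() stops before evaluation, so you can inspect the call it builds and evaluate it later. The minimal sketch below assumes rlang is attached and reuses the arg_name and arg_val values from above.

```r
library(rlang)

arg_name <- "na.rm"
arg_val <- TRUE

# Build, but do not run, the call; := lets the name come from a variable
cl <- call2("mean", c(1, NA, 3), !!arg_name := arg_val)
cl

# Evaluating the constructed call runs mean() as if it had been typed
result <- eval(cl)
```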

### 20.7.5 Other approaches

Apart from rlang::list2() there are several other techniques used to overcome the motivating challenges described above. One technique is to take ... and a single unnamed argument that is a list, making f(list(x, y, z)) equivalent to f(x, y, z). The implementation looks something like this:

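```r
f <- function(...) {
  dots <- list(...)
  # A single list argument stands in for the whole of ...
  if (length(dots) == 1 && is.list(dots[[1]])) {
    dots <- dots[[1]]
  }

  # Do something with dots; here we simply return them
  dots
}
```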

Base functions that use this technique include interaction(), expand.grid(), options(), and par(). Since these functions take either a list or ..., but not both, they are slightly less flexible than functions powered by list2().
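expand.grid() is one of the base functions listed above, so it offers a quick way to see the technique from the caller’s side. The sketch below passes the same two vectors once through ... and once as a single list. The column names and values are arbitrary.

```r
# Two vectors passed directly through ...
from_dots <- expand.grid(a = 1:2, b = c("x", "y", "z"))

# The same vectors wrapped in a single unnamed list
from_list <- expand.grid(list(a = 1:2, b = c("x", "y", "z")))

# Both forms give all 2 * 3 combinations
nrow(from_dots)
nrow(from_list)
```

The list is only unwrapped when it is the sole argument, so the two styles cannot be mixed in a single call.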

Another related technique is used by the RCurl::getURL() function written by Duncan Temple Lang. getURL() takes both ... and .opts, which are concatenated together. This is useful when writing functions to call web APIs because you often have some options that need to be passed to every request. You put these in a common list and pass it to .opts, saving ... for the options unique to a given call.
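The ... plus .opts pattern can be sketched in a few lines. The function build_request() and the option names below are hypothetical. They were chosen only to illustrate the idea and are not RCurl’s API. Shared options live in one list passed to .opts, and per-call options go through ....

```r
# Hypothetical helper illustrating the `...` + `.opts` pattern
build_request <- function(url, ..., .opts = list()) {
  # Concatenate the shared options with the ones unique to this call
  opts <- c(.opts, list(...))
  list(url = url, opts = opts)
}

common <- list(timeout = 10, verbose = FALSE)
req <- build_request("https://example.com", useragent = "demo", .opts = common)
names(req$opts)
```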

I found this technique particularly compelling, so you can see it used throughout the tidyverse. Now, however, rlang::list2() solves more problems, more elegantly, by using the ideas from tidy eval. The tidyverse is slowly migrating to the list2() style for all functions that take ....

### 20.7.6 Exercises

1. Carefully read the source code for interaction(), expand.grid(), and par(). Compare and contrast the techniques they use for switching between dots and list behaviour.

2. Explain the problem with this definition of set_attr():

```r
set_attr <- function(x, ...) {
  attr <- rlang::list2(...)
  attributes(x) <- attr
  x
}
set_attr(1:10, x = 10)
#> Error in attributes(x) <- attr: attributes must be named
```

### References

Lumley, Thomas. 2001. “Programmer’s Niche: Macros in R.” R News 1 (3):11–13. https://www.r-project.org/doc/Rnews/Rnews_2001-3.pdf.

Bawden, Alan. 1999. “Quasiquotation in Lisp.” In PEPM ’99, 4–12. http://repository.readscheme.org/ftp/papers/pepm99/bawden.pdf.

1. You might be familiar with the name Quine from “quines”, computer programs that when run return a copy of their own source code.

2. Anaphoric comes from the linguistics term “anaphora”, an expression that is context dependent. Anaphoric functions are found in Arc (a LISP-like language), Perl, and Clojure.