17 Quasiquotation

17.1 Introduction

Now that you understand the tree structure of R code, it’s time to come back to one of the fundamental ideas that make expr() and ast() work: quasiquotation. Quasiquotation is made up of two parts:

  • Quotation is the act of capturing an unevaluated expression.

  • Unquotation is the ability to selectively evaluate parts of an otherwise quoted expression.

The combination of these two ideas makes it easy to create functions that combine code written by the function author with code written by the function user, and helps to solve a wide variety of challenging problems.

Quasiquotation is one of the three components of tidy evaluation. You’ll learn about the other two components (quosures and the data mask) in Chapter ??. By itself, quasiquotation is mostly useful for programming, particularly for generating code; when combined with the other techniques, tidy evaluation is a powerful tool for data analysis.

Outline

  • Section 17.2 motivates the development of quasiquotation with a function, cement(), that works like paste() but automatically “quotes” its arguments so that you don’t have to.

  • Section 17.3 gives you the tools to quote expressions, whether they come from you or the user, or whether you use rlang or base R tools.

  • Section 17.4 introduces the biggest difference between rlang quoting functions and base quoting functions: unquoting with !! and !!!.

  • Section 17.5 discusses the main “non-quoting” techniques that base R functions use to disable quoting behaviour.

  • Section 17.6 explores another place that you can use !!!, functions that take .... It also introduces the special := operator, which allows you to dynamically change argument names.

  • Section 17.7 shows a few practical uses of quoting to solve problems that naturally require some code generation.

Prerequisites

Make sure you’ve read the metaprogramming overview in Chapter ?? to get a broad overview of the motivation and the basic vocabulary, and that you’re familiar with the tree structure of expressions as described in Section 16.

Code-wise, we’ll mostly be using the tools from rlang, but at the end of the chapter you’ll also see some powerful applications in conjunction with purrr.

17.2 Motivation

We’ll start with a simple and concrete example that helps motivate the need for unquoting, and hence quasiquotation. Imagine you’re creating a lot of strings by joining together words:

You are sick and tired of writing all those quotes, and instead you just want to use bare words. To that end, you’ve written the following function:

(You’ll learn what ensyms() does shortly; for now just look at the results.)

Formally, this function quotes the arguments in .... You can think of it as automatically putting quotation marks around each argument. That’s not precisely true as the intermediate objects it generates are expressions, not strings, but it’s a useful approximation, and it is the root meaning of the term “quote”.

This function is nice because we no longer need to type quotation marks. The problem, however, comes when we want to use variables. It’s easy to use variables with paste(): just don’t surround them with quotation marks.

Obviously this doesn’t work with cement() because every input is automatically quoted:

We need some way to explicitly unquote the input, to tell cement() to remove the automatic quote marks. Here we need time and name to be treated differently to Good. Quasiquotation gives us a standard tool to do so: !!, called “unquote”, and pronounced bang-bang. !! tells a quoting function to drop the implicit quotes:

It’s useful to compare cement() and paste() directly. paste() evaluates its arguments, so we need to quote where needed; cement() quotes its arguments, so we need to unquote where needed.
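The comparison can be sketched as follows. This is a minimal illustration that assumes rlang is installed; the cement() shown here is one plausible implementation built on ensyms(), and the values of name and time are made up for the example.

```r
library(rlang)

# One possible cement(): capture every argument in ... as a symbol,
# convert each symbol to a string, then paste the strings together.
cement <- function(...) {
  args <- ensyms(...)
  paste(vapply(args, as_string, character(1)), collapse = " ")
}

name <- "Hadley"
time <- "morning"

# paste() evaluates its arguments, so literal words need quotes
paste("Good", time, name)

# cement() quotes its arguments, so variable names are taken literally ...
cement(Good, time, name)

# ... unless they are explicitly unquoted with !!
cement(Good, !!time, !!name)
```

The last two calls differ only in the !!, which is what tells cement() to use the value of a variable rather than its name.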

17.2.1 Vocabulary

The distinction between quoted and evaluated arguments is important:

  • An evaluated argument obeys R’s usual evaluation rules.

  • A quoted argument is captured by the function, and is processed in some custom way rather than by R’s usual evaluation rules.

paste() evaluates all its arguments; cement() quotes all its arguments.

If you’re ever unsure about whether an argument is quoted or evaluated, try executing the code outside of the function. If it doesn’t work (or does something profoundly different), then that argument is quoted. For example, you can use this technique to determine that the first argument to library() is quoted:
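A small sketch of that test, using the stats package (which ships with R) as a stand-in for any package name. The tryCatch() wrapper is only there so the failure can be inspected instead of stopping the script.

```r
# Inside library(), the bare name works because the argument is quoted
library(stats)

# Outside library(), the same bare name is evaluated as an ordinary
# variable, and no such variable exists
res <- tryCatch(stats, error = function(e) conditionMessage(e))
res
```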

Talking about whether an argument is quoted or evaluated is a more precise way of stating whether or not a function uses non-standard evaluation (NSE). I will sometimes use “quoting function” as short-hand for a “function that quotes one or more arguments”, but generally, I’ll refer to quoted arguments since that is the level at which the difference applies.

17.2.2 History

The idea of quasiquotation is an old one. It was first developed by a philosopher, Willard van Orman Quine[1], in the early 1940s. It’s needed in philosophy[2] because it helps to precisely distinguish the use and the mention of words, i.e. to distinguish between an object and the words we use to refer to that object.

Quasiquotation was first used in a programming language, LISP, in the mid-1970s (Bawden 1999). LISP has one quoting function `, and uses , for unquoting. Most languages with a LISP heritage behave similarly. For example, Racket (` and ,), Clojure (` and ~), and Julia (: and $) all have quasiquotation tools that differ only slightly from LISP. These languages have a single quoting function and you must call it explicitly.

In R, however, many functions quote one or more inputs. This introduces ambiguity (because you need to read the documentation to determine if an argument is quoted or not), but allows for concise and elegant data exploration code. In base R, only one function supports quasiquotation: bquote(), written in 2003 by Thomas Lumley. But bquote() is not used anywhere in base R, and has had relatively little impact on how R code is written. There are three challenges to effective use of bquote():

  • It is only easily used with your own code; it is hard to apply it to arbitrary code supplied by a user.

  • It does not provide an unquote-splice operator that allows you to unquote multiple expressions stored in a list.

  • It lacks the ability to handle code accompanied by an environment, which is crucial for functions that evaluate code in the context of a data frame, like subset() and friends.

Figuring out how to fix the first and second challenges led to my lazyeval package (2014-2015). Identifying and remedying the third challenge (the topic of Chapter 18) led to the tidy evaluation framework taught in this book and implemented in the rlang package by Lionel Henry during 2017. Despite the newness of tidy eval, I teach it here because it is a rich and powerful theory that, once you master it, makes many hard problems much easier.

17.2.3 Exercises

  1. For each function in the following base R code, identify which arguments are quoted and which are evaluated.

  2. For each function in the following tidyverse code, identify which arguments are quoted and which are evaluated.

17.3 Quoting

The first part of quasiquotation is quotation: capturing an expression without evaluating it. There are two components to this: capturing an expression supplied directly, and capturing an expression supplied indirectly in a lazily-evaluated function argument. We’ll discuss two sets of tools for these two ways of capturing: those provided by rlang, and those provided by base R.

17.3.1 Capturing expressions

There are four important quoting functions. For interactive exploration, the most important quoting function is expr(). It captures its argument exactly as provided:

(Remember that white space and comments are not part of the expression, so will not be captured by a quoting function.)

expr() is great for interactive exploration, because it captures what you, the developer, typed. It’s not so useful inside a function:

We need another function to solve this problem: enexpr(). This captures what the caller supplied to the function by looking at the internal promise object that powers lazy evaluation (Section 5.5.2).

To capture multiple arguments (e.g. all arguments in ...), use enexprs().

Finally, exprs() is useful interactively to make a list of expressions:

In short, use enexpr() and enexprs() to capture the expressions supplied as arguments by the user. Use expr() and exprs() to capture expressions that you supply.
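The four functions can be compared side by side. This is a minimal sketch assuming rlang is installed; the helper names f1(), f2(), and f3() are made up for illustration.

```r
library(rlang)

# expr() captures exactly what the developer typed
expr(x + y)

# Inside a function, expr() still captures what the developer typed: x
f1 <- function(x) expr(x)
f1(a + b + c)

# enexpr() captures what the caller supplied
f2 <- function(x) enexpr(x)
f2(a + b + c)

# enexprs() captures everything supplied in ...
f3 <- function(...) enexprs(...)
f3(x = 1, y = 10 * z)

# exprs() builds a list of expressions interactively
exprs(x = x^2, y = y^3, z = z^4)
```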

17.3.2 Capturing symbols

Sometimes you only want to allow the user to specify a variable name, not an arbitrary expression. In this case, you can use ensym() or ensyms(). These are variants of enexpr() and enexprs() that check that the captured expression is either a symbol or a string (which is converted to a symbol[3]).

ensym() and ensyms() throw an error if given anything else.
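A short sketch of this behaviour, assuming rlang is installed; the function name f() is arbitrary.

```r
library(rlang)

f <- function(x) ensym(x)

f(x)     # a bare name is captured as a symbol
f("x")   # a string is converted to the same symbol

# Anything more complex than a name or a string is rejected
res <- tryCatch(f(x + 1), error = function(e) e)
inherits(res, "error")
```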

17.3.3 With base R

The base equivalent of expr() is quote():

It is identical to expr() except that it does not support unquoting (which we’ll talk about very soon). This makes it a quoting function, not a quasiquoting function.

The base function closest to enexpr() is substitute():

The base equivalent to exprs() is alist():

The base equivalent to enexprs() is an undocumented feature of substitute()[4]: you can pretend ... is a function to capture all arguments in ...:

There are two other important base quoting functions that we’ll cover elsewhere:

  • bquote() provides a limited form of quasiquotation, and is discussed in Section 17.5.

  • ~, the formula, is a quoting function that also captures the environment. It’s the inspiration for quosures, the topic of the next chapter, and is discussed in Section 18.3.4.
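The base R quoting tools described in this section can be sketched together. The function names f3() and f4() are made up for illustration, and the substitute(...()) idiom relies on the undocumented behaviour mentioned above, so treat it as an assumption rather than a guaranteed interface.

```r
# quote() captures what the developer typed
quote(x + y)

# substitute() captures what the caller supplied
f3 <- function(x) substitute(x)
f3(x + y)

# alist() builds a list of unevaluated expressions
alist(x = 1, y = x + 2)

# Pretending ... is a function captures every argument in ...
f4 <- function(...) as.list(substitute(...()))
res <- f4(x = 1, y = 10 * z)
res
```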

17.3.4 Summary

When quoting (i.e. capturing code), there are two important distinctions:

  • Is it supplied by the developer of the code or the user of the code? i.e. is it fixed or varying, supplied in the body of the function or via an argument?

  • Do you want to capture a single expression or multiple expressions?

This leads to a 2 x 2 table of functions for rlang and base R:

  • rlang:

              Developer   User
    One       expr()      enexpr()
    Many      exprs()     enexprs()

  • base R:

              Developer   User
    One       quote()     substitute()
    Many      alist()     eval(substitute(alist(...)))

17.3.5 Exercises

  1. What does the following command return? What information is lost? Why?

  2. How is expr() implemented?

  3. Compare and contrast the following two functions. Can you predict the output before running them?

  4. What happens if you try to use enexpr() with an expression (i.e. enexpr(x + y))? What happens if enexpr() is passed a missing argument?

  5. How are exprs(a) and exprs(a = ) different? Think about both the input and the output.

  6. What are other differences between exprs() and alist()? Read the documentation for the named arguments of exprs() to find out.

  7. The documentation for substitute() says:

    Substitution takes place by examining each component of the parse tree as follows:

    • If it is not a bound symbol in env, it is unchanged.
    • If it is a promise object (i.e., a formal argument to a function) the expression slot of the promise replaces the symbol.
    • If it is an ordinary variable, its value is substituted;
    • Unless env is .GlobalEnv in which case the symbol is left unchanged.

    Create four examples that illustrate each of the different cases.

17.4 Unquoting

So far, you’ve only seen relatively small advantages of the rlang quoting functions over the base R quoting functions: they have a more consistent naming scheme. The big difference is that rlang quoting functions are actually quasiquoting functions, because they support unquoting with !! (called “unquote”, and pronounced bang-bang).

Unquoting allows you to selectively execute, or evaluate, parts of the expression that would otherwise be quoted, and it effectively allows you to merge together ASTs using a template AST. Since base functions don’t use unquoting, they instead use a variety of other techniques, which you’ll learn about in Section 17.5.

Unquoting is one inverse of quoting. It allows you to selectively evaluate code inside expr(), so that expr(!!x) is equivalent to x. In Chapter 18, you’ll learn about another inverse, evaluation. Evaluation happens outside expr(), so that eval(expr(x)) is equivalent to x.

17.4.1 Unquoting one argument

Use !! to unquote a single argument in a function call. !! takes a single expression, evaluates it, and inlines the result into the AST.

I think this is easiest to understand with a diagram. !! introduces a placeholder in the AST, shown with dotted borders. Here the placeholder x is replaced by an AST, illustrated by a dotted connection.

As well as call objects, !! also works with symbols and constants:

Note that the right-hand side of !! can be a function call. !! will evaluate the call and insert the results in the AST:

Note that !! preserves operator precedence because it works with expressions.

If we simply pasted the text of the expressions together, we’d end up with x + 1 / x + 2, which has a very different AST:
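The behaviours described in this subsection can be sketched in a few lines. This assumes rlang is installed; mean_rm() is a hypothetical helper written only for this example.

```r
library(rlang)

# Unquote a call object
x <- expr(-1)
expr(f(!!x, y))

# Unquote a symbol and a constant
a <- sym("y")
b <- 1
expr(f(!!a, !!b))

# The right-hand side of !! can itself be a function call
mean_rm <- function(var) {
  var <- ensym(var)
  expr(mean(!!var, na.rm = TRUE))
}
expr(!!mean_rm(x) + !!mean_rm(y))

# Because !! inserts expressions rather than text, the structure of
# each piece is preserved: this is (x + 1) divided by (x + 2)
x1 <- expr(x + 1)
x2 <- expr(x + 2)
ratio <- expr(!!x1 / !!x2)
ratio
```

Evaluating ratio with x = 1 gives 2/3, which confirms that the numerator and denominator were kept intact rather than being flattened into x + 1 / x + 2.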

17.4.2 Unquoting a function

!! is most commonly used to replace the arguments to a function, but you can also use it to replace the function itself. The only challenge here is operator precedence: expr(!!f(x, y)) unquotes the result of f(x, y), so you need an extra pair of parentheses.

This also works when f is itself a call:

Because of the large number of parentheses involved, it can be clearer to use rlang::call2():
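A minimal sketch of both approaches, assuming rlang is installed; foo and pkg::foo are placeholder names and are never evaluated.

```r
library(rlang)

# Replace the function itself; the extra parentheses make !! apply
# to f alone rather than to the whole call f(x, y)
f <- expr(foo)
expr((!!f)(x, y))

# The same pattern works when f is itself a call
f <- expr(pkg::foo)
with_parens <- expr((!!f)(x, y))
with_parens

# call2() builds the same call without the nested parentheses
with_call2 <- call2(f, expr(x), expr(y))
with_call2
```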

17.4.3 Unquoting a missing argument

Very occasionally it is useful to unquote a missing argument (Section 16.6.2), but the naive approach doesn’t work:

You can work around this with the maybe_missing() helper:
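A sketch of the workaround, assuming rlang is installed. The naive version is wrapped in try() so that its failure does not stop the script.

```r
library(rlang)

arg <- missing_arg()

# Naive approach: evaluating `arg` touches the missing argument
try(expr(foo(!!arg, !!arg)), silent = TRUE)

# maybe_missing() passes the missing argument through safely
res <- expr(foo(!!maybe_missing(arg), !!maybe_missing(arg)))
res
```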

17.4.4 Unquoting in special forms

There are a few special forms where unquoting is a syntax error:

Here you need to use the prefix form:
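For example, $ is one such special form: something like df$!!x cannot be parsed, but the prefix form can. A minimal sketch, assuming rlang is installed; df is a placeholder name that is never evaluated.

```r
library(rlang)

x <- expr(x)

# Write `$` in prefix form so that !! appears in an ordinary argument
res <- expr(`$`(df, !!x))
res
```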

17.4.5 Unquoting many arguments

!! is a one-to-one replacement. !!! (called “unquote-splice”, and pronounced bang-bang-bang) is a one-to-many replacement. It takes a list of expressions and inserts them at the location of the !!!:

!!! can be used in any rlang function that takes ... regardless of whether or not ... is quoted or evaluated. We’ll come back to this in Section 17.6; for now note that this can be useful in call2().
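A short sketch of unquote-splicing, assuming rlang is installed; the names f, a, b, and y are placeholders that are never evaluated.

```r
library(rlang)

xs <- exprs(1, a, -b)
expr(f(!!!xs, y))

# Names on the list become argument names
ys <- set_names(xs, c("a", "b", "c"))
expr(f(!!!ys, d = 4))

# !!! also works in call2(), which evaluates its ... arguments
call2("f", !!!xs, expr(y))
```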

17.4.6 The polite fiction of !!

So far we have acted as if !! and !!! are regular prefix operators like +, -, and !. They’re not. From R’s perspective, !! and !!! are simply the repeated application of !:

!! and !!! behave specially inside all quoting functions powered by rlang, where they behave like real operators with precedence equivalent to unary + and -. This requires considerable work inside rlang, but means that you can write !!x + !!y instead of (!!x) + (!!y).

The biggest downside[5] to using a fake operator is that you might get silent errors when misusing !! outside of quasiquoting functions. Most of the time this is not an issue because !! is typically used to unquote expressions or quosures. Since expressions are not supported by the negation operator, you will get an argument type error in this case:

But you can get silently incorrect results when working with numeric values:
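The following base R sketch shows what !! and !!! mean outside a quasiquoting function; no packages are needed, and the object names are made up for the example.

```r
# Outside quasiquotation, !! and !!! are just repeated negation
!!TRUE
!!!TRUE

# Negating a symbol is an error, so this misuse is caught
x <- quote(variable)
res <- tryCatch(!!x, error = function(e) e)
inherits(res, "error")

# Negating numeric data "works", silently producing logical values
df <- data.frame(x = 1:5)
!!df
```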

Given these drawbacks, you might wonder why we introduced new syntax instead of using regular function calls. Indeed, early versions of tidy eval used function calls like UQ() and UQS(). However, they’re not really function calls, and pretending they are leads to a misleading mental model. We chose !! and !!! as the least-worst solution:

  • They are visually strong and don’t look like existing syntax. When you see !!x or !!!x it’s clear that something unusual is happening.

  • They override a rarely used piece of syntax, as double negation is not a common pattern in R[6]. If you do need it, you can just add parentheses: !(!x).

17.4.7 Non-standard ASTs

With unquoting, it’s easy to create non-standard ASTs, i.e. ASTs that contain components that are not expressions. (It is also possible to create non-standard ASTs by directly manipulating the underlying objects, but it’s harder to do so accidentally.) These are valid, and occasionally useful, but their correct use is beyond the scope of this book. It’s important to learn about them, however, because they can be deparsed, and hence printed, in misleading ways.

For example, if you inline more complex objects, their attributes are not printed. This can lead to confusing output:

You have two main tools to reduce this confusion: rlang::expr_print() and lobstr::ast():

Another confusing case arises if you inline an integer sequence:

It’s also possible to create regular ASTs that can not be generated from code because of operator precedence. In this case, R will print parentheses that do not exist in the AST:
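The three cases above can be sketched as follows, assuming rlang is installed. The printed forms are not asserted here because they depend on the deparser; the comments describe only the structure of each object.

```r
library(rlang)

# Inlining a data frame: the AST contains the object itself,
# not code that would create it
x1 <- expr(class(!!data.frame(x = 10)))
eval(x1)
expr_print(x1)

# Inlining an integer vector: the AST holds the vector, not a call to `:`
x2 <- expr(f(!!c(1L, 2L, 3L, 4L, 5L)))
is.call(x2[[2]])

# A regular AST that no source code produces directly: the right-hand
# side is a `+` call, with no `(` call wrapped around it
x3 <- expr(1 + !!expr(2 + 3))
x3[[3]][[1]]
```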

17.4.8 Exercises

  1. Given the following components:

    Use quasiquotation to construct the following calls:

  2. Explain why both !0 + !0 and !1 + !1 return FALSE while !0 + !1 returns TRUE.

  3. Base functions match.fun(), page(), and ls() all try to automatically determine whether you want standard or non-standard evaluation. Each uses a different approach. Figure out the essence of each approach by reading the source code, then compare and contrast the techniques.

  4. The following two calls print the same, but are actually different:

    What’s the difference? Which one is more natural?

17.5 Non-quoting in base R

Base R has one function that implements quasiquotation: bquote(). It uses .() for unquoting:
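A minimal base R sketch of bquote(); the variable names are made up for the example.

```r
# .() marks the part of the expression to evaluate and inline
xyz <- bquote((x + y + z))
bquote(-.(xyz) / 2)

n <- 2
res <- bquote(y + .(n))
res
```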

However, bquote() isn’t used in any other function in base R. Instead, functions that quote an argument use some other technique to allow indirect specification. Rather than using unquoting, all base R approaches selectively turn quoting off, so I call them non-quoting techniques.

There are four basic forms seen in base R:

  • A pair of quoting and non-quoting functions. For example, $ has two arguments, and the second argument is quoted. This is easier to see if you write in prefix form: mtcars$cyl is equivalent to `$`(mtcars, cyl). If you want to refer to a variable indirectly, you use [[, as it takes the name of a variable as a string.

    There are three other quoting functions closely related to $: subset(), transform(), and with(). These are seen as wrappers around $ only suitable for interactive use so they all have the same non-quoting alternative: [.

    <-/assign() and ::/getExportedValue() work similarly to $/[.

  • A pair of quoting and non-quoting arguments. For example, rm() allows you to provide bare variable names in ..., or a character vector of variable names in list:

    data() and save() work similarly.

  • An argument that controls whether a different argument is quoting or non-quoting. For example, in library(), the character.only argument controls the quoting behaviour of the first argument, package:

    demo(), detach(), example(), and require() work similarly.

  • Quoting if evaluation fails. For example, the first argument to help() is non-quoting if it evaluates to a string; if evaluation fails, the first argument is quoted.

    ls(), page(), and match.fun() work similarly.
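The first three forms can be sketched with base R alone. The object names are made up for the example, and a scratch environment is used so that rm() does not touch your workspace.

```r
# 1. A pair of quoting and non-quoting functions: $ versus [[
x <- list(var = 1, y = 2)
var <- "y"
x$var      # quoted: looks for an element literally named "var"
x[[var]]   # evaluated: uses the value of var, i.e. "y"

# 2. A pair of quoting and non-quoting arguments: rm(...) versus rm(list = )
env <- new.env()
assign("a", 1, envir = env)
assign("b", 2, envir = env)
rm(a, envir = env)             # bare name, quoted
vars <- "b"
rm(list = vars, envir = env)   # character vector, evaluated
ls(env)

# 3. An argument that switches quoting off: character.only
pkg <- "stats"
library(pkg, character.only = TRUE)
```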

Another important class of quoting functions are the base modelling and plotting functions, which follow the so-called standard non-standard evaluation rules: http://developer.r-project.org/nonstandard-eval.pdf. For example, lm() quotes the weights and subset arguments, and when used with a formula argument, the plotting function quotes the aesthetic arguments (col, cex, etc):

These functions have no built-in options for indirect specification, but you’ll learn how to simulate unquoting in Section 18.6.3.

17.6 Dot-dot-dot (...)

!!! is useful because it’s not uncommon to have a list of expressions that you want to insert into a call. It turns out that this pattern is common elsewhere. Take the following two motivating problems:

  • What do you do if the elements you want to put in ... are already stored in a list? For example, imagine you have a list of data frames that you want to rbind() together:

    You could solve this specific case with rbind(dfs$a, dfs$b), but how do you generalise that solution to a list of arbitrary length?

  • What do you do if you want to supply the argument name indirectly? For example, imagine you want to create a single column data frame where the name of the column is specified in a variable:

    In this case, you could create a data frame and then change names (i.e. setNames(data.frame(val), var)), but this feels inelegant. How can we do better?

One way to think about these problems is to draw explicit parallels to quasiquotation:
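A hedged sketch of those parallels, using rlang::list2() as a stand-in for any function whose ... supports these tools. The wrapper my_list() is hypothetical, and rlang is assumed to be installed.

```r
library(rlang)

# A hypothetical function whose ... supports !!! and :=
my_list <- function(...) list2(...)

# Supplying elements that are already in a list is like unquote-splicing
parts <- list(a = 1, b = 2)
my_list(!!!parts, c = 3)

# Supplying an argument name indirectly is like unquoting the
# left-hand side of =; R's grammar does not allow an expression
# there, so := is used instead
var <- "x"
val <- c(4, 3, 9)
my_list(!!var := val)
```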

Base R takes a different approach, which we’ll come back to in Section 17.6.4.

We say functions that support these tools, without quoting arguments, have tidy dots[7]. To gain tidy dots behaviour in your own function, all you need to do is use list2().

17.6.2 exec()

What if you want to use this technique with a function that doesn’t have tidy dots? One option is to use rlang::exec() which makes it easy to call functions with some arguments supplied directly (in ...) and others indirectly (in a list):

And also makes it possible to supply argument names indirectly:

And finally, it’s useful if you have a vector of function names or a list of functions that you want to call with the same arguments:
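The three uses of exec() can be sketched as follows, assuming rlang is installed. base vapply() is used here in place of a purrr helper to keep the example dependency-free; that substitution is a choice made for this sketch.

```r
library(rlang)

# Some arguments directly, others from a list
params <- list(na.rm = TRUE, trim = 0.1)
exec("mean", x = 1:10, !!!params)

# Argument name supplied indirectly
arg_name <- "na.rm"
arg_val <- TRUE
exec("mean", c(1, NA, 3), !!arg_name := arg_val)

# Several functions, named as strings, called with the same arguments
x <- c(1, 2, 3, NA)
funs <- c("mean", "median", "sd")
res <- vapply(funs, exec, numeric(1), x, na.rm = TRUE)
res
```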

exec() is closely related to call2(); where call2() returns an expression, exec() evaluates it.

17.6.3 dots_list()

list2() provides one other handy feature: by default it will ignore any empty arguments at the end. This is useful in functions like tibble::tibble() because it means that you can easily change the order of variables without worrying about the final comma:

list2() is a wrapper around rlang::dots_list() with defaults set to the most commonly used settings. You can get more control by calling dots_list() directly:
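A short sketch of both features, assuming rlang is installed. The .homonyms argument shown here controls how repeated argument names are handled.

```r
library(rlang)

# list2() ignores an empty argument at the end
list2(a = 1, b = 2, )

# dots_list() exposes more control, e.g. over duplicated names
str(dots_list(x = 1, x = 2))
str(dots_list(x = 1, x = 2, .homonyms = "first"))
str(dots_list(x = 1, x = 2, .homonyms = "last"))

# "error" refuses duplicated names altogether
res <- tryCatch(
  dots_list(x = 1, x = 2, .homonyms = "error"),
  error = function(e) e
)
inherits(res, "error")
```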

17.6.4 With base R

Base R provides a swiss-army knife to solve these problems: do.call(). do.call() has two main arguments. The first argument, what, gives a function to call. The second argument, args, is a list of arguments to pass to that function, and so do.call("f", list(x, y, z)) is equivalent to f(x, y, z).
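Applied to the two motivating problems from the start of this section, do.call() can be sketched like this. The data frames and variable names are made up for the example.

```r
# Row-bind a list of data frames of arbitrary length
dfs <- list(
  a = data.frame(x = 1, y = 2),
  b = data.frame(x = 3, y = 4)
)
bound <- do.call("rbind", dfs)
bound

# Supply an argument name indirectly by naming the list of arguments
var <- "x"
val <- c(4, 3, 9)
args <- list(val)
names(args) <- var
made <- do.call("data.frame", args)
made
```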

Some base functions (including interaction(), expand.grid(), options(), and par()) use a trick to avoid do.call(): if the first component of ... is a list, they’ll take its components instead of looking at the other elements of .... The implementation looks something like this:
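A minimal sketch of that trick. The function combine() is hypothetical, and paste() merely stands in for whatever the real function does with its inputs.

```r
combine <- function(...) {
  dots <- list(...)
  # If the only argument is a list, use its components as the inputs
  if (length(dots) == 1 && is.list(dots[[1]])) {
    dots <- dots[[1]]
  }
  # Placeholder for the real work
  do.call(paste, dots)
}

combine("a", "b")
combine(list("a", "b"))
```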

Another approach to avoiding do.call() is found in the RCurl::getURL() function written by Duncan Temple Lang. getURL() takes both ... and .opts which are concatenated together. This looks something like this:
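A minimal sketch of this pattern. The function combine2() and its .dots argument are hypothetical names chosen for the example, and paste() again stands in for the real work.

```r
combine2 <- function(..., .dots = list()) {
  # Arguments supplied directly and arguments supplied in a list
  # are concatenated into a single list
  dots <- c(list(...), .dots)
  do.call(paste, dots)
}

combine2("a", "b")
combine2("a", .dots = list("b", "c"))
```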

At the time I discovered it, I found this technique particularly compelling so you can see it used throughout the tidyverse. Now, however, I prefer the !!! approach described earlier in this section.

17.6.5 Exercises

  1. One way to implement exec() is shown below. Describe how it works. What are the key ideas?

  2. Carefully read the source code for interaction(), expand.grid(), and par(). Compare and contrast the techniques they use for switching between dots and list behaviour.

  3. Explain the problem with this definition of set_attr():

17.7 Case studies

To make the ideas of quasiquotation concrete, this section contains a few smaller case studies that show how you can use it to solve real problems. Some of the case studies also use purrr: I find the combination of quasiquotation and functional programming to be particularly elegant.

Most uses of quasiquotation will not involve expr() or exprs() but will instead involve a function that calls enexpr() or enexprs(). You should be able to tell if a function uses quasiquotation because the documentation will mention it; consequently, if you use enexpr() in a documented function, make sure to mention quasiquotation.

This technique allows you to write functions that wrap around quasiquotation functions with a simple pattern: quote with enexpr() then unquote with !!:

17.7.1 lobstr::ast()

Quasiquotation allows us to solve an annoying problem with lobstr::ast(): what happens if we’ve already captured the expression?

Because ast() supports quasiquotation, we can use !!:

17.7.2 Map-reduce to generate code

Quasiquotation gives us powerful tools for generating code, particularly when combined with purrr::map() and purrr::reduce(). For example, assume you have a linear model specified by the following coefficients:

And you want to convert it into an expression like 10 + (5 * x1) + (-4 * x2). The first thing we need to do is turn the character names vector into a list of symbols. rlang::syms() is designed precisely for this case:

Next we need to combine each variable name with its coefficient. We can do this by combining rlang::expr() with purrr::map2():

In this case, the intercept is also a part of the sum, although it doesn’t involve a multiplication. We can just add it to the start of the summands vector:

Finally, we need to reduce the individual terms into a single sum by adding the pieces together:
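The steps so far can be sketched end to end. This version assumes rlang is installed and deliberately swaps in base Map() and Reduce() for the purrr functions named in the text, to keep the example dependency-free; the coefficient values are taken from the target expression above.

```r
library(rlang)

intercept <- 10
coefs <- c(x1 = 5, x2 = -4)

# Step 1: turn the coefficient names into symbols
coef_sym <- syms(names(coefs))

# Step 2: combine each variable with its coefficient: (coef * var)
summands <- Map(
  function(var, coef) expr((!!coef * !!var)),
  coef_sym, coefs
)

# Step 3: the intercept is a summand too, just without a multiplication
summands <- c(list(intercept), unname(summands))

# Step 4: reduce the terms to a single sum
eq <- Reduce(function(lhs, rhs) expr(!!lhs + !!rhs), summands)
eq

# The generated code is ordinary R: 10 + 5 * 1 + -4 * 2
eval(eq, list(x1 = 1, x2 = 2))
```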

We could make this even more general by allowing the user to supply the name of the coefficient, and instead of assuming many different variables, index into a single one.

And finish by wrapping this up into a function:

Note the use of ensym(). We want the user to supply the name of a single variable, not a more complex expression.
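One way the wrapped-up function might look, as a sketch under the same assumptions as before (rlang installed, base lapply() and Reduce() in place of purrr). The name linear() and its argument order are illustrative choices.

```r
library(rlang)

linear <- function(var, val) {
  # Only a single variable name is allowed, not an arbitrary expression
  var <- ensym(var)

  # One term per non-intercept coefficient: (coef * var[[i]])
  idx <- seq_along(val[-1])
  terms <- lapply(idx, function(i) {
    coef <- val[[i + 1]]
    expr((!!coef * (!!var)[[!!i]]))
  })

  # The first value is the intercept
  summands <- c(list(val[[1]]), terms)
  Reduce(function(lhs, rhs) expr(!!lhs + !!rhs), summands)
}

eq <- linear(x, c(10, 5, -4))
eq

# Evaluate the generated code against a vector of predictor values
eval(eq, list(x = c(1, 2)))
```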

We could even make this into a function-generating function:

17.7.3 Slicing an array

An occasionally useful tool missing from base R is the ability to extract a slice of an array given a dimension and an index. For example, we’d like to write slice(x, 2, 1) to extract the first slice along the second dimension, which you can write as x[, 1, ]. This is a moderately challenging problem because it requires working with missing arguments.

We’ll need to generate a call with multiple missing arguments. Fortunately that’s easy with rep() and missing_arg(). Once we have those arguments, we can unquote-splice them into a call:

Then we use subset-assignment to insert the index in the desired position:

We then wrap this into a function, using a couple of stopifnot()s to make the interface clear:
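The steps above can be sketched as a single function, assuming rlang is installed. The argument names along and index are illustrative, and the generated call always refers to the array by the name x.

```r
library(rlang)

slice <- function(x, along, index) {
  stopifnot(length(along) == 1)
  stopifnot(length(index) == 1)

  # One missing argument per dimension ...
  nd <- length(dim(x))
  indices <- rep(list(missing_arg()), nd)
  # ... except for the dimension being sliced
  indices[[along]] <- index

  # Splice the arguments into a subsetting call
  expr(x[!!!indices])
}

x <- array(1:30, c(5, 2, 3))
slice(x, 1, 3)
slice(x, 2, 2)
slice(x, 3, 1)
```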

A real slice() would evaluate the generated call, but here I think it’s more illuminating to see the code that’s generated, as that’s the hard part of the challenge.

17.7.4 Creating functions

Another powerful application of quotation is creating functions “by hand”, using rlang::new_function(). It’s a function that creates a function from its three components (Section 5.2.2): arguments, body, and (optionally) environment:

One use of new_function() is as an alternative to function factories with scalar or symbol arguments. For example, we could write a function that generates functions that raise their argument to the power of a number.

(Note that power() is not a quoting function. It inlines the value of exponent, not the expression that generates it.)
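A minimal sketch of new_function() and of a power() factory built on it, assuming rlang is installed. The name add() is made up for the example.

```r
library(rlang)

# Arguments, body, and (optionally) an environment
add <- new_function(
  exprs(x = , y = ),
  expr({
    x + y
  })
)
add(1, 2)

# A factory that inlines the value of exponent into the body
power <- function(exponent) {
  new_function(
    exprs(x = ),
    expr({
      x ^ !!exponent
    }),
    caller_env()
  )
}

square <- power(2)
square
square(3)
```

Printing square shows the literal 2 in the body, rather than a reference to a variable named exponent.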

Another application of new_function() is for functions that work like graphics::curve(). curve() allows you to plot a mathematical expression, without creating a function:

Here x is a pronoun: it doesn’t represent a single concrete value, but is instead a placeholder that varies over the range of the plot. One way to implement curve() is to turn that expression into a function with a single argument, x, then call that function:
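A hedged sketch of that idea, assuming rlang is installed. To keep the example free of graphics side effects, the hypothetical helper curve_points() returns the points instead of drawing them.

```r
library(rlang)

curve_points <- function(body, xlim = c(0, 1), n = 100) {
  # Capture the expression the caller wrote, e.g. sin(exp(4 * x))
  body <- enexpr(body)

  # Turn it into a function of a single argument, x
  f <- new_function(exprs(x = ), body)

  # Call that function over the requested range
  x <- seq(xlim[1], xlim[2], length.out = n)
  data.frame(x = x, y = f(x))
}

pts <- curve_points(sin(exp(4 * x)), n = 5)
pts
# plot(pts, type = "l") would draw the curve
```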

Functions, like curve(), that use an expression containing a pronoun are known as anaphoric functions[8].

17.7.5 Exercises

  1. In the linear-model example, we could replace the expr() in reduce(summands, ~ expr(!!.x + !!.y)) with call2(): reduce(summands, call2, "+"). Compare and contrast the two approaches. Which do you think is easier to read?

  2. Re-implement the Box-Cox transform defined below using unquoting and new_function():

  3. Re-implement the simple compose() defined below using quasiquotation and new_function():

References

Lumley, Thomas. 2001. “Programmer’s Niche: Macros in R.” R News 1 (3):11–13. https://www.r-project.org/doc/Rnews/Rnews_2001-3.pdf.

Bawden, Alan. 1999. “Quasiquotation in Lisp.” In PEPM ’99, 4–12. http://repository.readscheme.org/ftp/papers/pepm99/bawden.pdf.


  1. You might be familiar with the name Quine from “quines”, computer programs that return a copy of their own source when run.

  2. A fun connection between philosophy and R is in https://johnmacfarlane.net/142/substitutional-quantifiers.pdf; this article is written by philosophy professor John MacFarlane, the author of pandoc, which powers RMarkdown.

  3. This is for compatibility with base R, which allows you to provide a string instead of a symbol in many places: "x" <- 1, "foo"(x, y), c("x" = 1).

  4. Discovered by Peter Meilstrup and described in R-devel on 2018-08-13.

  5. Prior to R 3.5.1, there was another major downside: the R deparser treated !!x as !(!x). This is why in old versions of R you might see extra parentheses when printing expressions. The good news is that these parentheses are not real and can be safely ignored most of the time. The bad news is that they will become real if you reparse that printed output to R code. These roundtripped functions will not work as expected since !(!x) does not unquote.

  6. Unlike, say, javascript, where !!x is a commonly used shortcut to convert an integer into a logical.

  7. This is admittedly not the most creative of names, but it clearly suggests it’s something that has been added to R after the fact.

  8. Anaphoric comes from the linguistics term “anaphora”, an expression that is context dependent. Anaphoric functions are found in Arc (a LISP-like language), Perl, and Clojure.