20 Evaluation

20.1 Introduction

The user-facing opposite of quotation is unquotation: it gives the user the ability to selectively evaluate parts of an otherwise quoted argument. The developer-facing complement of quotation is evaluation: this gives the developer of the function the ability to evaluate quoted expressions in special ways to create domain-specific languages for data analysis, like ggplot2 and dplyr.

library(rlang)
20.1.0.0.1 Outline
20.1.0.0.2 Prerequisites

Environments play a big role in evaluation, so make sure you’re familiar with Environments before continuing.

20.2 Evaluation basics

In the previous chapter, we briefly mentioned eval(). Here, rather than starting with eval(), we’re going to start with rlang::eval_bare(), which is the purest evocation of the idea of evaluation. The first argument, expr, is the expression to evaluate. This will usually be either a symbol or an expression:

x <- 10
eval_bare(expr(x))
#> [1] 10

y <- 2

eval_bare(expr(x + y))
#> [1] 12

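The second argument, env, gives the environment in which the expression should be evaluated, i.e. where the values of x, y, and + should be looked up. Because env() creates a child of the current environment, x is found in the new environment, while y and + are found in its ancestors: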

env <- env(x = 1000)
eval_bare(expr(x + y), env)
#> [1] 1002

As well as symbols and expressions, eval_bare() (like all the evaluation functions) also takes any other R object. This makes eval_bare() very general, but it can lead to confusing results if you forget to quote() the input. Below, x + y is evaluated to 12 before eval_bare() is even called, so the env argument has no effect:

eval_bare(x + y)
#> [1] 12
eval_bare(x + y, env = env)
#> [1] 12
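The same distinction shows up with base R's eval() (whose environment argument is called envir rather than env). This is a minimal sketch that uses new.env() and assign() to build the environment instead of rlang::env():

```r
x <- 10
y <- 2

# An environment with its own x; its parent is the environment where
# new.env() was called, so y is still found there
e <- new.env()
assign("x", 1000, envir = e)

# Quoted: the expression is evaluated inside e, so x is 1000
quoted_result <- eval(quote(x + y), e)

# Not quoted: x + y is computed (as 12) before eval() is called, and
# evaluating the number 12 in any environment just returns 12
unquoted_result <- eval(x + y, e)

quoted_result
unquoted_result
```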

Now that you’ve seen the basics, let’s explore some applications. We’ll focus primarily on base R functions that you might have used before but not fully understood. Extracting out their essence and rewriting to use the helper functions in rlang should help illustrate the underlying principles.

20.2.1 Application: local()

Sometimes you want to perform a chunk of calculation that uses a bunch of intermediate variables. The intermediate variables have no long-term use, so you’d rather not keep them around. One approach is to use rm() to clean up after yourself. Or you could create an anonymous function and then call it. A more elegant approach is to use local():

foo <- local({
  x <- 10
  y <- 200
  x + y
})

foo
#> [1] 210
x
#> [1] 10
y
#> [1] 2

The essence of local() is quite simple. We capture the expression and create a new environment in which to evaluate it, which inherits from the caller environment.

local2 <- function(expr, env = child_env(caller_env())) {
  eval_bare(enexpr(expr), env)
}

It’s a bit harder to understand how base::local() works, as it uses eval() and substitute() together in rather complicated ways.
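For comparison, here is a minimal base R sketch of the same idea that uses substitute() and eval() in place of enexpr() and eval_bare(). It is not the actual definition of base::local(), and the name local_base() is hypothetical:

```r
# substitute() recovers the unevaluated argument; the default envir is a
# fresh environment whose parent is the caller's environment
local_base <- function(expr, envir = new.env(parent = parent.frame())) {
  eval(substitute(expr), envir)
}

x <- 1
foo <- local_base({
  x <- 10   # assigned in the temporary environment, not the caller's
  y <- 200
  x + y
})
foo
x
```

The caller's x is untouched because every assignment happens in the temporary environment, which is discarded once local_base() returns.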

20.2.2 Application: source()

With parse_exprs() and eval_bare(), it’s possible to write a simple version of source(). We read in the file from disk, use parse_exprs() to turn it into a list of expressions, and then eval_bare() each component in the specified environment. This version evaluates in the caller environment by default, and invisibly returns the result of the last expression in the file (like source()).

source2 <- function(file, env = caller_env()) {
  lines <- readLines(file, warn = FALSE)
  code <- paste(lines, collapse = "\n")
  exprs <- parse_exprs(code)

  res <- NULL
  for (i in seq_along(exprs)) {
    res <- eval_bare(exprs[[i]], env)
  }
  
  invisible(res)
}

The real source() is considerably more complicated because it can echo input and output, and also has many additional settings to control behaviour. Note that base::eval() can take an expression vector, in which case it evaluates each component in turn. eval_bare() does not support expression vectors, which is why source2() loops over the list of expressions instead.
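A small base R sketch of that behaviour: parse(text = ) returns an expression vector, and a single call to eval() evaluates each element in order, returning the value of the last one.

```r
# An expression vector with three elements
code <- parse(text = "a <- 1; b <- a + 1; b * 10")
length(code)

# Evaluate all three elements, in order, inside a fresh environment
e <- new.env()
res <- eval(code, e)
res
get("b", envir = e)
```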

20.2.3 Base R

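  • eval(expr, envir, enclos): the base equivalent of eval_bare(), with an additional enclos argument

  • evalq(expr, envir): shortcut for eval(quote(expr), envir)

  • eval.parent(expr, n): shortcut for eval(expr, envir = parent.frame(n))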
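A quick base R sketch of the two shortcuts. The functions caller_fn() and callee_fn() are throwaway names used only for illustration:

```r
e <- new.env()
assign("x", 100, envir = e)

# evalq() quotes its first argument for you, so these are equivalent
evalq(x, e)
eval(quote(x), e)

callee_fn <- function() {
  # Evaluate x in the frame of whichever function called callee_fn()
  eval.parent(quote(x))
}
caller_fn <- function() {
  x <- "found in caller_fn()"
  callee_fn()
}
caller_fn()
```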

20.2.4 Exercises

  1. Carefully read the documentation for source(). What environment does it use by default? What if you supply local = TRUE? How do you provide a custom environment?

  2. Predict the results of the following lines of code:

    eval(quote(eval(quote(eval(quote(2 + 2))))))
    eval(eval(quote(eval(quote(eval(quote(2 + 2)))))))
    quote(eval(quote(eval(quote(eval(quote(2 + 2)))))))
  3. Write an equivalent to get() using sym() and eval_bare(). Write an equivalent to assign() using sym(), expr(), and eval_bare(). (Don’t worry about the multiple ways of choosing an environment that get() and assign() support; assume that the user supplies it explicitly.)

    # name is a string
    get2 <- function(name, env) {}
    assign2 <- function(name, value, env) {}
  4. Modify source2() so it returns the result of every expression, not just the last one. Can you eliminate the for loop?

  5. The code generated by source2() lacks source references. Read the source code for sys.source() and the help for srcfilecopy(), then modify source2() to preserve source references. You can test your code by sourcing a function that contains a comment. If successful, when you look at the function, you’ll see the comment and not just the source code.

  6. The third argument in subset() allows you to select variables. It treats variable names as if they were positions. This allows you to do things like subset(mtcars, , -cyl) to drop the cylinder variable, or subset(mtcars, , disp:drat) to select all the variables between disp and drat. How does this work? I’ve made this easier to understand by extracting it out into its own function that uses tidy evaluation.

    select <- function(df, vars) {
      vars <- enexpr(vars)
      var_pos <- set_names(as.list(seq_along(df)), names(df))
    
      cols <- eval_tidy(vars, var_pos)
      df[, cols, drop = FALSE]
    }
    select(mtcars, -cyl)

20.3 Quosures

The simplest form of evaluation couples an expression and an environment. This coupling is sufficiently important that we need a data structure that captures both pieces. We call this data structure a quosure, a portmanteau of quoting and closing.

20.3.1 Motivation

Quosures are particularly important when capturing arguments to a function. Take this simple example:

compute_mean <- function(df, x) {
  x <- enexpr(x)
  dplyr::summarise(df, mean = mean(!!x))
}

compute_mean(mtcars, mpg)
#>   mean
#> 1 20.1

It contains a subtle bug, which we can illustrate with this slightly forced example:

x <- 10
compute_mean(mtcars, log(mpg, base = x))
#> Error in summarise_impl(.data, dots): Evaluation error: non-numeric argument to mathematical function.

We get this error because inside the function, x is bound to an expression (an AST), so log() receives a call rather than the number 10. We don’t want arguments supplied to the function to look up variables inside the function. We want arguments to look up values of symbols in the place they are supposed to: the environment associated with that argument.

We can fix the bug by not just capturing the expression, but also capturing where it should be evaluated. That’s the job of enquo(), which otherwise works identically to enexpr().

compute_mean <- function(df, x) {
  x <- enquo(x)
  dplyr::summarise(df, mean = mean(!!x))
}

compute_mean(mtcars, log(mpg, base = x))
#>   mean
#> 1 1.28

20.3.2 Manipulating

x <- quo(x + 1)
quo_get_env(x)
#> <environment: R_GlobalEnv>
quo_get_expr(x)
#> x + 1

You can also create a quosure from an expression and an environment with new_quosure(), but this is rarely needed.
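A minimal sketch of new_quosure(), pairing an expression built with expr() with an environment built with env(). It uses eval_tidy(), which is covered later in this chapter, to show that the quosure carries its own environment:

```r
library(rlang)

# An environment that binds a, and an expression that refers to a
e <- env(a = 5)
q <- new_quosure(expr(a * 2), e)

quo_get_expr(q)
identical(quo_get_env(q), e)

# The expression is evaluated in the quosure's own environment
eval_tidy(q)
```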

For labelling, three helpers turn a quosure into a string:

quo_name(x)
#> [1] "x + 1"
quo_label(x)
#> [1] "`x + 1`"
quo_text(x)
#> [1] "x + 1"

20.3.3 Compared to

20.3.3.1 Expressions

  • expr() -> quo(), exprs() -> quos(): for experimenting interactively and for generating fixed expressions inside a function

  • enexpr() -> enquo(), enexprs() -> enquos(): for capturing what the user supplied to an argument.

You almost always want to use enquo() instead of enexpr(). The primary exception is when you’re working with a function that does not use tidy evaluation.

20.3.3.2 Promises

enexpr() and enquo() work because internally R represents function arguments with a special type of object called a promise. A promise captures the expression needed to compute the value and the environment in which to compute it. You’re not normally aware of promises because the first time you access a promise its code is evaluated in its environment, yielding a value.

Promises are hard to work with because they are quantum - attempting to look at them in R changes their behaviour.

A promise is evaluated at most once. At the C level, a promise object stores the expression, the environment, and the value (once it has been evaluated).
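A small base R sketch of both properties. An argument's expression is only evaluated when the argument is first used, and after that its cached value is reused. The names count_calls(), use_twice() and ignore_arg() are purely illustrative:

```r
n_calls <- 0
count_calls <- function() {
  n_calls <<- n_calls + 1   # record each time this is evaluated
  n_calls
}

use_twice <- function(x) {
  # The promise for x is forced on first access, then its value is reused
  c(x, x)
}

twice <- use_twice(count_calls())
twice
n_calls

# An argument that is never used is never evaluated
ignore_arg <- function(x) "x was never forced"
ignore_arg(stop("this error is never raised"))
```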

A quosure captures a promise into a concrete form that requires explicit evaluation.

20.3.3.3 Formulas

The main inspiration for the quosure was the formula operator, ~, which also captures both the expression and its environment, and is used extremely heavily in R’s modelling functions.

~ is most similar to quo(), the main differences being:

  • ~ is not paired with an unquoting operator
  • ~ has two sides

(There’s no equivalent to enquo() or quos() etc.)
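A short base R sketch showing that a formula records the environment it was created in, and that its two sides are ordinary expressions. make_model_formula() is a hypothetical helper:

```r
make_model_formula <- function() {
  n <- 3
  y ~ x + n   # the formula captures this function's environment
}

f <- make_model_formula()
class(f)

# A two-sided formula is a call of length 3: `~`, the lhs, and the rhs
length(f)
f[[2]]
f[[3]]

# n is found in the captured environment; x is supplied as data
eval(f[[3]], list(x = 1), environment(f))
```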

20.3.4 Exercises

  1. What does transform() do? Read the documentation. How does it work? Read the source code for transform.data.frame(). What does substitute(list(...)) do?

  2. What does with() do? How does it work? Read the source code for with.default(). What does within() do? How does it work? Read the source code for within.data.frame(). Why is the code so much more complex than with()?

20.4 Tidy evaluation

Tidy evaluation is the combination of three big ideas:

  • Quasiquotation to give the user control
  • Quosures to capture argument expressions and their evaluation environment
  • A data mask + pronouns to reduce ambiguity

You’ve learned about quasiquotation and quosures; now it’s time to learn about the data mask and why it’s important.

20.4.1 eval_tidy()

Once you have a quosure, you will need to use eval_tidy() instead of eval_bare().

x <- 10
eval_bare(expr(x), globalenv())
#> [1] 10
eval_tidy(quo(x))
#> [1] 10

Like eval_bare(), eval_tidy() has an env argument, but you will typically not use it, because the environment is contained in the first argument. Instead, the second argument is data, which allows you to set up a data mask. This allows you to mask some variables (that would usually be looked up from the environment) with variables in a list or data frame. This is the key idea that powers helpful base R functions like with(), subset() and transform().

eval_tidy(quo(cyl + x), mtcars)
#>  [1] 16 16 14 16 18 16 18 14 14 16 16 18 18 18 18 18 18 14 14 14 14 18 18
#> [24] 18 18 14 14 14 18 16 18 14

Unlike environments, lists and data frames don’t have parent-child relationships. When you use the data argument (or supply a list or data frame as envir in base::eval()), you effectively create a new environment that contains the values of data and whose parent is env.

Performance overhead?

20.4.2 Base R

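In base R, eval() provides a data mask through its enclos argument: when envir is a list or data frame, enclos specifies where to look for variables that are not found in envir.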
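A minimal base R sketch of that behaviour. Columns of the data frame take priority, and anything not found there is looked up in enclos. The second computation builds the equivalent child environment explicitly with list2env(), to illustrate the "new environment whose parent is env" view described above:

```r
df <- data.frame(cyl = c(4, 6, 8))
scope <- new.env()
assign("x", 10, envir = scope)
assign("cyl", -1, envir = scope)   # masked by the cyl column

# envir is the data; enclos is where other variables are looked up
via_enclos <- eval(quote(cyl + x), df, scope)
via_enclos

# The same lookup, with the data copied into a child environment of scope
mask <- list2env(as.list(df), parent = scope)
via_child_env <- eval(quote(cyl + x), mask)
via_child_env
```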

20.4.3 Application: subset()

sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))

subset(sample_df, a >= 4)
#>   a b c
#> 4 4 2 4
#> 5 5 1 1
# equivalent to:
# sample_df[sample_df$a >= 4, ]

subset(sample_df, b == c)
#>   a b c
#> 1 1 5 5
#> 5 5 1 1
# equivalent to:
# sample_df[sample_df$b == sample_df$c, ]

Base R implements this in subset.data.frame(). We can capture the essence of its row selection with tidy evaluation:

subset2 <- function(data, subset) {
  subset <- enquo(subset)
  rows <- eval_tidy(subset, data)
  
  data[rows, , drop = FALSE]
}

subset2(sample_df, b == c)
#>   a b c
#> 1 1 5 5
#> 5 5 1 1

Compared to base::subset(), this version supports quasiquotation (thanks to enquo()):

var <- expr(b)

subset2(sample_df, !!var == c)
#>   a b c
#> 1 1 5 5
#> 5 5 1 1
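For comparison, here is a minimal base R sketch of subset2() in the style of subset.data.frame(), which combines substitute() with the enclos argument of eval(). The name subset_base() is hypothetical. Because !! is not treated specially here, the quasiquotation example above would not work with it:

```r
sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))

subset_base <- function(data, subset) {
  # Capture the unevaluated condition, then evaluate it with the columns
  # of data as the mask and the caller's environment as enclos
  rows <- eval(substitute(subset), data, parent.frame())
  data[rows, , drop = FALSE]
}

subset_base(sample_df, b == c)
```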

20.4.4 Lexical scoping, ambiguity, and pronouns

threshold_x <- function(df, val) {
  subset2(df, x >= val)
}

How can this function fail? There are two main ways:

  • df might not contain a variable called x. Depending on what variables exist in the global environment, this might either return incorrect results:

    no_x <- data.frame(y = 1:3)
    threshold_x(no_x, 2)
    #>   y
    #> 1 1
    #> 2 2
    #> 3 3

    Or throw an error:

    rm(x)
    threshold_x(no_x, 2)
    #> Error in eval_tidy(subset, data): object 'x' not found

  • df might contain a variable called val, in which case the function will silently return an incorrect value:

    has_val <- data.frame(x = 1:3, val = 9:11)
    threshold_x(has_val, 2)
    #> [1] x   val
    #> <0 rows> (or 0-length row.names)

These failure modes arise because tidy evaluation is ambiguous: for each variable lookup, it looks first in the data and then in the environment. But in this case, we always want to look up x in the data and val in the environment. To avoid this problem, we can use pronouns:

threshold_x <- function(df, val) {
  subset2(df, .data$x >= .env$val)
}
x <- 10
threshold_x(no_x, 2)
#> Error: Column `x` not found in `.data`
threshold_x(has_val, 2)
#>   x val
#> 2 2  10
#> 3 3  11

Generally, whenever you use the .env pronoun, you can use unquoting instead:

threshold_x <- function(df, val) {
  subset2(df, .data$x >= !!val)
}

There are subtle differences in when val is evaluated. If you unquote, it is evaluated at quotation time; if you use a pronoun, it is evaluated at evaluation time. These differences usually don’t matter, so pick the form that looks most natural.
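A small sketch of that timing difference. It supplies a data frame so that the pronoun is evaluated inside a data mask, as it would be in subset2(). The column name other is arbitrary:

```r
library(rlang)

val <- 1
mask_data <- data.frame(other = 1)

q_unquoted <- quo(!!val)     # the value 1 is inlined when the quosure is made
q_pronoun <- quo(.env$val)   # val is looked up only when the quosure is evaluated

val <- 100
eval_tidy(q_unquoted, mask_data)
eval_tidy(q_pronoun, mask_data)
```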

What if we generalise threshold_x() slightly so that the user can pick the variable used for thresholding? There are two basic approaches:

threshold <- function(df, var, val) {
  var <- ensym(var)
  # .data is the pronoun for the data mask; prefix `$` allows unquoting
  subset2(df, `$`(.data, !!var) >= !!val)
}

threshold <- function(df, var, val) {
  var <- as.character(ensym(var))
  subset2(df, .data[[!!var]] >= !!val)
}
  • Both now involve capturing a symbol. Things change fundamentally if we capture an expression, as we’ll see next.

  • df$!!var is not valid R syntax; we have to use prefix form. Alternatively we can use [[ and supply a string instead.
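A short check that the two approaches agree. It assumes subset2() as defined earlier and uses the .data pronoun. The names threshold_sym() and threshold_str() are hypothetical, chosen so that both versions can coexist:

```r
library(rlang)

subset2 <- function(data, subset) {
  subset <- enquo(subset)
  rows <- eval_tidy(subset, data)
  data[rows, , drop = FALSE]
}

# Prefix form of `$`, because .data$!!var is a syntax error
threshold_sym <- function(df, var, val) {
  var <- ensym(var)
  subset2(df, `$`(.data, !!var) >= !!val)
}

# [[ with a string instead of a symbol
threshold_str <- function(df, var, val) {
  var <- as.character(ensym(var))
  subset2(df, .data[[!!var]] >= !!val)
}

nrow(threshold_sym(mtcars, cyl, 8))
identical(threshold_sym(mtcars, cyl, 8), threshold_str(mtcars, cyl, 8))
```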

What if we generalise further to allow thresholding based on any expression? You could write:

threshold <- function(df, expr, val) {
  expr <- enquo(expr)
  subset2(df, !!expr >= !!val)
}

There’s no way to ensure that expr is only evaluated in the data, and indeed that might not even be desirable, because the user may use an expression that includes variables from both the data and the local environment. In this case, it is now the user’s responsibility to avoid ambiguity.

This particular function is now not very useful because it’s so general - you might as well just use subset2() directly.

20.4.5 Application: arrange()

  • Capture dots
  • Evaluate
  • Combine
  • Subset
invoke <- function(fun, ...) do.call(fun, dots_list(...))

arrange <- function(.data, ..., .na.last = TRUE) {
  args <- quos(...)
  
  ords <- purrr::map(args, eval_tidy, data = .data)
  ord <- invoke(order, !!!ords, na.last = .na.last)
  
  .data[ord, , drop = FALSE]
}

arrange(mtcars, cyl)
#>                      mpg cyl  disp  hp drat   wt qsec vs am gear carb
#> Datsun 710          22.8   4 108.0  93 3.85 2.32 18.6  1  1    4    1
#> Merc 240D           24.4   4 146.7  62 3.69 3.19 20.0  1  0    4    2
#> Merc 230            22.8   4 140.8  95 3.92 3.15 22.9  1  0    4    2
#> Fiat 128            32.4   4  78.7  66 4.08 2.20 19.5  1  1    4    1
#> Honda Civic         30.4   4  75.7  52 4.93 1.61 18.5  1  1    4    2
#> Toyota Corolla      33.9   4  71.1  65 4.22 1.83 19.9  1  1    4    1
#> Toyota Corona       21.5   4 120.1  97 3.70 2.46 20.0  1  0    3    1
#> Fiat X1-9           27.3   4  79.0  66 4.08 1.94 18.9  1  1    4    1
#> Porsche 914-2       26.0   4 120.3  91 4.43 2.14 16.7  0  1    5    2
#> Lotus Europa        30.4   4  95.1 113 3.77 1.51 16.9  1  1    5    2
#> Volvo 142E          21.4   4 121.0 109 4.11 2.78 18.6  1  1    4    2
#> Mazda RX4           21.0   6 160.0 110 3.90 2.62 16.5  0  1    4    4
#> Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.88 17.0  0  1    4    4
#> Hornet 4 Drive      21.4   6 258.0 110 3.08 3.21 19.4  1  0    3    1
#> Valiant             18.1   6 225.0 105 2.76 3.46 20.2  1  0    3    1
#> Merc 280            19.2   6 167.6 123 3.92 3.44 18.3  1  0    4    4
#> Merc 280C           17.8   6 167.6 123 3.92 3.44 18.9  1  0    4    4
#> Ferrari Dino        19.7   6 145.0 175 3.62 2.77 15.5  0  1    5    6
#> Hornet Sportabout   18.7   8 360.0 175 3.15 3.44 17.0  0  0    3    2
#> Duster 360          14.3   8 360.0 245 3.21 3.57 15.8  0  0    3    4
#> Merc 450SE          16.4   8 275.8 180 3.07 4.07 17.4  0  0    3    3
#> Merc 450SL          17.3   8 275.8 180 3.07 3.73 17.6  0  0    3    3
#> Merc 450SLC         15.2   8 275.8 180 3.07 3.78 18.0  0  0    3    3
#> Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.25 18.0  0  0    3    4
#> Lincoln Continental 10.4   8 460.0 215 3.00 5.42 17.8  0  0    3    4
#> Chrysler Imperial   14.7   8 440.0 230 3.23 5.34 17.4  0  0    3    4
#> Dodge Challenger    15.5   8 318.0 150 2.76 3.52 16.9  0  0    3    2
#> AMC Javelin         15.2   8 304.0 150 3.15 3.44 17.3  0  0    3    2
#> Camaro Z28          13.3   8 350.0 245 3.73 3.84 15.4  0  0    3    4
#> Pontiac Firebird    19.2   8 400.0 175 3.08 3.85 17.1  0  0    3    2
#> Ford Pantera L      15.8   8 351.0 264 4.22 3.17 14.5  0  1    5    4
#> Maserati Bora       15.0   8 301.0 335 3.54 3.57 14.6  0  1    5    8
arrange(mtcars, vs, -am)
#>                      mpg cyl  disp  hp drat   wt qsec vs am gear carb
#> Mazda RX4           21.0   6 160.0 110 3.90 2.62 16.5  0  1    4    4
#> Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.88 17.0  0  1    4    4
#> Porsche 914-2       26.0   4 120.3  91 4.43 2.14 16.7  0  1    5    2
#> Ford Pantera L      15.8   8 351.0 264 4.22 3.17 14.5  0  1    5    4
#> Ferrari Dino        19.7   6 145.0 175 3.62 2.77 15.5  0  1    5    6
#> Maserati Bora       15.0   8 301.0 335 3.54 3.57 14.6  0  1    5    8
#> Hornet Sportabout   18.7   8 360.0 175 3.15 3.44 17.0  0  0    3    2
#> Duster 360          14.3   8 360.0 245 3.21 3.57 15.8  0  0    3    4
#> Merc 450SE          16.4   8 275.8 180 3.07 4.07 17.4  0  0    3    3
#> Merc 450SL          17.3   8 275.8 180 3.07 3.73 17.6  0  0    3    3
#> Merc 450SLC         15.2   8 275.8 180 3.07 3.78 18.0  0  0    3    3
#> Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.25 18.0  0  0    3    4
#> Lincoln Continental 10.4   8 460.0 215 3.00 5.42 17.8  0  0    3    4
#> Chrysler Imperial   14.7   8 440.0 230 3.23 5.34 17.4  0  0    3    4
#> Dodge Challenger    15.5   8 318.0 150 2.76 3.52 16.9  0  0    3    2
#> AMC Javelin         15.2   8 304.0 150 3.15 3.44 17.3  0  0    3    2
#> Camaro Z28          13.3   8 350.0 245 3.73 3.84 15.4  0  0    3    4
#> Pontiac Firebird    19.2   8 400.0 175 3.08 3.85 17.1  0  0    3    2
#> Datsun 710          22.8   4 108.0  93 3.85 2.32 18.6  1  1    4    1
#> Fiat 128            32.4   4  78.7  66 4.08 2.20 19.5  1  1    4    1
#> Honda Civic         30.4   4  75.7  52 4.93 1.61 18.5  1  1    4    2
#> Toyota Corolla      33.9   4  71.1  65 4.22 1.83 19.9  1  1    4    1
#> Fiat X1-9           27.3   4  79.0  66 4.08 1.94 18.9  1  1    4    1
#> Lotus Europa        30.4   4  95.1 113 3.77 1.51 16.9  1  1    5    2
#> Volvo 142E          21.4   4 121.0 109 4.11 2.78 18.6  1  1    4    2
#> Hornet 4 Drive      21.4   6 258.0 110 3.08 3.21 19.4  1  0    3    1
#> Valiant             18.1   6 225.0 105 2.76 3.46 20.2  1  0    3    1
#> Merc 240D           24.4   4 146.7  62 3.69 3.19 20.0  1  0    4    2
#> Merc 230            22.8   4 140.8  95 3.92 3.15 22.9  1  0    4    2
#> Merc 280            19.2   6 167.6 123 3.92 3.44 18.3  1  0    4    4
#> Merc 280C           17.8   6 167.6 123 3.92 3.44 18.9  1  0    4    4
#> Toyota Corona       21.5   4 120.1  97 3.70 2.46 20.0  1  0    3    1

This version is missing any error checking. At a minimum, it should check that each input yields a vector with the same length as the number of rows in .data.
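A minimal sketch of that length check. It swaps purrr::map() and invoke() for base lapply() and do.call() so that it only depends on rlang. The name arrange_checked() is hypothetical:

```r
library(rlang)

arrange_checked <- function(.data, ..., .na.last = TRUE) {
  args <- quos(...)
  ords <- lapply(args, eval_tidy, data = .data)

  # Every sort key must have one value per row of .data
  lens <- vapply(ords, length, integer(1))
  if (any(lens != nrow(.data))) {
    stop("Every input must have length ", nrow(.data), call. = FALSE)
  }

  # unname() so that names in ... are not matched to arguments of order()
  ord <- do.call(order, c(unname(ords), list(na.last = .na.last)))
  .data[ord, , drop = FALSE]
}

toy <- data.frame(g = c(2, 1, 2, 1), v = c(10, 20, 30, 40))
arrange_checked(toy, g, -v)
try(arrange_checked(toy, 1:3))
```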

20.4.6 Multiple environments

Note that when using ... each component can have a different environment associated with it:

f <- function(...) {
  x <- 1
  g(..., x1 = x)
}
g <- function(...) {
  x <- 2
  h(..., x2 = x)
}
h <- function(...) {
  enquos(...)
}

x <- 0
qs <- f(x0 = x)
qs
#> $x0
#> <quosure>
#>   expr: ^x
#>   env:  global
#> 
#> $x1
#> <quosure>
#>   expr: ^x
#>   env:  0x30781c0
#> 
#> $x2
#> <quosure>
#>   expr: ^x
#>   env:  0x3078428
purrr::map(qs, quo_get_expr)
#> $x0
#> x
#> 
#> $x1
#> x
#> 
#> $x2
#> x
purrr::map(qs, quo_get_env)
#> $x0
#> <environment: R_GlobalEnv>
#> 
#> $x1
#> <environment: 0x30781c0>
#> 
#> $x2
#> <environment: 0x3078428>
purrr::map_dbl(qs, eval_tidy)
#> x0 x1 x2 
#>  0  1  2

20.4.7 Embedded quosures

make_x <- function(x) quo(x)
thirty <- quo(!!make_x(0) + !!make_x(10) + !!make_x(20))
thirty
#> <quosure>
#>   expr: ^(^x) + (^x) + (^x)
#>   env:  global

(Note that because quosures capture the complete environment you need to be a little careful if your function returns quosures. If you have large temporary objects they will not get gc’d until the quosure has been gc’d. See XXXXXXX for more details.)

If you’re viewing from the console, you’ll see that each quosure is coloured - the point of the colours is to emphasise that the quosures have different environments associated with them even though the expressions are the same.

eval_tidy(thirty)
#> [1] 30

This was a lot of work to get right, but it means that quosures just work, even when embedded inside other quosures.

Note that this code doesn’t make any sense at all if we use expressions instead of their quosure equivalents: the environment is never captured, so all we have is three copies of x, all looked up in the same environment:

make_x <- function(x) expr(x)
thirty <- expr(!!make_x(0) + !!make_x(10) + !!make_x(20))

thirty
#> x + x + x
eval_tidy(thirty)
#> [1] 0

20.4.8 When not to use quosures

  • In code generation.

  • When the expression will be evaluated entirely in the data context

  • To call functions that don’t use tidy eval; fuller example next.

Sometimes you can avoid using a quosure by inlining/unquoting values.

base <- 2
quo(log(x, base = base))
#> <quosure>
#>   expr: ^log(x, base = base)
#>   env:  global
expr(log(x, base = !!base))
#> log(x, base = 2)

20.4.9 Exercises

  1. Improve subset2() to make it more like the real subset function (subset.data.frame()):

    • Drop rows where subset evaluates to NA
    • Give a clear error message if subset doesn’t evaluate to a logical vector
    • What happens if subset doesn’t yield a logical vector with length equal to the number of rows in data? What do you think should happen?
  2. What happens if you use quo() instead of enquo() inside of subset2()?

  3. Implement a form of arrange() where you can request a variable to be sorted in descending order using named arguments:

    arrange(mtcars, cyl, desc = mpg, vs)

    (Hint: The decreasing argument to order() will not help you. Instead, look at the definition of dplyr::desc(), and read the help for xtfrm().)

  4. Implement with() (code in with.default()).

  5. Implement a version of within.data.frame() that uses tidy evaluation. Read the documentation and make sure that you understand what within() does, then read the source code.

  6. Implement transform() (code in transform.data.frame()). Extend it so that a variable can refer to the variables just defined.

20.5 Case study: calling base NSE functions

We can combine expr() with eval_bare() to create wrappers around base NSE functions that don’t provide an escape hatch for quoting. Here we’ll focus on models, since base R’s NSE doesn’t provide an unquoting tool. But you can use the same ideas with base graphics and any other NSE function.

20.5.1 Basics

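lm() is particularly challenging because it captures and prints the actual call, and ideally we want that call to be useful. Unquoting the user’s inputs into the call means that the recorded call shows what the user supplied (e.g. data = mtcars) rather than the wrapper’s own argument names.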

lm2 <- function(data, formula, subset = NULL) {
  data <- enexpr(data)
  subset <- enexpr(subset)
  
  lm_call <- expr(lm(!!formula, data = !!data, subset = !!subset))
  eval_bare(lm_call, caller_env())
}
coef(lm2(mtcars, mpg ~ disp))
#> (Intercept)        disp 
#>     29.5999     -0.0412
coef(lm2(mtcars, mpg ~ disp, subset = cyl == 4))
#> (Intercept)        disp 
#>      40.872      -0.135

20.5.2 What environment to use

20.5.3 Missing vs NULL

I think it’s good practice to leave missing only those arguments that the user must supply. For optional arguments, use NULL as the default instead: it has the nice property that expr(NULL) is NULL. You can then use %||% and missing_arg() to replace it, if needed. One final wrinkle is that unquoting a missing argument yields an error about the missing argument; wrap it in maybe_missing() to suppress the error.

lm3 <- function(data, formula, subset = NULL) {
  data <- enexpr(data)
  subset <- enexpr(subset) %||% missing_arg()
  
  lm_call <- expr(lm(!!formula, data = !!data, subset = !!maybe_missing(subset)))
  eval_bare(lm_call, caller_env())
}
lm2(mtcars, mpg ~ disp)$call
#> lm(formula = mpg ~ disp, data = mtcars, subset = NULL)
lm3(mtcars, mpg ~ disp)$call
#> lm(formula = mpg ~ disp, data = mtcars)

20.5.4 Making formulas

First let’s show how you could generate a formula. The tricky thing about formulas is that they look the same whether or not they have been evaluated:

y ~ x
#> y ~ x
expr(y ~ x)
#> y ~ x

But they’re not - you need to evaluate the call to get an actual formula:

class(y ~ x)
#> [1] "formula"
class(expr(y ~ x))
#> [1] "call"

Here’s a simple example of generating a formula in a different way:

build_formula <- function(resp, ..., env = caller_env()) {
  resp <- enexpr(resp)
  preds <- enexprs(...)
  
  pred_sum <- purrr::reduce(preds, ~ expr(!!.x + !!.y))
  eval_bare(expr(!!resp ~ !!pred_sum), env = env)
}
build_formula(y, a, b, c)
#> y ~ a + b + c
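For comparison, base R’s stats::reformulate() builds the same kind of formula from character strings, which can be convenient when the variable names are already stored as strings:

```r
f <- reformulate(c("a", "b", "c"), response = "y")
f
class(f)
```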

You can use the techniques described in the previous chapter to choose the interface to this function.

20.5.5 Exercises

20.6 Capturing the current call


Many base R functions use the current call: the expression that caused the current function to be run. There are two ways to capture a current call:

  • sys.call() captures exactly what the user typed.

  • match.call() makes a call that only uses named arguments. It’s like automatically calling pryr::standardise_call() on the result of sys.call()

The following example illustrates the difference between the two:

f <- function(abc = 1, def = 2, ghi = 3) {
  list(sys = sys.call(), match = match.call())
}
f(d = 2, 2)
#> $sys
#> f(d = 2, 2)
#> 
#> $match
#> f(abc = 2, def = 2)
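match.call() can also standardise a call other than the current one, if you supply the function definition and the call explicitly. In this small base R sketch, g() is a throwaway function with the same arguments as f():

```r
g <- function(abc = 1, def = 2, ghi = 3) NULL

# The partial name d matches def, and the positional argument fills abc
std <- match.call(g, quote(g(d = 2, 2)))
std
```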

Modelling functions often use match.call() to capture the call used to create the model. This makes it possible to update() a model, re-fitting the model after modifying some of the original arguments. Here’s an example of update() in action:

mod <- lm(mpg ~ wt, data = mtcars)
update(mod, formula = . ~ . + cyl)
#> 
#> Call:
#> lm(formula = mpg ~ wt + cyl, data = mtcars)
#> 
#> Coefficients:
#> (Intercept)           wt          cyl  
#>       39.69        -3.19        -1.51

How does update() work? We can rewrite it using some tools from pryr to focus on the essence of the algorithm.

update_call <- function (object, formula., ...) {
  call <- object$call

  # Use update.formula to deal with formulas like . ~ .
  if (!missing(formula.)) {
    call$formula <- update.formula(formula(object), formula.)
  }

  pryr::modify_call(call, pryr::dots(...))
}
update_model <- function(object, formula., ...) {
  call <- update_call(object, formula., ...)
  eval(call, parent.frame())
}
update_model(mod, formula = . ~ . + cyl)

The original update() has an evaluate argument that controls whether the function returns the call or the result. But I think it’s better, on principle, that a function returns only one type of object, rather than different types depending on the function’s arguments.

This rewrite also allows us to fix a small bug in update(): it re-evaluates the call in the global environment, when what we really want is to re-evaluate it in the environment where the model was originally fit, which is recorded as the environment of the formula.

f <- function() {
  n <- 3
  lm(mpg ~ poly(wt, n), data = mtcars)
}
mod <- f()
update(mod, data = mtcars)
#> Error in degree < 1: comparison (3) is possible only for atomic and list types

update_model <- function(object, formula., ...) {
  call <- update_call(object, formula., ...)
  eval(call, environment(formula(object)))
}
update_model(mod, data = mtcars)

This is an important principle to remember: if you want to re-run code captured with match.call(), you also need to capture the environment in which it was evaluated, usually the parent.frame(). The downside to this is that capturing the environment also means capturing any large objects which happen to be in that environment, which prevents their memory from being released. This topic is explored in more detail in garbage collection.
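A minimal base R sketch of that principle. It captures both the call (with match.call()) and the environment it was made in (with parent.frame()), then replays the call in that environment. The functions record_call(), make_record() and replay() are hypothetical:

```r
record_call <- function(x, y) {
  # Capture both the standardised call and where it was made
  list(call = match.call(), env = parent.frame())
}

make_record <- function() {
  a <- 2
  record_call(a, a * 10)
}

rec <- make_record()
rec$call

# Re-run the captured arguments with a different function, in the
# environment where a is defined
replay <- function(x, y) x + y
rec$call[[1]] <- as.name("replay")
eval(rec$call, rec$env)
```

The variable a only exists in make_record()’s frame, so the captured environment is what lets the replayed call find it.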

Some base R functions use match.call() where it’s not necessary. For example, write.csv() captures the call to write.csv() and mangles it to call write.table() instead:

write.csv <- function(...) {
  Call <- match.call(expand.dots = TRUE)
  for (arg in c("append", "col.names", "sep", "dec", "qmethod")) {
    if (!is.null(Call[[arg]])) {
      warning(gettextf("attempt to set '%s' ignored", arg))
    }
  }
  rn <- eval.parent(Call$row.names)
  Call$append <- NULL
  Call$col.names <- if (is.logical(rn) && !rn) TRUE else NA
  Call$sep <- ","
  Call$dec <- "."
  Call$qmethod <- "double"
  Call[[1L]] <- as.name("write.table")
  eval.parent(Call)
}

To fix this, we could implement write.csv() using regular function call semantics:

write.csv <- function(x, file = "", sep = ",", qmethod = "double", 
                      ...) {
  write.table(x = x, file = file, sep = sep, qmethod = qmethod, 
    ...)
}

This is much easier to understand: it’s just calling write.table() with different defaults. This also fixes a subtle bug in the original write.csv(): write.csv(mtcars, row = FALSE) raises an error, but write.csv(mtcars, row.names = FALSE) does not. The lesson here is that it’s always better to solve a problem with the simplest tool possible.

20.6.1 Exercises

  1. Compare and contrast update_model() with update.default().

  2. Why doesn’t write.csv(mtcars, "mtcars.csv", row = FALSE) work? What property of argument matching has the original author forgotten?

  3. Rewrite update.formula() to use R code instead of C code.

  4. Sometimes it’s necessary to uncover the function that called the function that called the current function (i.e., the grandparent, not the parent). How can you use sys.call() or match.call() to find this function?