# 16 Expressions

## 16.1 Introduction

To compute on the language, we first need to understand its structure. That requires some new vocabulary, some new tools, and some new ways of thinking about R code. The first of these is the distinction between an operation and its result. Take the following code, which multiplies a variable `x` by 10 and saves the result to a new variable called `y`. It doesn’t work because we haven’t defined a variable called `x`:

```
y <- x * 10
#> Error in eval(expr, envir, enclos):
#> object 'x' not found
```

It would be nice if we could capture the intent of the code without executing it. In other words, how can we separate our description of the action from the action itself? One way is to use `rlang::expr()`:

```
z <- rlang::expr(y <- x * 10)
z
#> y <- x * 10
```

`expr()` returns an expression, an object that captures the structure of the code without evaluating it (i.e. running it). If you have an expression, you can evaluate it with `base::eval()`:

```
x <- 4
eval(z)
y
#> [1] 40
```
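
The same separation of description and action can be sketched with base R alone. The snippet below is only an illustration: it assumes `quote()` as the base counterpart of `rlang::expr()`, and the names `first`, `second`, and `env` are introduced here purely for demonstration.

```r
# quote() captures code without running it, so no error occurs
# even though `x` has not been defined yet.
z <- quote(y <- x * 10)
is.call(z)                 # the captured code is an ordinary R object

# The same expression can be evaluated repeatedly as `x` changes.
x <- 4
eval(z)
first <- y                 # value of y after the first evaluation

x <- 7
eval(z)
second <- y                # value of y after the second evaluation

# Evaluating in a separate environment (hypothetical `env`) keeps
# the assignment to `y` away from the calling environment.
env <- new.env()
assign("x", 1, envir = env)
eval(z, env)
get("y", envir = env)
```

Evaluating in `env` creates a `y` inside that environment only; the `y` created by the earlier calls is left unchanged.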

The focus of this chapter is the data structures that underlie expressions. Mastering this knowledge will allow you to inspect and modify captured code. We’ll come back to `expr()` in Section **??**, and to `eval()` in Section 18.2.

### Outline

Section 16.2 introduces the idea of the abstract syntax tree (AST), and reveals the tree-like structure that underlies all R code.

Section 16.3 dives into the details of the data structures that underpin the AST: constants, symbols, and calls, which are collectively known as expressions.

Section 16.4 covers parsing, the act of converting the linear sequence of characters in code into the AST, and uses that idea to explore some details of R’s grammar.

Section 16.5 shows you how you can use recursive functions to “compute on the language”, writing functions that compute on expression input.

Section 16.6 circles back to three more specialised data structures: pairlists, missing arguments, and expression vectors.

## 16.2 Abstract syntax trees

Expressions are also called **abstract syntax trees** (ASTs) because the structure of code is hierarchical and can be naturally represented as a tree. Understanding this tree structure is crucial for inspecting and modifying expressions (i.e. metaprogramming).

### 16.2.1 Drawing

We’ll start by introducing some conventions for drawing ASTs, beginning with a simple call that shows their main components: `f(x, "y", 1)`. I’ll draw trees in two ways:

- By “hand” (with OmniGraffle).
- With `lobstr::ast()`^{54}:

      lobstr::ast(f(x, "y", 1))
      #> █─f
      #> ├─x
      #> ├─"y"
      #> └─1

Both approaches share conventions as much as possible:

- The leaves of the tree are either symbols, like `f` and `x`, or constants, like `1` or `"y"`. Symbols are drawn in purple and have rounded corners. Constants have black borders and square corners. Strings and symbols are easily confused, so strings are always surrounded in quotes.
- The branches of the tree are function calls, represented by call objects and drawn as orange squares. The first child (`f`) is the function that gets called; the second and subsequent children (`x`, `"y"`, and `1`) are the arguments to that function.

The above example only contained one function call, making for a very shallow tree. Most expressions will contain considerably more calls, creating trees with multiple levels. For example, consider `f(g(1, 2), h(3, 4, i()))`:

```
lobstr::ast(f(g(1, 2), h(3, 4, i())))
#> █─f
#> ├─█─g
#> │ ├─1
#> │ └─2
#> └─█─h
#>   ├─3
#>   ├─4
#>   └─█─i
```

You can read the hand-drawn diagrams from left-to-right (ignoring vertical position), and the lobstr-drawn diagrams from top-to-bottom (ignoring horizontal position). The depth within the tree is determined by the nesting of function calls. This also determines evaluation order, as evaluation generally proceeds from deepest-to-shallowest (but this is not guaranteed because of lazy evaluation, Section 5.5). Also note the appearance of `i()`, a function call with no arguments; it’s a branch with a single (symbol) leaf.

### 16.2.2 Non-code components

You might have wondered why these are *abstract* syntax trees. They are abstract because they only capture important structural details of the code, not whitespace or comments:

```
ast(
  f(x, y) # important!
)
#> █─f
#> ├─x
#> └─y
```

There’s one important situation where whitespace does affect the AST:

```
lobstr::ast(y <- x)
#> █─`<-`
#> ├─y
#> └─x
lobstr::ast(y < -x)
#> █─`<`
#> ├─y
#> └─█─`-`
#>   └─x
```
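
Both claims can be checked directly by comparing captured code. The sketch below uses only base R (`quote()` and `parse()`); the variable names `with_noise` and `plain` are made up for this illustration.

```r
# Comments and extra whitespace disappear during parsing.
# keep.source = FALSE avoids attaching source references to the result.
with_noise <- parse(text = "f(x,     y)  # important!", keep.source = FALSE)[[1]]
plain <- quote(f(x, y))
identical(with_noise, plain)

# A single space changes which function is called at the top of the tree.
identical(quote(y <- x), quote(y < -x))
as.character(quote(y <- x)[[1]])
as.character(quote(y < -x)[[1]])
```

The first comparison is `TRUE` and the second is `FALSE`: the top-level function is `<-` in one case and `<` in the other.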

### 16.2.3 Infix calls

Every call in R can be written in tree form, even if it doesn’t look like it at first glance. Take `y <- x * 10` again: what are the functions that are being called? It is not as easy to spot as `f(x, 1)` because this expression contains two infix calls: `<-` and `*`.

However, as discussed in Section 5.8.1, any call can be rewritten in prefix form. That means that these two lines of code are equivalent:

```
y <- x * 10
`<-`(y, `*`(x, 10))
```

And they both have this AST^{55}:

```
lobstr::ast(y <- x * 10)
#> █─`<-`
#> ├─y
#> └─█─`*`
#>   ├─x
#>   └─10
```

There really is no difference between the ASTs, and if you generate an expression with prefix calls, R will still print them in infix form:

```
expr(`<-`(y, `*`(x, 10)))
#> y <- x * 10
```
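
If you want to convince yourself that the two spellings really produce the same object, a quick check with base R’s `quote()` is enough. The names `infix` and `prefix` below are illustrative only.

```r
infix  <- quote(y <- x * 10)
prefix <- quote(`<-`(y, `*`(x, 10)))

# The parser builds the same call object for both spellings.
identical(infix, prefix)

# The components are the function followed by its arguments.
as.list(infix)

# Deparsing the prefix version prints it in infix form.
deparse(prefix)
```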

The order in which infix operators are applied is governed by a set of rules called operator precedence. In Section 16.4.1 we’ll use `lobstr::ast()` to explore R’s rules.

### 16.2.4 Exercises

1. Reconstruct the code represented by the trees below:

       #> █─f
       #> └─█─g
       #>   └─█─h
       #> █─`+`
       #> ├─█─`+`
       #> │ ├─1
       #> │ └─2
       #> └─3
       #> █─`*`
       #> ├─█─`(`
       #> │ └─█─`+`
       #> │   ├─x
       #> │   └─y
       #> └─z

2. Draw the following trees by hand then check your answers with `lobstr::ast()`.

       f(g(h(i(1, 2, 3))))
       f(1, g(2, h(3, i())))
       f(g(1, 2), h(3, i(4, 5)))

3. How are function factories (Chapter 9) shown in the AST? What makes the ASTs of the expressions below special?

       lobstr::ast(f(x)(y))
       #> █─█─f
       #> │ └─x
       #> └─y
       lobstr::ast(f(1)(2)(3)(4))
       #> █─█─█─█─f
       #> │ │ │ └─1
       #> │ │ └─2
       #> │ └─3
       #> └─4

4. What’s happening with the ASTs below? (Hint: carefully read `?"^"`.)

       lobstr::ast(`x` + `y`)
       #> █─`+`
       #> ├─x
       #> └─y
       lobstr::ast(x ** y)
       #> █─`^`
       #> ├─x
       #> └─y
       lobstr::ast(1 -> x)
       #> █─`<-`
       #> ├─x
       #> └─1

5. What is special about the AST below? (Hint: re-read Section **??**.)

       lobstr::ast(function(x, y) x + y)
       #> █─`function`
       #> ├─█─x = ``
       #> │ └─y = ``
       #> ├─█─`+`
       #> │ ├─x
       #> │ └─y
       #> └─<inline srcref>

6. What does the call tree of an `if` statement with multiple `else if` conditions look like? Why?

## 16.3 Expressions

Collectively, the data structures present in the AST are called expressions. More precisely, an **expression** is any member of the set of base types created by parsing code: constant scalars, symbols, and call objects. These are the data structures used to represent captured code from `expr()`, and so `is_expression(expr(...))` is always true^{56}. In the subsequent sections, we’ll come back to define each of these types precisely, and show you how to create, inspect, and modify them.

Expressions use a couple of data structures that we’ll come back to later. Pairlists are used in one place, but behave identically to lists; we’ll discuss them in Section 16.6.1. There is also a special symbol used to represent missing arguments; we’ll come back to it in Section 16.6.2.

NB: In base R documentation “expression” is used to mean two things. As well as the definition above, expression is also used to refer to the type of object returned by `expression()` and `parse()`, which are basically lists of expressions as defined above. In this book I’ll call these **expression vectors**, and I’ll come back to them in Section 16.6.3.

### 16.3.1 Constants

Constants are the simplest component of the AST, and use data structures that you learned about in Chapter **??**. More precisely, a **constant** is either `NULL` or a length-1 atomic vector^{57} like `TRUE`, `1L`, `2.5` or `"x"`. You can test for a constant with `rlang::is_syntactic_literal()`.

^{58} See Section 3.2.1 for the conventions for creating these scalars.

Constants are “self-quoting” in the sense that the expression used to represent a constant is the constant itself:

```
identical(expr(TRUE), TRUE)
#> [1] TRUE
identical(expr(1), 1)
#> [1] TRUE
identical(expr(2L), 2L)
#> [1] TRUE
identical(expr("x"), "x")
#> [1] TRUE
```
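
It can help to contrast constants with code that merely looks constant. The following base R sketch (using `quote()` in place of `expr()`, which is an assumption of this example) shows a few cases that the parser turns into calls or symbols rather than constants.

```r
# Constants are self-quoting: the captured code is the value itself.
identical(quote(TRUE), TRUE)
identical(quote("x"), "x")
is.null(quote(NULL))

# These look like literals but are parsed as calls.
is.call(quote(-1))        # unary minus applied to the constant 1
is.call(quote(1:3))       # a call to `:`
is.call(quote(c(1, 2)))   # a call to c()

# T is a symbol, not a logical constant.
is.symbol(quote(T))
```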

### 16.3.2 Symbols

A **symbol** represents the name of an object like `x`, `mtcars`, or `mean`. In base R, the terms symbol and name are used interchangeably (i.e. `is.name()` is identical to `is.symbol()`), but in this book I use symbol consistently because “name” has many other non-technical meanings. You can test for a symbol with `is.symbol()`.

You can create a symbol in two ways: by capturing code that references an object with `expr()`, or by turning a string into a symbol with `sym()`:

```
expr(x)
#> x
sym("x")
#> x
```

You can turn a symbol back into a string with `as.character()` or `as_string()`. `as_string()` has the advantage of clearly signalling that you’ll get a character vector of length 1.

```
as_string(expr(x))
#> [1] "x"
```

You can recognise a symbol because it’s printed without quotes, and `str()` tells you that it’s a symbol:

```
str(expr(x))
#> symbol x
```

The symbol type is not vectorised, i.e. a symbol is always length 1. If you want multiple symbols, you’ll need to put them in a list, using (e.g.) `rlang::syms()`.
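
For comparison, here is a rough base R sketch of the same operations. It assumes `as.symbol()` as the base analogue of `sym()` and uses `lapply()` to stand in for `syms()`; the variable names `s` and `many` are illustrative.

```r
# Create a symbol from a string and compare it with captured code.
s <- as.symbol("x")
identical(s, quote(x))

# Convert back to a string; a symbol always has length 1.
as.character(s)
length(s)

# Multiple symbols have to live in a list.
many <- lapply(c("a", "b", "c"), as.symbol)
str(many)
```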

### 16.3.3 Calls

A **call object** represents a captured function call. Call objects are vectors: the first component is the name of the function to call (usually represented as a symbol), and the remaining elements are the arguments for that call. Call objects create branches in the AST, because calls can be nested inside other calls.

You can identify a call object when printed because it looks just like a function call. Unfortunately `typeof()` and `str()` print “language”^{59} for call objects, but `is.call()` returns `TRUE`:

```
lobstr::ast(read.table("important.csv", row.names = FALSE))
#> █─read.table
#> ├─"important.csv"
#> └─row.names = FALSE
x <- expr(read.table("important.csv", row.names = FALSE))
typeof(x)
#> [1] "language"
is.call(x)
#> [1] TRUE
```

#### 16.3.3.1 Subsetting

Calls generally behave like lists, i.e. you can use standard subsetting tools. The first element of the call object is the function to call, which is usually a symbol:

```
x[[1]]
#> read.table
is_symbol(x[[1]])
#> [1] TRUE
```

The primary exception to this rule occurs when you use `::` to call a function in a specific package. In that case the first element will be another call:

```
lobstr::ast(base::read.csv("important.csv"))
#> █─█─`::`
#> │ ├─base
#> │ └─read.csv
#> └─"important.csv"
```

(More generally, this is an example of calling a function factory, a function that returns a function, the topic of Chapter 9.)

The remainder of the elements are the arguments:

```
as.list(x[-1])
#> [[1]]
#> [1] "important.csv"
#>
#> $row.names
#> [1] FALSE
```

You can extract individual arguments with `[[` or `$` (if named):

```
x[[2]]
#> [1] "important.csv"
x$row
#> [1] FALSE
```

You can determine the number of arguments in a call object by subtracting 1 from its length:

```
length(x) - 1
#> [1] 2
```

Extracting specific arguments from calls is challenging because of R’s flexible rules for argument matching: an argument could potentially be in any location, with the full name, with an abbreviated name, or with no name. To work around this problem, you can use `rlang::call_standardise()`, which standardises all arguments to use the full name:

```
rlang::call_standardise(x)
#> read.table(file = "important.csv", row.names = FALSE)
```

(Note that if the function uses `...` it’s not possible to standardise all arguments.)
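
Base R offers a similar facility in `match.call()`, which can be applied to a captured call when you supply the function definition explicitly. The sketch below is an illustration under that assumption rather than a drop-in replacement for the rlang function; `std` is a name made up for the example. (As far as I know, recent rlang releases deprecate `call_standardise()` in favour of `call_match()`, so check the documentation of your installed version.)

```r
# A call that uses positional matching and an abbreviated argument name.
x <- quote(read.table("important.csv", row = FALSE))

# match.call() applies R's argument-matching rules against the
# formals of read.table() and returns a call with full names.
std <- match.call(definition = read.table, call = x)
std
names(std)

# Arguments can now be extracted reliably by their full name.
std$row.names
```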

Calls can be modified in the same way as lists:

```
x$header <- TRUE
x
#> read.table("important.csv", row.names = FALSE, header = TRUE)
```

#### 16.3.3.2 Function position

The first element of the call object is the **function position**. This contains the function that will be called when the object is evaluated, and is usually a symbol^{60}:

```
lobstr::ast(foo())
#> █─foo
```

Note that while R allows you to surround the name of the function with quotes, the parser converts it to a symbol:

```
lobstr::ast("foo"())
#> █─foo
```

However, sometimes the function doesn’t exist in the current environment and you need to do some computation to retrieve it: for example, if the function is in another package, is a method of an R6 object, or is created by a function factory. In this case, the function position will be occupied by another call:

```
lobstr::ast(pkg::foo(1))
#> █─█─`::`
#> │ ├─pkg
#> │ └─foo
#> └─1
lobstr::ast(obj$foo(1))
#> █─█─`$`
#> │ ├─obj
#> │ └─foo
#> └─1
lobstr::ast(foo(1)(2))
#> █─█─foo
#> │ └─1
#> └─2
```

#### 16.3.3.3 Constructing

You can construct a call object from its components using `rlang::call2()`. The first argument is the name of the function to call (either as a string, a symbol, or another call). The remaining arguments will be passed along to the call:

```
call2("mean", x = expr(x), na.rm = TRUE)
#> mean(x = x, na.rm = TRUE)
call2(expr(base::mean), x = expr(x), na.rm = TRUE)
#> base::mean(x = x, na.rm = TRUE)
```

Note that infix calls created in this way still print as usual.

```
call2("<-", expr(x), 10)
#> x <- 10
```
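
For readers who want to see what is happening underneath, calls can also be assembled with the base functions `call()` and `as.call()`. This is a minimal sketch; the names `cl1`, `cl2`, and `cl3` are illustrative and not part of the original text.

```r
# as.call() takes a list whose first element is the function position.
cl1 <- as.call(list(as.symbol("mean"), x = quote(x), na.rm = TRUE))

# call() takes the function name as a string, followed by the arguments.
cl2 <- call("mean", x = quote(x), na.rm = TRUE)

identical(cl1, cl2)
identical(cl1, quote(mean(x = x, na.rm = TRUE)))

# The function position can itself be a call, such as base::mean.
cl3 <- as.call(list(quote(base::mean), quote(x)))
deparse(cl3)

# Infix calls built this way still print in infix form.
call("<-", quote(x), 10)
```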

Using `call2()` to create complex expressions is a bit clunky. You’ll learn another technique in Chapter 17.

### 16.3.4 Vocabulary

| | `str()` | `typeof()` |
|---|---|---|
| Scalar constant | `logi`/`int`/`num`/`chr` | `logical`/`integer`/`double`/`character` |
| Symbol | `symbol` | `symbol` |
| Call object | `language` | `language` |
| Pairlist | Dotted pair list | `pairlist` |
| Expression vector | `expression()` | `expression` |

| | base | rlang |
|---|---|---|
| Scalar constant | - | `is_syntactic_literal()` |
| Symbol | `is.symbol()` | `is_symbol()` |
| Call object | `is.call()` | `is_call()` |
| Pairlist | `is.pairlist()` | `is_pairlist()` |
| Expression vector | `is.expression()` | - |

### 16.3.5 Exercises

1. Which two of the six types of atomic vector can’t appear in an expression? Why? Why can’t you create an expression that contains an atomic vector of length greater than one?

2. What happens when you subset a call object to remove the first element, e.g. `expr(read.csv("foo.csv", header = TRUE))[-1]`? Why?

3. Describe the differences between the following call objects.

       x <- 1:10
       call2(median, x, na.rm = TRUE)
       call2(expr(median), x, na.rm = TRUE)
       call2(median, expr(x), na.rm = TRUE)
       call2(expr(median), expr(x), na.rm = TRUE)

4. `rlang::call_standardise()` doesn’t work so well for the following calls. Why? What makes `mean()` special?

       call_standardise(quote(mean(1:10, na.rm = TRUE)))
       #> mean(x = 1:10, na.rm = TRUE)
       call_standardise(quote(mean(n = T, 1:10)))
       #> mean(x = 1:10, n = T)
       call_standardise(quote(mean(x = 1:10, , TRUE)))
       #> mean(x = 1:10, , TRUE)

5. Why does this code not make sense?

       x <- expr(foo(x = 1))
       names(x) <- c("x", "")

6. Construct the expression `if(x > 1) "a" else "b"` using multiple calls to `call2()`. How does the code structure reflect the structure of the AST?

## 16.4 Parsing and grammar

We’ve talked a lot about expressions and the AST, but not about how expressions are created from code that you type. The process by which a computer language takes a string (like `"x + y"`) and constructs an expression is called **parsing**, and is governed by a set of rules known as a **grammar**. In this section, we’ll use `lobstr::ast()` to explore some of the details of R’s grammar, and then see how you can transform back and forth between expressions and strings.

### 16.4.1 Operator precedence

Infix functions introduce ambiguity in a way that prefix functions do not^{61}. The parser has to resolve two sources of ambiguity when parsing infix operators. First, what does `1 + 2 * 3` yield? Do you get 9 (i.e. `(1 + 2) * 3`), or 7 (i.e. `1 + (2 * 3)`)? In other words, which of the two possible parse trees does R use?

Programming languages use conventions called **operator precedence** to resolve this ambiguity. We can use `ast()` to see what R does:

```
lobstr::ast(1 + 2 * 3)
#> █─`+`
#> ├─1
#> └─█─`*`
#>   ├─2
#>   └─3
```

Predicting the precedence of arithmetic operations is usually easy because it’s drilled into you in school and is consistent across the vast majority of programming languages. Predicting the precedence of other operators is harder. There’s one particularly surprising case in R: `!` has a much lower precedence (i.e. it binds less tightly) than you might expect. This allows you to write useful operations like:

```
lobstr::ast(!x %in% y)
#> █─`!`
#> └─█─`%in%`
#>   ├─x
#>   └─y
```

R has over 30 infix operators divided into 18 precedence groups. While the details are described in `?Syntax`, very few people have memorised the complete ordering. If there’s any confusion, use parentheses!

```
lobstr::ast((1 + 2) * 3)
#> █─`*`
#> ├─█─`(`
#> │ └─█─`+`
#> │   ├─1
#> │   └─2
#> └─3
```

Note the appearance of the parentheses in the AST as a call to the `(` function.
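
If lobstr is not at hand, the same facts about precedence can be read off by subsetting captured calls in base R. This is a small sketch; `e1`, `e2`, and `e3` are names chosen for the illustration.

```r
# The function at the root of the tree is the operator applied last.
e1 <- quote(1 + 2 * 3)
as.character(e1[[1]])        # "+": the multiplication is nested inside
eval(e1)

# Parentheses change the root and show up as a call to `(`.
e2 <- quote((1 + 2) * 3)
as.character(e2[[1]])        # "*"
as.character(e2[[2]][[1]])   # "("
eval(e2)

# `!` binds less tightly than %in%, so it ends up at the root.
e3 <- quote(!x %in% y)
as.character(e3[[1]])
as.character(e3[[2]][[1]])
```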

### 16.4.2 Associativity

Another source of ambiguity is introduced by repeated usage of the same infix function. For example, is `1 + 2 + 3` equivalent to `(1 + 2) + 3` or to `1 + (2 + 3)`? This normally doesn’t matter because `x + (y + z) == (x + y) + z`, i.e. addition is associative, but is needed because some S3 classes define `+` in a non-associative way. For example, ggplot2 overloads `+` to build up a complex plot from simple pieces; this is non-associative because earlier layers are drawn underneath later layers (i.e. `geom_point() + geom_smooth()` does not yield the same plot as `geom_smooth() + geom_point()`).

In R, most operators are **left-associative**, i.e. the operations on the left are evaluated first:

```
lobstr::ast(1 + 2 + 3)
#> █─`+`
#> ├─█─`+`
#> │ ├─1
#> │ └─2
#> └─3
```

There are two exceptions: exponentiation and assignment.

```
lobstr::ast(2^2^3)
#> █─`^`
#> ├─2
#> └─█─`^`
#>   ├─2
#>   └─3
lobstr::ast(x <- y <- z)
#> █─`<-`
#> ├─x
#> └─█─`<-`
#>   ├─y
#>   └─z
```
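
Associativity has visible numeric consequences for operators that are not associative, such as subtraction and exponentiation. The following base R sketch checks both the tree structure and the resulting values; the variable names are illustrative.

```r
# Left-associative: the left-hand operation is nested deepest.
left <- quote(1 + 2 + 3)
identical(left[[2]], quote(1 + 2))

# Right-associative: the right-hand operation is nested deepest.
right <- quote(2^2^3)
identical(right[[3]], quote(2^3))
assign_chain <- quote(x <- y <- z)
identical(assign_chain[[3]], quote(y <- z))

# The grouping changes the value when the operator is not associative.
2^2^3          # same as 2^(2^3)
(2^2)^3
10 - 4 - 3     # same as (10 - 4) - 3
10 - (4 - 3)
```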

### 16.4.3 Parsing and deparsing

Most of the time you type code into the console, and R takes care of turning the characters you’ve typed into an AST. But occasionally you have code stored in a string, and you want to parse it yourself. You can do so using `rlang::parse_expr()`:

```
x1 <- "y <- x + 10"
lobstr::ast(!!x1)
#> "y <- x + 10"
x2 <- rlang::parse_expr(x1)
x2
#> y <- x + 10
lobstr::ast(!!x2)
#> █─`<-`
#> ├─y
#> └─█─`+`
#>   ├─x
#>   └─10
```

`parse_expr()` always returns a single expression. If you have multiple expressions separated by `;` or `\n`, you’ll need to use `rlang::parse_exprs()`. It returns a list of expressions:

```
x3 <- "a <- 1; a + 1"
rlang::parse_exprs(x3)
#> [[1]]
#> a <- 1
#>
#> [[2]]
#> a + 1
```

If you find yourself working with strings containing code very frequently, you should reconsider your process. Read Chapter 17 and consider whether you can instead generate expressions more safely using quasiquotation.

The base equivalent to `parse_exprs()` is `parse()`. It is a little harder to use because it’s specialised for parsing R code stored in files. You need to supply your string to the `text` argument. It returns an expression vector, discussed in Section 16.6.3, which I recommend turning into a list:

```
as.list(parse(text = x1))
#> [[1]]
#> y <- x + 10
```

The inverse of parsing is **deparsing**: given an expression, you want the string that would generate it. This happens automatically when you print an expression, and you can get the string yourself with `rlang::expr_text()`:

```
z <- expr(y <- x + 10)
expr_text(z)
#> [1] "y <- x + 10"
```

Parsing and deparsing are not perfectly symmetric because parsing generates an *abstract* syntax tree. This means we lose backticks around ordinary names, comments, and whitespace:

```
cat(expr_text(expr({
  # This is a comment
  x <- `x` + 1
})))
#> {
#>     x <- x + 1
#> }
```

Be careful when using the base R equivalent, `deparse()`: it returns a character vector with one element for each line. Whenever you use it, remember that the length of the output might be greater than one, and plan accordingly.
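
One way to “plan accordingly” is to collapse the pieces into a single string before using them. The sketch below is an assumption-laden illustration: it builds a deliberately long call from `letters`, relies on `str2lang()` (available in R 3.6.0 and later), and uses made-up names such as `long_src` and `pieces`.

```r
# Build a call that is too wide to deparse onto one line.
long_src <- paste0("g(", paste(letters, collapse = " + "), ")")
e <- str2lang(long_src)

# deparse() splits the output into several strings.
pieces <- deparse(e)
length(pieces)

# Collapse into one string so downstream code sees length 1.
one <- paste(pieces, collapse = "\n")
length(one)

# Parsing the collapsed string recovers the original expression.
identical(str2lang(one), e)
```

Collapsing with a newline keeps the text valid R code, so the round trip through the parser still works.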

### 16.4.4 Exercises

1. R uses parentheses in two slightly different ways as illustrated by these two calls:

       f((1))
       `(`(1 + 1)

   Compare and contrast the two uses by referencing the AST.

2. `=` can also be used in two ways. Construct a simple example that shows both uses.

3. Does `-2^2` yield 4 or -4? Why?

4. What does `!1 + !1` return? Why?

5. Why does `x1 <- x2 <- x3 <- 0` work? Describe the two reasons.

6. Compare the ASTs of `x + y %+% z` and `x ^ y %+% z`. What have you learned about the precedence of custom infix functions?

7. What happens if you call `parse_expr()` with a string that generates multiple expressions, e.g. `parse_expr("x + 1; y + 1")`?

8. What happens if you attempt to parse an invalid expression, e.g. `"a +"` or `"f())"`?

9. `deparse()` produces vectors when the input is long. For example, the following call produces a vector of length two:

       expr <- expr(g(a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z))
       deparse(expr)

   What does `expr_text()` do instead?

10. `pairwise.t.test()` assumes that `deparse()` always returns a length one character vector. Can you construct an input that violates this expectation? What happens?

## 16.5 Walking the AST with recursive functions

To conclude the chapter I’m going to use everything you’ve learned about ASTs to solve more complicated problems. The inspiration comes from the base codetools package, which provides two interesting functions:

- `findGlobals()` locates all global variables used by a function. This can be useful if you want to check that your function doesn’t inadvertently rely on variables defined in its parent environment.
- `checkUsage()` checks for a range of common problems including unused local variables, unused parameters, and the use of partial argument matching.

Getting all of the details of these functions correct is fiddly, so we won’t fully develop the ideas. Instead we’ll focus on the big underlying idea: recursion on the AST. Recursive functions are a natural fit to tree-like data structures because a recursive function is made up of two parts that correspond to the two parts of the tree:

- The **recursive case** handles the nodes in the tree. Typically, you’ll do something to each child of a node, usually calling the recursive function again, and then combine the results back together again. For expressions, you’ll need to handle calls and pairlists (function arguments).
- The **base case** handles the leaves of the tree. The base cases ensure that the function eventually terminates, by solving the simplest cases directly. For expressions, you need to handle symbols and constants in the base case.

To make this pattern easier to see, we’ll need two helper functions. First we define `expr_type()`, which will return “constant” for constants, “symbol” for symbols, “call” for calls, “pairlist” for pairlists, and the “type” of anything else:

```
expr_type <- function(x) {
  if (rlang::is_syntactic_literal(x)) {
    "constant"
  } else if (is.symbol(x)) {
    "symbol"
  } else if (is.call(x)) {
    "call"
  } else if (is.pairlist(x)) {
    "pairlist"
  } else {
    typeof(x)
  }
}
expr_type(expr("a"))
#> [1] "constant"
expr_type(expr(f(1, 2)))
#> [1] "call"
```

We’ll couple this with a wrapper around the switch function:

```
switch_expr <- function(x, ...) {
  switch(expr_type(x),
    ...,
    stop("Don't know how to handle type ", typeof(x), call. = FALSE)
  )
}
```

With these two functions in hand, the basic template for any function that walks the AST is as follows:

```
recurse_call <- function(x) {
  switch_expr(x,
    # Base cases
    symbol = ,
    constant = ,
    # Recursive cases
    call = ,
    pairlist =
  )
}
```

Typically, solving the base case is easy, so we’ll do that first, then check the results. The recursive cases are a little more tricky. Typically you’ll think about the structure of the final output and then find the correct purrr function to produce it. To that end, make sure you’re familiar with Functionals before continuing.
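
Before filling in the template with rlang and purrr, it may help to see the base-case/recursive-case split in a dependency-free form. The function below, `collect_symbols()`, is a hypothetical helper written only for this illustration: it gathers the names of all symbols in an expression using base R, and its output is compared with the built-in `all.names()`.

```r
collect_symbols <- function(x) {
  if (is.symbol(x)) {
    # Base case: a symbol contributes its own name.
    as.character(x)
  } else if (is.call(x) || is.pairlist(x)) {
    # Recursive case: visit every child and flatten the results.
    as.character(unlist(lapply(as.list(x), collect_symbols), use.names = FALSE))
  } else {
    # Base case: constants contribute nothing.
    character()
  }
}

e <- quote(y <- f(x, g(z, 1), "a"))
collect_symbols(e)

# Base R's all.names() performs the same traversal.
identical(collect_symbols(e), all.names(e))
```

This sketch does not handle function definitions whose formals contain missing arguments; that case needs the extra care discussed in Section 16.6.2.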

### 16.5.1 Finding F and T

We’ll start simple with a function that determines whether a function uses the logical abbreviations `T` and `F`: it will return `TRUE` if it finds a logical abbreviation, and `FALSE` otherwise. Using `T` and `F` is generally considered to be poor coding practice, and is something that `R CMD check` will warn about.

Let’s first compare the AST for `T` vs. `TRUE`:

```
ast(TRUE)
#> TRUE
ast(T)
#> T
```

`TRUE` is parsed as a logical vector of length one, while `T` is parsed as a name. This tells us how to write our base cases for the recursive function: a constant is never a logical abbreviation, and a symbol is an abbreviation if it’s “F” or “T”:

```
logical_abbr_rec <- function(x) {
  switch_expr(x,
    constant = FALSE,
    symbol = as_string(x) %in% c("F", "T")
  )
}
logical_abbr_rec(expr(TRUE))
#> [1] FALSE
logical_abbr_rec(expr(T))
#> [1] TRUE
```

I’ve written the `logical_abbr_rec()` function assuming that the input will be an expression, as this will make the recursive operation simpler. However, when writing a recursive function it’s common to write a wrapper that provides defaults or makes the function a little easier to use. Here we’ll typically make a wrapper that quotes its input (we’ll learn more about that in the next chapter), so we don’t need to use `expr()` every time.

```
logical_abbr <- function(x) {
  logical_abbr_rec(enexpr(x))
}
logical_abbr(T)
#> [1] TRUE
logical_abbr(FALSE)
#> [1] FALSE
```

Next we need to implement the recursive cases. Here it’s simple because we want to do the same thing for calls and for pairlists: recursively apply the function to each subcomponent, and return `TRUE` if any subcomponent contains a logical abbreviation. This is made easy by `purrr::some()`, which iterates over a list and returns `TRUE` if the predicate function is true for any element.

```
logical_abbr_rec <- function(x) {
  switch_expr(x,
    # Base cases
    constant = FALSE,
    symbol = as_string(x) %in% c("F", "T"),
    # Recursive cases
    call = ,
    pairlist = purrr::some(x, logical_abbr_rec)
  )
}
logical_abbr(mean(x, na.rm = T))
#> [1] TRUE
logical_abbr(function(x, na.rm = T) FALSE)
#> [1] TRUE
```

### 16.5.2 Finding all variables created by assignment

`logical_abbr()` is very simple: it only returns a single `TRUE` or `FALSE`. The next task, listing all variables created by assignment, is a little more complicated. We’ll start simply, and then make the function progressively more rigorous.

We start by looking at the AST for assignment:

```
ast(x <- 10)
#> █─`<-`
#> ├─x
#> └─10
```

Assignment is a call object where the first element is the symbol `<-`, the second is the name of the variable, and the third is the value to be assigned.

Next, we need to decide what data structure we’re going to use for the results. Here I think it will be easiest if we return a character vector. If we return symbols, we’ll need to use a `list()` and that makes things a little more complicated.

With that in hand we can start by implementing the base cases and providing a helpful wrapper around the recursive function. The base cases here are really simple!

```
find_assign_rec <- function(x) {
  switch_expr(x,
    constant = ,
    symbol = character()
  )
}
find_assign <- function(x) find_assign_rec(enexpr(x))
find_assign("x")
#> character(0)
find_assign(x)
#> character(0)
```

Next we implement the recursive cases. This is made easier by a function that should exist in purrr, but currently doesn’t. `flat_map_chr()` expects `.f` to return a character vector of arbitrary length, and flattens all results into a single character vector.

```
flat_map_chr <- function(.x, .f, ...) {
  purrr::flatten_chr(purrr::map(.x, .f, ...))
}
flat_map_chr(letters[1:3], ~ rep(., sample(3, 1)))
#> [1] "a" "b" "b" "b" "c" "c"
```

The recursive case for pairlists is simple: we iterate over every element of the pairlist (i.e. each function argument) and combine the results. The case for calls is a little bit more complex: if this is a call to `<-` then we should return the second element of the call:

```
find_assign_rec <- function(x) {
  switch_expr(x,
    # Base cases
    constant = ,
    symbol = character(),
    # Recursive cases
    pairlist = flat_map_chr(as.list(x), find_assign_rec),
    call = {
      if (is_call(x, "<-")) {
        as_string(x[[2]])
      } else {
        flat_map_chr(as.list(x), find_assign_rec)
      }
    }
  )
}
find_assign(a <- 1)
#> [1] "a"
find_assign({
  a <- 1
  {
    b <- 2
  }
})
#> [1] "a" "b"
```

Now we need to make our function more robust by coming up with examples intended to break it. What happens when we assign to the same variable multiple times?

```
find_assign({
  a <- 1
  a <- 2
})
#> [1] "a" "a"
```

It’s easiest to fix this at the level of the wrapper function:

```
find_assign <- function(x) unique(find_assign_rec(enexpr(x)))
find_assign({
  a <- 1
  a <- 2
})
#> [1] "a"
```

What happens if we have nested calls to `<-`? Currently we only return the first. That’s because when `<-` occurs we immediately terminate recursion.

```
find_assign({
  a <- b <- c <- 1
})
#> [1] "a"
```

Instead we need to take a more rigorous approach. I think it’s best to keep the recursive function focused on the tree structure, so I’m going to extract out `find_assign_call()` into a separate function.

```
find_assign_call <- function(x) {
  if (is_call(x, "<-") && is_symbol(x[[2]])) {
    lhs <- as_string(x[[2]])
    children <- as.list(x)[-1]
  } else {
    lhs <- character()
    children <- as.list(x)
  }
  c(lhs, flat_map_chr(children, find_assign_rec))
}
find_assign_rec <- function(x) {
  switch_expr(x,
    # Base cases
    constant = ,
    symbol = character(),
    # Recursive cases
    pairlist = flat_map_chr(x, find_assign_rec),
    call = find_assign_call(x)
  )
}
find_assign(a <- b <- c <- 1)
#> [1] "a" "b" "c"
find_assign(system.time(x <- print(y <- 5)))
#> [1] "x" "y"
```

While the complete version of this function is quite complicated, it’s important to remember we wrote it by working our way up by writing simple component parts.

### 16.5.3 Exercises

1. `logical_abbr()` returns `TRUE` for `T(1, 2, 3)`. How could you modify `logical_abbr_rec()` so that it ignores function calls that use `T` or `F`?

2. `logical_abbr()` works with expressions. It currently fails when you give it a function. Why? How could you modify `logical_abbr()` to make it work? What components of a function will you need to recurse over?

       logical_abbr(function(x = TRUE) {
         g(x + T)
       })

3. Modify `find_assign()` to also detect assignment using replacement functions, i.e. `names(x) <- y`.

4. Write a function that extracts all calls to a specified function.

## 16.6 Specialised data structures

There are two data structures and one special symbol that we need to cover for the sake of completeness. They are not usually important in practice.

### 16.6.1 Pairlists

Pairlists are a remnant of R’s past and have been replaced by lists almost everywhere. The only place you are likely to see pairlists in R^{62} is when working with calls to the “function” function, as the formal arguments to a function are stored in a pairlist:

```
f <- expr(function(x, y = 10) x + y)
args <- f[[2]]
args
#> $x
#>
#>
#> $y
#> [1] 10
typeof(args)
#> [1] "pairlist"
```

Fortunately, whenever you encounter a pairlist, you can treat it just like a regular list:

```
pl <- pairlist(x = 1, y = 2)
length(pl)
#> [1] 2
pl$x
#> [1] 1
```

Behind the scenes pairlists are implemented using a different data structure, a linked list instead of an array. That makes subsetting a pairlist much slower than subsetting a list, but this has little practical impact.
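
You can also reach the same pairlist without capturing any code, by asking an existing function for its formals. This is a small base R sketch; the function `f` and the variable names are illustrative.

```r
f <- function(x, y = 10) x + y

# formals() returns the arguments of a closure as a pairlist.
args <- formals(f)
typeof(args)
names(args)

# List-style subsetting works as usual.
args$y
length(args)

# Convert to a regular list if you prefer to work with one.
args_list <- as.list(args)
typeof(args_list)
```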

### 16.6.2 Missing arguments

The special symbol that needs a little extra discussion is the empty symbol, which is used to represent missing arguments (not missing values!). You only need to care about the missing symbol if you’re programmatically creating functions with missing arguments; we’ll come back to that in Section 17.4.3.

You can make an empty symbol with `missing_arg()` (or `expr()`):

```
missing_arg()
typeof(missing_arg())
#> [1] "symbol"
```

An empty symbol doesn’t print anything, so you can check if you have one with `rlang::is_missing()`:

```
is_missing(missing_arg())
#> [1] TRUE
```

And you’ll find them in the wild in function formals:

```
f <- expr(function(x, y = 10) x + y)
args <- f[[2]]
is_missing(args[[1]])
#> [1] TRUE
```

The empty symbol has a peculiar property: if you bind it to a variable, then access that variable, you will get an error:

```
m <- missing_arg()
m
#> Error in eval(expr, envir, enclos):
#> argument "m" is missing, with no default
```

This is the same error you get when referring to a missing argument inside a function, and indeed this is the magic that powers missing arguments.
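
The connection to ordinary function arguments can be sketched in base R. The example assumes `quote(expr = )` as the base way of writing the empty symbol, and the names `f`, `g`, and `res` are introduced only for this illustration.

```r
# The empty symbol is a symbol like any other, just with an empty name.
typeof(quote(expr = ))

# A formal argument without a default is stored as the empty symbol.
f <- function(x, y = 10) x + y
identical(formals(f)$x, quote(expr = ))

# missing() reports whether an argument is still bound to it.
g <- function(x) missing(x)
g()
g(1)

# Binding the empty symbol to a variable and then reading the
# variable signals an error, which is captured here as a message.
res <- tryCatch(
  {
    m <- quote(expr = )
    m
  },
  error = function(e) conditionMessage(e)
)
res
```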

If you need to preserve the missingness of a variable, `rlang::maybe_missing()` is often helpful. It allows you to refer to a potentially missing variable without triggering the error.

### 16.6.3 Expression vectors

Finally, we need to briefly discuss the expression vector. Expression vectors are only produced by two base functions: `expression()` and `parse()`:

```
exp1 <- parse(text = c("
x <- 4
x
"))
exp2 <- expression(x <- 4, x)
typeof(exp1)
#> [1] "expression"
typeof(exp2)
#> [1] "expression"
exp1
#> expression(x <- 4, x)
exp2
#> expression(x <- 4, x)
```

Like calls and pairlists, expression vectors behave like lists:

```
length(exp1)
#> [1] 2
exp1[[1]]
#> x <- 4
```

Conceptually, an expression vector is just a list of expressions. The only difference is that calling `eval()` on an expression vector evaluates each individual expression. I don’t believe this advantage merits introducing a new data structure, so instead of expression vectors I just use lists of expressions.
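
The difference in how `eval()` treats the two containers can be seen in a short base R sketch. The variable names `ev`, `exprs`, and `out` are illustrative.

```r
ev <- expression(a <- 4, a * 2)
length(ev)

# eval() on an expression vector evaluates each element in turn
# and returns the value of the last one.
eval(ev)

# With a plain list of expressions you loop explicitly.
exprs <- as.list(ev)
out <- NULL
for (e in exprs) {
  out <- eval(e)
}
out
```

The explicit loop is slightly more verbose, but it makes the order of evaluation visible and works with any list of expressions.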

^{54} For more complex code, you can also use RStudio’s tree viewer, which doesn’t obey quite the same graphical conventions but allows you to interactively explore the AST. Try it out with `View(expr(f(x, "y", 1)))`.

^{55} The names of non-prefix functions are non-syntactic so I show them with `` ` ``, as in Section 2.2.1.

^{56} It is *possible* to insert any other base type into an expression, but this is unusual and only needed in rare circumstances. We’ll come back to the idea in Section 17.4.7.

^{57} ^{58} Technically, the R language does not possess scalars, and everything that looks like a scalar is actually a vector of length one. This, however, is mainly a theoretical distinction, and blurring the distinction between scalar and length-1 vector is unlikely to harm your code.

^{59} Avoid `is.language()`, which returns `TRUE` for symbols, calls, and expression vectors.

^{60} Peculiarly, it can also be a number, as in the expression `3()`. But this call will always fail to evaluate because a number is not a function.

^{61} This ambiguity does not arise without infix operators, which can be considered an advantage of purely prefix and postfix languages. It’s interesting to compare a simple arithmetic operation in Lisp (prefix) and Forth (postfix). In Lisp you’d write `(* (+ 1 2) 3)`; this avoids ambiguity by requiring parentheses everywhere. In Forth, you’d write `1 2 + 3 *`; this doesn’t require any parentheses, but does require more thought when reading.

^{62} If you’re working in C, you’ll encounter pairlists more often. For example, call objects are also implemented using pairlists.