12 Function operators
In this chapter, you’ll learn about function operators (FOs). A function operator is a function that takes one (or more) functions as input and returns a function as output. A simple example is chatty(): it wraps a function, making a new function that prints out its first argument. You might create a function like this because it gives you a window into how functionals work.
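A minimal sketch of what chatty() might look like, based only on the description above (wrap a function, print its first argument). The body, the helper square(), and the inputs are assumptions, and base vapply() stands in for whichever functional you want to observe.

```r
# Sketch: wrap `f` so that each call reports its first argument before running.
chatty <- function(f) {
  force(f)  # evaluate `f` now so the closure captures the intended function
  function(x, ...) {
    cat("Processing ", x, "\n", sep = "")
    f(x, ...)
  }
}

square <- function(x) x^2   # hypothetical function to wrap
s <- c(3, 2, 1)

res <- vapply(s, chatty(square), numeric(1))
#> Processing 3
#> Processing 2
#> Processing 1
res
#> [1] 9 4 1
```

The printed lines show the order in which the functional visits its inputs, without changing the values it returns.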
Function operators are closely related to function factories; indeed, they’re just function factories that take a function as an input. Like factories, they are built from the same building blocks and there’s nothing you can’t do without them, but they often let you factor out complexity to make your code more readable and reusable. Function operators are typically paired with functionals. If you’re using a for-loop, there’s rarely a reason to use an FO, as it will make your code more complex for little gain.
If you’re familiar with Python, decorators are just another name for function operators.
Function operators are a type of function factory, so make sure you’re familiar with Section 6.2 before you go on.
We’ll use a couple of functionals from purrr that you learned about in Chapter 10, as well as some function operators that you’ll learn about below. We’ll use the memoise package for a useful FO.
12.2 Existing FOs
Two extremely useful function operators will both help you solve common recurring problems and give you a sense of what FOs can do: purrr::safely() and memoise::memoise().
12.2.1 Capturing errors with purrr::safely()
One advantage of for-loops is that if one of the iterations fails, you can still access all the results up to the failure:
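A small sketch of that behaviour. The list x is an assumption: its values are chosen to be consistent with the sums printed later in this section (1.39, 1.27, 2.17, then an error). The try() wrapper is only there so the script keeps running past the error.

```r
# Assumed input: three numeric vectors and one element that makes sum() fail.
x <- list(
  c(0.512, 0.165, 0.717),
  c(0.064, 0.781, 0.427),
  c(0.890, 0.785, 0.495),
  "oops"
)

out <- rep(NA_real_, length(x))

# The loop still stops at the failing iteration; try() just swallows the error.
try(
  for (i in seq_along(x)) {
    out[[i]] <- sum(x[[i]])
  },
  silent = TRUE
)

out
#> [1] 1.394 1.272 2.170    NA
```

The first three results survive because each one was stored before the fourth iteration failed.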
If you run the same code with a functional, you get no output and it can be hard to figure out where the problem lies:
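For comparison, a sketch of the same computation with a functional, using the same assumed list x. The call is wrapped in try() so the script can inspect what happened; without it, the error would simply stop execution.

```r
library(purrr)

# Assumed input, as in the for-loop sketch.
x <- list(
  c(0.512, 0.165, 0.717),
  c(0.064, 0.781, 0.427),
  c(0.890, 0.785, 0.495),
  "oops"
)

# The whole call fails, so no partial results are returned.
res <- try(map_dbl(x, sum), silent = TRUE)
inherits(res, "try-error")
#> [1] TRUE
```

All that comes back is an error object; the three sums that did succeed are lost.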
purrr::safely() provides a tool to help with this problem. safely() is a function operator that transforms a function to turn errors into data. (You can learn the basic idea that makes it work in Section 8.6.2.) Let’s start by taking a look at it outside of a functional:
```r
safe_sum <- safely(sum)

str(safe_sum(x[[1]]))
#> List of 2
#>  $ result: num 1.39
#>  $ error : NULL
str(safe_sum(x[[4]]))
#> List of 2
#>  $ result: NULL
#>  $ error :List of 2
#>   ..$ message: chr "invalid 'type' (character) of argument"
#>   ..$ call   : language sum(..., na.rm = na.rm)
#>   ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
```
A function transformed by safely() always returns a list with two elements, result and error. If the function runs successfully, error is NULL and result contains the result; if the function fails, result is NULL and error contains the error.
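The idea can be sketched in base R with tryCatch(). The name safely_sketch() is hypothetical and this is not purrr’s actual implementation; it only illustrates how an error can be turned into a value with the result/error shape described above.

```r
# Sketch: return list(result, error) instead of throwing.
safely_sketch <- function(f) {
  force(f)
  function(...) {
    tryCatch(
      list(result = f(...), error = NULL),
      error = function(e) list(result = NULL, error = e)
    )
  }
}

safe_sum <- safely_sketch(sum)

ok <- safe_sum(c(1, 2, 3))
ok$result
#> [1] 6
is.null(ok$error)
#> [1] TRUE

bad <- safe_sum("oops")
is.null(bad$result)
#> [1] TRUE
conditionMessage(bad$error)
#> [1] "invalid 'type' (character) of argument"
```

Because both outcomes have the same shape, downstream code can treat successes and failures uniformly.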
Now let’s use safely() with a functional:

```r
out <- map(x, safely(sum))
str(out)
#> List of 4
#>  $ :List of 2
#>   ..$ result: num 1.39
#>   ..$ error : NULL
#>  $ :List of 2
#>   ..$ result: num 1.27
#>   ..$ error : NULL
#>  $ :List of 2
#>   ..$ result: num 2.17
#>   ..$ error : NULL
#>  $ :List of 2
#>   ..$ result: NULL
#>   ..$ error :List of 2
#>   .. ..$ message: chr "invalid 'type' (character) of argument"
#>   .. ..$ call   : language sum(..., na.rm = na.rm)
#>   .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
```
The output is in a slightly inconvenient form, since we have four lists, each containing a result and an error. We can make it more convenient by using purrr::transpose() to turn it “inside-out” so that we get a list of results and a list of errors:
```r
out <- transpose(map(x, safely(sum)))
str(out)
#> List of 2
#>  $ result:List of 4
#>   ..$ : num 1.39
#>   ..$ : num 1.27
#>   ..$ : num 2.17
#>   ..$ : NULL
#>  $ error :List of 4
#>   ..$ : NULL
#>   ..$ : NULL
#>   ..$ : NULL
#>   ..$ :List of 2
#>   .. ..$ message: chr "invalid 'type' (character) of argument"
#>   .. ..$ call   : language sum(..., na.rm = na.rm)
#>   .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
```
Now we can easily find the results that worked, or the inputs that failed:
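One way this lookup might be written, again using the assumed list x from earlier: a logical vector marks which calls had no error, and it then indexes both the inputs and the results.

```r
library(purrr)

# Assumed input, consistent with the output shown above.
x <- list(
  c(0.512, 0.165, 0.717),
  c(0.064, 0.781, 0.427),
  c(0.890, 0.785, 0.495),
  "oops"
)

out <- transpose(map(x, safely(sum)))

# TRUE where the call succeeded (no error recorded).
ok <- map_lgl(out$error, is.null)
ok
#> [1]  TRUE  TRUE  TRUE FALSE

# Inputs that failed.
failed_inputs <- x[!ok]

# Results that worked, flattened to a numeric vector.
good_results <- unlist(out$result[ok])
good_results
#> [1] 1.394 1.272 2.170
```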
You can use this same technique in many different situations. For example, imagine you’re fitting a set of generalised linear models (GLMs) to a list of data frames. While GLMs can sometimes fail because of optimisation problems, you’d still want to be able to try to fit all the models, and later look back at those that failed:
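A sketch of that workflow under stated assumptions: the list datasets is made up from the built-in mtcars data plus one deliberately broken data frame, and the formula in fit_model() is purely illustrative. Here the failure comes from a missing column rather than an optimisation problem, but the pattern is the same.

```r
library(purrr)

# Hypothetical data: mtcars split by cylinder count, plus a data frame that
# lacks the `wt` column, so fitting it fails.
datasets <- c(
  split(mtcars, mtcars$cyl),
  list(broken = data.frame(mpg = c(21, 22, 23)))
)

# Illustrative model; any glm() call would do.
fit_model <- function(df) {
  glm(mpg ~ wt, data = df)
}

models <- transpose(map(datasets, safely(fit_model)))
ok <- map_lgl(models$error, is.null)

# Which data frames failed to fit?
names(datasets)[!ok]
#> [1] "broken"

# The models that were fitted successfully.
fits <- models$result[ok]
length(fits)
#> [1] 3
```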
I think this is a great example of the power of combining functionals and function operators: it lets you succinctly express what you need to solve a common data analysis problem.
purrr comes with three other function operators in a similar vein:
- possibly(): returns a default value when there’s an error.
- quietly(): turns output, messages, and warning side-effects into output, messages, and warnings components of the returned list.
- auto_browser(): automatically executes browser() inside the function when there’s an error.
See their documentation for more details.
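A brief sketch of the first two in use. The list x is the assumed input from earlier, and noisy_sum() is a hypothetical helper written only to produce a message and a warning.

```r
library(purrr)

x <- list(
  c(0.512, 0.165, 0.717),
  c(0.064, 0.781, 0.427),
  c(0.890, 0.785, 0.495),
  "oops"
)

# possibly(): substitute a default value whenever the wrapped function errors.
sums <- map_dbl(x, possibly(sum, otherwise = NA_real_))
sums
#> [1] 1.394 1.272 2.170    NA

# quietly(): capture side-effects instead of emitting them.
noisy_sum <- function(v) {
  message("summing")
  warning("check input")
  sum(v)
}

q <- quietly(noisy_sum)(c(1, 2))
q$result
#> [1] 3
q$warnings
#> [1] "check input"
```

Note that possibly() gives no way to tell a genuine result from the default, so it suits cases with an obvious sentinel value such as NA.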
12.2.2 Caching computations with memoise::memoise()
An extremely handy FO is memoise::memoise(). It memoises a function, meaning that the function will remember previous inputs and return cached results. Memoisation is an example of the classic computer science tradeoff of memory versus speed. A memoised function can run much faster because it stores all of the previous inputs and outputs, at the cost of using more memory.
Let’s explore this idea with a toy function that simulates an expensive operation:
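One possible toy function, offered as a sketch: the one-second sleep stands in for the expensive work, and the random factor makes it easy to see later whether a value was recomputed or retrieved. Both details are assumptions.

```r
# Sketch of a deliberately slow function.
slow_function <- function(x) {
  Sys.sleep(1)        # pretend this is an expensive computation
  x * 10 * runif(1)   # random factor: a fresh call gives a fresh value
}

# Each call pays the full cost, even with the same argument.
t1 <- system.time(v1 <- slow_function(1))[["elapsed"]]
t2 <- system.time(v2 <- slow_function(1))[["elapsed"]]
c(t1, t2)   # both roughly one second
```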
When we memoise this function, it’s slow when we call it with new arguments. But when we call it with arguments that it’s seen before, it’s instantaneous: it retrieves the previous value of the computation.
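A sketch of that behaviour with memoise::memoise(). The toy slow_function() is an assumed stand-in for an expensive operation; the random factor in its result shows that the second call returns the cached value rather than recomputing.

```r
library(memoise)

# Assumed toy function: sleeps, then returns a value with a random component.
slow_function <- function(x) {
  Sys.sleep(1)
  x * 10 * runif(1)
}

fast_function <- memoise(slow_function)

# First call with a new argument: pays the full cost.
t_first <- system.time(v_first <- fast_function(1))[["elapsed"]]

# Second call with the same argument: answered from the cache.
t_again <- system.time(v_again <- fast_function(1))[["elapsed"]]

identical(v_first, v_again)
#> [1] TRUE
c(t_first, t_again)   # about one second, then close to zero
```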
A relatively realistic use of memoisation is computing the Fibonacci series. The Fibonacci series is defined recursively: the first two values are defined by convention, \(f(0) = 0\), \(f(1) = 1\), and then \(f(n) = f(n - 1) + f(n - 2)\) for any integer \(n \ge 2\). A naive version is slow because, for example, fib(10) computes fib(9) and fib(8), and fib(9) computes fib(8) and fib(7), and so on.
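A direct translation of the recursive definition into R, as a sketch. The number of calls grows roughly exponentially with n because the same subproblems are recomputed again and again.

```r
# Naive recursive Fibonacci: f(0) = 0, f(1) = 1, f(n) = f(n - 1) + f(n - 2).
fib <- function(n) {
  if (n < 2) return(n)
  fib(n - 2) + fib(n - 1)
}

fib(10)
#> [1] 55

# Timings depend on the machine; each step up in n costs noticeably more.
system.time(fib(23))
system.time(fib(24))
```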
Memoising fib() makes the implementation much faster because each value is computed only once:
And future calls can rely on previous computations:
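A sketch of the memoised version; the name fib2() is an assumption. The recursive calls inside the function body refer to the memoised fib2() itself, which is what lets every subproblem hit the cache.

```r
library(memoise)

# The recursion goes through the memoised function, so each value of n
# is computed at most once.
fib2 <- memoise(function(n) {
  if (n < 2) return(n)
  fib2(n - 2) + fib2(n - 1)
})

fib2(23)
#> [1] 28657

# A later call reuses the cached values for n <= 23 and only adds one new step.
fib2(24)
#> [1] 46368
```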
This is an example of dynamic programming, where a complex problem can be broken down into many overlapping subproblems, and remembering the results of a subproblem considerably improves performance.
Think carefully before memoising a function. If the function is not pure, i.e. the output does not depend only on the input, you will get misleading and confusing results. I created a subtle bug in devtools because I memoised the results of available.packages(), which is rather slow because it has to download a large file from CRAN. The available packages don’t change that frequently, but if you have an R process that’s been running for a few days, the changes can become important, and because the problem only arose in long-running R processes, the bug was very painful to find.
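The same pitfall in miniature, as a sketch: a function that reads the clock is not pure, so its memoised version keeps returning a stale value. The name now() is hypothetical; memoise::forget() is used to clear the cache.

```r
library(memoise)

# Impure function: the result depends on when it is called, not on its input.
now <- memoise(function() Sys.time())

first <- now()
Sys.sleep(1)
second <- now()

# The cached value is returned, even though a second has passed.
identical(first, second)
#> [1] TRUE

# Clearing the cache forces a fresh computation.
forget(now)
third <- now()
third > first
#> [1] TRUE
```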
- Base R provides a function operator in the form of Vectorize(). What does it do? When might you use it?
12.3 Case study: creating your own FOs
Imagine you have a named vector of URLs and you’d like to download each one to disk. That’s pretty simple with walk2() and download.file().
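A sketch of how that might look. The URLs are placeholders, and the helper path() and the wrapper download_all() are hypothetical names. The final call is left commented out so the sketch can be run without network access.

```r
library(purrr)

# Placeholder URLs; substitute your own.
urls <- c(
  first  = "https://example.com/a",
  second = "https://example.com/b"
)

# Hypothetical helper: build a file name from a root and an extension.
path <- function(root, ext) paste0(root, ".", ext)
paths <- file.path(tempdir(), path(names(urls), "html"))

# walk2() calls download.file(url, path, quiet = TRUE) for each pair.
download_all <- function(urls, paths) {
  walk2(urls, paths, download.file, quiet = TRUE)
}

# download_all(urls, paths)   # uncomment to run; needs network access
basename(paths)
#> [1] "first.html"  "second.html"
```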
This approach is fine for a handful of URLs, but as the vector gets longer, it’d be nice to add a couple more features:
- Add a small delay between each request to avoid hammering the server.
- Display a . every few URLs so that we know that the function is still working.
It’s relatively easy to add these extra features if we’re using a for loop:
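A sketch of such a loop. The wrapper download_with_loop() and its download argument are assumptions, added only so the example can be exercised offline with a stand-in; in real use the default, download.file(), would do the work. The 0.1 second delay and the dot every 10 iterations follow the values used later in this section.

```r
# Iteration, delay, progress, and downloading all live in one loop body.
download_with_loop <- function(urls, paths, download = download.file) {
  for (i in seq_along(urls)) {
    Sys.sleep(0.1)               # small delay between requests
    if (i %% 10 == 0) cat(".")   # progress dot every 10 URLs
    download(urls[[i]], paths[[i]], quiet = TRUE)
  }
  invisible(paths)
}

# Offline demonstration with a stand-in that only records what it was given.
seen <- character()
record <- function(url, path, ...) seen <<- c(seen, url)

download_with_loop(paste0("u", 1:20), paste0("p", 1:20), download = record)
#> ..
length(seen)
#> [1] 20
```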
But I think this for loop is suboptimal because it interleaves different concerns (iteration, printing, and downloading). This makes the code harder to read, and it makes the components harder to reuse in new situations. Instead, let’s see if we can use function operators to extract the two ideas and make them reusable.
First, let’s write an FO that adds a small delay. I’m going to call it delay_by() for reasons that will become clear shortly, and it has two arguments: the function to wrap, and the amount of delay to add. The actual implementation is quite simple. The main trick is forcing evaluation of all arguments as described in Section 11.2.4, because function operators are a special type of function factory:
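A minimal sketch of delay_by() following that description. The argument names f and amount are assumptions; the text only fixes what the two arguments mean.

```r
# Sketch: wrap `f` so that every call first sleeps for `amount` seconds.
delay_by <- function(f, amount) {
  force(f)        # force both arguments now, as for any function factory
  force(amount)

  function(...) {
    Sys.sleep(amount)
    f(...)
  }
}

# The wrapped function behaves like the original, just later.
system.time(runif(100))
elapsed <- system.time(delay_by(runif, 0.1)(100))[["elapsed"]]
elapsed   # at least about 0.1 seconds
```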
And we can use it with the original walk2() call by passing delay_by(download.file, 0.1) in place of download.file.
Creating a function to display the occasional dot is a little harder, because we can no longer rely on the index from the loop. We could pass the index along as another argument, but that breaks encapsulation: a concern of the progress function becomes a problem that the higher-level wrapper has to deal with. Instead, we’ll use another function factory trick (from Section 11.2.3), so that the progress wrapper can manage its own internal counter:
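A sketch of such a wrapper, assuming the name dot_every() (suggested by the later description “a dot every 10 iterations”) and the argument names f and n. The counter i lives in the enclosing environment of the returned function and is updated with <<-.

```r
# Sketch: wrap `f` so that every n-th call prints a dot.
dot_every <- function(f, n) {
  force(f)
  force(n)

  i <- 0   # private counter, one per wrapped function
  function(...) {
    i <<- i + 1
    if (i %% n == 0) cat(".")
    f(...)
  }
}

g <- dot_every(runif, 10)
for (k in 1:100) g(1)
#> ..........
```

Each call to dot_every() creates a fresh counter, so two wrapped functions never interfere with each other.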
Now we can express our original goal as:
This is starting to get a little hard to read because we are composing many function calls, and the arguments are getting spread out. One way to resolve that is to use the pipe:
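A sketch showing both forms side by side: the nested call and the piped version. The definitions of delay_by() and dot_every() are the assumed sketches described above, fake_download() is a hypothetical offline stand-in for download.file(), the URLs are placeholders, and the base pipe |> (R 4.1 or later) is used here; the magrittr pipe would read the same way.

```r
library(purrr)

delay_by <- function(f, amount) {
  force(f)
  force(amount)
  function(...) {
    Sys.sleep(amount)
    f(...)
  }
}

dot_every <- function(f, n) {
  force(f)
  force(n)
  i <- 0
  function(...) {
    i <<- i + 1
    if (i %% n == 0) cat(".")
    f(...)
  }
}

# Hypothetical stand-in so the sketch runs offline; use download.file for real.
fake_download <- function(url, destfile, ...) invisible(destfile)

urls  <- paste0("https://example.com/page", 1:20)   # placeholder URLs
paths <- file.path(tempdir(), paste0("page", 1:20, ".html"))

# Nested form: has to be read from the inside out.
nested <- delay_by(dot_every(fake_download, 10), 0.01)
walk2(urls, paths, nested)
#> ..

# Piped form: reads left to right.
piped <- fake_download |> dot_every(10) |> delay_by(0.01)
walk2(urls, paths, piped)
#> ..
```

With download.file and a delay of 0.1 in place of the stand-in values, both forms build the same wrapped function.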
The pipe works well here because I’ve carefully chosen the function names to yield an (almost) readable sentence: take
download.file then (add) a dot every 10 iterations, then delay by 0.1s. The more clearly you can express the intent of your code through function names, the more easily others (including future you!) can read and understand the code.
- Compare and contrast the for loop and walk2() approaches to downloading many URLs. Which makes it easier to see the core objects and functions? Which requires more background knowledge? What are the advantages and disadvantages in factoring out components of the problem into independent functions?
- Create an FO that reports whenever a file is created or deleted in the working directory, using setdiff(). What other global function effects might you want to track?
- Write an FO that logs a time stamp and message to a file every time a function is run.
- Modify delay_by() so that instead of delaying by a fixed amount of time, it ensures that a certain amount of time has elapsed since the function was last called. That is, if you called g <- delay_by(f, 1); g(); Sys.sleep(2); g() there shouldn’t be an extra delay.