R, at its heart, is a functional language. This means that it has certain technical properties, but more importantly that it lends itself to a style of problem solving centered on functions. Below I’ll give a brief overview of the technical definition of a functional language, but in this book I will primarily focus on the functional style of programming, because I think it is an extremely good fit to the types of problem you commonly encounter when doing data analysis.
Recently, functional techniques have experienced a surge in interest because they can produce efficient and elegant solutions to many modern problems. A functional style tends to create functions that can easily be analysed in isolation (i.e. using only local information), and hence is often much easier to automatically optimise or parallelise. The traditional weaknesses of function languages, poorer performance and sometimes unpredictable memory usage, have been much reduced in recent years. Functional programming is complementary to object oriented programming, which has been the dominant programming paradigm for the last several decades.
Functional programming languages
Every programming language has functions, so what makes a programming language functional? There are many defintions for precisely what makes a language “functional”, but there are two common threads.
Firstly, functional languages have first-class functions, functions that behave like any other data structure. In R, this means that you can do anything with a function that you can do with a vector: you can assign them to variables, store them in lists, pass them as arguments to other functions, create them inside functions, and even return them as the result of a function.
Secondly, many functional languages require functions to be pure. A function is pure if it satisfies two properties:
- The output only depends on the inputs, i.e. if you call it again with the
same inputs, you get the same outputs. This excludes functions like
Sys.time()that can return different values.
- The function has no side-effects, like changing the value of a variable,
writing to disk, or displaying to the screen. This excludes functions like
Pure functions are much easier to reason about, but obviously have significant downsides: imagine doing a data analysis where you couldn’t generate random numbers or read files from disk.
Strictly speaking, R isn’t a functional programming language because it doesn’t require that you write pure functions. However, you can certainly adopt a functional style in parts of your code: you don’t have to write pure functions, but you often should. In my experience, partitioning code into functions that are either extremely pure and or extremely impure tends to lead to code that is easier to understand and extend to new situations.
It’s hard to describe exactly what a functional style is, but generally I think it means decomposing a big problem into smaller pieces then solving each piece with a function or combination of functions. When using a functional style you strive to decompose components of the problem into isolated functions that operate independently. Each function taken by itself is simple and straightforward to understand; complexity is handled by composing fuctions in various ways.
The following three chapters discuss the three key functional techniques that help you to decompose problems into smaller pieces:
Chapter 8 shows you how to replace many for loops with functionals which are functions (like
lapply()) that take another function as an argument. Functionals allow you to take a function that solves the problem for a single input, and generalise it to handle any number of inputs. Functionals are by far and away the most important technique, and you’ll use them all the time in data analysis.
Chapter 9 introduces function factories, functions that create functions. Function factories are less useful than functionals, but often allow you elegantly partition work between different parts of your code.
Chapter 10 shows you how to create function operators, functions that take functions as input and produce functions as output. They are like adverbs, because they typically modify the operation of a function.
Collectively, these types of function are caled higher-order functions, and fill out a two-by-two table: