Welcome to part 2 of “The Burge School of Functional Programming”. Last time I claimed that Burge’s 1975 book “Recursive Programming Techniques” is a gem of Functional Programming that deserves to be more widely known. Burge’s book gets to the core of what Functional Programming is about, and it’s not fancy type systems or mathematical jargon; it’s *expressions*. This post is where we really pinpoint what an expression is. If I’m going to convince you that functional programing is about expressions, we should be really clear what we mean by that!

In later posts we’ll look at how to build up an entire programming language from what’s described here, and how that reflects one of the great ideas in Functional Programming: constructing software systems from *small* expressions that can be understood *in isolation*.

Throughout this series, I will use the notation that Burge used. However, I aim to point out when those notations differ from modern conventions.

I’m going to quote directly from the book a lot early on in this post, as I feel there’s a lot of insight to be found at the beginning of Burge’s first chapter.

Burge hits the ground running with the introductory chapter of the book, making it clear that the style of programming this book advocates is a bit different from what the reader may be used to. On the very first page he states:

All the linguistic devices introduced [in this book] are based upon two methods of constructing expressions from smaller expressions […] Thus the extra notation that is added to this basis adds no new structural features

He names the extra notation *additions*. These days we would call it *syntactic sugar*, but the idea is identical: New forms of expression that can be translated to a few core constructs. Let that sink in: all of the language constructs that Burge describes in his book can be rewritten with just the two following constructs:

- “An operator/operand construction that denotes function application”
- “An expression format which denotes a function”

Burge does remind the reader that there will be some ‘constants’ (we’d call them *primitives*) required for certain tasks. This sets up one of the first great insights of the text, stating that the constructs

create a practical and powerful programming system, which is more like a

familyof programming languages than a single language, because the features introduced are concerned more with combining functions to produce new ones than with the nature of the primitive functions that are being combined.

Here comes the kicker:

A programming language for a particular range of applications can be obtained by adding an appropriate set of primitives to this basic structure.

These days we call a “language for a particular range of applications” a Domain Specific Language (DSL). DSLs are a very powerful tool in the functional programmer’s toolbox. So in just under a page Burge has set the stage for understanding functional programming through just a few core constructs and its usefulness in creating DSLs but avoids using any jargon. So Burge’s lesson so far: understand two core constructs, then pick primitives according to your domain.

We haven’t gotten much further in the decades since.

He’s not done yet, there’s one more insight waiting for us, right before he starts introducing the core constructs he reminds us that the focus is on *expressions* and not *mechanisms*. He argues that expressions have a great property:

the value, or meaning, of an expression depends in a simple way only on the values of its subexpressions and on no other properties of them.

So again, without leaning on fancy language, Burge lays it all out in front of us. He states that this property allows you solve large and complex problems by breaking them down into small, simple, and independent problems. And

it is possible to make the structure of the program match the structure of the problem being solved.

Let’s start with the obvious: Functional Programming deals with functions. This makes it critical to specify what we mean when we say *function*. Burge does this by talking about the relationship a function has to its arguments. More specifically, he talks about types, starting with function types.

But remember, this is a book from decades ago, so we’re not bringing in any heavy machinery here. Note that Burge never talks about type-checking or even implementing a type system. Even if your compiler doesn’t do type checking, you can still benefit from thinking about types.

I’ve been going on about *expressions* over *mechanisms* for a while and I haven’t even shown you an expression yet. So let’s take a look at one: \(f(x)\). This is how Burge typesets the application of the function \(f\) to the value \(x\). This is the first expression we’ve seen, so let’s unpack it. \(f(x)\) is a function application, which has two parts; an *operator* (in this case \(f\)), and an *operand* (in this case \(x\)). This makes function application a type of *compound* expression.

In the introduction I claimed that all of the language constructs we will introduce can be understood with just two concepts; function application is the first one! The basic idea of looking at application as a compound expression is that in order to make sense of the expression \(f(x)\) you’re going to have to make sense of \(f\) and of \(x\). That may seem obvious, but the insight Burge is trying to get across to you is that when you program in this expression-based style you make sense of each in isolation. That’s a very powerful idea!

Okay, so a function is a value that you can *apply* to other values (a.k.a. the inputs). However, not all values make sense as inputs; for example, the function \(square\) that takes a number and multiplies it by itself, does not make much sense if you give it the letter ‘a’ as an input. Let’s be a bit more concrete and say functions have a *type*, usually written as

\[A \longrightarrow B\]

Here the \((\rightarrow)\) is what indicates this is a function. The \(A\) is the type of value it accepts, and \(B\) is the type of values it returns. Burge calls these the *domain* and *range*, respectively.^{1} So if you have a function \(f\) of type \(A \rightarrow B\), and you apply it to a value \(x\) which has type \(A\), the result is of type \(B\). Burge typesets function application as both \(f(x)\) or \(f\ x\).

Earlier I mentioned the function \(square\). One possible type for \(square\) would be^{2}

\[square \in (\text{integer} \rightarrow \text{integer})\]

Other common examples:

\[ sin \in (\text{real} \rightarrow \text{real}) \\ log \in (\text{positive} \rightarrow \text{real}) \\ negate \in (\text{positive} \rightarrow \text{negative}) \]

So in general anything of the form \(g \in (A \rightarrow B)\) is an assertion that \(g\) is a function that takes arguments of type \(A\) and returns values of type \(B\). This means if we have a value \(x\) of type \(A\), we can be assured that \(g(x) \in B\).

Okay, this is all well and good, but every function we’ve looked at takes only a single argument, what about things like \(+\)? Here are some examples:

\[ + \in (\text{real} \times \text{real} \rightarrow \text{real})\\ min \in (\text{real} \times \text{real} \rightarrow \text{real})\\ equal \in (\text{real} \times \text{real} \rightarrow \text{truth value}) \]

This is just saying that these functions take two arguments, both reals, and return a single value. This generalizes in the obvious way to functions with an arbitrary number of arguments. So a function that takes \(N\) arguments would have a type like

\[A_{1} \times A_{2} \times \dots \times A_{N} \rightarrow B\]

Where \(A_{i}\) is the appropriate type of the \(i^{th}\) argument. In other words, \(A \times B\) defines the type of *pairs* where the first element is of type \(A\) and the second element is of type \(B\).

Now, the traditional way to apply functions that take multiple arguments is by extending the syntax we already have, giving us \(min(x,y)\) or \(+(x,y)\). Many languages have a predetermined set of special functions that can be applied differently, the arithmetic operations are usually such an exception, so you could write \(x + y\) instead of \(+(x,y)\). Burge is no different here and his language allows for certain functions to be applied in this special manner.

It’s worth pointing out that even if you have a special syntax for the application of certain functions, *the operator/operand relationship is unchanged*. It could be argued that allowing any such special syntax obscures this relationship and therefore obscures the meaning of the expression^{3}. The important bit is that regardless of the syntax it is crucial that you be able to identify the operator and the operand of a function application.

This is pretty much the extent of what we’ll say about types, possibly for the whole series. Though types are used to describe things (as we’ll see in the next section), Burge never defines a type system, or any form of static enforcement of types. So why mention types at all? Because it’s important to *think* about types when you’re doing functional programming, particularly to distinguish between things that you can apply (functions), what they expect as argument values, etc.; and things you can’t apply (the number \(5\), for instance).

Many of the experts of dynamic languages I’ve interacted with will be the first to tell you: thinking about types is important when writing programs. But we don’t have to get fancy with our type system in order to have any benefit from the *concept* of types.

We now know what makes up a function application (the operator and operand), but we still don’t know the *meaning* of anything. I claimed earlier that you can determine the meaning of a function application by finding the meaning of its operator and operand. But eventually you’ll reach an expression that is not a function application, Burge calls these *simple* expressions. In this section we will explain the meaning of one type of simple expression: *constants* (variables are the other type of simple expression, which we’ll get to a bit later).

In any programming system there is a set of constants (what we would call primitives). The meaning of these constants is given by the system and its implementation. The most obvious constants are things like numbers: \(4\), \(-128\), etc.^{4} or arithmetic operations: \(+\), \(-\), \(\times\), etc.

So if your language has the constant \(+\) which takes two numbers and adds them together, and it has numeric constants we can now find the meaning of expressions like \(+(4,5)\), or \(square(\times(2,3))\).

We do this with the following procedure:

- If the expression is simple, what is the meaning of the constant?
- If the expression is composite, what is the meaning of its operator, and what is the meaning of its operand?

Try applying this series of steps to \(+(4,5)\), or \(square(\times(2,3))\). Do yourself a favor and force yourself to actually go through the steps. In this instance it is not very hard, but it is the practice of finding the meaning of expressions via the meaning of their constituent parts that is important.

For the rest of the post we’ll allow ourselves to use the arithmetic operators as infix, i.e. \(4 + 5\).

Burge uses two conventions when writing function application. The first is what we’ve seen up to this point: \(f(x)\) or \(f(x,y)\). This is very common and is likely to be what most languages you’re familiar with use. The second convention Burge uses is more unusual: \(f\ x\) or \(f\ x\ y\), respectively. Burge explains that these aren’t really different syntaxes for function application, but really represent functions of different types!

It’s clearer with two argument functions. Take the functions \(addT\) and \(addC\)^{5}, both of which add two numbers together. These are their types:

\[ addT \in (\text{real} \times \text{real} \rightarrow \text{real})\\ addC \in (\text{real} \rightarrow (\text{real} \rightarrow \text{real})) \]

The types tell us that \(addT\) takes a *pair* of real numbers and returns a real number, while \(addC\) takes a *single* real number and returns a *function* of type \((\text{real} \rightarrow \text{real})\). Therefore, when we write \(addC\ x\
y\), we’re really writing \((addC\ x)\ y\), i.e. we’re applying the function \((addC\ x)\) to \(y\). We can omit the brackets because function application associates to the left. Similarly the function type \((\rightarrow)\) associates to the right, so \((A \rightarrow (B \rightarrow C))\) is the same as \((A \rightarrow B \rightarrow C)\).

To really hit the lesson home, convince yourself that the following are all equivalent for a function, \(f\), that takes three arguments

\[ ((f\ x)\ y) \ z\\ (f\ x\ y) \ z\\ f\ x\ y \ (z) \]

and understand why \(f\ x\ (y\ z)\) is not equivalent to the three above^{6}. Here’s a hint: think of what the operator/operand relationships are for every function application, even the symbols within brackets.

For extra credit, draw a tree of the operator/operand (function/argument) parts for each version, including \(f\ x\ (y\ z)\).

This is not just some clever rationalization of multi-argument functions, this is a direct consequence of functions themselves being values. Because they are values just like anything else we can also *pass* functions to other functions, take the following example:

\[ twice\ f\ x = f\ (f\ x) \]

\(twice\) is a function that takes two arguments, a function and some other value, and then applies the function ‘twice’ to the value. So if we had a function \(add\text{-}one\) which adds 1 to a number and called \(twice\ add\text{-}one\ 5\) we would get 7.

Treating functions as ‘first-class’ values has become quite ubiquitous these days, and for good reason! First-class functions allow you separate the *form* of a computation from the task being accomplished. This is one of the more powerful ideas from functional programming, and as such, Burge will explore it in depth later on.

Up to now we’ve used variables without discussing them, because many of us will have an intuition for what variables mean from other languages or from algebra in school. In this section we’ll be more precise about what a variable means.

Take the following mathematical equation:

\[ f\ x = (5 \times x) + 2 \]

This is a function over the variable \(x\). When you plug in different values for \(x\) you get different results.

There are two very important properties about variables like the ones in the equation above. The first is that it doesn’t matter what name we give \(x\): \(f\
y = 5y +2\) is the *exact* same equation with the exact same meaning. The second is that in mathematics we don’t actually *change* the values of a variable: Once we’ve plugged in a value for \(x\) that value remains the same.

To better formalise how variables work we’ll use a notation developed by Alonzo Church. The following defines the same function using *lambda (\(\lambda\)) notation*:

\[ \lambda x.(5 \times x) + 2 \]

We haven’t given the lambda expression a name, which is why in some languages they call lambdas ‘anonymous functions’. There’s nothing stopping us from giving the above a name though

\[ f = \lambda x.(5 \times x) + 2 \]

defines the same function again, this time giving it the name \(f\). Here is the crucial point: \(f\ x = (5 \times x) + 2\) and \(f = \lambda x.(5 \times x) + 2\) are *the same function*.

In the same way that function application had two parts, the operator and the operand, lambda expressions also have two parts: the *bound variable* and the *body*. Everything between the lambda (\(\lambda\)) and the period (\(.\)) is the bound variable and everything after the period is the body.

Lambda expressions are functions, which means you can apply values to them (i.e they can be the operator part of a function application). When you apply a value to a lambda expression you substitute that value wherever the bound variable appears in the body and get rid of the bound variable part of the lambda expression, leaving only the body. So \((\lambda x. x + x)\ 5\) becomes \(5 + 5\).

We can happily pass lambda expressions to other functions as well (since they are values themselves). Remembering the definition of \(twice\) from earlier, the expression

\[ twice\ (\lambda x. x + 1)\ 5 \]

is equal to \(7\).

What about multi-argument functions? Well we already learned that multi-argument functions are just single argument functions that return functions, so let’s apply that idea here and define \(add\) with lambdas:

\[ add = \lambda x.\lambda y. x + y \]

This actually makes what’s happening when we *partially apply* a function more clear. If we only pass \(add\) one argument what do we get? \((\lambda x.\lambda y. x + y)\ 1\) becomes \(\lambda y. 1 + y\), or the function that adds \(1\) to its argument.

So we’ve spent a lot of time going on about function application, lambda expressions, and simple expressions (constants and variables). What do we know now? Well we know all the forms an expression can take. To paraphrase Burge:

An expression is

- either simple and is a identifier
- or a
lambda expression+ and has abound variablewhich is an identifier + and abodywhich is an expression- or it is a
function application+ and has anoperatorand anoperand, both of which are expressions

In the next post we’ll show you the language Burge describes in full, but it’s important to emphasize: Every language construct we will show you can be translated to expressions consisting only of what’s described above. A whole *family of languages* arises from the three possibilities above. As promised, we have two ways of combining expressions: lambda expressions and function application. The only other piece is that we need a way to refer to things, which is where the identifiers (variables or constants) come in..

The core of Burge’s book, and Functional Programming more generally, is in what is possible with expressions. From this fact stems a lot of what makes Functional Programming such an interesting and exciting field. It affects the way we think about data, the way we think about program structure, and even how we *implement* our languages. Once equipped with these tools it’s easy to see the shared principles of all functional languages.

It’s very easy to get intimidated by a lot of the language that is thrown around in certain Functional Programming circles these days, but rest assured; understanding expressions and how we can reason about them and manipulate them gets you much further than diving into arcane mathematics ever will.

The rest of the series is going to be a pretty crazy ride, and it all comes back to what we’ve learned here; that expressions come in three mains forms: identifiers, function applications, and lambdas.

Buckle up.

Josh Triplett, Kelley Robinson, Heather Miller, Rob Rix, and Michael Banks all gave constructive feedback on drafts of this post. Thank them if any of this made sense.

And of course, as promised, we’ve got another Prog Rock hit from the early 70’s, this one’s about a show “you’ve got to see”, a.k.a. how all of the language constructs we are used to can be made up of just expressions ;)

-JMCT

Note that in modern programming language texts and papers

*range*would almost universally be called the*codomain*. The reasoning is simple: The range of a function is the set of values it*actually*returns, whereas the codomain is the set of*possible*return values. So when reasoning about types we’re actually reasoning about the codomain.↩Here the symbol \(\in\) can be pronounced as ‘has the type’. The use of this symbol, which is borrowing from Set Theory, has fallen out of favor since the 80’s. The reason for this is that types aren’t

*really*sets, and so using that notation implied something that wasn’t always true. These days you’ll almost universally see the symbol ‘\(:\)’ used (e.g. \(\ f : A \rightarrow B\)), unless you’re reading something written with Haskell in mind, where it would be \(f :: A \rightarrow B\).↩The LISP, Racket, and Clojure communities might make this argument, for example.↩

The constants are defined by the system in use and aren’t necessarily equivalent to their pure mathematical counterparts. For example adding two numbers in many languages introduces the possibility of overflow, whereas the ‘true’ \(+\) has no such possibility. Or how some languages automatically convert all numeric values to floating point, etc.↩

For those that are curious about the naming, \(addT\) is the adding a tuple, and \(addC\) is the ‘curried’ addition.↩

The difference is that in \(f\ x\ (y\ z)\) only

*two*arguments have been passed to \(f\): \(x\) and \((y\ z)\). The expression \((y\ z)\) is itself a function application with \(y\) as the operator and \(z\) as the operand.↩

This is the first in a series of posts about what I am calling the “Burge School of Functional Programming”. When I originally tweeted that I was going to do this, I called it the “Burge-Runciman” School, but upon further reflection I think that they are distinct and I was being clouded by the fact that it was Colin Runciman who taught me about Burge.

There has been a lot of discussion online in the past few years about what Functional Programming *is*. This discussion comes in waves and we recently had a high-point when a list claiming to classify levels of Functional Programming was widely distributed.

I, as well as many of my colleagues, found the list a bit disconcerting. It is a dangerous game to try and make such a prescriptive list about a topic; it is a near certainty that any benefits of such a list are far outweighed by the likelihood of any such list alienating those who are looking to get started in Functional Programming.

Independent of the situation I just described, I’ve been dissatisfied with the attitude that Haskell is the ‘truest’ Functional Language, or that it somehow exhibits FP in its ‘purest’ form. Because of this I have wanted to write about some Functional Programming history for a few years now. In particular I want to draw some attention to William H. Burge’s book “Recursive Programming Techniques”, which is a (sometimes forgotten) gem of Functional Programming.

When I was a PhD student at York, I remember reading a paper about parser combinators in Haskell. I went into Colin Runciman’s office and expressed my joy, and in particular I expressed something like “it’s amazing that they were able to come up with something like that!”. This was early days of my PhD when I still wasn’t so great at identifying what’s *actually* novel in a paper. Colin told me to wait a minute and he went to his bookshelf and grabbed an old book: “Recursive Programming Techniques” by William H. Burge. He suggested I read it.

The book was first published in 1975, three years before Backus’ famous Turing award lecture “Can programming be liberated from the von Neumann style?”, which is often seen as a watershed moment in the history of functional programming because it brought so much attention to the field. In my view Burge’s book has aged much better than Backus’ lecture, but that is likely due to other factors.^{1}

Burge’s book is, in my opinion, one of the best publications on what functional programming is about. This is made even more intriguing when you think about what *wasn’t* around when this book was first published. In 1975 we did *not* have:

- Polymorphic Type Inference
- Algebraic Data types
- Lazy languages
^{2} - Lots of Category-theoretic terminology as part of functional programming

This is in contrast to how many commentators online talk about functional programming. However, much in the same way that music is not a set of instruments, functional programming is not a set of abstractions that we need to learn and memorize. Functional programming is an *approach* to solving computational problems.

Many of the abstractions that you do read about are ways to apply this approach to new problems, or problems that were difficult to solve without reaching for more traditional programming methods. But the essence, the core, of what functional programming is about is mostly unchanged.

In the preface to the book Burge writes (emphasis mine):

The main emphasis [of this book] is placed on those parts of the language, namely expressions, that denote the end results sought from the computer, rather than on the instructions which the machine must follow in order to achieve the results.

The main thesis of this book is that, in many cases, this emphasis on expressions as opposed to mechanisms simplifies and improves the task of programming.

The above will be familiar to anyone who has had the good fortune of taking a well designed functional programming course. This is the paradigm shift that sometimes makes functional programming hard to learn for those that are used to other methods of programming. Expressions over mechanisms.

Part of what makes this book so interesting is that it was published as part of a series on programming from IBM. The series was called “The Systems Programming Series” and its charter was

a long term project to collect, organize, and publish those principles and techniques that would have lasting value throughout the industry.

IBM was trying to combat the tendency for systems programmers to all continually reinvent the wheel. In other words, this was not a series for the navel gazers in the Ivory Tower, this was for those on the ground programming real systems and solving ‘real world’ problems. Granted, the programming tasks of the day weren’t necessarily the same tasks that a programmer today might be focusing on, but the tools and techniques are more similar than one might think.

The book itself is divided into 5 chapters:

- Basic Notions and Notations
- Program Structure
- Data Structures
- Parsing
- Sorting

In the coming weeks we will take a look at each chapter and highlight Burge’s insights. The goal here is to provide a counter to the checklist approach to functional programming material. The core ideas are all here, my hope is that once those are internalized a lot of the modern discourse on Functional Programming can be seen in a new light.

Many of the older functional programmers I know were big Prog Rock fans. I find it fitting that many of those that were early proponents of what was a radical programming methodology were into radical music as well. I don’t know Burge, but I like to imagine that the soundtrack of the 1970’s was in the air as he was writing this book. With that spirit I’m going to link to a Prog Rock hit from the early to mid 1970’s with each post; because, why not?

So, for all of those early functional programmers that were called ‘dreamers’:

-JMCT

Backus was trying to do more than just explain functional programming in his lecture, he was trying to turn the tide of programming language research. Because of this some of the technical work in the lecture has not aged super well. For instance in certain parts of the paper Backus emphasises point-free programming using an APL-style syntax. In the decades since the general consensus has moved away from this style.↩

There were call by name languages, and the concept of a delayed computation or a ‘thunk’ was already well known. But I’d argue that the study of laziness

*as a discipline*really only took off in the later 1970’s with the famous papers “A Lazy Evaluator” (1976) and “CONS should not Evaluate its Arguments” (also 1976). Let me know if I’ve missed something there.↩