Why Tail-Recursive Functions are Loops
One story every computing enthusiast should hear is the lesson of how loops and tail recursion are equivalent. We like recursive functions because they’re amenable to induction, and we can derive them in a way that directly corresponds to the definition of the datatype over which they recur. We like loops because they’re fast and make intuitive sense as long as variables don’t change in too tricky a way.
In general, recursive functions are slower than loops because they push stack frames: the performance of most programs today is dominated by memory reads/writes. The data we touch the most lives in the cache–we do not want to evict a ton of stuff from the cache, under any circumstance. In a direct-style implementation of a recursive function, the recursive call has to push a stack frame to remember what to do once the function returns:
;; Racket
(define (sum l)
  (if (empty? l)
      0
      (+ (first l) (sum (rest l)))))
// C
int sum(const int *l, int length) {
    if (length == 0)
        return 0;
    else
        return l[0] + sum(l + 1, length - 1);
}
When we get to (+ (first l) (sum (rest l))), we first call (first l) (which returns the first element). While we’re making that call, we have to remember to come back and do (sum (rest l))–to be fully precise, we remember that we need to do (rest l), then take its result x and call (sum x), remembering to come back and finally take that result and add it to the result of (first l). The reason we have to do this is that we need to remember those partial results (in this case the result of (first l)): we have to store them somewhere after all, and each time we make the recursive call, we need to remember the result of (first l) from that call–we need O(n) stack space for a list of size n.
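To make that concrete, here is a hand-evaluation of a small call (a sketch of the pending work; real stack frames also hold return addresses and the like):
;; Racket -- each pending + is a frame waiting on the recursive call
;; (sum '(1 2 3))
;; = (+ 1 (sum '(2 3)))
;; = (+ 1 (+ 2 (sum '(3))))
;; = (+ 1 (+ 2 (+ 3 (sum '()))))
;; = (+ 1 (+ 2 (+ 3 0)))
;; = 6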
Of course, if we use iteration this all goes away:
;; Racket
(define (sum l)
  (define x 0)
  (for ([elt l])
    (set! x (+ x elt)))
  x)
// C
int sum(const int *l, int length) {
    int x = 0;
    for (int i = 0; i < length; i++) {
        x += l[i];
    }
    return x;
}
We all have an intuitive sense of what the loop is doing: once we hit the end of the loop, we do not make a recursive call (we never issue a call instruction in assembly); we simply jump up to the beginning of the loop. The key is that x is being used as an accumulator, growing a partial result in a bottom-up fashion as the computation proceeds, eventually yielding the final value at the end. Instead of keeping partial results on the stack, the loop takes a constant amount of space but linear time.
In a tail-recursive implementation, the rule is that every recursive call must be a tail call. Intuitively, a tail call is a call which is “immediately returned.” More formally, a subexpression of an expression is in tail position if the return value from that subexpression is the return value from the whole expression. For example, in (if guard e-t e-f), both e-t and e-f are in tail position, but the guard is not: after we decide which branch to take, we’re committed:
(define (foo ...)
  ...
  (if guard
      (f x y ...)
      (g z ...)))
Once we finish executing guard, it would be useless (but correct) to (a) push a stack frame, (b) wait on the result of the subordinate call, and (c) merely return that same result, because all we’d be doing is copying the return value from the callee and propagating it back as the return value of the caller. Being a tail call is a syntactic property of a callsite: we (and the compiler) can look at a piece of code and cheaply decide whether a call is a tail call or not.
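For contrast, consider a function where only one of the two calls is a tail call (bar, f, and g here are made-up names for illustration):
;; Racket -- f and g are stand-ins for arbitrary helpers (hypothetical)
(define (f x) x)
(define (g x) (* 2 x))

;; (f x) is a tail call: its result *is* bar's result.
;; (g x) is not: its result still flows through add1, so bar
;; must keep a frame alive to do that addition.
(define (bar x)
  (if (zero? x)
      (f x)
      (add1 (g x))))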
This reasoning generalizes to any call expression in tail position: because a tail call necessarily evaluates to its result, administratively copying that result up and down the stack is extensionally a no-op. Now, the tail-recursive version uses a simple trick I teach to all of my students: (a) identify an accumulator variable, (b) instead of computing with the result of the recursive call, compute with the current accumulator, and (c) return the accumulator in the base case:
;; Racket
(define (sum l acc) ;; note: acc got added
  (if (empty? l)
      acc ;; this is the *true* return!
      (sum (rest l) (+ acc (first l)))))
// C -- we pass in length manually because we're using arrays
int sum(const int *l, int length, int acc) {
    if (length == 0) return acc;
    return sum(l + 1, length - 1, acc + l[0]);
}
Both of these functions are tail recursive, because the only recursive call to sum is also the return value from sum (or, more directly: because both calls to sum are in tail position). Since the compiler knows that these are tail calls, a compiler with tail-call optimization will ensure that both of these tail calls compile into jmp instructions–with no impact on stack usage–rather than the more burdensome (on the cache, stack, etc.) direct-style calls. Something that should concern you is this: if the function is using constant stack space, how are the variables being updated / represented!? The answer is that the arguments get stomped over, and mutably updated, yielding the exact same performance profile as a loop!
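Here is a sketch of that stomping, written as plain Racket (an illustration of the effect, not what the compiler literally emits):
;; Racket -- what TCO effectively does to the tail-recursive sum:
;; overwrite the arguments in place, then go back to the top.
(define (sum l acc)
  (let loop ()
    (cond
      [(empty? l) acc]
      [else
       (set! acc (+ acc (first l))) ;; stomp the arguments...
       (set! l (rest l))
       (loop)]))) ;; ...and "jump": this tail call compiles to a jmp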
Now it’s time for an exercise: can you convert this program to use tail recursion?
;; return a pair (cons cell) of the number of even numbers,
;; and the number of odd numbers.
;; HINT: use multiple accumulators.
(define (even-odd l)
  (if (empty? l)
      (cons 0 0)
      (let ([v (even-odd (rest l))])
        (if (even? (first l))
            (cons (add1 (car v)) (cdr v))
            (cons (car v) (add1 (cdr v)))))))
What about this program?
;; flattens a tree into a list
;; It's hard because there are *two* calls to linearize--can you do anything?
(define (linearize t)
  (match t
    ['empty '()]
    [`(node ,v ,t0 ,t1) (append (list v) (linearize t0) (linearize t1))]))
One surprising fact is that we can systematically compile any program so that every call is a tail call, by completely transforming the program into continuation-passing style (CPS); this essentially eliminates the stack. Indeed, some compilers for functional languages work precisely this way: those languages cannot fall prey to a stack overflow, because they have essentially traded a monolithic (efficient, array-like) stack for a deeply-nested stack strewn throughout the heap–the continuations are heap-allocated and nested.
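To give a flavor of the idea, here is sum rewritten by hand into CPS (a sketch of what the systematic transformation produces; sum-cps is a name I made up): every call is now a tail call, and the continuation k–a heap-allocated closure–plays the role of the stack frame.
;; Racket -- sum in continuation-passing style: the pending
;; addition lives in a closure on the heap, not in a stack frame.
(define (sum-cps l k)
  (if (empty? l)
      (k 0)
      (sum-cps (rest l)
               (lambda (r) (k (+ (first l) r))))))

;; try it: (sum-cps '(1 2 3) (lambda (x) x)) ;; => 6
There are various exciting trade-offs here, but for now I will leave this as is–we will continue next week.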