Project 4: Church Encoder

Due: Tuesday, April 21, 11:59PM

Submit via autograde.org. If you are not enrolled in the course, you can still work through this project using the public starter code and tests above.

Lecture notes: Church Encoding

In this project, you will write a Church encoder: a compiler from a large subset of Scheme to lambda-calculus. In other words, you will transform inputs like '(let ([x 0]) x) to outputs like '((lambda (x) x) (lambda (f) (lambda (x) x))). Your output will be quoted data in the lambda-calculus: the test infrastructure will then interpret this into a Racket lambda by using eval.

It is crucial to understand that you are not building an interpreter, as in project 3. Instead, you are building a compiler. Your project will translate the input language into the lambda-calculus by removing each Racket form, one-by-one, and encoding them in the lambda-calculus.

To understand the encodings, please read through the Church Encoding lecture notes, the posted course slides, the course videos, and the recorded class lectures on Blackboard. We have covered many of the forms already: the main challenge in this project is translating the “on paper” knowledge into a working compiler.

The central challenge with this project is debugging. You should try to learn a debugging methodology that works for you. I recommend calling your compiler on smaller and smaller programs until you recognize which part is incorrect. Work to gradually whittle down the test cases until you isolate precisely what is wrong.

Good luck!

The Language: Syntax

Below is the full syntax of the language you must support:

e ::= (letrec ([x (lambda (x ...) e)]) e)
    | (let ([x e] ...) e)
    | (let* ([x e] ...) e)
    | (lambda (x ...) e)
    | (e e ...)
    | x
    | (and e ...) | (or e ...)
    | (if e e e)
    | (prim e) | (prim e e)
    | datum

datum ::= nat | (quote ()) | #t | #f
nat ::= 0 | 1 | 2 | ...
x is a symbol
prim is a primitive operation in list prims
(define prims '(+ * - = add1 sub1 cons car cdr null? not zero?))
The following are *extra credit*: -, =, sub1

This input language has semantics identical to Scheme / Racket, except:

You will not be provided code that yields any kind of error in Racket
You do not need to treat non-boolean values as #t at if, and, or forms (Racket allows “truthy” and “falsy” values – we won’t handle those)
primitive operations are either strictly unary (add1 sub1 null? zero? not car cdr), or strictly binary (+ - * = cons)
There will be no variadic functions or applications — but any fixed arity is allowed (e.g., ((lambda l (first l)) 1 2 3) is not allowed but ((lambda (x y z) x) 1 2 3) is)

Output language

Your output must be a quoted expression in the lambda-calculus:

e ::= (lambda (x) e)
    | (e e)
    | x

Functions and Builtins

You will primarily implement the churchify function, which accepts an input in the input language above and produces an output in the lambda-calculus. churchify recognizes each form in the input language (using a match pattern) and then calls itself.

To make things easier for you, we have used a trick: the function church-compile calls churchify with a large let block to bind each of the builtins. This allows you to handle builtin operators in a relatively simple way: you can just translate (+ e0 e1) to use currying ((+ e0') e1'), where e0' and e1' are the Church encodings of e0 and e1. Then, + will be given a binding because it is wrapped in the top-level let block in church-compile.

Please think about the above paragraph until it makes sense to you: it makes handling builtins overwhelmingly simpler.

Implementation Roadmap

This project is designed so you can implement it in stages, passing more tests at each stage. The early tests (stages 1-8) test churchify directly, so you do not need church-compile or any builtins to start passing tests. Later tests (stages 9-11) go through church-compile and require builtins.

Stage 1: Variables and Natural Numbers

Tests: public-01a-nat-zero, public-01b-nat-nonzero

Start here. Implement these two match cases:

Variables (? symbol? x): Just return the symbol as-is. This is trivial but required for everything else.
Natural numbers (? natural? nat): Encode as Church numerals. A Church numeral n is (lambda (f) (lambda (x) (f^n x))) where f^n means f applied n times. For example:
- 0 -> (lambda (f) (lambda (x) x))
- 3 -> (lambda (f) (lambda (x) (f (f (f x)))))
Write a helper function to produce the (f (f ... x)) part, then wrap it in the two lambdas. Remember to churchify the result.

See the Church Encoding lecture notes for a full explanation.

Stage 2: Single-arg Lambda and Application

Tests: public-02a-lambda-id, public-02b-lambda-const

These are needed for almost everything else, so do them early.

Single-arg lambda (lambda (x) e): Recursively churchify the body: `(lambda (,x) ,(churchify e0))
Single-arg application (fun arg): Churchify both parts: `(,(churchify fun) ,(churchify arg))

Stage 3: Multi-arg Lambda and Application (Currying)

Tests: public-03a-curry-lam, public-03b-curry-app

In lambda calculus, all functions take exactly one argument. Multi-arg functions must be curried.

Multi-arg lambda (lambda (x y z ...) e): Curry into nested single-arg lambdas. Hint: (churchify '(lambda (x . rest) e)) -> `(lambda (,x) ,(churchify `(lambda ,rest ,e)))
Zero-arg lambda (lambda () e): Add a dummy parameter _.
Multi-arg application (f a b c ...): Curry into nested applications. (f a b c) -> (((f a) b) c). Hint: recursively churchify ((f arg) rest...).
Zero-arg application (f): Apply with a dummy argument (lambda (_) _).

See the Church Encoding lecture notes for the background on currying.

Stage 4: Boolean Literals

Tests: public-04a-true, public-04b-false

A Church boolean takes two thunks and calls one of them:

#t -> (lambda (tt ft) (tt)) (calls the true-thunk)
#f -> (lambda (tt ft) (ft)) (calls the false-thunk)

Hint: just churchify the lambda expression shown above. Since these use multi-arg lambdas and zero-arg application, you need stage 3 working first.

See the Church Encoding lecture notes for the full explanation of the thunk-based encoding.

Stage 5: If Expressions

Tests: public-05a-if-true, public-05b-if-false

Since a Church bool selects between two thunks:

(if e0 e1 e2) -> (e0 (lambda () e1) (lambda () e2))

Just churchify that expression – it’s a 3-arg application.

See the Church Encoding lecture notes for how if works with the boolean encoding.

Stage 6: Let Expressions

Tests: public-06a-let-simple, public-06b-let-empty, public-06c-let-multi

Let is syntactic sugar for lambda + application:

(let ([x e0]) body) -> ((lambda (x) body) e0)
(let ([x e0] [y e1]) body) -> ((lambda (x y) body) e0 e1)
(let () body) -> just body

Churchify the desugared form.

Stage 7: Let* and Null Literal

Tests: public-07a-letstar, public-07b-null

let*: Desugar to nested let: (let* ([x e0] rest...) body) -> (let ([x e0]) (let* (rest...) body))
Null '(): Encode like a boolean pair callback that calls the when-null case: (lambda (when-cons when-null) (when-null))

See the Church Encoding lecture notes for the list encoding.

Stage 8: And / Or

Tests: public-08a-and-true, public-08b-or-true, public-08c-and-false

Desugar to nested if expressions:

(and e0 e1 ...) -> (if e0 (and e1 ...) #f)
(or e0 e1 ...) -> (if e0 #t (or e1 ...))
Base cases: (and) = #t, (or) = #f (already provided)

See the Church Encoding lecture notes for the desugaring.

Stage 9: Arithmetic Builtins (in `church-compile`)

Tests: public-09a-add1, public-09b-plus, public-09c-mult, public-09d-zero-true, public-09e-zero-false, public-09f-not

Now fill in the Church-encoded definitions in church-compile:

add1: (lambda (n) (lambda (f) (lambda (x) (f ((n f) x)))))
+: (lambda (n0 n1) (lambda (f) (lambda (x) ((n1 f) ((n0 f) x)))))
*: (lambda (n0 n1) (lambda (f) (lambda (x) ((n0 (n1 f)) x))))
- Warning: heed the note in the course videos: the video gets * wrong, but the slides get it right.
zero?: (lambda (n) ((n (lambda (_) #f)) #t))
not: (lambda (b) (if b #f #t))

Note: these definitions themselves use forms like #t, #f, if, and multi-arg lambdas, which churchify will process. So you need stages 1-8 working for these to work.

See the Church Encoding lecture notes for the derivation of each of these definitions.

Stage 10: List Builtins

Tests: public-10a-cons-basic, public-10b-car, public-10c-cdr, public-10d-null-true, public-10e-null-false

cons: (lambda (a b) (lambda (when-cons when-null) (when-cons a b)))
car: (lambda (p) (p (lambda (a b) a) (lambda () (lambda (x) x))))
cdr: Follow the same pattern as car – what changes?
null?: Think about what each callback should return.

See the Church Encoding lecture notes for the observer pattern behind these definitions.

Stage 11: Letrec (Y-combinator)

Tests: public-11a-letrec-basic, public-11b-letrec-recursive

To handle letrec, use the Y combinator for recursion:

(letrec ([f (lambda (args...) body)]) e1) -> (let ([f (Y-comb (lambda (f) (lambda (args...) body)))]) e1)

You need to define Y-comb in church-compile:

Y = ((lambda (u) (u u)) (lambda (y) (lambda (mk) (mk (lambda (x) (((y y) mk) x))))))

See the Church Encoding lecture notes for the full explanation of the Y combinator and how it enables recursion. The answer is surprisingly simple: you apply the Y combinator to (lambda (f) ...), then bind the result with let.

Integration Tests

The remaining tests (public-add, public-arith0, server-append, etc.) combine multiple features. Once you have stages 1-11 working, most of these should pass automatically.

Extra Credit: sub1, -, =

The sub1 (predecessor) function is significantly more complex than the other builtins. Look it up on Wikipedia if you’d like to attempt it. Interesting history:

“Be this as it may, Kleene did find a way to lambda define the predecessor function in the untyped lambda calculus, by using an appropriate data type (pairs of integers) as auxiliary device. In [69], he described how he found the solution while being anesthetized by laughing gas for the removal of four wisdom teeth.” – Henk Barendregt’s “Impact of the Lambda calculus”

Debugging Tips

Call your compiler on smaller and smaller programs to isolate which part is broken.
Use the staged tests: if public-01a-nat-zero fails, focus on your natural number encoding. If public-05a-if-true fails, look at your if/boolean encoding.

You can test churchify directly in the REPL:

> (churchify '0)
'(lambda (f) (lambda (x) x))
> (churchify '#t)
'(lambda (tt) (lambda (ft) (tt (lambda (_) _))))

You can evaluate and convert back:

> (church->nat (eval (churchify '3) (make-base-namespace)))
3