Project 2: LVar → x86-64 Assembler
Project due date:
Starter code: Download the starter here!

(Pictured: debug-server.rkt
in action.)
This project is inspired by Essentials of Compilation (EoC), but the code here is a from-scratch reimplementation with some simplifications to keep the workload reasonable and to make grading/debugging clearer. Where behavior differs from the book, this README is the source of truth for expectations, file names, and command lines.
You will compile a tiny expression language (R1/LVar, used
interchangably) for straight-line arithmetic with let
-bindings and
read
into x86-64 assembly, then produce a real binary and test it on
actual inputs. Essentially, programs in our language are sequences of
reads, ultimately printing one final output.
Input Language (R1 / LVar)
R1 (covered in the book, chapter 2) is a straight-line expression
language with integers, variables, let
, unary minus, addition, and a
zero-argument (read)
primitive that consumes one integer from stdin
at runtime.
;; Expressions (exp) and programs
(define (R1-exp? e)
(match e
[(? fixnum? n) #t]
[`(read) #t]
[`(- ,(? R1-exp? e)) #t]
[`(+ ,(? R1-exp? e0) ,(? R1-exp? e1)) #t]
[(? symbol? var) #t]
[`(let ([,(? symbol? x) ,(? R1-exp? e)]) ,(? R1-exp? e-body)) #t]
[_ #f]))
(define (R1? e)
(match e
[`(program ,(? R1-exp? exp)) #t]
[_ #f]))
Programs on disk are s-expressions, e.g.:
(program
(let ([x (read)])
(let ([y (+ x 10)])
(+ y (- y)))))
Repository Layout
compile.rkt
– TODO - Your pass implementations. You will edit functions provided here. -> This is the only file you will edit! The rest are read-onlyirs.rkt
– IR definitions and predicates likeanf-program?
,c0-program?
, etc.interpreters.rkt
– Reference interpreters for several IRs (used by tests and for your own debugging).system.rkt
– System/ABI configuration, pass names, runtime filenames, output paths, etc.main.rkt
– Driver that runs all passes, can build a binarydebug-server.rkt
– Launches a debugging server to help visualize each compiler passtest.rkt
– Test harness. Runs isolation tests or end-to-end native tests depending on-m
mode.runtime.c
– Minimal runtime (read_int64
,print_int64
, etc.).test-programs/
– Example programs (.scm
).input-files/
– Input streams for programs (lines of integers).goldens/
– Instructor goldens (IR snapshots, interpreter outputs, and stdout baselines).
Please do not rename files or directories (grading infra depends on them).
The Compiler Pipeline (passes)
The compiler consists of a number of passes. Please do not get overwhelmed: each of these passes is going to be very small, and each have a specific, isolated behavior that will allow us to explain the specific stages of compilation. Additionally, as the code is compiled, it is translated into an increasinglly-lower-level IR.
You will implement the following passes:
uniqueify
→ ensures every bound identifier is unique
Input:R1?
→ Output:unique-source-tree?
→ Interp:interpret-R1
anf-convert
→ ANF conversion (introduce temps to flatten nested ops)
Input:unique-source-tree?
→ Output:anf-program?
→ Interp:interpret-anf
explicate-control
→ convert ANF to C0 (a statement/block flavor)
Input:anf-program?
→ Output:c0-program?
→ Interp:interpret-c0
uncover-locals
→ collect local variables (stack frame sizing later)
Input:c0-program?
→ Output:locals-program?
→ Interp:interpret-c0
select-instructions
→ map C0 statements to pseudo-x86 instructions over vars
Input:locals-program?
→ Output:instr-program?
→ Interp:interpret-instr
assign-homes
→ assign each var to a stack slot(deref (reg rbp) offset)
Input:instr-program?
→ Output:homes-assigned-program?
→ Interp:interpret-instr
patch-instructions
→ break illegal memory→memorymovq
into two moves via%rax
Input:homes-assigned-program?
→ Output:patched-program?
→ Interp:interpret-instr
prelude-and-conclusion
→ function prologue/epilogue; callprint_int64
on%rax
Input:patched-program?
→ Output:x86-64?
→ Interp:interpret-instr
dump-x86-64
→ render to assembler text (string)
Input:x86-64?
→ Output:string?
→ Interp:dummy-interp-x86-64
(skips interpretation)
What you implement: All passes in compile.rkt
are
scaffolded. Your job is to produce outputs that satisfy each predicate
in irs.rkt
and that remain semantically equivalent (the interpreters
check this).
IR Predicates and Interpreters
-
Predicates (
irs.rkt
): Each pass has a precise predicate (e.g.,anf-program?
,c0-program?
, etc.). Tests check both input and output predicates at every step to catch malformed IR early. You should read and understand each predicate. -
Interpreters (
interpreters.rkt
): For many stages we run the interpreter over your IR to check semantic consistency—different IRs must compute the same result on the same input stream. The harness reports whether outputs match and whether stdout across passes is consistent. Please do not modify this file.
How to Compile and Run
To directly compile a single file using main.rkt
:
racket main.rkt test-programs/<prog>.scm
The compiler dumps a long summary of the work done at each pass, and
ultimately yielding a program ./output
which it compiles using
system-specific infrastructure (this should work on Linux/Mac–ping me
if any issues). For example, I can see:
kkmicins@lcs-QVR7XH2QGR p1 % racket main.rkt test-programs/r0-0.scm
Compiling IR (using *your* compile.rkt) …
'(pushq (reg rbp))
...
🏗️ Now building a binary... 🏗️
-> Host: macosx/x86_64 Target: x86_64-apple-darwin Entry: main
/usr/bin/clang -target x86_64-apple-darwin -Wall -O2 -c ./output.s -o ./output.o
/usr/bin/clang -target x86_64-apple-darwin -Wall -O2 -c ./runtime.c -o ./runtime.o
/usr/bin/clang -target x86_64-apple-darwin -Wall -O2 ./output.o ./runtime.o -o ./output
Success! Executable produced at: ./output
Done!
kkmicins@lcs-QVR7XH2QGR p1 % ./output
42
Here, I invoked main.rkt
and passed it a single file, one of the
files in test-programs
. This is by far the easiest way to begin
your work on this project. My advice to you is to start with the
simplest programs in test-programs
, get them to compile, and then
move on to larger programs.
For programs that (read)
input, they will emit programs which
require you to type inputs into the console. For example, (print
(read) + (read))
will compile to a program which calls read_int64
(from runtime.c) twice. You can either manually type in inputs, or you
can use redirection (e.g., output <input-files/1.in
) to input them
from a file.
Common Tricky Systems Issues
Systems-specific code is in system.rkt
, which you should (hopefully)
have to use minimally, as my starter files give a good amount of help
here.
- IMPORTANT: ensure 16-byte stack alignment before calls (System V / macOS both require it).
- IMPORTANT: To ensure your code is portable across Linux/Mach
ABIs, our intermediate IRs work with “symbols” rather than labels,
and (during the last pass,
dump-x86
) we specialize labels to Mach/Linux. To do this, you should use the helper function(rt-sym symbol)
which produces “_symbol” on Mac and “symbol” as required by the ABI. - Output assembly:
./output.s
- Objects:
./output.o
and./runtime.o
- Binary:
./a.out
- Assembler/Linker: Clang by default. Override with
CC=/path/to/clang
if needed. - Runtime linkage:
runtime.c
is compiled and linked in. The renderer emits externs for runtime symbols. - Linux linking uses
-no-pie
for simple non-PIE linking.
Debug server
There is a debug server, which (if you can get it to work) offers the ability to interactively visualize the output of each pass of the requisite input in your language.
racket debug-server.rkt
You can either type programs, or you can (hopefully) select from a set of programs.
Testing infrastructure
test.rkt
is the primary entrypoint during development. Modes:
frontend
– runsuniqueify
→anf-convert
and checks ANF via interpretermiddleend
– runsexplicate-control
→uncover-locals
and checks viainterpret-c0
backend
– runsselect-instructions
→patch-instructions
and checks viainterpret-instr
native
– full compile to assembly, build the binary, run it with real stdin; compare stdout to goldens
Usage:
racket test.rkt -m <mode> -i <comma-separated input files> -g <comma-separated goldens> <program.scm>
<program.scm>
is a file intest-programs/
-i
paths are usually in./input-files/
-g
paths point into./goldens/
Examples:
racket test.rkt -m middleend -i ./input-files/1.in \
-g goldens/r0-0_1_uniqueify.stdout,goldens/r0-0_1_explicate-control.in-ast \
test-programs/r0-0_1.scm
racket test.rkt -m backend -i ./input-files/1.in \
-g goldens/r0-0_1_uniqueify.stdout,goldens/r0-0_1_select-instructions.in-ast \
test-programs/r0-0_1.scm
racket test.rkt -m native -i ./input-files/1.in \
-g goldens/r0-0_1_uniqueify.stdout \
test-programs/r0-0_1.scm
Autograder tests
The autograder tests use the tester.py
infrastructure. These tests
are designed to be compatible with the Autograder. You can use the
standard python3 tester -av
to run all tests (in verbose mode), or
python3 tester.py -t <TEST> -v
to run a specific test.
Goldens: what they are and how we use them
We use goldens in several ways:
1) Isolation goldens (IR & stdout):
- For non-native modes, we compare:
- Your pass output (serialized) to an instructor-provided
.in-ast
file - The interpreter’s stdout to a
*.stdout
file
- Your pass output (serialized) to an instructor-provided
2) Native goldens (end-to-end):
- We run your produced binary and compare its stdout to a golden stdout file
How goldens are organized:
goldens/<progname>_<input#>_<passname>.in-ast
– serialized input to that passgoldens/<progname>_<input#>_<passname>.out-ast
– serialized output of that passgoldens/<progname>_<input#>_<passname>.interp
– interpreter result snapshotgoldens/<progname>_<input#>_<passname>.stdout
– stdout (what we actually check)
The instructor has a gengoldens
mode to regenerate these. Students do not need it.
Common Errors & Fixes
- “Satisfies output predicate: no”
- Your pass constructed malformed IR. Re-read the corresponding
…-program?
predicate inirs.rkt
. Print your term and compare shapes.
- Your pass constructed malformed IR. Re-read the corresponding
- Interpreter mismatch or inconsistent stdout across passes
- You are changing the behavior of the code in compiling it. Debug by narrowing the pass window. Compare interpreter outputs before and after each pass to locate the offensive one.
- Illegal memory→memory
movq
patch-instructions
must rewrite(movq (deref …) (deref …))
into two moves via%rax
.
- Stack alignment or frame size wrong
prelude-and-conclusion
relies on correct locals and offsets. Ensureuncover-locals
andassign-homes
agree.
- Executable missing after “building a binary…”
- Toolchain failed. Check clang/ld messages. On Linux ensure clang
is installed (or set
CC
).
- Toolchain failed. Check clang/ld messages. On Linux ensure clang
is installed (or set
- Mach-O underscores vs ELF
- Do not hardcode
_
. Always go throughrt-sym
andruntime-function-externs
.
- Do not hardcode
- Native test says no executable present
- You ran
-m native
without a successful build. Ensure the final pass emits assembly and the assembler/linker stage succeeds.
- You ran
FAQ
- Do I need register allocation?
- Not in P1. We mostly compute into
%rax
, store into vars, then place vars on the stack.
- Not in P1. We mostly compute into
- Can I write helper functions?
- Yes, as long as pass interfaces/types remain as specified by the predicates.
- Can I emit calls to other C library functions?
- Not in P1. Use the provided runtime functions (
read_int64
,print_int64
).
- Not in P1. Use the provided runtime functions (
- Why underscores on macOS?
- Mach-O prefixes symbols.
rt-sym
abstracts this. Don’t add_
yourself.
- Mach-O prefixes symbols.
- How are inputs fed to my program?
- For native tests, the harness writes the whitespace-separated
numbers from the
.in
file to your program’s stdin. Your stdout must match the golden.
- For native tests, the harness writes the whitespace-separated
numbers from the