Project 4: vectors, set!, and loops

Download the starter files here

This project is inspired by Essentials of Compilation (EoC), but in this project I take a departure from the book’s content in several important ways. The language in this project, R3, is an imperative-style language with loops, but at the core the important thing happening is dynamic memory allocation on the heap. In the EoC book, this is covered alongside garbage collection–in my class (at least this iteration) I prefer to skip implementing garbage collection: you need to know C well (to do it correctly), and the algorithms are both tricky to describe and challenging to implement and debug in class where we have not covered C programming. Thus, I opt for a simpler route: just use malloc, and (optionally) get garbage collection by using (e.g.,) the Boehm garbage collector; I also leave something such as arena allocation as a stretch goal, which you should pursue if you are a PhD student or interested to learn more.

Input Language (R3)

this language, which I am calling R3, adds imperative features to R2/LIf. Specifically, we add the forms:

(let* ([x e] ...) e-b) ;; let*, allows multiple simultaneously definitions
(void) ;; operations like vector-set! return void
(make-vector i) ;; Allocate a vector of size i, initialize all entries to 0
(vector-ref e i) ;; Dereference the vector e at index i, i must be static 
(vector-set! e i ev) ;; e[i] := v
(set! x e) ;; Any variable can be set!d
(while e-g e-b) ;; While e-g, do e-b, eval to (void)
(begin es ... e-last) ;; Do all of es..., eval to the result of e-last

As an example of an R3 program…

 (program
              (let* ([x (read)]
                     [i 0]
                     [acc 0])
                (begin
                  (while (< i x)
                         (begin
                           (set! acc (+ acc i))
                           (set! i (+ i 1))))
                  acc)))

The major extensions are these:

We have a new literal, called (void). This is sometimes called the “unit value,” and carries no computational information, i.e., control-flow never depends on the value of its contents; in an ideal world we would eliminate voids entirely via optimization, but their inclusion allows us to answer questions like: “what should vector-set! return?”
Programs may allocate vectors using (make-vector i), which zeros all entries of the vector. These get translated into calls to an allocation function in runtime.c.
Programs may perform vector operations, but these operations are limited to constant indices (i.e., we can’t compute the index that we want to write / read via vector-ref/vector-set!))–we may generalize this in subsequent projects.
All variables may be mutated via set!, our language is no longer pure. We handle this by “boxing” every variable, a process we describe below.
Mutability goes hand-in-hand with loops, in the sense that mutable variables are not much fun unless you add loops. So we add a form, (while e-g e-b), which continually evaluates the loop guard e-g and executes the body e-b (discarding its result) until the loop is finished. (while ...) reduces to (void).

Repository Layout

compile.rkt – Your pass implementations. You will edit functions provided here. -> This is the only file you will edit! The rest are read-only
irs.rkt – IR definitions and predicates like anf-program?, c1-program?, etc. (see also typed/shrunk variants)
interpreters.rkt – Reference interpreters for several IRs (used by tests and for your own debugging).
system.rkt – System/ABI configuration, pass names, runtime filenames, output paths, etc.
main.rkt – Driver that runs all passes, can build a binary, and can launch a debug server.
test.rkt – Test harness. Runs isolation tests or end-to-end native tests depending on -m mode.
runtime.c – Minimal runtime (read_int64, print_int64, etc.).
test-programs/ – Example programs (.scm).
input-files/ – Input streams for programs (lines of integers).
goldens/ – Instructor goldens (IR snapshots, interpreter outputs, and stdout baselines).

Please do not rename files or directories (grading infra depends on them).

The Compiler Pipeline (passes)

We skip typechecking in this project; programs are assumed type-correct. If not, behavior is undefined (e.g., segfaults). You’ll implement the following passes in compile.rkt. Each pass must produce an output that satisfies the corresponding predicate in irs.rkt, and preserve behavior across interpretable stages.

shrink Removes binary -, and/or, and <=, >, >= by rewriting to eq?/</if. Preserves let, while, vectors, set!, and begin (canonically as let chains). Rewrites while so that it appears only in non-tail position: (while e-g e-b) => (let ([_ (while e-g e-b)]) (void)) Input: R3? → Output: shrunk-R3?
uniqueify Alpha-renames so that each bound identifier is written exactly once. Output program’s info field is ().
Input: shrunk-R3? → Output: unique-source-tree?
assignment-convert Boxes variables. Rewrites:
- let → (make-vector 1) + vector-set! to initialize
- uses of x → (vector-ref x 0)
- (set! x e) → (vector-set! x 0 e) Loops and vector ops remain.
  Input: unique-source-tree? → Output: assignment-converted-program?
anf-convert Converts to ANF: RHS are atoms or small ops over atoms; side-effects sequenced via let ([_ (vector-set! …)]) … and let ([_ (while …)]) …; if guards are atomic. Input: assignment-converted-program? → Output: anf-program?
explicate-control Lowers ANF to C2: a map from labels to tails with seq (assign x rhs) …, if (cmp a0 a1) (goto lt) (goto lf), goto, return a, and standalone vector-set!. while becomes header/body/rest labeled blocks. Input: anf-program? → Output: c2-program?
uncover-locals Computes the set of locals (variables assigned or used as vector operands) for stack layout; carries it as the program’s info set. Input: c2-program? → Output: locals-program?
select-instructions Selects pseudo-x86 instructions still using (var x) operands. Uses cmpq + set{e,l} + movzbq for comparisons, calls read_int64/make_vector, and on return jumps to a special conclusion label (no early ret). Input: locals-program? → Output: instr-program?
assign-homes Replaces (var x) with stack slots (deref (reg rbp) n) using the locals set; negative 8-byte offsets. Input: instr-program? → Output: homes-assigned-program?
patch-instructions Fixes illegal instructions (e.g., movq (deref …) (deref …), certain cmpq forms) by shuttling through a temp register. Input: homes-assigned-program? → Output: patched-program?
prelude-and-conclusion Inserts function prologue/epilogue and the conclusion block that prints %rax, sets %rax := 0, and returns. Ensures 16-byte stack alignment before callq and reserves aligned space for locals. Input: patched-program? → Output: x86-64?
dump-x86-64 Renders GAS assembly text. Emits a global for the entry symbol (rt-sym), includes runtime-function-externs, and prints operands as $imm, %reg, disp(%reg). Input: x86-64? → Output: string?

Your task: implement these passes so each output satisfies its predicate in irs.rkt and preserves semantics (the harness checks using interpreters where applicable).

“Boxing” and assignment-conversion

In this project we embrace mutability by enabling set!, which allows us to make mutable updates to variables:

(program
 (let* ([x (read)]
        [y (read)]
        [tmp 0])
   (begin
     (set! tmp x)
     (set! x y)
     (set! y tmp)
     y)))

Notice that we can use set!, which mutably updates a variable; this is not possible in “pure” functional programs, and mutation is simply not provided in those languages (at least as a builtin). The question is how to implement set!. It may seem like an obvious choice: each variable becomes a single register-sized (8-byte) value on the stack. However, this approach starts to fall apart when variables outlive their stack frames (which will happen when we add functions / procedures). Instead, we use this as an opportunity to discuss another necessary challenge to work towards a general-purpose programming language: heap-allocated data. You likely know about the heap from programming in C with malloc (or C++ with new). These procedures allow us to allocate memory from a large block of available memory (the heap), with the obligation that we later release (“free”) it back to the operating system.

To enable mutability and set!, we will employ a technique called “boxing.” In short, we assume that all data is put in a vector (“box”) on the heap. In our case, we will use a very simple representation of vectors:

typedef struct {
    int64_t len;
    int64_t data[];
} TinyVecPacked;

In terms of memory representation, a TinyVecPacked struct has a very simple layout: a single 64-bit value representing the length, followed by a data array of 8-byte values. For example, the vector containing the sequence 5 followed by 20 (starting at the address 0xAAAA00) would be represented as:

   lower addresses →                                       higher addresses →
   ┌───────────────────────┬───────────────────────┬───────────────────────┐
   │  len (int64)          │  data[0] (int64)      │  data[1] (int64)      │ 
   │  8 bytes              │  8 bytes              │  8 bytes              │
   ├───────── ─ 0xAAAA00 ──┼────────── 0xAAAA08  ──┼──────── 0xAAAA10  ────┤
   │ e.g., 2               │ e.g., 5               │ e.g., 20              │
   └───────────────────────┴───────────────────────┴───────────────────────┘

The length can be used to avoid accesses outside of the buffer: we don’t want to be able to write outside of the bounds of a buffer because it could contain junk data, we want to be able to write outside of the bounds of a buffer because we could mess up other data, unrelated to this vector (but still within our same address space, assuming the OS provides virtual memory). To handle this, we could take one of three positions:

(a) don’t worry about it–just be unsafe, and let the program crash.

(b) check the bound before each access, if a bad access would occur, jump to an exception handler.

Each of these approaches has different trade offs: C, C++, and similar languages opt for (a), which is fast but risky–the program might crash in a subtle or even exploitable way. Managed languages like Java, C#, etc. often opt for (b), but can sometimes optimize to avoid checks if certain conditions are met (e.g., using profiling data and just-in time compilation). The issue with (b) is that it is slow: every single access necessitates a branch, compared with (a) which might just be as simple as movq. Languages like Rust use (c), which enables the benefits of (b) as a zero-cost abstraction, often matching (or even exceeding) the performance of (a).

In this project we will do (a): we assume the program is type correct. This means that as you generate code, you assume that operations for various builtins are also type correct. You will not be given any programs to test which contain type errrors.

In short:

To implement make-vector, you will ultimately emit a call to make_vector in runtime.c.
To impelment (vector-ref v i), you will emit code that uses movq to dereference the vector v. You want to movq the value at 8(i+1), the +1 is due to the fact that the first 8 bytes of every vector holds its length. You could also consider emitting a *safe vector reference, checking the bounds each time (but it is not necessary in this specific case, because we don’t test any offensive programs).
To implement vector-set!, you will also emit a movq instruction.

Translating Loops

A while e-g e-b lowers to three blocks:

header: evaluate guard, then if (eq? a-g #f) (goto rest) (goto body).
body: code for e-b, then goto header.
rest: continuation after the loop.

In ANF/C2 we sequence any side-effects (e.g., vector-set!) via seq and keep the tail forms restricted to return, goto, or a single if.

IR Predicates and Interpreters

Predicates (irs.rkt): Each pass has a precise predicate (e.g., anf-program?, c1-program?, etc.). Tests check both input and output predicates at every step to catch malformed IR early. You should read and understand each predicate.
Interpreters (interpreters.rkt): For many stages we run the interpreter over your IR to check semantic consistency—different IRs must compute the same result on the same input stream. The harness reports whether outputs match and whether stdout across passes is consistent. Please do not modify this file.

How to Compile and Run

To directly compile a single file using main.rkt:

racket main.rkt test-programs/<prog>.scm

The compiler dumps a long summary of the work done at each pass, and ultimately yields a program ./output which it compiles using system-specific infrastructure (works on Linux/Mac).

Example:

$ racket main.rkt test-programs/example.scm
Compiling IR (using *your* compile.rkt) …
'(pushq (reg rbp))
...
🏗️ Now building a binary... 🏗️
-> Host: macosx/x86_64  Target: x86_64-apple-darwin  Entry: main
/usr/bin/clang -target x86_64-apple-darwin -Wall -O2 -c ./output.s -o ./output.o
/usr/bin/clang -target x86_64-apple-darwin -Wall -O2 -c ./runtime.c -o ./runtime.o
/usr/bin/clang -target x86_64-apple-darwin -Wall -O2 ./output.o ./runtime.o -o ./output
Success! Executable produced at: ./output
Done!
$ ./output
42

Programs that (read) input will prompt or consume stdin. You can use input redirection:
./output < input-files/1.in

Debugging Guide

This is a tricky project, and it is really important that you lean into a debugging methodology that works for you. Let me share with you some advice I used when I was writing the compiler myself.

First, I started by writing a single manual test in the top-level of compile.rkt, so that I could easily just sit at the command line (or Dr. Racket) and run compile.rkt (with no command-line arguments) and see if the test would pass. At the early stages of debugging, this is an excellent strategy, since it means fully-pushbutton test. For example, I had (appended to compile.rkt) something like:

(displayln ;; or pretty-print 
 (dump-x86-64
  (prelude-and-conclusion
   (patch-instructions
      ...))))

I started with a fairly large test in my case–since I was merely adding forms not present in my previous implementation. In this case, I would often be hitting match failures–this is a good thing, it allows me to trace down exactly where I need to add match cases and handle new behavior. Of course, the issue is that I also need to mix that with thinking holistically about the specs of each IR.
After I thought I had each pass of the compiler working, I started switching over to main.rkt, which will run all passes of the compiler and will report their outputs. I also needed to write the interpreters (you do not), and so I debugged some of those using main.rkt.
Last, once things are acutally working, I used test.rkt, which I adjusted to account for the possibility of expected type errors in malformed inputs.
My advice to you is similar: start with something where you can press “Run” (or continually reinvoke racket compile.rkt). It will facilitate rapid testing, and it is really important to build some skill and intuition for how to accomplish that exercise.
Remember, debugging is a key concept that you are practicing in this class. The worst possible thing you can do is “guess and check” debugging (running the tests and hoping they pass)–the issue is that doing this gives you very little observability into why the code is broken. To fix this, you need to have some way of interacting with the code. Experts debug their code using an iterative, hypothesis-driven methodology, where they (a) articulate a falsifiable hypothesis (“this match pattern never matches anything”), (b) change their code to observe the bug (“add a displayln at every match handler”), and (c)

Tricky Parts of this Project

BE CAREFUL of the following:

There is no type checking.
In this case, explicate-control needs to be updated, because when generating loops, you need to “take hold” of the return value from the translated block and use it for some purpose (e.g., branch on it, or ignore it and jump back to the header). I have generalized expr->blocks to accept a third argument, k, which is a continuation that gets applied to the eventual return value.
Non-loop conclusion flow: select-instructions must end “returns” by jumping to conclusion; don’t emit ret in the middle of blocks.
Indices must be static: vector-ref / vector-set! indices are fixnum literals at compile time in this project.
On Mac OSX, you need to prefix labels for functions, meaning _main instead of main. To facilitate this, I provide a function (rt-sym s), which converts a symbol to a system-specific variant (based on the OS). However, you only need to worry about this for labels that correspond to exported function entrypoints, not other labels that are internal to the program (e.g., the target of a jump, or even conclusion). This is important, because you don’t want to call (rt-sym ...) until the very last pass (it is ugly to make the previous passes OSX/Linux-specific). The issue is this: if you naively rename every label from foo to _foo, then you also need to ensure that you clean up jumps so that instead of jumping to foo, they go to _foo. In my implementation, I handled this as follows: I simply used (rt-sym ...) on the (entry-symbol) (i.e., main), which is the only function in this project.
Ensure 16-byte stack alignment before callq

Testing Infrastructure

The file test.rkt runs a set of formal tests, as in the last project.

frontend – runs typecheck → anf-convert, checks ANF via interpreter
middleend – runs explicate-control → uncover-locals, checks via interpret-c1
backend – runs select-instructions → patch-instructions, checks via interpret-instr
native – full compile → build binary → run with stdin → compare stdout to golden

Usage:

racket test.rkt -m <mode> -i <comma-separated input files> -g <comma-separated goldens> <program.scm>

Example:

racket test.rkt -m native -i ./input-files/1.in -g goldens/example_1_uniqueify.stdout test-programs/example.scm

Goldens: What They Are

We use goldens to verify correctness:

Isolation goldens – for non-native modes:
- Compare serialized ASTs and interpreter stdout
Native goldens – for full pipeline:
- Run your compiled binary, compare stdout

Layout:

goldens/<prog>_<n>_<pass>.in-ast
goldens/<prog>_<n>_<pass>.out-ast
goldens/<prog>_<n>_<pass>.interp
goldens/<prog>_<n>_<pass>.stdout

Instructor-only gengoldens mode regenerates these.

Autograder Tests

The autograder invokes test.rkt in JSON mode. Your job is to make each pass satisfy its predicates and produce semantically equivalent IRs until final native code passes all tests.