Week 4: Algebraic Data Types

Week 4 Lecture Notes

I have decided to shift the schedule around a little bit to match course pacing, which does not affect the exam topics. I will talk about algebraic data and pattern matching this week in class.
Code links from previous classes (the ones I have) are on Blackboard.
Our goal today is to figure out how to use just simple cons cells to build tree-shaped data structures. Tree-shaped, or algebraic data is a ubiqutuous topic in functional programming, which emphasizes recursion (and induction) over tree-shaped data types, with finite objects built using some (finite) number of constructors (functions) each of which has a corresponding sequence of arguments (of some specific arity).
We will not study the formalization of algebraic data right now, but to briefly summarize the mathematics: a sequence of arguments (as input to a constructor) is characterized by a product type (a tuple is characterized by a * operation), and the ability to have multiple constructors is characterized by a sum type. For this reason, we call this kind of tree-shaped data, algebraic data.
Last week in class, we talked about building lists with cons. One key idea to establish is that every list can be canonically represented in the form:

(cons x0 (cons x1 ... (cons xn '()) ...))

Every list is ultimately written something like the above, with each individual cons cell represented as a heap-allocated (not with malloc but some custom allocator that allocates cons cells specifically) struct (a pair of the car and the cdr).
In Racket, lists are untyped, dynamic errors on lists happen when the expected length (or type of the underlying data) does not match the type expected by the code. This means that for now, we don’t have to think about the types–but we do have to be careful with how we use data.
We will introduce the idea of “type predicates.” A type predicate is a Racket function (returning only #t or #f) which classifies whether a given Racket value inhabits the type being defined. For example, the following type predicate defines a simple type of binary tree where each node contains some arbitrary value–there is no type checking imposed upon the value v carried by the node.

(define (tree? t)
  (match t
    ['empty #t]
    [`(node ,v ,(? tree? t0) ,(? tree? t1)) #t]
    [_ #f]))

The above code uses pattern matching, we will walk through it in class and carefully explain this example. Any time where you are using pattern matching, you could also (with more typing) be explicit, and build a large cond instead; using cond, each condition checks whether the pattern matches, and each corresponding branch extracts the matched elements and computes with them. This is basically what the pattern matcher is doing under the hood (but with the ability to reason about some properties of the pattern and optimize the search, potentially).

Example trees here include:

(tree? 'empty) ;; #t
(tree? '(node 1 empty empty)) ;; #t
(tree? '(node 25 (node "hello" empty empty) empty)) ;; #t
(tree? '(node "hello" 1 2)) ;; #f

Our type predicate will typically have the pattern:

(define (type? t)
  (match t
    [... #t]
	...
	[_ #f]))

In general, an algebraic type has a set of constructors–the type predicate will be formed by building a match statement over each of these constructors. For example, we might imagine a specification for a set of unary or binary trees:

(define (ubtree? t)
  (match t
    ['empty #t]
    [`(unary ,v ,(? ubtree? t)) #t]
    [`(binary ,v ,(? ubtree? t0) ,(? ubtree? t1)) #t]
    [_ #f]))

We can read the Racket code as implementing a common English language description of the corresponding inductive definition of lists (using structural recursion to define the shape of lists):

An value t is a ubtree? iff it:

is the specific symbol 'empty
A list of length three beginning with 'unary is a ubtree?, assuming the third element also has to be a ubtree?.
A list of length four beginning with 'binary is a ubtree?, assuming its last two elements are also ubtree?s.
Last, and importantly, no other types of values are ubtree?s.

We eschew type theory at the moment, and in the first parts of this course we will focus on operational semantics. The role of type theory is to give a set of rules which can be run (and will necessarily terminate) to decide these stipulations are met. In Racket, we do not have a static type system–it is easier to check these properties on-the-fly and allow them to dynamically fail. Also, in my descriptions so far I’m cheating by not constraining the type of v in any way–in a static type system, we’d have to give v a type as well, and features like parametric polymorphism offer one way to achieve this statically.

So to sum it up, an algebraic data type is a type that is built out of multiple constructors, each of which is allowed to take a specific sequence of arguments. We will not explicitly build this type into the type system in Racket (as Racket lacks a type system), but instead we will write “type predicates,” functions which implement all of the runtime checks necessary to operationalize the type.

Programming with algebraic data involves matching on it. The great news here is that every usage of algebraically-defined data must follow the same template as the type definition itself. This is not just an idiom–it is a requirement to ensure that the semantics is total, and prevent dynamic errors in the form of match failures. For example, let’s say we have distance?:

(define (distance? d)
  (match d
    [`(,(? integer?) feet) #t]
    [`(,(? integer?) miles) #t]
    [_ #f]))

Now, if we have a function that expects distances, it would be an error to write something like:

;; distance? -> integer?
(define (to-inches d)
  (match d
    [`(,i feet) (* 12 i)]))

The issue is that the match pattern is non-exhaustive with respect to distance?. Again, just to be clear–the comment is not enforced by Racket at all, it is a conceptual annotation of what expect the static type of the function to be. To see the bug, we can look at the following call:

(to-inches '(5 miles))

Now, to-inches has no match pattern for '(5 miles), and so it raises a dynamic error. If Racket had a static type system (like Typed Racket does), it could statically (at compile time) tell you that the match was non-total–however, this is also an optional feature (sometimes a compiler warning) in practice. But to us, it doesn’t seem very meaningful of a distinction, since we’re still seeing the bug at the same time. This becomes more important in larger developments, where errors manifest possibly far away from the code in which they originated (the “root cause” of the bug).

Read the Documentation

Here: https://docs.racket-lang.org/guide/match.html

Exercises

Let’s define a shape? to be one of the following:
- Rectangles can be shapes, and have to include points x0, y0 and x1, y1
- Circles can be shapes, and have a starting point x0, y0 and a radius r which is a real number
- No other things are shapes.
Write this as a Racket type predicate.
Define the function area on the type predicate shape?.