Week 4 Lecture Notes
-
I have decided to shift the schedule around a little bit to match course pacing, which does not affect the exam topics. I will talk about algebraic data and pattern matching this week in class.
-
Code links from previous classes (the ones I have) are on Blackboard.
-
Our goal today is to figure out how to use just simple cons cells to build tree-shaped data structures. Tree-shaped, or algebraic data is a ubiqutuous topic in functional programming, which emphasizes recursion (and induction) over tree-shaped data types, with finite objects built using some (finite) number of constructors (functions) each of which has a corresponding sequence of arguments (of some specific arity).
-
We will not study the formalization of algebraic data right now, but to briefly summarize the mathematics: a sequence of arguments (as input to a constructor) is characterized by a product type (a tuple is characterized by a
*
operation), and the ability to have multiple constructors is characterized by a sum type. For this reason, we call this kind of tree-shaped data, algebraic data. -
Last week in class, we talked about building lists with
cons
. One key idea to establish is that every list can be canonically represented in the form:
(cons x0 (cons x1 ... (cons xn '()) ...))
-
Every list is ultimately written something like the above, with each individual cons cell represented as a heap-allocated (not with
malloc
but some custom allocator that allocates cons cells specifically) struct (a pair of thecar
and thecdr
). -
In Racket, lists are untyped, dynamic errors on lists happen when the expected length (or type of the underlying data) does not match the type expected by the code. This means that for now, we don’t have to think about the types–but we do have to be careful with how we use data.
-
We will introduce the idea of “type predicates.” A type predicate is a Racket function (returning only
#t
or#f
) which classifies whether a given Racket value inhabits the type being defined. For example, the following type predicate defines a simple type of binary tree where each node contains some arbitrary value–there is no type checking imposed upon the valuev
carried by the node.
(define (tree? t)
(match t
['empty #t]
[`(node ,v ,(? tree? t0) ,(? tree? t1)) #t]
[_ #f]))
The above code uses pattern matching, we will walk through it in class
and carefully explain this example. Any time where you are using
pattern matching, you could also (with more typing) be explicit, and
build a large cond
instead; using cond
, each condition checks
whether the pattern matches, and each corresponding branch extracts
the matched elements and computes with them. This is basically what
the pattern matcher is doing under the hood (but with the ability to
reason about some properties of the pattern and optimize the search,
potentially).
Example trees here include:
(tree? 'empty) ;; #t
(tree? '(node 1 empty empty)) ;; #t
(tree? '(node 25 (node "hello" empty empty) empty)) ;; #t
(tree? '(node "hello" 1 2)) ;; #f
- Our type predicate will typically have the pattern:
(define (type? t)
(match t
[... #t]
...
[_ #f]))
In general, an algebraic type has a set of constructors–the type
predicate will be formed by building a match
statement over each of
these constructors. For example, we might imagine a specification for
a set of unary or binary trees:
(define (ubtree? t)
(match t
['empty #t]
[`(unary ,v ,(? ubtree? t)) #t]
[`(binary ,v ,(? ubtree? t0) ,(? ubtree? t1)) #t]
[_ #f]))
We can read the Racket code as implementing a common English language description of the corresponding inductive definition of lists (using structural recursion to define the shape of lists):
An value t
is a ubtree?
iff it:
- is the specific symbol
'empty
- A list of length three beginning with
'unary
is aubtree?
, assuming the third element also has to be aubtree?
. - A list of length four beginning with
'binary
is aubtree?
, assuming its last two elements are alsoubtree?
s. - Last, and importantly, no other types of values are
ubtree?s
.
We eschew type theory at the moment, and in the first parts of this
course we will focus on operational semantics. The role of type theory
is to give a set of rules which can be run (and will necessarily
terminate) to decide these stipulations are met. In Racket, we do not
have a static type system–it is easier to check these properties
on-the-fly and allow them to dynamically fail. Also, in my
descriptions so far I’m cheating by not constraining the type of v
in any way–in a static type system, we’d have to give v
a type as
well, and features like parametric polymorphism offer one way to
achieve this statically.
So to sum it up, an algebraic data type is a type that is built out of multiple constructors, each of which is allowed to take a specific sequence of arguments. We will not explicitly build this type into the type system in Racket (as Racket lacks a type system), but instead we will write “type predicates,” functions which implement all of the runtime checks necessary to operationalize the type.
Programming with algebraic data involves matching on it. The great
news here is that every usage of algebraically-defined data must
follow the same template as the type definition itself. This is not
just an idiom–it is a requirement to ensure that the semantics is
total, and prevent dynamic errors in the form of match failures. For
example, let’s say we have distance?
:
(define (distance? d)
(match d
[`(,(? integer?) feet) #t]
[`(,(? integer?) miles) #t]
[_ #f]))
Now, if we have a function that expects distances, it would be an error to write something like:
;; distance? -> integer?
(define (to-inches d)
(match d
[`(,i feet) (* 12 i)]))
The issue is that the match pattern is non-exhaustive with respect to distance?
. Again, just to be clear–the comment is not enforced by Racket at all, it is a conceptual annotation of what expect the static type of the function to be. To see the bug, we can look at the following call:
(to-inches '(5 miles))
Now, to-inches
has no match pattern for '(5 miles)
, and so it
raises a dynamic error. If Racket had a static type system (like Typed
Racket does), it could statically (at compile time) tell you that the
match was non-total–however, this is also an optional feature
(sometimes a compiler warning) in practice. But to us, it doesn’t seem
very meaningful of a distinction, since we’re still seeing the bug at
the same time. This becomes more important in larger developments,
where errors manifest possibly far away from the code in which they
originated (the “root cause” of the bug).
Read the Documentation
Here: https://docs.racket-lang.org/guide/match.html
Exercises
- Let’s define a
shape?
to be one of the following:- Rectangles can be shapes, and have to include points
x0, y0
andx1, y1
- Circles can be shapes, and have a starting point
x0, y0
and a radiusr
which is a real number - No other things are shapes.
- Rectangles can be shapes, and have to include points
-
Write this as a Racket type predicate.
- Define the function
area
on the type predicateshape?
.