Project 2: Transitive Closure
Project due date:
Due: Wednesday, March 1st, 11:59PM.
Available on http://autograde.org. Use your credentials to login.
In this project we will implement a domain-specific programming language that allows us to specify network topologies and check their connectivity. Production datacenters and commercial networks are composed of myriad hosts, routers, and switches, connected in a complex web of infrastructure. An important but challenging question is simply: which nodes may connect to which others? Even in the modern age, network misconfigurations still account for global internet outage events. To address this, several cloud providers have begun to provide tools for analyzing properties of networks specified via domain-specific configuration languages. For example, AWS’s VPC Reachability Analyzer is a commercial offering that is now used by some cloud system engineers as part of their day-to-day jobs.
In this project, we will build a minimal language for expressing network topologies and implement transitive closure, the algorithm for graph reachability. This will allow us to check for connectivity properties of (potentially huge) graphs. Our language is loosely inspired by NetKAT, and we encourage you to read about its features if software-defined networking is of interest to you.
Academic Integrity
The coding on this project is to be completed by you alone, without help from any other students. You are encouraged to discuss the project specification at a high level, but should not discuss specifics or show students your solution code.
Input Format
Networks include various pieces of physical infrastructure: routers,
switches, and servers. In this project we will ignore the
physical properties of network nodes, and will simply call every
entity capable of sending or receiving data a “node.” Nodes are
connected by (directed) links. Both nodes and links are specified as
commands, each command residing on a unique line of the input file
format. Syntactically, the two specification commands are (a) the
NODE <name>
command, which specifies the existence of a node named
<name>
and (b) the LINK <from> <to>
command, which establishes a
(directed) link from node <from>
to node <to>
. For convenience, we
assume all nodes are specified before any occurrences of LINK
commands. For example, the following is valid:
NODE node0-name
NODE node1-name
LINK node0-name node1-name
But the next is not:
NODE node0-name
LINK node0-name node1-name
NODE node1-name
We can visualize the first graph in the following way:
+------------+ link +------------+
| node0-name | -------> | node1-name |
+------------+ +------------+
Let’s add some more nodes and links:
NODE node0-name
NODE node1-name
NODE node2-name
LINK node0-name node1-name
LINK node1-name node2-name
Now our network looks like this:
+------------+ link +------------+ link +------------+
| node0-name | -------> | node1-name | -------> | node2-name |
+------------+ +------------+ +------------+
Transitive Closure
The above networks are toy examples. In practice, networks may have
millions of nodes, and orders-of-magnitude more links. Network
engineers work hard to ensure that networks are designed to adhere to
specific connectivity invariants. For example, we may wish to ensure
that main-server
can connect to backup-srv
:
NODE main-server
NODE backup-srv
NODE switch
NODE db-server
LINK main-server switch
LINK switch main-server
LINK switch backup-srv
LINK backup-srv switch
LINK db-server switch
LINK switch db-server
Note that we explicitly enumerated bidirectional links. The above specification may be visualized as:
+-------------+ +------------+ +------------+
| main-server | <------> | switch | <------> | backup-srv |
+-------------+ +------------+ +------------+
^
+------------+ |
| db-server | <-------------+
+------------+
Now, we can make queries:
CONNECT main-server backup-srv
CONNECT db-server main-server
Checking these queries is easy for small examples: we can trace in our
head a path between main-server
and db-server
. But larger examples
may involve a great many nodes, and even more links between them. As
such, we need an algorithm which will infer transitive links through
the input graph:
+----------------------------------------------+
v v
+-------------+ +------------+ +------------+
| main-server | <------> | switch | <------> | backup-srv |
+-------------+ +------------+ +------------+
| ^ ^
+------------+ | |
| db-server | <-------------+ |
+------------+ |
^ |
+-----------------------------------------------+
We do this through an iterative process named transitive closure. Transitive closure consists of a series of steps applied in an iterative fashion, until no more answers are possible. In other words, the algorithm is defined in terms of its behavior at each “time step.”
-
The transitive closure (of links) at time 0 is simply the set of extensionally-specified (input) links.
-
To construct the transitive closure at time n+1, look at the transitive closure at time n. For any pair of links in that graph,
(x,y)
and(y,z)
, such that the intermediate nodey
is matching, draw a (possibly new) link betweenx
andz
,(x,z)
. -
Repeat this process until no new links are found. I.e., until the transitive closure at some time n is equal to the transitive closure at time n+1
This process necessarily terminates, as long as you are careful to ensure elements are not added twice (which will be easy using the correct datastructures, namely sets and hashes). This is because there are a finite number of nodes, and so the “worst-case” scenario would be that every node was connected to every other (as it is in the above graph). At each step of the process, we add more information to (monotonically) increase our knowledge base (of transitive links). At some point we will either (a) not add any new links or (b) add all possible links, at which point no more links may be added and we terminate.
A generalized version of this reasoning gives us the Knaster-Tarski fixed-point theorem and the Klenne fixed-point theorem.
Hashes
We will represent graphs of links as Racket hashes
. We strongly
encourage you to read their documentation. You
will need to know at least the following key functions: (hash key0 value0
...)
, (hash-ref key)
, (hash-set hsh key value)
, and (hash-keys
hsh)
. Note: you are specifically forbidden from using
make-hash
, hash-set!
, and other mutable variants of hashes.
To represent a graph as a hash, we will adopt the following
representation: keys will be strings, and their associated values will
be sets of strings, manipulated using Racket’s set
s (whose
documentation you should read
here. For example,
the first example graph would be represented as
(define x (hash "node0-name" (set "node1-name")
"node1-name" (set "node2-name")))
To add a new link to the graph, we must be careful to use set-add
to
extend the set of nodes. For example, consider that we wanted to
extend the above graph with a pointer from node0-name
to
node3-name
, we would do it via the following:
(hash-set x "node0-name" (set-add (hash-ref x "node0-name") "node3-name"))
Make sure you understand in the above code how set-add
and
hash-ref
are used in combination to ensure that no
previously-present nodes are dropped.
Tasks
You will implement several functions, I have ranked them roughly in order of difficulty.
-
(parse-line l)
– You will parse an input line given as a string and you will transform it into an output that conforms toline?
. Hint: usestring-split
and matching. (Difficulty: easy/medium) -
(forward-link? graph n0 n1)
– Check whether there is a forward link fromn0
ton1
in the graphgraph
. Return#t
iffn1
is linked to from (pointed at by)n0
. (Difficulty: easier) -
(add-link graph from to)
– Add an “edge” in the graph from nodefrom
to nodeto
. Hint: usehash-ref
,set-add
, andhash-set
to accomplish this. (Difficulty: easier) -
(build-init-graph input)
– Assume thatinput
is a program given as input. You will build up an initial graph datastructure corresponding to the program. Essentially, you are building an initial graph upon which you will subsequently perform iterative rounds of transitive closure. You will do this by looking at each line of input and changing the hash in one of two ways:-> If you see a
(node <n>)
command, you will add a self link betweenn
and itself. -> If you see an(link <from> <to>)
command, you will insert an edge fromfrom
toto
.(Difficulty: medium)
-
(transitive-closure graph)
– Perform the transitive closure of the graphgraph
. Your solution must be the final answer of transitive closure, not just a single iteration. In other words, your solution must possess the property that there are no additional links possible to add. (Difficulty: harder)
Implementing Transitive Closure
There are a variety of algorithms to implement transitive closure. I
will describe the so-called “naive” approach here, and then sketch a
more refined (semi-naive) approach. I would recommend you implement
your code using a combination of either recursive functions or the
functions foldl/r
. My solution is roughly ten lines and uses foldl
three times.
Here is an iterative algorithm for calculating the transitive closure of a graph.
-
Proceeding one iteration at a time, accumulate a variable
solution
-
At time step zero, initialize
solution
to be the graph of initial links. -
To obtain the next time step, iterate over each edge in
solution
,(x,y)
(this can be done usinghash-keys
and a recursive function orfoldl
)- For each of these edges,
(x,y)
identify the set of edges starting fromy
,(y,z)
, add(x,z)
to the graph (this can be done usinghash-ref
, which will return a set: you can iterate over that set as a list by usingset->list
). Hint: you may want to callhash-ref
with a third argument to specify a “default” value of the empty set (e.g.,(hash-ref h key (set))
).
- For each of these edges,
-
At some point this process will add no new edges, and
solution
at timestepn
will be the same assolution
at timestampn+1
. When this happens, the search is over.
As an example, let’s see what happens on the following
NODE node0
NODE node1
NODE node2
LINK node0 node1
LINK node1 node2
-
We start with
solution
being the graph{node0 -> {node1}, node1 -> {node2}}
-
We examine edge
(node0,node1)
:- Consider the set of edges beginning with
node1
: this is just the single edge(node1,node2)
. We add it to the graph, and now our graph is{node0 -> {node1, node2}, node1 -> {node2}}
- Consider the set of edges beginning with
-
We examine edge
(node1,node2)
:- Consider the set of edges beginning with
node2
: there are no such edges, so the graph doesn’t change.
- Consider the set of edges beginning with
-
At the end of the first iteration (time step), the graph is
{node0 -> {node1, node2}, node1 -> {node2}}
. -
During the next iteration, we reexamine
node0
andnode1
, but we don’t discover any new edges, so the graph remains the same. -
We’re done: the final answer is
{node0 -> {node1, node2}, node1 -> {node2}}
All in all my reference solution uses roughly 14 lines of code and
uses foldl
several times. I predict reasonable solutions will be
between 10 and 50 lines, if you find yourself doing significantly more
work please consult the instructors to ensure you’ve got the algorithm
down.
Bonus Tests
There are three bonus tests: bonus-add-edge-3
,
bonus-transitive-closure-ring100
, and
bonus-transitive-closure-ring200
. The difference between these tests
and the normal (secret) tests is that they operate over much larger
graphs (the last two are ring graphs of size 200 and 300). To pass
these tests, ensure you don’t add extra algorithmic overhead to your
solution. The normal secret tests use graphs up to 30 nodes in
size. In sum, you can earn roughly 18% bonus.
Testing
Once you implement all of your work, you will unlock the ability to
run connectivity.rkt
with an input file. The demo
function canonicalizes
the database and prints out the starting database and its transitive closure:
racket connectivity.rkt demo/1.net "CONNECTED n13 n52"
This will allow you to see your code in action. To run the testing
infrastructure on your code, use tester.py
. It is invoked as
follows:
python3 tester.py -av
You may wish to add more example networks in the demos
folder.
Submitting your code for testing
NOTE Before you can submit your project for grading you must git add, commit, and push. On a terminal (in your project directory) type:
# Add all files in the directory
git add .
# Make a commit
git commit -m "my commit message here"
# Push to server
git push
Once you have done a git commit and push, go to the autograder and select for your project to be graded.
Starter Code
For those of you not currently enrolled in the class
;; Honor Pledge (bottom too) FILL IN NAME:
;;
;; I _____ submit this assignment as my own work
#lang racket
(provide (all-defined-out))
;;
;; CIS352 (Fall 22) Project 2 -- Network Connectivity
;;
;; To see the demo, invoke using:
;; racket connectivity.rkt <input-file>.net
;; racket connectivity.rkt <input-file>.net "CONNECTED <from> <to>"
;; Lines are pared into an intermediate representation satisfying the
;; line? predicate.
(define (line? l)
(match l
[`(node ,(? string? node-name)) #t]
[`(link ,(? string? from-node) ,(? string? to-node)) #t]
[_ #f]))
;; The input format is a list of line?s
(define (input-format? lst)
(and (list? lst)
(andmap line? lst)))
;; A graph? is a hash table whose keys are strings and whose values
;; are sets of strings.
(define (graph? gr) (and (hash? gr)
(immutable? gr)
(andmap string? (hash-keys gr))
(andmap (lambda (key) (andmap string? (set->list (hash-ref gr key))))
(hash-keys gr))))
;;
;; BEGIN PROJECT BELOW
;;
;; TODO
;; Parse a line of text input. Lines will have the following format:
;; NODE <node-name>
;; LINK <node-name> <node-name>
;;
;; Hint: use string-split and match, make sure to produce something
;; that adheres to `line?`.
(define/contract (parse-line l)
(-> string? line?)
;; TODO TODO TODO
'todo)
;; starter code
;; read a file by mapping over its lines
(define/contract (read-file f)
(-> string? input-format?)
(map parse-line (file->lines f)))
;; TODO
;; Input is a list of line? commands. Write a recursive function which
;; builds up a hash.
;;
;; - If it's a `node` command, add a link from a node to itself.
;; - If it's a `link` command, add a directional link as specified.
;;
;; Hint: use (hash), (set n), hash-set, set-add, hash-ref, and similar.
(define/contract (build-init-graph input)
(-> input-format? graph?)
;; TODO TODO TODO
(hash))
;; TODO
;; Check whether or not there is a forward line from n0 to n1 in
;; graph.
;;
;; Hint: use set-member? and hash-ref
(define (forward-link? graph n0 n1)
'todo)
;; TODO
;; Add a directed link (from,to) to the graph graph, return the new graph with
;; the additional link.
;;
;; Hint: use hash-set, hash-ref, and set-add.
(define (add-link graph from to)
'todo)
;; TODO
;; Perform the transitive closure of the graph. This is the most challenging
;; operation in the project, so we recommend putting it off until the end.
;;
;; To perform the transitive closure of the graph, iteratively add links
;; whenever you find a matching (x,y) and (y,z). This can be done in one of
;; two broad ways: (a) chaotic iteration or (b) semi-naive evaluation.
;; Read the project description for more details and hints at a solution.
;;
;; My solution uses `foldl`, `hash-keys`, `set->list`, `hash-ref`, and
;; `add-link`. It is always possible to use a recursive helper function instead
;; of a foldl, but it makes the code much easier to understand in my opinion.
(define (transitive-closure graph)
'todo)
;;
;; END PROJECT CODE, DO NOT TOUCH BELOW
;;
;; Print a DB
(define (print-db db)
(for ([key (sort (hash-keys db) string<?)])
(displayln (format "Key ~a:" key))
(displayln (string-append " " (string-join (sort (set->list (hash-ref db key)) string<?) ", ")))))
(define (demo file query)
(define ir (read-file file))
(define initial-db (build-init-graph ir))
(displayln "The input is:")
(print-db initial-db)
(displayln "Now running transitive closure...")
(define final-db (transitive-closure initial-db))
(displayln "Transitive closure:")
(print-db final-db)
(unless (equal? query "")
(match (string-split query)
[`("CONNECTED" ,n0 ,n1)
(if (forward-link? final-db n0 n1)
(displayln "CONNECTED")
(displayln "DISCONNECTED"))])))
(match-define (cons file query)
(command-line
#:program "connectivity.rkt"
#:args ([filename ""] [query ""])
(cons filename query)))
;; if called with a single argument, this racket program will execute
;; the demo.
(if (not (equal? file "")) (demo file query) (void))