7 Week 7: REs, CFGs, and FORTH
We now begin our transition from learning about langauge features to discussing aspects of langauge implementation. First we focus on regular expressions, which allow us to recognize and capture textual patterns. Next, we move to context-free grammars. Context-free grammars form the basis for the syntax of most modern programming languages. The goal of this week is to explain the core components of CFGs so that next week we can understanding how parsing works. Last, because parsing will take a while to understand, we’ll learn about a langauge we can implement right now: FORTH. FORTH is a stack-based language that will build on the material we’ve already covered in previous labs.
7.1 Rubular tutorial
Rubular is a nice interactive tool that helps you develop and debug regular expressions. I highly recommend it. Please check out my video on using Rubular and Dave’s notes on using Rubular
7.2 Slides
Regular expressions: (Slides in Keynote) and (Slides in PDF)
RE foundations, CFG and FORTH intro: (Slides in Keynote) and (Slides in PDF)
More grammars, intro to parsing: (Slides in Keynote) and (Slides in PDF)
7.3 Practice Problems
Also consider using: https://regex101.com/
Problems from Monday / Wednesday:
- Write regular expressions that match the following:
The string "010"
Any number of as followed by any number of bs
Any number of as, followed by exactly two bs, followed by one or more cs
All of this in sequence: possibly a, followed by any number of bs, followed by one or more as
The set of valid US zipcodes (5 digit), the set of valid US extended zipcodes
Strings that are at least eight digits long and include alphanumeric characters and at least one of either asterisk or ampersand.
C identifiers may contain any number of digits or letters, and no other characters. Also, they must not begin with a number. Write a regex for C identifiers.
The set of valid email addresses ending in haverford.edu, brynmawr.edu, or swarthmore.edu
- Describe (in English prose) the following regular expressions:
a*(bc)*
(0|1)*1
(a|b|c*)b
Experiment(s?)
What are the components of a context-free grammar? Explain each component
Are regular expressions or CFGs more powerful (in the sense that they can capture a larger set of langauges)?
- Consider the following grammar (assume that variable stands in for x,y,etc... and number stands in for 1,2,etc...):
Expr -> variable := number
Expr -> Expr ; Expr
What are the nonterminal(s)? What are the terminals?
Describe in English the set of strings matched by the grammar
- Draw the derivation for the following. If a derivation is not possible, identify why.
x := 3; y := 4
x := 3; y := 4; z := 5
x := 3; y := 4; z := 5;
Is the grammar ambiguous? Why or why not?
For each of the valid derivations above, draw the corresponding parse tree.
- Write a CFG for the following english language description. Indicate the terminals, nonterminals, productions, and start symbol.
Consider the language of "useless parentheses." Any number is in the language of useless parenthesis. So, for example, 5 is an element of the language of useless parentheses. Furthermore, if S is in the language of useless parentheses, I can put parentheses around it, and that string is also in the language of useless parentheses. For example, "(5)" and "((5))" are in the language. Write the CFG for the language of useless parentheses.