Lab 11: Macros, Tail Recursion, Regular Expressions
Due by 11:59pm on Tuesday, August 3.
Starter Files
Download lab11.zip. Inside the archive, you will find starter files for the questions in this lab, along with a copy of the Ok autograder.
Topics
Consult this section if you need a refresher on the material for this lab. It's okay to skip directly to the questions and refer back here should you get stuck.
So far we've been able to define our own procedures in Scheme using the
define
special form. When we call these procedures, we have to follow
the rules for evaluating call expressions, which involve evaluating all the
operands.
We know that special form expressions do not follow the evaluation rules of call expressions. Instead, each special form has its own rules of evaluation, which may include not evaluating all the operands. Wouldn't it be cool if we could define our own special forms where we decide which operands are evaluated? Consider the following example where we attempt to write a function that evaluates a given expression twice:
scm> (define (twice f) (begin f f))
twice
scm> (twice (print 'woof))
woof
Since twice
is a regular procedure, a call to twice
will
follow the same rules of evaluation as regular call expressions; first we
evaluate the operator and then we evaluate the operands. That means that
woof
was printed when we evaluated the operand (print 'woof)
.
Inside the body of twice
, the name f
is bound to the value
undefined
, so the expression (begin f f)
does nothing at all!
The problem here is clear: we need to prevent the given expression from
evaluating until we're inside the body of the procedure. This is where the
define-macro
special form, which has identical syntax to the regular
define
form, comes in:
scm> (define-macro (twice f) (list 'begin f f))
twice
define-macro
allows us to define what's known as a macro
,
which is simply a way for us to combine unevaluated input expressions together
into another expression. When we call macros, the operands are not evaluated,
but rather are treated as Scheme data. This means that any operands that are
call expressions or special form expression are treated like lists.
If we call (twice (print 'woof))
, f
will actually be bound to
the list (print 'woof)
instead of the value undefined
.
Inside the body of define-macro
, we can insert these expressions into
a larger Scheme expression. In our case, we would want a begin
expression that looks like the following:
(begin (print 'woof) (print 'woof))
As Scheme data, this expression is really just a list containing three
elements: begin
and (print 'woof)
twice, which is exactly
what (list 'begin f f)
returns. Now, when we call twice
,
this list is evaluated as an expression and (print 'woof)
is evaluated
twice.
scm> (twice (print 'woof))
woof
woof
To recap, macros are called similarly to regular procedures, but the rules for evaluating them are different. We evaluated lambda procedures in the following way:
- Evaluate operator
- Evaluate operands
- Apply operator to operands, evaluating the body of the procedure
However, the rules for evaluating calls to macro procedures are:
- Evaluate operator
- Apply operator to unevaluated operands
- Evaluate the expression returned by the macro in the frame it was called in.
# Python
def fact(n):
if n == 0:
return 1
return n * fact(n - 1)
# Scheme
(define (fact n)
(if (= n 0)
1
(* n (fact (- n 1)))))
Notice that in this version of factorial
, the return expressions both
use recursive calls, and then use the values for more "work." In other
words, the current frame needs to sit around, waiting for the recursive
call to return with a value. Then the current frame can use that value
to calculate the final answer.
As an example, consider a call to fact(5)
(Shown with Scheme below).
We make a new frame for the call, and in carrying out the body of the
function, we hit the recursive case, where we want to multiply 5 by the
return value of the call to fact(4)
. Then we want to return this
product as the answer to fact(5)
. However, before calculating this
product, we must wait for the call to fact(4)
. The current frame
stays while it waits. This is true for every successive recursive
call, so by calling fact(5)
, at one point we will have the frame of
fact(5)
as well as the frames of fact(4)
, fact(3)
, fact(2)
, and
fact(1)
, all waiting for fact(0)
.
(fact 5)
(* 5 (fact 4))
(* 5 (* 4 (fact 3)))
(* 5 (* 4 (* 3 (fact 2))))
(* 5 (* 4 (* 3 (* 2 (fact 1)))))
(* 5 (* 4 (* 3 (* 2 (* 1 (fact 0))))))
(* 5 (* 4 (* 3 (* 2 (* 1 1)))))
(* 5 (* 4 (* 3 (* 2 1))))
(* 5 (* 4 (* 3 2)))
(* 5 (* 4 6))
(* 5 24)
120
Keeping all these frames around wastes a lot of space, so our goal is to come up with an implementation of factorial that uses a constant amount of space. We notice that in Python we can do this with a while loop:
def fact_while(n):
total = 1
while n > 0:
total = total * n
n = n - 1
return total
However, Scheme doesn't have for
and while
constructs. No problem!
Everything that can be written with while and for
loops and also be
written recursively. Instead of a variable, we introduce a new
parameter to keep track of the total.
def fact(n):
def fact_optimized(n, total):
if n == 0:
return total
return fact_optimized(n - 1, total * n)
return fact_optimized(n, 1)
(define (fact n)
(define (fact-optimized n total)
(if (= n 0)
total
(fact-optimized (- n 1) (* total n))))
(fact-optimized n 1))
Why is this better? Consider calling fact(n)
on based on this
definition:
(fact 5)
(fact-optimized 5 1)
(fact-optimized 4 5)
(fact-optimized 3 20)
(fact-optimized 2 60)
(fact-optimized 1 120)
(fact-optimized 0 120)
120
Because Scheme supports tail-call optimization (note that Python does
not), it knows when it no longer needs to keep around frames because
there is no further calculation to do. Looking at the last line in
fact_optimized
, we notice that it returns the same thing that the
recursive call does, no more work required. Scheme realizes that there
is no reason to keep around a frame that has no work left to do, so it
just has the return of the recursive call return directly to whatever
called the current frame.
Therefore the last line in fact_optimized
is a tail-call.
Regular Expressions
Regular expressions are a way to describe sets of strings that meet certain criteria, and are incredibly useful for pattern matching.
The simplest regular expression is one that matches a sequence of characters, like aardvark
to match any "aardvark" substrings in a string.
However, you typically want to look for more interesting patterns. We recommend using an online tool like regexr.com for trying out patterns, since you'll get instant feedback on the match results.
Character classes
A character class makes it possible to search for any one of a set of characters. You can specify the set or use pre-defined sets.
Class | Description |
---|---|
[abc] |
Matches a, b, or c |
[a-z] |
Matches any character between a and z |
[^A-Z] |
Matches any character that is not between A and Z. |
\w |
Matches any "word" character. Equivalent to [A-Za-z0-9_] |
\d |
Matches any digit. Equivalent to [0-9] . |
[0-9] |
Matches a single digit in the range 0 - 9. Equivalent to \d |
\s |
Matches any whitespace character (spaces, tabs, line breaks). |
. |
Matches any character besides new line. |
Character classes can be combined, like in [a-zA-Z0-9]
.
Combining patterns
There are multiple ways to combine patterns together in regular expressions.
Combination | Description |
---|---|
AB |
A match for A followed immediately by one for B. Example: x[.,]y matches "x.y" or "x,y" |
A⎮B |
Matches either A or B. Example: \d+⎮Inf matches either a sequence containing 1 or more digits or "Inf" |
A pattern can be followed by one of these quantifiers to specify how many instances of the pattern can occur.
Quantifier | Description |
---|---|
* |
0 or more occurrences of the preceding pattern. Example: [a-z]* matches any sequence of lower-case letters or the empty string. |
+ |
1 or more occurrences of the preceding pattern. Example: \d+ matches any non-empty sequence of digits. |
? |
0 or 1 occurrences of the preceding pattern. Example: [-+]? matches an optional sign. |
{1,3} |
Matches the specified quantity of the preceding pattern. {1,3} will match from 1 to 3 instances. {3} will match exactly 3 instances. {3,} will match 3 or more instances. Example: \d{5,6} matches either 5 or 6 digit numbers. |
Groups
Parentheses are used similarly as in arithmetic expressions, to create groups. For example, (Mahna)+
matches strings with 1 or more "Mahna", like "MahnaMahna". Without the parentheses, Mahna+
would match strings with "Mahn" followed by 1 or more "a" characters, like "Mahnaaaa".
Anchors
^
- Matches the beginning of a string. Example:
^(I⎮You)
matches I or You at the start of a string.
- Matches the beginning of a string. Example:
$
- Normally matches the empty string at the end of a string or just before a newline at the end of a string. Example:
(\.edu|\.org|\.com)$
matches .edu, .org, or .com at the end of a string.
- Normally matches the empty string at the end of a string or just before a newline at the end of a string. Example:
\b
- Matches a "word boundary", the beginning or end of a word. Example:
s\b
matches s characters at the end of words.
- Matches a "word boundary", the beginning or end of a word. Example:
Special characters
The following special characters are used above to denote types of patterns:
\ ( ) [ ] { } + * ? | $ ^ .
That means if you actually want to match one of those characters, you have to escape it using a backslash. For example, \(1\+3\)
matches "(1 + 3)".
Using regular expressions in Python
Many programming languages have built-in functions for matching strings to regular expressions. We'll use the [Python re module] in 61A, but you can also use similar functionality in SQL, JavaScript, Excel, shell scripting, etc.
The search method searches for a pattern anywhere in a string:
re.search(r"(Mahna)+", "Mahna Mahna Ba Dee Bedebe")
That method returns back a match object, which is considered truth-y in Python and can be inspected to find the matching strings.
For more details, please consult the re module documentation or the re tutorial.
Required Questions
Macros
Q1: WWSD: Macros
One thing to keep in mind when doing this question, builtins get rendered as such:
scm> +
#[+]
scm> list
#[list]
If evaluating an expression causes an error, type
SchemeError
. If nothing is displayed, typeNothing
.Use Ok to test your knowledge with the following "What Would Scheme Display?" questions:
python3 ok -q wwsd-macros -u
scm> +
______#[+]
scm> list
______#[list]
scm> (define-macro (f x) (car x))
______f
scm> (f (2 3 4)) ; type SchemeError for error, or Nothing for nothing
______2
scm> (f (+ 2 3))
______#[+]
scm> (define x 2000)
______x
scm> (f (x y z))
______2000
scm> (f (list 2 3 4))
______#[list]
scm> (f (quote (2 3 4)))
______SchemeError
scm> (define quote 7000)
______quote
scm> (f (quote (2 3 4)))
______7000
scm> (define-macro (g x) (+ x 2))
______g
scm> (g 2)
______4
scm> (g (+ 2 3))
______SchemeError
scm> (define-macro (h x) (list '+ x 2))
______h
scm> (h (+ 2 3))
______7
scm> (define-macro (if-else-5 condition consequent) `(if ,condition ,consequent 5))
______if-else-5
scm> (if-else-5 #t 2)
______2
scm> (if-else-5 #f 3)
______5
scm> (if-else-5 #t (/ 1 0))
______SchemeError
scm> (if-else-5 #f (/ 1 0))
______5
scm> (if-else-5 (= 1 1) 2)
______2
Q2: Scheme def
Implement def
, which simulates a python def
statement, allowing you to write code like
(def f(x y) (+ x y))
.
The above expression should create a function with parameters x
and y
, and
body (+ x y)
, then bind it to the name f
in the current frame.
Note: the previous is equivalent to
(def f (x y) (+ x y))
.Hint: We strongly suggest doing the WWPD questions on macros first as understanding the rules of macro evaluation is key in writing macros.
(define-macro (def func args body)
'YOUR-CODE-HERE)
Use Ok to test your code:
python3 ok -q scheme-def
Tail Recursion
Q3: Replicate
Write a tail-recursive function that returns a list with x
repeated n
times.
scm> (tail-replicate 3 10)
(3 3 3 3 3 3 3 3 3 3)
scm> (tail-replicate 5 0)
()
scm> (tail-replicate 100 5)
(100 100 100 100 100)
(define (tail-replicate x n)
'YOUR-CODE-HERE
)
Use Ok to test your code:
python3 ok -q tail-replicate
Q4: Exp
We want to implement the exp
procedure. So, we write the following
recursive procedure:
(define (exp-recursive b n)
(if (= n 0)
1
(* b (exp-recursive b (- n 1)))))
Try to evaluate
(exp-recursive 2 (exp-recursive 2 10))
You will notice that it will cause a maximum recursion depth error. To fix this, we need to use tail recursion! Implement the exp
procedure using tail recursion:
(define (exp b n)
;; Computes b^n.
;; b is any number, n must be a non-negative integer.
)
Use Ok to test your code:
python3 ok -q exp
Regular Expressions
Q5: Roman Numerals
Write a regular expression that finds any string of letters that resemble a Roman numeral and aren't part of another word. A Roman numeral is made up of the letters I, V, X, L, C, D, M and is at least one letter long.
import re
def roman_numerals(text):
"""
Finds any string of letters that could be a Roman numeral
(made up of the letters I, V, X, L, C, D, M).
>>> roman_numerals("Sir Richard IIV, can you tell Richard VI that Richard IV is on the phone?")
['IIV', 'VI', 'IV']
>>> roman_numerals("My TODOs: I. Groceries II. Learn how to count in Roman IV. Profit")
['I', 'II', 'IV']
>>> roman_numerals("I. Act 1 II. Act 2 III. Act 3 IV. Act 4 V. Act 5")
['I', 'II', 'III', 'IV', 'V']
>>> roman_numerals("Let's play Civ VII")
['VII']
>>> roman_numerals("i love vi so much more than emacs.")
[]
>>> roman_numerals("she loves ALL editors equally.")
[]
"""
return re.findall(__________, text)
Use Ok to test your code:
python3 ok -q roman_numerals
Q6: Calculator Ops
Write a regular expression that parses strings written in the 61A Calculator language and returns any expressions which have two numeric operands, leaving out the parentheses around them.
import re
def calculator_ops(calc_str):
"""
Finds expressions from the Calculator language that have two
numeric operands and returns the expression without the parentheses.
>>> calculator_ops("(* 2 4)")
['* 2 4']
>>> calculator_ops("(+ (* 3 (+ (* 2 4) (+ 3 5))) (+ (- 10 7) 6))")
['* 2 4', '+ 3 5', '- 10 7']
>>> calculator_ops("(* 2)")
[]
"""
return re.findall(__________, calc_str)
Use Ok to test your code:
python3 ok -q calculator_ops
Submit
Make sure to submit this assignment by running:
python3 ok --submit