CS 164 Section 8: Scoping and Coroutines

Scoping

This section was supposed to be midterm review, but there weren't many questions about older material. So we instead reviewed two newer topics, scoping and continuations.

Let us first discuss the difference between lexical and dynamic scoping. Dynamic scoping walks up the call stack to find variables. When a variable is accessed, a language with dynamic scoping will walk up the call stack looking for the closest variable with the same name and use that. Lexical scoping is instead based entirely on the source code. When a variable is accessed, the program walks up the scopes from where the variable is defined (e.g. the enclosing functions) and looks for the closest variable with the same name. Dynamic scoping is hard to use because it's not very compositional. If you define a function that refers to variables it does not define itself and pass it to a library function, you have no idea whether that library function will define variables that your function will use instead of the ones you wish (see below for an example). For this reason, most languages use lexical scoping.

Now let's consider how we implement lexical scoping. At the moment a function is defined, we have to store the current environment. That way when we refer to it later on, we remember the environment at the definition and reference the correct variable.

But how do we store this environment? Can we, for example, store it on the stack? For example, consider the following piece of code:

	def main():
	  list = ...
	  scores = ...
	  def cmp(x, y): return scores[x] < scores[y]
	  sort(list, cmp)

(This code is also a good way to compare dynamic and lexical scoping. If sort defines its own variable named scores, this code will not work as likely intended under dynamic scoping.) Under lexical scoping, the cmp function has to refer to the scores variable defined just outside it in main. Can we, instead of making a copy of the environment, simply remember the offset in the stack of main's scores variable and use that?

In this case such an approach would work, but not in general. For instance, consider the following code:

	def main():
	  def addX(x): return lambda y: return x + y
	  add42 = addX(42)
	  print add42(0)

Here, main will call addX with the argument 42. This will create a new entry on the stack for addX where x=42. However, once addX returns, this entry will vanish from the stack. When we call add42, this value for x will not exist on the stack anymore, so the method presented above will work. We thus need to store the environment separately on the heap so we can access it later.

Coroutines

Let's consider a different problem now. Say we want to write code to match regular expressions but without writing a parser or doing backtracking recursion. Specifically, say we want to write something like

	# For the regex (abc|de)x
	patt = seq(alt(prim("abc"), prim("de")), prim("x"))
	match(s, patt)

This is a relatively intuitive way to encode regular expressions if you don't have a parser. Now how will we implement this match function? Well, there are multiple ways the pattern it's given might match, and we might need to check all of them to find one that matches the entire string. We can thus write match something like the following. Note that all the code I write here is just Python-like pseudocode; it is not valid Python code.

	def match(s, patt):
	  matches = create_coroutine(lambda: patt(s, 1))
	  for pos in matches:
	    if pos == len(s)+1:
	      return True
	  return False

This nicely hides the backtracking structure from us: we simply iterate over the possible ways to match the regular expression and look at them. But now how do we implement the seq, alt, and prim functions?

	def prim(str):
	  return lambda s, pos:
	    if pos+len(str) < len(s) and s[pos:pos+len(str)] == str:
	      yield pos + len(str)
	
	def seq(patt1, patt2):
	  return lambda s, pos:
	    matches1 = create_coroutine(lambda : patt1(s, pos))
	    for npos in matches1:
	      patt2(s, npos)
	
	def alt(patt1, patt2):
	  return lambda s, pos:
	    patt1(s, pos)
	    patt2(s, pos)

This code uses coroutines to generate all possible matches of the regular expression Each of these functions returns a lambda that takes in the string and position: after all, when we first called them we only had the regex, not the string. The prim function's lambda checks if the given primitive is at the given position in the string. If so, it yields the result back to the caller, and if not, it yields nothing. Note that this avoids ugly checks for failure in anyone who calls it. The seq function uses a coroutine to find all possible places its first argument could end and calls the second pattern starting on them. The pattern for this second function will yield any values it finds, and since seq does not wrap this call in a coroutine, it will yield back to its caller. The alt function simply calls both of its alternatives, which will each yield all of their results back to alt's caller.

Let us trace through a run of this program on the input "dex". We will first call match. Match creates a coroutine and resumes it. This coroutine is seq, which will create another coroutine and resume it. This coroutine is alt, which calls its first parameter. This parameter is prim("abc"), which looks for "abc" at the beginning of the string. It does not find it, so it returns back to its caller, alt. Alt then calls its next alternative, prim("de"). This time prim finds "de" at the beginning of the string, so it yields 2 back to whoever created the coroutine, which is seq in this case. Then seq calls its second parameter on position 2. This calls prim("x") on position 2, which finds a match and yields 3. Seq did not wrap this in a coroutine, so this yields back to its caller, match. Match finds that this matches the entire string, so it returns true.