Lab 11: Iterators, Generators, and Streams

Due at 11:59pm on 4/15/2015.

Starter Files

Download lab11.zip. Inside the archive, you will find starter files for the questions in this lab, along with a copy of the OK autograder.

Submission

By the end of this lab, you should have submitted the lab with python3 ok --submit. You may submit more than once before the deadline; only the final submission will be graded.

Table of Contents

Iterators

Remember the for loop? (We really hope so.) The object that the for loop iterates over is required to be an iterable!

for elem in iterable:
    # do something

for loops only work with iterables, and that means that the object you want to use a for loop on must implement the iterable interface. In particular, a for loop makes use of the __next__ method: __iter__ and __next__. In other words, an object that implements the iterable interface must implement an __iter__ method that returns an object that implements the __next__ method.

Here is a table summarizing the required methods of the iterable and iterator interfaces/protocols. Python also has more documentation about iterator types.

Iterator Iterable
__iter__ __iter__
__next__ -----

This object that implements the __next__ method is called an iterator. While the iterator interface also requires that the object implement the __next__ and __iter__ methods, it does not require the __iter__ method to return a new object — just itself (with state about its current position).

One analogy: an iterable is a book (one can flip through the pages) and an iterator is a bookmark (saves the position and can then locate the next page).

Here is an example of a class definition for an object that implements the iterator interface:

class AnIterator:
    def __init__(self):
        self.current = 0

    def __next__(self):
        if self.current > 5:
            raise StopIteration
        self.current += 1
        return self.current

    def __iter__(self):
        return self

Let's go ahead and try out our new toy.

>>> for i in AnIterator():
...     print(i)
...
1
2
3
4
5
6

This is somewhat equivalent to running:

t = AnIterator()
t = t.__iter__()
try:
    while True:
        print(t.__next__())
except StopIteration as e:
    pass

Question 1

Try running each of the given iterators in a for loop. Why does each work or not work?

class IteratorA:
    def __init__(self):
        self.start = 5

    def __next__(self):
        if self.start == 100:
            raise StopIteration
        self.start += 5
        return self.start

    def __iter__(self):
        return self

No problem, this is a beautiful iterator.

class IteratorB:
    def __init__(self):
        self.start = 5

    def __iter__(self):
        return self

Oh no! Where is __next__? This fails to implement the iterator interface because calling __iter__ doesn't return something that has a __next__ method.

class IteratorC:
    def __init__(self):
        self.start = 5

    def __next__(self):
        if self.start == 10:
            raise StopIteration
        self.start += 1
        return self.start

This also fails to implement the iterator interface. Without the __iter__ method, the for loop will error. The for loop needs to call __iter__ first because some objects might not implement the __next__ method themselves, but calling __iter__ will return an object that does.

class IteratorD:
    def __init__(self):
        self.start = 1

    def __next__(self):
        self.start += 1
        return self.start

    def __iter__(self):
        return self

Watch out on this one. The amount of output might scare you. (Feel free to type Ctrl-C to stop.)

This is an infinite sequence! Sequences like these are the reason iterators are useful. Because iterators delay computation, we can use a finite amount of memory to represent an infinitely long sequence.

Question 2

For one of the above iterators that works, try this:

>>> i = IteratorA()
>>> for item in i:
...     print(item)

Then again:

>>> for item in i:
...     print(item)

With that in mind, try writing an iterator that "restarts" every time it is run through a for loop.

class IteratorRestart:
    """
    >>> i = IteratorRestart(2, 7)
    >>> for item in i:
    ...     print(item)
    2
    3
    4
    5
    6
    7
    >>> for item in i:
    ...     print(item)
    2
    3
    4
    5
    6
    7
    """
    def __init__(self, start, end):
"*** YOUR CODE HERE ***"
self.start = start self.end = end self.current = start
def __next__(self):
"*** YOUR CODE HERE ***"
if self.current > self.end: raise StopIteration self.current += 1 return self.current - 1
def __iter__(self):
"*** YOUR CODE HERE ***"
self.current = self.start return self

Generators

A generator is a special type of iterator that can be written using a yield statement:

def <generator_function>():
    <somevariable> = <something>
    while <predicate>:
        yield <something>
        <increment variable>

A generator function can also be run through a for loop:

def generator():
    i = 0
    while i < 6:
        yield i
        i += 1

for i in generator():
    print(i)

To better figure out what is happening, try this:

def generator():
    print("Starting here")
    i = 0
    while i < 6:
        print("Before yield")
        yield i
        print("After yield")
        i += 1
>>> g = generator()
>>> g # what type of object is this?
______
<generator object ...>
>>> iter(g)
______
<generator object ...>
>>> next(g)
______
Starting here Before yield 0
>>> next(g)
______
After yield Before yield 1

Trace through the code and make sure you know where and why each statement is printed.

You might have noticed from the Iterators section that the Iterator defined without a __next__ method failed to run in the for loop. However, this is not always the case.

class IterGen:
    def __init__(self):
        self.start = 5

    def __iter__(self):
        while self.start < 10:
            self.start += 1
            yield self.start

for i in IterGen():
    print(i)

Think for a moment about why that works.

Think more.

Longer.

Okay, I'll tell you.

The for loop only expects the object returned by __iter__ to have a __next__ method, and the __iter__ method is a generator function in this case. Therefore, when __iter__ is called, it returns a generator object, which you can call __next__ on.

Question 3

Write a generator that counts down to 0.

Write it in both ways: using a generator function on its own, and within the __iter__ method of a class.

def countdown(n):
    """
    >>> for number in countdown(5):
    ...     print(number)
    ...
    5
    4
    3
    2
    1
    0
    """
"*** YOUR CODE HERE ***"
while n >= 0: yield n n = n - 1
class Countdown:
    """
    >>> for number in Countdown(5):
    ...     print(number)
    ...
    5
    4
    3
    2
    1
    0
    """
"*** YOUR CODE HERE ***"
def __init__(self, cur): self.cur = cur def __iter__(self): while self.cur > 0: yield self.cur self.cur -= 1

Question 4

Write a generator that outputs the hailstone sequence from homework 1.

def hailstone(n):
    """
    >>> for num in hailstone(10):
    ...     print(num)
    ...
    10
    5
    16
    8
    4
    2
    1
    """
"*** YOUR CODE HERE ***"
i = n while i > 1: yield i if i % 2 == 0: i //= 2 else: i = i * 3 + 1 yield i

Extra Questions

The following questions are for extra practice — they can be found in the lab11_extra.py file. It is recommended that you complete these problems as well, but you do not need to turn them in for credit.

Streams

A stream is another example of a lazy sequence. A stream is a lazily evaluated Linked List. In other words, the stream's elements (except for the first element) are only evaluated when the values are needed.

Take a look at the following code:

class Stream:
    class empty:
        def __repr__(self):
            return 'Stream.empty'

    empty = empty()

    def __init__(self, first, compute_rest=lambda: Stream.empty):
        assert callable(compute_rest), 'compute_rest must be callable.'
        self.first = first
        self._compute_rest = compute_rest

    @property
    def rest(self):
        """Return the rest of the stream, computing it if necessary."""
        if self._compute_rest is not None:
            self._rest = self._compute_rest()
            self._compute_rest = None
        return self._rest

    def __repr__(self):
        return 'Stream({0}, <...>)'.format(repr(self.first))

We represent Streams using Python objects, similar to the way we defined Linked Lists. We nest streams inside one another, and compute one element of the sequence at a time.

Note that instead of specifying all of the elements in __init__, we provide a function, compute_rest, that encapsulates the algorithm used to calculate the remaining elements of the stream. Remember that the code in the function body is not evaluated until it is called, which lets us implement the desired evaluation behavior.

This implementation of streams also uses memoization. The first time a program asks a Stream for its rest field, the Stream code computes the required value using compute_rest, saves the resulting value, and then returns it. After that, every time the rest field is referenced, the stored value is simply returned and it is not computed again.

Here is an example:

def make_integer_stream(first=1):
    def compute_rest():
        return make_integer_stream(first+1)
    return Stream(first, compute_rest)

Notice what is happening here. We start out with a stream whose first element is 1, and whose compute_rest function creates another stream. So when we do compute the rest, we get another stream whose first element is one greater than the previous element, and whose compute_rest creates another stream. Hence, we effectively get an infinite stream of integers, computed one at a time. This is almost like an infinite recursion, but one which can be viewed one step at a time, and so does not crash.

Question 5

Write a procedure add_streams that takes in two streams s1, s2, and returns a new stream that is the result of adding elements from s1 by elements from s2. For instance, if s1 was (1, 2, 3, ...) and s2 was (2, 4, 6, ...), then the output would be the stream (3, 6, 9, ...). You can assume that both Streams are infinite.

def add_streams(s1, s2):
    """Returns a stream that is the sum of s1 and s2.

    >>> stream1 = make_integer_stream()
    >>> stream2 = make_integer_stream(9)
    >>> added = add_streams(stream1, stream2)
    >>> added.first
    10
    >>> added.rest.first
    12
    >>> added.rest.rest.first
    14
    """
"*** YOUR CODE HERE ***"
def compute_rest(): return add_streams(s1.rest, s2.rest) return Stream(s1.first + s2.first, compute_rest)

Question 6

Write a procedure make_fib_stream() that creates an infinite stream of Fibonacci Numbers. Make the first two elements of the stream 0 and 1.

Hint: Consider using a helper procedure that can take two arguments, then think about how to start calling that procedure.

def make_fib_stream():
    """Return a stream containing the Fib sequence.

    >>> fib = make_fib_stream()
    >>> fib.first 
    0
    >>> fib.rest.first
    1
    >>> fib.rest.rest.rest.rest.first
    3
    """
"*** YOUR CODE HERE ***"
return fib_stream_generator(0, 1) def fib_stream_generator(a, b): def compute_rest(): return fib_stream_generator(b, a+b) return Stream(a, compute_rest)

If the add_streams function has been defined, we can wrte make_fib_stream this way instead:

def make_fib_stream():
    def compute_rest():
        return add_streams(make_fib_stream(), make_fib_stream().rest)
    return Stream(0, lambda: Stream(1, compute_rest))

Higher Order Functions on Streams

Naturally, as the theme has always been in this class, we can abstract our stream procedures to be higher order. Take a look at filter_stream:

def filter_stream(filter_func, stream):
    def make_filtered_rest():
        return filter_stream(filter_func, stream.rest)
    if Stream.empty:
        return stream
    elif filter_func(stream.first):
        return Stream(stream.first, make_filtered_rest)
    else:
        return filter_stream(filter_funct, stream.rest)

You can see how this function might be useful. Notice how the Stream we create has as its compute_rest function a procedure that promises to filter out the rest of the Stream when asked. So at any one point, the entire stream has not been filtered. Instead, only the part of the stream that has been referenced has been filtered, but the rest will be filtered when asked. We can model other higher order Stream procedures after this one, and we can combine our higher order Stream procedures to do incredible things!

Question 7

What does the following Stream output? Try writing out the first few values of the stream to see the pattern.

def map_stream(fn, s):
    if s is Stream.empty:
        return s
    def compute_rest():
        return map_stream(fn, s.rest)
    return Stream(fn(s.first), compute_rest)

def my_stream():
    def compute_rest():
        return add_streams(map_stream(lambda x: 2 * x,
                                      my_stream()),
                                      my_stream())
    return Stream(1, compute_rest)

Powers of 3: 1, 3, 9, 27, 81, ...

Question 8

Define a function interleave, which takes in two streams:

and returns the new stream

a1, b1, a2, b2, a3, b3, ...

If either of the inputs is finite, the output stream should be finite as well, terminating just at the point where it would be impossible to go on. If both of the inputs are infinite, the output stream should be infinite as well.

def interleave(stream1, stream2):
    """Return a stream with alternating values from stream1 and stream2.

    >>> ints = make_integer_stream(1)
    >>> fib = make_fib_stream()
    >>> alternating = interleave(ints, fib)
    >>> alternating.first
    1
    >>> alternating.rest.first
    0
    >>> alternating.rest.rest.first
    2
    >>> alternating.rest.rest.rest.first
    1
    """     
"*** YOUR CODE HERE ***"
if stream1 is Stream.empty: return Stream.empty return Stream(stream1.first, lambda: interleave(stream2, stream1.rest))