Study Guide: Iterators

Instructions

This is a study guide with links to past lectures, assignments, and handouts, as well as additional practice problems to assist you in learning the concepts.

Assignments

Important: For solutions to these assignments once they have been released, see the main website

Handouts

Lectures

Readings

Guides

Iteration

Iterators generalize a common pattern in computation. Often, we want to get all the values from a compound value like the list ['c', 's', '6', '1', 'a'].

We can write a program that prints the values in the list with a while loop.

lst = ['c', 's', '6', '1', 'a']
i = 0
while i < len(lst):
    value = lst[i]
    i += 1
    print(value)

Or, as we've seen before as well, a for loop.

lst = ['c', 's', '6', '1', 'a']
for value in lst:
    print(value)

The for loop has the advantage of being much more compact and easy to read as we no longer need to keep track of the index as we iterate through the list.

If we think of the two bits of code above as being equivalent, we can see that the for loop actually hides a lot of complexity for us. It is somehow able to initialize the starting index, i = 0, then check the length of the list to determine whether we've stepped past the list's bounds, and moving the index forward while assign the next value in the list.

What's more, the for loop can event iterate over other sequence types too, like dictionaries and strings, not just lists.

>>> for key in {1: 'first value', 2: 'second value'}:
...     print(key)
...
1
2
>>> for character in 'cs61a':
...     print(character)
...
c
s
6
1
a

Iteration Abstraction

We can think of iterating with a for loop as a kind of abstraction. We don't really need to know how to index into a list, dictionary, or string to for loop over their values because the for loop hides the exact implementation details from us. It's also a generalization: we can store our data in any sequence type and iterate over it using the same for loop without having to worry too much about the underlying implementation.

But the for loop, like the Python interpreter, is not magic. We'd like to understand how this structure works.

Iterators

In Python, iterators are the objects which help implement for loops. In the list example, iterators do the entire job of knowing where to start the iteration, checking to see if there are any more elements in the list, and moving the index forward while returning the next value.

Iterators do all of this work in a magic method called __next__. Given an iterator, we can ask for its next value by calling the built-in next function on it, which really just invokes the __next__ method in the iterator.

Iterables

The example of a list iterator only makes sense in the context of the underlying list of values. But we often work with structures that aren't just lists: we could work with dictionaries and strings as well. These structures are more generally categorized as iterables because they can be iterated over.

Iterables have an __iter__ method which we can call with the built-in iter function to return an iterator.

>>> iterable = ['c', 's', '6', '1', 'a']
>>> iterator = iter(iterable)
>>> next(iterator)
'c'

How does the iterator know how to stop?

>>> next(iterator)
's'
>>> next(iterator)
'6'
>>> next(iterator)
'1'
>>> next(iterator)
'a'
>>> next(iterator)
StopIteration

A StopIteration exception is raised when there are no more values to return from calling next. This is how the for loop knows how to move on. We can convert the full for loop to a while loop that looks roughly like this.

iterator = iter(iterable)
try:
    while True:
        item = next(iterator)
        print(item)
except StopIteration:
    pass

Remember that the iterator is responsible for determining the stop condition, and that stopping condition is conveyed to the loop by the StopIteration exception.

Generators

Generator functions allow us to easily declare iterators. In CS 61A, they'll be the primary means by which we define iterators.

While generator functions look a lot like normal functions in Python, they behave quite differently. While a normal function has a return statement to define the value returned when the function is called, generator functions always return generator objects. The generator object is the iterator which keeps track of state and remembers its place during execution.

A commonly-used built-in that behaves like a generator function is zip.

>>> zip
<class 'zip'>           # zip is not a generator, but it behaves similar to one
>>> pairs = zip([1, 2, 3], [4, 5, 6])
>>> pairs
<zip object at 0x...>   # zip returns a zip object (an iterator similar to a generator object)
>>> next(pairs)
(1, 4)
>>> for pair in pairs:
...     print(pair)
...
(2, 5)
(3, 6)

Generator functions are defined using the def statement, but we use the new keyword, yield, to tell them apart from regular functions. For example, we can define a generator function that, given an iterable, yields the running sum of values in the iterable.

def running_sum(iterable):
    total = 0
    for value in iterable:
        total += value
        yield total

We can then use the running_sum program in the following way.

>>> rs = running_sum([1, 2, 3, 4, 5])
>>> next(rs)
1
>>> next(rs)
3
>>> for n in rs:
...     print(n)
...
6
10
15

When the generator object is first created, we start at the top of the body of the function. Then, when we call next on the generator, we start executing the body of the function until we yield the first value back to the caller. The generator object pauses, remembering its current state and waiting until the caller asks for the next value. After all the values are yielded from the generator, the next time the caller calls next will cause a StopIteration.

Practice Problems

Easy

Q1: Character generator

Write a generator that outputs each character of a string.

def char_gen(s):
    """
    >>> for char in char_gen("hello"):
    ...     print(char)
    ...
    h
    e
    l
    l
    o
    """
"*** YOUR CODE HERE ***"
for char in s: yield char

Medium

Q2: Pairs (generator)

Write a generator function pairs that takes a list and yields all the possible pairs of elements from that list.

def pairs(lst):
    """
    >>> type(pairs([3, 4, 5]))
    <class 'generator'>
    >>> for x, y in pairs([3, 4, 5]):
    ...     print(x, y)
    ...
    3 3
    3 4
    3 5
    4 3
    4 4
    4 5
    5 3
    5 4
    5 5
    """
"*** YOUR CODE HERE ***"
for i in lst: for j in lst: yield i, j