61A Homework 7

Due by 11:59pm on Saturday, 7/19

Readings: You might find the following references useful:

This assignment is a bit different from the rest. For the first half, you'll be using dictionaries to create your own Shakespearean sentences. In the second half, you'll be diving into cryptography and using what you've learned this week to create a cypher. The entire second half of this assignment is optional. While it is good practice on the material for this week, you don't have to turn it in to get full points for this assignment. You worked very hard last week, so hopefully this homework will be a bit of a break.

Submission: See the online submission instructions. We have provided a hw7.py starter file for the questions below.

Dictionaries and Shakespeare

Question 1
Question 2

Creating a Cypher (Optional)

Question 3
Question 4
Question 5
Question 6

Dictionaries and Shakespeare

Now that you know how dictionaries work, we can approximate the entire works of Shakespeare! We're going to use a bigram language model. Here's the idea: We start with some word — we'll use "The" as an example. Then we look through all of the texts of Shakespeare and for every instance of "The" we record the word that follows "The" and add it to a list, known as the successors of "The". Now suppose we've done this for every word Shakespeare has used, ever.

Let's go back to "The". Now, we randomly choose a word from this list, say "cat". Then we look up the successors of "cat" and randomly choose a word from that list, and we continue this process. This eventually will terminate in a period (".") and we will have generated a Shakespearean sentence!

The object that we'll be looking things up in is called a "successor table", although really it's just a dictionary. The keys in this dictionary are words, and the values are lists of successors to those words.

Question 1

Here's an incomplete definition of the build_successors_table function. The input is a list of words (corresponding to a Shakespearean text), and the output is a successors table. (By default, the first word is a successor to "."). See the example below:

def build_successors_table(tokens):
    """Return a dictionary: keys are words; values are lists of
    successors.

    >>> text = ['We', 'came', 'to', 'investigate', ',', 'catch', \
                'bad', 'guys', 'and', 'to', 'eat', 'pie', '.']
    >>> table = build_successors_table(text)
    >>> expected = {'and': ['to'], 'We': ['came'], 'bad': ['guys'], \
                'pie': ['.'], ',': ['catch'], '.': ['We'], \
                'to': ['investigate', 'eat'], 'investigate': [','], \
                'catch': ['bad'], 'guys': ['and'], 'eat': ['pie'], \
                'came': ['to']}
    >>> expected == table
    True
    """
    table = {}
    prev = '.'
    for word in tokens:
        if prev in table:
            "*** YOUR CODE HERE ***"
        else:
            "*** YOUR CODE HERE ***"

        prev = word
    return table

Question 2

Let's generate some sentences! Suppose we're given a starting word. We can look up this word in our table to find its list of successors, and then randomly select a word from this list to be the next word in the sentence. Then we just repeat until we reach some ending punctuation.

Hint: to randomly select from a list, first make sure you import the Python random library with import random and then use the expression random.choice(my_list))

This might not be a bad time to play around with adding strings together as well. Let's fill in the construct_sent function!

def construct_sent(word, table):
    """Returns a random sentence starting with word, sampling from
    table.
    """
    import random
    result = ' '
    while word not in ['.', '!', '?']:
        "*** YOUR CODE HERE ***"
    return result + word

Great! Now all that's left is to run our functions with some actual code. The following snippet included in the skeleton code will return a list containing the words in all of the works of Shakespeare.

Warning: do NOT try to print the return result of this function):

def shakespeare_tokens(path = 'shakespeare.txt', url = 'http://goo.gl/SztLfX'):
    """Return the words of Shakespeare's plays as a list."""
    import os
    from urllib.request import urlopen
    if os.path.exists(path):
        return open('shakespeare.txt', encoding='ascii').read().split()
    else:
        shakespeare = urlopen(url)
        return shakespeare.read().decode(encoding='ascii').split()

Next, we probably want an easy way to refer to our list of tokens and our successors table. Let's make the following assignments:

>>> tokens = shakespeare_tokens()
>>> table = build_successors_table(tokens)

Finally, let's define an easy to call utility function:

>>> def sent():
        return construct_sent('The', table)

>>> sent()
' The plebeians have done us must be news-cramm'd '

>>> sent()
' The ravish'd thee , with the mercy of beauty '

>>> sent()
' The bird of Tunis , or two white and plucker down with better ; that's God's sake '

Now, if we want to start the sentence with a random word, we can use the folowing:

def random_sent():
    import random
    return construct_sent(random.choice(table['.']), table)

>>> random_sent()
' You have our thoughts to blame his next to be praised and think ?'

>>> random_sent()
' Long live by thy name , then , Dost thou more angel , good Master Deep-vow , And tak'st more ado but following her , my sight Of speaking false !'

>>> random_sent()
' Yes , why blame him , as is as I shall find a case , That plays at the public weal or the ghost .'

Creating a Cypher (Optional)

In this section, we'll be making our own message encryptor using a technique called a one-time pad.

Basically, this technique works by randomly generating a pad for each word in a message. A pad is just a random string of letters the same length as the word. For example, a pad for the word 'dog' might be the string 'yaz'.

Then, we create our encryption by adding the letters of the original word to our pad, mod 26, then using that as the new value for that letter.

For example, the first letter our word 'dog' is d, which is the 3rd letter in the alphabet (indexing from 0). The first letter in our pad 'yaz' is y, the 24th letter in the alphabet. Therefore, the first letter in our encryption would be b (24 + 3 = 27 % 26 = 1).

Since decrypting this type of message relies on the pad we randomly generate, after encrypting our message, we're going to use higher order functions and nonlocal to lock our randomly generated pad away.

In the Cypher section of your starter file, you've already been given the function pad_creator. This function takes in a single word, and returns a pad for it. See the doctests for further clarification on the domain and range of pad_creator.

You've also been given the dictionary letter_dict, which contains key-value pairs of letters of the alphabet with their corresponding index.

Question 3

What is a pad? What is it used for? How is it different from our encrypted word?

How are we encrypting our words? Review the Wikipedia article linked above if you are still unclear on this process.

Make sure you can answer these questions before moving on.

Question 4

The first thing we need to do is to find a way to encrypt a single word.

Write the function word_mutator which takes in a single word and a pad, and returns the encrypted version of that word. You may find letter_dict useful for this task. You may also find the value string.ascii_lowercase useful. string.ascii_lowercase returns a string of all of the lowercase ASCII characters in order.

def word_mutator(word, pad):
    """Returns an encrypted version of word using
    the one-time pass techinique.

    >>> word_mutator('charms', 'secret')
    'ulciql'
    """
    new_word = ''
    "*** YOUR CODE HERE ***"

Question 5

Switching gears briefly for this problem, we need to consider the security of our pad. We can only decrypt our original message if we have the pad that encrypted it. However, we don't just want to return the pad with our encrypted message. Instead, lets create a higher order function make_lock.

make_lock takes in the pad we want to secure, a password, and a number of guesses which defaults to 3. make_lock returns a function which takes in a password attempt.

If the password is not the same as what was given to make_lock, we want to reduce the number of guesses left by 1, and store that attempted guess away. If a password is attempted more than once, or the amount of guess attempts left reduces to zero, the pad should be locked away forever.

However, if one gives the correct password before the guess attempts hits zero, the lock should return the pad given to make_lock, and then never return the pad again.

Make sure you read and understand the doctests for make_lock before you start writing your code.

def make_lock(pad, password, n=3):
    """Returns a function which takes in password attempts.
    If more than n passwords are attempted, then the pad is 
    locked away forever.

    If the same password is attempted more than one, the pad 
    is locked away forever.

    If the password is correct, the pad is returned, and can 
    never be retrieved again from this lock.

    >>> lock1 = make_lock('correcthorsebatterystaple', 'letmein')
    >>> lock1('bad password')
    'Sorry, wrong password. Try again?'
    >>> lock1('123456')
    'Sorry, wrong password. Try again?'
    >>> lock1('letmein')
    'correcthorsebatterystaple'
    >>> lock1('letmein')
    'Out of password attempts!'
    >>> lock2 = make_lock('xyzzy', 'worst. password. ever.')
    >>> lock2('Pikachu')
    'Sorry, wrong password. Try again?'
    >>> lock2('Pikachu')
    'Password attempt repeated: security system locked!'
    """
    attempts = []
    def lock(attempt):
        "*** YOUR CODE HERE ***"
    return lock

Question 6

Now we can finally write our encryption function!

Write the function OTP_encrypter, which takes in a list of words to encrypt, and a password to lock the corresponding pad with. OTP_encrypter should encrypt the original list of words, and mutate the message to store the encrypted words in it. It should return a lock for the pad corresponding to the message. In order to recover the list of words, we would have to use the password to recover the pad (which would be a list of strings, one for each word).

def OTP_encrypter(message, password):
    """Encrypts the words in the orignial list message, and
    returns a lock for the pad generated using password.

    >>> message = ['robbery', 'planned', 'on', 'monday']
    >>> message_copy = message[:]
    >>> lock = OTP_encrypter(message, 'open sesame')
    >>> message == message_copy
    False
    >>> lock('open please?')
    'Sorry, wrong password. Try again?'
    >>> pad = lock('open sesame')
    >>> assert type(pad) == list
    >>> assert len(pad) == len(message)
    >>> assert len(message) == len(message_copy)
    >>> for i in range(len(message)):
    ...     assert word_mutator(message_copy[i], pad[i]) == message[i]
    ...
    """
    pad = []
    "*** YOUR CODE HERE ***"

Congratuations! You've made your very own cryptography program.