Project 2: Typing Test

typing_test.zip

Introduction

Important submission note: For full credit:

Submit Phase 1 (Questions) and Phase 2 (Problems 1 to 3) by Friday, July 12.

Submit Phase 3 by Tuesday, July 23.

You will get an extra credit point by submitting the entire project by Monday, July 22 AND filling out a feedback survey (Problem 9).

You may work with a partner for the entire project.

Phase 1 consists of planning and design for Phase 2 and Phase 3. It is important to carefully plan out your project in order to be successful in Phases 2 and 3.

In this project, you will create a Typing Test, which will allow users to measure their typing speed. Additionally, you will implement autocorrect, which is a feature that attempts to correct the spelling of a word after a user types it. This project is inspired by Type Racer. The exciting part of this project is that the amount of code provided to you is very minimal, so by the end you'll have constructed almost an entire program by yourself!

You can try out our solution to the project online at typing.pythonanywhere.com!

As a disclaimer, this is a new project. It is functional to our understanding, but if you believe you've found an error in the specifications, tests, or provided files, please let us know on Piazza and we will get it fixed as soon as possible.

Download starter files

You can download all of the project code as a zip archive. This project includes several files, but all of your changes will be made to only two: design_doc.txt and typing_test.py. Here are all the files included in the archive:

questions/design_doc.txt: This file contains a sequence of questions about your planning and design. You will submit this file during Phase 1.
typing_test.py: This file will contain all the logic for the Typing Test. You will submit this file during Phase 2 and Phase 3.
utils.py: Functions for interacting with files, strings, and to create some data structures used elsewhere in the project.
gui.py: This file contains code for the graphical interface for the Typing Test.
data/sample_paragraphs.txt: This is a spreadsheet containing all of the sample paragraphs that will be used for your typing test.
data/words.txt: This is a spreadsheet containing all of the valid words that can be recommended by your autocorrect algorithm.

Logistics

The project is worth 20 points. 16 points are assigned for correctness of your final code, 1 point for the questions in phase 1, 1 point for submitting Phases 1/2 by Friday, July 12, and 2 points for composition.

There is one point of extra credit for submitting the project 24 hours early and completing a feedback survey. See [Q9] for this survey.

You will turn in the following files:

design_doc.txt
typing_test.py
utils.py

You do not need to modify or turn in any other files to complete the project. To submit the project, run the following command:

python3 ok --submit

You will be able to view your submissions on the Ok dashboard.

For the functions that we ask you to complete, there may be some initial code that we provide. If you would rather not use that code, feel free to delete it and start from scratch. You may also add new function definitions as you see fit.

However, please do not modify any other functions. Doing so may result in your code failing our autograder tests. Also, please do not change any function signatures (names, argument order, or number of arguments).

Throughout this project, you should be testing the correctness of your code. It is good practice to test often, so that it is easy to isolate any problems. However, you should not be testing too often, to allow yourself time to think through problems.

We have provided an autograder called ok to help you with testing your code and tracking your progress. The first time you run the autograder, you will be asked to log in with your Ok account using your web browser. Please do so. Each time you run ok, it will back up your work and progress on our servers.

The primary purpose of ok is to test your implementations.

We recommend that you submit after you finish each problem. Only your last submission will be graded. It is also useful for us to have more backups of your code in case you run into a submission issue.

If you do not want us to record a backup of your work or information about your progress, you can run

python3 ok --local

With this option, no information will be sent to our course servers. If you want to test your code interactively, you can run

 python3 ok -q [question number] -i

with the appropriate question number (e.g. 01) inserted. This will run the tests for that question until the first one you failed, then give you a chance to test the functions you wrote interactively.

You can also use the debug printing feature in OK by writing

 print("DEBUG:", x)

which will produce an output in your terminal without causing OK tests to fail with extra output.

Graphical User Interface

A graphical user interface (GUI, for short) is provided for you. At the moment, it doesn't work because you haven't implemented the logic for Typing Test. Once you complete Problems 01 and 02, you will be able to use your Typing Test!

In order to render the graphics, make sure you have Tkinter, Python's main graphics library, installed on your computer. In most cases, it is installed automatically when you install Python because it is part of the standard library. Once you've done that, you can run the GUI from your terminal:

python3 gui.py

Phase 1: Design Questions

For the first part of this project, you will not be writing any code. In order to test the code for the questions in Phase 2/3, you will need to complete & submit questions.txt.

Your task for Phase 1 is as follows:

Read through the spec for Phase 2 and Phase 3 to see what tasks you are responsible for accomplishing.
Try to break down each problem into a number of smaller problems. For example, for Q5 of Hog, you could break the problem down into the following smaller problems:
- Repeatedly take turns until one player has at least goal points.
- Track the current player (so that the correct player takes each turn) and switch players after every turn.
- Determine the number of dice to roll, the number of points scored, and which special rules apply for each turn. Some smaller problems are solved in their own functions, such as take_turn, because those subproblems can be expressed as inputs and outputs. Sometimes those functions are most naturally expressed with yet another function, such as take_turn calling roll_dice to handle the details of rolling more than one die.
For each problem, think about what techniques you will need to use. For example, will you need recursion or lists? All problems in this project can be solved using the techniques you've learned in the course so far, but you will need to think about which ones are applicable for a given problem.
After familiarizing yourself with all of the above, you should write out your plan for how you will be implementing these functions and structuring your code. In order to do this, we have provided a .txt file containing a sequence of questions, design_doc.txt. You should directly edit this file. For phase 1, this file is what you will submit.

How to submit: You will not submit your design doc for Phase 1 through okpy. Instead, we have a separate google form. You may only submit your design doc once, so please do not submit it until you have completed all the questions. However, we will be quite lenient while grading as long as you answer all the questions and show that you put thought into your responses. The intent is to get you to think about your code before starting, not to penalize you if you don't understand something fully before writing code.

Once you submit, you will receive a "secret word". Make sure to write down this word. After receiving this, you should go to utils.py and change the value of the passphrase variable to be a string containing the secret word. You will need to update the value of this variable in order to be able to test your code in Phase 3.

Although you will be able to test your code in Phase 2 without completing the design document, we highly encourage you complete the relevant questions in Phase 1 first. There are no unlocking questions for this project, so this will be how you think through the problem first.

Phase 2: Basic Functionality

All changes in this part will be made in typing_test.py.

Problem 01: Sample Paragraphs (1 pts)

We have provided you with sample paragraphs in data/sample_paragraphs.txt. The purpose of these paragraphs is to appear in the "Reference Text" box of the GUI, because this is the text that the user is typing during the Typing Text. It will also be used elsewhere to get access to a dictionary of words.

In order to do this, you should implement two functions.

The first is lines_from_file, which takes in a string path which indicates the location of a .txt file. lines_from_file should return a list containing each row of the file as a string. Lines in a .txt file are separated by newlines, but may or may not display on a single line when you open it. If a line without any text appears anywhere but the end of the file, you should add an empty string to the list. If you use the functions we provide to you, you will probably not have to worry about this.

The lines should appear in the same order as they appear in the file, so the first row of the file will be the first element of the list. Any spaces or newlines at the beginning or end of the line should be removed.

Next, implement new_sample, which takes in a string path and a nonnegative integer i. The path will indicate the location of the sample_paragraphs.txt file in your directory. new_sample should return the paragraph in the i-th line of this file. We will 0-index lines, so the 0-th line is the top line of the file.

Completing this problem will require familiarizing yourself with the functions in utils.py. In order to understand how to use these functions, we strongly recommend reading their in-depth descriptions in the appendix. Note: When you use the utility functions to read lines from a file, newlines are represented by \n.

After writing code, test your implementation:

python3 ok -q 01

Problem 02: Analyze (3 pts)

Implement the function analyze. This function takes in a string sample_paragraph, a string provided by user input typed_string, a number start_time and a number end_time. Both start_time and end_time are measured in seconds.

This function should output a list containing two values: words per minute and accuracy percentage.

Words per minute is a measure of the number of words typed per minute. However, in order to compensate for the varying length of words, we will calculate the number of words as the number of characters in the string divided by five.

For example, the string "I am glad!" contains three words and ten characters. The words per minute calculation will then use 2.0 (because 10 / 5 = 2.0) as the number of words instead of 3. If we typed this in 30 seconds (half a minute), our speed would be 4.0 words per minute.

Accuracy percentage is a measurement of what percent of the typed words were typed correctly. For example, if the reference text is "this is a test" and the typed text is "this os a tst", the accuracy is 50.0 percent, since half of the typed words were typed correctly. For calculating accuracy, you should calculate the actual number of words in the string (instead of approximating it based on the number of characters).

Note that if the typed string has MORE words than the reference string, all extra words will not be used in the accuracy calculation, so you should calculate accuracy based on the number of words in the reference string. For example, if the reference text is "I love CS61A" and the typed text is "I lvoe CS61A a lot", only the first three words will be used to calculate accuracy. Of these first three words, two are accurately typed, so the accuracy score would be about 66.667.

On the other hand, if the typed string has FEWER words than the reference string, calculate accuracy based on the number of words typed so far. For example, if the reference text is "I love CS61A" and the typed text is "I lvoe", only the first two words will be used to calculate accuracy. Of the first two words, one is accurately typed, so the accuracy score would be 50.0. Additionally, in the case that the typed string contains 0 words, the accuracy percentage should be 0.0. Extraneous whitespace can be ignored.

It may be helpful to define helper functions inside of analyze to separate the logic of calculating words per minute and calculating accuracy.

Note: Because sample_paragraph and typed_string are both strings containing full paragraphs, you will need some way figuring out how many words each string contains and whether those strings match (for accuracy). Reading about Some of the string functions we provided might be helpful.

After writing code, test your implementation:

python3 ok -q 02

Once your implementation is done, you should be able to try out the Typing Test in the GUI. Press "Start", then type out the sample paragraph in the lower text box, and press "Save" once you are done. As you type, your Words Per Minute and Accuracy scores will be calculated using your algorithm and displayed on the righthand side.

You can try it out using the following command:

python3 gui.py

Problem 03: Pig Latin (2 pts)

Time for some fun! In this problem, we will implement the function pig_latin, which will allow us to translate the sample paragraphs to pig latin in the GUI. pig_latin is a function which takes in a single word as a string and returns the translated version of the word.

For our purposes, here are the rules of Pig Latin.

Rule One. Words beginning with a Consonant
- When a word begins with a consonant (such as dog) or a consonant cluster (such as brush), simply take the consonant/consonant cluster and move it to the end of the word, adding the suffix ‘ay’ to the end of the word. A "consonant cluster" refers to all the consonants at the start of a word before the first vowel.
- Example 1: ‘dog’ in Pig Latin becomes ‘ogday’, because ’og’ + ’d’ + ’ay’ = ‘ogday’. (In other words, the leading consonant ‘d’ has been moved to the end of the word, leaving simply ‘og’ at the beginning, and the suffix ‘ay’ has been appended to the ‘d’).
- Example 2: brush in Pig Latin becomes ‘ushbray’, because ‘ush’ + ’br’ + ’ay’ = ‘ushbray’ (In other words, the leading consonant cluster ‘br’ has been moved to the end of the word, leaving simply ‘og’ at the beginning, and the suffix ‘ay’ has been appended to the ‘d’).
Rule Two. Words Beginning with a Vowel
- When a word begins with a vowel, simply leave the word as is and add the suffix ‘way’ to the end of the word.
- Example 1: ‘elephant’ in Pig Latin becomes ‘elephantway’, because ‘elephant‘ + ‘way’ = ‘elephantway’.
- Example 2: ‘awesome’ in Pig Latin becomes ‘awesomeway’, because ‘awesome’ + ‘way’ = ‘awesomeway’.

You can see how these rules work in practice below:

Original Phrase

Translation

If the word does not contain any vowel, it should treat the entire word as a singly consonant cluster and simply append "ay" at the end. For example, ‘nth’ in Pig Latin becomes ‘nthay’, because ‘nth’ + ‘ay’ = ‘nthay’.

Your implementation of Pig Latin should be able to work with words that contain or begin with non-alphabetic characters (e.g. numbers or punctuation). You can treat every character that isn't a vowel as a consonant.

Also note that for our purposes here, we will always consider y as a consonant. Vowels, therefore, consist of the characters a, e, i, o, and u.

You may assume that the input will only contain lowercase letters and non-alphabetic characters.

After writing code, test your implementation:

python3 ok -q 03

Once your implementation is done, you should be able to check the "Pig Latin" box in the GUI. When you check the "Pig Latin" box, the sample paragraph will be translated to Pig Latin. You can try it out using the following command:

python3 gui.py

Congratulations! You've completed Phase 1 and 2 of the project.

Phase 3: Autocorrect

Important note: You will not be able to test your code in this phase before answering the design questions. This is because it is important to have an understanding of the entire algorithm you will be implementing before you get started. Please see the instructions on Phase 1 for how to be able to test these questions.

In this section of the project, you will implement autocorrect. Autocorrect is a feature that effectively "corrects typos". In other words, given a string that is not a valid word, autocorrect finds the intended word.

Problem 04: Autocorrect Skeleton (2 pts)

Implement the function autocorrect. This function takes in a string user_input representing a single word, a list of all valid words words_list, and a function score_function.

If the user_input string is contained inside the words_list, the autocorrect function should simply return that word. Otherwise, the autocorrect function should return the word from words_list that has the lowest difference from the provided user_input string based on the score_function.

A score function takes in two arguments, which are the two strings we're comparing (first the user_input string and then the word from words_list). The output of the score function, which is a number, represents the "difference" between the two strings. The bigger the output of the score function, the greater the difference between the two strings.

You will not implement any score functions in this problem. You will simply use the score function passed in to the function.

Important - if multiple strings have the lowest difference according to the score_function, you should return the string that appears first in words_list.

If you have not attempted the optional questions for Lab04, we recommend you attempt at least key_of_min_value. You may find that using max or min with the key argument is helpful for this project.

After writing code, test your implementation:

python3 ok -q 04

Problem 05: First Score Function (2 pts)

Let's try implementing a score_function. Implement swap_score, which should take in two strings.

It will return the number of characters we need to substitute to change the characters in the first string into the corresponding characters in the second string.

If the strings not of equal length, we will disregard all "extra" characters in the larger string.

You should consider the leftmost character of each string to correspond to each other. For example, if the two inputs are "word" and "weird", we'll only consider potential substitutions for "word" and "weir".

Here are some more examples:

 >>> swap_score("nice", "rice") # Substitute n to r
 1
 >>> swap_score("range", "rungs") # Substitute a to u, e to s
 2
 >>> swap_score("pill", "pillage") # Don't substitute anything
 0
 >>> swap_score("byte", "bit") # Substitute y to i
 1

Although you can implement this function iteratively, we highly recommend using recursion. This will make the next problem significantly easier.

After writing code, test your implementation:

python3 ok -q 05

Problem 06: Second Score Function (3 pts)

Now, you will implement a more complicated score_function. Your score_function takes in two strings, word1 and word2, and returns the minimum number of operations that need to occur in order for word1 to be transformed to match word2.

In order to transform word1 to word2, you can perform the following three operations:

Delete letters from word1
Add letters to word1
Substitute a letter in word1 for another

For this function, each of these options will contribute 1 to the score, but for future score functions these may have a different weight. This means that your function should return the number of these operations it takes to convert the first string to the second.

For example:

>>> score_function("tesng", "testing")
2    # tesng -> testng -> testing. Two letters need to be inserted to transform tesng to testing.
>>> score_function("rlogcul", "logical")
3 # rlogcul -> logcul -> logicul -> logical. One letter needs to be deleted, one letter needs to be inserted, and one letter needs to be substituted to transform rlogcul to logical.
>>> score_function("misspelled", "spelling")
6 # misspelled -> isspelled -> spelled -> spellied -> spellind -> spelling # Three letters need to be deleted, one letter needs to be inserted, and two letters need to be substituted with different letters.

We have provided a skeleton implementation in typing_test.py. This is a recursive function with three recursive calls. As a hint, one of these recursive calls will be similar to the recursive call for the function you implemented in Problem 05. The remaining two are very similar to questions you've seen (or will see) in Lab05. Once you have these, it is up to you to figure out how to combine them and what base cases are necessary.

You may modify the skeleton implementation however you want, including as many additional elif cases as you see fit.

Note: you might find that your code is timing out for this problem (or the next problem). This does not necessarily mean your code is incorrect, but it might mean that you have something that is making your code slightly inefficient. If you are using print statements to debug, try removing them once you pass the initial tests. You might also try rearranging the order in which you make any recursive calls.

After writing code, test your implementation:

python3 ok -q 06

Problem 07: Accuracy (1 pts)

Now, there may be a problem with relying on our existing score_function. Sometimes, for a given user_input string, multiple valid words have an equal difference from the user_input string. However, giving that the user is typing on a QWERTY keyboard (see above), certain words are much more likely to be the intended word compared to others. For example, given the string "wird" as user_input, both "bird" and "wire" are valid words that have a difference of 1 from "wird." "wire" is a much more likely to be the word that the user was intending to type, because "d" is much closer to "e" than "w" is to "b" on the keyboard. In other words, "wire" is the better word to autocorrect to.

We've provided you with a dictionary KEY_DISTANCES. You can see how this dictionary was created in utils.py if you're curious, but you will not need to call any function in utils.py in order to create it.

Your task is to implement the function score_function_accurate. This is a score function that, for every substitute operation where an existing letter is substituted with another letter, takes into account how far the letter being substituted in is from the letter being replaced. For example, substituting "q" with "t" increases the difference measurement by twice as much as "q" and "e" because "q" and "t" are 4 keys apart, while "q" and "e" are 2 keys apart.

You should still weight each addition or deletion operation as one unit, as in the previous question.

You can assume that all inputs only include letter characters (no numbers, quotation marks ("), commas (,), periods (.), etc.).

After writing code, test your implementation:

python3 ok -q 07

Problem 08: Efficiency (2 pts)

The score function we wrote in the previous two questions is very inefficient. You will likely find that you will make the same recursive call multiple times. Evaluating the score of any two words is not that slow, but you will likely find that running autocorrect on more than about 20 words is prohibitively slow. We will need a way to prevent re-computing recursive calls that we have already computed once.

If score_function has already been called on two words, then we should not have to re-calculate their difference. Instead, if score_function is called on two words a second time, the previously calculated difference should be returned without re-calculating the difference. The process of "remembering" when we have made a call on two strings in this way is called memoization.

Remember, the order that the two words were input does not matter, because the difference that score_function returns should be equal regardless of the order of the words. In other words, score_function("bite", "might") and score_function("might", "bite") are equal, so the second function call should not involve re-calculating the difference between the two words.

Your task is to implement the function score_function_final, which should maintain the same accuracy of score_function_accurate but run more quickly using memoization. We will test this by using your function to search a large list of words. Without using memoization, you are likely to time out. Your implementation will need to be able to compare a word against the entirety of words.txt in less than 10 seconds, but a reasonable implementation will be able to do it in less than half that time.

After writing code, test your implementation:

python3 ok -q 08

Congratulations, you have reached the end of your second CS 61A project! If you haven't already, relax and practice your typing using the test you've created.

Problem 09: Feedback

This question is not worth any points by itself, but in order to get the extra credit point you must complete it (as well as submit the project 24 hours early).
As this is a new project, we'd love your feedback. Please fill out this form to give us your feedback. Although we will collect your personal information in order to be able to give extra credit, your responses will be anonymized when actually viewing the feedback. Your feedback will additionally not affect your ability to get the extra credit point or your score on the project and in the class.

Even if you do not submit the project in time for the extra credit point, please fill out the form! We've tried quite a few new things in this project, and while we hope that some of it worked, we know there's still certainly room for improvement. We really want to know what went well and what we can change in the future to make this project, and the projects in this course as a whole, even better.

Conclusion

Again, congratulations on reaching the end of this project!

Ensure you have submitted your code by running python3 ok --submit and that you have submitted your design doc to the form

By running python3 gui.py, you will be able to use ALL of your code to test your typing speed. The "start" button will begin a round, and the "save" button will stop the test and display your WPM and accuracy. The "Pig Latin" checkbox will transform the reference text into Pig Latin, and the "autocorrect" checkbox will correct any typos using your Score Function.

There are many additions that can still be made to this function. For example, you might find some examples where your final score function is still not quite perfect. For example, while testing this project, we found that it translates the string "speling" to "working", wheras its much more likely that the intended word was "spelling".

Some ideas of how to improve your score function are:

Scaling the key_distance dictionary differently
Taking into account which additions and deletions are more likely than others (for example, it's much more likely that you'll accidentally leave out a letter if it appears twice in a row)
Trying to incorporate common misspellings

You might also improve your project by changing the structure of your code. You likely noticed that you had to copy and paste a lot of code for the last couple of functions, so think about what changes you can make to reuse code.

If you come up with any cool additions or modifications to this project, please let us know! Since this is a new project, there's a lot of room to change questions or add new features in the future. You might help make a new question for a future semester!

Acknowledgments: Ryan Moughan, Tiffany Perumpail, and Chris Allsman developed this project, including the tests, skeleton code, and spec. Aman Shah developed the GUI alongside Ryan Moughan. Rahul Arya created the Pig Latin demo for the spec. Many people, including all the TAs and Tutors for Summer 2019 put many hours into testing the project and giving valuable input.

Appendix: Utility Functions

A number of utility functions are provided in utils.py. You will not need to use all of these, but you will need to use a handful of the file reading utilities we provide, and might find some of the string utilities helpful. We also define here a function to create the dictionary of key distances you will use in Question 7.

Note: the following descriptions are modified from the official documentation for the io and string built-in libraries, as well as some additional documentation on file I/O. We have wrapped these methods in some more familiar syntax and removed some unnecessary (for your purposes) parameters from the calls. You are free to call these methods directly in your code if you wish or to use other methods not explicitely given here.

File-reading Functions

open(path, mode='r')

Open the file located at path and return a corresponding file object. The file object returned by open is sometimes called a "stream". If the file cannot be opened, an OSError is raised. path should be a string representing the path to the file (for example, "data/words.txt").

mode is an optional string that specifies the mode in which the file is opened. It defaults to 'r' which means open for reading in text mode. Other common values are 'w' for writing (truncating the file if it already exists), 'x' for exclusive creation (failing if the file already exists), and 'a' for appending.

Note: open is a built-in function in Python, thus it does not appear in utils.py. There are many other optional parameters, but you will not need to concern yourself with any of them.

close(file)

Flush and close file, which should be a file object. This method has no effect if the file is already closed. Once the file is closed, any operation on the file (e.g. reading or writing) will raise a ValueError.

As a convenience, it is allowed to call this method more than once; only the first call, however, will have an effect.

If you don’t explicitly close a file, Python will eventually close the open file for you and free any resources it is using, but the file may stay open for a while. Another risk is that different Python implementations will do this clean-up at different times.

readable(file)

Return True if the stream can be read from. If False, trying to read from the file will raise an OSError.

readline(file)

Read and return one line from the stream. If size is specified, at most size bytes will be read.

Includes a line terminator at the end of the line for text files. If you have reached the end of a file, readline(file) will return an empty string (with no line terminator).

Once reading a row, you will be unable to read it again until you open the file again.

readlines(file)

Read and return a list of lines from the stream. The list will contain all unread lines.

String Functions

lower(s)

Return a copy of the string s with all the cased characters converted to lowercase. Cased characters, for our purposes, are just letters.

split(s, sep=None)

Return a list of the words in the string, using sep as the delimiter string.

If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings (for example, '1,,2'.split(',') returns ['1', '', '2']). The sep argument may consist of multiple characters (for example, '1<>2<>3'.split('<>') returns ['1', '2', '3']). Splitting an empty string with a specified separator returns [''].

For example:

>>> split('1,2,3', ',')
['1', '2', '3']
>>> split('1,2,,3,', ',')
['1', '2', '', '3', '']

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].

Whitespace includes spaces, tabs, and newlines.

For example:

>>> split('1 2 3')
['1', '2', '3']
>>> split('   1   2   3   ')
['1', '2', '3']

strip(s, chars=None)

Return a copy of the string with leading and trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. Whitespace includes spaces, tabs, and newline characters. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped:

>>> strip('   spacious   ')
'spacious'
>>> strip('www.example.com','cmowz.')
'example'

The outermost leading and trailing chars argument values are stripped from the string. Characters are removed from the leading end until reaching a string character that is not contained in the set of characters in chars. A similar action takes place on the trailing end.