Lab 8: Practical Asymptotic Analysis and TreeMaps

Due Date: Friday 3/11 11:59PM.

A. Intro


In lecture you have been learning about complexity theory and how to determine the asymptotic bounds of functions. This practice has involved computing theoretical bounds, but it can be useful to tie this theory into real examples. This is exactly what you will be doing in this lab!

We consider some alternative ways of measuring the efficiency of a given code segment. Given a function f, we want to find out how fast that function runs. One way of doing this is to take out a stopwatch and measure exactly how much time it takes for f to run on some input. We will look at some plots of runtime to understand how some functions' runtimes change with the size of their input. However, we will also see that there are problems with using elapsed time for this purpose. These empirical measures are not meant to replace the better models for analyzing runtime complexity from lecture and discussion, but merely provide another perspective on the runtime of functions. You will conduct several experiments, discuss the results with your partner, and then record some of the answers from your discussion.

The next part of the lab will take a look at different implementations of maps. We have provided a complete implementation to a map which uses a linked list to store key-value pairs. You will be asked you to complete the implementation of a faster map which uses a binary search tree (BST) to store the key-value pairs. After implementing your tree map, you should be able to compare its performance to java.util's implementation of a BST map, TreeMap, and hopefully see that the performance is similar (at least on randomly inserted points).

To get started pull the files for lab 8 from the skeleton.

git fetch shared
git merge shared/lab8 -m "Start lab8"
git push

Update: (March 9th, 9:22 am)
There was an off by one bug in LinkedListMap.java. If you've pull the lab skeleton code before March 9th 9:22 am, please run the following command to fetch the fix:

git fetch shared
git merge shared/lab8 -m "Pull LinkedListMap Fix"

Since students are not required to write in LinkedListMap.java, this shouldn't result in merge conflict. But, if you are prompted to enter a merge message, you can exit out of that window by pressing Esc then type :wq and then press Enter to exit out of the window and complete the merge. Sorry for the inconvenience!

The following are useful links and content review videos for this lab:

B. Estimating Efficiency by Timing Functions


Algorithms

An algorithm is an abstract notion that describes an approach for solving a problem. The code we write in this class, our programs, are implementations of algorithms. For example, consider the problem of sorting a list of numbers. There are many different algorithms that will allow us to sort a list of numbers, each of which might have its own runtime. We will learn more about many of these sorting algorithms later in the class, but we will introduce a few of them briefly here.

One algorithm we might use to solve this problem is called bubble sort. Bubble sort tells us we can sort a list by repeatedly looping through the list and swapping adjacent items if they are out of order, until the entire sorting is complete. Another algorithm we might use to solve this problem is called insertion sort. Insertion sort says to sort a list by repeatedly looping through our list, removing the minimum element, and putting it into a new list in the correct order.

Several websites like VisuAlgo, Sorting Algorithms, and USF have developed some animations that can help us visualize these sorting algorithms. Spend a little time playing around with these demos to get an understanding of how much time it takes for bubble sort or insertion sort to completely sort a list. Again we'll revisit sorting in more detail later in this course, but for now, try to get a feeling of how long each algorithm takes to sort a list. How many comparisons does each sort need? And how many swaps? Can you guess the overall runtime? Don't spend too much time on this, we will revisit sorting in much more detail later in the class.

Since each comparison and each swap takes time, we want to know which is the faster algorithm. The runtime bounds that we compute using asymptotic analysis might not tell us this. Asymptotic bounds might tell us that two functions should run in time proportional to one another, but they would not tell us which of the two is actually faster. This is one case where it might be better to turn to empirical measures, despite their many potential problems.


Time Estimates

To start, it seems that the most reasonable way to estimate the time an algorithm takes is to measure the time directly. Each computer has an internal clock that keeps track of time, usually in the number of fractions of a second that have elapsed since a given base date. The Java method that accesses the clock is System.nanoTime(). We have provided the timing.Timer class which implements functions that behave like a stopwatch which we'll use to time different operations. You should not have to interact directly with the timing.Timer class, but it might be useful to understand how it is implemented and what functionality it has. Spend a few minutes familiarizing yourself with the file and discuss what each part of this class seems to be doing with your lab partner.


Exercise: Timing Sorting Algorithms

The file timing.Sorter.java contains multiple implementations of various sorting algorithms, including the bubble sort and insertion sort algorithms mentioned earlier. Each Sorter contains the following methods (some of which might be inherited from the abstract class Sorter).

Again you do not understand completely what each of these sorters is doing, but it might be interesting to see how each of them works. Pay close attention to the asymptotic runtimes of each of these different algorithms. The full list of algorithms contained and their asymptotic runtimes are as follows:

We would like to compare these sorters, and much as before with asymptotic analysis we are mainly concerned with how the speed of the algorithm changes as a function of the input size. The input size here will be the length of the array to be sorted. In order to do this we will be running a series of calls to each of the different sorting algorithms and then plotting the results in order to visualize how each of them appears to grow. Optional: If you are interested in seeing how we are plotting these in Java, feel free to check out the timing.GraphUtil.java file. We will not be expecting you to understand the code here, but again it might be interesting to look at.

The file timing.SortTiming contains code to generate a plot of the runtimes of the different sorting implementations when sorting different sizes of arrays. Again feel free to examine this code, but you are not expected to understand everything that is going on here. You should be able to treat this class as a black box that accepts three arguments (or sets them by default) corresponding to the number of different array sizes it should test (ntrials), the interval by which it should increase the array size between trials (by), and the number of times it should repeat each size array per sorting implementation (nrepeats). By default, the minimum size array is 5. When you run this program it will take some time to run all of the experiments; then, a nice plot should be opened on your computer.

For example, the commands

javac timing/SortTiming.java
java timing.SortTiming 100 50 10

uses the command-line arguments to plot all sorters' runtimes for 100 trials, each increasing by 50 elements, and repeating each trial 10 times (and averaging the result).

Try running the program, either with IntelliJ or over the command line. To run with IntelliJ, you can run the main method of SortTiming and alter the ntrials, by, and nrepeats values. If you're running it on the command line, run the above commands from the lab8 directory. We have made this setup flexible, so please play with the values of ntrials, by, and nrepeats to see how the graphs change!


Discussion: Timing Results

Once you have the visualization up and running, answer each of these questions with your lab partner!

We will be asking you to record a brief summary of your responses to these questions to show that you have put some thought into discussing them. Again this is supposed to be a collaborative exercise and some of these questions might have many reasonable answers. If you have questions you can also ask a TA/AI! Once you feel satisfied with your responses, put a summary into the sort_timing.txt file in your lab8 directory.

Note: We will not be directly grading your responses and you will be graded on effort. You should write at least one sentence for each question. No responses are required for the optional questions.


Amortization

In lecture you should have also learned about amortization. If you have not watched lecture or reviewed the slides please take some time to go through slides 8-13 or the rest of this section will not make as much sense. Still we will briefly explain amortization here to recap the main points.

If we look at a sequence of $N$ operations to be carried out for a specific algorithm (which might be an operation on a data structure), it is possible that each one of those operations differs in time complexity. Amortized time means that instead of considering the worst case time complexity for a given operation, we will instead look at the "average" time for each one of the operation. For certain instances, this can be more insightful than just assuming that each of the operations will run in worst case time. In this next exercise we will examine different schemas of resizeable arrays to see what the amortized performance of each of them is.


Exercise: Timing Amortized Lists

The file timing.GrowList.java contains multiple implementations of a list that can grow as more elements are inserted into it. Each GrowList contains the following methods (some of which might be inherited from the abstract class GrowList).

Unlike the sorters from above, you should be able to understand all of the code for each of the GrowList implementations that we have provided. After reading the below descriptions, spend some time with your partner to look at each of their implementations and discuss how each of them work. Once again, pay close attention to the asymptotic runtimes of each of these variants. The full list of implementations and their asymptotic runtimes are as follows:

The file timing.AmortizationTiming contains code to generate plots of the runtimes of the different GrowList implementations as many elements are added to each of them. Again feel free to examine this code, but you can again treat this class as a black box. The three arguments accepted (or set by default) are maxSize which specifies the number of elements to add, nLists which specifies the number of lists to test on, and accumulate which will specify whether you would like to visualize the accumulated runtimes or the per operation runtimes (accumulate should be passed as -accum or -noaccum, see below for more information). When you run this program it will take some time to run all of the experiments, and then a nice plot should be opened on your computer.

For example, the commands

javac timing/AmortizationTiming.java
java timing.AmortizationTiming 1024 1000 -noaccum

uses the command-line arguments to plot all lists' runtimes per operation for 1024 additions into 1000 different lists (averaging the result). On the other hand the commands

javac timing/AmortizationTiming.java
java timing.AmortizationTiming 1024 1000 -accum

uses the command-line arguments to plot the accumulated runtimes for the first i insertion for the same 1024 additions into the same 1000 lists.

Just like before try running the program, either with IntelliJ or over the command line. To run with IntelliJ, you can run the main method of AmortizationTiming and alter the maxSize, nLists, and accumulate values. If you're running it on the command line, run the above commands from the lab8 directory. We have made this setup flexible, so please play with the values of maxSize, nLists, and accumulate to see how the graphs change!


Discussion: Amortization Results

Once you have the visualization up and running, answer each of these questions with your lab partner!

We will be asking you to record a brief summary of your responses to these questions to show that you have put some thought into discussing them. Again this is supposed to be a collaborative exercise and some of these questions might have many reasonable answers. If you have questions you can also ask a TA/AI! Once you feel satisfied with your responses, put a summary into the amortized_timing.txt file in your lab8 directory.

Note: We will not be directly grading your responses and you will be graded on effort. You should write at least one sentence for each question. No responses are required for the optional questions.

C. Practice with Asymptotics and Maps


Update: (March 9th, 9:22 am)
There was an off by one bug in LinkedListMap.java. If you've pull the lab skeleton code before March 9th 9:22 am, please run the following command to fetch the fix:

git fetch shared
git merge shared/lab8 -m "Pull LinkedListMap Fix"

Since students are not required to write in LinkedListMap.java, this shouldn't result in merge conflict. But, if you are prompted to enter a merge message, you can exit out of that window by pressing Esc then type :wq and then press Enter to exit out of the window and complete the merge. Sorry for the inconvenience!


Simple Map Introduction

For the next part of the lab we will shift gears and look at different implementations of the SimpleMap interface we have provided for you. A map is an abstract data type which maintains pairs of keys and associated values. You probably have used them before for many tasks in 61B this semester or previously in other classes (a Python dictionary is a map). The SimpleMap interface contains the following two following methods:


LinkedListMap

As the name suggests this implementation of SimpleMap is backed by a linked list. Unlike the nodes in previous linked lists we have looked at (IntList and IntDList), the MapNode elements of this linked list will contain both a key and a value. Optional: You will not be implementing any of the functionality of this class, but familiarize yourself with the code. Discuss with your partner what the best and worst case time is for the put and get operations are.


Exercise: TreeMap

The TreeMap implementation of the SimpleMap class will be an implementation of the SimpleMap interface that is instead backed by a binary search tree. If you would like a refresher on binary search trees, please see the slides from lecture. We have provided the basic structure of this class for you which we will describe in more detail here. Similar to the LinkedListMap there will be a TreeMapNode inner class which will be the underlying recursive binary search tree structure. The TreeMap class has a _root variable which points to the root of the tree. The TreeMapNode class contains the following variables:

We have provided the put and the get functions for you and you should not need to edit these functions (although you may if you so choose as long as the tests still pass). The structure presented here is a common design pattern for classes which are wrapper classes for tree recursive data structures. The get and put functions themselves are not recursive as they operate on a TreeMap (which itself is not a recursive data structure). These methods call putHelper and getHelper which are recursive functions that take in TreeMapNodes and return TreeMapNodes.

Your task for this lab will be to fill in the recursive putHelper and getHelper functions which handle most of the tree related operations. The goal will be to complete these methods such that your completed TreeMap implementation will perform similarly to that of Java's java.util.TreeMap. Before beginning take a look at the code provided in the TreeMap and make sure you understand the structure. Next implement the two methods by filling in the // FIXMEs in the code. Finally discuss with your partner what the worst case and best case runtimes for each of these methods will be.

Hint: The keys of the map will be Comparable<K> which means that you should use the compareTo(K key) function in order to compare keys. This will be useful for traversing through the tree correctly.


MapTest

We have provided a small set of unit tests to test both the LinkedListMap and the TreeMap that you will be implementing. There are three helper methods smallTestMap, fuzzTestMap (this test is a fuzz test), and timedFuzzTestMap. More information can be found in the comments of these methods. The fuzzTestMap and the timedFuzzTestMap might be different than other testing functions that you have seen before. Optional: Spend some time looking at each of these functions and try to understand what they are doing, discuss this with your partner. You might be able to take ideas from these tests to write new kinds of test for other code in your future!

The actual tests that will be ran on the autograder are as follows (all just call one of the above three testing functions).

By default you should pass the LinkedListMap tests and fail the TreeMap tests. In order to get full credit for this section of the lab you must pass all 5 of these tests.

Optional: It might be fun to see in practice how much better we can do when we use a binary search tree over a linked list. Try writing a new test, or modifying an existing one to see what the ratio between the speed of the LinkedListMap and the java.util.TreeMap is.

Optional: The TreeMap that you have created here probably offers a good speedup over the LinkedListMap, and might even be close to that of the java.util.TreeMap however it might not be quite as good as it seems. In the tests that we have provided we insert random key-value pairs which has empirically been shown to result in roughly balanced tree structures. If we instead insert a large number of key-value pairs into our tree with keys that are in sorted order, then the performance of our tree might be decreased significantly. Discuss with your partner why this might be, and then see if you can verify this empirically by writing a new test or refactoring the existing code.

D. Submission and Grading


Deliverables

For full credit on the lab you must:

Submission

You can submit the same as usual:

<git add / commit>
git tag lab8-0 # or the next highest submission number
git push
git push --tags