Lab 8: Practical Asymptotic Analysis and TreeMaps

Due Date: Friday 10/15 11:59PM.

## A. Intro

In lecture you have been learning about complexity theory and how to determine the asymptotic bounds of functions. This practice has involved computing theoretical bounds, but it can be useful to tie this theory into real examples. This is exactly what you will be doing in this lab!

We consider some alternative ways of measuring the efficiency of a given code segment. Given a function f, we want to find out how fast that function runs. One way of doing this is to take out a stopwatch and measure exactly how much time it takes for f to run on some input. We will look at some plots of runtime to understand how some functions' runtimes change with the size of their input. However, we will also see that there are problems with using elapsed time for this purpose. These empirical measures are not meant to replace the better models for analyzing runtime complexity from lecture and discussion, but merely provide another perspective on the runtime of functions. You will conduct several experiments, discuss the results with your partner, and then record some of the answers from your discussion.

The next part of the lab will take a look at different implementations of maps. We have provided a complete implementation to a map which uses a linked list to store key-value pairs. You will be asked you to complete the implementation of a faster map which uses a binary search trees (BST) to store the key-value pairs. After implementing your tree map, you should be able to compare its performance to java.util's implementation of a BST map, TreeMap and hopefully see that the performance is similar (at least on randomly inserted points).

To get started pull the files for lab 8 from the skeleton.

git fetch shared
git merge shared/lab8 -m "Start lab8"
git push

## B. Estimating Efficiency by Timing Functions

### Algorithms

An algorithm is an abstract notion that describes an approach for solving a problem. The code we write in this class, our programs, are implementations of algorithms. For example, consider the problem of sorting a list of numbers. There are many different algorithms that will allow us to sort a list of numbers, each of which might have its own runtime. We will learn more about many of these sorting algorithms later in the class, but we will introduce a few of them briefly here.

One algorithm we might use to solve this problem is called bubble sort. Bubble sort tells us we can sort a list by repeatedly looping through the list and swapping adjacent items if they are out of order, until the entire sorting is complete. Another algorithm we might use to solve this problem is called insertion sort. Insertion sort says to sort a list by repeatedly looping through our list, removing the minimum element, and putting it into a new list in the correct order.

Several websites like VisuAlgo, Sorting.at, Sorting Algorithms, and USF have developed some animations that can help us visualize these sorting algorithms. Spend a little time playing around with these demos to get an understanding of how much time it takes for bubble sort or insertion sort to completely sort a list. Again we'll revisit sorting in more detail later in this course, but for now, try to get a feeling of how long each algorithm takes to sort a list. How many comparisons does each sort need? And how many swaps? Can you guess the overall runtime? Don't spend too much time on this, we will revisit sorting in much more detail later in the class.

Since each comparison and each swap takes time, we want to know which is the faster algorithm. The runtime bounds that we compute using asymptotic analysis might not tell us this. Asymptotic bounds might tell us that two functions should run in time proportional to one another, but they would not tell us which of the two is actually faster. This is one case where it might be better to turn to empirical measures, despite their many potential problems.

### Time Estimates

To start, it seems that the most reasonable way to estimate the time an algorithm takes is to measure the time directly. Each computer has an internal clock that keeps track of time, usually in the number of fractions of a second that have elapsed since a given base date. The Java method that accesses the clock is System.nanoTime(). We have provided the timing.Timer class which implements functions that behave like a stopwatch which we'll use to time different operations. You should not have to interact directly with the timing.Timer class, but it might be useful to understand how it is implemented and what functionality it has. Spend a few minutes familiarizing yourself with the file and discuss what each part of this class seems to be doing with your lab partner.

### Exercise: Timing Sorting Algorithms

The file timing.Sorter.java contains multiple implementations of various sorting algorithms, including the bubble sort and insertion sort algorithms mentioned earlier. Each Sorter contains the following methods (some of which might be inherited from the abstract class Sorter).

• int[] getRandomArray(int numElems): Returns an array of the specified size filled with randomly generated values.
• void sort(int[] arr): Sort the given input array arr.
• List<Double> score(int by, int ntrials, int nrepeats): Conduct a timing experiment. This will involve sorting ntrials different size arrays starting at 5 and increasing in size by by each time. The experiments will then be ran nrepeats times.

Again you do not understand completely what each of these sorters is doing, but it might be interesting to see how each of them works. Pay close attention to the asymptotic runtimes of each of these different algorithms. The full list of algorithms contained and their asymptotic runtimes are as follows:

• BubbleSorter: This algorithm is as described above. On each iteration a forward pass is taken through the array or numbers, and any adjacent out of order elements are swapped. This continues until a forward pass is made and there are no swaps necessary. The time complexity for bubble sort is worst case $\Theta(N^2)$ where $N$ is the length of the array.
• WipingBubbleSorter: This algorithm is similar to the bubble sort described above, but WipingBubbleSorter will alternate between taking forward passes and backwards passes through the array to swap elements. This however does not change the time complexity which for the worst case is $\Theta(N^2)$ where $N$ is the length of the array.
• InsertionSorter: This algorithm will build a progressively larger sublist of sorted numbers, by continually inserting the next value in sorted sublist. The worst case runtime for insertion sort is also $\Theta(N^2)$ where $N$ is the length of the array.
• JavaSorter: This algorithm is Java's native sorting algorithm for arrays. The actual implementation of this algorithm is quicksort for primitive values and mergesort for Object values. You do not need to know what either of these algorithms do at this points, but the resulting time complexity is expected to be $\Theta(N \cdot \log(N))$ where $N$ is the length of the array.
• CountingSorter: This algorithm (at least as it is coded here) does not allow for all values of integers. We override the getRandomArray function to only contain values of a fixed size. This sort then counts each of the possible values, and fills in an array corresponding to the correct counts of each of the possible values. The time complexity for the worst case is $\Theta(N)$ where $N$ is the length of the array.

We would like to compare these sorters, and much as before with asymptotic analysis we are mainly concerned with how the speed of the algorithm changes as a function of the input size. The input size here will be the length of the array to be sorted. In order to do this we will be running a series of calls to each of the different sorting algorithms and then plotting the results in order to visualize how each of them appears to grow. Optional: If you are interested in seeing how we are plotting these in Java, feel free to check out the timing.GraphUtil.java file. We will not be expecting you to understand the code here, but again it might be interesting to look at.

The file timing.SortTiming contains code to generate a plot of the runtimes of the different sorting implementations when sorting different sizes of arrays. Again feel free to examine this code, but you are not expected to understand everything that is going on here. You should be able to treat this class as a black box that accepts three arguments (or sets them by default) corresponding to the number of different array sizes it should test (ntrials), the interval by which it should increase the array size between trials (by), and the number of times it should repeat each size array per sorting implementation (nrepeats). By default, the minimum size array is 5. When you run this program it will take some time to run all of the experiments then a nice plot should be opened on your computer.

For example, the commands

javac timing/SortTiming.java
java timing.SortTiming 100 50 10

uses the command-line arguments to plot all sorters' runtimes for 100 trials, each increasing by 50 elements, and repeating each trial 10 times (and averaging the result).

Try running the program, either with IntelliJ or over the command line. To run with IntelliJ, you can run the main method of SortTiming and alter the ntrials, by, and nrepeats values. If you're running it on the command line, run the above commands from the lab8 directory. We have made this setup flexible, so please play with the values of ntrials, by, and nrepeats to see how the graphs change!

### Discussion: Timing Results

Once you have the visualization up and running, answer each of these questions with your lab partner!

• Is one sorting algorithm always faster than another?
• Above we said that BubbleSort, WipingBubbleSort, and InsertionSort each had the same $\Theta(N^2)$ asymptotic time complexity. How can you explain the differences in the plots for these three algorithms?
• What information can we gain from empirical analysis of algorithms which might not be as noticeable in asymptotical bounds?
• For any given sorting algorithm, does increasing the array size always mean the sorting takes longer?
• How does changing nrepeats change the plot?
• Is your plot the exact same as your partner's plot, even with the same values of ntrials, by, and nrepeats?
• Optional: Look at the source code for BubbleSorter to WipingBubbleSorter. After looking at the plots, can you intuitively explain why WipingBubbleSorter is usually 2x as fast as BubbleSorter? (Hint: Consider the immobility of some elements when the swapping passes are single directional (i.e. only going forward), and how this "Wiping" strategy helps deal with that issue.) Can you come up with an example that shows the difference in runtime?

We will be asking you to record a brief summary of your responses to these questions to show that you have put some thought into discussing them. Again this is supposed to be a collaborative exercise and some of these questions might have many reasonable answers. If you have questions you can also ask a TA/AI! Once you feel satisfied with your responses, put a summary into the sort_timing.txt file in your lab8 directory.

Note: We will not be directly grading your responses and you will be graded on effort. You should write at least one sentence for each question. No responses are required for the optional questions.

### Amortization

In lecture you should have also learned about amortization. If you have not watched lecture or reviewed the slides please take some time to go through slides 8-13 or the rest of this section will not make as much sense. Still we will briefly explain amortization here to recap the main points.

If we look at a sequence of $N$ operations to be carried out for a specific algorithm (which might be an operation on a data structure), it is possible that each one of those operations differs in time complexity. Amortized time means that instead of considering the worst case time complexity for a given operation, we will instead look at the "average" time for each one of the operation. For certain instances, this can be more insightful than just assuming that each of the operations will run in worst case time. In this next exercise we will examine different schemas of resizeable arrays to see what the amortized performance of each of them is.

### Exercise: Timing Amortized Lists

The file timing.GrowList.java contains multiple implementations of a list that can grow as more elements are inserted into it. Each GrowList contains the following methods (some of which might be inherited from the abstract class GrowList).

• void add(int e): Inserts an int element into the given list. Each GrowList's behavior is described below.
• GrowList newList(): Creates a new GrowList, this is a method that is included to allow for easier creation of lists during tests.
• List<Double> score(int maxSize, int nLists): Conduct a timing experiment by inserting maxSize elements into nLists different lists. The output of this will be a list such that ith item is the time for the ith insertion. We use nLists different lists and average the performance over all of them to hopefully reduce noise in our data collection.
• List<Double> accumScore(int maxSize, int nLists): Conduct a timing experiment as specified above. The output of this will be a list such that ith item is the cumulative total of the average times of the first i insertions.

Unlike the sorters from above, you should be able to understand all of the code for each of the GrowList implementations that we have provided. After reading the below descriptions, spend some time with your partner to look at each of their implementations and discuss how each of them work. Once again, pay close attention to the asymptotic runtimes of each of these variants. The full list of implementations and their asymptotic runtimes are as follows:

• GeomGrowList: This is an array-backed list that resizes by doubling the size of the array when the array is fully filled up. The runtime for $N$ insertions into a GeomGrowList is $\Theta(N)$, and the amortized runtime of a single addition is $\Theta(1)$.
• ArithGrowList: This is an array-backed list that resizes by increasing the size of the array by one when the array is fully filled up. The runtime for $N$ insertions into a ArithGrowList is $\Theta(N^2)$, and the amortized runtime of a single addition is $\Theta(N)$.
• JavaGrowList: This is a wrapper to Java's ArrayList which is array-backed list that resizes by doubling the size of the array when the array is fully filled up. The runtime for $N$ insertions into a JavaGrowList is $\Theta(N)$, and the amortized runtime of a single addition is $\Theta(1)$.

The file timing.AmortizationTiming contains code to generate plots of the runtimes of the different GrowList implementations as many elements are added to each of them. Again feel free to examine this code, but you can again treat this class as a black box. The three arguments accepted (or set by default) are maxSize which specifies the number of elements to add, nLists which specifies the number of lists to test on, and accumulate which will specify whether you would like to visualize the accumulated runtimes or the per operation runtimes (accumulate should be passed as -accum or -noaccum, see below for more information). When you run this program it will take some time to run all of the experiments, and then a nice plot should be opened on your computer.

For example, the commands

javac timing/AmortizationTiming.java
java timing.AmortizationTiming 1024 1000 -noaccum

uses the command-line arguments to plot all lists' runtimes per operation for 1024 additions into 1000 different lists (averaging the result). On the other hand the commands

javac timing/AmortizationTiming.java
java timing.AmortizationTiming 1024 1000 -accum

uses the command-line arguments to plot the accumulated runtimes for the first i insertion for the same 1024 additions into the same 1000 lists.

Just like before try running the program, either with IntelliJ or over the command line. To run with IntelliJ, you can run the main method of AmortizationTiming and alter the maxSize, nLists, and accumulate values. If you're running it on the command line, run the above commands from the lab8 directory. We have made this setup flexible, so please play with the values of maxSize, nLists, and accumulate to see how the graphs change!

### Discussion: Amortization Results

Once you have the visualization up and running, answer each of these questions with your lab partner!

• Is one GrowList implementation always better than the others?
• Why is the runtime for N insertions into a geometrically resizing list a $\Theta(N)$ operation?
• Why is the runtime for N insertions into an arithmetically resizing list a $\Theta(N^2)$ operation?
• How does the runtime per operation for the ArithGrowList compare to that of GeomGrowList and JavaGrowList? Specifically look at the non-accumulated plots and describe the trends for how long each operation takes as a function of how many elements have already been inserted in the list.
• When are there spikes in the per operation runtime graphs for each of the implementations? Do these make sense to you? Hint: some of these should and others might not. Empirical runtime can be quite messy and depends on machine specifics which will be revealed in other subsequent classes like CS61C.
• Optional: Try changing the code for GeomGrowList to resize by a different factor. How does this affect the theoretical asymptotic runtime? How does this effect the plotted runtime?
• Optional: Try changing the code for ArithGrowList to resize by adding a different fixed number of spots in the array. How does this affect the theoretical asymptotic runtime? How does this effect the plotted runtime?

We will be asking you to record a brief summary of your responses to these questions to show that you have put some thought into discussing them. Again this is supposed to be a collaborative exercise and some of these questions might have many reasonable answers. If you have questions you can also ask a TA/AI! Once you feel satisfied with your responses, put a summary into the amortized_timing.txt file in your lab8 directory.

Note: We will not be directly grading your responses and you will be graded on effort. You should write at least one sentence for each question. No responses are required for the optional questions.

## C. Practice with Asymptotics and Maps

### Simple Map Introduction

For the next part of the lab we will shift gears and look at different implementations of the SimpleMap interface we have provided for you. A map is an abstract data type which maintains pairs of keys and associated values. You probably have used them before for many tasks in 61B this semester or previously in other classes (a Python dictionary is a map). The SimpleMap interface contains the following two following methods:

• void put(K key, V value): Inserts a new key-value pair into the map. If the key already exists, then the associated value is updated to the new value.
• V get(K key): Returns the value associated with a particular key.

### LinkedListMap

As the name suggests this implementation of SimpleMap is backed by a linked list. Unlike the nodes in previous linked lists we have looked at (IntList and IntDList), the MapNode elements of this linked list will contain both a key and a value. Optional: You will not be implementing any of the functionality of this class, but familiarize yourself with the code. Discuss with your partner what the best and worst case time is for the put and get operations are.

### Exercise: TreeMap

The TreeMap implementation of the SimpleMap class will be an implementation of the SimpleMap interface that is instead backed by a binary search tree. If you would like a refresher on binary search trees, please see the slides from lecture. We have provided the basic structure of this class for you which we will describe in more detail here. Similar to the LinkedListMap there will be a TreeMapNode inner class which will be the underlying recursive binary search tree structure. The TreeMap class has a _root variable which points to the root of the tree. The TreeMapNode class contains the following variables:

• TreeMapNode _left: Pointer to the left child of the TreeMapNode.
• TreeMapNode _right: Pointer to the right child of the TreeMapNode.
• K _key: The key for the key-value pair represented by the TreeMapNode.
• V _value: The value for the key-value pair represented by the TreeMapNode.

We have provided the put and the get functions for you and you should not need to edit these functions (although you may if you so choose as long as the tests still pass). The structure presented here is a common design pattern for classes which are wrapper classes for tree recursive data structures. The get and put functions themselves are not recursive as they operate on a TreeMap (which itself is not a recursive data structure). These methods call putHelper and getHelper which are recursive functions that take in TreeMapNodes and return TreeMapNodes.

Your task for this lab will be to fill in the recursive putHelper and getHelper functions which handle most of the tree related operations. The goal will be to complete these methods such that your completed TreeMap implementation will perform similarly to that of Java's java.util.TreeMap. Before beginning take a look at the code provided in the TreeMap and make sure you understand the structure. Next implement the two methods by filling in the // FIXMEs in the code. Finally discuss with your partner what the worst case and best case runtimes for each of these methods will be.

Hint: The keys of the map will be Comparable<K> which means that you should use the compareTo(K key) function in order to compare keys. This will be useful for traversing through the tree correctly.

### MapTest

We have provided a small set of unit tests to test both the LinkedListMap and the TreeMap that you will be implementing. There are three helper methods smallTestMap, fuzzTestMap (this test is a fuzz test), and timedFuzzTestMap. More information can be found in the comments of these methods. The fuzzTestMap and the timedFuzzTestMap might be different than other testing functions that you have seen before. Optional: Spend some time looking at each of these functions and try to understand what they are doing, discuss this with your partner. You might be able to take ideas from these tests to write new kinds of test for other code in your future!

The actual tests that will be ran on the autograder are as follows (all just call one of the above three testing functions).

• smallTestLinkedListMap: Runs smallTestMap with a LinkedListMap.
• smallTestTreeMap: Runs smallTestMap with a TreeMap.
• fuzzTestLinkedListMap: Runs fuzzTestMap with a LinkedListMap.
• fuzzTestTreetMap: Runs fuzzTestMap with a TreeMap.
• timedFuzzTestTreeMap: Runs timedFuzzTestMap with a TreeMap.

By default you should pass the LinkedListMap tests and fail the TreeMap tests. In order to get full credit for this section of the lab you must pass all 5 of these tests.

Optional: It might be fun to see in practice how much better we can do when we use a binary search tree over a linked list. Try writing a new test, or modifying an existing one to see what the ratio between the speed of the LinkedListMap and the java.util.TreeMap is.

Optional: The TreeMap that you have created here probably offers a good speedup over the LinkedListMap, and might even be close to that of the java.util.TreeMap however it might not be quite as good as it seems. In the tests that we have provided we insert random key-value pairs which has empirically been shown to result in roughly balanced tree structures. If we instead insert a large number of key-value pairs into our tree with keys that are in sorted order, then the performance of our tree might be decreased significantly. Discuss with your partner why this might be, and then see if you can verify this empirically by writing a new test or refactoring the existing code.

## D. Submission and Grading

### Deliverables

For full credit on the lab you must:

• Put sufficient effort into the responses in sort_timing.txt.
• Put sufficient effort into the responses in ammortized_timing.txt.
• Complete the putHelper and getHelper functions in map.TreeMap.java. In order to get full credit for this part of the lab all of the provided tests in map.MapTest.java must pass.
• Also remember to complete and submit the partner.txt if you worked with a partner. If you did not, you do not need to edit this file.

• You can earn .4 points for each of the Amortized and Sort Timing question tests. You can earn .4 points for passing 3 of the Map tests, .8 points for passing 4, and 1.2 points for passing all 5.

### Submission

You can submit the same as usual:

<git add / commit>
git tag lab8-0 # or the next highest submission number
git push
git push --tags