Due Date: Wednesday 3/18 11:59PM.
Navigation
- A. Intro
- B. Estimating Efficiency by Timing Functions
- C. Practice with Asymptotics and Maps
- D. Submission
A. Intro
In lecture you have been learning about complexity theory and how to determine the asymptotic bounds of functions. This practice has involved computing theoretical bounds, but it can be useful to tie this theory into real examples. This is exactly what you will be doing in this lab!
We consider some alternative ways of measuring the efficiency of a given code segment. Given a function f
, we want to find out how fast that function runs. One way of doing this is to take out a stopwatch and measure exactly how much time it takes for f
to run on some input. We will look at some plots of runtime to understand how some functions' runtimes change with the size of their input. However, we will also see that there are problems with using elapsed time for this purpose. These empirical measures are not meant to replace the better models for analyzing runtime complexity from lecture and discussion, but merely provide another perspective on the runtime of functions. You will conduct several experiments, discuss the results with your partner, and then record some of the answers from your discussion.
The next part of the lab will take a look at different implementations of maps. We have provided a complete implementation of a map which uses a linked list to store key-value pairs. You will be asked to complete the implementation of a faster map which uses a binary search tree (BST) to store the key-value pairs. After implementing your tree map, you should be able to compare its performance to java.util's implementation of a BST map, TreeMap, and hopefully see that the performance is similar (at least on randomly inserted points).
To get started pull the files for lab 8 from the skeleton.
git fetch shared
git merge shared/lab8 -m "Start lab8"
git push
B. Estimating Efficiency by Timing Functions
Algorithms
An algorithm is an abstract notion that describes an approach for solving a problem. The code we write in this class, our programs, are implementations of algorithms. For example, consider the problem of sorting a list of numbers. There are many different algorithms that will allow us to sort a list of numbers, each of which might have its own runtime. We will learn more about many of these sorting algorithms later in the class, but we will introduce a few of them briefly here.
One algorithm we might use to solve this problem is called bubble sort. Bubble sort tells us we can sort a list by repeatedly looping through the list and swapping adjacent items if they are out of order, until the entire list is sorted. Another algorithm we might use to solve this problem is called insertion sort. Insertion sort says to sort a list by repeatedly taking the next unsorted element and inserting it into its correct position among the elements already sorted.
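As a rough sketch (not the lab's actual timing.Sorter code), the bubble sort procedure described above might look like this in Java:

```java
// Illustrative sketch of bubble sort; the lab's BubbleSorter may differ in detail.
public class BubbleSortSketch {
    public static void bubbleSort(int[] arr) {
        boolean swapped = true;
        while (swapped) {
            swapped = false;
            // One forward pass: swap any adjacent out-of-order pair.
            for (int i = 0; i < arr.length - 1; i += 1) {
                if (arr[i] > arr[i + 1]) {
                    int tmp = arr[i];
                    arr[i] = arr[i + 1];
                    arr[i + 1] = tmp;
                    swapped = true;
                }
            }
        }
        // Loop ends after a full pass with no swaps, i.e. the array is sorted.
    }
}
```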
Several websites like VisuAlgo, Sorting.at, Sorting Algorithms, and USF have developed animations that can help us visualize these sorting algorithms. Spend a little time playing around with these demos to get an understanding of how much time it takes for bubble sort or insertion sort to completely sort a list. How many comparisons does each sort need? How many swaps? Can you guess the overall runtime? Don't spend too much time on this; we will revisit sorting in much more detail later in the class.
Since each comparison and each swap takes time, we want to know which is the faster algorithm. The runtime bounds that we compute using asymptotic analysis might not tell us this. Asymptotic bounds might tell us that two functions should run in time proportional to one another, but they would not tell us which of the two is actually faster. This is one case where it might be better to turn to empirical measures, despite their many potential problems.
Time Estimates
To start, it seems that the most reasonable way to estimate the time an algorithm takes is to measure the time directly. Each computer has an internal clock that keeps track of time, usually as the number of fractions of a second that have elapsed since a given base date. The Java method that accesses the clock is System.nanoTime(). We have provided the timing.Timer class, which implements functions that behave like a stopwatch, and which we'll use to time different operations. You should not have to interact directly with the timing.Timer class, but it might be useful to understand how it is implemented and what functionality it has. Spend a few minutes familiarizing yourself with the file and discuss with your lab partner what each part of this class seems to be doing.
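To make the stopwatch idea concrete, here is a minimal sketch built on System.nanoTime(); the provided timing.Timer class may differ in its exact API:

```java
// Minimal stopwatch sketch; the lab's timing.Timer may offer more functionality.
public class StopwatchSketch {
    private long startTime;

    /** Records the current clock reading as the start of the interval. */
    public void start() {
        startTime = System.nanoTime();
    }

    /** Returns time elapsed since start(), converted from nanoseconds to ms. */
    public double elapsedMillis() {
        return (System.nanoTime() - startTime) / 1e6;
    }
}
```

A caller would `start()` the stopwatch, run the operation under test, then read `elapsedMillis()`.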
Exercise: Timing Sorting Algorithms
The file timing/Sorter.java contains implementations of several sorting algorithms, including the bubble sort and insertion sort algorithms mentioned earlier. Each sorter contains the following methods (some of which might be inherited from the abstract class Sorter):
- int[] getRandomArray(int numElems): Returns an array of the specified size filled with randomly generated values.
- void sort(int[] arr): Sorts the given input array arr.
- List<Double> score(int by, int ntrials, int nrepeats): Conducts a timing experiment. This involves sorting ntrials arrays of increasing size, starting at size 5 and increasing by `by` each time. Each experiment is then run nrepeats times.
Again, you are not expected to completely understand what each of these sorters is doing, but it might be interesting to see how each of them works. Pay close attention to the asymptotic runtimes of each of these different algorithms. The full list of algorithms and their asymptotic runtimes is as follows:
- BubbleSorter: This algorithm is as described above. On each iteration a forward pass is taken through the array of numbers, and any adjacent out-of-order elements are swapped. This continues until a forward pass is made in which no swaps are necessary. The worst case time complexity for bubble sort is $\Theta(N^2)$, where $N$ is the length of the array.
- WipingBubbleSorter: This algorithm is similar to the bubble sort described above, but WipingBubbleSorter alternates between forward and backward passes through the array when swapping elements. This does not change the time complexity, which in the worst case is still $\Theta(N^2)$, where $N$ is the length of the array.
- InsertionSorter: This algorithm builds a progressively larger sorted sublist by continually inserting the next value into the sorted sublist. The worst case runtime for insertion sort is also $\Theta(N^2)$, where $N$ is the length of the array.
- JavaSorter: This algorithm is Java's native sorting algorithm for arrays. The actual implementation is quicksort for primitive values and mergesort for Object values. You do not need to know how either of these algorithms works at this point, but the resulting time complexity is expected to be $\Theta(N \cdot \log(N))$, where $N$ is the length of the array.
- CountingSorter: This algorithm (at least as it is coded here) does not allow all integer values. We override the getRandomArray function to only produce values from a fixed range. The sort then counts the occurrences of each possible value and fills in an array according to those counts. The worst case time complexity is $\Theta(N)$, where $N$ is the length of the array.
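The counting strategy behind CountingSorter can be sketched as follows, assuming all values lie in a small fixed range [0, maxVal); this is an illustration, not the lab's code:

```java
// Sketch of counting sort for values known to lie in [0, maxVal).
// Runs in Theta(N + maxVal), which is Theta(N) when maxVal is a fixed constant.
public class CountingSortSketch {
    public static int[] countingSort(int[] arr, int maxVal) {
        int[] counts = new int[maxVal];
        for (int v : arr) {
            counts[v] += 1;            // tally how many times each value occurs
        }
        int[] sorted = new int[arr.length];
        int idx = 0;
        for (int v = 0; v < maxVal; v += 1) {
            for (int c = 0; c < counts[v]; c += 1) {
                sorted[idx] = v;       // emit each value counts[v] times, in order
                idx += 1;
            }
        }
        return sorted;
    }
}
```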
We would like to compare these sorters, and much as before with asymptotic analysis we are mainly concerned with how the speed of the algorithm changes as a function of the input size. The input size here will be the length of the array to be sorted. In order to do this we will run a series of calls to each of the different sorting algorithms and then plot the results in order to visualize how each of them appears to grow. Optional: If you are interested in seeing how we plot these in Java, feel free to check out the timing/GraphUtil.java file. We will not expect you to understand the code here, but again it might be interesting to look at.
The file timing.SortTiming contains code to generate a plot of the runtimes of the different sorting implementations when sorting arrays of different sizes. Again, feel free to examine this code, but you are not expected to understand everything that is going on here. You should be able to treat this class as a black box that accepts three arguments (or sets them by default) corresponding to the number of different array sizes it should test (ntrials), the interval by which it should increase the array size between trials (by), and the number of times it should repeat each array size per sorting implementation (nrepeats). By default, the minimum array size is 5. When you run this program it will take some time to run all of the experiments, and then a plot should open on your computer.
For example, the commands
javac timing/SortTiming.java
java timing.SortTiming 100 50 10
use the command-line arguments to plot all sorters' runtimes for 100 trials, with each trial increasing the array size by 50 elements, and repeating each trial 10 times (averaging the results).
Try running the program, either with IntelliJ or from the command line. To run with IntelliJ, you can run the main method of SortTiming and alter the ntrials, by, and nrepeats values. If you're running it on the command line, run the above commands from the lab8 directory. We have made this setup flexible, so please play with the values of ntrials, by, and nrepeats to see how the graphs change!
Discussion: Timing Results
Once you have the visualization up and running, answer each of these questions with your lab partner!
- Is one sorting algorithm always faster than another?
- Above we said that BubbleSorter, WipingBubbleSorter, and InsertionSorter each have the same $\Theta(N^2)$ asymptotic time complexity. How can you explain the differences in the plots for these three algorithms?
- What information can we gain from empirical analysis of algorithms that might not be as noticeable from asymptotic bounds?
- For any given sorting algorithm, does increasing the array size always mean the sorting takes longer?
- How does changing nrepeats change the plot?
- Is your plot exactly the same as your partner's plot, even with the same values of ntrials, by, and nrepeats?
- Optional: Look at the source code for BubbleSorter and WipingBubbleSorter. After looking at the plots, can you intuitively explain why WipingBubbleSorter is usually about twice as fast as BubbleSorter? (Hint: Consider the immobility of some elements when the swapping passes are single directional (i.e. only going forward), and how this "wiping" strategy helps deal with that issue.) Can you come up with an example that shows the difference in runtime?
We will be asking you to record a brief summary of your responses to these questions to show that you have put some thought into discussing them. Again this is supposed to be a collaborative exercise and some of these questions might have many reasonable answers. If you have questions you can also ask a TA/AI! Once you feel satisfied with your responses, put a summary into the sort_timing.txt
file in your lab8 directory.
Note: We will not be directly grading your responses and you will be graded on effort. You should write at least one sentence for each question. No responses are required for the optional questions.
Amortization
In lecture you should also have learned about amortization. If you have not watched lecture or reviewed the slides, please take some time to go through slides 8-13, or the rest of this section will not make as much sense. Still, we will briefly recap the main points of amortization here.
If we look at a sequence of $N$ operations carried out by a specific algorithm (which might be an operation on a data structure), it is possible that each one of those operations differs in time complexity. Amortized time means that instead of considering the worst case time complexity for a given operation, we instead look at the "average" time per operation over the whole sequence. In certain instances, this can be more insightful than assuming that each of the operations runs in worst case time. In the next exercise we will examine different schemes for resizable arrays to see what the amortized performance of each of them is.
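As a concrete worked example: for an array-backed list that doubles its capacity whenever it fills, a sequence of $N$ insertions triggers resizes at capacities $1, 2, 4, \ldots$, so the total copying work is at most $1 + 2 + 4 + \cdots + N \le 2N$. Adding the $N$ constant-time writes, the whole sequence costs $\Theta(N)$, and the amortized cost per insertion is $\Theta(N)/N = \Theta(1)$, even though an individual insertion that triggers a resize costs $\Theta(N)$ in the worst case.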
Exercise: Timing Amortized Lists
The file timing/GrowList.java contains multiple implementations of a list that can grow as more elements are inserted into it. Each GrowList contains the following methods (some of which might be inherited from the abstract class GrowList):
- void add(int e): Inserts an int element into the list. Each GrowList's behavior is described below.
- GrowList newList(): Creates a new GrowList; this method is included to allow for easier creation of lists during tests.
- List<Double> score(int maxSize, int nLists): Conducts a timing experiment by inserting maxSize elements into nLists different lists. The output is a list whose ith item is the average time for the ith insertion. We use nLists different lists and average the performance over all of them to hopefully reduce noise in our data collection.
- List<Double> accumScore(int maxSize, int nLists): Conducts a timing experiment as specified above. The output is a list whose ith item is the cumulative total of the average times of the first i insertions.
Unlike the sorters from above, you should be able to understand all of the code for each of the GrowList implementations that we have provided. After reading the descriptions below, spend some time with your partner looking at each implementation and discussing how each of them works. Once again, pay close attention to the asymptotic runtimes of each of these variants. The full list of implementations and their asymptotic runtimes is as follows:
- GeomGrowList: An array-backed list that resizes by doubling the size of the array when the array is completely full. The runtime for $N$ insertions into a GeomGrowList is $\Theta(N)$, and the amortized runtime of a single addition is $\Theta(1)$.
- ArithGrowList: An array-backed list that resizes by increasing the size of the array by one when the array is completely full. The runtime for $N$ insertions into an ArithGrowList is $\Theta(N^2)$, and the amortized runtime of a single addition is $\Theta(N)$.
- JavaGrowList: A wrapper around Java's ArrayList, which is an array-backed list that resizes geometrically (by a constant factor) when the array is completely full. The runtime for $N$ insertions into a JavaGrowList is $\Theta(N)$, and the amortized runtime of a single addition is $\Theta(1)$.
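A geometrically resizing list can be sketched as below; this is an illustration of the doubling strategy, and the provided GeomGrowList may differ in detail. An arithmetic version would instead allocate `new int[items.length + 1]`, forcing a full copy on every single add:

```java
// Sketch of a GeomGrowList-style list: capacity doubles when the array fills.
public class GeomListSketch {
    private int[] items = new int[1];
    private int size = 0;

    public void add(int e) {
        if (size == items.length) {
            // Double the capacity. The copy costs Theta(size), but since capacity
            // doubles, copies happen only O(log N) times over N insertions.
            int[] bigger = new int[items.length * 2];
            System.arraycopy(items, 0, bigger, 0, size);
            items = bigger;
        }
        items[size] = e;
        size += 1;
    }

    public int get(int i) {
        return items[i];
    }

    public int size() {
        return size;
    }
}
```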
The file timing.AmortizationTiming contains code to generate plots of the runtimes of the different GrowList implementations as many elements are added to each of them. Again, feel free to examine this code, but you can again treat this class as a black box. The three arguments accepted (or set by default) are maxSize, which specifies the number of elements to add; nLists, which specifies the number of lists to test on; and accumulate, which specifies whether you would like to visualize the accumulated runtimes or the per-operation runtimes (accumulate should be passed as -accum or -noaccum; see below for more information). When you run this program it will take some time to run all of the experiments, and then a plot should open on your computer.
For example, the commands
javac timing/AmortizationTiming.java
java timing.AmortizationTiming 1024 1000 -noaccum
use the command-line arguments to plot all lists' per-operation runtimes for 1024 additions into 1000 different lists (averaging the results). On the other hand, the commands
javac timing/AmortizationTiming.java
java timing.AmortizationTiming 1024 1000 -accum
use the command-line arguments to plot the accumulated runtimes of the first i insertions, for the same 1024 additions into the same 1000 lists.
Just like before, try running the program, either with IntelliJ or from the command line. To run with IntelliJ, you can run the main method of AmortizationTiming and alter the maxSize, nLists, and accumulate values. If you're running it on the command line, run the above commands from the lab8 directory. We have made this setup flexible, so please play with the values of maxSize, nLists, and accumulate to see how the graphs change!
Discussion: Amortization Results
Once you have the visualization up and running, answer each of these questions with your lab partner!
- Is one GrowList implementation always better than the others?
- Why is the runtime for N insertions into a geometrically resizing list a $\Theta(N)$ operation?
- Why is the runtime for N insertions into an arithmetically resizing list a $\Theta(N^2)$ operation?
- How does the runtime per operation for the ArithGrowList compare to that of GeomGrowList and JavaGrowList? Specifically, look at the non-accumulated plots and describe the trends for how long each operation takes as a function of how many elements have already been inserted into the list.
- When are there spikes in the per-operation runtime graphs for each of the implementations? Do these make sense to you? Hint: some of these should, and others might not. Empirical runtime can be quite messy and depends on machine specifics, which will be explored in subsequent classes like CS61C.
- Optional: Try changing the code for GeomGrowList to resize by a different factor. How does this affect the theoretical asymptotic runtime? How does this affect the plotted runtime?
- Optional: Try changing the code for ArithGrowList to resize by adding a different fixed number of spots to the array. How does this affect the theoretical asymptotic runtime? How does this affect the plotted runtime?
We will be asking you to record a brief summary of your responses to these questions to show that you have put some thought into discussing them. Again this is supposed to be a collaborative exercise and some of these questions might have many reasonable answers. If you have questions you can also ask a TA/AI! Once you feel satisfied with your responses, put a summary into the amortized_timing.txt
file in your lab8 directory.
Note: We will not be directly grading your responses and you will be graded on effort. You should write at least one sentence for each question. No responses are required for the optional questions.
C. Practice with Asymptotics and Maps
Simple Map Introduction
For the next part of the lab we will shift gears and look at different implementations of the SimpleMap interface we have provided for you. A map is an abstract data type which maintains pairs of keys and associated values. You have probably used maps before for many tasks in 61B this semester or previously in other classes (a Python dictionary is a map). The SimpleMap interface contains the following two methods:
- void put(K key, V value): Inserts a new key-value pair into the map. If the key already exists, then the associated value is updated to the new value.
- V get(K key): Returns the value associated with a particular key.
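The interface itself is presumably a small generic declaration along these lines; consult the skeleton for the exact signatures (e.g., any bounds on K). A trivial HashMap-backed implementation is included here just to show usage:

```java
import java.util.HashMap;

// Rough sketch of the SimpleMap interface described above; names and generics
// are assumptions about the skeleton, not copied from it.
interface SimpleMapSketch<K, V> {
    /** Inserts the pair, replacing the value if the key already exists. */
    void put(K key, V value);

    /** Returns the value associated with key, or null if absent. */
    V get(K key);
}

// Throwaway implementation backed by java.util.HashMap, for demonstration only.
class HashSimpleMap<K, V> implements SimpleMapSketch<K, V> {
    private final HashMap<K, V> map = new HashMap<>();

    public void put(K key, V value) {
        map.put(key, value);
    }

    public V get(K key) {
        return map.get(key);
    }
}
```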
LinkedListMap
As the name suggests, this implementation of SimpleMap is backed by a linked list. Unlike the nodes in previous linked lists we have looked at (IntList and IntDList), the MapNode elements of this linked list contain both a key and a value. Optional: You will not be implementing any of the functionality of this class, but familiarize yourself with the code. Discuss with your partner what the best and worst case runtimes for the put and get operations are.
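A linked-list-backed map boils down to a linear scan, which is why its get is worst case $\Theta(N)$. Here is a sketch of the idea; the node and field names are illustrative, not the skeleton's MapNode:

```java
// Sketch of a linked-list map: put and get both scan the list, so each is
// worst case Theta(N) in the number of stored pairs.
public class ListMapSketch<K, V> {
    private Entry<K, V> head;

    private static class Entry<K, V> {
        K key;
        V value;
        Entry<K, V> next;

        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    public void put(K key, V value) {
        for (Entry<K, V> e = head; e != null; e = e.next) {
            if (e.key.equals(key)) {
                e.value = value;  // existing key: update in place
                return;
            }
        }
        head = new Entry<>(key, value, head);  // new key: prepend in O(1)
    }

    public V get(K key) {
        for (Entry<K, V> e = head; e != null; e = e.next) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;  // key absent
    }
}
```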
Exercise: TreeMap
The TreeMap class will be an implementation of the SimpleMap interface that is instead backed by a binary search tree. If you would like a refresher on binary search trees, please see the slides from lecture. We have provided the basic structure of this class for you, which we will describe in more detail here. Similar to the LinkedListMap, there is a TreeMapNode inner class which is the underlying recursive binary search tree structure. The TreeMap class has a _root variable which points to the root of the tree. The TreeMapNode class contains the following variables:
- TreeMapNode _left: Pointer to the left child of the TreeMapNode.
- TreeMapNode _right: Pointer to the right child of the TreeMapNode.
- K _key: The key for the key-value pair represented by the TreeMapNode.
- V _value: The value for the key-value pair represented by the TreeMapNode.
We have provided the put and get functions for you, and you should not need to edit them (although you may if you so choose, as long as the tests still pass). The structure presented here is a common design pattern for classes which are wrappers for tree-recursive data structures. The get and put functions themselves are not recursive, as they operate on a TreeMap (which itself is not a recursive data structure). These methods call putHelper and getHelper, which are recursive functions that take in TreeMapNodes and return TreeMapNodes.
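To illustrate the wrapper pattern without giving away putHelper or getHelper, here is the same design applied to an unrelated operation, counting nodes. The names and structure are illustrative, not the skeleton's:

```java
// Sketch of the wrapper-class pattern: a non-recursive method on the tree class
// delegates to a recursive helper that operates on nodes.
public class TreeSketch {
    private final Node root;

    public TreeSketch(Node root) {
        this.root = root;
    }

    public static class Node {
        public Node left;
        public Node right;
        public int key;

        public Node(int key) {
            this.key = key;
        }
    }

    /** Non-recursive wrapper, like the provided get/put: just calls the helper. */
    public int size() {
        return sizeHelper(root);
    }

    /** Recursive node-level helper, structured like getHelper/putHelper. */
    private static int sizeHelper(Node n) {
        if (n == null) {
            return 0;  // base case: an empty subtree has no nodes
        }
        return 1 + sizeHelper(n.left) + sizeHelper(n.right);
    }
}
```

Your putHelper and getHelper should follow the same shape: a null base case, plus recursive calls on `_left` or `_right`.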
Your task for this lab will be to fill in the recursive putHelper
and getHelper
functions which handle most of the tree related operations. The goal will be to complete these methods such that your completed TreeMap
implementation will perform similarly to that of Java's java.util.TreeMap
. Before beginning take a look at the code provided in the TreeMap
and make sure you understand the structure. Next implement the two methods by filling in the // FIXME
s in the code. Finally discuss with your partner what the worst case and best case runtimes for each of these methods will be.
Hint: The keys of the map will be Comparable<K>, which means that you should use the compareTo(K key) function in order to compare keys. This will be useful for traversing the tree correctly.
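As a quick refresher on the compareTo contract: a negative result means the receiver sorts before the argument, zero means they are equal, and a positive result means it sorts after. That sign is what decides whether to recurse left or right:

```java
// Demonstrates using compareTo's sign to pick a traversal direction,
// the way a BST helper would. Method names here are illustrative.
public class CompareToDemo {
    public static String direction(String key, String nodeKey) {
        int cmp = key.compareTo(nodeKey);
        if (cmp < 0) {
            return "left";   // key sorts before nodeKey: recurse on _left
        } else if (cmp > 0) {
            return "right";  // key sorts after nodeKey: recurse on _right
        }
        return "found";      // keys are equal: stop here
    }
}
```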
MapTest
We have provided a small set of unit tests to test both the LinkedListMap and the TreeMap that you will be implementing. There are three helper methods: testSmallMap, fuzzTestMap (a fuzz test), and timedFuzzTestMap. More information can be found in the comments of these methods. The fuzzTestMap and timedFuzzTestMap methods might be different than other testing functions that you have seen before. Optional: Spend some time looking at each of these functions and try to understand what they are doing; discuss this with your partner. You might be able to take ideas from these tests to write new kinds of tests for other code in your future!
The actual tests that will be run on the autograder are as follows (each just calls one of the above three testing functions):
- smallTestLinkedListMap: Runs smallTestMap with a LinkedListMap.
- smallTestTreeMap: Runs smallTestMap with a TreeMap.
- fuzzTestLinkedListMap: Runs fuzzTestMap with a LinkedListMap.
- fuzzTestTreeMap: Runs fuzzTestMap with a TreeMap.
- timedFuzzTestTreeMap: Runs timedFuzzTestMap with a TreeMap.
By default you should pass the LinkedListMap
tests and fail the TreeMap
tests. In order to get full credit for this section of the lab you must pass all 5 of these tests.
Optional: It might be fun to see in practice how much better we can do when we use a binary search tree over a linked list. Try writing a new test, or modifying an existing one to see what the ratio between the speed of the LinkedListMap
and the java.util.TreeMap
is.
Optional: The TreeMap that you have created here probably offers a good speedup over the LinkedListMap, and might even be close to java.util.TreeMap; however, it might not be quite as good as it seems. In the tests that we have provided, we insert random key-value pairs, which has empirically been shown to result in roughly balanced tree structures. If we instead insert a large number of key-value pairs into our tree with keys in sorted order, then the performance of our tree might degrade significantly. Discuss with your partner why this might be, and then see if you can verify it empirically by writing a new test or refactoring the existing code.
D. Submission and Grading
Deliverables
For full credit on the lab you must:
- Put sufficient effort into the responses in sort_timing.txt.
- Put sufficient effort into the responses in amortized_timing.txt.
- Complete the putHelper and getHelper functions in map/TreeMap.java. In order to get full credit for this part of the lab, all of the provided tests in map/TestMap.java must pass.
- Also remember to complete and submit partner.txt if you worked with a partner. If you did not, you do not need to edit this file.
Submission
You can submit the same as usual:
<git add / commit>
git tag lab8-0 # or the next highest submission number
git push
git push --tags