Navigation
A. Preliminaries
As usual, you can retrieve the files for this lab (after first using
git status
and git tag
to make sure you have cleaned up,
committed, and pushed everything that needs it) with
$ git fetch shared
$ git checkout -b lab3 shared/lab3
$ git push -u origin lab3
Some of you prefer to keep everything in a single branch (say
master). Fortunately, our software doesn't particularly care about
branches, as long as you tag the right things. To add
shared/lab3
into a common branch that you have already set up and
pushed (I suggest using master for
this), simply check out that branch (if it isn't already) and then use
$ git fetch shared
$ git merge shared/lab3
$ git push
instead of the sequence above. DO NOT do both of these for the same assignment!! You'll cause chaos.
When you are ready to submit your work, use the usual procedure, whichever of the methods you used to start it. First, commit and push everything, and then use
$ git tag lab3-1 # Or whatever sequence number you get to
$ git push
$ git push --tags
If you already set up and pushed the lab3 branch and now want to download it to a different local repository (you started on the instructional machines and now want to work entirely on your own laptop) then go into this other repository (presumably cloned from your central cs61b repository) and (after committing anything you already have there) type
$ git fetch origin
$ git checkout -b lab3 origin/lab3
which will create a local lab3
branch on this other repository, also
tracking your central repository.
B. IntDLists
In class, you've seen what is called a singly linked list,
IntList
. The term "singly linked" refers to the single pointer (or link),
.tail
, used to traverse the elements of the list. For this
problem, we'll define a new data structure, the IntDList
, which uses a
doubly linked list.
The IntDList
class uses another, internal, data structure, called a
DNode
, to actually house its data. Each DNode
has two links,
one to the next DNode
in the sequence (as for IntList
) and one to
the previous one. The IntDList
itself contains two pointers, one to
the first DNode
in the chain and one to the last. Here, for
example, is how we could represent the sequence [3, 14, 15]
:
As you can see, unlike IntList
, there's an extra level of
indirection at work here. The user (or client) of IntDList
doesn't see the actual DNode
s that hold the data. This has various
advantages. One of them is that the client can now do operations that modify
the list without having to return and reassign its value (that is, we
can use void
methods to insert and delete things). That's because
we never change the pointer to the IntDList
itself; we only modify
its fields.
Among the skeleton files, you'll find IntDList
, with some obvious
things for you to fill in. The IntDListTest
class will test them
for you.
C. Collections
So far, we've seen fairly primitive types for representing lists of things.
Arrays:
- Advantages: fast random access to items.
- Disadvantages: hard to insert items or delete items.
Hand-made linked lists (e.g., IntList, IntDList):
- Advantages: easy to expand, insert, delete.
- Disadvantages: random access (as opposed to in-sequence access) is slow, requires more space (for pointers) than arrays, and pointer manipulation can be tricky.
Both allow us to represent a sequence of objects, but their syntax is very different, so it's hard to switch from one to the other if you change your mind.
The Java library has types to represent collections of objects and save us from the burden of implementing such collections ourselves, including:
- Lists (sequences) of objects:
ArrayList
,LinkedList
. - Sets (like lists, but with no order and disallowing duplicates)
of objects:
TreeSet
,HashSet
. - Maps* (a.k.a. associative arrays, symbol tables, and
dictionaries):
TreeMap
,HashMap
.
All of the collections above are found in the package java.util
. A
package is a set of (supposedly related) classes and subpackages.
As elsewhere in Java, the notation java.util.ArrayList
means "The
class named ArrayList
in the (sub)package named util
in the
package named java
". To make your life easier, you'll want to import
them before you can use them:
import java.util.ArrayList;
import java.util.LinkedList;
(otherwise, you have to write java.util.ArrayList
everywhere instead
of just ArrayList
).
For today's lab, we'll familiarize ourselves with how to use lists and sets, which you'll be welcome to use on any future homeworks, projects, labs, discussion worksheets, and exams, except where explicitly prohibited.
For example, the following code creates an ArrayList
L
of integers
and adds 5 and 23 to it:
ArrayList<Integer> L = new ArrayList<>();
L.add(5);
L.add(23);
In the code above, don't worry too much about the mysterious
<Integer>
or <>
syntax. We'll learn about this in the coming weeks —
for now, just accept that somehow this means our L
will contain integers.
Since the LinkedList
class is supposed to represent essentially the
same abstraction (a numbered list of items), it has almost the same
API (Application Programming Interface) as ArrayList
. For our
purposes today, that means it supports almost the same methods. This
makes it easy to change back and forth between an ArrayList
and a LinkedList
.
For our toy example, since LinkedList
also has an
add method, we could easily change our code to read:
LinkedList<Integer> L = new LinkedList<>();
L.add(5);
L.add(23);
That is, only the declaration of L
changes.
It is nice that the Java designers gave us many implementations of the same collection, since each of these implementations has its own strengths and weaknesses (we'll see a glimpse of these tradeoffs in today's lab). However, with what we know so far, we run into a potential problem.
Imagine that Alice and Bob are working on a project together. Alice
needs Bob to write a method readList
that reads information and
returns it as some kind of list. Alice doesn't care much what kind of
list it is, since her application is just going to read it in order, a
task at which every list implementation excels. Bob, on the other
hand, isn't 100% sure what kind of list he needs for his task.
With what we know so far, Alice would have to guess what Bob is going to end up deciding and write either:
ArrayList<Integer> myList = readList();
or
LinkedList<Integer> myList = readList();
This is a huge pain. One potential fix is for Bob stop being such a slacker and get his work done so Alice can use his code, but as we all know, this is just not going to happen. Bob is just that kind of a guy. Through a mechanism known as an interface (upcoming topic) we can avoid this problem.
Java provides not just ArrayList
and LinkedList
types, but also
a type called List
, of which ArrayList
and LinkedList
are said
to be implementations. List
is not a class, but instead an
interface. There are interfaces Set
and Map
in
java.util
as well. What this means is that Alice can actually just
write:
List<Integer> myList = readList();
In an upcoming lecture, we'll discuss about how a Java interface is an abstraction of an API (usually without any implementation) that may be shared by many classes. By specifying such an interface as the type of a variable, you get a program that is able to accommodate any of the implementing classes without having to change most of the program. For today, we just want you to know how to use interfaces and implementation classes.
Let's get started with a concrete example by considering the
program in Dups1.java
, provided with this lab. Dups1 goes through the
provided input, identifies all duplicates, and then prints all
duplicates to the screen. It takes a single command line argument that
tells it whether to use a linked list or an array list to solve this
problem.
Compile this program and execute it with
java Dups1 linked < magus.txt
java Dups1 arrays < magus.txt
You should find that the results of the two runs are identical. Look
at the code, and observe that the duplicates
method, which actually
does the work of finding duplicates, doesn't actually know whether
it's using a linked list or an array list. It just tells the List
to
do its job, and since the LinkedList
and ArrayList
classes
implement the List
class, it can make the exact same calls to both.
ListIterators (Part 1)
Look at the duplicates
function and get a feeling for how it works.
Observe that it involves repeated calls to the
get(i)
method of the List, which returns the ith item of a list.
Think back to the IntDList
class that you implemented above. The get
operation in that class has to crawl along a potentially
long list, whereas getting any element of an array can be
done trivially by looking at only one item total.
Since LinkedList
is essentially a more-elaborate version of IntDList
under
the hood, its get(i)
method is slow for the same reason.
Let's do our first computational experiment to measure performance. Try the Unix time commands
time java Dups1 linked < hammurabi110.txt
time java Dups1 array < hammurabi110.txt
Compare the results. (We've also set up 'make time' to run these and a couple of other timings. You might want to look at Makefile to how that is done).
In this case, the problem isn't the LinkedList
class itself, but the
bone-headed way in which we're trying to use it. Supposing L
is a
LinkedList
, it's rather silly to be making the sequence of
calls L.get(65)
, L.get(66)
, L.get(67)
, and
so forth. Under the hood (this is not an obvious fact), the LinkedList
is seeking all the way to item #65 and returning it, and then when we
tell it to get item #66, it starts all the way over from the beginning
instead of starting from 65.
Our fundamental problem is that we are using indices to iterate
through the LinkedList
using get()
instead of letting it handle
the iteration using abstractions that are built specifically around
iteration.
As a warmup to the rather tricky next part of this lab, we'll use another interfaces build to supports iteration. The interface types Iterator and ListIterator describe "moving fingers" that effectively point at elements in the middle of one of the collection types from the Java library and allow a program to move through the collection sequentially, regardless of how the collection is actually implemented. Iterators move only forward, and ListIterators move in either direction (and make sense only for Lists, which have an order).
As an example of code using a ListIterator, see ListTour.java
. Read
and run the code, and make sure you understand
quickIteratorExamples
. Then fill out the printTour
method as
described. If you want to fix the style errors, feel free to delete
the quickIteratorExamples code, which was provided only for
pedagogical purposes.
ListIterators (Part 2)
The program Dups2.java
uses the ListIterator
abstraction to get
around the problem in Dups1.java
and puts the two implementations on a
more nearly equal footing, albeit in a rather obtuse way.
Again time the two programs:
time java Dups2 linked < hammurabi110.txt
time java Dups2 array < hammurabi110.txt
If you haven't already, read and understand both programs. The
continue
keyword (covered in the reading, but not lecture) means to
terminate the current iteration of the current loop and move on to the
next one immediately.
You may find to be useful the Java library documentation for the java.util.ArrayList and java.util.LinkedList classes and the java.util.List and java.util.ListIterator interfaces.
We'll finish off this section of the lab by finally having you do some coding: Clean up our asinine implementation of Dups2. The Dups2.duplicates method uses two int variables, m and n, to keep track of positions in the list L (and stops once we have scanned back to the position of element we are checking for duplication).
By making better use of the ListIterator interface, we can remove both
of these variables. Copy and paste Dups2.java to Dups3.java, and
modify the duplicates
method so that duplicates does not use
any integer variables. Your code should end up looking much
more elegant than Dups2.java. If you're stuck make sure to see the
java.util.ListIterator
documentation.
We're throwing you in the deep end here with strange new entitites.
There are weird symbols and syntactical strangenesses like the angle
brackets in ListIterator<String>
, but by doing some pattern
matching, you should be fine — we'll learn these new things later.
Don't be afraid to experiment with method calls you aren't sure will
work (recall that you can use the hw command to restore older copies
of committed work in case you find yourself boxed in). Try and cajole
your neighbors into discussion for more efficient progress. This will
probably seem very hard because of all the new syntax, so collaborate
away!
Utilities
Currently, the list of duplicate words gets printed in whatever order the words occur in the text. By adding one short line to the program, you can arrange for the list to be printed in sorted order. Figure out how to do this (there is a hint about where to look towards the top of the file) and change Dups3.java accordingly.
Optional: With a bit more searching (this time in the documentation for java.lang.String), you may be able to find out how to get the sort to ignore the case of letters, instead of putting all capital letters before all lower-case letters. If you do this, modify Dups3.java accordingly.
Sets
We used an ArrayList called result
to store our result in Dups1
through Dups3
. To avoid having duplicate duplicates in result
, we
used the result.contains()
method to continue
to the next
iteration whenever a redundant duplicate was found.
The problem is that the contains
method scans the ArrayList
sequentually. While this is not much of a problem for the small inputs
we've been using, it can cause difficulties with larger ones.
The Java library provides collections that are faster for the purpose
of collecting sets of objects without duplicates, namely the TreeSet
and HashSet
. Create a new program Dups4 that uses a
java.util.TreeSet for the storing results in the .duplicate method.
See the TreeSet
documentation for a list of methods supported by the TreeSet
class.
You'll only need a couple of the methods from the list. In case you
are tempted by dark forces beyond my comprehension, do not try to read
the entirety of the TreeSet
documentation.
To make things a little more interesting, you must still return the
result as a List<String>
, which a TreeSet<String>
is not. Hint: A
TreeSet
is a kind of Collection<String>
. Take a look through the
documentation of ArrayList.