Lab 5: OOP with Collections, Iterators, and Iterables

Due Date: Friday 2/21 11:59PM. Labs are always due Friday the week they are released unless otherwise stated.

In this lab we will be giving you a small taste of the Java Standard Library as a means for understanding Object Oriented Programming. In particular, we will be looking at the Collection, Iterators, and Iterable interfaces and several classes which implement these. Collections encompass many of the data structures that you will be working with this semester (we will later talk about how they are implemented, but for now we will just use them). Iterators are objects that control iteration through items in a collection via two methods: hasNext (are there more items yet to be iterated over?), and next (return the next item in the iteration). Iterators are provided for every class that implements Collection in java.util. More generally, they provide a mechanism for unifying the operations of processing an array in several formats, handling input, and traversing more complicated structures.

We will first introduce these interfaces and some of their subclasses, and then you will work with them in a few exercises. By the end of this lab, you should be comfortable with searching through the Java Standard Library documentation and with using Collection, Iterators, and Iterables.

## A. Collections

### The Collection Interface

The interface Collection represents any collection of data. As above, most of the data structures we will study in the rest of this class are used to represent collections of data. At the most general level, pretty much anything you use to store multiple items at once is going to fufill the requirements to be a Collection. The most commonly used collections are either a Set or a List, but there are many others. The hierarchy of all Collection classes is organized using interfaces! As you will see soon, what that means is that sets and lists in Java are represented as interfaces that extend the Collection interface. A representation of this hierarchy can be found below (image sourced from here).

At the root of the organization is the Collection interface. This interface specifies methods like isEmpty(), add(Object o), and many others. All collections must implement Collection in one way or another (i.e. they could directly implement it or they might extend / implement another class which implements Collection). For example, you have probably already used ArrayList or LinkedList which implement the List interface which is itself extended from the Collection interface.

These collections and more are found in the package java.util. A package is a set of (supposedly related) classes and subpackages. As elsewhere in Java, the notation java.util.ArrayList means "The class named ArrayList in the (sub)package named util in the package named java". To make your life easier, you'll want to import them before you can use them as follows:

import java.util.ArrayList;
import java.util.LinkedList;

(otherwise, you will have to write java.util.ArrayList everywhere instead of just ArrayList). Luckily for you, IntelliJ will typically import these for you when you type them into the editor.

### The Set Interface

A set is a group of items with no duplicates. The Set interface does not require ordering (e.g. HashSet), but sets can be ordered (TreeSet, ConcurrentSkipListSet). Sets support (at least) the following operations:

• add(E e): Adding an element to the set if it's not already there.
• remove(Object o): Removing an element from the set.
• contains(Object o): Checking whether a given item is in the set.
• isEmpty(): Checking whether the set is empty.

While a class implementing the Set interface must be able to support the above operations, the actual implementation and manipulation of the data is up to the class implementing the Set interface. Regardless of how any subclass of Set is defined you will be able to use it in the same way which is the beauty of Interfaces / Object Oriented Programming. All of these Set subtypes fulfill the requirement of being a Set. Discuss with your lab partner how could you represent a set of nonnegative integers. (Hint: How could you use a boolean array?)

It's important to realize that a Set, although a subtype of Collection interface, is itself an interface. Yes, an interface can extend another interface. It is defined like this:

public interface Set extends Collection {
...
}

Note: Interfaces extend other interfaces, while classes implement interfaces.

### The List Interface

A list is an ordered group of items (also known as a sequence). The List interface therefore differs from Set in that elements have positions within a List and duplicates are allowed. Lists support (at least) the following operations:

• add(E e): Adding an element at the end of the list.
• add(int index, E e): Adding an element at a given position.
• remove(Object o): Removing the first occurence of an element in a list.
• remove(int index): Removing the element at a given position.
• contains(Object o): Checking whether a given item is in the list.
• get(int index): Identifying the value at a given position in the list.
• isEmpty(): Checking whether the list is empty.

The implementation of a list normally involves storing the elements of the list explicitly. For example one might use an array whose 0th element is the first list item, and so forth. You have worked with and implemented the IntList and IntDList classes. These themselves did not extend either List or Collection but give you an idea of what an implementation of a List might look like.

Similar to Set interface, List is an interface that might be defined like this:

public interface List extends Collection {
...
}

### Collections Example

Now that we have introduced these interfaces, let's take a closer look at the declaration, instantiation, and utilization of Collections. For example, to instantiate a List of ints, add the 5 to the list, then get it back, we might write:

List<Integer> s;
s = new ArrayList<Integer>();
int x = s.get(0); /* gets the five back */

There are a few funny things about the syntax above:

• Angle bracket notation is used to denote the type of the item that goes in the collection. These are refered to as generics, and allow for Collections to contain any type of object. Discuss with your partner why this might be useful. (Hint: Think about our IntList from before. What would we have to do if we wanted to make a similar list that stores Strings). For now at least, you should always use angle brackets for both the declaration and the instantiation. We'll see later in this lab and in this course that angle bracket notation can be used in situations other than using a Collection.
• If you're creating a Collection of primitives (ints, doubles, booleans, etc.) you should use the wrapper type (e.g. Integer, Double, Boolean) instead of the primitive type.
• The retrieval method of our List returns an Integer, but thanks to Java's autoboxing functionality, the value we want is automatically unwrapped into an int. For example, the following is valid Java code int x = new Integer(5).

Also note that we declared s to be a variable that points to a List object, however the actual object we created was specifically an ArrayList. Recall that List refers to an interface, and so it is not actually possible to create a List object. Instead, we had to choose a concrete class that actually implemented List, in this example the Java ArrayList class. Recall that a concrete class is a non-abstract class that implements all the methods it potentially inherits from any abstract superclasses or interfaces.

Since the LinkedList class is supposed to represent essentially the same abstraction (a numbered list of items), it has almost the same API (Application Programming Interface) as an ArrayList. For our purposes today, that means it supports almost the same methods. This makes it easy to change back and forth between an ArrayList and a LinkedList. The almost qualifiers have been added because LinkedList also implements the Deque interface which means it has a few more methods available than just a List.

For our toy example, since LinkedList also has an add and get method, we could easily change our code to read:

List<Integer> s;
int x = s.get(0); /* gets the five back */

That is, only the instantiation of s changes. It is nice that the Java designers gave us many implementations of the same collection, since each of these implementations has its own strengths and weaknesses. As we continue to discuss data structures, you will begin to see reasons why we might choose one implementation over another.

## B. Iterators and Iterables

### The Iterator Interface

An Iterator is an interface which defines the functionality to traverse the values in a collection of objects (here we mean the abstract notion of a collection, not necessarily Collection as defined in Java). In other words you can have an Iterator which iterates over objects stored in something that is not a collection (you will see this later in this lab). An Iterator will yield the elements one at a time through the Iterator API. This interface is rather succint and contains the following four methods. Note: In this lab and the rest of this class, we will almost exclusively use only the first two methods below.

• boolean hasNext(): Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)
• E next(): Returns the next element in the iteration. It's worth noting that it is also typically used with a call to hasNext() beforehand, thus ensuring that the Iterator does indeed have a next value to return. If there are no more elements to remaining, it is common practice to throw a NoSuchElementException.
• default void remove(): Because this is declared as a default method in the interface, it need not be overriden and as such it is rarely implemented in this class and elsewhere. The default implementation throws an UnsupportedOperationException and performs no other action, thus this is an optional operation. If implemented the method removes from the underlying collection the last element returned by this iterator. This method can be called only once per call to next(). The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.
• default void forEachRemaining(Consumer<? super E> action): Also declared as default and thus does not need to be implemented. This will never be overridden for Iterators you will be asked to make. Performs the given action for each remaining element until all elements have been processed or the action throws an exception. Actions are performed in the order of iteration, if that order is specified. Exceptions thrown by the action are relayed to the caller.

While in principle a collection could itself implement the iterator interface, this might result in very strange code. We can see that this is true by considering the interfaces abstractly. A collection itself is not an iterator, rather an Iterator is some object that iterates over a collection! Collections that wish to support iteration typically provide an iterator() method that returns an appropriate Iterator object. This can be used to iterate over the collection (this is a part of the Iterable interface that will be described in more detail below). Consequently, the collection will now have an iterator that is accessible by calling iterator().

For example, if L is a List<String>, you could write

for (Iterator<String> i = L.iterator(); i.hasNext();) {
String value = i.next();
System.out.print(value + " ");
}

This code would print out each item in the list, one at a time. We could have instead of written the code using a while loop as follows:

Iterator<String> i = L.iterator();
while (i.hasNext()) {
String value = i.next();
System.out.print(value + " ");
}

### The Iterable Interface

Iterating through interfaces using next and hasNext would be tedious to write every time (as we have written the code in the above two loops), and thus Java has introduced a special syntax for this Java idiom known as the Iterable interface. If L has a type that implements Iterable<String> (as List<String> does), then the loops above may be written using a loop called an enhanced for loop. This will be more similar to the loops that you might have seen before in Python. These loops are also known as for each loops. The example below could be read in English as "for each String value in L, print value."

for (String value : L) {
System.out.print(value + " ");
}

The Iterable interface is really simple. A class that implements it must simply provide an iterator method that returns an Iterator.

package java.lang;
public interface Iterable<Value> {
/** Returns an iterator over my values. */
Iterator<Value> iterator();
}

### Iterators and Iterables Example

Here's an example of how Iterable and Iterator are used in practice. Suppose we had a class FixedSizeList that represents a list that stores its values in an integer array. Remember that we ask the FixedSizeList for an Iterator instance through the iterator() method. This then returns a FixedSizeListIterator, which is an inner class we've defined to iterate through our FixedSizeList.

import java.util.Iterator;

public class FixedSizeList implements List<Integer>, Iterable<Integer> {

// instance variables
protected int[] values;   // list elements
int count;                // number of array cells used by list

// private Iterator class
private class FixedSizeListIterator implements Iterator<Integer> {

int nextIndexToReturn = 0;    // index of next item to return by iterator

public Integer next() {
int valToReturn = values[nextIndexToReturn];
nextIndexToReturn++;
return valToReturn;
}

public boolean hasNext() {
return nextIndexToReturn < count;
}
}

public Iterator<Integer> iterator() {
return new FixedSizedListIterator();
}

...
}

We could build a FixedSizeList as follows.

public static void main(String [] args) {
FixedSizeList list = new FixedSizeList(10);
// list now contains the integers 5, 3;
// thus values[0] is 5, values[1] is 3,
// and count is 2.
...
}

Then we can use an iterator.

Iterator<Integer> iter = list.iterator();
int m;
m = iter.next();
// m now contains 5 and nextIndexToReturn contains 1
m = iter.next();
// m now contains 3 and nextIndexToReturn contains 2

At this point, another call to next would be invalid since nextIndexToReturn is past the end of the list values. The code maintains an important invariant: prior to any call to next, nextIndexToReturn contains the index of the next value in the list to return.

Finally, the values in a FixedSizeList can be iterated over using an enhanced for loop like so:

for (Integer i : list) {
...
}

## C. Practice with Object Oriented Programming

### Introduction to Table Exercise

For the remainder of the lab we will be dealing with a simple database system to give you practice with Object Oriented Programming and some of the above interfaces we discussed (mainly Iterator). Before beginning to write any code, it will be helpful to understand the structure of the classes that we have provided for you, so that you will be able to work with it. You are not required to read or understand all of the implementations of the methods, but it may be helpful or interesting to look at some of them. Instead you should be able to rely on the descriptions of the classes here or the comments in the code to explain how other methods work.

### Table

The main class that represents a database is Table.java. A Table represents a single database table, which can be read in from file in the format of a csv (comma separated values file). The operations are somewhat limited, but the functionality supported includes getting rows from the Table by index, iterating through the rows of a table, filtering rows from a Table using a TableFilter, and creating Tables from the cartesian product or cross join of two input Tables. The methods and constructors contained within the Table class are:

• private Table(): Initialize a Table without a header or any rows. For internal use only.
• public Table(String file): Initialize a Table from a file.
• private void initColumnMap(String headerRow) / private void initColumnMap(List<String> headerList): Initialize a mapping from column name to column index for a Table.
• private void addRow(String dataRow) / private void addRow(TableRow row): Add a row to a Table. Errors if the data is not the correct size.
• private String headerRow(): Returns a string representation of a Table's header.
• public int colNameToIndex(String colName): Returns the int corresponding to the column named colName.
• public int numColumns(): Returns the number of columns.
• public List<String> headerList(): Returns the list of columns in a Table, in the correct order.
• public TableRow getRow(int i): Returns the ith row of a Table.
• public static Table join(Table t1, Table t2): Returns the result of doing a cross join on two tables.
• public static Table filter(TableFilter filter): Returns the result of doing a filtering a table using filter.

### Table.TableRow

The inner class of TableRow represents a single row of data. The implementation of Table is essentially just a List of TableRows. Similarly a TableRow is essentially just a List of stings. The following methods and constructors are contained within the TableRow class:

• public TableRow(List<String> data): Initialize a TableRow from the given data.
• public String getValue(int i): Returns the ith value in a TableRow.
• public static TableRow joinRows(TableRow tr1, TableRow tr2): Returns the TableRow which is the result of joining two TableRows.
• public int size(): Return the size of a TableRow.

### TableFilter

A useful operation may be to filter out rows which do not match some criteria. Filtering can be done over multiple different criteria, but each of these filters can be implemented in the same way (besides the code that specifies which TableRows should be kept). In order to achieve this shared functionality, we have created the TableFilter class, which is an abstract class that allows for filtered iteration through a Table. This abstract class implements Iterator<Table.TableRow> which means that it specifies a hasNext and a next method that allow for the filtered iteration.

What differentiates one implementation of TableFilter from another is the abstract method keep() which returns true if and only if the value of _next stored in the TableFilter should be delivered by the next() method. By implementing just this method (as well as any necessary instance variables and a constructor), you can create a new TableFilter. This is much nicer design as it puts all of the control logic shared into an abstract class while the specific functionality to a particular filter is contained solely in that class.

For this lab, we have provided the completed TableFilter.java and IdentityFilter which is an implementation which does not filter out any TableRows. Your task will be to complete the following four other TableFilters.

• ColumnMatchFilter: In construction two column names are passed in. This filter filters out any TableRows where the data for each of these two columns does not match exactly.
• EqualityFilter: In construction a column name and a string to match is passed in. This filter filters out any TableRows where the data for the given column does not exactly match the given string.
• GreaterThanFilter: In construction a column name and a string to compare to is passed in. This filter filters out any TableRows where the data for the given column is not greater than the given string.
• SubstringFilter: In construction a column name and a substring is passed in. This filter filters out any TableRows where the data for the given column does not contain the given substring. Hint: the String contains method) will be helpful for this TableFilter.

The code for each of these four implementations will involve adding the necessary instance variables to the class, setting up the instance variables in the constructor, and implementing the keep() method. Although the code for each will differ the general structure should be the same.

### Table.JoinIterator

Another useful operation which can be combined with the previous filters for more interesting queries is a cross join. From two starting Tables t1 and t2, a cross join will output another table which contains the combination of each row from t1 to each row from t2. This is illustrated with the following example. Suppose t1 is the table:

item,color
---------------
sky,blue
grass,green

and suppose t2 is the table:

item,is_alive
---------------
grass,yes
cat,yes
sky,no

Then the result of calling join(t1, t2) would be the following table:

t1.item,t1.color,t2.item,t2.is_alive
---------------
sky,blue,grass,yes
sky,blue,cat,yes
sky,blue,sky,no
grass,green,grass,yes
grass,green,cat,yes
grass,green,sky,no

You could imagine we could combine this with a ColumnMatchFilter on the columns t1.item and t2.item which would result in the following table:

t1.item,t1.color,t2.item,t2.is_alive
---------------
sky,blue,sky,no
grass,green,grass,yes

Your final task for this lab will be to complete the functionality of the inner class Table.JoinIterator to make the join(Table t1, Table t2) function work. We have provided most of the implementation of this class, besides the hasNext method. One common pattern is to have the hasNext() method handle most of the logic of advancing the iterator. The next() method can then call the hasNext() and return the value set through hasNext(). We have followed that pattern here. To complete this you should first read through the provided code in Table.JoinIterator so you can familiarize yourself with the setup, then begin coding the hasNext() method. You should not need to make any changes to the class besides this method, but as long as you pass the tests you can alter the design however you see fit.

Hint: You will need to use the iterators from each of the tables to output all of the combinations of the row.

Hint: Use the joinRows function in the inner Table.TableRow class to help with the combination of TableRows.

### TestTable

We have provided you the complete set of autograder tests to aid you in testing and debugging your code that you write. With the skeleton code we have provided you should be failing five of the fourteen tests provided. In order to get full credit for the lab, you need to pass all fourteen of the tests, that is you must add the functionality we have specified in the above two parts without altering the existing structure of the Table. The task should be mostly independent of the existing behavior, so this likely is not something you should worry about, but be mindful of this.

Finally the tests rely on the data inside the sample_db folder. If you want to pass the tests locally you should not alter these files. If you do, you can always use Git to check them out. The autograder will run the tests with staff versions of these files, so even if you alter them you can still pass the tests on Gradescope.

## D. Submission

In order to complete this lab you should have filled out the following:

• ColumnMatchFilter: Constructor, instance variables, and keep() method.
• EqualityFilter: Constructor, instance variables, and keep() method.
• GreaterThanFilter: Constructor, instance variables, and keep() method.
• SubstringFilter: Constructor, instance variables, and keep() method.
• Table: You should only have to fill in one FIXME for the hasNext() method in the Table.JoinIterator inner class.

Make sure that all of the tests in TestTable are passing before submitting your code. As usual add your files, commit, tag the commit, and then push your tags to the central repository to submit your work.