Lab 11: Optional Graphs Lab

This lab is fully optional and is here as an enrichment exercise. It is rather rough around the edges, but we hope you'll find it useful. It serves as a reasonable introduction to some of the things you'll need to think about for project 3.

The abstract graph

One of the first data structures we studied in this course was the linked list, which consisted of a set of nodes connected in sequence. Then we looked at trees, which were a generalized version of linked lists: you still connected nodes in sequence, but one node could branch off leading to multiple others. Now we'll look at a generalization of a tree, called a graph.

In a graph, we still have a collection of nodes, but each node in a graph can be connected to any other node without limitation. This means there isn't necessarily a hierarchical structure like you get in a tree. For example, here's a tree, where every node is descendent from one another:

Now suppose you add an edge back up the tree. This is no longer a tree (notice the hierarchical structure is broken -- is C descendent from A or is A descendent from C?), but it is still a graph!

There are other edges that are not allowed in a tree. For example, now node E looks like it is descendent from two nodes. Again this is no longer a tree, but it is a graph.

In general a graph can have any possible connection between nodes:

First, let's go over some terminology. An element in a graph is called a vertex (though are sometimes called 'nodes'). Vertices usually represent the objects in our graph, namely the things that have the relationships such as people, places, or things. A connection between two vertices is called an edge. An edge represents some kind of relationship between two vertices.

Quite a few examples of graphs exist in the everyday world:

Road maps are a great example of graphs. Each city is a vertex, and the edges that connect these cities are the roads and freeways. An abstract example of what such a graph would look like can be found here. For a more local example, each building on the Berkeley campus can be thought of as a vertex, and the paths that connect those buildings would be the edges.
Facebook depends on graphs. Each person that participates is a vertex, and an edge is established when two people claim to be friends or associates of each other.
For a more technical example, a computer network is also a graph. In this case, computers and other network machinery (like routers and switches) are the vertices, and the edges are the network cables. Or for a wireless network, an edge is an established connection between a single computer and a single wireless router.

In more formal mathematical notation, a vertex is written as a variable, such as v0, v1, v2, etc. An edge is written as a pair of vertices, such as (v0, v1), or (v2, v0).

Directed vs. undirected graphs

Once again, an element in a graph is called a vertex, and a connection between two vertices is called an edge.

If all edges in a graph are showing a relationship between two vertices that works in either direction, then it is called an undirected graph. A picture of an undirected graph looks like this:

But not all edges in graphs are the same. Sometimes the relationships between two vertices sometimes only go in one direction. Such a relationship is called a directed graph. An example of this could be a city map, where the edges are sometimes one-way streets between locations. A two-way street would actually have to be represented as two edges, one of them going from location A to location B, and the other from location B to location A.

In terms of a visual representation of a graph, an undirected graph does not have arrows on its edges (because the edge connects the vertices in both directions), whereas each edge in a directed graph does have an arrow that points in the direction the edge is going. An example directed graph appears below.

More formally, an edge from a vertex v0 to a vertex v1 is printed as the pair (v0, v1). In a directed graph, the pair is ordered; thus even if the edge (v0,v1) exists, (v1,v0) might not. In an undirected graph, the pair isn't ordered, so the edge (v0,v1) is the same as the edge (v1,v0).

Graph Terminology Reminders

Now that we have the basics of what a graph is, here are a few more terms that might come in handy while discussing graphs.

| Term              | Definition                                                                                                                          |
|---------------------|------------------------------------------------------------------|
| Edge                | A single connection between two vertices                                                                                              
| Adjacent            | Two vertices are adjacent if there is an edge connecting them                                                                         
| Connected           | A graph is connected if every vertex has a path to all other vertices. 
|                     | (Also describes two nodes if there is an edge connecting them) 
| Neighbor            | Two vertices are neighbors if there is an edge connecting them                                                                        
| Set of neighbors    | The set of all nodes that are adjacent to/connected to/neighbors of a node                                                            
| Incident to an edge | A vertex that is an endpoint of an edge is incident to it                                                                             
| Path                | A sequence of edges that can be followed from one vertex to another                                                                   
| Cycle               | A special kind of path that ends at same vertex where it originally started

Basic Graph Problems

Data structures for graphs

Now that we know how to draw a graph on paper and understand the basic concepts and definitions, we can consider how a graph should be represented inside of a computer. We want to be able to get quick answers for the following questions about a graph:

- Are given vertices v and w adjacent?

- Is vertex v incident to a particular edge e?

- What vertices are adjacent to v?

- What edges are incident to v?

Most of this lab will involve thinking about how fast and how efficient each of these operations is using different representations of a graph. This might be handy when working on project 3.

Imagine that we want to represent a graph that looks like this:

One data structure we could use to implement this graph is called an adjacency list. In such a data structure, an array is created that has the same size as the number of vertices in the graph. Each position in the array represents one of the vertices in the graph. Then, each location in the array points to a list that contains indexes to other vertices, which are the vertices neighbors.

The adjacency list that represents the above graph looks like this:

Another data structure we could use to represent the edges in a graph is called an adjacency matrix. In this data structure, we also have an array with each position representing a vertex in the graph. However, instead of the array pointing to a linked list, it points to another array representing possible neighbors. This matrix just contains boolean values, true when there is an edge between the two given vertices, false when no edge exists. Thus, each vertex has a row and a column in the matrix, and the value in that table says true or false whether or not that edge exists.

The adjacency matrix that represents the above graph looks like this:

Discussion: Representing a graph with a linked structure

A third data structure we could use to represent a graph is simply an extension of the linked nodes idea, where each vertex is an object that contains pointers to other vertices. This may seem like the most straightforward approach: aren't the adjacency list and adjacency matrix roundabout in comparison?

Discuss withs someone (in lab or wherever you happen to be when reading this) reasons why the adjacency list or adjacency matrix might be preferred for a graph.

On the flipside, notice that we could also represent a tree using an ajacency matrix or list. Discuss reasons why an ajacency list or adjacency matrix might not be preferred for a tree.

Representation Efficiency Questions

Common Neighbor Timing

A CS 61B student claims that, in a graph with N vertices and E edges that's implemented with an array of adjacency lists, the worst-case time to see if vertices v and w have a common neighbor is proportional to N². (Vertices are not in any particular order in an adjacency list.)

This estimate is insufficiently specific. Explain why, and give a more specific estimate.

Exercise: practice graph traversal

Below is the psuedocode for traversing a graph. It might help you to print out copies of the graph Graph Pictures. Try to figure out the order in which the nodes are visited if you start at different vertices. You can check your answers for starting at vertices 1 - 5 below.

    Stack fringe = new Stack ();
    fringe.add (some vertex);
    while (!fringe.isEmpty()) {

    // select a vertex from fringe
    // if the vertex is not yet visited:
            // process the vertex
            // add the vertex's neighbors to fringe
            // mark the vertex as visited

    }

For this problem assume that we add the vertex's unvisited edges to the stack in the order higher number to lower number.

Or as a list:

Answers for practicing graph traversal

Here are the solutions for nodes 1-5 (just practice these until you feel comfortable).

| Starting vertex | Order of visiting remaining vertices |
|-----------------|--------------------------------------|
| 1               | 3, 4, 2, 5, 7, 6, 9, 8, 10           |
| 2               | 4, 3, 1, 6, 7, 5, 8, 9, 10           |
| 3               | 1, 4, 2, 5, 7, 6, 9, 8, 10           |
| 4               | 2, 5, 7, 3, 1, 6, 9, 8, 10           |
| 5               | 2, 4, 3, 1, 6, 7, 8, 9, 10           |

Wikipedia as a graph

The internet is a (very) large directed graph! Web pages are vertices, and links are edges between them.

Wikipedia is a particulary interesting graph, which we will investigate in this homework assignment. Here's an experiment to try:

Visit a Wikipedia page. Click on the first link in the main body text that's not in parentheses, and you'll end up on another Wikipedia page. Again, click on the first link in the main body text that's not in parentheses. Continue. Eventually, you'll likely end up at pages like "Philosophy", "Human", "Logic", and "Consciousness".

Try it again, starting with other pages. Again, most likely you will end up at pages like "Philosophy", "Consciousness", etc. According to wikipedia, roughly 94% of pages eventually lead to "Philosophy"!