\n", "v1.1 (2015 Fall) Kabir Chandrasekher \\*\\*, Max Kanwal \\*\\*, Kangwook Lee \\*\\*\\*, Kannan Ramchandran \\*\\*\\*

\n", "v1.2 (2016 Fall) Kabir Chandrasekher, Tony Duan, David Marn, Ashvin Nair, Kangwook Lee, Kannan Ramchandran

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 1 -- The Erdős–Rényi Model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To begin the lab, we explore random graphs, introduced by [Erdős and Rényi](http://www.renyi.hu/~p_erdos/1959-11.pdf). -- $G(n,p)$ has $n$ nodes and probability $p$ of an edge between each node.\n", "\n", "You will need to install [NetworkX](http://networkx.github.io/documentation/latest/install.html) in order to complete this lab. If you have difficulty installing it, you can follow a StackOverflow thread available [here](http://stackoverflow.com/questions/9836909/easy-install-networkx). Many of you may already have NetworkX because it comes default with the Anaconda installation of iPython." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We provide the following basic imports as well as a function written to draw graphs for you. The structure of a graph object is a collection of edges, in (node1, node2) form. You should know how to use `draw_graph`, but you don't really need to know how it works. Play around with it and look at those pretty graphs :)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline\n", "from pylab import *\n", "import random as rnd\n", "import networkx as nx\n", "from __future__ import division\n", "\n", "rcParams['figure.figsize'] = 12, 12 # that's default image size for this interactive session\n", "\n", "def draw_graph(graph, labels=None, graph_layout='shell',\n", " node_size=1600, node_color='blue', node_alpha=0.3,\n", " node_text_size=12,\n", " edge_color='blue', edge_alpha=0.3, edge_tickness=1,\n", " edge_text_pos=0.3,\n", " text_font='sans-serif'):\n", " \"\"\" \n", " Based on: https://www.udacity.com/wiki/creating-network-graphs-with-python\n", " We describe a graph as a list enumerating all edges.\n", " Ex: graph = [(1,2), (2,3)] represents a graph with 2 edges - (node1 - node2) and (node2 - node3)\n", " \"\"\"\n", " \n", " # create networkx graph\n", " G=nx.Graph()\n", "\n", " # add edges\n", " for edge in graph:\n", " G.add_edge(edge[0], edge[1])\n", "\n", " # these are different layouts for the network you may try\n", " # shell seems to work best\n", " if graph_layout == 'spring':\n", " graph_pos=nx.spring_layout(G)\n", " elif graph_layout == 'spectral':\n", " graph_pos=nx.spectral_layout(G)\n", " elif graph_layout == 'random':\n", " graph_pos=nx.random_layout(G)\n", " else:\n", " graph_pos=nx.shell_layout(G)\n", "\n", " # draw graph\n", " nx.draw_networkx_nodes(G,graph_pos,node_size=node_size, \n", " alpha=node_alpha, node_color=node_color)\n", " nx.draw_networkx_edges(G,graph_pos,width=edge_tickness,\n", " alpha=edge_alpha,edge_color=edge_color)\n", " nx.draw_networkx_labels(G, graph_pos,font_size=node_text_size,\n", " font_family=text_font)\n", " # show graph\n", " plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "graph = [(1,2),(2,3),(1,3)]\n", "draw_graph(graph)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "graph = [(1,1),(2,2)]\n", "draw_graph(graph) # no self-loops, so put a self-loop if you want a disconnected node" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets create a function that returns all the nodes that can be reached from a certain starting point given the representation of a graph above.\n", "\n", "### 1a. Fill out the following method to find the set of connected components from a starting node on a graph." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def find_connected_component(graph, starting_node):\n", " \"\"\"\n", " >>> graph = [(1,2),(2,3),(1,3)]\n", " >>> find_connected_component(graph,1)\n", " {1, 2, 3}\n", " >>> graph = [(1,1),(2,3),(2,4),(3,5),(3,6),(4,6),(1,7),(7,8),(1,8)]\n", " >>> find_connected_component(graph,1)\n", " {1, 7, 8}\n", " >>> find_connected_component(graph,2)\n", " {2, 3, 4, 5, 6}\n", " \"\"\"\n", " connected_nodes = set()\n", " connected_nodes.add( starting_node )\n", " \n", " #Your code here\n", " \n", " return connected_nodes" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "graph = [(1,2),(2,3),(1,3)]\n", "find_connected_component(graph,1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "graph = [(1,1),(2,3),(2,4),(3,5),(3,6),(4,6),(1,7),(7,8),(1,8)]\n", "# draw_graph(graph)\n", "find_connected_component(graph,1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "find_connected_component(graph,2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1b. Fill out the following method that takes and returns all the connected components of the graph." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You may want to use the function you wrote above." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def connected_components(graph):\n", " \"\"\"\n", " >>> graph = [(1,1),(2,3),(2,4),(3,5),(3,6),(4,6),(1,7),(7,8),(1,8)]\n", " >>> connected_components(graph)\n", " [{1, 7, 8}, {2, 3, 4, 5, 6}]\n", " >>> largest_component_size(graph)\n", " 5\n", " \"\"\"\n", " nodes = set()\n", " components = []\n", "\n", " # Your code here\n", " \n", " return components" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# These guys should work after you've implemented connected_components\n", "component_sizes = lambda graph: [len(component) for component in (connected_components(graph))]\n", "largest_component_size = lambda graph: max(component_sizes(graph))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "print(connected_components(graph))\n", "print(largest_component_size(graph))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we want to create a function that, given the number of nodes in a graph, will randomly generate edges between nodes. That is, we want to construct a random graph following the Erdős–Rényi model.\n", "\n", "### 1c. Fill out the following function to create an Erdős–Rényi random graph $G(n,p)$." ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def G(n,p):\n", " graph = [] \n", " # Recall that we describe a graph as a list enumerating all edges. Node names can be numbers.\n", " \n", " #Your code here\n", " \n", " return graph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Make sure you can see all nodes from 1 to 10 in the graph below -- if not, check your code!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "graph = G(10,0.1)\n", "draw_graph(graph)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 2 -- Phase Transitions!\n", "\n", "Now let's examine some of the qualitative properties of a random graph developed in the original Erdős & Rényi paper.\n", "\n", "(You don't need to code anything for this question)." ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false }, "outputs": [], "source": [ "epsilon = 1/100" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Transition 1: If $np < 1$, then a graph in $\\operatorname{G}(n, p)$ will almost surely have no connected components of size larger than $\\operatorname{O}(\\log(n))$" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "largest_sizes = []\n", "n = 50\n", "p = 1/50 - epsilon\n", "for i in xrange(1000):\n", " graph = G(n,p)\n", " largest_sizes.append(largest_component_size(graph))\n", " \n", "print \"We expect the largest component size to be on the order of: \", np.log2(n)\n", "print \"True average size of the largest component: \", np.mean(largest_sizes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's check a visualization of the last graph we generated:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "draw_graph(graph, graph_layout='spring')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Transition 2: If $np = 1$, then a graph in $\\operatorname{G}(n, p)$ will almost surely have a largest component whose size is of order $n^{2/3}$." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "largest_sizes = []\n", "n = 50\n", "p = 1/50\n", "for i in xrange(1000):\n", " graph = G(n,p)\n", " largest_sizes.append(largest_component_size(graph))\n", "\n", "print \"We expect the largest componenet size to be on the order of: \", n**(2/3)\n", "print \"True average size of the largest componenent: \", np.mean(largest_sizes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see this largest component visually:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "draw_graph(graph, graph_layout='spring')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Transition 3: If $np → c > 1$, where $c$ is a constant, then a graph in $\\operatorname{G}(n,p)$ will almost surely have a unique giant component containing a positive fraction of the vertices. No other component will contain more than $\\operatorname{O}(\\log(n))$ vertices.\n", "\n", "We'll increase the number of nodes by a factor of 10 here so we can see this more clearly. Pay attention to the precipitous decline from the size of the largest connected component to that of all the rest." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "largest_sizes = []\n", "epsilon = 1/10000\n", "n = 5000\n", "p = 1/5000 + epsilon\n", "graph = G(n,p)\n", "\n", "print \"The sorted sizes of the components are:\"\n", "print sorted(component_sizes(graph))[::-1]\n", "print \"No other component should have size more than on the order of:\", np.log2(n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Transition 4: If $p<\\tfrac{(1-\\epsilon)\\ln n}{n}$, then a graph in $\\operatorname{G}(n,p)$ will almost surely contain isolated vertices, and thus be disconnected." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "rnd.seed(1)\n", "largest_sizes = []\n", "epsilon = .1\n", "n = 10000\n", "p = (1-epsilon)*np.log(n) / n\n", "num_isolated = 0\n", "trials = 10\n", "for _ in xrange(trials):\n", " graph = G(n,p)\n", " print 'List of component sizes:', component_sizes(graph)\n", " if 1 in component_sizes(graph):\n", " num_isolated += 1\n", "print \"Probability of graphs containing isolated vertices: \", num_isolated / trials" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Transition 5: If $p>\\tfrac{(1+\\epsilon)\\ln n}{n}$, then a graph in $\\operatorname{G}(n,p)$ will almost surely be connected." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "rnd.seed(1)\n", "largest_sizes = []\n", "epsilon = 1/3\n", "n = 10000\n", "p = (1+epsilon)*np.log(n) / n\n", "num_isolated = 0\n", "trials = 10\n", "for _ in xrange(trials):\n", " graph = G(n,p)\n", " print 'List of component sizes:', component_sizes(graph)\n", " if 1 in component_sizes(graph):\n", " num_isolated += 1\n", "print \"Probability that graphs are connected: \", 1 - num_isolated / trials" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Cool! Now we've experimentally verified the results of the Erdős–Rényi paper. \n", "\n", "Isn't it neat that you can rigorously formalize this kind of qualitative behavior of a graph, and then clearly see these transitions in simulation? " ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Question 3 -- The Stochastic Block Model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So far we've discussed the Erdős–Rényi model of a random graph $G(n,p)$. There are extensions that are better, more realistic models in many situations.\n", "\n", "As a motivating example, consider the graph formed by friendships of Berkeley students and Stanford students on Facebook. The probability of a friendship between two students both attending UC Berkeley is much higher than the probability that a student from UC Berkeley is friends with a student from Stanford. In the Erdos-Renyi model, however, the two edges formed by these friendships have the same probability!\n", "\n", "In this section, we will explore communities such as the following:\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How will we do this? Use the