{
 "metadata": {
  "name": "",
  "signature": "sha256:0f51e58de9c2d13317870983e692ad145bb2aa6c081dfbefc1620bcc660a71b5"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#Lab9 - Random Graphs\n",
      "\n",
      "In this lab, we explore random graphs, introduced by Erdos and Renyi. You will need to install networkx in order to complete this lab (http://networkx.github.io/documentation/latest/install.html, http://stackoverflow.com/questions/9836909/easy-install-networkx). Many of you may already have networkx because it comes default with the Anaconda installation of iPython."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "You will need the following basic imports as well as a function written to draw graphs for you. The structure of a graph object is a collection of edges, in (node1, node2) form. You should know how to use `draw_graph`, but you don't really need to know how it works. Play around with it to see what it does. Wow! Look at those pretty graphs `:)`"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%matplotlib inline\n",
      "from pylab import *\n",
      "import random as rnd\n",
      "import networkx as nx\n",
      "from __future__ import division\n",
      "\n",
      "rcParams['figure.figsize'] = 12, 12  # that's default image size for this interactive session\n",
      "\n",
      "def draw_graph(graph, labels=None, graph_layout='shell',\n",
      "               node_size=1600, node_color='blue', node_alpha=0.3,\n",
      "               node_text_size=12,\n",
      "               edge_color='blue', edge_alpha=0.3, edge_tickness=1,\n",
      "               edge_text_pos=0.3,\n",
      "               text_font='sans-serif'):\n",
      "    \"\"\" \n",
      "    Based on: https://www.udacity.com/wiki/creating-network-graphs-with-python\n",
      "    We describe a graph as a list enumerating all edges.\n",
      "    Ex: graph = [(1,2), (2,3)] represents a graph with 2 edges - (node1 - node2) and (node2 - node3)\n",
      "    \"\"\"\n",
      "    \n",
      "    # create networkx graph\n",
      "    G=nx.Graph()\n",
      "\n",
      "    # add edges\n",
      "    for edge in graph:\n",
      "        G.add_edge(edge[0], edge[1])\n",
      "\n",
      "    # these are different layouts for the network you may try\n",
      "    # shell seems to work best\n",
      "    if graph_layout == 'spring':\n",
      "        graph_pos=nx.spring_layout(G)\n",
      "    elif graph_layout == 'spectral':\n",
      "        graph_pos=nx.spectral_layout(G)\n",
      "    elif graph_layout == 'random':\n",
      "        graph_pos=nx.random_layout(G)\n",
      "    else:\n",
      "        graph_pos=nx.shell_layout(G)\n",
      "\n",
      "    # draw graph\n",
      "    nx.draw_networkx_nodes(G,graph_pos,node_size=node_size, \n",
      "                           alpha=node_alpha, node_color=node_color)\n",
      "    nx.draw_networkx_edges(G,graph_pos,width=edge_tickness,\n",
      "                           alpha=edge_alpha,edge_color=edge_color)\n",
      "    nx.draw_networkx_labels(G, graph_pos,font_size=node_text_size,\n",
      "                            font_family=text_font)\n",
      "    # show graph\n",
      "    plt.show()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "graph = [(1,2),(2,3),(1,3)]\n",
      "draw_graph(graph)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "graph = [(1,1),(2,2)]\n",
      "draw_graph(graph) # no self-loops, so put a self-loop if you want a disconnected node"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Lets create a function that returns all the nodes that can be reached from a certain starting point given the representation of a graph above.\n",
      "\n",
      "### <font color=blue>Q1. Fill out the following method, `find_connected_component`, that takes graph and starting_node as input arguments. It must return a set of nodes that are connected to the `starting_node`, including the `starting_node` itself.</font>"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": true,
     "input": [
      "def find_connected_component(graph, starting_node):\n",
      "    \"\"\"\n",
      "    >>> graph = [(1,2),(2,3),(1,3)]\n",
      "    >>> find_connected_component(graph,1)\n",
      "    {1, 2, 3}\n",
      "    >>> graph = [(1,1),(2,3),(2,4),(3,5),(3,6),(4,6),(1,7),(7,8),(1,8)]\n",
      "    >>> find_connected_component(graph,1)\n",
      "    {1, 7, 8}\n",
      "    >>> find_connected_component(graph,2)\n",
      "    {2, 3, 4, 5, 6}\n",
      "    \"\"\"\n",
      "    connected_nodes = set()\n",
      "    connected_nodes.add( starting_node )\n",
      "    \n",
      "    #Your code here\n",
      "    \n",
      "    return connected_nodes "
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### <font color=blue>Q2. Fill out the following method, `connected_components`, that takes graph and returns all the connected components of the graph. You may want to use the function you wrote above.</font>"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def connected_components(graph):\n",
      "    \"\"\"\n",
      "    >>> graph = [(1,1),(2,3),(2,4),(3,5),(3,6),(4,6),(1,7),(7,8),(1,8)]\n",
      "    >>> connected_components(graph)\n",
      "    [{1, 7, 8}, {2, 3, 4, 5, 6}]\n",
      "    >>> largest_component_size(graph)\n",
      "    5\n",
      "    \"\"\"\n",
      "    nodes = set()\n",
      "    components = []\n",
      "\n",
      "    # Your code here\n",
      "    \n",
      "    return components\n",
      "\n",
      "# These guys should work after you've implemented connected_components\n",
      "component_sizes = lambda graph: [len(component) for component in (connected_components(graph))]\n",
      "largest_component_size = lambda graph: max(component_sizes(graph))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Next, we want to create a function that, given the number of nodes in a graph, will randomly generate edges between nodes. That is, we want to construct a random graph folowing the Erdos-Renyi model.\n",
      "\n",
      "### <font color=blue>Q3. Fill out the following function to create an Erdos-Renyi random graph $\\operatorname{G}(n,p)$. For each pair of nodes, randomly create an edge between them with probability $p$. Return the resulting graph (same format as before).</font>"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": true,
     "input": [
      "def G(n,p):\n",
      "    graph = [] \n",
      "    # Recall that we describe a graph as a list enumerating all edges. Node names can be numbers.\n",
      "    \n",
      "    #Your code here\n",
      "    \n",
      "    return graph"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Try this!\n",
      "graph = G(10,0.1)\n",
      "draw_graph(graph)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Phase Transitions!!!\n",
      "\n",
      "Now let's examine some of the qualatative properties of a random graph developed in the original Erdos & Renyi paper."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "epsilon = 1/100"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#### Transition 1: If $np < 1$, then a graph in $\\operatorname{G}(n, p)$ will almost surely have no connected components of size larger than $\\operatorname{O}(\\log(n))$"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "largest_sizes = []\n",
      "n = 50\n",
      "p = 1/50 - epsilon\n",
      "for i in range(1000):\n",
      "    graph = G(n,p)\n",
      "    largest_sizes.append(largest_component_size(graph))\n",
      "    \n",
      "\n",
      "print \"We expect the largest component size to be on the order of: \", np.log2(n)\n",
      "print \"True average size of the largest component: \", np.mean(largest_sizes)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#### Transition 2: If $np = 1$, then a graph in $\\operatorname{G}(n, p)$ will almost surely have a largest component whose size is of order $n^{2/3}$."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "largest_sizes = []\n",
      "n = 50\n",
      "p = 1/50\n",
      "for i in range(1000):\n",
      "    graph = G(n,p)\n",
      "    largest_sizes.append(largest_component_size(graph))\n",
      "    \n",
      "\n",
      "print \"We expect the largest component size to be on the order of: \", n**(2/3)\n",
      "print \"True average size of the largest component: \", np.mean(largest_sizes)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#### Transition 3: If $np \u2192 c > 1$, where $c$ is a constant, then a graph in $\\operatorname{G}(n,p)$ will almost surely have a unique giant component containing a positive fraction of the vertices. No other component will contain more than $\\operatorname{O}(\\log(n))$ vertices.\n",
      "\n",
      "We'll increase the number of nodes by a factor of 10 here so we can see this more clearly. Pay attention the precipitious decline from the size of the largest connected tomponent to that of all the rest."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "largest_sizes = []\n",
      "epsilon = 1/10000\n",
      "n = 5000\n",
      "p = 1/5000 + epsilon\n",
      "graph = G(n,p)\n",
      "print sorted(component_sizes(graph))[::-1]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#### Transition 4: If $p<\\tfrac{(1-\\epsilon)\\ln n}{n}$, then a graph in $\\operatorname{G}(n,p)$ will almost surely contain isolated vertices, and thus be disconnected."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "largest_sizes = []\n",
      "epsilon = 1/20000\n",
      "n = 10000\n",
      "p = (1-epsilon)*np.log(n) / n - epsilon\n",
      "num_isolated = 0\n",
      "for _ in range(10):\n",
      "    graph = G(n,p)\n",
      "    if 1 in component_sizes(graph):\n",
      "        num_isolated += 1\n",
      "print \"Probability of graphs containing isolated vertices: \", num_isolated / 10"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#### Transition 5: If $p>\\tfrac{(1-\\epsilon)\\ln n}{n}$, then a graph in $\\operatorname{G}(n,p)$ will almost surely be connected."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "largest_sizes = []\n",
      "epsilon = 1/20000\n",
      "n = 10000\n",
      "p = (1-epsilon)*np.log(n) / n + epsilon\n",
      "num_isolated = 0\n",
      "for _ in range(10):\n",
      "    graph = G(n,p)\n",
      "    if 1 in component_sizes(graph):\n",
      "        num_isolated += 1\n",
      "print \"Probability that graphs are connected: \", 1 - num_isolated / 10"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "   "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Cool! Now we've experimentally verified the results of the Erdos-Reyni paper. Isn't it neat that you can rigorously formalize this kind of qualatative behavior of a graph? And that you can clearly see these transitions in simulation? I think it's cool."
     ]
    }
   ],
   "metadata": {}
  }
 ]
}