{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lab 8 - Random Graphs Part 1: The Erdős–Rényi Model\n",
    "\n",
    "#### Authors:\n",
    "\n",
    "##### v1.0 (2014 Fall) Rishi Sharma \\*\\*, Sahaana Suri \\*\\*, Kangwook Lee \\*\\*, Kannan Ramchandran \\*\\*\n",
    "\n",
    "##### v1.1 (2015 Fall) Kabir Chandrasekher \\*, Max Kanwal \\*, Kangwook Lee \\*\\*, Kannan Ramchandran \\*\\*\n",
    "\n",
    "In this lab, we explore random graphs, introduced by [Erdős and Rényi](http://www.renyi.hu/~p_erdos/1959-11.pdf). You will need to install NetworkX in order to complete this lab (http://networkx.github.io/documentation/latest/install.html, http://stackoverflow.com/questions/9836909/easy-install-networkx). Many of you may already have NetworkX because it comes default with the Anaconda installation of iPython."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You will need the following basic imports as well as a function written to draw graphs for you. The structure of a graph object is a collection of edges, in (node1, node2) form. You should know how to use `draw_graph`, but you don't really need to know how it works. Play around with it and look at those pretty graphs :)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "from pylab import *\n",
    "import random as rnd\n",
    "import networkx as nx\n",
    "from __future__ import division\n",
    "\n",
    "rcParams['figure.figsize'] = 12, 12  # that's default image size for this interactive session\n",
    "\n",
    "def draw_graph(graph, labels=None, graph_layout='shell',\n",
    "               node_size=1600, node_color='blue', node_alpha=0.3,\n",
    "               node_text_size=12,\n",
    "               edge_color='blue', edge_alpha=0.3, edge_tickness=1,\n",
    "               edge_text_pos=0.3,\n",
    "               text_font='sans-serif'):\n",
    "    \"\"\" \n",
    "    Based on: https://www.udacity.com/wiki/creating-network-graphs-with-python\n",
    "    We describe a graph as a list enumerating all edges.\n",
    "    Ex: graph = [(1,2), (2,3)] represents a graph with 2 edges - (node1 - node2) and (node2 - node3)\n",
    "    \"\"\"\n",
    "    \n",
    "    # create networkx graph\n",
    "    G=nx.Graph()\n",
    "\n",
    "    # add edges\n",
    "    for edge in graph:\n",
    "        G.add_edge(edge[0], edge[1])\n",
    "\n",
    "    # these are different layouts for the network you may try\n",
    "    # shell seems to work best\n",
    "    if graph_layout == 'spring':\n",
    "        graph_pos=nx.spring_layout(G)\n",
    "    elif graph_layout == 'spectral':\n",
    "        graph_pos=nx.spectral_layout(G)\n",
    "    elif graph_layout == 'random':\n",
    "        graph_pos=nx.random_layout(G)\n",
    "    else:\n",
    "        graph_pos=nx.shell_layout(G)\n",
    "\n",
    "    # draw graph\n",
    "    nx.draw_networkx_nodes(G,graph_pos,node_size=node_size, \n",
    "                           alpha=node_alpha, node_color=node_color)\n",
    "    nx.draw_networkx_edges(G,graph_pos,width=edge_tickness,\n",
    "                           alpha=edge_alpha,edge_color=edge_color)\n",
    "    nx.draw_networkx_labels(G, graph_pos,font_size=node_text_size,\n",
    "                            font_family=text_font)\n",
    "    # show graph\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "graph = [(1,2),(2,3),(1,3)]\n",
    "draw_graph(graph)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "graph = [(1,1),(2,2)]\n",
    "draw_graph(graph) # no self-loops, so put a self-loop if you want a disconnected node"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lets create a function that returns all the nodes that can be reached from a certain starting point given the representation of a graph above.\n",
    "\n",
    "### <font color=blue>Q1. Fill out the following method, `find_connected_component`, that takes `graph` and `starting_node` as input arguments. It must return a set of nodes that are connected to `starting_node`, including `starting_node` itself.</font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def find_connected_component(graph, starting_node):\n",
    "    \"\"\"\n",
    "    >>> graph = [(1,2),(2,3),(1,3)]\n",
    "    >>> find_connected_component(graph,1)\n",
    "    {1, 2, 3}\n",
    "    >>> graph = [(1,1),(2,3),(2,4),(3,5),(3,6),(4,6),(1,7),(7,8),(1,8)]\n",
    "    >>> find_connected_component(graph,1)\n",
    "    {1, 7, 8}\n",
    "    >>> find_connected_component(graph,2)\n",
    "    {2, 3, 4, 5, 6}\n",
    "    \"\"\"\n",
    "    connected_nodes = set()\n",
    "    connected_nodes.add( starting_node )\n",
    "    \n",
    "    #Your code here\n",
    "    \n",
    "    return connected_nodes "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### <font color=blue>Q2. Fill out the following method, `connected_components`, that takes `graph` and returns all the connected components of the graph. You may want to use the function you wrote above.</font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def connected_components(graph):\n",
    "    \"\"\"\n",
    "    >>> graph = [(1,1),(2,3),(2,4),(3,5),(3,6),(4,6),(1,7),(7,8),(1,8)]\n",
    "    >>> connected_components(graph)\n",
    "    [{1, 7, 8}, {2, 3, 4, 5, 6}]\n",
    "    >>> largest_component_size(graph)\n",
    "    5\n",
    "    \"\"\"\n",
    "    nodes = set()\n",
    "    components = []\n",
    "\n",
    "    # Your code here\n",
    "    \n",
    "    return components"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# These guys should work after you've implemented connected_components\n",
    "component_sizes = lambda graph: [len(component) for component in (connected_components(graph))]\n",
    "largest_component_size = lambda graph: max(component_sizes(graph))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we want to create a function that, given the number of nodes in a graph, will randomly generate edges between nodes. That is, we want to construct a random graph following the Erdős–Rényi model.\n",
    "\n",
    "### <font color=blue>Q3. Fill out the following function to create an Erdős–Rényi random graph $\\operatorname{G}(n,p)$. For each pair of nodes, randomly create an edge between them with probability $p$. Return the resulting graph (same format as before).</font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def G(n,p):\n",
    "    graph = [] \n",
    "    # Recall that we describe a graph as a list enumerating all edges. Node names can be numbers.\n",
    "    \n",
    "    #Your code here\n",
    "    \n",
    "    return graph"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Try this!\n",
    "graph = G(10,0.1)\n",
    "draw_graph(graph)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Phase Transitions!!!\n",
    "\n",
    "Now let's examine some of the qualitative properties of a random graph developed in the original Erdős & Rényi paper."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "epsilon = 1/100"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Transition 1: If $np < 1$, then a graph in $\\operatorname{G}(n, p)$ will almost surely have no connected components of size larger than $\\operatorname{O}(\\log(n))$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "largest_sizes = []\n",
    "n = 50\n",
    "p = 1/50 - epsilon\n",
    "for i in range(1000):\n",
    "    graph = G(n,p)\n",
    "    largest_sizes.append(largest_component_size(graph))\n",
    "    \n",
    "\n",
    "print \"We expect the largest component size to be on the order of: \", np.log2(n)\n",
    "print \"True average size of the largest component: \", np.mean(largest_sizes)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Transition 2: If $np = 1$, then a graph in $\\operatorname{G}(n, p)$ will almost surely have a largest component whose size is of order $n^{2/3}$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "largest_sizes = []\n",
    "n = 50\n",
    "p = 1/50\n",
    "for i in range(1000):\n",
    "    graph = G(n,p)\n",
    "    largest_sizes.append(largest_component_size(graph))\n",
    "    \n",
    "\n",
    "print \"We expect the largest component size to be on the order of: \", n**(2/3)\n",
    "print \"True average size of the largest component: \", np.mean(largest_sizes)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Transition 3: If $np → c > 1$, where $c$ is a constant, then a graph in $\\operatorname{G}(n,p)$ will almost surely have a unique giant component containing a positive fraction of the vertices. No other component will contain more than $\\operatorname{O}(\\log(n))$ vertices.\n",
    "\n",
    "We'll increase the number of nodes by a factor of 10 here so we can see this more clearly. Pay attention to the precipitous decline from the size of the largest connected component to that of all the rest."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "largest_sizes = []\n",
    "epsilon = 1/10000\n",
    "n = 5000\n",
    "p = 1/5000 + epsilon\n",
    "graph = G(n,p)\n",
    "\n",
    "print sorted(component_sizes(graph))[::-1]\n",
    "print \"No other component should have size more than on the order of:\", np.log2(n)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Transition 4: If $p<\\tfrac{(1-\\epsilon)\\ln n}{n}$, then a graph in $\\operatorname{G}(n,p)$ will almost surely contain isolated vertices, and thus be disconnected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "rnd.seed(1)\n",
    "largest_sizes = []\n",
    "epsilon = .1\n",
    "n = 10000\n",
    "p = (1-epsilon)*np.log(n) / n\n",
    "num_isolated = 0\n",
    "trials = 10\n",
    "for _ in xrange(trials):\n",
    "    graph = G(n,p)\n",
    "    print component_sizes(graph)\n",
    "    if 1 in component_sizes(graph):\n",
    "        num_isolated += 1\n",
    "print \"Probability of graphs containing isolated vertices: \", num_isolated / trials"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Transition 5: If $p>\\tfrac{(1-\\epsilon)\\ln n}{n}$, then a graph in $\\operatorname{G}(n,p)$ will almost surely be connected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "rnd.seed(1)\n",
    "largest_sizes = []\n",
    "epsilon = 1/3\n",
    "n = 10000\n",
    "p = (1+epsilon)*np.log(n) / n\n",
    "num_isolated = 0\n",
    "trials = 10\n",
    "for _ in xrange(trials):\n",
    "    graph = G(n,p)\n",
    "    print component_sizes(graph)\n",
    "    if 1 in component_sizes(graph):\n",
    "        num_isolated += 1\n",
    "print \"Probability that graphs are connected: \", 1 - num_isolated / trials"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Cool! Now we've experimentally verified the results of the Erdős–Rényi paper. Isn't it neat that you can rigorously formalize this kind of qualatative behavior of a graph? And that you can clearly see these transitions in simulation? Yes, yes it is cool."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}