{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Inverse Kinematics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib.animation as animation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#Function Definitions\n",
    "This computes $\\vec f(\\vec \\theta)$, which is the function that takes the amount each joint will be rotate and gives us the position of the end effector.  This computation is called forward kinematics."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# compute the forward kinematics\n",
    "# r is the length of each joint of the arm; this should be a fixed number\n",
    "# theta is the amount each joint should be rotated\n",
    "# start is the location of the first/root joint of the arm; this vector needs an extra 1 appended to the end\n",
    "def f(r, theta, start):\n",
    "    curPoint = start\n",
    "    points = start\n",
    "    T = np.eye(3,3)\n",
    "    \n",
    "    # here, we go down the arm rotating the current point and moving it down the length of the joint for each joint\n",
    "    for i in range(0,r.size):\n",
    "        # rotation\n",
    "        X = np.matrix([[np.cos(theta[0,i]),-np.sin(theta[0,i]),0],[np.sin(theta[0,i]),np.cos(theta[0,i]),0],[0,0,1]])\n",
    "        T = T * X\n",
    "        # translation\n",
    "        Z = np.matrix([[1,0,r[0,i]],[0,1,0],[0,0,1]])\n",
    "        T = T * Z\n",
    "        # update the current point to the end of the current joint\n",
    "        curPoint = T * start\n",
    "        points = np.c_[points,curPoint] # keep track of the set of points so we can plot the arm later\n",
    "    return points"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This computes $\\vec g(\\vec \\theta)$, which is the function that computes the difference between the location of the end effector and our target point $\\vec t$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# r, theta, and start are the same for f\n",
    "# target is a vector indicating the target point we want the end of the arm to be at\n",
    "def g(r, theta, start, target):\n",
    "    points = f(r, theta, start)\n",
    "    pos = points[0:2,-1]\n",
    "    return pos - target"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This computes the Jacobian $J_{\\vec g}(\\vec\\theta)$, which is the 2x4 matrix of partial derivatives of the function $\\vec g(\\vec \\theta)$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# the inputs are the same as g\n",
    "# this computes the Jacobian matrix numerically\n",
    "def Jg(r, theta, start, target):\n",
    "    epsilon = 1e-6\n",
    "    J = np.zeros((2,0))\n",
    "    for i in range(0,r.size):\n",
    "        # use a central difference to estimate the derivatives\n",
    "        t = theta\n",
    "        t[0,i] = t[0,i] + epsilon\n",
    "        p = g(r, t, start, target)\n",
    "        t = theta\n",
    "        t[0,i] = t[0,i] - epsilon\n",
    "        n = g(r, t, start, target)\n",
    "        J = np.c_[J,(p - n) / (2 * epsilon)]\n",
    "    return J"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This computes the pseudoinverse of the input matrix.  This is the function that you will be filling in."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def pseudoinverse(A):\n",
    "    # Your code here: (a)\n",
    "    u, s, v = #STUDENT, use NumPy to compute the SVD of A\n",
    "    ## # End Code #\n",
    "    for i in range(0, s.size):\n",
    "        # Your code here: (b) #STUDENT, invert the entries of s here\n",
    "        ## # End Code #\n",
    "    # Your code here: (c)\n",
    "    diagS = np.diag(s)\n",
    "    Ainv = #STUDENT, compute the pseudo inverse using u, v, and the inverted singular values\n",
    "    ## # End Code #\n",
    "    return Ainv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This computes the direction we need to move $\\vec \\theta$ to get closer to the target by using Newton's method.  The direction is given by $-J_{\\vec g}(\\vec \\theta)^{-1}\\vec g(\\vec \\theta)$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# x is the current joint rotations\n",
    "# c is g(x)\n",
    "# J is the Jacobian evaluated at x\n",
    "def newton(x, c, J):\n",
    "    delta = - pseudoinverse(J) * c\n",
    "    return delta"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This updates $\\vec \\theta$ by adjusting $\\eta$ to decide how far to move $\\vec \\theta$, and we will thus compute the newton step given by $\\theta^{(i+1)}=\\theta^{(i)}-\\eta J_{\\vec g}(\\vec \\theta)^{-1}\\vec g(\\vec \\theta)$.  To compute $\\eta$, we start with $\\eta=1$ and test $\\vec g(\\vec \\theta^{(i+1)})$ is closer to the target.  If it's not, we divide $\\eta$ by 2 and repeat.  If $\\eta$ become small enough and we still haven't found a $\\vec \\theta^{(i+1)}$ that moves the end effector closer to the target, then we stop."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# this performs one newton step and estimates eta by starting with eta=1\n",
    "# if the end effector of the arm is moved farther away after this step,\n",
    "# we divide eta by 2 and then repeat until we move closer or if eta is too small\n",
    "def update(theta, r, start, target):\n",
    "    cost = g(r, theta, start, target)\n",
    "    oldCost = np.linalg.norm(cost)\n",
    "    J = Jg(r, theta, start, target)\n",
    "    delta = newton(theta.T, cost, J).T\n",
    "    eta = 1.0\n",
    "    while eta > 1e-4:\n",
    "        cost = g(r, theta + eta * delta, start, target)\n",
    "        temp = np.linalg.norm(cost)\n",
    "        if temp < oldCost:\n",
    "            theta = theta + eta * delta\n",
    "            newCost = temp\n",
    "            break\n",
    "        eta = eta / 2\n",
    "    return theta"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This puts everything together to compute $\\vec \\theta$ that moves the arm to the target"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def IK(thetaStart, r, start, target):\n",
    "    theta = thetaStart\n",
    "    oldCost = np.linalg.norm(g(r, theta, start, target))\n",
    "    newCost = oldCost\n",
    "    iteration = 0\n",
    "    while (newCost < oldCost or iteration < 1) and iteration < 8:\n",
    "        oldCost = newCost\n",
    "        theta = update(theta, r, start, target)\n",
    "        newCost = np.linalg.norm(g(r, theta, start, target))\n",
    "        iteration = iteration + 1\n",
    "    return theta"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is a helper function to visualize the arm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def plotArm(r, theta, start, target):\n",
    "    p = f(r, theta, start)\n",
    "    fig = plt.figure()\n",
    "    plt.xlim(-4, 7)\n",
    "    plt.ylim(-4, 7)\n",
    "    plt.plot(target[0,:].T,target[1,:].T,'b-o',markersize=12)\n",
    "    plt.plot(p[0,:].T,p[1,:].T,'r-o',linewidth=4,markersize=8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#Testing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This tests your pseudo inverse function in three cases"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%matplotlib nbagg\n",
    "\n",
    "# Define the location of the first arm joint and the length of each joint\n",
    "r = np.matrix([[1.0,1.0,1.0,1.0]])\n",
    "start = np.matrix([[0],[0],[1]]) # The one at the end is there for homogeneous coordinates\n",
    "\n",
    "# Test if the arm can reach a target\n",
    "theta = np.matrix([[-0.16227303, -0.60693046, 0.72069313, 0.70009674]])\n",
    "target = np.matrix([[2.0],[1.6]])\n",
    "theta = IK(theta, r, start, target)\n",
    "plotArm(r, theta, start, target)\n",
    "\n",
    "# Test if the arm can point to a target out of reach\n",
    "theta = np.matrix([[0.0,0.0,0.0,0.0]])\n",
    "target = np.matrix([[4.0],[4.0]])\n",
    "theta = IK(theta, r, start, target)\n",
    "plotArm(r, theta, start, target)\n",
    "\n",
    "# Test how well the pseudoinverse works with small singular values\n",
    "theta = np.matrix([[0.0,0.0,0.0,0.0]]) + (np.random.rand(1,4) - 0.5) / 1e6\n",
    "start = np.matrix([[0],[0],[1]])\n",
    "target = np.matrix([[5.0],[2.0]])\n",
    "theta = IK(theta, r, start, target)\n",
    "plotArm(r, theta, start, target)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This animates the arm reaching for a moving target.  When the target goes out of reach, the arm should point towards the target because that is the closest it can get to the target."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# initialize the arm\n",
    "theta = np.matrix([[0.0,0.0,0.0,0.0]])\n",
    "r = np.matrix([[1.0,1.0,1.0,1.0]])\n",
    "start = np.matrix([[0],[0],[1]])\n",
    "\n",
    "# set up the plot\n",
    "fig = plt.figure()\n",
    "ax = plt.axes(xlim=(-4, 4), ylim=(-4, 4), aspect='equal')\n",
    "p = f(r, theta, start)\n",
    "plt.xlim(-1, 4)\n",
    "plt.ylim(-5, 5)\n",
    "line, = plt.plot([], [], 'r-o', linewidth=4, markersize=10)\n",
    "point, = plt.plot([],[],'b-o',markersize=12)\n",
    "angle = 2.0\n",
    "\n",
    "def update_arm(num, r, s):\n",
    "    global angle, theta\n",
    "    angle = np.mod(angle + 0.05,2 * np.pi)\n",
    "    t = np.matrix([[2.0],[4.5 * np.sin(angle)]])\n",
    "    theta = IK(theta, r, s, t)\n",
    "    theta = np.mod(theta, 2 * np.pi)\n",
    "    p = f(r, theta, s)\n",
    "    line.set_data(p[0:2,:])\n",
    "    point.set_data(t)\n",
    "\n",
    "arm_ani = animation.FuncAnimation(fig, update_arm, 200, fargs=(r, start),\n",
    "    interval=1000/12, blit=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Clustering With $k$-means"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$k$-means is one of the basic clustering algorithms. You will need to \n",
    "\n",
    "1. Implement the $k$-means algorithm, and \n",
    "2. Run your algorithm on the hand drawn digits dataset and evaluate its performance.\n",
    "\n",
    "To make your lives easier, we have provided you with:\n",
    "\n",
    "1. simple unit tests for each part of the algorithm, and \n",
    "2. synthetic data to test your implementation.\n",
    "\n",
    "We **highly recommend** you first test and debug your code on the unit tests, then the synthetic data. Then, once your code works, continue to the digits dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initialization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Initialize NumPy and Matplotlib."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%pylab inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Increase the default figure size."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "matplotlib.rcParams['figure.figsize'] = [8, 8]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import the modules required for animation. (there might be a warning that can be safely ignored)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from IPython.html.widgets import interact, IntSlider"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We also reset the pseudo-random number generator to a consistent state."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "np.random.seed(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The $k$-means algorithm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section you will implement the $k$-means algorithm."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will use the following notation:\n",
    "\n",
    "* `K` is the number of clusters (and cluster centers)\n",
    "* `D` is the number of dimensions\n",
    "* `N` is the number of data points\n",
    "\n",
    "In other words:\n",
    "    \n",
    "* data points and cluster centers are numpy arrays (vectors) of size `D`\n",
    "* the data set is an `N` by `D` array\n",
    "* the concatenated cluster centers form a `K` by `D` array\n",
    "* and so on..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part (a)\n",
    "Implement the labeling step. Given $k$ cluster centers and the array of data points, return an array containing the index of the cluster center nearest to it for each point.\n",
    "\n",
    "For example, if $k$=2, there are 3 data points and the first point is closest to cluster 0, the second cluster 1 and the third cluster 0 the function should return np.array([0,1,0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def label_dataset(cluster_centers, dataset):\n",
    "    \"\"\"k-means labeling\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    cluster_centers : K by D array, each row is a cluster center\n",
    "    datasets: N by D array, each row is a data point\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    Returns an array of size N, where each entry is the index of the \n",
    "    nearest cluster center\n",
    "    \"\"\"\n",
    "    \n",
    "    # a few sanity checks to catch errors early\n",
    "    assert cluster_centers.shape[1] == dataset.shape[1], \"Dimensions do not match\"\n",
    "    assert cluster_centers.dtype == np.float\n",
    "    assert dataset.dtype == np.float\n",
    "    \n",
    "    K = cluster_centers.shape[0]\n",
    "    \n",
    "    # Your code here: (a)\n",
    "    # STUDENT: compute and return an array containing the index of the cluster center nearest to it for each point.\n",
    "    ## # End Code #"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Test your code against a small one-dimensional example:\n",
    "\n",
    "* Cluster centers at -10 and 10, \n",
    "* Data points at -20, 4, -6, and 11. \n",
    "\n",
    "We expect the labels to 0, 1, 0, 1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "cluster_centers_1 = np.array([[-10.], [10.]])\n",
    "dataset_1 = np.array([[-20.], [4.], [-6.], [20.]])\n",
    "labels_1 = label_dataset(cluster_centers_1, dataset_1)\n",
    "\n",
    "assert np.array_equal(labels_1, np.array([0, 1, 0, 1])), 'Wrong labels: {}'.format(labels_1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part (b)\n",
    "Implement the cluster center update step. Given $k$ cluster centers, the dataset and the current labels, return the new cluster centers in a $K$ by $D$ array by averaging all the points with the same label."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def update_cluster_centers(cluster_centers, dataset, labels):\n",
    "    \"\"\"k-means cluster center update\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    cluster_centers : K by D array, each row is a cluster center\n",
    "    datasets: N by D array, each row is a data point\n",
    "    labels: N-sized array, each entry is the index of the nearest cluster center\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    Returns a K by D array, each row is an updated cluster center, computed by \n",
    "    averaging all the data points with the corresponding label\n",
    "    \"\"\"\n",
    "    \n",
    "    # a few sanity checks to catch errors early\n",
    "    assert cluster_centers.shape[1] == dataset.shape[1], \"Dimensions do not match\"\n",
    "    assert cluster_centers.dtype == np.float\n",
    "    assert dataset.dtype == np.float\n",
    "    assert labels.shape[0] == dataset.shape[0]\n",
    "    assert np.issubdtype(labels.dtype, np.integer)\n",
    "\n",
    "    K = cluster_centers.shape[0]\n",
    "    \n",
    "    # Your code here: (b)\n",
    "    # STUDENT: return the new cluster centers in a K by D array by averaging all the points with the same label.\n",
    "    ## # End Code #"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Try your code against the same example and compute the new cluster centers.\n",
    "\n",
    "**Note that we use `np.allclose()` to compare the result. In general, it is a bad idea to compare floating point numbers using double-equals as there always might be some level or error.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "cluster_centers_2 = update_cluster_centers(cluster_centers_1, dataset_1, labels_1)\n",
    "assert np.allclose(cluster_centers_2, np.array([[-13], [12]])), 'Wrong cluster centers: {}'.format(cluster_centers_2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part (c)\n",
    "Implement a function that computes the distortion of a given dataset, cluster centers and labels.\n",
    "\n",
    "Recall the formula for distortion:\n",
    "\n",
    "$$D=\\sum_{i=0}^{N-1}||x_i-m_{C(i)}||^2$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def compute_distortion(cluster_centers, dataset, labels):\n",
    "    \"\"\"k-means distortion computation\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    cluster_centers : K by D array, each row is a cluster center\n",
    "    datasets: N by D array, each row is a data point\n",
    "    labels: N-sized array, each entry is the index of the nearest cluster center\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    Return the distortion of the cluster centers. That is, compute sum of the distance square from each\n",
    "    cluster center to the data points that labeled as belonging to the corresponding cluster\n",
    "    \n",
    "    \"\"\"\n",
    "    # Your code here: (c)\n",
    "    # STUDENT: compute and return the distortion of the given dataset, cluster centers and labels.\n",
    "    ## # End Code #"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Try your code agains the same example, compute the distortion of `cluster_centers_1` versus `cluster_centers_2`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "assert np.isclose(compute_distortion(cluster_centers_1, dataset_1, labels_1), 252.)\n",
    "assert np.isclose(compute_distortion(cluster_centers_2, dataset_1, labels_1), 226.0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part (d)\n",
    "Implement the complete $k$ means algorithm, using the previous functions. Recall that there are 4 steps to the K-means algorithm:\n",
    "\n",
    "1. Initialization\n",
    "2. Find new cluster centers (`update_cluster_centers`)\n",
    "3. Assign points to cluster centers (`label_dataset`)\n",
    "4. Find the ditortion (`compute_distortion`)\n",
    "5. If the distortion changed from the previous iteration `GOTO` 2, else terminate\n",
    "\n",
    "There is an extra parameter `max_iter`. You should not iterate through assignment and updating more than `max_iter` times."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def k_means(initial_cluster_centers, dataset, max_iter=100):\n",
    "    \"\"\"k-means algorithm\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    initial_cluster_centers : K by D array, each row is an initial cluster center\n",
    "    datasets: N by D array, each row is a data point\n",
    "    max_iter: the maximum number of iterations. \n",
    "        If k_means() does not converge after max_iter iteration, stop\n",
    "\n",
    "    Notes:\n",
    "    ------\n",
    "    Run the k-means algorithm on the dataset starting from initial_cluster centers.\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "        cluster_centers: K by D array, each row is the final cluster center\n",
    "        labels: N-sized array, the index of the nearest cluster center to each data point\n",
    "        distortion: the distortion of the final cluster centers\n",
    "    \"\"\"    \n",
    "    \n",
    "    # initialize the cluster centers\n",
    "    cluster_centers = initial_cluster_centers\n",
    "    \n",
    "    # label the data points according to the nearest cluster center\n",
    "    labels = label_dataset(cluster_centers, dataset)\n",
    "    \n",
    "    # compute the distortion\n",
    "    distortion = compute_distortion(cluster_centers, dataset, labels)\n",
    "\n",
    "    for i in range(max_iter):\n",
    "\n",
    "        # save the distortion from the previous iteration\n",
    "        prev_distortion = distortion\n",
    "        \n",
    "        # Your code here: (d)\n",
    "        # STUDENT: update the cluster centers, find the new labels, and compute the new distortion\n",
    "        ## # End Code #\n",
    "        \n",
    "        # if distortion not improved, stop\n",
    "        if distortion >= prev_distortion:\n",
    "            break\n",
    "    \n",
    "    return cluster_centers, labels, distortion"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Test your code on the simple example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "cluster_centers_1_final, labels_1_final, distortion_1_final = k_means(cluster_centers_1, dataset_1)\n",
    "\n",
    "assert np.allclose(cluster_centers_1_final, np.array([[-13], [12]]))\n",
    "assert np.array_equal(labels_1_final, np.array([0, 1, 0, 1]))\n",
    "assert np.isclose(distortion_1_final, 226)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You now have a working implementation of $k$-means (Well, at least it works on a trivial unit test)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generating synthetic 2D data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set the random seed to 0 for consistency."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "np.random.seed(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As in last week's discussion, for each cluster we'll draw random points around a predefined cluster center, with a predefined spread.\n",
    "\n",
    "[You will learn more about probability, the gaussian distribution, cluster center and standard deviation in CS70 and EE126. For now it is enough to know that the points of each cluster are centered around the cluster center and at least 95% are within twice the spread of the synthetic cluster.]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def cluster(cluster_center, stddev, size):\n",
    "    K = cluster_center.shape[0]\n",
    "    return np.random.multivariate_normal(cluster_center, np.eye(K)*stddev**2, size)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The synthetic data is generated in two steps:\n",
    "\n",
    "1. Generate `K` cluster centers, centered around `MEANS_CENTER`, with standard deviation of `MEANS_STDDEV`, and\n",
    "2. Generate `CLUSTER_SIZE` points around each cluster center with standard deviation of `POINTS_STDDEV`\n",
    "\n",
    "To make it easy to separate the clusters, we will use `MEANS_CENTER` significantly larger than `POINTS_STDDEV`.\n",
    "\n",
    "We will also provide labels, which are the just the index of the cluster_center used to generate the cluster from."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "K = 5\n",
    "\n",
    "MEANS_CENTER = np.zeros(2)\n",
    "MEANS_STDDEV = 5\n",
    "POINTS_STDDEV = 1\n",
    "CLUSTER_SIZE = 1000\n",
    "\n",
    "synthetic_cluster_centers = cluster(MEANS_CENTER, MEANS_STDDEV, K)\n",
    "synthetic_data = np.concatenate( [cluster(cluster_center, POINTS_STDDEV, CLUSTER_SIZE) \n",
    "                                    for cluster_center in synthetic_cluster_centers] )\n",
    "synthetic_labels = np.concatenate( [ np.array([i] * CLUSTER_SIZE) for i in range(K)] )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Dimensions:\n",
    "* `synthetic_cluster_centers` is an `K` $\\times$ 2 array. Each row in a cluster center.\n",
    "* `synthetic_data` is an (`CLUSTER_SIZE*K`) $\\times$ 2 array. Each row is a point of data.\n",
    "* `synthetic_labels` is a (`CLUSTER_SIZE*K`) vector, each entry is the cluster center for which the corresponding point belongs."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Plotting the synthetic data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A small helper function for setting the plot limits:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def set_plot_limits(data):\n",
    "    \n",
    "    M = np.max( np.abs(data) )\n",
    "\n",
    "    plt.xlim(-M, M)\n",
    "    plt.ylim(-M, M)\n",
    "\n",
    "    return M"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Plot the clusters and their centers:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "set_plot_limits(synthetic_data)\n",
    "plt.scatter(*synthetic_data.T, c=synthetic_labels, alpha=0.1)\n",
    "plt.scatter(*synthetic_cluster_centers.T, c='red', s=100)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Demo"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once we have K cluster centers, we can classify a point according the cluster center nearest to it.\n",
    "This can be shown graphically In the following plot. Each point in the space is assigned a color according to the cluster center nearest to it.\n",
    "\n",
    "This is called a Voronoi Diagram. The name is not important."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def voronoi(cluster_centers, M, alpha=1.0):\n",
    "\n",
    "    # create a grid of points\n",
    "    h = M / 250\n",
    "    xx, yy = np.meshgrid(np.arange(-M, M, h), np.arange(-M, M, h))\n",
    "    mesh = np.c_[xx.ravel(), yy.ravel()]\n",
    "    \n",
    "    # label the grid according to the nearest cluster center\n",
    "    labels = label_dataset(cluster_centers, mesh)\n",
    "\n",
    "    # draw the labeled grid\n",
    "    plt.imshow(labels.reshape(xx.shape), extent=(-M, M, -M, M), origin='lower', alpha=alpha)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "M = set_plot_limits(synthetic_data)\n",
    "voronoi(synthetic_cluster_centers, M)\n",
    "plt.scatter(*synthetic_cluster_centers.T, c='red', s=100)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now develop a proper plotting routine for $k$-means:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def plot_data(cluster_centers, dataset, labels, cluster_centers2=None, alpha=0.25):\n",
    "    \n",
    "    M = set_plot_limits(dataset)\n",
    "\n",
    "    voronoi(cluster_centers, M, alpha=0.25)\n",
    "    \n",
    "    plt.scatter(*dataset.T, c=labels, s=75, alpha=alpha)\n",
    "    plt.scatter(*cluster_centers.T, c='green', s=100)\n",
    "    \n",
    "    if cluster_centers2 is not None:\n",
    "        \n",
    "        for p, dp in zip(cluster_centers, cluster_centers2-cluster_centers):\n",
    "            plt.arrow(p[0], p[1], dp[0], dp[1], color='white')\n",
    "        \n",
    "        plt.scatter(*cluster_centers2.T, c='red', s=100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's plot the original clusters:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plot_data(synthetic_cluster_centers, synthetic_data, synthetic_labels)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plot_data(synthetic_cluster_centers, synthetic_data, synthetic_labels, alpha=1)\n",
    "plt.xlim(6, 8)\n",
    "plt.ylim(0, 2)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Why are there yellow dots inside the blue area?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is because the yellow dots were generated from the yellow cluster, even though they turned out to be closer to the blue cluster center. This is because there might be overlap between clusters in the data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we derive the labels using the distance from the cluster center, this won't happen (but then we classify those points to the wrong cluster based on how we generated the data):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plot_data(synthetic_cluster_centers, synthetic_data, label_dataset(synthetic_cluster_centers, synthetic_data))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plot_data(synthetic_cluster_centers, synthetic_data, label_dataset(synthetic_cluster_centers, synthetic_data), alpha=1)\n",
    "plt.xlim(6, 8)\n",
    "plt.ylim(0, 2)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Animation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now run the animation. Experiment by changing K."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def animate(initial_cluster_centers, data):\n",
    "    \n",
    "    @interact(i=IntSlider(min=0, max=99,step=1,value=0))\n",
    "    def plot(i):\n",
    "\n",
    "        n = i // 2\n",
    "        \n",
    "        cluster_centers, labels, _ = k_means(initial_cluster_centers, data, max_iter=n)\n",
    "        \n",
    "        if i%2==1:\n",
    "            cluster_centers2, _, _ = k_means(initial_cluster_centers, data, max_iter=n+1)\n",
    "        else:\n",
    "            cluster_centers2 = None\n",
    "\n",
    "        plot_data(cluster_centers, data, labels, cluster_centers2=cluster_centers2) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "K = 5\n",
    "initial_cluster_centers = synthetic_data[ :K]\n",
    "animate(initial_cluster_centers, synthetic_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Hand Drawn Digits"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let us try our $k$-means implementation on classifying some hand drawn digits."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We first load the hand drawn digit data. The `digits` variable is a dictionary which has a variety of fields. The main fileds of `digits` are:\n",
    "\n",
    "* `digits.data`: an array of digits, where each digit is an array of 64 gray scale values (8x8 pixels)\n",
    "* `digits.target`: for each image, it contains the actual digit it represents\n",
    "\n",
    "Note that the images are already flattened (from 8x8 images to 64-long vectors)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from sklearn.datasets import load_digits\n",
    "digits = load_digits()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Add a simple drawing function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def draw_digits(rows, cols, digit_clusters, error=None):\n",
    "    \"\"\"Draw clusters of digits\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    rows: number of rows to draw\n",
    "    cols: number of columns to draw\n",
    "    digit_cluster: a sequence whose size is \"rows\". Each entry is an array\n",
    "       of atleast \"cols\" digits (a digit is an array of size 64 containing the image data)\n",
    "    error: (optional) similar to digit cluster, but containing booleans instead of digits\n",
    "       if an entry is true, the corresponding digit is drawn in inverse colors\n",
    "    \"\"\"    \n",
    "    M = np.zeros((2 + 10*rows, 2+10*cols))\n",
    "    \n",
    "    for i in range(min(rows, len(digit_clusters))):\n",
    "        for j in range(min(cols, digit_clusters[i].shape[0])):\n",
    "            x = 2+i*10\n",
    "            y = 2+j*10\n",
    "            img = digit_clusters[i][j].reshape(8, 8)\n",
    "            if error and error[i][j]:\n",
    "                M[x:x+8, y:y+8] = np.max(img) - img\n",
    "            else:\n",
    "                M[x:x+8, y:y+8] = img\n",
    "                \n",
    "    plt.matshow(M/np.max(M), cmap='Greys')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We first draw some correctly classified digits."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "draw_digits(10, 20, [ digits.data[digits.target==label] for label in range(10)])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Without the labels, we just have random digits. Here are the first 200:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "draw_digits(10, 20, digits.data[:200].reshape(10, -1, 64))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To test how well our clustering has done, for each cluster we find the most common digit in that cluster and classify everything closest to that cluster center as that digit."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def most_common_digit_in_cluster(K, digit_labels):\n",
    "    p = []\n",
    "    for label in range(K):\n",
    "        real_labels = digits.target[digit_labels==label]\n",
    "        if len(real_labels) > 0:\n",
    "            p.append( np.bincount(real_labels).argmax() )\n",
    "        else:\n",
    "            p.append(-1) # not a real digit\n",
    "    return np.array(p)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A function to evaluate the result: returns the ratio of correctly classified data points (by the classification rule above) to total number of points:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def evaluate(K, digit_labels):\n",
    "    errors = np.zeros(K)\n",
    "    for label, digit in enumerate(most_common_digit_in_cluster(K, digit_labels)):\n",
    "        errors[label] = np.sum((digit_labels==label) & (digits.target!=digit))\n",
    "    return np.sum(errors)/digits.data.shape[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And a drawing routing that draws classification errors in reversed colors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def draw_labels(rows, cols, K, digit_labels):\n",
    "    clusters = [ [] for _ in range(10) ]\n",
    "    errors = [ [] for _ in range(10) ]\n",
    "\n",
    "    for label, digit in enumerate(most_common_digit_in_cluster(K, digit_labels)):\n",
    "        clusters[digit].extend(digits.data[digit_labels==label])\n",
    "        errors[digit].extend(digits.target[digit_labels==label] != digit)\n",
    "        \n",
    "    clusters = [ np.array(c)  for c in clusters ]\n",
    "    errors = [ np.array(e) for e in errors ]\n",
    "        \n",
    "    draw_digits(rows, cols, clusters, errors)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part (e)\n",
    "Let's run $k$-means on the dataset, using the first 10 points as initial cluster centers:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "digit_cluster_centers_1, digit_labels_1, digit_distortion_1 = k_means(digits.data[:10], digits.data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's evaluate the results:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "K_1 = digit_cluster_centers_1.shape[0]\n",
    "print( \"Most common digits in each cluster: \", most_common_digit_in_cluster(K_1, digit_labels_1))\n",
    "print( \"Error Rate:\", evaluate(K_1, digit_labels_1) )\n",
    "draw_labels(10, 25, K_1, digit_labels_1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because we assign a digit to a cluster according to which digit is most common in a cluster, we might end up with more than one cluster assigned with the same digit. As a result, there might be a digit without any data points assigned to it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part (f): Escaping a local-optimum"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The algorithm has terminated and reached a local-optimum. It likely has not reached the global-optimum. We can no longer improve the result by continuing iterations of the algorithm (unless `max_iter` was reached). One way to try to escape the local minimum is run the algorithm multiple times with different starting points, each time initializing with different cluster centers.\n",
    "\n",
    "Let's try that by running $k$-means 50 times, each with 10 random data points as initial cluster centers:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "digit_result_50 = np.array([ k_means(np.random.permutation(digits.data)[:10], digits.data) for _ in range(50) ])\n",
    "digit_cluster_centers_50, digit_labels_50, digit_distortion_50 = digit_result_50[np.argmin(digit_result_50[:,2])]\n",
    "K_50 = digit_cluster_centers_50.shape[0]\n",
    "\n",
    "print( \"Most common digits in each cluster: \", most_common_digit_in_cluster(K_50, digit_labels_50))\n",
    "print( \"Error rate:\", evaluate(K_50, digit_labels_50) )\n",
    "draw_labels(10, 25, K_50, digit_labels_50)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Has it improved greatly?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Cluster centers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What about the cluster centers? What do they look like? How well do they match the digit that appears most common in each cluster?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print( \"Most common digits in each cluster: \", most_common_digit_in_cluster(K_50, digit_labels_50))\n",
    "draw_digits(1, 10, digit_cluster_centers_50.reshape(1, -1, 64))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.4.4"
  },
  "name": "_merged"
 },
 "nbformat": 4,
 "nbformat_minor": 0
}