{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# EE16B: Homework 0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before you work on the problem below, you might want to refresh a bit on IPython and NumPy. An updated version of the tutorial released last semester is available <a target=\"_blank\" href=\"http://inst.eecs.berkeley.edu/~ee16a/fa15/lab/lab0/iPython%20Tutorial%20EE16.ipynb\">here</a>."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Getting Rich (or Not) with Linear Regression\n",
    "In this problem we'll remind ourselves about how to use least squares to perform linear regressions by looking at stock market data. All data used in this problem is courtesy of <a target=\"_blank\" href=\"http://finance.yahoo.com/q/hp?s=%5Eixic+historical+prices\">Yahoo Finance</a>."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from pylab import *\n",
    "\n",
    "# Plots graphs in the notebook\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### a) Plotting the NASDAQ composite index\n",
    "The file `nasdaq1.csv` contains the weekly NASDAQ composite index values since January 2014. Plot the data against time in weeks, assuming the latest datum is week 0. <b>What is the simplest model you might use to try and fit this data?</b>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Load the data\n",
    "nasdaq1 = []\n",
    "with open('nasdaq1.csv', 'r') as f:\n",
    "    nasdaq1 = [i.split(\",\") for i in f.read().split()]\n",
    "    \n",
    "# Convert nasdaq1 into a numpy array\n",
    "nasdaq1 = np.array(nasdaq1)\n",
    "# Print the first 5 data points\n",
    "print(\"Sample data points:\")\n",
    "print(nasdaq1[0:5,:])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Find how many weeks there are\n",
    "datalen1 = # YOUR CODE HERE #\n",
    "print(\"There are \" + str(datalen1) + \" weeks in nasdaq1.\")\n",
    "\n",
    "# Week indices corresponding to the data\n",
    "weeks1 = range(-datalen1+1,1)[::-1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
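   },
   "outputs": [],
   "source": [
    "print(\"Last date: \" + nasdaq1[0,0])\n",
    "print(\"First date: \" + nasdaq1[-1,0])\n",
    "\n",
    "# Index values as floats (assumes the value is in column 1 of nasdaq1; adjust if your CSV differs)\n",
    "nasdaqvals = nasdaq1[:,1].astype(float)\n",
    "\n",
    "# Plot the data against weeks1\n",
    "plot(weeks1, nasdaqvals)\n",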
    "title(\"NASDAQ Composite Index\")\n",
    "xlabel(\"Time (weeks from now)\")\n",
    "xlim((-datalen1+1,0))\n",
    "ylabel(\"Value\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### b) Linear regression\n",
    "Using what you learned from 16A, perform a linear regression to find the appropriate coefficients for the stock market model, find the 2-norm of the error vector, and plot the resulting model along with the original data.\n",
    "\n",
    "The error vector is the difference between the real values and our model at each time step. The 2-norm is defined as the square root of the inner product of the vector with itself. You might find the functions <a href=\"http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.linalg.norm.html\">numpy.linalg.norm</a> and <a href=\"http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.ndarray.astype.html\">numpy.ndarray.astype</a> helpful."
   ]
  },
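  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a refresher on the definitions above, the next cell is a minimal sketch of a least-squares line fit on made-up toy data (not the NASDAQ data). The names `t_toy`, `y_toy`, `A_toy`, `x_toy`, and `e_toy` are hypothetical, and solving the normal equations with `np.linalg.solve` is only one way to do it; it assumes the columns of `A_toy` are linearly independent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Toy data, made up for illustration: points lying exactly on y = 2t + 1\n",
    "t_toy = np.array([0.0, 1.0, 2.0, 3.0])\n",
    "y_toy = 2.0 * t_toy + 1.0\n",
    "\n",
    "# Each row of A_toy is [t, 1], so A_toy.dot([slope, intercept]) is the model output\n",
    "A_toy = np.column_stack((t_toy, np.ones_like(t_toy)))\n",
    "\n",
    "# Least squares via the normal equations: (A^T A) x = A^T y\n",
    "x_toy = np.linalg.solve(A_toy.T.dot(A_toy), A_toy.T.dot(y_toy))\n",
    "\n",
    "# Error vector and its 2-norm, computed two equivalent ways\n",
    "e_toy = y_toy - A_toy.dot(x_toy)\n",
    "print(x_toy)  # slope and intercept, close to 2 and 1\n",
    "print(np.sqrt(e_toy.dot(e_toy)), np.linalg.norm(e_toy))  # both close to 0"
   ]
  },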
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Generate matrix A for linear regression\n",
    "# YOUR CODE HERE #\n",
    "\n",
    "# Generate vector b for linear regression\n",
    "# The function ndarray.astype(float) must be used to convert the data into floats\n",
    "# YOUR CODE HERE #\n",
    "\n",
    "\n",
    "# Perform least squares (place the coefficients in variable x)\n",
    "# YOUR CODE HERE #\n",
    "\n",
    "print(\"Coefficients:\")\n",
    "print(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Find the error vector for 86 weeks of data (place it in variable e)\n",
    "# YOUR CODE HERE #\n",
    "\n",
    "# Print the norm of the error vector\n",
    "print(\"Error vector norm: \" + str(np.linalg.norm(e)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Plot the data\n",
    "# YOUR CODE HERE #\n",
    "\n",
    "# Plot your prediction using the model\n",
    "# YOUR CODE HERE #\n",
    "\n",
    "title(\"NASDAQ Composite Index\")\n",
    "xlabel(\"Time (weeks from now)\")\n",
    "xlim((-datalen1+1,0))\n",
    "ylabel(\"Value\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### c) Looking back at history\n",
    "Since we obviously don't have data for the future available to check against, let's see how well our model does against past data. `nasdaq5.csv` contains similar data, but going back to January 2010. Plot the NASDAQ composite index over the past 5 years along with the prediction of your model and calculate the norm of the error vector."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Load the data\n",
    "nasdaq5 = []\n",
    "with open('nasdaq5.csv', 'r') as f:\n",
    "    nasdaq5 = [i.split(\",\") for i in f.read().split()]\n",
    "    \n",
    "# Convert nasdaq5 into a numpy array\n",
    "nasdaq5 = np.array(nasdaq5)\n",
    "\n",
    "# Find how many weeks there are\n",
    "datalen5 = nasdaq5.shape[0] \n",
    "print(\"There are \" + str(datalen5) + \" weeks in nasdaq5.\")\n",
    "\n",
    "# Week indices corresponding to the data\n",
    "weeks5 = range(-datalen5+1,1)[::-1]\n",
    "\n",
    "print(\"Last date: \" + nasdaq5[0,0])\n",
    "print(\"First date: \" + nasdaq5[-1,0])\n",
    "\n",
    "# Plot the data against weeks5\n",
    "# YOUR CODE HERE #\n",
    "\n",
    "\n",
    "title(\"NASDAQ Composite Index\")\n",
    "xlabel(\"Time (weeks from now)\")\n",
    "xlim((-datalen5+1,0))\n",
    "ylabel(\"Value\")\n",
    "\n",
    "# Generate matrix A5 for prediction\n",
    "# YOUR CODE HERE #\n",
    "\n",
    "\n",
    "# Plot your prediction\n",
    "# YOUR CODE HERE #\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Find the error vector and norm for the 5 year data\n",
    "# YOUR CODE HERE #\n",
    "\n",
    "print(\"Error vector norm: \" + str(np.linalg.norm(e)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### d) Another model?\n",
    "Redo parts (b) and (c), but try using a quadratic model instead of a linear one. This adds another degree of freedom and could potentially be a better model for stock behavior."
   ]
  },
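  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To show what the extra degree of freedom looks like in the regression matrix, the next cell is a minimal sketch of a quadratic fit on made-up toy data (not the NASDAQ data). The names `t_quad`, `y_quad`, `A_quad`, and `x_quad` are hypothetical, and the column order (squared term, linear term, constant) is just one possible choice."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Toy data, made up for illustration: points lying exactly on y = 3t^2 - t + 5\n",
    "t_quad = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])\n",
    "y_quad = 3.0 * t_quad**2 - t_quad + 5.0\n",
    "\n",
    "# Each row of A_quad is [t^2, t, 1]: one more column than the linear model\n",
    "A_quad = np.column_stack((t_quad**2, t_quad, np.ones_like(t_quad)))\n",
    "\n",
    "# Least squares via the normal equations: (A^T A) x = A^T y\n",
    "x_quad = np.linalg.solve(A_quad.T.dot(A_quad), A_quad.T.dot(y_quad))\n",
    "print(x_quad)  # coefficients a, b, c, close to 3, -1 and 5"
   ]
  },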
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Set up a new model and find its coefficients\n",
    "# Let's try a quadratic model y = ax^2 + bx + c\n",
    "# YOUR CODE HERE #\n",
    "\n",
    "print(\"Coefficients:\")\n",
    "print(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Find and calculate the error vector for 86 weeks of data and its norm\n",
    "# YOUR CODE HERE #\n",
    "\n",
    "print(\"Norm of error vector: \" + str(np.linalg.norm(e)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Plot the 86 weeks of data and the prediction based on your new model\n",
    "# YOUR CODE HERE #\n",
    "\n",
    "\n",
    "title(\"NASDAQ Composite Index\")\n",
    "xlabel(\"Time (weeks from now)\")\n",
    "xlim((-datalen1+1,0))\n",
    "ylabel(\"Value\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Plot the 5 year data and the prediction based on your new model\n",
    "# YOUR CODE HERE #\n",
    "\n",
    "\n",
    "title(\"NASDAQ Composite Index\")\n",
    "xlabel(\"Time (weeks from now)\")\n",
    "xlim((-datalen5+1,0))\n",
    "ylabel(\"Value\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Find the error vector and norm for the 5 year data\n",
    "# YOUR CODE HERE #\n",
    "\n",
    "print(\"Norm of error vector: \" + str(np.linalg.norm(e)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### e) What about logarithms?\n",
    "Many economists say that the stock market (roughly) follows an exponential trajectory, which means that its logarithm grows roughly linearly with time. Let's try a logarithmic regression model instead of a linear or quadratic one.\n",
    "\n",
    "Hint: Try taking the log (base 10) of the stock data, and make a linear regression model using this new data."
   ]
  },
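  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To illustrate the hint, the next cell sketches a log-linear fit on made-up toy data (not the NASDAQ data): fit a line to log10 of the values, then raise 10 to the fitted line to get predictions in the original units. All names ending in `_log` are hypothetical, and measuring the error in the original units rather than in log units is an assumption, since the problem does not say which to use."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Toy data, made up for illustration: y = 100 * 10^(0.01 t), so log10(y) = 0.01 t + 2\n",
    "t_log = np.arange(-4.0, 1.0)\n",
    "y_log = 100.0 * 10.0 ** (0.01 * t_log)\n",
    "\n",
    "# Fit a line to log10 of the values using the normal equations\n",
    "A_log = np.column_stack((t_log, np.ones_like(t_log)))\n",
    "b_log = np.log10(y_log)\n",
    "x_log = np.linalg.solve(A_log.T.dot(A_log), A_log.T.dot(b_log))\n",
    "\n",
    "# Undo the logarithm to get predictions in the original units\n",
    "y_hat_log = 10.0 ** (A_log.dot(x_log))\n",
    "e_log = y_log - y_hat_log\n",
    "print(x_log)  # slope and intercept in log units, close to 0.01 and 2\n",
    "print(np.linalg.norm(e_log))  # close to 0"
   ]
  },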
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Set up a new, logarithmic model and find its coefficients\n",
    "# Try taking the log base10 of the stock data.\n",
    "# YOUR CODE HERE #\n",
    "\n",
    "\n",
    "print(\"Coefficients:\")\n",
    "print(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Find and calculate the error vector for the 86 week data and its norm\n",
    "# YOUR CODE HERE #\n",
    "\n",
    "print(\"Norm of error vector: \" + str(np.linalg.norm(e)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Plot the 86 week data and the prediction based on your new model\n",
    "# YOUR CODE HERE #\n",
    "\n",
    "\n",
    "title(\"NASDAQ Composite Index\")\n",
    "xlabel(\"Time (weeks from now)\")\n",
    "xlim((-datalen1+1,0))\n",
    "ylabel(\"Value\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Plot the 5 year data and the prediction based on your new model\n",
    "# YOUR CODE HERE #\n",
    "\n",
    "\n",
    "title(\"NASDAQ Composite Index\")\n",
    "xlabel(\"Time (weeks from now)\")\n",
    "xlim((-datalen5+1,0))\n",
    "ylabel(\"Value\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Find the error vector and norm for the 5 year data\n",
    "# YOUR CODE HERE #\n",
    "\n",
    "print(\"Error vector norm: \" + str(np.linalg.norm(e)))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.4.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}