{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lab 12 - RNA Sequencing through Expectation Maximization\n",
"\n",
"#### Authors:\n",
"\n",
"v1.0 (2014 Fall) Rishi Sharma \\*\\*\\*, Sahaana Suri \\*\\*\\*, Paul Rigge \\*\\*\\*, Kangwook Lee \\*\\*\\*, Kannan Ramchandran \\*\\*\\*
\n",
"v1.1 (2015 Fall) Kabir Chandrasekher \\*\\*, Max Kanwal \\*\\*, Kangwook Lee \\*\\*\\*, Kannan Ramchandran \\*\\*\\*
\n",
"v1.2 (2016 Spring) Kabir Chandrasekher, Tony Duan, David Marn, Ashvin Nair, Kangwook Lee, Kannan Ramchandran"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Table of Contents\n",
"\n",
"\n",
"- [Introduction](#Introduction)\n",
"- [MLE for a simple model](#Question-1----Simple-Model)\n",
"- [MLE for a harder model](#Question-2----Harder-Model)\n",
"- [EM algorithm for the harder model](#Question-3----EM-Algorithm)\n",
"- [References](#References)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"\n",
"The problem of [RNA sequencing](http://en.wikipedia.org/wiki/RNA-Seq) is to figure out how much and what type of RNA is present in a genome at a given moment in time. It has many applications including genome annotation, comprehensive identification of fusions in cancer, discovery of novel isoforms of genes, and genome sequence assembly [[1][2]](#References).\n",
"\n",
"For our purposes, we'll formulate the problem as follows: given a set of short reads that are sampled from a set of larger genes, how can we find the relative abundance of each gene? That is, given just the short reads, how do we know how frequently each original gene occurs? This process is depicted in Figure 1. (Aside: in the actual paper, these \"genes\" are actually \"transcripts,\" but that's not relevant for us.)\n",
"\n",
"\n",
"\n",
"####