{ "metadata": { "name": "", "signature": "sha256:3911bc4d6ad61bdfd7f9ebb93e6bc1128d38541e33d113272ffd4dbe26d78a62" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# EE126 Lab10: RNA sequencing\n", "\n", "The direct sequencing of RNA transcripts, known as RNA sequencing or RNA-Seq (http://en.wikipedia.org/wiki/RNA-Seq), has many applications, including genome annotation, comprehensive identification of fusions in cancer, discovery of novel isoforms of genes, and genome sequence assembly [Lior Pachter 2011] (http://arxiv.org/abs/1104.3889, http://genomemedicine.com/content/3/11/74/abstract). The goal of RNA sequencing is to determine which RNA molecules are present in a biological sample at a given moment in time, and in what quantity.\n", "\n", "For our purposes, we'll phrase the problem as follows: given a set of short reads sampled from a set of longer genes, how can we find the relative abundance of each gene? That is, given just the short reads, how do we estimate how frequently each original gene occurs? This process is depicted in Figure 1. (Aside: in the paper, these \"genes\" are actually \"transcripts,\" but the distinction is not relevant for us.)\n", "\n", "\n", "\n", "####