
CS273a Lecture 5, Win07, Batzoglou

Quality of assemblies—mouse

Terminology: N50 contig length. If we sort contigs from largest to smallest and start covering the genome in that order, the N50 is the length of the contig that just covers the 50th percentile.

7.7X sequence coverage
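Here is a minimal Python sketch of the N50 rule above. The function name `n50` and the `genome_size` parameter are illustrative. When no genome size is given, the sketch measures against the total contig length, which is the more common convention. Measuring against an estimated genome size, as "covering the genome" suggests, is sometimes called NG50.

```python
def n50(contig_lengths, genome_size=None):
    """Length of the contig that first brings the cumulative length
    (largest contigs first) to at least half of the target size.

    Assumption: target = total assembly length unless genome_size is given.
    Returns None if the contigs cover less than half of the target.
    """
    if any(length <= 0 for length in contig_lengths):
        raise ValueError("contig lengths must be positive")
    target = sum(contig_lengths) if genome_size is None else genome_size
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= target:  # this contig crosses the 50% mark
            return length
    return None


contigs = [100, 80, 50, 30, 20, 10]  # hypothetical contig lengths
print(n50(contigs))                   # measured against the 290 total
print(n50(contigs, genome_size=500))  # measured against a larger genome
```

Against a larger genome size the same contigs give a smaller value, because more of the sorted list is needed to reach the halfway point.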


Quality of assemblies—dog

7.5X sequence coverage


Quality of assemblies—chimp

3.6X sequence coverage

Assisted assembly


History of WGA

• 1982: λ-virus (bacteriophage lambda), 48,502 bp

• 1995: H. influenzae, 1 Mbp

• 2000: fly, 100 Mbp

• 2001–present: human (3 Gbp), mouse (2.5 Gbp), rat*, chicken, dog, chimpanzee, several fungal genomes

Gene Myers: "Let’s sequence the human genome with the shotgun strategy."

Phil Green: "That is impossible, and a bad idea anyway."

(1997)


Some new sequencing technologies


Molecular Inversion Probes


Illumina Genotype Arrays


Single Molecule Array for Genotyping—Solexa


Nanopore Sequencing

http://www.mcb.harvard.edu/branton/index.htm


Pyrosequencing on a chip

Mostafa Ronaghi, Stanford Genome Technologies Center

454 Life Sciences


Polony Sequencing


Short read sequencing protocol

• Random, high-coverage clone library (CovG = 7–10x)

• Low coverage of each clone by reads (CovR = 1–2x)

[Figure: the genome is fragmented into clones (e.g., clones 1234 and 1235) at coverage CovG; each clone is amplified and read, giving reads at coverage CovR per clone.]
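The toy simulation below illustrates the two-level sampling. It uses assumed, scaled-down sizes (a 50 kb genome, 1 kb clones and 50 bp reads, all hypothetical) and takes CovG = 10 and CovR = 2 from the ranges above.

The model works as follows:
• Clones are placed uniformly at random along the genome.
• Reads are placed uniformly at random within each clone.

By construction, the mean read depth is therefore CovG × CovR. Reads are clustered inside clones, so the fraction of bases left uncovered tends to be larger than it would be for reads sampled directly from the genome at the same depth. The function reports that fraction so different settings can be compared.

```python
import random


def simulate_two_level(genome_len, clone_len, read_len, cov_g, cov_r, seed=0):
    """Toy model: clones sampled from the genome at depth cov_g, then reads
    sampled from each clone at depth cov_r (error-free, exact placement).

    Returns (mean read depth, fraction of genome positions with no read).
    Assumes genome_len >= clone_len >= read_len.
    """
    rng = random.Random(seed)
    depth = [0] * genome_len
    n_clones = round(cov_g * genome_len / clone_len)
    reads_per_clone = round(cov_r * clone_len / read_len)
    for _ in range(n_clones):
        clone_start = rng.randrange(genome_len - clone_len + 1)
        for _ in range(reads_per_clone):
            # Reads fall entirely inside their clone.
            read_start = clone_start + rng.randrange(clone_len - read_len + 1)
            for pos in range(read_start, read_start + read_len):
                depth[pos] += 1
    mean_depth = sum(depth) / genome_len
    uncovered = sum(1 for d in depth if d == 0) / genome_len
    return mean_depth, uncovered


# Hypothetical sizes; CovG = 10 and CovR = 2 are within the slide's ranges.
mean_depth, uncovered = simulate_two_level(50_000, 1_000, 50, cov_g=10, cov_r=2, seed=1)
print(f"mean depth {mean_depth:.1f}x, uncovered fraction {uncovered:.4f}")
```

Most uncovered positions in this toy fall near the genome ends, where fewer clones can reach.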


Short read sequencing protocol

[Figure: workflow from the target genome to assembly. Labeled steps, roughly in workflow order: fragment and select 150 kb segments; clone; randomly select 200,000 fragments ("clones"); fragment each clone; ligate adapters (bead attachment primer, adapter, clone id tag) in 166-clone batches; clone on beads by emulsion PCR; sequence 250,000 reads per plate, 1200 plates; assembly.]
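The slide's numbers can be checked against the coverage targets on the previous slide. This requires two inputs that this slide does not state; both are assumptions:
• A human-sized genome of 3 Gbp, taken from the history slide.
• A read length of 200 bp, taken from the assembly-quality table.

The variable names are illustrative.

```python
import math

# Figures taken from the protocol slide.
N_CLONES = 200_000         # randomly selected fragments ("clones")
CLONE_LEN = 150_000        # 150 kb segments
BATCH_SIZE = 166           # clones per batch
READS_PER_PLATE = 250_000
N_PLATES = 1_200

# Assumptions, not stated on this slide.
GENOME_LEN = 3_000_000_000  # ~3 Gbp, human-sized target genome
READ_LEN = 200              # read length from the assembly-quality table


def protocol_budget():
    total_reads = READS_PER_PLATE * N_PLATES
    cov_g = N_CLONES * CLONE_LEN / GENOME_LEN          # clone coverage CovG
    reads_per_clone = total_reads / N_CLONES
    cov_r = reads_per_clone * READ_LEN / CLONE_LEN     # read coverage per clone CovR
    net = total_reads * READ_LEN / GENOME_LEN          # read coverage of the genome
    batches = math.ceil(N_CLONES / BATCH_SIZE)         # 166-clone batches needed
    return {"total_reads": total_reads, "cov_g": cov_g,
            "reads_per_clone": reads_per_clone, "cov_r": cov_r,
            "net": net, "batches": batches}


for name, value in protocol_budget().items():
    print(name, value)
```

Under these assumptions the numbers fit together:
• CovG is 10x, at the top of the 7–10x range.
• CovR is 2x, at the top of the 1–2x range.
• The net coverage of 20x matches the assembly-quality slide.
• About 1,205 batches of 166 clones are needed, close to the 1,200 plates. This suggests, though the slide does not say so, roughly one batch per plate.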


Ordering clones into clone contigs

[Figure: a clone graph whose nodes are clones 293, 1001, 1234, 882, 7 and 94; node contraction merges them into a single clone contig.]
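One simple way to model the node contraction on this slide has three steps:
1. Treat clones as nodes.
2. Add an edge whenever two clones are judged to overlap.
3. Contract edges until each connected component is a single clone contig.

The sketch below does this with a union-find structure. The overlap list is hypothetical. The slide does not say how overlaps are detected (for example, by shared reads or k-mers), and clone 555 is invented for the example. The sketch only groups clones and does not order them within a contig.

```python
from collections import defaultdict


def clone_contigs(clones, overlaps):
    """Group clones into clone contigs by contracting overlap edges.

    clones: iterable of clone ids; overlaps: iterable of (a, b) pairs.
    Returns a list of sorted clone-id lists, sorted by their smallest id.
    """
    parent = {c: c for c in clones}

    def find(c):
        # Path halving keeps the union-find trees shallow.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in overlaps:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # contract the edge a-b

    groups = defaultdict(list)
    for c in parent:
        groups[find(c)].append(c)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Clone ids from the figure; the overlap edges and clone 555 are hypothetical.
clones = [293, 1001, 1234, 882, 7, 94, 555]
overlaps = [(1234, 293), (293, 1001), (1001, 94), (94, 7), (7, 882)]
print(clone_contigs(clones, overlaps))
```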


Contig assembly

[Figure: construct read sets (an intersection read set and a subtraction read set) and assemble them with the Euler assembler.]
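The slide does not define the two read sets precisely. This sketch assumes one plausible reading for two overlapping clones A and B:
• The intersection read set holds the reads of A that share at least one k-mer with B's reads, so they likely come from the overlap.
• The subtraction read set holds the remaining reads of A.

The function name, the k-mer criterion and the value of k are all illustrative assumptions, not the lecture's definition.

```python
def kmers(seq, k):
    """All distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def split_read_sets(reads_a, reads_b, k):
    """Split clone A's reads by whether they share a k-mer with clone B's reads.

    Returns (intersection_reads, subtraction_reads), both drawn from reads_a.
    """
    kmers_b = set().union(*(kmers(r, k) for r in reads_b))
    intersection, subtraction = [], []
    for read in reads_a:
        if kmers(read, k) & kmers_b:
            intersection.append(read)   # likely from the A-B overlap
        else:
            subtraction.append(read)    # likely from A outside the overlap
    return intersection, subtraction


# Hypothetical toy reads from two clones.
reads_a = ["ACGTACGT", "TTTTGGGG"]
reads_b = ["GTACGTCC", "AAAACCCC"]
print(split_read_sets(reads_a, reads_b, k=5))
```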


Contig assembly

[Figure: three-stage contig assembly.
• Contig assembly 1 (read sets): construct read sets (intersection read set, subtraction read set) and assemble them with the Euler assembler.
• Contig assembly 2 (contig sets): construct contig sets and assemble them with the Euler assembler.
• Contig assembly 3 (clone contigs): final assembly.]
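The Euler assembler named on these slides builds on the de Bruijn graph and Eulerian path formulation of assembly. The toy sketch below shows only that core idea:
• Each (k−1)-mer becomes a node.
• Each distinct k-mer becomes an edge from its prefix to its suffix.
• An Eulerian path through the graph, found here with Hierholzer's algorithm, spells out the sequence.

The sketch assumes error-free reads and a linear sequence with no repeated (k−1)-mers. EULER itself also handles sequencing errors and repeats, which this sketch does not attempt.

```python
from collections import defaultdict


def de_bruijn_assemble(reads, k):
    """Toy de Bruijn assembly of error-free reads from a repeat-free sequence."""
    kmer_set = {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}
    graph = defaultdict(list)
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for km in kmer_set:
        u, v = km[:-1], km[1:]   # prefix and suffix (k-1)-mers
        graph[u].append(v)
        outdeg[u] += 1
        indeg[v] += 1
    nodes = set(indeg) | set(outdeg)
    # A linear sequence starts at the node with one more out-edge than in-edges.
    starts = [n for n in nodes if outdeg[n] - indeg[n] == 1]
    start = starts[0] if starts else min(graph)
    # Hierholzer's algorithm, iterative form.
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if graph[u]:
            stack.append(graph[u].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    # Spell the sequence: first node, then the last base of each later node.
    return path[0] + "".join(node[-1] for node in path[1:])


# Hypothetical overlapping reads from the sequence ACGTTGCATGAC.
print(de_bruijn_assemble(["ACGTTGC", "TTGCATG", "CATGAC"], k=4))
```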


Assembly quality

| Genome (size) | Sequence coverage | Contig N50 (kb) | Base quality (Q) | Misassemblies (#/Mb) | Small indels (#/Mb) |
|---|---|---|---|---|---|
| D. melanogaster (118 Mb) | 94.2% | 160.2 | 38.4 | 2.5 | 1.6 |
| Human chr21 (34 Mb) | 97.5% | 79.0 | 35.6 | 1.9 | 2.3 |
| Human chr11 (131 Mb) | 96.3% | 57.4 | 34.4 | 2.8 | 1.9 |
| Human chr1 (223 Mb) | 96.2% | 63.0 | 34.4 | 3.0 | 2.0 |

Read length = 200 bp, error rate = 1%, net coverage = 20.0x
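This conversion assumes the base-quality column uses the usual Phred scale, Q = −10 log10(P_error). It shows what those values mean per base and per megabase, and how they compare with the 1% input read error rate (Q20). Treating each Q in the table as a single average Phred value is itself an approximation, and the helper names are illustrative.

```python
import math


def phred_to_error(q):
    """Per-base error probability for Phred quality q."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred quality for per-base error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def errors_per_mb(q):
    """Expected base errors per megabase at Phred quality q."""
    return phred_to_error(q) * 1_000_000


print(f"1% read error rate = Q{error_to_phred(0.01):.0f}")
for label, q in [("D. melanogaster", 38.4), ("Human chr21", 35.6), ("Human chr1", 34.4)]:
    print(f"{label}: Q{q} ~ {errors_per_mb(q):.0f} base errors per Mb")
```

On this reading, consensus quality in the mid-30s corresponds to a few hundred base errors per megabase, far fewer than the roughly 10,000 per megabase implied by the 1% error rate of individual reads.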


Some future directions for sequencing

1. Personalized genome sequencing

• Find your ~3,000,000 single nucleotide polymorphisms (SNPs)

• Find your rearrangements

• Goals:
  • Link genome with phenotype
  • Provide personalized diet and medicine
  • (???) designer babies, big-brother insurance companies

• Timeline:
  • Inexpensive sequencing: 2010–2015
  • Genotype–phenotype association: 2010–???
  • Personalized drugs: 2015–???


Some future directions for sequencing

2. Environmental sequencing

• Find your flora: organisms living in your body

• External organs: skin, mucous membranes
  • Gut, mouth, etc.

• Normal flora: >200 species, >trillions of individuals

• Flora–disease, flora–non-optimal health associations

• Timeline:
  • Inexpensive research sequencing: today
  • Research & associations within next 10 years
  • Personalized sequencing 2015+

• Find diversity of organisms living in different environments
  • Hard to isolate
  • Assembly of all organisms at once


Some future directions for sequencing

3. Organism sequencing

• Sequence a large fraction of all organisms

• Deduce ancestors
  • Reconstruct ancestral genomes
  • Synthesize ancestral genomes
  • Clone (Jurassic Park!)

• Study evolution of function
  • Find functional elements within a genome
  • How those evolved in different organisms
  • Find how modules/machines composed of many genes evolved


Multiple Sequence Alignment


Genome Evolution – Macro Events

• Inversions

• Deletions

• Duplications
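A common way to model these macro events, assumed here rather than taken from the slide, is to write a genome as an ordered list of signed synteny blocks. Each sign records the block's orientation. The helper names below are hypothetical, and each helper returns a new list rather than modifying its input.

```python
def invert(genome, i, j):
    """Inversion: reverse blocks i..j-1 and flip their orientation."""
    return genome[:i] + [-b for b in reversed(genome[i:j])] + genome[j:]


def delete(genome, i, j):
    """Deletion: drop blocks i..j-1."""
    return genome[:i] + genome[j:]


def duplicate(genome, i, j, pos):
    """Duplication: insert a copy of blocks i..j-1 before index pos."""
    return genome[:pos] + genome[i:j] + genome[pos:]


ancestor = [1, 2, 3, 4, 5]  # hypothetical synteny blocks
print(invert(ancestor, 1, 4))
print(delete(ancestor, 1, 3))
print(duplicate(ancestor, 0, 2, 5))
```

Applying the same inversion twice restores the original order. This symmetry is one reason signed block lists are convenient for comparing genomes, such as the human–mouse synteny map that follows.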


Synteny maps

Comparison of human and mouse

