
GMOD 2014 MAKER Lecture

Date post: 23-Aug-2014
Upload: barrymoore
Description:
Lecture for the MAKER2 Tutorial for the GMOD 2014 Summer Training
MAKER The Genome Annotation Pipeline GMOD Summer Course May 19, 2014 Barry Moore/Carson Holt Yandell Lab University of Utah
Transcript
Page 1: GMOD 2014 MAKER Lecture

MAKER: The Genome Annotation Pipeline

GMOD Summer Course, May 19, 2014

Barry Moore/Carson Holt, Yandell Lab

University of Utah

Page 2: GMOD 2014 MAKER Lecture

MAKER

• The Annotation Problem
• How MAKER Works
• Why Choose MAKER
• Working with MAKER

Page 3: GMOD 2014 MAKER Lecture

What are Annotations?

Functional

Structural

Function: cAMP-dependent and sulfonylurea-sensitive anion transporter. Key gatekeeper influencing intracellular cholesterol transport.

Subcellular location: Membrane; multi-pass membrane protein (Ref.13, Ref.14).

Domain: Multifunctional polypeptide with two homologous halves, each containing a hydrophobic membrane-anchoring domain and an ATP binding cassette (ABC) domain.

Page 4: GMOD 2014 MAKER Lecture

Genomes Online Database

http://www.genomesonline.org/

[Figure: Genome Project Status. Incomplete and complete genome projects by year, 1998 to 2012; y-axis: Genomes, 0 to 9,000.]

Page 5: GMOD 2014 MAKER Lecture

http://www.genome.gov/

Page 6: GMOD 2014 MAKER Lecture

http://www.genome.gov/

Page 7: GMOD 2014 MAKER Lecture
Page 8: GMOD 2014 MAKER Lecture

Next Gen Genome Annotation 2013-14

• Coelacanth
• Pine
• Sacred Lotus
• Conus bullatus
• Pigeon
• King Cobra
• Hymenopterids
• Fusarium circinatum
• Cardiocondyla obscurior
• Burmese Python
• Sarcocystis neurona
• Spotted Gar
• Apple maggot fly

Page 9: GMOD 2014 MAKER Lecture

The ‘NextGen’ Genome Project

• Lab/Small Group Funding
• Short-read Genome Sequencing
• RNASeq Data
• Genome/Transcriptome Assembly
• Gene Annotation
• Genome Database / BLAST Server
• Manual curation
• New assembly
• Reannotate/Merge annotations

Page 10: GMOD 2014 MAKER Lecture

• The Annotation Problem
• How MAKER Works
• Why Choose MAKER
• Working with MAKER

MAKER

Page 11: GMOD 2014 MAKER Lecture

The Source of Annotations

RNA and Protein Evidence

Accurate Gene Annotations

Ab Initio Computational Evidence

Page 12: GMOD 2014 MAKER Lecture

Annotating the Genome – Apollo View

current evidence

gene annotations

genome assembly

http://apollo.berkeleybop.org/

Page 13: GMOD 2014 MAKER Lecture

Identify and mask repetitive elements

current evidence

genome assembly

http://www.repeatmasker.org
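
As a rough illustration of what masking does to the assembly, the sketch below lowercases (soft-masks) or replaces with N (hard-masks) a set of repeat intervals. Finding the repeats is RepeatMasker's job and is not shown; the function names and the 0-based, half-open interval convention are assumptions made for this example only.

```python
def soft_mask(sequence, repeats):
    """Return `sequence` with each repeat interval lowercased.

    `repeats` is a list of (start, end) pairs, 0-based and half-open
    (an assumed convention for this sketch).
    """
    masked = list(sequence.upper())
    for start, end in repeats:
        for i in range(max(start, 0), min(end, len(masked))):
            masked[i] = masked[i].lower()
    return "".join(masked)


def hard_mask(sequence, repeats, mask_char="N"):
    """Return `sequence` with each repeat interval replaced by `mask_char`."""
    masked = list(sequence.upper())
    for start, end in repeats:
        for i in range(max(start, 0), min(end, len(masked))):
            masked[i] = mask_char
    return "".join(masked)


print(soft_mask("ACGTACGTAC", [(2, 5)]))  # ACgtaCGTAC
print(hard_mask("ACGTACGTAC", [(2, 5)]))  # ACNNNCGTAC
```

Soft-masking keeps the underlying bases available to downstream tools, while hard-masking hides them entirely.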

Page 14: GMOD 2014 MAKER Lecture

Generate ab initio gene predictions

ab initio predictions

SNAP

GeneMark

Augustus

current evidence

genome assembly

http://korflab.ucdavis.edu/

Page 15: GMOD 2014 MAKER Lecture

Align RNA and protein evidence

ab initio predictions

protein - BLASTX

EST - BLASTN

altEST - TBLASTX

current evidence

genome assembly

http://blast.ncbi.nlm.nih.gov

Page 16: GMOD 2014 MAKER Lecture

Polish BLAST alignments with Exonerate

ab initio predictions

polished protein

polished EST

current evidence

genome assembly

http://www.ebi.ac.uk/~guy/exonerate/

Page 17: GMOD 2014 MAKER Lecture

current evidence

Pass gene-finders evidence-based ‘hints’

ab initio predictions

Hint-based SNAP

Hint-based Augustus

genome assembly

Page 18: GMOD 2014 MAKER Lecture

current evidence

Identify gene model most consistent with evidence

ab initio predictions

*Hint-based SNAP

Hint-based Augustus

genome assembly

Page 19: GMOD 2014 MAKER Lecture

current evidence

Revise further if necessary; create new annotation

ab initio predictions

genome assembly

Page 20: GMOD 2014 MAKER Lecture

Compute support for each portion of gene model

Eilbeck et al BMC Bioinformatics 2009

genome assembly

Page 21: GMOD 2014 MAKER Lecture

Compute support for each portion of gene model

Cantarel BL et al., Genome Res 2008

genome assembly

Page 22: GMOD 2014 MAKER Lecture

GFF3

FASTA
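
For readers who have not worked with GFF3 before, here is a minimal sketch of reading one feature line. It relies only on the general GFF3 layout (nine tab-separated columns, with `key=value` attributes separated by semicolons); the example line, the gene ID, and the helper name `parse_gff3_line` are made up for illustration and are not taken from real MAKER output. Percent-decoding of attribute values is deliberately left out.

```python
from collections import namedtuple

# The nine standard GFF3 columns.
Feature = namedtuple(
    "Feature", "seqid source type start end score strand phase attributes"
)


def parse_gff3_line(line):
    """Parse a single GFF3 feature line into a Feature tuple."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    attributes = {}
    for pair in fields[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key] = value
    return Feature(
        fields[0], fields[1], fields[2],
        int(fields[3]), int(fields[4]),  # start and end are 1-based, inclusive
        fields[5], fields[6], fields[7], attributes,
    )


# Hypothetical example line, for illustration only.
example = "chr1\tmaker\tgene\t1000\t2500\t.\t+\t.\tID=gene0001;Name=example_gene"
feature = parse_gff3_line(example)
print(feature.type, feature.start, feature.end, feature.attributes["ID"])
```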

Page 23: GMOD 2014 MAKER Lecture
Page 24: GMOD 2014 MAKER Lecture

MAKER2 Workflow

Page 25: GMOD 2014 MAKER Lecture

MAKER2 Distributed Workflow

Page 26: GMOD 2014 MAKER Lecture

Parallelization Efficiency

Holt C, Yandell M. BMC Bioinformatics. 2011 12:491.

30 GB Pine genome annotated in 37 hrs on 6,000 CPUs at the TACC
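
To put the pine run in perspective, the figures on this slide can be converted into total compute time. This is back-of-the-envelope arithmetic only, and it assumes all 6,000 CPUs were busy for the full 37 hours, so it is an upper bound rather than a measured value.

```python
cpus = 6000       # from the slide
wall_hours = 37   # from the slide

# Upper bound: assumes every CPU was in use for the whole run.
cpu_hours = cpus * wall_hours
cpu_years = cpu_hours / (24 * 365)

print(cpu_hours)            # 222000
print(round(cpu_years, 1))  # 25.3
```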

Page 27: GMOD 2014 MAKER Lecture

• The Annotation Problem
• How MAKER Works
• Why Choose MAKER
• Working with MAKER

MAKER

Page 28: GMOD 2014 MAKER Lecture

MAKER: The Genome Annotation Pipeline

^ Maintenance and Management

GMOD Summer Course, May 19, 2014

Barry Moore/Carson Holt, Yandell Lab

University of Utah

Page 29: GMOD 2014 MAKER Lecture

MAKER2 Use Cases

1. De novo annotation providing quality metrics
2. Merging multiple annotation sets
3. Re-annotation with new evidence
4. Mapping annotations forward to a new assembly
5. Generating GMOD Compliant Output
   1. GBrowse/JBrowse
   2. Apollo
   3. Tripal

Page 30: GMOD 2014 MAKER Lecture

Sensitivity, Specificity, Accuracy as a Measure of Annotation Quality

Gold Standard Genes

Page 31: GMOD 2014 MAKER Lecture

Sensitivity, Specificity, Accuracy as a Measure of Annotation Quality

Gold Standard Genes

Case                              SN   SP   AC
Perfect Accuracy                  1.0  1.0  100%

Page 32: GMOD 2014 MAKER Lecture

Sensitivity, Specificity, Accuracy as a Measure of Annotation Quality

Gold Standard Genes

Case                              SN   SP   AC
Perfect Accuracy                  1.0  1.0  100%
Poor Specificity                  1.0  0.5  80%

Page 33: GMOD 2014 MAKER Lecture

Sensitivity, Specificity, Accuracy as a Measure of Annotation Quality

Gold Standard Genes

Case                              SN   SP   AC
Perfect Accuracy                  1.0  1.0  100%
Poor Specificity                  1.0  0.5  80%
Poor Sensitivity                  0.5  1.0  80%

Page 34: GMOD 2014 MAKER Lecture

Sensitivity, Specificity, Accuracy as a Measure of Annotation Quality

Gold Standard Genes

Case                              SN   SP   AC
Perfect Accuracy                  1.0  1.0  100%
Poor Specificity                  1.0  0.5  80%
Poor Sensitivity                  0.5  1.0  80%
Poor Specificity and Sensitivity  0.5  0.5  50%

Guigó R et al. Genome Biol. 2006
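
The slide does not spell out the formulas, so the sketch below assumes the definitions commonly used in gene-prediction evaluation: sensitivity is TP / (TP + FN), "specificity" is TP / (TP + FP), and accuracy is the simple average of the two. The base counts in the example are invented. Note that under this assumption SN = 1.0 and SP = 0.5 average to 0.75, so the 80% shown in the table is presumably rounded or read off the drawn example rather than computed from those exact values.

```python
def sensitivity(tp, fn):
    """Fraction of gold-standard bases that the prediction recovers."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tp, fp):
    """Fraction of predicted bases that are in the gold standard."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def accuracy(sn, sp):
    """Simple average of sensitivity and specificity (an assumed definition)."""
    return (sn + sp) / 2


# Invented example: the prediction covers all 100 gold-standard bases
# plus 100 bases that are not in the gold standard.
sn = sensitivity(tp=100, fn=0)
sp = specificity(tp=100, fp=100)
print(sn, sp, accuracy(sn, sp))  # 1.0 0.5 0.75
```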

Page 35: GMOD 2014 MAKER Lecture

MAKER vs. Predictors

Holt C, Yandell M. BMC Bioinformatics. 2011

Page 36: GMOD 2014 MAKER Lecture

MAKER vs. Predictors (the wrong HMM...)

Holt C, Yandell M. BMC Bioinformatics. 2011 12:491.

Page 37: GMOD 2014 MAKER Lecture

Annotation Edit Distance

Gold Standard Genes

Gold Standard

Evidence

Protein Alignments

EST Alignments

mRNASeq

Eilbeck et al BMC Bioinformatics 2009

Page 38: GMOD 2014 MAKER Lecture

Annotation Edit Distance

Gold Standard

Evidence

Case                              SN   SP   AED
Perfect Accuracy                  1.0  1.0  0.0
Poor Specificity                  1.0  0.5  0.2
Poor Sensitivity                  0.5  1.0  0.2
Poor Specificity and Sensitivity  0.5  0.5  0.5

Eilbeck et al BMC Bioinformatics 2009
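
A minimal sketch of how an AED value could be computed for one annotation, assuming AED = 1 - (SN + SP) / 2 with SN and SP measured at the base level against the aligned evidence instead of a gold standard. The helper names and the 1-based, inclusive coordinates are choices made for this example; it is not MAKER's implementation. With exact SN = 1.0 and SP = 0.5 the formula gives 0.25, which the table above appears to show rounded as 0.2.

```python
def covered_bases(intervals):
    """Expand (start, end) pairs (1-based, inclusive) into a set of positions."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end + 1))
    return covered


def aed(annotation_exons, evidence_exons):
    """Annotation Edit Distance: 0.0 is perfect agreement, 1.0 is no overlap."""
    annotation = covered_bases(annotation_exons)
    evidence = covered_bases(evidence_exons)
    overlap = len(annotation & evidence)
    sn = overlap / len(evidence) if evidence else 0.0
    sp = overlap / len(annotation) if annotation else 0.0
    return 1 - (sn + sp) / 2


print(aed([(1, 100)], [(1, 100)]))    # 0.0, annotation matches evidence exactly
print(aed([(1, 200)], [(1, 100)]))    # 0.25, annotation overshoots the evidence
print(aed([(1, 100)], [(201, 300)]))  # 1.0, no overlap at all
```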

Page 39: GMOD 2014 MAKER Lecture

AED as a Measure of Genome Wide Annotation Quality

Eilbeck et al BMC Bioinformatics 2009

Page 40: GMOD 2014 MAKER Lecture

TAIR Star Rating System

http://www.arabidopsis.org/

Page 41: GMOD 2014 MAKER Lecture

AED Agrees well with the TAIR star system

Evidence: mRNA-seq (17 experiments), ESTs, full length cDNAs, Swiss-Prot (minus Arabidopsis)

[Figure: cumulative fraction of annotations (y-axis, 0 to 1) versus AED (x-axis, 0 to 1), one curve per TAIR star rating: ***** (7,880), **** (12,654), *** (2,087), ** (2,188), * (1,788), no stars (604).]
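
Curves of this kind can be produced from a list of per-annotation AED scores. The sketch below computes, for each threshold, the fraction of annotations whose AED is at or below it; the scores are invented for illustration and the function name is hypothetical.

```python
from bisect import bisect_right


def cumulative_fraction(aed_scores, thresholds):
    """For each threshold, return the fraction of scores <= that threshold."""
    total = len(aed_scores)
    if total == 0:
        return [0.0 for _ in thresholds]
    ordered = sorted(aed_scores)
    return [bisect_right(ordered, t) / total for t in thresholds]


# Invented scores for five annotations.
scores = [0.0, 0.1, 0.2, 0.5, 0.9]
print(cumulative_fraction(scores, [0.25, 0.5, 0.75, 1.0]))  # [0.6, 0.8, 0.8, 1.0]
```

A curve that rises steeply near AED = 0 means most annotations agree closely with their evidence.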

Page 42: GMOD 2014 MAKER Lecture

Holt C, Yandell M. BMC Bioinformatics. 2011

AED as a Measure of Annotation Quality

Page 43: GMOD 2014 MAKER Lecture

MAKER Annotations Match the Evidence Well

[Figure, two panels, each plotting cumulative fraction of annotations (y-axis, 0 to 1) against AED (x-axis, 0 to 1).
A. thaliana: TAIR10 rep transcripts (27,206); MAKER de novo (25,956); MAKER update of TAIR10 (26,885).
Z. mays: chr10 rep transcripts (2,688); MAKER de novo (3,056); MAKER update of v3 (2,661).]

Campbell et al, 2013 submitted

Page 44: GMOD 2014 MAKER Lecture

Protein Domain Content as a Measure of Annotation Quality

Holt C, Yandell M. BMC Bioinformatics. 2011

Page 45: GMOD 2014 MAKER Lecture

MAKER vs. Predictors

Holt C, Yandell M. BMC Bioinformatics. 2011

Page 46: GMOD 2014 MAKER Lecture

• The Annotation Problem
• How MAKER Works
• Why Choose MAKER
• Working with MAKER

MAKER

Page 47: GMOD 2014 MAKER Lecture

http://derringer.genetics.utah.edu/cgi-bin/mwas/maker.cgi

Page 48: GMOD 2014 MAKER Lecture
Page 49: GMOD 2014 MAKER Lecture

MAKER Installation

• Automated query/answer based installation script.
• Installs Perl prerequisites.
• Installs necessary executables
  – RepeatMasker (RepBase)
  – BLAST+
  – Exonerate
  – SNAP
• Even installs MWAS and MPICH2

Page 50: GMOD 2014 MAKER Lecture

MAKER Runtime Features

• Fill out a config file with input data and parameters
• Parallelize:
  – Running with MPI
  – Simply start multiple instances in the same directory.
• Re-run MAKER in the same directory and it won't redo completed work.
• Restart aborted jobs without losing any work.
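
The re-run and restart behaviour listed above can be pictured with a small sketch: a log of finished work units is consulted before each unit is processed, so a second run only does what is missing. This is a hypothetical illustration of the general idea, not MAKER's actual code; the log file name, the `annotate` callback, and the contig IDs are all invented, and the locking that several simultaneous instances would need is not handled.

```python
import os
import tempfile


def run_resumable(contig_ids, annotate, log_path):
    """Process each contig once, recording finished IDs in `log_path`."""
    done = set()
    if os.path.exists(log_path):
        with open(log_path) as log:
            done = {line.strip() for line in log if line.strip()}
    processed = []
    with open(log_path, "a") as log:
        for contig_id in contig_ids:
            if contig_id in done:
                continue  # finished in an earlier run, skip it
            annotate(contig_id)
            log.write(contig_id + "\n")
            log.flush()  # record progress immediately in case of an abort
            processed.append(contig_id)
    return processed


with tempfile.TemporaryDirectory() as workdir:
    log_path = os.path.join(workdir, "finished.log")
    first = run_resumable(["contig1", "contig2"], lambda c: None, log_path)
    second = run_resumable(["contig1", "contig2", "contig3"], lambda c: None, log_path)

print(first)   # ['contig1', 'contig2']
print(second)  # ['contig3']
```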

Page 51: GMOD 2014 MAKER Lecture

Accessory Scripts

Over 30 accessory scripts:

• cegma2zff
• chado2gff3
• cufflinks2gff3
• gff3_2_gtf
• gff3_preds2models
• gff3_to_eval_gtf
• maker2chado
• maker2jbrowse
• maker2zff
• tophat2gff3
• compare
• evaluator
• gff3_merge
• fasta_merge
• fasta_tool
• fix_fasta
• genemark_gtf2gff3
• ipr_update_gff
• iprscan2gff3
• iprscan_batch
• iprscan_wrap
• maker_functional
• maker_functional_fasta
• maker_functional_gff
• maker_map_ids
• map2assembly
• map_data_ids
• map_fasta_ids
• map_gff_ids
• split_fasta

Page 52: GMOD 2014 MAKER Lecture

• The Annotation Problem
• How MAKER Works
• Why Choose MAKER
• Working with MAKER

MAKER

Page 53: GMOD 2014 MAKER Lecture
Page 54: GMOD 2014 MAKER Lecture
Page 55: GMOD 2014 MAKER Lecture

Acknowledgements

• Mark Yandell
  – Carson Holt
  – Mike Campbell
  – Daniel Ence
  – Steven Flygare
  – Zev Kronenberg
  – Qing Li
  – Marc Singleton
  – Bretty Kennedy
  – Brandi Cantarel
  – Hadi Islam
• Karen Eilbeck
  – Shawn Reynearson
  – Nicole Ruiz
  – Keith Simmons
  – Bret Heale
• Alejandro Alvarado
  – Eric Ross
• Jason Stajich
• Sophia Robb
• Kevin Childs
• Shin-Han Shui
• Ning Jiang
• Yanni Sun

NSF IOS-1126998

