MAKER 2014 What It Is Where It’s Been Where It’s Going

Post on 24-Feb-2016

MAKER 2014: What It Is, Where It’s Been, Where It’s Going

Daniel Ence, Yandell Lab

University of Utah

What Are Annotations?

• Annotations are descriptions of features of the genome
• Structural: exons, introns, UTRs, splice forms, etc.
• Coding & non-coding genes
• Annotations should include an evidence trail
• The evidence trail assists in quality control of genome annotations
• Examples of evidence supporting a structural annotation: ab initio gene predictions, ESTs, protein homology
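One way to picture an evidence trail is as a set of feature lines grouped by the kind of evidence that produced them. The sketch below assumes the features are stored as GFF3 (nine tab-separated columns, with the second column naming the source). The sample lines, the contig name `contig1`, the source labels, and the helper `group_by_source` are hypothetical illustrations and are not taken from the talk or from MAKER's actual output.

```python
from collections import defaultdict

# Hypothetical GFF3-style lines: one gene model plus the evidence behind it.
SAMPLE_GFF3 = "\n".join([
    "##gff-version 3",
    "contig1\tmaker\tgene\t1000\t5000\t.\t+\t.\tID=gene1",
    "contig1\tab_initio\tmatch\t1000\t4900\t.\t+\t.\tID=pred1",
    "contig1\test\tmatch\t1200\t3000\t.\t+\t.\tID=est1",
    "contig1\tprotein\tmatch\t1500\t4800\t.\t+\t.\tID=prot1",
])


def group_by_source(gff3_text):
    """Return {source: [(type, start, end), ...]} for every feature line."""
    groups = defaultdict(list)
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        _seqid, source, ftype, start, end = cols[:5]
        groups[source].append((ftype, int(start), int(end)))
    return dict(groups)


evidence = group_by_source(SAMPLE_GFF3)
for source, feats in sorted(evidence.items()):
    print(source, feats)
```

Grouping by source makes it easy to ask, for any gene model, which classes of evidence overlap it, which is the kind of question a quality-control step would pose.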

Secondary Annotation

• Protein domains and families: InterPro, Pfam
• GO and other ontologies
• Pathways

Genome Project Overview

>Smg5
MEVTFSSGGSSNASSECAIDGGTNRCRGLEPNNGTCILSQEVKDLYRSLYTASKQLDDAKRNVQSVGQLFQHEIEEKRSLLVQLCKQIIFKDYQSVGKKVREVMWRRGYYEFIAFVSUCCESS

MAKER

An annotation pipeline and genome-database management tool for “next-generation” genome projects

MAKER

• User Requirements: Can be run by a single individual with little bioinformatics experience
• System Requirements: Can run on Linux or Mac OS X based systems
• Program Output: Output is compatible with popular annotation tools like Web-Apollo and JBrowse
• Availability: Free for the academic community (including source code)

Beyond de novo annotation

• mRNA-seq integration

• Integrating new evidence into existing databases

• Update/revise legacy annotation sets

[Diagram labels: Legacy Annotation Set 1, Legacy Annotation Set 2, Legacy Annotation Set n; new data; current assembly]

• Identify legacy annotation most consistent with new data
• Automatically revise it in light of new data
• If no existing annotation, create new one
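The selection step can be pictured with a small sketch. It is not MAKER's algorithm: it assumes each legacy model and the new evidence are lists of non-overlapping closed intervals on one strand, scores a model by the fraction of evidence bases its exons cover, and falls back to a new model built from the evidence when nothing scores well. The names `consistency`, `choose_model`, and the `min_score` threshold are hypothetical, and the "revise" step is deliberately left out.

```python
def overlap(a, b):
    """Number of bases shared by two closed intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def consistency(exons, evidence):
    """Fraction of evidence bases covered by the model's exons.

    Assumes exons do not overlap each other, and likewise for evidence.
    """
    total = sum(end - start + 1 for start, end in evidence)
    if total == 0:
        return 0.0
    shared = sum(overlap(ex, ev) for ex in exons for ev in evidence)
    return shared / total


def choose_model(legacy_sets, evidence, min_score=0.5):
    """Pick the legacy model most consistent with the evidence, else a new one."""
    best_name, best_score = None, 0.0
    for name, exons in legacy_sets.items():
        score = consistency(exons, evidence)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < min_score:
        # No existing annotation fits: create a new one from the evidence.
        return ("new", sorted(evidence))
    return (best_name, legacy_sets[best_name])


legacy = {
    "set1": [(100, 200), (300, 400)],
    "set2": [(100, 200), (350, 500)],
}
new_data = [(150, 200), (350, 450)]
print(choose_model(legacy, new_data))
print(choose_model(legacy, [(1000, 1100)]))
```

In this toy input the second legacy set covers all of the evidence and is chosen, while evidence at an unannotated locus yields a new model.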

Beyond de novo annotation

Distributed Parallelization

• Supports Message Passing Interface (MPI), a communication protocol for computer clusters which essentially allows multiple computers to act like a single powerful machine.
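To make the idea of spreading one genome across many processes concrete, here is a minimal sketch of dividing contigs among workers. It does not use MPI and is not MAKER's scheduler: it is plain Python that applies greedy longest-first load balancing, and the function `assign_contigs`, the contig names, and the lengths are all hypothetical. In a real MPI program each rank would then process only its own share.

```python
import heapq


def assign_contigs(contig_lengths, n_ranks):
    """Assign each contig to the currently least-loaded rank, longest first."""
    heap = [(0, rank) for rank in range(n_ranks)]  # (bases assigned, rank)
    heapq.heapify(heap)
    plan = {rank: [] for rank in range(n_ranks)}
    ordered = sorted(contig_lengths.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, length in ordered:
        load, rank = heapq.heappop(heap)
        plan[rank].append(name)
        heapq.heappush(heap, (load + length, rank))
    return plan


contigs = {"chr1": 900, "chr2": 700, "chr3": 400, "chr4": 300, "chr5": 100}
plan = assign_contigs(contigs, 2)
for rank, names in plan.items():
    print(rank, names, sum(contigs[n] for n in names))
```

Placing the longest contigs first keeps the per-rank totals close to each other, which matters because the run finishes only when the slowest rank does.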

Data throughput

What happened in 2013? MAKER-P

• Plant
• Parallelized
• Publication

What happened in 2013?

Publication: MAKER-P: a tool-kit for the rapid creation, management, and quality control of plant genome annotations

Campbell, Law, Holt et al., Plant Phys. 2013

MAKER-P at iPlant

Atmosphere
• MPI enabled for parallel computation
• Maximum instance size 16 CPU
• http://www.iplantcollaborative.org

TACC Lonestar
• Supercomputer with 22,656 CPU
• MPI enabled for parallel computation
• Can complete entire rice genome in ~2 hrs (1,152 cores)
• 96 CPU per chromosome
• Currently being integrated into the iPlant Discovery Environment
• http://www.iplantcollaborative.org

XSEDE
• https://www.xsede.org

Data throughput

Performance on Zea mays (maize) genome (~2 Gb)

Pinus taeda
• 8,640 CPUs on TACC
• ~37 hours with queue (runtime 14 hours 37 minutes)
• Throughput of > 1 Gb/hour
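The quoted figures can be sanity-checked with a few lines of arithmetic. The CPU counts and runtime below come from the slides; the Pinus taeda genome size does not, so the value of roughly 20 Gb used here is an assumption introduced only for this check.

```python
# Figures quoted on the slides.
PINE_CPUS = 8640
PINE_RUNTIME_HOURS = 14 + 37 / 60  # runtime 14 hours 37 minutes
RICE_CORES = 1152
RICE_CPUS_PER_CHROMOSOME = 96

# Assumption (not on the slide): a Pinus taeda assembly of roughly 20 Gb.
ASSUMED_PINE_GENOME_GB = 20.0


def throughput_gb_per_hour(genome_gb, runtime_hours):
    """Bases annotated per hour of wall-clock runtime, in Gb/hour."""
    return genome_gb / runtime_hours


pine_rate = throughput_gb_per_hour(ASSUMED_PINE_GENOME_GB, PINE_RUNTIME_HOURS)
pine_cpu_hours = PINE_CPUS * PINE_RUNTIME_HOURS
rice_allocations = RICE_CORES // RICE_CPUS_PER_CHROMOSOME

print(f"Pinus taeda: {pine_rate:.2f} Gb/hour, {pine_cpu_hours:.0f} CPU-hours")
print(f"Rice: {rice_allocations} allocations of 96 CPUs")
```

Under the 20 Gb assumption the rate comes out above 1 Gb/hour, in line with the slide, and 1,152 cores at 96 CPUs per chromosome corresponds to 12 allocations, matching the 12 chromosomes of rice.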

Assembly & Annotation at iPlant

Added to MAKER-P
• non-coding RNA support
• better repeat annotation
• better pseudogene annotation

non-coding RNA annotation

tRNAscan support
• Will run from inside MAKER
• Doesn’t install automatically

snoScan support
• Can supply data file for annotation
• Will run from inside MAKER automatically
• Doesn’t install automatically

Better Repeat Annotation

In the past:
• Custom repeat library generated de novo with RepeatModeler

Now:
• RepeatModeler, but better
• Step-by-step guide available at: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic
• To be automated in the future
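As a rough picture of the older, RepeatModeler-only route, the sketch below assembles the two command lines usually involved (`BuildDatabase -name` and `RepeatModeler -database`) and the `rmlib=` line that points MAKER's control file at a custom library. It only builds and prints strings and runs nothing. The file names, the database name, and the helper functions are hypothetical, and the fuller procedure in the step-by-step guide is not reproduced here.

```python
def repeat_library_commands(genome_fasta, db_name):
    """Return the command lines for a de novo RepeatModeler run (not executed)."""
    build = ["BuildDatabase", "-name", db_name, genome_fasta]
    model = ["RepeatModeler", "-database", db_name]
    return [build, model]


def rmlib_line(library_fasta):
    """Control-file line telling MAKER which custom repeat library to use."""
    return f"rmlib={library_fasta}"


# Hypothetical file and database names, for illustration only.
commands = repeat_library_commands("genome.fasta", "my_genome")
for cmd in commands:
    print(" ".join(cmd))
print(rmlib_line("my_repeats.fasta"))
```

Keeping the commands as lists rather than one shell string means they could later be handed to a process launcher without quoting problems.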

What’s Coming in 2014?
• Expanded ncRNA support
• MAKER-EVM
• Expanded Augustus/BAM support
• Better integration with iPlant’s Discovery Environment

Expanded ncRNA annotation
• More of a feeling than a to-do list
• lncRNAs

MAKER Evidence Modeler

Haas et al., Genome Biology 2008

MAKER Evidence Modeler

Cantarel et al., 2008; Holt and Yandell, 2010

EVM

Better Augustus support
• MAKER gives Augustus hints
• Augustus can take better hints from a BAM file
• Users will be able to supply a BAM file in the MAKER control file
• BAM files open up a world of possibilities!

Assembly & Annotation at iPlant

Future Annotations
• Trichomonas vaginalis
• Pinus taeda
• Apis dorsata
• Cronartium quercuum
• Common Pigeon
• Cardiocondyla obscurior
• Southern right whale
• Tardigrade
• Spotted Gar
• Gibbon
• Turkey
• Nine-spined stickleback
• Golden Eagle

Acknowledgements

• I’d like to thank and recognize all contributions from Mark Yandell at the University of Utah, as well as lab members Barry Moore, Michael Campbell, Daniel Ence, and former lab member Meiyee Law.
• Special thank you to Scott Cain, Robert Buels, and Amelia Ireland.
• I would also like to recognize collaborator Ian Korf at UC Davis.
• MAKER-P and integration into iPlant infrastructure: Josh Stein (CSHL), Kevin Childs (MSU), Gaurav Moghe (MSU), David Hufnagel (MSU), Jikai Lei (MSU), Rujira Achawanantakun (MSU), Carolyn Lawrence (USDA-ARS CICGRU), Doreen Ware (CSHL), Shin-Han Shiu (MSU), Yanni Sun (MSU), Ning Jiang (MSU), Matt Vaughn (TACC), Dian Jiao (TACC), Zhenyuan Lu (CSHL), Nirav Merchant (U. Arizona)
• Pinus taeda genome project: Jill Wegrzyn (UConn), John Liechty (UC Davis), Kristian Stevens (UC Davis), Carol Loopstra (Texas A&M), Hans Vasquez-Gross (UC Davis), Brian Lin (UC Davis), Matt Dougherty (UC Davis), Jacob Zieve (UC Davis), Pedro J Martinez-Garcia (UC Davis), James A Yorke (U. Maryland), Marc Crepeau (UC Davis), Daniela Puiu (Johns Hopkins), Steven L Salzberg (Johns Hopkins), Pieter J. deJong (CHORI-BACPAC Resources Center), Keithanne Mockaitis (Indiana University), Dorrie Main (Washington State), Chuck Langley (UC Davis), David Neale (UC Davis)
• MAKER-devel community
• Funding from the NHGRI through an R01 grant entitled “Software for the creation and quality control of genome annotations.”

Get in Touch!

Mailing List: maker-devel at yandell-lab.org

Download: http://yandell-lab.org/software/maker.html

Email me: dence at genetics.utah.edu