Post on 24-Feb-2016
MAKER 2014: What It Is, Where It’s Been, Where It’s Going
Daniel Ence, Yandell Lab
University of Utah
What Are Annotations?
• Annotations are descriptions of features of the genome
• Structural: exons, introns, UTRs, splice forms, etc.
• Coding & non-coding genes
Annotations should include an evidence trail
• Assists in quality control of genome annotations
• Examples of evidence supporting a structural annotation: ab initio gene predictions, ESTs, protein homology
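One way to picture an annotation that carries its evidence trail is as a small record with the supporting evidence attached. The sketch below is only an illustration: the class names, field names, and evidence labels are hypothetical and are not taken from MAKER.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    """One piece of supporting evidence (hypothetical structure)."""
    kind: str    # e.g. "ab_initio", "est", "protein_homology"
    source: str  # program or database that produced it
    start: int   # 1-based, inclusive
    end: int


@dataclass
class Annotation:
    """A structural feature together with its evidence trail."""
    feature_type: str  # e.g. "exon", "intron", "UTR"
    seqid: str
    start: int
    end: int
    evidence: List[Evidence] = field(default_factory=list)

    def supporting_kinds(self) -> List[str]:
        """Sorted evidence kinds that overlap this feature."""
        return sorted({e.kind for e in self.evidence
                       if e.start <= self.end and e.end >= self.start})

    def is_supported(self, min_kinds: int = 1) -> bool:
        """Quality-control check: enough independent kinds of evidence?"""
        return len(self.supporting_kinds()) >= min_kinds


exon = Annotation("exon", "contig1", 100, 250, [
    Evidence("ab_initio", "gene_predictor", 100, 250),
    Evidence("est", "est_alignment", 120, 250),
    Evidence("protein_homology", "protein_alignment", 400, 500),  # no overlap
])
print(exon.supporting_kinds())
```

Keeping the evidence on the record is what makes a later quality-control pass possible: a feature backed by only one kind of evidence can be flagged for review.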
Secondary Annotation
• Protein domains and families: InterPro, Pfam
• GO and other ontologies
• Pathways
Genome Project Overview
>Smg5
MEVTFSSGGSSNASSECAIDGGTNRCRGLEPNNGTCILSQEVKDLYRSLYTASKQLDDAKRNVQSVGQLFQHEIEEKRSLLVQLCKQIIFKDYQSVGKKVREVMWRRGYYEFIAFVSUCCESS
MAKER: An annotation pipeline and genome-database management tool for “next-generation” genome projects
MAKER
• User Requirements: Can be run by a single individual with little bioinformatics experience
• System Requirements: Can run on Linux or Mac OS X based systems
• Program Output: Output is compatible with popular annotation tools like Web-Apollo and JBrowse
• Availability: Free for the academic community (including source code)
Beyond de novo annotation
• mRNA-seq integration
• Integrating new evidence into existing databases
• Update/revise legacy annotation sets
[Diagram labels: Legacy Annotation Set 1, Legacy Annotation Set 2, ..., Legacy Annotation Set n; new data; current assembly]
• Identify legacy annotation most consistent with new data
• Automatically revise it in light of new data
• If no existing annotation, create new one
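The three steps above can be sketched for a single locus as follows. This is a minimal illustration under stated assumptions, not MAKER's implementation: "most consistent" is approximated here by a Jaccard overlap of bases, and the function names and the input layout are hypothetical.

```python
def covered(intervals):
    """Set of base positions covered by (start, end) inclusive intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def consistency(annotation, evidence):
    """Jaccard overlap of annotation bases and evidence bases (0.0 to 1.0)."""
    a, e = covered(annotation), covered(evidence)
    union = a | e
    return len(a & e) / len(union) if union else 0.0


def update_locus(legacy_sets, evidence):
    """Pick the legacy model that best fits the new data, else create one.

    legacy_sets: dict of set name -> list of exon intervals (may be empty)
    evidence:    list of intervals derived from the new data
    Returns (action, source_name, exons).
    """
    scored = [(consistency(exons, evidence), name)
              for name, exons in legacy_sets.items() if exons]
    best_score, best_name = max(scored, default=(0.0, None))
    if best_name is None or best_score == 0.0:
        # No existing annotation agrees with the new data: create a new one.
        return ("create", None, sorted(evidence))
    # Placeholder for revision: a real pipeline would adjust boundaries here.
    return ("revise", best_name, legacy_sets[best_name])


legacy = {
    "set1": [(100, 200), (300, 400)],
    "set2": [(100, 200), (300, 450)],
    "set3": [],
}
new_data = [(100, 200), (300, 450)]
print(update_locus(legacy, new_data))
```

In this toy case set2 matches the new data exactly and is selected for revision; a locus with no overlapping legacy model falls through to the "create" branch.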
Distributed Parallelization
• Supports Message Passing Interface (MPI), a communication protocol for computer clusters which essentially allows multiple computers to act like a single powerful machine.
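To make the idea of many processes sharing one annotation job concrete, the sketch below divides contigs among workers so that each gets a similar number of bases. It is plain Python, not MPI, and it is not how MAKER schedules work; the function name and the greedy longest-first strategy are assumptions chosen for illustration.

```python
import heapq


def assign_contigs(contig_lengths, n_workers):
    """Greedy longest-first assignment of contigs to the least-loaded worker."""
    heap = [(0, rank) for rank in range(n_workers)]  # (total bases, rank)
    heapq.heapify(heap)
    assignment = {rank: [] for rank in range(n_workers)}
    # Largest contigs first; ties broken by name for a stable result.
    ordered = sorted(contig_lengths.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, length in ordered:
        load, rank = heapq.heappop(heap)
        assignment[rank].append(name)
        heapq.heappush(heap, (load + length, rank))
    return assignment


contigs = {"c1": 900, "c2": 500, "c3": 400, "c4": 300, "c5": 100}
print(assign_contigs(contigs, 2))
```

With two workers the loads come out at 1,200 and 1,000 bases, so neither worker sits idle for long while the other finishes.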
Data throughput
What happened in 2013? MAKER-P
• Plant
• Parallelized
• Publication: MAKER-P: a tool-kit for the rapid creation, management, and quality control of plant genome annotations
Campbell, Law, Holt et al., Plant Phys. 2013
MAKER-P at iPlant
Atmosphere
• MPI enabled for parallel computation
• Maximum instance size 16 CPU
• http://www.iplantcollaborative.org
TACC Lonestar
• Supercomputer with 22,656 CPU
• MPI enabled for parallel computation
• Can complete entire rice genome in ~2 hrs (1,152 cores)
• 96 CPU per chromosome
Currently being integrated into the iPlant Discovery Environment http://www.iplantcollaborative.org
XSEDE https://www.xsede.org
Data throughput
Performance on Zea mays genome (~2 Gb)
Pinus taeda
• 8,640 CPUs on TACC
• ~37 hours with queue (runtime 14 hours 37 minutes)
• Throughput of > 1 Gb/hour
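A quick back-of-the-envelope check of the figures quoted on these slides, using only the numbers given above. The variable names are illustrative, and the derived values are rough estimates rather than reported results.

```python
# Pinus taeda run on TACC (figures from the slide)
cpus = 8640
runtime_minutes = 14 * 60 + 37  # runtime of 14 hours 37 minutes
wall_minutes = 37 * 60          # ~37 hours including time in the queue

cpu_hours = cpus * runtime_minutes / 60
queue_fraction = (wall_minutes - runtime_minutes) / wall_minutes
# A throughput of more than 1 Gb/hour implies at least this much sequence:
min_gb_processed = 1 * runtime_minutes / 60

# Rice run on Lonestar: 1,152 cores at 96 CPU per chromosome
rice_chromosome_jobs = 1152 // 96

print(f"CPU-hours used: {cpu_hours:,.0f}")
print(f"Share of wall time not spent running: {queue_fraction:.0%}")
print(f"Sequence processed: more than {min_gb_processed:.1f} Gb")
print(f"Rice chromosomes run side by side: {rice_chromosome_jobs}")
```

The rice figure works out to 12 concurrent chromosome jobs, which matches the 12 chromosomes of the rice genome.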
Assembly & Annotation at iPlant
Added to MAKER-P
• Non-coding RNA support
• Better repeat annotation
• Better pseudogene annotation
Non-coding RNA annotation
• tRNAscan support: will run from inside MAKER; doesn’t install automatically
• snoScan support: can supply data file for annotation; will run from inside MAKER automatically; doesn’t install automatically
Better Repeat Annotation
• In the past: custom repeat library generated de novo with RepeatModeler
• Now: RepeatModeler, but better. Step-by-step guide available at:
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic
• To be automated in the future
What’s Coming in 2014?
• Expanded ncRNA support
• MAKER-EVM
• Expanded Augustus/bam support
• Better integration with iPlant’s Discovery Environment
Expanded ncRNA annotation
• More of a feeling than a to-do list
• lncRNAs
MAKER Evidence Modeler
Haas et al., Genome Biology 2008
MAKER Evidence Modeler
Cantarel et al., 2008; Holt and Yandell, 2010
EVM
Better Augustus support
• MAKER gives Augustus hints
• Augustus can take better hints from a BAM file
• Users will be able to supply a BAM file in the MAKER control file
• BAM files open up a world of possibilities!
Future Annotations
• Trichomonas vaginalis
• Pinus taeda
• Apis dorsata
• Cronartium quercuum
• Common Pigeon
• Cardiocondyla obscurior
• Southern right whale
• Tardigrade
• Spotted Gar
• Gibbon
• Turkey
• 9-spined stickleback
• Golden Eagle
Acknowledgements
• I’d like to thank and recognize all contributions from Mark Yandell at the University of Utah, as well as lab members Barry Moore, Michael Campbell, Daniel Ence, and former lab member Meiyee Law.
• Special thank you to Scott Cain, Robert Buels, and Amelia Ireland.
• I would also like to recognize collaborator Ian Korf at UC Davis.
• MAKER-P and integration into iPlant infrastructure: Josh Stein (CSHL), Kevin Childs (MSU), Gaurav Moghe (MSU), David Hufnagel (MSU), Jikai Lei (MSU), Rujira Achawanantakun (MSU), Carolyn Lawrence (USDA-ARS CICGRU), Doreen Ware (CSHL), Shin-Han Shiu (MSU), Yanni Sun (MSU), Ning Jiang (MSU), Matt Vaughn (TACC), Dian Jiao (TACC), Zhenyuan Lu (CSHL), Nirav Merchant (U. Arizona)
• Pinus taeda genome project: Jill Wegrzyn (UConn), John Liechty (UC Davis), Kristian Stevens (UC Davis), Carol Loopstra (Texas A&M), Hans Vasquez-Gross (UC Davis), Brian Lin (UC Davis), Matt Dougherty (UC Davis), Jacob Zieve (UC Davis), Pedro J Martinez-Garcia (UC Davis), James A Yorke (U. Maryland), Marc Crepeau (UC Davis), Daniela Puiu (Johns Hopkins), Steven L Salzberg (Johns Hopkins), Pieter J. deJong (CHORI-BACPAC Resources Center), Keithanne Mockaitis (Indiana University), Dorrie Main (Washington State), Chuck Langley (UC Davis), David Neale (UC Davis)
• MAKER-devel community
• Funding from the NHGRI through an R01 grant entitled “Software for the creation and quality control of genome annotations.”
Get in Touch!
• Mailing List: maker-devel at yandell-lab.org
• Download: http://yandell-lab.org/software/maker.html
• Email me: dence at genetics.utah.edu