Date post: | 13-Jun-2015 |
Category: |
Health & Medicine |
Upload: | jonathan-eisen |
TIGRTIGRTIGRTIGR
Tetrahymena thermophila macronuclear genome project
Acknowledgements
• Ed Orias
• Members of Tetrahymena steering committee
• Members of Tetrahymena Genome Advisory Board
• NSF/Pat Dennis
• NIGMS/Tony Carter
• Tetrahymena research community
Genome Project Planning - coordinated by Ed Orias at UCSB
• 8/99 Workshop in Ciliate Genomics
• 10/99 First Meeting of Tetrahymena Genome Project Steering Committee
• 10/00 Second Meeting of Tetrahymena Genome Project Steering Committee
• 8/01 Third Meeting of Tetrahymena Genome Project Steering Committee
Details of Project
• Collaboration between
– TIGR (Jonathan Eisen, Malcolm Gardner, Steven Salzberg, others)
– Stanford (Mike Cherry)
– UCSB (Ed Orias)
• Funding
– NSF Microbial Genome Program
– NIH-NIGMS
Major Goals of Project
• ~8x coverage of macronuclear genome of strain SB210
• Generation of genome assemblies
• Creation and maintenance of two genome databases
– Sequence and automated annotation: TIGR
– Tetrahymena Genome Database: Stanford
Eukaryotic Phylogeny
[Figure: eukaryotic phylogeny, from Baldauf et al. 2001]
Why Tetrahymena?
• Model alveolate and ciliate
• Free living, pure culture, non pathogenic
• Genetic unicellular eukaryotic model:
– Processes and cellular components not found in yeasts
– Organelle function: cilia, phagosome, nucleoli, centrosomes
• Robust and novel molecular genetic tools
• Large research community
• Heterologous expression of alveolate genes
Major Discoveries Using Tetrahymena
• Dynein and its unidirectional motor activity
• Ribozymes, self-splicing RNA
• Telomere structure, telomerase & telomerase RNA
• Role of histone acetylation in control of gene expression
• Role of RNAi in developmental DNA rearrangements
Tools in Tetrahymena
• Genetic tools
– Conjugation, genetic-crossing, inducible self-fertilization
– Transformation, gene disruption, gene replacement
– Gene overexpression, ribosome antisense repression
• Many genomic resources
– Genetic maps (for mic and mac)
– Physical maps
– EST projects
• Ease of use
– Grows fast (1.5 h doubling) in pure culture
– Large cell size
– Large T° range for growth
– Storage in liquid N2
– Large scale sub-cellular compartment fractionation
Tetrahymena’s two nuclear genomes
Micronucleus (MIC) Germline Genome (Silent) 5 pairs of chromosomes
Macronucleus (MAC) Somatic genome (Expressed) 250-300 chromosomes @ ~45 copies each
Macronuclear Differentiation
Macronuclear Genome
• Little repetitive DNA
• 180 Mbp genome
• Little evidence for large duplications
• No centromeres
• Few and small introns
• No alternative splicing reported
• Genes have lower AT content (63%) than the rest of the genome (83%)
Major Achievements
• 8x coverage achieved September 20, 2003
• Shotgun assembly finished September 25, 2003
• Sequence and assembly data released to TIGR web site October 1, 2003
• Traces released to NCBI trace archive October 15, 2003
Why sequence the Mac?
• Advantages:
– It contains all the genes and control elements required for life
– IES loss removes the vast majority of the germline’s repeated sequences
• Special challenges
– Assembling a highly fragmented genome.
– Relating the MAC genome sequence to the MIC genome.
Macronuclear DNA Libraries
Library   Size of DNA used (kb)   % Good Sequences   % No insert
TTAAA     1.5-2.0                 95                 0
TUAAA     2.0-3.0                 90                 0
TXAAA     3.0-4.0                 88                 1
TYAAA     4.0-6.0                 85                 1
TQAAA     6.0-10.0                45                 27
Made by Bill Nierman at TIGR
Sequencing
• Sequencing done at the J. Craig Venter Science Foundation’s Joint Technology Center
• 1,197,106 reads, primarily from the 4-6 kb library
• Average edited length 815 bp
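As a rough cross-check of these read statistics, expected fold coverage can be estimated as reads × mean read length ÷ assembled size. The sketch below is illustrative only: the function name is hypothetical, and it assumes the 106,196,540 bases reported in scaffolds on the Assembly slide is a reasonable denominator.

```python
def fold_coverage(num_reads, mean_read_len_bp, genome_size_bp):
    """Estimate fold coverage as total sequenced bases over genome size."""
    return num_reads * mean_read_len_bp / genome_size_bp


# Figures taken from the slides; the denominator choice is an assumption.
num_reads = 1_197_106           # reads reported on the Sequencing slide
mean_read_len_bp = 815          # average edited read length
bases_in_scaffolds = 106_196_540  # from the Assembly slide

raw_coverage = fold_coverage(num_reads, mean_read_len_bp, bases_in_scaffolds)
print(f"Estimated raw coverage: {raw_coverage:.2f}x")
```

This back-of-envelope figure comes out a little above the 9.01x reported by the assembler, which is what one would expect if some reads were not placed in the assembly.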
Assembly
Scaffolds            2988
Contigs              4223
Bases in Scaffolds   106,196,540
Largest contig       715,652
Largest scaffold     2,217,035
Coverage             9.01
N50 Scaffolds        464,449
• Celera Assembler with modifications by Mihai Pop, Art Delcher, Steven Salzberg, et al.
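For readers unfamiliar with the N50 statistic quoted above, here is a minimal sketch of the usual definition: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of all assembled bases. The function name and the toy lengths are illustrative, not taken from the project's pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("n50 requires at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy example with made-up lengths (not project data).
toy_scaffolds = [10, 20, 30, 40]
print(n50(toy_scaffolds))
```

An N50 of 464,449 therefore means that half of the bases in scaffolds sit in scaffolds at least that long.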
Data Release
• All raw data is in the NCBI Trace Archive
• Sequences and assemblies are available at http://www.tigr.org/tdb/e2k1/ttg/ and will be available in GenBank
• Assemblies will be released monthly if there are any improvements
Assorted statistics
Feature                                                                  Stat
Number of “capped” scaffolds                                             114
Fraction of the genome residing in capped scaffolds                      40%
Fraction of the genome residing in scaffolds capped on at least one end  75%
Post-genomic estimate of the number of MAC chromosomes                   292
Number of sequenced RAPS found in single scaffolds                       93/94 tested
Longest single-contig scaffold                                           716 kb
Longest scaffold                                                         2.2 Mb
Longest capped scaffold (on both ends)                                   1.1 Mb
Shortest capped scaffold (on both ends)                                  37.5 kb
Estimated fold-redundancy of MIC sequence in the TIGR sequence database  0.1 fold
Accuracy?
• No scaffolds are larger than the corresponding MAC chromosomes
• All independently assorting loci match different scaffolds, and all co-assorting loci match either the same scaffold or scaffolds whose summed size is less than the size of the cognate MAC chromosome
• Previously obtained Cbs-adjacent sequences that match to untelomerized scaffolds invariably do so at scaffold ends.
Scaffold to MAC Chromosome Size Ratio
[Figure: plot of scaffold to chromosome ratio (y-axis, 0.00 to 1.80) against MAC chromosome size in Mb (x-axis, 0 to 3.5); series: observed values and "0.9 & 1.1 lines"]
Estimating the number of MAC chromosomes
• 114 “closed” scaffolds (= MAC chromosomes) encompass 40% of the genome sequence in scaffolds.
• If the size distribution of these scaffolds is representative, then, by proportionality,
• The entire genome is estimated to contain ~290 MAC chromosomes.
• This number falls within the range of earlier estimates, suggesting that few, if any, MAC chromosomes are missing from the TIGR Tetrahymena sequence
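The proportionality argument above can be written out as a short calculation. This is only a sketch: the function name is hypothetical, and the ±0.5 percentage point bounds simply assume that the quoted 40% is a rounded figure.

```python
def estimate_total_chromosomes(closed_scaffolds, genome_fraction):
    """Scale the count of closed scaffolds up to the whole genome."""
    return closed_scaffolds / genome_fraction


closed = 114      # scaffolds capped with telomeres at both ends
fraction = 0.40   # share of the scaffold sequence they contain

point_estimate = estimate_total_chromosomes(closed, fraction)
# Bounds if the true fraction lies anywhere in the range that rounds to 40%.
low = estimate_total_chromosomes(closed, 0.405)
high = estimate_total_chromosomes(closed, 0.395)
print(f"{point_estimate:.0f} chromosomes (range {low:.0f} to {high:.0f})")
```

With the rounded 40% the point estimate is about 285, so the ~290 quoted on the slide presumably reflects the unrounded fraction.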
Assembly Issues
• rRNA and mitochondrial contigs are considered “repetitive” due to the higher depth of coverage
• Reran assembly in three subsets
– rRNA
– mitochondrial
– other sequences
Assembly 2
                     rRNA     Mitochondria   Major chromosomes
Scaffolds            2        1              1971
Contigs              2        1              2955
Bases in Scaffolds   12,166   45,538         103,927,049
Largest contig                45,538         715,652
Largest scaffold     12,166   45,538         2,214,258
Coverage             635x     17.85x         9.08x
Tetrahymena Genome Database
• Phenotypes associated with gene knockouts, replacements and other types of mutations.
• Gene regulation information from the literature.
• Post-translational modifications.
• Linkage & physical maps
• DNA polymorphisms
• Experimental protocols
• Links to other sites
Paul Doerder, Cleveland State
Immobilization antigens (i-ag)
• Major GPI-linked cell surface protein
– related to surface proteins of disease-causing protists
– encoded by at least 8 families of paralogs expressed under different conditions of temperature and salinity
– members of H, L, J and S families already sequenced
• Tetrahymena Genome Project:
– additional H, L, J and S paralogs and pseudogenes have been identified
– candidate I, T, M and P i-ag genes currently being tested by RT-PCR and real-time PCR
Todd Hennessey, Buffalo
• Identified ecto-ATPase that he’s been trying to clone for the past 7 years
• Making a knockout
• Identified "lysozyme receptor" that he’s been trying to clone for the past 5 years
• We screened some antisense ribosome mutants, got an interesting phenotype (extended backward swimming in Ba++), BLASTed the short antisense sequence against the database and now have 1.7 kb of sequence to use to make a knockout
Kathleen Karrer, Marquette
• We have just today had a paper accepted by Eukaryotic Cell, pending revisions, which was significantly enhanced by analysis of the database. There are two undergraduate co-authors on the paper.
Cliff Brunk, UCLA
T. thermophila genes detected by CUI
[Figure: CUI versus Gene Position; x-axis: nucleotide position (23500 to 43500); y-axis: 1000/CUI (0 to 70)]
David Asai, Harvey Mudd College
• Dynein heavy chains are very large ORFs (ca. 16 kb) and traditional cloning etc. has been a slow go.
• We were able to use the database to complete the determination of the sequence of the major cytoplasmic dynein heavy chain gene, DYH1, and we are extending our information on the second cytoplasmic dynein heavy chain, DYH2.
• Further, we have been able to walk "in silico" upstream of the DYH1 gene in order to make constructs for the N-terminal tagging of the heavy chain.
TIGR – sequences
TIGR – scaffolds
Translate in 6 reading frames using ciliate code
Use these files as databases of all known proteins in Tetrahymena thermophila in these two mass spectrometry related searching programs (in-house):
J. Smith, K. Belay, S. Beeser, A. Keuroghlian, R.E. Pearlman, K.W.M. Siu
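The six-frame translation step with the ciliate code can be sketched as below. This is not the group's in-house tooling. It is a standalone illustration that assumes the ciliate nuclear code as defined in NCBI translation table 6, where TAA and TAG encode glutamine and TGA is the only stop codon. All function names are hypothetical.

```python
BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order under the ciliate nuclear code
# (NCBI translation table 6): TAA and TAG give Q, TGA is the only stop (*).
CILIATE_AA = "FFLLSSSSYYQQCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: CILIATE_AA[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translate(dna):
    """Return translations for frames +1, +2, +3, then -1, -2, -3."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(s[offset:]) for s in strands for offset in range(3)]


frames = six_frame_translate("ATGTAATGA")
print(frames)
```

Under the standard code the first frame of this toy sequence would stop at TAA. Under the ciliate code TAA reads through as glutamine, which is why a ciliate-aware translation is needed before searching mass spectra against the genome.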
Gel approach…
• Ciliary axonemal proteins from Tetrahymena thermophila
• Excise
• Digest with trypsin
• Identify based on tryptic fingerprint using translated T. thermophila database (MS-FIT).
• Sequence individual peptides and identify using MASCOT and translated T. thermophila database.
2D LC/MS/MS approach…
• Ciliary axonemal proteins from Tetrahymena thermophila
• Digest with trypsin
• Divide into 30 fractions using SCX
• Run each fraction on a 1.5 hour reverse phase gradient (C18 column) into a mass spectrometer, acquiring a CID spectrum of each peptide in the solution.
• Identify using MASCOT and translated T. thermophila database.
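Database search programs of this kind compare observed peptides against peptides predicted from each database protein. A minimal sketch of that in-silico digestion follows, assuming the commonly used trypsin rule (cleave after K or R unless the next residue is P) and no missed cleavages. The function name and the toy sequence are illustrative, and real search engines apply more elaborate rules.

```python
def tryptic_digest(protein):
    """Split a protein into peptides, cleaving after K or R but not before P."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    # Keep any C-terminal peptide that does not end in K or R.
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


# Toy sequence, not a real Tetrahymena protein.
toy_peptides = tryptic_digest("MKRPAKLR")
print(toy_peptides)
```

Running this over each six-frame translation would give the candidate peptide list against which a fingerprint or MS/MS spectrum is matched.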
(These are different gels, not a magnification of the same gel)
Preliminary Summary (using Gel approach):
Axonemal proteins found:
• Alpha Tubulin
• Beta Tubulin
• Unnamed protein product
• Axoneme central apparatus protein
• Chain A, Tryparedoxin Ii / Thioredoxin Peroxidase / Peroxiredoxin 2 / Natural Killer Cell Enhancing Factor
• Hypothetical Protein
• Dynein, 70 kDa intermediate chain
• Calmodulin like protein / Outer dynein arm-docking complex
• Axonemal leucine-rich repeat protein
• Testes specific A2 / Meichroacidin / phosphatidylinositol-4-phosphate
• invl / putative ankyrin repeat protein / Ankyrin 3
• Calmodulin
• Radial spokehead-like protein
• Flagellar Radial Spoke protein
• ABC transporter
Membrane proteins found (tubulins found in previous experiments):
• Hypothetical Protein
• Xenobiotic reductase
• SerH3 immobilization antigen
• NADH:flavin oxidoreductase
Preliminary Analysis of the Tetrahymena Phagosome Proteome
L. Klobutcher (Univ. Connecticut Health Ctr.) & R. Pearlman (York Univ.)
[Figure label: Oral Apparatus]
PROTEINS IDENTIFIED:
*1. Vacuolar-type H+-ATPase
*2. Cathepsin B
*3. HSP 70
*4. 14-3-3 protein
5. Cytochrome b5-related protein
6. Two novel proteins
*Components of the mouse phagosome proteome (Garin et al. J. Cell Biol. 152:165, 2001)
Doug Chalker, Wash. U.
Using the genome sequence to predict genes that we are going to use this semester as the focus of an undergraduate lab class.
We are going to knock out these genes and study the phenotypes. This will bring up-to-date research techniques into the undergraduate classroom.
Marty Gorovsky, Rochester
• Expansion of a family of cysteine proteases
• Two new histone H3 genes
• One new histone H2A gene
Kapler: Gene Amplification and DNA Replication Control
rDNA minichromosome (21 kb)
• Macronuclear development: amplified 5,000-fold
• Vegetative replication: once per cell cycle
• Biochemically purified trans-acting factors: TIF1, TIF4
TIGR genome sequencing project: Bioinformatics
Immediate impact on two funded research projects
• Kapler: NIH (GMS) (Cis- and trans-acting determinants for replication and amplification of the rDNA minichromosome)
– Strong candidates identified for orthologs of Orc1,2,4,5,6, Cdc6, Mcm2-6, Cdt1
• Kapler and Orias (co-PIs): NSF (Eukaryotic Genetics) (Genetic dissection of replicons in non-rDNA chromosomes)
– Complete sequence of 16 non-rDNA minichromosomes (size range 37.4-99.5 kb)
ID new genes by blasting
• 3 new histones, including a cen-P homolog (Gorovsky)
• 16 new ciliogenesis-induced genes with known homologs (Gorovsky)
• 51 novel ciliogenesis-induced genes with no known homologs (Gorovsky)
• 55 new cysteine protease genes, only one in GenBank (Gorovsky)
• 8 strong candidates for proteins involved in replication and amplification of the rDNA minichromosome (Kapler)
• Completing the very long (~16 kb) dynein heavy chain ORFs (Asai)
• Orthologues of light chains and light intermediate chains characterized in other systems (Asai)
• 2-3 families of homing endonucleases (Karrer)
• 20 nuclear transport proteins; interest, MIC vs. MAC (Jahn)
• New heat shock proteins (Miceli)
• New stress response proteins (oxidative and UV), including some never reported in protozoa (Miceli)
• Subunits of heterotrimeric G-proteins (Miceli)
• Tetrahymenol (cholesterol surrogate) cyclase; bacterial-related, possible LGT (Matsuda)
• Many snoRNA candidate genes (Nielsen)
• New alternative family of U1-3 spliceosomal RNAs (Nielsen)
• Glutamic-dehydrogenase; regulation-wise, “missing link” between bacterial and animal GDH; lacks “off” switch, just like mutant GDH that in children causes insulin hypersecretion (Smith)
• 16 complete minichromosomes (37.5 to 99 kb) for a study of origins of replication (Kapler)
Other Ciliate Projects
• Paramecium genomic survey (Dr. Linda Sperling, Centre de Genetique Moleculaire, CNRS, France)
• European rumen ciliate cDNA project (C. Jamie Newbold, Rowett Research Institute, Aberdeen, UK)
• Oxytricha (Spirotrich ciliate) micronuclear BAC project (Laura Landweber, Princeton University)
• Ichthyophthirius EST sequencing proposal (Theodore G. Clark, Cornell University)
Relating MIC and MAC genomes
• Paired sequence tags from MAC chromosome ends adjacent to Cbs junctions
• MIC:MAC relational genetic and physical maps of sequenced DNA polymorphisms (not shown)
Physically Relating the MIC and MAC Genomes
[Figure: diagram with labels MIC, Cbs, Cbs Library, MAC]
Ordering and Orienting Tetrahymena MAC Chromosome DNA in the Micronuclear Genome: Genominoes
[Figure legend: Chromosome Breakage Junction Sequence; Scaffold Sequence]
Current state of MIC Genominoes
I’m sending you a Word document with the status before I tel-linked the 273 additional scaffold ends.
Their tel-adjacent sequence was blasted against our paired Cbs tags on Friday.
I should be able to send you a slide with longer “contigs” of scaffolds within the next couple of days (please let me know what the hard deadline is).
Fraction of the genome in Tel-linked Scaffolds
Scaffold    Number   % genome
-----------------------------
Both tels   114      40
One tel     120      35
No tel      289      25
-----------------------------
Total tel-linked scaffold ends: 348
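The totals in this table can be reproduced with a few lines of arithmetic. The sketch below is illustrative only, and the dictionary layout and variable names are hypothetical. Each scaffold with both telomeres contributes two tel-linked ends and each scaffold with one telomere contributes one.

```python
# (number of scaffolds, % of genome, telomere-capped ends per scaffold)
tel_classes = {
    "both tels": (114, 40, 2),
    "one tel": (120, 35, 1),
    "no tel": (289, 25, 0),
}

tel_linked_ends = sum(count * ends for count, _, ends in tel_classes.values())
pct_with_any_tel = sum(pct for _, pct, ends in tel_classes.values() if ends > 0)
pct_total = sum(pct for _, pct, _ in tel_classes.values())

print(f"Tel-linked scaffold ends: {tel_linked_ends}")
print(f"Genome in scaffolds with at least one telomere: {pct_with_any_tel}%")
```

The 75% figure matches the "capped on at least one end" entry in the assorted statistics, and the three classes together account for 100% of the scaffold sequence.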