Annotation of Sarcocystis neurona scaffolds
Nigel Austin, Turgay Ibrikci, Liliana Lopez Kleine, Marton Megyeri
Caribbean Training Programme on Bioinformatics, January 2010
Sarcocystis neurona
• Genus Sarcocystis: parasitic protozoa
• Form sarcocysts (tissue cysts) in the muscles of mammals, birds, and reptiles
• In humans: usually asymptomatic
• Sarcocystis neurona causes equine protozoal myeloencephalitis (EPM)
S. neurona & Related Apicomplexa
Figure: Sarcocystis neurona shown with the related apicomplexans Eimeria, Neospora and Toxoplasma
Life Cycle of S. neurona
About the Data
• Data kindly supplied by Dr. Jessica Kissinger, who had very recently obtained the genome sequence
• First analysis: 120,000 bp in 4 scaffolds
• Second analysis: 400,000 bp in 4 scaffolds
Objectives
• To annotate novel DNA sequences of S. neurona.
• Detection of coding sequences by:
  – comparison with other sequences in databases
• NB: No reference genome or other annotation was available, since the sequences were novel
Strategy for Scaffolds
• BLASTX against nr (non-redundant protein database): the translated scaffold sequence is searched against protein databases
• TBLASTX against est: the translated scaffold sequence is searched against translated EST (nucleotide) databases
• Comparison in ACT (Artemis Comparison Tool) with the most closely related organisms, Toxoplasma gondii and Neospora caninum
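BLASTX and TBLASTX both rely on translating nucleotide sequence in all six reading frames (three per strand) so that comparisons happen at the protein level; BLASTX translates only the query, TBLASTX translates both query and database. The sketch below illustrates just that translation step, assuming the standard genetic code (NCBI translation table 1). It is not the BLAST implementation, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1) listed in TCAG codon order;
# '*' marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq: str) -> str:
    """Translate whole codons from the first base; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict[str, str]:
    """Frames +1..+3 read the given strand at offsets 0..2; -1..-3 read its reverse complement."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


# Example: six_frame_translation("ATGGCCTAA")["+1"] gives "MA*" (Met, Ala, stop).
```

Each frame is then what the search compares against protein (BLASTX) or translated nucleotide (TBLASTX) targets.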
Results: BLAST Search
Results: BLAST
Program  DB   Start  End    Similarity (%)  E-value   Subject
BLASTX   nr   41446  42924  71              2.00E-16  Conserved hypothetical protein, Toxoplasma gondii
BLASTX   nr   41464  42942  42              2.00E-44  Conserved hypothetical protein, Plasmodium falciparum
BLASTX   nr   41464  42942  44              2.00E-42  Conserved hypothetical protein, Plasmodium vivax
BLASTX   nr   41464  42942  41              1.00E-37  Conserved hypothetical protein, Plasmodium berghei
BLASTX   nr   41464  42942  40              1.00E-37  Conserved hypothetical protein, Cryptosporidium muris
BLASTX   nr   41464  42942  43              1.00E-22  Conserved hypothetical protein, Cryptosporidium parvum
BLASTX   nr   10632  10992  69              6.00E-33  Putative lectin domain protein, Toxoplasma gondii
BLASTX   nr   32690  32968  66              7.00E-18  Transcript GF18541, Drosophila melanogaster
BLASTX   nr   32690  32968  66              6.00E-17  Putative acylphosphatase, Aedes aegypti
BLASTX   nr   32690  32968  69              4.00E-16  Putative acylphosphatase, Toxoplasma gondii
TBLASTX  est  1538   1840   45              5.00E-08  Xenopus mRNA (cDNA library)
TBLASTX  est  1538   1840   51              1.00E-07  Cyprinus carpio mRNA (cDNA library)
TBLASTX  est  10986  10967  82              5.00E-10  T. gondii mRNA (cDNA library)
TBLASTX  est  14716  14904  87              2.00E-33  T. gondii mRNA (cDNA library)
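The slides do not say how these hits were collected or filtered. If the searches were rerun with NCBI BLAST+ using tabular output (`-outfmt 6`, whose standard 12 columns are query, subject, percent identity, length, mismatches, gap opens, qstart, qend, sstart, send, E-value, bit score), a sketch like the one below could rank hits by E-value. The `Hit` and `parse_outfmt6` names and the 1e-5 cutoff are illustrative assumptions, not the authors' settings.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    subject: str
    pident: float
    qstart: int
    qend: int
    evalue: float

    @property
    def strand(self) -> str:
        # Translated searches report reverse-strand query hits with qstart > qend.
        return "-" if self.qstart > self.qend else "+"


def parse_outfmt6(lines, max_evalue: float = 1e-5) -> list[Hit]:
    """Keep hits at or below max_evalue, sorted from most to least significant."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        f = line.rstrip("\n").split("\t")
        hit = Hit(f[0], f[1], float(f[2]), int(f[6]), int(f[7]), float(f[10]))
        if hit.evalue <= max_evalue:
            hits.append(hit)
    return sorted(hits, key=lambda h: h.evalue)
```

Assuming the table's coordinates follow the same convention, the T. gondii EST row running from 10986 to 10967 would be read as a minus-strand match.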
ACT Results
Figure: match of a scaffold region with a conserved gene in Neospora caninum and Toxoplasma gondii (tracks: Neospora caninum; scaffolds)
Hmmm…
• No genes in 400,000 bp of DNA???
• And then…
• Expertise and experience: he was able to locate a gene
Gene Discovered!
Figure: match of the region with a conserved gene in Neospora caninum
Discovered Gene - Gene1
• The discovered gene was extended at both the 5′ and 3′ ends
• Start and stop codons were identified
• The protein sequence was deduced
• BLAST: a hypothetical protein with high similarity to one found in Neospora and Toxoplasma
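The extension step can be sketched as a simple in-frame scan: walk upstream from the matched codon to the nearest in-frame stop, take the first ATG after it as the start codon, then walk downstream to the first in-frame stop. This is a hypothetical illustration, not the procedure used on the slides; it assumes the gene lies on the given strand and ignores introns, which many apicomplexan genes contain, so a real annotation would also need splice-site evidence.

```python
from itertools import product

# Standard genetic code (NCBI table 1) in TCAG order; '*' marks stop codons.
CODE = dict(zip(("".join(c) for c in product("TCAG", repeat=3)),
                "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))
STOPS = {"TAA", "TAG", "TGA"}


def extend_to_orf(seq: str, hit_pos: int):
    """Extend a match at 0-based codon position hit_pos to a full open reading frame.

    Returns (start, end), end exclusive and including the stop codon, or None
    if no start codon or no downstream stop is found. Hypothetical helper.
    """
    seq = seq.upper()
    # 1. Walk upstream codon by codon until the preceding codon is an in-frame stop.
    i = hit_pos
    while i >= 3 and seq[i - 3:i] not in STOPS:
        i -= 3
    # 2. The first ATG between that point and the match gives the longest ORF.
    start = next((k for k in range(i, hit_pos + 1, 3) if seq[k:k + 3] == "ATG"), None)
    # 3. Walk downstream from the match to the first in-frame stop codon.
    end = next((k + 3 for k in range(hit_pos, len(seq) - 2, 3)
                if seq[k:k + 3] in STOPS), None)
    if start is None or end is None:
        return None
    return start, end


def translate(cds: str) -> str:
    """Translate a coding sequence; codons with non-ACGT characters become 'X'."""
    return "".join(CODE.get(cds[k:k + 3], "X") for k in range(0, len(cds) - 2, 3))
```

For example, with `seq = "TAACCCATGGCCAAATGAGG"` and a match at position 12, `extend_to_orf(seq, 12)` returns `(6, 18)`, and translating `seq[6:18]` gives `"MAK*"`. Minus-strand genes would be handled by reverse-complementing the scaffold first.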
Gene Comparison
Figure: match of the region with a conserved gene in Neospora caninum and Toxoplasma gondii
The neighbouring genes are not present in the scaffold.
Results: UniProt Search
Performed with Gene1
Further Protein Info
• Characterize our protein product:
  – Membrane protein? Regions of high hydrophobicity
  – Domains and motifs
  – Secondary structures
Hydrophobicity Graph
No transmembrane regions present
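The slides do not say which program or scale produced the hydrophobicity graph. Assuming a Kyte-Doolittle hydropathy plot (a common choice), the profile is a sliding-window average of per-residue hydropathy values; Kyte and Doolittle suggested that a 19-residue window averaging above about 1.6 flags a possible membrane-spanning segment. The function names below are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each full window; unknown residues score 0."""
    scores = [KD.get(aa, 0.0) for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_windows(protein: str, window: int = 19,
                         threshold: float = 1.6) -> list[int]:
    """0-based start positions of windows whose mean exceeds the threshold."""
    return [i for i, v in enumerate(hydropathy_profile(protein, window)) if v > threshold]
```

An empty result from `candidate_tm_windows` would be consistent with the slide's observation that no transmembrane regions are present; dedicated predictors would give a stronger answer than this simple average.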
Domains & Motifs
Conclusion
• Various BLAST searches can help locate orthologous genes in other genomes
• ACT is a very useful tool for gene discovery and annotation (along with experience and expertise)
• Only one gene (Gene1) was found in 400 kb of DNA; the scaffolds may lie in a gene-poor region of the genome
• Gene1 is perhaps orthologous to a gene in Toxoplasma and Neospora
• Hypothetical gene: no function has been ascribed to it
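The orthology claim above is hedged. One widely used heuristic for supporting such a claim, which the slides do not report applying, is the reciprocal best hit: Gene1's best BLAST hit in Toxoplasma should, when searched back, return Gene1 as its own best hit. A minimal sketch, assuming hits are already available as (query, subject, E-value) tuples; all identifiers are hypothetical.

```python
def best_hits(hits) -> dict[str, str]:
    """Map each query to the subject with the lowest E-value."""
    best: dict[str, tuple[str, float]] = {}
    for query, subject, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(forward, reverse) -> set[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit in the reverse search."""
    a_to_b = best_hits(forward)
    b_to_a = best_hits(reverse)
    return {(a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a}


# Hypothetical usage: "TgHyp" stands in for a Toxoplasma hypothetical protein.
forward = [("Gene1", "TgHyp", 1e-40), ("Gene1", "TgOther", 1e-10)]
reverse = [("TgHyp", "Gene1", 1e-40)]
pairs = reciprocal_best_hits(forward, reverse)
```

With only a few S. neurona scaffolds available, the reverse search could only be run against those scaffolds, which makes this test weaker than it would be against a complete genome.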
Thank You!!!