Elements of Bioinformatics (14F001)

TP2: Gene prediction, 22 October 2012

CORRECTIONS

Notice:

During this practical, you will need to use ‘raw’ and ‘fasta’ sequence formats.

For additional information on the different sequence formats available, please have a look at http://www.genomatix.de/online_help/help/sequence_formats.html

ncRNA gene prediction

Choose the eukaryotic tRNA model: the general tRNA model does not give any result!

CpG island prediction

CpG island in the C. elegans cosmid

Length: 219 bp; positions 21954 to 22172

cgttttctgtggtcaca cacgagtatc cggatcttct ggatcaactt gttctcgtct gcaacgtctt tgcaagaatg gcaccagaac agaaacaact actcgtggaa caccttcaag acgttgggca gacggtcgct atgtgtggcg atggagctaa tgattgtgct gctctgaaag cagctcacgc gggaatctca ctatcggagg ctgaagcatc ga

To confirm that this sequence could be part of a promoter (> 80 % of CpG islands extend into the 5’ flanking region of the associated gene), check from its position whether this CpG island lies in a gene promoter region (see later).
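The composition criteria behind CpG island prediction can be checked by hand on the island above. The sketch below is minimal. It assumes the commonly cited thresholds (length ≥ 200 bp, GC ≥ 50 %, observed/expected CpG ≥ 0.6), and its function names are illustrative. The predictor used in the practical may apply other rules, such as sliding windows or different cut-offs.

```python
# Minimal sketch: GC content and observed/expected CpG ratio of a candidate island.
# Default thresholds are the commonly cited ones; the CpG predictor used in the
# practical may apply different rules (assumption).

ISLAND = """
cgttttctgtggtcaca cacgagtatc cggatcttct ggatcaactt gttctcgtct gcaacgtctt
tgcaagaatg gcaccagaac agaaacaact actcgtggaa caccttcaag acgttgggca gacggtcgct
atgtgtggcg atggagctaa tgattgtgct gctctgaaag cagctcacgc gggaatctca ctatcggagg
ctgaagcatc ga
"""


def cpg_stats(seq: str) -> dict:
    """Length, GC fraction and observed/expected CpG ratio of a DNA sequence."""
    s = "".join(seq.split()).upper()  # remove spaces/newlines, normalise case
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    # Obs/Exp CpG = (number of CpG * length) / (number of C * number of G)
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return {"length": n, "gc": (c + g) / n if n else 0.0, "obs_exp": obs_exp}


def is_cpg_island(stats: dict, min_len=200, min_gc=0.5, min_obs_exp=0.6) -> bool:
    """Apply simple length / GC / Obs-Exp thresholds."""
    return (stats["length"] >= min_len
            and stats["gc"] >= min_gc
            and stats["obs_exp"] >= min_obs_exp)


stats = cpg_stats(ISLAND)
print(stats, is_cpg_island(stats))
```

Composition alone says nothing about the island's position relative to genes. That is why its location still has to be checked.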

Gene prediction with HMM on the complete cosmid sequence

[Figure: HMM predictions on the complete cosmid: Gene 1 to Gene 4 (one CDS flagged "Wrong CDS?"); 3 HMM models: firstex, exon_n, lastex; tRNA at positions 169-238.]

Predicted CpG island: 21954-22172, in the middle of CDS 4: not a ‘classical’ CpG island (not in the 5’ region of a gene).
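The positional reasoning above (promoter-like if in the 5’ flanking region, not classical if inside the gene) can be written as a simple interval test. This is a hypothetical helper. The 2000-bp flank size is an arbitrary assumption, and the gene coordinates in the usage line are made up. Only the island position comes from the practical.

```python
# Hypothetical helper: is a CpG island in the 5' flanking region of a gene, or inside it?
# Coordinates are 1-based and inclusive; the flank size is an arbitrary assumption.


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    return a_start <= b_end and b_start <= a_end


def classify_island(island: tuple, gene: tuple, strand: str, flank: int = 2000) -> str:
    i_start, i_end = island
    g_start, g_end = gene
    # The 5' end of a gene is at its lowest coordinate on '+', highest on '-'.
    if strand == "+":
        five_prime = (g_start - flank, g_start - 1)
    else:
        five_prime = (g_end + 1, g_end + flank)
    # An island touching the 5' flank counts as promoter-like even if it
    # also extends into the gene body.
    if overlaps(i_start, i_end, *five_prime):
        return "5' flanking (promoter-like)"
    if overlaps(i_start, i_end, g_start, g_end):
        return "inside gene"
    return "elsewhere"


# Island position from the practical; the gene coordinates are made up.
print(classify_island((21954, 22172), (20000, 25000), "+"))
```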

Summary:

Gene 1

Gene 1 prediction with HMMgene

One gene found

Gene 1 prediction with HMMgene

With the ‘human’ model: 2 genes found, one on each strand (the minus-strand gene has lower scores). The programs are ‘trained’ on sequences from specific organisms; codon bias, for example, differs between species.

Examples of codon usage tables (codon bias): http://www.kazusa.or.jp/codon/
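A codon usage table is built by counting codons in known coding sequences. The sketch below reports frequencies per thousand codons, in the style of the Kazusa tables. It assumes the input is an in-frame CDS starting at a codon boundary, and the function name is illustrative.

```python
from collections import Counter


def codon_usage(cds: str) -> dict:
    """Codon frequencies per thousand codons for an in-frame coding sequence.

    Assumes `cds` starts at the first base of a codon; trailing bases that do
    not complete a codon are ignored.
    """
    s = "".join(cds.split()).upper()
    codons = [s[i:i + 3] for i in range(0, len(s) - 2, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: 1000 * n / total for codon, n in sorted(counts.items())}


usage = codon_usage("ATG GCC GCC TAA")
print(usage)  # GCC accounts for half of the codons in this toy CDS
```

Running the same function on CDSs from two species would show the species-specific codon bias that gene predictors are trained on.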

Gene 1 prediction with NetGene2

NetGene2 gives the positions of the first and last nucleotides of each intron (donor and acceptor splice sites).

[Figure: an intron starting with GT at the donor site and ending with AG at the acceptor site.]
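The GT...AG rule shown in the figure can be checked directly on a candidate intron. This minimal sketch assumes forward-strand, 1-based inclusive coordinates, like the first and last intron nucleotides reported by NetGene2. A minus-strand intron would first need to be reverse-complemented. Rarer non-canonical introns (e.g. GC-AG) are rejected by this test.

```python
def check_gt_ag(genome: str, intron_start: int, intron_end: int) -> bool:
    """True if the intron (1-based, inclusive, forward strand) follows GT...AG."""
    intron = genome[intron_start - 1:intron_end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


toy = "AAAGTCCCCAGTTT"  # made-up sequence: the intron is positions 4..11
print(check_gt_ag(toy, 4, 11))
```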

Gene 1 prediction with GeneBuilder (organism: no choice other than human; option: first and last exon disabled)

Matrix: miscellaneous

One gene found

Gene 1 prediction with GenScan (no organism choice except vertebrate, maize and Arabidopsis!)

Two genes found

FGENESH

One gene found

Summary (gene prediction)

[Figure: Gene 1 as predicted by HMMgene, GeneBuilder and NetGene2, compared with GenScan (organism = human!), GeneMark (which finds a second gene in 3’!) and FGENESH (one gene, plus another potential gene from positions 2000 to 2900). Positions shown include 977, 1003, 1083, 1305, 1406, 1452, 1557, 1661, 1914, 1997 and 2000.]

Splice sites (DO: donor site, AC: acceptor site; scores in parentheses): DO 1084 (1.00), AC 1304 (0.77), DO 1407 (0.89), AC 1451 (0.90), DO 1662 (1.00), AC 1913 (1.00)
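Since NetGene2 reports the first and last nucleotide of each intron, exon intervals follow from the donor (DO) and acceptor (AC) positions listed in the summary. In the sketch below, the function name is hypothetical. The CDS limits 1003 and 1997 are an assumption: they are two of the positions read off the summary figure, not confirmed start and stop coordinates.

```python
def exons_from_splice_sites(cds_start, cds_end, donors, acceptors):
    """Derive exon intervals (1-based, inclusive) from intron boundaries.

    Follows the NetGene2 convention described above: a donor position is the
    first nucleotide of an intron and an acceptor position its last one.
    """
    donors, acceptors = sorted(donors), sorted(acceptors)
    if len(donors) != len(acceptors):
        raise ValueError("expected one acceptor per donor")
    exons, start = [], cds_start
    for donor, acceptor in zip(donors, acceptors):
        if not start < donor < acceptor < cds_end:
            raise ValueError(f"inconsistent intron {donor}-{acceptor}")
        exons.append((start, donor - 1))  # exon ends just before the GT
        start = acceptor + 1              # next exon starts just after the AG
    exons.append((start, cds_end))
    return exons


# Splice sites from the summary; 1003 and 1997 are ASSUMED CDS limits.
exons = exons_from_splice_sites(1003, 1997, [1084, 1407, 1662], [1304, 1451, 1913])
cds_length = sum(end - start + 1 for start, end in exons)
print(exons, cds_length, cds_length % 3 == 0)
```

Summing the exon lengths gives a quick consistency check: a complete CDS must be a multiple of 3 nucleotides long.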

ID FGENESH Unreviewed; 159 AA.
SQ SEQUENCE 159 AA; 17780 MW; F9A2C7DE9614425C CRC64;
MKVETCVYSG YKIHPGHGKR LVRTDGKVQI FLSGKALKGA KLRRNPRDIR WTVLYRIKNK KGTHGQEQVT RKKTKKSVQV VNRAVAGLSL DAILAKRNQT EDFRRQQREQ AAKIAKDANK
AVRAAKAAAN KEKKASQPKT QQKTAKNVKT AAPRVGGKR
//

ID GENESCAN1 Unreviewed; 159 AA.
SQ SEQUENCE 159 AA; 17780 MW; F9A2C7DE9614425C CRC64;
MKVETCVYSG YKIHPGHGKR LVRTDGKVQI FLSGKALKGA KLRRNPRDIR WTVLYRIKNK KGTHGQEQVT RKKTKKSVQV VNRAVAGLSL DAILAKRNQT EDFRRQQREQ AAKIAKDANK
AVRAAKAAAN KEKKASQPKT QQKTAKNVKT AAPRVGGKR
//

ID GENESCAN2 Unreviewed; 202 AA.
SQ SEQUENCE 202 AA; 23684 MW; 98A69FA21823F2F3 CRC64;
MRTLRIAQYS VLTVGFAIYM YRLIEEIPID IRNLNSDSLE GIINSDELCD VTVSNRNRGL LVRNDSLDLD ILKAKFTTFF SKRYLTRFLS EQVPFLHVID EALLVKRFVM CACFMVFCLT VIWFLVIRRM GNLIKRLSVL NQLEDAESVE WARCIREFTQ EKLAVLCFCI VPPFAQTDKL
VSDKIKLFRE HKILRIRSVQ HI
//

ID GENEMARK1 Unreviewed; 184 AA.
SQ SEQUENCE 184 AA; 20255 MW; 85BB0234E6C14EA0 CRC64;
MGRCGSSGKR DGYGAKDSSS EGLSTMKVET CVYSGYKIHP GHGKRLVRTD GKVQIFLSGK ALKGAKLRRN PRDIRWTVLY RIKNKKGTHG QEQVTRKKTK KSVQVVNRAV AGLSLDAILA KRNQTEDFRR QQREQAAKIA KDANKAVRAA KAAANKEKKA SQPKTQQKTA KNVKTAAPRV
GGKR
//

ID GENEMARK2 Unreviewed; 183 AA.
SQ SEQUENCE 183 AA; 21336 MW; 64F65D472A58046E CRC64;
MRTLRIAQYS VLTVGFAIYM YRLIEEIPID IRNLNSDSLE GIINSDELCD VTVSNRNRGL LVRNDSLDLD ILKAKFTTFF SKRYLTRFLS EQVPFLHVID EALLVKRFVM CACFMVFCLT VIWFLVIRRM GNLIKRLSVL NQLEDAESVE WARCIREFTQ EKLAVLCFCI VPPFAQTDNV
QHI
//
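The predicted proteins listed above can be compared programmatically rather than by eye. FGENESH and GENESCAN1 are already identical: they have the same length and the same CRC64. The helper names below are illustrative. The sketch tests whether one prediction is an N-terminal extension of another, and otherwise reports how many residues they share from the N-terminus.

```python
import os

FGENESH = """MKVETCVYSG YKIHPGHGKR LVRTDGKVQI FLSGKALKGA KLRRNPRDIR WTVLYRIKNK
KGTHGQEQVT RKKTKKSVQV VNRAVAGLSL DAILAKRNQT EDFRRQQREQ AAKIAKDANK
AVRAAKAAAN KEKKASQPKT QQKTAKNVKT AAPRVGGKR"""

GENEMARK1 = """MGRCGSSGKR DGYGAKDSSS EGLSTMKVET CVYSGYKIHP GHGKRLVRTD GKVQIFLSGK
ALKGAKLRRN PRDIRWTVLY RIKNKKGTHG QEQVTRKKTK KSVQVVNRAV AGLSLDAILA
KRNQTEDFRR QQREQAAKIA KDANKAVRAA KAAANKEKKA SQPKTQQKTA KNVKTAAPRV GGKR"""

GENESCAN2 = """MRTLRIAQYS VLTVGFAIYM YRLIEEIPID IRNLNSDSLE GIINSDELCD VTVSNRNRGL
LVRNDSLDLD ILKAKFTTFF SKRYLTRFLS EQVPFLHVID EALLVKRFVM CACFMVFCLT
VIWFLVIRRM GNLIKRLSVL NQLEDAESVE WARCIREFTQ EKLAVLCFCI VPPFAQTDKL
VSDKIKLFRE HKILRIRSVQ HI"""

GENEMARK2 = """MRTLRIAQYS VLTVGFAIYM YRLIEEIPID IRNLNSDSLE GIINSDELCD VTVSNRNRGL
LVRNDSLDLD ILKAKFTTFF SKRYLTRFLS EQVPFLHVID EALLVKRFVM CACFMVFCLT
VIWFLVIRRM GNLIKRLSVL NQLEDAESVE WARCIREFTQ EKLAVLCFCI VPPFAQTDNV QHI"""


def clean(seq: str) -> str:
    """Drop spaces, newlines and any '//' terminator from a listed sequence."""
    return "".join(seq.replace("//", "").split()).upper()


def compare_predictions(p1: str, p2: str) -> tuple:
    """Describe how p2 relates to p1."""
    a, b = clean(p1), clean(p2)
    if a == b:
        return ("identical", 0)
    if b.endswith(a):
        # p2 = extra N-terminal residues + p1 (e.g. an upstream start codon/exon)
        return ("n_terminal_extension", len(b) - len(a))
    # Otherwise report how many residues are shared from the N-terminus.
    return ("diverge_after", len(os.path.commonprefix([a, b])))


print(compare_predictions(FGENESH, GENEMARK1))
print(compare_predictions(GENEMARK2, GENESCAN2))
```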

For fun…

Compare the predictions of the same program (GeneMark) run with different parameters (HMM trained on eukaryotes or on prokaryotes).

Gene 1 prediction with GeneMark (prokaryote-specific model; E. coli K12): two genes found (Protein 1 and Protein 2).

Gene 1 prediction with GeneMark (prokaryote-specific model)

Each CDS corresponds roughly to an ‘exon’: there are no introns in prokaryotes!
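GeneMark's prokaryotic model is an HMM, not a simple scanner. The point of the slide can still be illustrated with a naive open-reading-frame finder: without introns, a prokaryotic CDS is one uninterrupted reading frame from a start codon to the next in-frame stop. This sketch makes several assumptions. It scans the forward strand only and uses ATG as the only start (real bacterial genes also start with GTG or TTG). It uses the standard stop codons and an arbitrary minimum length. The function name is illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 50) -> list:
    """Naive forward-strand ORF finder: first ATG ... next in-frame stop.

    Returns sorted (start, end) pairs, 1-based inclusive, with the stop codon
    included in the interval. ORFs without a stop codon are ignored.
    """
    s = "".join(seq.split()).upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(s) - 2, 3):
            codon = s[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF (gives the longest ORF)
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start + 1, i + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGCCCTGAGG", min_codons=1))
```

Applied to a eukaryotic gene, such a scanner splits the CDS wherever an intron introduces an in-frame stop. This gives one way to see why a prokaryote-trained model may report the region as two shorter genes (Protein 1 and Protein 2).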

Summary (prokaryote gene prediction)

[Figure: the Gene 1 predictions of HMMgene, GeneBuilder and NetGene2 (same positions and splice sites as in the gene prediction summary) compared with GenScan, GeneMark with the prokaryote model (Protein 1 and Protein 2) and GeneMark with the eukaryote model. Additional positions shown include 1254, 1433, 1437, 1557 and 1688.]

Alignment between the sequences predicted with the eukaryote and prokaryote models

Gene prediction: similarity searches with ESTs

ESTs (expressed sequence tags): cDNAs sequenced rapidly, in a single pass, and therefore of low quality

[Figure: BLAST results from 2012 and from 2010, each showing Gene A and Gene B: two genes found.]

EST1

>gi|47590759|gb|BJ750997.1|BJ750997 BJ750997 unpublished oligo-capped cDNA library Caenorhabditis elegans cDNA clone yk1360e06 5', mRNA sequence
GGTTTAATTACCCAAGTTTGAGATTCGTCAAGCGAGGGCCTATCAGCAATGAAGGTCGAAACCTGCGTTTACTCCGGATACAAGATCCACCCAGGACACGGAAAGAGACTTGTCCGTACTGACGGAAAGGTGAGTTCAGTTTCTCTTTGAAAGGCGTTAGCATGCTGTTAGAGCTCGTAAGGTATATTGTAATTTTACGAGTGTTGAAGTATTGCAAAAGTAAAGCATAATCACCTTATGTATGTGTTGGTGCTATATCTTCTAGTTTTTAGAAGTTATACCATCGTTAAGCATGCCACGTGTTGAGTGCGACAAACTACCGTTTCATGATTTATTTATTCAAATTTCAGGTCCAAATCTTCCTCAGTGGAAAGGCACTCAAGGGAGCCAAGCTTCGCCGTAACCCACGTGACATCAGATGGACTGTCCTCTACAGAATCAAGAACAAGAAGGGAACCCACGGACAAGAGCAAGTCACCAGAAAGAAGACCAAGAAGTCCGTCCAGGTTGTTAACCGCGCCGTCGCTGGACTTTCCCTTGATGCTATCCTTGCCAAGAGAAACCAGACCGAAGACTTCCGTCGCCAACAGCGTGAACAAGCCGCTAAGATCGCCAA

EST2

>gi|47646579|gb|BJ775052.1|BJ775052 BJ775052 unpublished oligo-capped cDNA library Caenorhabditis elegans cDNA clone yk1360e06 3', mRNA sequence
ATAACGGGACCGAGAACGTTTATCGCTTTCCTCCGACACGTGGAGCAGCAGTCTTCACATTCTTGGCGGTCTTTTGCTGGGTCTTTGGCTGAGAGGCCTTCTTTTCCTTGTTGGCAGCAGCCTTGGCGGCACGGACAGCCTTGTTGGCATCCTTGGCGATCTTAGCGGCTTGTTCACGCTGTTGGCGACGGAAGTCTTCGGTCTGGTTTCTCTTGGCAAGGATAGCATCAAGGGAAAGTCCAGCGACGGCGCGGTTAACAACCTGGACGGACTTCTTGGTCTTCTTTCTGGTGACTTGCTCTTGTCCGTGGGTTCCCTTCTTGTTCTTGATTCTGTAGAGGACAGTCCATCTGATGTCACGTGGGTTACGGCGAAGCTTGGCTCCCTTGAGTGCCTTTCCACTGAGGAAGATTTGGACCTGAAATTTGAATAAATAAATCATGAAACGGTAGTTTGTCGCACTCAACACGTGGCATGCTTAACGATGGTATAACTTCTAAAAACTAGAAGATATAGCACCAACACATACATAAGGTGATTATGCTTTACTTTTGCAATACTTCAACACTCGTAAAATTACAATATACCTTACGAGCTCTAACAGCATGCTAACGCCTTTCAAAGAGAAACTGAACTCACCTTTCCGTCAGTACGGACAAGTCTCTTTCCGTGTCCTGGGTGGATCTTGTATCCGGAGTAAACGCAGGTTTCGACCTTCATTGCTGATANGCCCTCGCTTGACGAATCTCAAACTTGGGTAATTAAACCCCA

EST3

>gi|47727995|gb|BJ818152.1|BJ818152 BJ818152 unpublished oligo-capped cDNA library, stage L4 Caenorhabditis elegans cDNA clone yk1685h11 3', mRNA sequence
TAACGGGACCGAGAACGTTTATCGCTTTCCTCCGACACGTGGAGCAGCAGTCTTCACATTCTTGGCGGTC
TTTTGCTGGGTCTTTGGCTGAGAGGCCTTCTTTTCCTTGTTGGCAGCAGCCTTGGCGGCACGGACAGCCT
TGTTGGCATCCTTGGCGATCTTAGCGGCTTGTTCACGCTGTTGGCGACGGAAGTCTTCGGTCTGGTTTCT
CTTGGCAAGGATAGCATCAAGGGAAAGTCCAGCGACGGCGCGGTTAACAACCTGGACGGACTTCTTGGTC
TTCTTTCTGGTGACTTGCTCTTGTCCGTGGGTTCCCTTCTTGTTCTTGATTCTGTAGAGGACAGTCCATC
TGATGTCACGTGGGTTACGGCGAAGCTTGGCTCCCTTGAGTGCCTTTCCACTGAGGAAGATTTGGACCTT
TCCGTCAGTACGGACAAGTCTCTTTCCGTGTCCTGGGTGGATCTTGTATCCGGAGTAAACGCAGGTTTCG
ACCTTCATTGTTGATAGGCCCTCGCTTGACGAATCTCAAACTTGGGTAATTAAACCTACAAATAAAAATG
AGATAAAGCATACTGCCATTCTACAACCGGAGAATAAGAAAACCGAAAACGAGAAAATTATTCTATTATG
ACAGATAGAATAAGTTAAAATGGGAAGAGTGCATTTGTCACTGATTTACTTGGTGACTTGGTGGAGAGCG
TGGGCAAGGTAAGCGACATTGTTCGATGAA

Gene A

975-1407 1450-1615 1692-1865

Blast result with EST1

But BLAST does not take intron-exon boundaries into account when aligning genomic DNA with an mRNA (cDNA), so a dedicated spliced-alignment tool is needed: SIM4.
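A spliced aligner reports an EST as a series of blocks on the genome. The gaps between consecutive blocks are the candidate introns. The sketch below shows only that last bookkeeping step, not the alignment itself. Its assumptions are a hypothetical function name and an arbitrary 20-nt minimum intron length (shorter gaps are treated as indels). It also assumes the Gene A ranges 975-1407, 1450-1615 and 1692-1865 listed in this practical can be used as example blocks; whether they are SIM4 exon blocks is not stated.

```python
def introns_from_blocks(blocks, genome=None, min_intron=20):
    """Infer introns from the genomic blocks of a spliced EST alignment.

    blocks: (start, end) genomic intervals, 1-based inclusive, sorted.
    genome: optional forward-strand sequence used to report the boundary motif.
    Gaps shorter than min_intron are treated as indels, not introns.
    """
    introns = []
    for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]):
        gap_start, gap_end = prev_end + 1, next_start - 1
        if gap_end - gap_start + 1 < min_intron:
            continue
        motif = None
        if genome is not None:
            intron = genome[gap_start - 1:gap_end].upper()
            motif = intron[:2] + "-" + intron[-2:]  # 'GT-AG' for canonical introns
        introns.append((gap_start, gap_end, motif))
    return introns


print(introns_from_blocks([(975, 1407), (1450, 1615), (1692, 1865)]))
```

A real spliced aligner also shifts block edges so that each gap starts with GT and ends with AG. This is exactly what plain BLAST does not do.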

The third part of EST1 is of very poor quality.

SIM4 alignment

Example with EST1 BJ750997 (partial)

The third part of EST1 is of very poor quality and is not aligned by SIM4, so EST1 is considered partial!

EST3 BJ818152

SIM4 alignment results

EST1 BJ750997 (partial)

EST2 BJ775052

Summary (ESTs)

[Figure: alignments of EST1 BJ750997.1 (partial), EST2 BJ775052.1 and EST3 BJ818152.1 on Gene A; positions shown include 1003, 1083, 1305, 1406, 1452, 1615, 1661, 1914 and 1997.]

Alternative splicing event (intron retention): 2 different mRNAs (EST BJ750997.1 is partial).

Gene A

Translation and BLASTp

(Beware of the EST sequence orientation!)
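The orientation warning matters because the 3’ ESTs listed here (EST2, EST3) are given in the opposite orientation to the 5’ EST1. For example, the end of EST2 is the reverse complement of the start of EST1. They must therefore be reverse-complemented before translation. The sketch below is a minimal six-frame translator using the standard genetic code (NCBI table 1). Its function names are illustrative, and ambiguous bases such as N translate to X.

```python
BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Codon -> amino acid, enumerating the bases in TCAG order (standard code)
CODON_TABLE = {
    a + b + c: STANDARD_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def clean(seq: str) -> str:
    return "".join(seq.split()).upper()


def reverse_complement(seq: str) -> str:
    return clean(seq).translate(COMPLEMENT)[::-1]


def translate(seq: str, frame: int = 0) -> str:
    """Translate from `frame` (0, 1 or 2); unknown codons (e.g. with N) give X."""
    s = clean(seq)
    return "".join(CODON_TABLE.get(s[i:i + 3], "X") for i in range(frame, len(s) - 2, 3))


def six_frame_translation(seq: str) -> dict:
    fwd = clean(seq)
    rev = reverse_complement(fwd)
    frames = {f"+{f + 1}": translate(fwd, f) for f in range(3)}
    frames.update({f"-{f + 1}": translate(rev, f) for f in range(3)})
    return frames


for name, protein in six_frame_translation("TTAGGCCAT").items():
    print(name, protein)
```

For a 3’ EST, the informative frames are usually the "-" frames; the longest stretch without "*" is the candidate to submit to BLASTp.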

>gi|47590759|gb|BJ750997.1|BJ750997 BJ750997 unpublished oligo-capped cDNA library Caenorhabditis elegans cDNA clone yk1360e06 5', mRNA sequence
GGTTTAATTACCCAAGTTTGAGATTCGTCAAGCGAGGGCCTATCAGCAATGAAGGTCGAAACCTGCGTTT
ACTCCGGATACAAGATCCACCCAGGACACGGAAAGAGACTTGTCCGTACTGACGGAAAGGTGAGTTCAGT
TTCTCTTTGAAAGGCGTTAGCATGCTGTTAGAGCTCGTAAGGTATATTGTAATTTTACGAGTGTTGAAGT
ATTGCAAAAGTAAAGCATAATCACCTTATGTATGTGTTGGTGCTATATCTTCTAGTTTTTAGAAGTTATA
CCATCGTTAAGCATGCCACGTGTTGAGTGCGACAAACTACCGTTTCATGATTTATTTATTCAAATTTCAG
GTCCAAATCTTCCTCAGTGGAAAGGCACTCAAGGGAGCCAAGCTTCGCCGTAACCCACGTGACATCAGAT
GGACTGTCCTCTACAGAATCAAGAACAAGAAGGGAACCCACGGACAAGAGCAAGTCACCAGAAAGAAGAC
CAAGAAGTCCGTCCAGGTTGTTAACCGCGCCGTCGCTGGACTTTCCCTTGATGCTATCCTTGCCAAGAGA
AACCAGACCGAAGACTTCCGTCGCCAACAGCGTGAACAAGCCGCTAAGATCGCCAA

EST1

MIYLFKFQVQIFLSGKALKGAKLRRNPRDIRWTVLYRIKNKKGTHGQEQVTRKKTKKSVQ

VVNRAVAGLSLDAILAKRNQTEDFRRQQREQAAKIA

Blastp results

>gi|47646579|gb|BJ775052.1|BJ775052 BJ775052 unpublished oligo-capped cDNA library Caenorhabditis elegans cDNA clone yk1360e06 3', mRNA sequence
ATAACGGGACCGAGAACGTTTATCGCTTTCCTCCGACACGTGGAGCAGCAGTCTTCACATTCTTGGCGGT
CTTTTGCTGGGTCTTTGGCTGAGAGGCCTTCTTTTCCTTGTTGGCAGCAGCCTTGGCGGCACGGACAGCC
TTGTTGGCATCCTTGGCGATCTTAGCGGCTTGTTCACGCTGTTGGCGACGGAAGTCTTCGGTCTGGTTTC
TCTTGGCAAGGATAGCATCAAGGGAAAGTCCAGCGACGGCGCGGTTAACAACCTGGACGGACTTCTTGGT
CTTCTTTCTGGTGACTTGCTCTTGTCCGTGGGTTCCCTTCTTGTTCTTGATTCTGTAGAGGACAGTCCAT
CTGATGTCACGTGGGTTACGGCGAAGCTTGGCTCCCTTGAGTGCCTTTCCACTGAGGAAGATTTGGACCT
GAAATTTGAATAAATAAATCATGAAACGGTAGTTTGTCGCACTCAACACGTGGCATGCTTAACGATGGTA
TAACTTCTAAAAACTAGAAGATATAGCACCAACACATACATAAGGTGATTATGCTTTACTTTTGCAATAC
TTCAACACTCGTAAAATTACAATATACCTTACGAGCTCTAACAGCATGCTAACGCCTTTCAAAGAGAAAC
TGAACTCACCTTTCCGTCAGTACGGACAAGTCTCTTTCCGTGTCCTGGGTGGATCTTGTATCCGGAGTAA
ACGCAGGTTTCGACCTTCATTGCTGATANGCCCTCGCTTGACGAATCTCAAACTTGGGTAATTAAACCCC
A

EST2

MIYLFKFQVQIFLSGKALKGAKLRRNPRDIRWTVLYRIKNKKGTHGQEQVTRKKTKKSVQ VVNRAVAGLSLDAILAKRNQTEDFRRQQREQAAKIAKDANKAVRAAKAAANKEKKASQPK

TQQKTAKNVKTAAPRVGGKR

Blastp results

>gi|47727995|gb|BJ818152.1|BJ818152 BJ818152 unpublished oligo-capped cDNA library, stage L4 Caenorhabditis elegans cDNA clone yk1685h11 3', mRNA sequence
TAACGGGACCGAGAACGTTTATCGCTTTCCTCCGACACGTGGAGCAGCAGTCTTCACATTCTTGGCGGTC
TTTTGCTGGGTCTTTGGCTGAGAGGCCTTCTTTTCCTTGTTGGCAGCAGCCTTGGCGGCACGGACAGCCT
TGTTGGCATCCTTGGCGATCTTAGCGGCTTGTTCACGCTGTTGGCGACGGAAGTCTTCGGTCTGGTTTCT
CTTGGCAAGGATAGCATCAAGGGAAAGTCCAGCGACGGCGCGGTTAACAACCTGGACGGACTTCTTGGTC
TTCTTTCTGGTGACTTGCTCTTGTCCGTGGGTTCCCTTCTTGTTCTTGATTCTGTAGAGGACAGTCCATC
TGATGTCACGTGGGTTACGGCGAAGCTTGGCTCCCTTGAGTGCCTTTCCACTGAGGAAGATTTGGACCTT
TCCGTCAGTACGGACAAGTCTCTTTCCGTGTCCTGGGTGGATCTTGTATCCGGAGTAAACGCAGGTTTCG
ACCTTCATTGTTGATAGGCCCTCGCTTGACGAATCTCAAACTTGGGTAATTAAACCTACAAATAAAAATG
AGATAAAGCATACTGCCATTCTACAACCGGAGAATAAGAAAACCGAAAACGAGAAAATTATTCTATTATG
ACAGATAGAATAAGTTAAAATGGGAAGAGTGCATTTGTCACTGATTTACTTGGTGACTTGGTGGAGAGCG
TGGGCAAGGTAAGCGACATTGTTCGATGAA

EST3

EST1 is partial at the C-terminus

Gene A

EST1 is partial. EST3 corresponds to the UniProtKB/Swiss-Prot RL24_CAEEL sequence.

Gene A

Some prediction programs give the correct protein sequence. None predicted the alternative splicing event (EST2: retention of intron 1084-1304).

Gene A

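Summary (ESTs)

[Figure: alignments of EST BJ775052.1 and EST BJ818152 on Gene A; positions shown include 1003, 1083, 1305, 1406, 1452, 1661, 1914 and 1997.]

Alternative splicing event (intron retention): 2 different mRNAs, encoding proteins starting MKVET… (1010) and MIYLF… (1284).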

Gene A

Gene 1 is on C. elegans chromosome I

BLAT results

[Figure: BLAT hits for Gene A and Gene B; isoform 2 corresponds to EST2.]

>NP_491399 length=159
MKVETCVYSGYKIHPGHGKRLVRTDGKVQIFLSGKALKGAKLRRNPRDIR
WTVLYRIKNKKGTHGQEQVTRKKTKKSVQVVNRAVAGLSLDAILAKRNQT
EDFRRQQREQAAKIAKDANKAVRAAKAAANKEKKASQPKTQQKTAKNVKT
AAPRVGGKR

RefSeq sequence

InterPro scan results: the protein contains a ribosomal L24e domain

Conclusions (1)

There are 2 different protein sequences due to alternative splicing: the shorter isoform results from intron retention and is rarely expressed (only 2 ESTs).

Gene A

Conclusions (2)

Gene prediction programs cannot predict an alternative splicing event (they can only predict alternative splice junctions).

The protein (Gene A) is a ribosomal protein which belongs to the ribosomal protein L24e family (UniProtKB/Swiss-Prot O01868).

The alternatively spliced sequence is not yet in the protein sequence databases, because it is ‘derived’ from EST sequences, which are submitted to public DNA/RNA databases without an annotated CDS.

Non-coding region analysis

3’ end of chromosome Y (EMBL #AJ271736)

Example of an Alu sequence

Gene 2

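Summary diagram

[Figure: Gene 2 (5’ to 3’) as predicted by HMMgene and NetGene2: three exons (Exon 1, Exon 2, Exon 3) with boundaries at 1112, 1407, 1637 and 1688; other positions shown include 1410, 1557, 1636 and 1845.]

NetGene2 splice sites (DO: donor site, AC: acceptor site; scores in parentheses): AC 1112 (0.56), DO 1409 (0.92), DO 1556 (0.96), AC 1637 (0.61)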

GeneBuilder prediction is not confirmed anywhere else

CDS2 (3 exons)

RefSeq NP_491393 (AF272397); UniProtKB/TrEMBL: G5EC89

237 AA; 3 exons
MMMEYGGYFS SSAVAQQSGD VPTTAPSAVT NSFFYTPQSH NIYHQYATPY LQSGRALTTA HNTSSSSAGN STSSSSSSSN YRNTTHDSLQ AFFNTGLQYQ LYQKSQLIGS DTIQRTSSNV LNGLPRSSLV GALCSTGGAP LNPAERRKQR RIRTTFTSGQ LKELERSFCE THYPDIYTRE EIAMRIDLTE ARVQVWFQNR RAKYRKQEKI RRVKDEEEDP LKKEPGQISL EEIIDQI

A probable nuclear protein with a DNA binding domain (homeobox)

Gene 3

Numbering on the direct strand

CDS3

>tr|O01864|O01864_CAEEL Hypothetical protein - Caenorhabditis elegans.
METEVMKSFNNELSSLFDSKNMSKNKIQDITKAAIKAKSQYKHVVFSVEKLINKCKPDQR
LNVLYVIDSIVRASKHQLKEKDTFGPRFMKQFDKFLMPLLKCGQKEKMRTVRTLNLWMSN
KVFKESEIQPLREMCKASGLTIDFEEVELAVKGKQADMSIYSGVYKKKPKRSSSSSQPKS
RTPTNPHPDDGLLGAGPSSALRSVPDIPNFVLSEDYFLGTISEREMLELVQKFGIDRSGV
LSKDKNLLQRALQIFAGSLSQKVEEVLAENNRINGSSIQNVLTKDFEYSDDEEEKEKEPQ
PEKQKNLPHAQVLLLAQSLLTQPQILAKLAEVLIPQGNPFGLPFPGEHIVPTSSAALTLG
APPPNLMALQQSLPPGFPNQQLGLPNLSGLNQAQLMNVQNAQNMLQLQQRAAQLQALQGN
PNAQRNLLMLGNPLLNPFALQHGVNPMLNDLQAAAAAQQQAMLNEAAQSPEKKILELSGG
NSGINNSGDVERARLREKEKERESKERRRMGLPPVRIGFTIIASRTLWLKKIPTNIVEND
LKQAVESCGEASRVKVIGNRACAYITMENRRSANDVVSKMREVSVAKKMVKVYWARSPGM
DSDQFSDLWDSNRGVLEIPYEKLPLDLVALCEGAMLDIESLPIEKKLLYKETGETVISIP
PPNIQPPVPHPPPMGFPFQHQLTQLPGQPRPAGLPPGVPPMFNLNAPPPPGIPGYPPAPP
PPGVGPPPPQGIPPMGFDPNKPPPPMFQQGFNAGAPPPPFGRGAGPMSSFPPPPRGGMHH
MPPPPSFRGGRGGHGGPPPPHFDRRGGGGPPFRPENGRGRLLDQSEMWNREQREMRGGGG
AGRDGGREHRDYDRDRSQIDRRRQDDMGARRRSRWGDDDRRDDDRRDDRRDDRRESRRRS
PRSPRSPDRRTRRSPSYEREEPPVKKTSVEEETVSSTTLDELKPSVEPTPVPAPIPAPAP
ELKAAEEPVKIVAEHHEDQTDEVPMDLE

Gene 4

Removed from gene 4: 1412-1691, 1795-5682, 5842-6048, 6865-6907, 7133-7413, 7518-7589, 7754-7999, 7912-7958, 8154-8222, 8414-8496, 8660-8709, 9043-9114, 9529-9573, 9706-9769, 9943-9996

EST HMMgene WebGene Netgene2

1346 1411 (AG) (GT)1695 1794 1691 1795

5405 54495679 5841 5668 5859 5683 5841 5682 58426049 6080 6049 6864 6049 6864 6048 68656908 6993 6908 7132 6908 7132 6907 7133

7187 7328 7187 7328 7186 73297411 7520 7414 7517 7414 7517 7413 7518

7564 75897959 8153 7958 8154

7589 7753 7589 77547800 7911 7800 7911 7799 79127954 8113 7959 8135

8223 8413 8223 8413 8222 84148497 8659 8497 8659 8496 86608710 9042 8710 9042 8709 90439115 9528 9115 9528 9114 9529

9631 9705 9574 9705 9574 9705 9573 97069770 9943 9770 9946 9770 9942 99439997 10350 9996

Protein Q9N323

>tr|Q9N323|Q9N323_CAEEL Hypothetical protein - Caenorhabditis elegans.
MSTNNYQTLSQNKADRMGPGGSRRPRNSQHATASTPSASSCKEQQKDVEHEFDIIAYKTT
FWRTFFFYALSFGTCGIFRLFLHWFPKRLIQFRGKRCSVENADLVLVVDNHNRYDICNVY
YRNKSGTDHTVVANTDGNLAELDELRWFKYRKLQYTWIDGEWSTPSRAYSHVTPENLASS
APTTGLKADDVALRRTYFGPNVMPVKLSPFYELVYKEVLSPFYIFQAISVTVWYIDDYVW
YAALIIVMSLYSVIMTLRQTRSQQRRLQSMVVEHDEVQVIRENGRVLTLDSSEIVPGDVL
VIPPQGCMMYCDAVLLNGTCIVNESMLTGESIPITKSAISDDGHEKIFSIDKHGKNIIFN
GTKVLQTKYYKGQNVKALVIRTAYSTTKGQLIRAIMYPKPADFKFFRELMKFIGVLAIVA
FFGFMYTSFILFYRGSSIGKIIIRALDLVTIVVPPALPAVMGIGIFYAQRRLRQKSIYCI
SPTTINTCGAIDVVCFDKTGTLTEDGLDFYALRVVNDAKIGDNIVQIAANDSCQNVVRAI
ATCHTLSKINNELHGDPLDVIMFEQTGYSLEEDDSESHESIESIQPILIRPPKDSSLPDC
QIVKQFTFSSGLQRQSVIVTEEDSMKAYCKGSPEMIMSLCRPETVPENFHDIVEEYSQHG
YRLIAVAEKELVVGSEVQKTPRQSIECDLTLIGLVALENRLKPVTTEVIQKLNEANIRSV
MVTGDNLLTALSVARECGIIVPNKSAYLIEHENGVVDRRGRTVLTIREKEDHHTERQPKI
VDLTKMTNKDCQFAISGSTFSVVTHEYPDLLDQLVLVCNVFARMAPEQKQLLVEHLQDVG
QTVAMCGDGANDCAALKAAHAGISLSEAEASIAAPFTSKVADIRCVITLISEGRAALVTS
YSAFLCMAGYSLTQFISILLLYWIATSYSQMQFLFIDIAIVTNLAFLSSKTRAHKELAST
PPPTSILSTASMVSLFGQLAIGGMAQVAVFCLITMQSWFIPFMPTHHDNDEDRKSLQGTA
IFYVSLFHYIVLYFVFAAGPPYRASIASNKAFLISMIGVTVTCIAIVVFYVTPIQYFLGC
LQMPQEFRFIILAVATVTAVISIIYDRCVDWISERLREKIRQRRKGA

Prediction of mitochondrial genes (human)

NC_012920.1

Mitochondrial genome NC_012920.1 annotation

tRNAscan prediction

tRNAscan lists (1) all the tRNAs on the current strand, then (2) all the tRNAs on the complementary strand. This tRNA is found at the end of the list.

Conclusion

• Good tRNA prediction
• If you try protein-coding gene prediction, the results are very poor:
  - the mitochondrial genome does not have the same sequence content (codon bias, signals) as the nuclear genome;
  - you might try a ‘prokaryote’-like gene model, but the results are not perfect!
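Another reason nuclear gene models misread mitochondrial genes is that vertebrate mitochondria use a variant genetic code. In NCBI translation table 2, AGA and AGG are stop codons, ATA codes for Met and TGA codes for Trp. The sketch below is minimal and has illustrative function names. It builds both tables and shows how the same codons translate differently.

```python
BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard code (NCBI table 1), codons enumerated in TCAG order
STANDARD = {
    a + b + c: STANDARD_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Vertebrate mitochondrial code (NCBI table 2): four codons differ
VERTEBRATE_MITO = {**STANDARD, "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}


def translate(seq: str, table: dict) -> str:
    """Translate frame 0 with the given codon table; unknown codons give X."""
    s = "".join(seq.split()).upper()
    return "".join(table.get(s[i:i + 3], "X") for i in range(0, len(s) - 2, 3))


orf = "ATG ATA TGA AGA"
print(translate(orf, STANDARD))         # TGA read as a stop codon
print(translate(orf, VERTEBRATE_MITO))  # TGA read as Trp, AGA as a stop
```

A predictor using the standard code would stop at every mitochondrial TGA (Trp) codon, which alone fragments most predicted mitochondrial genes.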

