
In silico analysis for unknown data

Date post: 11-Apr-2017
Upload: santosh-rama-bhadra-tata

In-silico Analysis for Unknown Data

Tata Santosh Rama Bhadra Rao, Agri Biotech Foundation


What is Bioinformatics?

Mathematics and Statistics

Biology

Computer Science


"All aspects of gathering, storing, handling, analyzing, interpreting and spreading vast amounts of biological information in databases. The information involved includes gene sequences, biological activity/function, pharmacological activity, biological structure, molecular structure, protein-protein interactions, and gene expression. Bioinformatics uses powerful computers and statistical techniques to accomplish research objectives, for example, to discover a new pharmaceutical or herbicide."

What is bioinformatics?


Task flow
• Data we have
• Search for similar data in available databases
• ClustalW
• Phylogenetic analysis
• Classification
• Structural analysis
• Functional analysis
• Reporting

Page 6: In silico analysis for unknown data

Data Outcome

• The data may be a nucleotide sequence (an mRNA, a gene or a genome) or a protein sequence.

• 16S rRNA gene sequences are most commonly used to classify (mainly bacterial) species.

• Using both forward and reverse sequencing reads makes the result more accurate.

• The protein sequence can also be checked.
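Before any search, it helps to confirm what kind of sequence you actually have. The sketch below is a simple heuristic of our own, not a standard tool: if at least 85% of the letters are A, C, G, T, U or N the input is treated as nucleotide (RNA if it has U but no T), otherwise as protein. The function name and the 0.85 cut-off are assumptions chosen so that a few ambiguity codes (such as the S in the eIF4A gene) do not flip the result.

```python
NUCLEOTIDE_LETTERS = set("ACGTUN")  # unambiguous bases plus N


def guess_sequence_type(seq: str, threshold: float = 0.85) -> str:
    """Return 'DNA', 'RNA' or 'protein' from a letter-frequency heuristic.

    The threshold is an assumed cut-off, not a published standard.
    """
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    fraction = sum(c in NUCLEOTIDE_LETTERS for c in letters) / len(letters)
    if fraction < threshold:
        return "protein"
    # RNA carries U in place of T
    return "RNA" if "U" in letters and "T" not in letters else "DNA"


print(guess_sequence_type("ATGGCGGCGSCCACCACS"))   # start of the eIF4A gene
print(guess_sequence_type("MAAXTTSRRGAGASRSMDD"))  # start of the eIF4A protein
```

Very short or low-complexity protein fragments (for example a run of A and G) can be misread as nucleotide, so treat the result as a first check only.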


Genetic code table


Sample for DNA isolation

(Figure: isolation of DNA from the sample, steps 1 to 3)


DNA

Symbol  Meaning        Explanation
G       G              Guanine
A       A              Adenine
T       T              Thymine
U       U              Uracil
C       C              Cytosine
R       A or G         puRine
Y       C or T         pYrimidine
N       A, C, G or T   Any base

Double helix

5’ A C G T C A T G 3’
3’ T G C A G T A C 5’  (template)

RNA

5’ A C G U C A U G 3’
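The complementary strand and the RNA shown above can be generated programmatically. A minimal sketch; the complement table covers the full IUPAC nucleotide code set (including codes such as S, W, K and M that the table above does not list), and the function names are illustrative.

```python
# Complements of the IUPAC nucleotide codes: A<->T, C<->G, R<->Y, K<->M,
# B<->V, D<->H; S, W and N are their own complements.
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def complement(seq: str) -> str:
    """Base-by-base complement, written in the same direction (5'->3' becomes 3'->5')."""
    return seq.upper().translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Complementary strand written 5'->3', as sequence files conventionally store it."""
    return complement(seq)[::-1]


def transcribe(coding_strand: str) -> str:
    """mRNA has the sequence of the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


top = "ACGTCATG"
print(complement(top))          # TGCAGTAC (lower strand, read 3'->5')
print(reverse_complement(top))  # CATGACGT (lower strand, read 5'->3')
print(transcribe(top))          # ACGUCAUG
```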


Isolation of the gene of interest from unknown sample

(Figure: subtractive hybridization using a cDNA library construction kit from Stratagene)

• Control mRNA: 1st strand cDNA preparation and mRNA removal
• Hybridization of stress mRNA with an excess of complementary 1st strand control cDNA
• Removal of the commonly hybridized population (commonly expressed mRNA) by magnetic separation
• Result: differentially up-regulated mRNA population


Gene and protein sequences of eIF4A

ATGGCGGCGSCCACCACSTCCCGCCGCGGCGCCGGCGCCTCCCGCAGCATGGACGACGAGAACCTCACCTTCGAGACCTCCCCGGGTGTCGAGGTCGTCAGCAGCTTCGACCAGATGGGGATCAAGGACGACCTCCTCCGCGGCATCTACGGCTACGGGTTCGAGAAGCCCTCCGCCATCCAGCAGCGCGCCGTCCTCCCCATCATCAACGGACGCGACGTCATCGCGCAGGCCCAGTCCGGCACCGGGAAGTCATCCATGATCTCACTCACCGTATGCCAGATCGTCGACACCGCAGTCCGCGAGGTCCAGGCTCTGATCCTCTCACCCACCAGGGAGCTCGCTTCGCAGACAGAGAAGGTTATGCTGGCTGTCGGCGACTACCTCAATATCCAAGTGCACGCTTGCATTGGTGGGAAAAGTATCAGCGAGGATATCAGGAGGCTTGAGAACGGAGTCCATGTTGTCTCTGGGACTCCGGGCAGAGTCTGCGATATGATCAAGAGGAGGACCCTGCGGACAAGAGCCATCAAGCTTCTAGTTCTGGATGAGGCTGATGAGATGTTGAGCAGAGGCTTTAAGGATCAGATTTACGATGTCTACAGATACCTCCCACCCGAACTTCAGGTCGTTTTGATCTCCGCCACTCTTCCTCACGAGATCCTAGAGATGACTAGCAAGTTCATGACCGAACCAGTTAGGATCCTTGTGAAGCGTGATGAGTTGACCCTGGAGGGTATCAAACAATTCTTCGTTGCTGTTGAGAAAGAGGAATGGAAGTTTGATACGCTGTGTGATCTTTATGATACGTTGACCATCACCCAAGCTGTTATTTTCTGCAATACTAAGAGAAAGGTGGATTGGCTTACTGAAAGAATGCGCAGCAATAACTTCACAGTATCAGCTATGCATGGTGACATGCCCCAACAGGAAAGGGATGCCATCATGACAGAGTTCAGGTCTGGTGCAACTCGTGTGCTAATCACTACGGATGTTTGGGCTCGAGGGCTGGATGTTCAGCAGGTTTCACTTGTCATAAATTATGATCTCCCAAATAATCGTGAGCTTTACATCCATCGCATCGGTCGCTCTGGTCGTTTTGGGCGCAAGGGTGTGGCGATCAATTTTGTGCGCAAGGATGACATCCGTATCCTGAGGGATATAGAACAGTACTACAGCACACAAATTGATGAGATGCCAATGAATGTTGCTGATCTAATTTGA

"MAAXTTSRRGAGASRSMDDENLTFETSPGVEVVSSFDQMGIKDDLLRGIYGYGFEKPSAIQQRAVLPIINGRDVIAQAQSGTGKSSMISLTVCQIVDTAVREVQALILSPTRELASQTEKVMLAVGDYLNIQVHACIGGKSISEDIRRLENGVHVVSGTPGRVCDMIKRRTLRTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHEILEMTSKFMTEPVRILVKRDELTLEGIKQFFVAVEKEEWKFDTLCDLYDTLTITQAVIFCNTKRKVDWLTERMRSNNFTVSAMHGDMPQQERDAIMTEFRSGATRVLITTDVWARGLDVQQVSLVINYDLPNNRELYIHRIGRSGRFGRKGVAINFVRKDDIRILRDIEQYYSTQIDEMPMNVADLI"


In-silico generated protein structures


ABOUT THE GENE AND PROTEIN

GENE LENGTH : 1224 bp

NUMBER OF INTRONS : 7

NUMBER OF EXONS : 8

GENE MOLECULAR WEIGHT : 378411.66 - 378491.72 Daltons

PROTEIN LENGTH : 407 AA

PROTEIN MOLECULAR WEIGHT : 45.2 kDa

ISOELECTRIC POINT : 6.10
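The length figures above are internally consistent: 1224 bp is 408 codons, that is 407 amino acids plus the stop codon. The sketch below shows how such summary values can be recomputed, under stated assumptions: standard average residue masses, ambiguous residues such as X skipped (so the mass is a lower bound), and GC content counted over unambiguous bases only. It is not meant to reproduce the exact numbers above, which were presumably produced by a dedicated tool, and it does not compute the isoelectric point.

```python
from __future__ import annotations

# Standard average residue masses in daltons (amino acid minus one water).
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once for the free N- and C-termini


def gc_content(cds: str) -> float:
    """Fraction of G+C among unambiguous bases (codes such as S are skipped)."""
    bases = [b for b in cds.upper() if b in "ACGT"]
    return sum(b in "GC" for b in bases) / len(bases)


def protein_mass(protein: str) -> tuple[float, int]:
    """Average chain mass, plus the number of residues that could not be weighed (e.g. X)."""
    residues = protein.upper().rstrip("*")
    unknown = sum(aa not in AVG_RESIDUE_MASS for aa in residues)
    mass = sum(AVG_RESIDUE_MASS.get(aa, 0.0) for aa in residues) + WATER
    return mass, unknown


def summarize(cds: str, protein: str) -> dict:
    mass, unknown = protein_mass(protein)
    return {
        "cds_length_bp": len(cds),
        "codons": len(cds) // 3,  # includes the stop codon
        "protein_length_aa": len(protein.rstrip("*")),
        "gc_content": gc_content(cds),
        "mass_da": mass,
        "unweighed_residues": unknown,
    }


print(summarize("ATGGCGTGA", "MA*"))  # toy example: Met-Ala plus stop
```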


Search for similar data in available databases

• The data are used as the query in a BLAST similarity search against NCBI, Phytozome or other available databases.

• Download the matching data from the database.

Note:
• Always keep the data in a plain-text editor such as Notepad for convenient working.
• The data presented here are unpublished.
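When BLAST is run locally with BLAST+, tabular output (-outfmt 6) makes it easy to decide which hits to download. A sketch that parses the twelve default tabular columns and keeps significant, reasonably similar hits; the e-value and identity cut-offs are illustrative assumptions, not recommended values, and the subject IDs in the usage note are made up.

```python
import csv
from dataclasses import dataclass
from io import StringIO

# Column order of BLAST+ tabular output (-outfmt 6) with its default fields.
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


@dataclass
class Hit:
    subject: str
    pident: float   # percent identity over the aligned region
    length: int     # alignment length
    evalue: float
    bitscore: float


def parse_outfmt6(text: str) -> list:
    hits = []
    for row in csv.reader(StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(OUTFMT6_FIELDS, row))
        hits.append(Hit(rec["sseqid"], float(rec["pident"]), int(rec["length"]),
                        float(rec["evalue"]), float(rec["bitscore"])))
    return hits


def select_hits(hits, max_evalue=1e-10, min_identity=40.0):
    """Keep hits passing both (assumed) cut-offs, best bit score first."""
    kept = [h for h in hits if h.evalue <= max_evalue and h.pident >= min_identity]
    return sorted(kept, key=lambda h: h.bitscore, reverse=True)
```

Such a file can be produced with, for example, blastp -query query.fasta -db <database> -outfmt 6 -out hits.tsv; parse_outfmt6(open("hits.tsv").read()) then returns one Hit per line.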


BLAST


ClustalW

• The finalized data set is then aligned with ClustalW to assess sequence similarity.

• ClustalW is a multiple sequence alignment tool that lines up sequences so that similar (conserved) regions can be identified and mapped.

• It accepts both nucleotide and protein sequences.

• Protein sequences are usually aligned, as they give a more accurate alignment.
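ClustalW builds a multiple alignment progressively, guided by a tree computed from pairwise comparisons. As a minimal illustration of the pairwise step only, here is a global (Needleman-Wunsch) alignment with simple match, mismatch and linear gap scores. These scores are assumptions for readability; ClustalW itself uses substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # match or mismatch
                              score[i - 1][j] + gap,             # gap in b
                              score[i][j - 1] + gap)             # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("LDEADEML", "LDESDEML"))  # (6, 'LDEADEML', 'LDESDEML')
```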


MEGA

SB4g            RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVCLISATLPHEILEMTSKFMTEP 232
SACetif         RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVCLISATLPHEILEMTSKFMTEP 232
ZEAMMB73        RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHEILEMTSKFMTEP 231
SIDb            RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHEILEITSKFMTEP 232
PgeiF4a         RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHEILEMTSKFMTEP 232
OS3g            RTRAIKLLILDEADEMLGRGFKDQIYDVYRYLPPELQVCLISATLPHEILEMTSKFMTDP 229
H               RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHDILEITSKFMTDP 237
Phys            RTRSIKLLILDESDEMLSRGFKDQIYDVYRYLPPELQVVLVSATLPHEILEMTNKFMTDP 222
Jat             RTRAIRLLVLDESDEMLSRGFKDQIYDVYRYLPPELQVVLISATLPNEILEMTSKFMTDP 235
RC              RTRAIKLLVLDESDEMLSRGFKDQIYDVYRYLPPELQVVLISATLPNEILEMTSKFMTDP 232
GM              RTRAIKMLVLDESDEMLSRGFKDQIYDVYRYLPPDLQVCLISATLPHEILEMTNKFMTDP 232
PHAVU           RTRAIKMLVLDESDEMLSRGFKDQIYDVYRYLPPDLQVCLISATLPHEILEMTNKFMTDP 231
CA              RTRAIKLLVLDESDEMLSRGFKDQIYDVYRYLPPDLQVCLISATLPHEILEMTNKFMTDP 231
M               RTRAIKLLVLDESDEMLSRGFKDQIYDVYRYLPPDLQVCLISATLPHEILEMTNKFMTDP 231
CS-EIF4A-3-like RTRAIKLLVLDESDEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHEILEMTNKFMTDP 235
MD              RTRAIKLLVLDESDEMLSRGFKDQIYDVYRYLPPELQVCLISATLPHEILEMTNKFMTEP 227
                ***:*::*:***:****.****************:*** *:*****::***:*.****:*

Alignment for retrieved sequences
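The symbols under the alignment follow the Clustal convention: * for an identical column, : for residues from one "strong" similarity group, . for a "weak" group, and a blank otherwise. A sketch that derives that line from aligned sequences; the residue groups are those listed in the Clustal documentation, while treating any gapped column as unconserved is a simplification of ours.

```python
# Residue groups used by Clustal for the conservation line under an alignment.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column) -> str:
    """'*' identical, ':' strongly similar, '.' weakly similar, ' ' otherwise."""
    residues = {r.upper() for r in column}
    if "-" in residues:
        return " "  # simplification: any gap makes the column unconserved
    if len(residues) == 1:
        return "*"
    if any(residues <= set(g) for g in STRONG_GROUPS):
        return ":"
    if any(residues <= set(g) for g in WEAK_GROUPS):
        return "."
    return " "


def conservation_line(aligned_seqs) -> str:
    # zip(*seqs) walks the alignment column by column
    return "".join(column_symbol(col) for col in zip(*aligned_seqs))


# First eight columns of two rows above (SB4g-like and Jat/Phys-like variants)
print(conservation_line(["RTRAIKLL", "RTRSIRML"]))  # ***:*::*
```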


Phylogenetic analysis

• After alignment, the data are subjected to phylogenetic analysis.

• This evaluates the evolutionary relationships between the data sources (organisms or sequences).

• The most similar sequences are placed close together in the tree, and less similar sequences are placed farther apart.

• The distances (branch lengths) between them measure how closely the data sources are related.
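The idea that similar sequences sit close together can be made concrete with a distance-based tree. A sketch under stated assumptions: distances are p-distances (the proportion of differing sites, gap columns skipped) and the tree is built with UPGMA, chosen here only because it is short and deterministic. Tools such as MEGA commonly use other methods (for example neighbor-joining or maximum likelihood), so this is an illustration, not the analysis behind the tree shown here.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable (ungapped) sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(names, seqs) -> str:
    """Build a rooted UPGMA tree and return it as a Newick string."""
    # cluster label -> (number of leaves, height of the cluster's root)
    clusters = {name: (1, 0.0) for name in names}
    dist = {}
    for i, a in enumerate(names):
        for j in range(i + 1, len(names)):
            dist[(a, names[j])] = p_distance(seqs[i], seqs[j])

    def d(x, y):
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    while len(clusters) > 1:
        # closest pair; ties broken by label so the output is deterministic
        (a, b), dab = min(dist.items(), key=lambda kv: (kv[1], kv[0]))
        na, ha = clusters.pop(a)
        nb, hb = clusters.pop(b)
        height = dab / 2
        merged = f"({a}:{height - ha:.4f},{b}:{height - hb:.4f})"
        # distances to the merged cluster are size-weighted averages
        new_dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        for c in clusters:
            new_dist[(merged, c)] = (na * d(a, c) + nb * d(b, c)) / (na + nb)
        dist = new_dist
        clusters[merged] = (na + nb, height)
    (root,) = clusters
    return root + ";"


print(upgma(["A", "B", "C"], ["AAAA", "AAAT", "TTTT"]))
# ((A:0.1250,B:0.1250):0.3125,C:0.4375);
```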


Phylogenetic tree


Fig 1: 20-404: P-loop containing nucleoside triphosphate hydrolase (IPR027417). 34-62: RNA helicase, DEAD-box type, Q motif (IPR014014). 246-407: Helicase, C-terminal (IPR001650). 183-186: DEAD amino acids (Asp-Glu-Ala-Asp).

ATG GCG GCG SCC ACC ACS TCC CGC CGC GGC GCC GGC GCC TCC CGC AGC ATG GAC GAC GAG AAC CTC ACC TTC

M A A X T T S R R G A G A S R S M D D E N L T F 24

GAG ACC TCC CCG GGT GTC GAG GTC GTC AGC AGC TTC GAC CAG ATG GGG ATC AAG GAC GAC CTC CTC CGC GGC

E T S P G V E V V S S F D Q M G I K D D L L R G 48

ATC TAC GGC TAC GGG TTC GAG AAG CCC TCC GCC ATC CAG CAG CGC GCC GTC CTC CCC ATC ATC AAC GGA CGC

I Y G Y G F E K P S A I Q Q R A V L P I I N G R

GAC GTC ATC GCG CAG GCC CAG TCC GGC ACC GGG AAG TCA TCC ATG ATC TCA CTC ACC GTA TGC CAG ATC GTC

D V I A Q A Q S G T G K S S M I S L T V C Q I V

GAC ACC GCA GTC CGC GAG GTC CAG GCT CTG ATC CTC TCA CCC ACC AGG GAG CTC GCT TCG CAG ACA GAG AAG

D T A V R E V Q A L I L S P T R E L A S Q T E K

GTT ATG CTG GCT GTC GGC GAC TAC CTC AAT ATC CAA GTG CAC GCT TGC ATT GGT GGG AAA AGT ATC AGC GAG

V M L A V G D Y L N I Q V H A C I G G K S I S E

GAT ATC AGG AGG CTT GAG AAC GGA GTC CAT GTT GTC TCT GGG ACT CCG GGC AGA GTC TGC GAT ATG ATC AAG

D I R R L E N G V H V V S G T P G R V C D M I K

AGG AGG ACC CTG CGG ACA AGA GCC ATC AAG CTT CTA GTT CTG GAT GAG GCT GAT GAG ATG TTG AGC AGA GGC

R R T L R T R A I K L L V L D E A D E M L S R G

TTT AAG GAT CAG ATT TAC GAT GTC TAC AGA TAC CTC CCA CCC GAA CTT CAG GTC GTT TTG ATC TCC GCC ACT

F K D Q I Y D V Y R Y L P P E L Q V V L I S A T

CTT CCT CAC GAG ATC CTA GAG ATG ACT AGC AAG TTC ATG ACC GAA CCA GTT AGG ATC CTT GTG AAG CGT GAT

L P H E I L E M T S K F M T E P V R I L V K R D

GAG TTG ACC CTG GAG GGT ATC AAA CAA TTC TTC GTT GCT GTT GAG AAA GAG GAA TGG AAG TTT GAT ACG CTG

E L T L E G I K Q F F V A V E K E E W K F D T L

TGT GAT CTT TAT GAT ACG TTG ACC ATC ACC CAA GCT GTT ATT TTC TGC AAT ACT AAG AGA AAG GTG GAT TGG

C D L Y D T L T I T Q A V I F C N T K R K V D W

CTT ACT GAA AGA ATG CGC AGC AAT AAC TTC ACA GTA TCA GCT ATG CAT GGT GAC ATG CCC CAA CAG GAA AGG

L T E R M R S N N F T V S A M H G D M P Q Q E R

GAT GCC ATC ATG ACA GAG TTC AGG TCT GGT GCA ACT CGT GTG CTA ATC ACT ACG GAT GTT TGG GCT CGA GGG

D A I M T E F R S G A T R V L I T T D V W A R G

CTG GAT GTT CAG CAG GTT TCA CTT GTC ATA AAT TAT GAT CTC CCA AAT AAT CGT GAG CTT TAC ATC CAT CGC

L D V Q Q V S L V I N Y D L P N N R E L Y I H R

ATC GGT CGC TCT GGT CGT TTT GGG CGC AAG GGT GTG GCG ATC AAT TTT GTG CGC AAG GAT GAC ATC CGT ATC

I G R S G R F G R K G V A I N F V R K D D I R I

CTG AGG GAT ATA GAA CAG TAC TAC AGC ACA CAA ATT GAT GAG ATG CCA ATG AAT GTT GCT GAT CTA ATT TGA

L R D I E Q Y Y S T Q I D E M P M N V A D L I *
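The gene sequence above contains IUPAC ambiguity codes (SCC and ACS in the first row). One way to reproduce such a translation: expand each ambiguous codon into every possible codon; if all of them encode the same amino acid, use it (ACS is ACC or ACG, both T), otherwise report X (SCC is GCC or CCC, A or P, hence X). The standard genetic code (NCBI table 1) is assumed; the function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
         "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}


def translate_codon(codon: str) -> str:
    """Amino acid for a codon that may contain ambiguity codes; X if undetermined."""
    options = [IUPAC[base] for base in codon.upper()]
    amino_acids = {CODON_TABLE["".join(c)] for c in product(*options)}
    return amino_acids.pop() if len(amino_acids) == 1 else "X"


def translate(cds: str) -> str:
    """Translate codon by codon; spaces are ignored and U is read as T."""
    cds = cds.upper().replace(" ", "").replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return "".join(translate_codon(cds[i:i + 3]) for i in range(0, usable, 3))


print(translate("ATG GCG GCG SCC ACC ACS"))  # MAAXTT
```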


Structural analysis

• Structural analysis of the protein is carried out through homology modeling and docking.

• Both the secondary and tertiary structures of the protein must be analyzed.

• The resulting structure must be evaluated against nuclear magnetic resonance (NMR) and X-ray crystallography quality scores.

• The Ramachandran plot is particularly important for structural validation.
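The two axes of a Ramachandran plot are backbone torsion angles: phi is the dihedral C(i-1)-N-CA-C and psi is N-CA-C-N(i+1). A minimal sketch that computes them from atom coordinates; the input layout (one (N, CA, C) coordinate triple per residue, in chain order) is an assumption, and in practice the coordinates would be read from the PDB file of the model. Validation tools such as PROCHECK then check where each (phi, psi) pair falls.

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Torsion angle in degrees (-180, 180] about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # project the outer bonds onto the plane perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def phi_psi(backbone):
    """backbone: list of (N, CA, C) coordinate triples; None where undefined (termini)."""
    angles = []
    for i, (n_atom, ca, c) in enumerate(backbone):
        phi = dihedral(backbone[i - 1][2], n_atom, ca, c) if i > 0 else None
        psi = dihedral(n_atom, ca, c, backbone[i + 1][0]) if i < len(backbone) - 1 else None
        angles.append((phi, psi))
    return angles
```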


In silico analysis of eIF4A

Homology modeling: using Modeller version 9.12, we designed a structural model of eIF4A from Pennisetum glaucum.

(Figure labels: α-helices, β-pleated sheets, DEAD box motif)

Fig: Homology model of the eIF4A amino acid sequence from P. glaucum, revealing the signature DEAD box motif and Mg2+ binding sites. eIF4A showed ---- helices and -------- sheets.
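Helix and sheet counts for a model can be tallied from a per-residue secondary-structure string. A sketch assuming a three-state string (H = helix, E = strand, anything else = coil), as obtained by reducing DSSP output or from predictors such as PSIPRED; the min_length parameter is an illustrative choice for ignoring very short elements.

```python
from itertools import groupby


def count_elements(ss: str, min_length: int = 1) -> dict:
    """Count contiguous helix (H) and strand (E) segments in a 3-state string."""
    counts = {"H": 0, "E": 0}
    for state, run in groupby(ss.upper()):
        if state in counts and len(list(run)) >= min_length:
            counts[state] += 1
    return counts


print(count_elements("CCHHHHCCEEECCHHHCEE"))                # {'H': 2, 'E': 2}
print(count_elements("CCHHHHCCEEECCHHHCEE", min_length=3))  # {'H': 2, 'E': 1}
```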


Nuclear magnetic resonance analysis for protein structure


REPRESENTATION OF RAMACHANDRAN PLOTS FOR RICE AND PEARL MILLET eIF4A STRUCTURES, GENERATED WITH PROCHECK


Peptide position and bonds


Functional analysis

• Functional analysis is carried out by analyzing domains, conserved motifs and active sites.

• These features are evaluated through docking and amino acid composition.

• Based on the α-helices and β-pleated sheets, the protein structure can be determined.
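Conserved motifs can be located by pattern matching, and amino acid composition is a simple count. In the sketch below, motifs are written as Python regular expressions (an assumption; PROSITE-style patterns would need converting first) and positions are reported 1-based. Applied to the alignment fragment that starts at residue 173, the DEAD match at position 11 corresponds to residues 183-186 of Fig 1.

```python
import re
from collections import Counter


def find_motif(protein: str, pattern: str) -> list:
    """1-based start positions of a regex motif; the lookahead also reports overlapping hits."""
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", protein.upper())]


def composition(protein: str) -> dict:
    """Fraction of each amino acid, ignoring stop symbols and whitespace."""
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    counts = Counter(residues)
    return {aa: n / len(residues) for aa, n in sorted(counts.items())}


fragment = "RTRAIKLLVLDEADEMLSRG"  # eIF4A residues 173-192
print(find_motif(fragment, "DEAD"))  # [11]
print(composition("AAG"))
```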


Docking analysis and motif localization in Pennisetum glaucum eIF4A

Docking analysis was performed with SYBYL version 6.7 for motif analysis and to assess structural stability.


Active sites, motifs and domains of eIF4A from rice and pearl millet, respectively, identified by docking studies


Classification

• Functional and structural analyses allow the protein to be classified.

• First, phylogenetic analysis established the relationships of the protein.

• Structural and functional characteristics can then be included to arrive at a clear classification.


Reporting

• Once the data have been evaluated accurately, they can be published or reported.

• Many submissions and sequence uploads take place at various levels.

• Genes, proteins and genomes are all reported to these databases.

• The submitted data then become available for further research.
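Sequence databases such as GenBank accept submissions in FASTA format: a header line starting with ">" followed by the sequence. A minimal sketch for formatting a record from sequence text kept in a plain-text file; the identifier and description in the usage note are hypothetical, the 60-character width is a common convention rather than a requirement, and submission portals may ask for additional information not shown here.

```python
def to_fasta(seq_id: str, sequence: str, description: str = "", width: int = 60) -> str:
    """Format one FASTA record: '>' header line, then the sequence wrapped at `width`."""
    seq = "".join(sequence.split())  # drop spaces and line breaks copied from a text editor
    header = f">{seq_id} {description}".rstrip()
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header, *lines]) + "\n"


print(to_fasta("PgeIF4A_demo", "MAAXTTSRRGAGASRSMDDENLTF", "hypothetical description"))
```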


Conclusion

• In silico studies give you information about your work with roughly 60 to 70% accuracy.

• This lets you confirm whether you are on the right track before starting your in vitro studies.

• You can therefore proceed with your work backed by about 70% in silico information and complete the project with 100% success in .


Acknowledgement

• Agri Biotech Foundation
• Department of Biotechnology
• Prof. G. Pakkireddy
• Dr. J. S. Bentur
• Dr. G. Mallikarjun
• My friends and colleagues
• Dearest participants (transformed with high energy and patience)
