Introduction to Bioinformatics Presented By Dr G. P. S. Raghava Co-ordinator, Bioinformatic Centre,...

Date post: 21-Dec-2015
Transcript
Page 1: Introduction to Bioinformatics Presented By Dr G. P. S. Raghava Co-ordinator, Bioinformatic Centre, IMTECH, Chandigarh, India & Visiting Professor, Pohang.

Introduction to Bioinformatics

Presented By

Dr G. P. S. Raghava

Co-ordinator, Bioinformatic Centre, IMTECH, Chandigarh, India

& Visiting Professor, Pohang Univ. of Science & Technology, Republic of Korea

Email: [email protected]

Web: http://www.imtech.res.in/raghava/

Page 2

Hierarchy in Biology

Atoms

Molecules

Macromolecules

Organelles

Cells

Tissues

Organs

Organ Systems

Individual Organisms

Populations

Communities

Ecosystems

Biosphere

Page 3

Animal cell

Page 4

Human Chromosomes

Page 5

Genes are linearly arranged along chromosomes

Page 6

Chromosomes and DNA

Page 7

DNA can be simplified to a string of four letters

GATTACA
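
The "string of four letters" view can be tried directly in code. The following is a minimal sketch, assuming plain Python strings over the alphabet A, C, G, T; the function names are illustrative and do not come from the slides.

```python
# Minimal sketch: DNA handled as a plain string over the alphabet {A, C, G, T}.
# All names below are illustrative assumptions, not part of the presentation.

DNA_ALPHABET = frozenset("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_dna(seq: str) -> bool:
    """Return True if seq is non-empty and uses only the letters A, C, G, T."""
    return bool(seq) and set(seq.upper()) <= DNA_ALPHABET


def base_counts(seq: str) -> dict:
    """Count how often each of the four letters occurs."""
    seq = seq.upper()
    return {base: seq.count(base) for base in "ACGT"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


print(base_counts("GATTACA"))         # {'A': 3, 'C': 1, 'G': 1, 'T': 2}
print(reverse_complement("GATTACA"))  # TGTAATC
```

Treating the sequence as an ordinary string is what lets general text-processing tools (counting, slicing, pattern matching) be applied to genomes.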

Page 8

(RT)

Page 9

Sequence to Structure: It’s a matter of dimensions!

1D Nucleic acid sequence

AGT-TTC-CCA-GGG…

1D Protein sequence

Met-Ala-Gly-Lys-His…
M – A – G – K – H…

3D Spatial arrangement of atoms
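
The two 1D representations on this slide are simple lookups. The sketch below assumes the standard genetic code and the standard three-letter and one-letter amino acid abbreviations. The codon table is deliberately partial (only the four codons shown on the slide, out of 64), and the function names are hypothetical.

```python
# Sketch of moving between the 1D representations shown on the slide.
# THREE_TO_ONE holds the standard abbreviations for the 20 amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Partial table: only the codons that appear on the slide (standard genetic code).
CODON_TO_RESIDUE = {"AGT": "Ser", "TTC": "Phe", "CCA": "Pro", "GGG": "Gly"}


def translate(dna: str) -> str:
    """Translate a dash-separated codon string into three-letter residues."""
    return "-".join(CODON_TO_RESIDUE[codon] for codon in dna.split("-"))


def to_one_letter(protein: str) -> str:
    """Convert dash-separated three-letter residues to one-letter codes."""
    return "".join(THREE_TO_ONE[residue] for residue in protein.split("-"))


print(translate("AGT-TTC-CCA-GGG"))          # Ser-Phe-Pro-Gly
print(to_one_letter("Met-Ala-Gly-Lys-His"))  # MAGKH
```

Going from either 1D string to the 3D arrangement of atoms has no such lookup, which is the point of the slide.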

Page 10

Genome Annotation

The Process of Adding Biology Information and Predictions to a Sequenced Genome Framework

Page 11

What are we doing?

FTG: A web server for locating probable protein-coding regions in a nucleotide sequence using a Fourier transform approach (Issac, B., Singh, H., Kaur, H. and Raghava, G.P.S. (2002) Bioinformatics 18:196).

EGPred: Similarity Aided Ab Initio Method of Gene Prediction. This server predicts genes (protein-coding regions) in eukaryotic genomes, including introns and exons, using similarity-aided (double) and consensus ab initio methods (Issac B and Raghava GPS (2004) Genome Research (In press)).

SVMgene: A support vector based approach to identify the protein-coding regions in human genomic DNA.

SRF: Spectral Repeat Finder (SRF) is a program to find repeats through an analysis of the power spectrum of a given DNA sequence. By repeat we mean the repeated occurrence of a segment of N nucleotides within a DNA sequence. SRF is an ab initio technique, as no prior assumptions need to be made regarding the repeat length, its fidelity, or whether the repeats are in tandem (Sharma et al. (2004) Bioinformatics, In Press).
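
Fourier-based methods such as those above work on the power spectrum of a sequence. The sketch below is not the FTG or SRF implementation. It only illustrates one widely used idea, under stated assumptions: each base is turned into a 0/1 indicator sequence, and the spectral power at period 3 is measured in sliding windows, since protein-coding regions tend to show a period-3 signal. The window and step sizes are arbitrary choices.

```python
import cmath


def period3_power(window: str) -> float:
    """Spectral power at period 3, summed over the four base indicator sequences.

    Assumes a non-empty window of A/C/G/T letters.
    """
    n = len(window)
    total = 0.0
    for base in "ACGT":
        # Fourier coefficient at frequency 1/3 of the 0/1 indicator of this base.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * i / 3)
            for i, letter in enumerate(window)
            if letter == base
        )
        total += abs(coeff) ** 2
    return total / n


def scan(seq: str, window: int = 60, step: int = 3) -> list:
    """Return (start, period-3 power) for each sliding window along seq."""
    seq = seq.upper()
    return [
        (start, period3_power(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# A perfectly 3-periodic toy sequence scores high; a homopolymer scores near zero.
print(round(period3_power("GCA" * 20), 6))
print(round(period3_power("A" * 60), 6))
```

Windows whose power stands out from the background would be the candidates worth inspecting. Choosing that threshold is outside this sketch.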

Page 12

Protein Sequence Alignment and Database Searching

Alignment of Two Sequences (Pair-wise Alignment)
– The Scoring Schemes or Weight Matrices
– Techniques of Alignments
– DOTPLOT

Multiple Sequence Alignment (Alignment of > 2 Sequences)
– Extending Dynamic Programming to more sequences
– Progressive Alignment (Tree or Hierarchical Methods)
– Iterative Techniques
  Stochastic Algorithms (SA, GA, HMM)
  Non Stochastic Algorithms

Database Scanning
– FASTA, BLAST, PSIBLAST, ISS

Alignment of Whole Genomes
– MUMmer (Maximal Unique Match)
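
The pair-wise alignment item above rests on dynamic programming with a scoring scheme. A minimal sketch of a global (Needleman-Wunsch style) alignment score follows. The match, mismatch and gap values are placeholder assumptions. A real protein alignment would use a weight matrix such as those the slide refers to.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of a and b under a simple linear gap penalty."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # 7
print(global_alignment_score("GATTACA", "GATACA"))   # 4 (six matches, one gap)
```

Only two rows of the table are kept, which is enough for the score. Recovering the alignment itself would need the full table or a traceback structure.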

Page 13

What are we doing?

GWFASTA: Genome Wise Sequence Similarity Search using FASTA. It allows users to search their sequences against sequenced genomes and their product proteomes. It integrates various tools for analysis of FASTA searches (Issac, B. and Raghava, G.P.S. (2002) Biotechniques 33:548-56).

GWBLAST: A genome-wide BLAST server. It allows users to search their sequences against sequenced genomes and annotated proteomes. It integrates various tools for analysis of BLAST searches.

Protein Sequence Analysis -> This server analyses a protein sequence and presents the analysis in graphical and textual formats. It provides property plots of 36 parameters (such as hydrophobicity, polarity and charge) for a single amino acid sequence or a multiple sequence alignment (Raghava, G.P.S. (2001) Biotech Software and Internet Report, 2:255).

RPFOLD: Recognition of Protein Fold -> The RPFOLD server predicts the top 5 similar folds in the PDB (Protein Data Bank) for a given protein sequence (query).

OXBench: Evaluation of protein multiple sequence alignment (Raghava et al. BMC Bioinformatics 4:47).
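
A property plot such as the hydrophobicity plot mentioned above is typically a sliding-window average of a per-residue scale. The sketch below assumes the Kyte-Doolittle hydropathy scale and a window of 5 residues. The slides do not say which scales or window sizes the server uses, so both are assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the server's own 36
# parameters are not listed in the slides).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 5) -> list:
    """Average hydropathy in each sliding window along a protein sequence."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# The profile drops as the window slides from the hydrophobic Ile run
# into the charged Lys run.
print([round(v, 2) for v in hydropathy_profile("IIIKKK", window=3)])
```

Plotting the returned values against window position gives the kind of curve the slide calls a property plot.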

Page 14

Traditional Proteomics

1D gel electrophoresis (SDS-PAGE)
2D gel electrophoresis
Protein Chips
– Chips coated with proteins/Antibodies
– large scale version of ELISA
Mass Spectrometry
– MALDI: Mass fingerprinting
– Electrospray and tandem mass spectrometry
  Sequencing of Peptides (N->C)
  Matching in Genome/Proteome Databases

Page 15

Overview of 2D Gel

SDS-PAGE + Isoelectric focusing (IEF)
– Gene Expression Studies
– Medical Applications
– Sample Experiments

Capturing and Analyzing Data
– Image Acquisition
– Image Sizing & Orientation
– Spot Identification
– Matching and Analysis

Page 16

Comparison/Matching of Gel Images

Compare 2 gel images
– Set X and Y axes
– Overlap matching spots
– Compare intensity of spots

Scan against database
– Compare query gel with all gels
– Calculate similarity score
– Sort based on score
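
The database scan described above (compare, score, sort) can be sketched as follows. This is a toy illustration under stated assumptions: each gel is a list of hypothetical (x, y, intensity) spots already placed on common axes, spots match when they lie within a distance tolerance, and similarity is matched spots divided by the total number of distinct spots. None of these choices is taken from the slides.

```python
from math import hypot


def match_spots(query: list, target: list, tol: float = 2.0) -> list:
    """Greedily pair each query spot with the nearest unused target spot within tol."""
    unused = list(target)
    pairs = []
    for q in query:
        best = min(unused, key=lambda t: hypot(q[0] - t[0], q[1] - t[1]), default=None)
        if best is not None and hypot(q[0] - best[0], q[1] - best[1]) <= tol:
            pairs.append((q, best))  # intensities q[2] and best[2] can now be compared
            unused.remove(best)
    return pairs


def similarity(query: list, target: list, tol: float = 2.0) -> float:
    """Matched spots divided by the number of distinct spots across both gels."""
    matched = len(match_spots(query, target, tol))
    union = len(query) + len(target) - matched
    return matched / union if union else 1.0


def scan_database(query: list, database: dict, tol: float = 2.0) -> list:
    """Score the query gel against every gel and sort by descending similarity."""
    scored = [(name, similarity(query, gel, tol)) for name, gel in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


query_gel = [(10, 10, 1.0), (20, 30, 0.5), (40, 40, 0.8)]
gels = {
    "gel_b": [(10, 10, 1.0), (80, 80, 0.3)],
    "gel_a": [(10.5, 10.2, 0.9), (20, 31, 0.6), (41, 40, 0.7)],
}
print(scan_database(query_gel, gels))  # gel_a ranks first
```

The matched pairs returned by `match_spots` carry both intensities, which is where the "compare intensity of spots" step would plug in.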

Page 17

Differential Proteomics: Fingerprints of Disease

Normal Cells

Disease Cells

Phenotypic Changes

• Differential protein expression
• Protein nitration patterns
• Altered phosphorylation
• Altered glycosylation profiles

Utility
• Target discovery
• Disease pathways
• Disease biomarkers

Page 18

Fingerprinting Technique

What is fingerprinting
– It is a technique to create a specific pattern for a given organism/person
– To compare pattern of query and target object
– To create Phylogenetic tree/classification based on pattern

Type of Fingerprinting
– DNA Fingerprinting
– Mass/peptide fingerprinting
– Properties based (Toxicity, classification)
– Domain/conserved pattern fingerprinting

Common Applications
– Paternity and Maternity
– Criminal Identification and Forensics
– Personal Identification
– Classification/Identification of organisms
– Classification of cells

Page 19

Fingerprinting Techniques: What are we doing?

AC2DGel: A web server for analysis and comparison of two-dimensional electrophoresis (2-DE) gel images. It helps in annotating the proteins of a virtual 2-D gel image on the basis of the known molecular weight and pH scales of the markers.

DNASIZE: Computation of DNA/Protein size -> This web server computes the length of DNA or protein fragments from their electrophoretic mobility using a graphical method (Raghava, G. P. S. (2001) Biotech Software and Internet Report, 2:198).

GMAP: A multipurpose computer program to aid synthetic gene design, cassette mutagenesis and introduction of potential restriction sites into DNA sequences (Raghava GPS (1994) Biotechniques 16: 1116-1123).

DNAOPT: A computer program to aid optimization of gel conditions of DNA gel electrophoresis and SDS-PAGE (Raghava GPS (1994) Biotechniques 18: 274-81).
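
Estimating fragment size from mobility, as DNASIZE does, can be illustrated with a semi-log standard curve. The sketch below is not the DNASIZE method, which the slide describes only as "graphical". It assumes that log10(size) varies linearly with migration distance between two neighbouring marker bands, and the marker values are hypothetical.

```python
from math import log10


def estimate_size(mobility: float, markers: list) -> float:
    """Estimate fragment size from mobility using markers given as (mobility, size).

    Interpolates log10(size) linearly between the two markers that bracket
    the measured mobility. Marker mobilities are assumed to be distinct.
    """
    points = sorted(markers)
    for (m1, s1), (m2, s2) in zip(points, points[1:]):
        if m1 <= mobility <= m2:
            fraction = (mobility - m1) / (m2 - m1)
            return 10 ** (log10(s1) + fraction * (log10(s2) - log10(s1)))
    raise ValueError("mobility lies outside the range covered by the markers")


# Hypothetical ladder: migration distance in mm, fragment size in base pairs.
ladder = [(10.0, 10000), (30.0, 1000), (50.0, 100)]
print(round(estimate_size(20.0, ladder)))  # about 3162 bp
```

Interpolating only between bracketing markers avoids extrapolating the curve into regions where the log-linear relationship is known to break down.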

Page 20

Concept of Drug and Vaccine

Concept of Drug
– Kill invading foreign pathogens
– Inhibit the growth of pathogens

Concept of Vaccine
– Generate memory cells
– Train the immune system to face various existing disease agents

Page 21

VACCINES

A. SUCCESS STORY:
• COMPLETE ERADICATION OF SMALLPOX
• WHO PREDICTION: ERADICATION OF PARALYTIC POLIO THROUGHOUT THE WORLD BY YEAR 2004
• SIGNIFICANT REDUCTION OF INCIDENCE OF DISEASES: DIPHTHERIA, MEASLES, MUMPS, PERTUSSIS, RUBELLA, POLIOMYELITIS, TETANUS

B. NEED OF THE HOUR
1) SEARCH FOR EFFECTIVE VACCINES, NOT YET AVAILABLE, FOR DISEASES LIKE MALARIA, TUBERCULOSIS AND AIDS
2) IMPROVEMENT IN SAFETY AND EFFICACY OF PRESENT VACCINES
3) LOW COST
4) EFFICIENT DELIVERY TO NEEDY
5) REDUCTION OF ADVERSE SIDE EFFECTS

Page 22

Computer Aided Vaccine Design

Whole Organism of Pathogen
– Consists of more than 4000 genes and proteins
– Genomes have millions of base pairs

Target antigen to recognise pathogen
– Search vaccine target (essential and non-self)
– Consists of amino acid sequence (e.g. A-V-L-G-Y-R-G-C-T ……)

Search antigenic region (peptide of length 9 amino acids)

Page 23

Major steps of endogenous antigen processing

Page 24

Computer Aided Vaccine Design

Problem of Pattern Recognition
– ATGGTRDAR  Epitope
– LMRGTCAAY  Non-epitope
– RTTGTRAWR  Epitope
– EMGGTCAAY  Non-epitope
– ATGGTRKAR  Epitope
– GTCVGYATT  Epitope

Commonly used techniques
– Statistical (Motif and Matrix)
– AI Techniques
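
The statistical "matrix" idea can be sketched with the six peptides listed on this slide. This is an illustration only, not the method of any specific server. It assumes a simple position-specific matrix in which each cell is the residue frequency among epitopes minus its frequency among non-epitopes, and six peptides are far too few for a usable predictor.

```python
from collections import Counter

# Peptides and labels exactly as listed on the slide.
EPITOPES = ["ATGGTRDAR", "RTTGTRAWR", "ATGGTRKAR", "GTCVGYATT"]
NON_EPITOPES = ["LMRGTCAAY", "EMGGTCAAY"]


def position_frequencies(peptides: list) -> list:
    """Per-position residue frequencies for equal-length peptides."""
    table = []
    for pos in range(len(peptides[0])):
        counts = Counter(peptide[pos] for peptide in peptides)
        table.append({aa: n / len(peptides) for aa, n in counts.items()})
    return table


def build_matrix(positives: list, negatives: list) -> list:
    """Each cell: frequency among positives minus frequency among negatives."""
    matrix = []
    for pos_f, neg_f in zip(position_frequencies(positives),
                            position_frequencies(negatives)):
        residues = set(pos_f) | set(neg_f)
        matrix.append({aa: pos_f.get(aa, 0.0) - neg_f.get(aa, 0.0) for aa in residues})
    return matrix


def score(peptide: str, matrix: list) -> float:
    """Sum the matrix cells selected by the peptide; unseen residues add 0."""
    return sum(column.get(aa, 0.0) for aa, column in zip(peptide, matrix))


matrix = build_matrix(EPITOPES, NON_EPITOPES)
for peptide in EPITOPES + NON_EPITOPES:
    print(peptide, score(peptide, matrix))
```

On this toy set the epitopes all score above zero and the non-epitopes below it, but that is scoring on the training data itself. A real matrix would be built from many peptides and evaluated on held-out ones.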

Page 25

Why computational tools are required for prediction

200 aa protein

Chopped to overlapping peptides of 9 amino acids

192 peptides

In vitro or in vivo experiments for detecting which snippets of protein will spark an immune response

10-20 predicted peptides

Bioinformatics Tools
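
The count of 192 follows from sliding a 9-residue window one position at a time along 200 residues: 200 - 9 + 1 = 192. A short sketch, with an illustrative function name, makes the arithmetic concrete.

```python
def overlapping_peptides(protein: str, length: int = 9) -> list:
    """All overlapping peptides of the given length, shifted by one residue."""
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


# A placeholder 200-residue sequence; only its length matters here.
print(len(overlapping_peptides("A" * 200)))  # 192

# A 10-residue sequence yields two overlapping 9-mers.
print(overlapping_peptides("AVLGYRGCTK"))    # ['AVLGYRGCT', 'VLGYRGCTK']
```

Testing all 192 candidates experimentally is what the prediction step is meant to avoid, by narrowing them to the 10-20 peptides worth taking to the bench.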

Page 26

Immunoinformatics: Computer Aided Vaccine Design

What are we doing?

MHC Class II binding peptide -> Matrix Optimization Technique for predicting the MHC binding core (Singh, H. and Raghava, G. P. S. (2002) Biotech Software and Internet Report, 3:146).

MMBPred: Prediction of MHC class I binders that can bind to a wide range of MHC alleles with high affinity. This server has the potential to aid development of sub-unit vaccines for large populations (Bhasin, M., and Raghava, G.P.S. (2003) Hybridoma and Hybridomics 22: 229).

nHLAPred: Prediction of MHC Class I Restricted T Cell Epitopes -> This server predicts binding peptides for 67 MHC Class I alleles. It also predicts proteasome cleavage sites and binding peptides that have a cleavage site at the C terminus (potential T cell epitopes). It uses a hybrid approach for prediction (Neural Network + Quantitative Matrix).

ProPred1: Prediction of MHC Class I binding peptide -> The aim of this server is to predict MHC Class-I binding regions in an antigen sequence (Singh, H. and Raghava, G.P.S. (2003) Bioinformatics, 19: 1009).

ProPred: Prediction of MHC Class II binding peptide -> The aim of this server is to predict MHC Class-II binding regions in an antigen sequence (Singh, H. and Raghava, G. P. S. (2001) Bioinformatics 17: 1236).

CTLPred: Direct method of prediction of CTL epitopes in an antigen sequence. This server uses the machine learning techniques Support Vector Machine (SVM) and Artificial Neural Network (ANN) for prediction (Bhasin, M. and Raghava, G. P. S. (2004) Vaccine (In Press)).

Page 27

Immunoinformatics: Computer Aided Vaccine Design

What are we doing?

HLADR4Pred: SVM and ANN based methods for predicting HLA-DRB1*0401 binding peptides in an antigen sequence (Bhasin, M. and Raghava, G.P.S. (2003) Bioinformatics 20:421).

TAPPred: An on-line service for predicting the binding affinity of peptides toward the TAP transporter. The prediction is based on a cascade SVM, using the sequence and properties of the amino acids (Bhasin, M. and Raghava, G. P. S. (2004) Protein Science 13:596-607).

ABCpred: This server predicts linear B cell epitope regions in an antigen sequence, using an artificial neural network. It will assist in locating epitope regions that are useful in selecting synthetic vaccine candidates, in disease diagnosis and in allergy research.

MHCBN: A curated database consisting of detailed information about Major Histocompatibility Complex (MHC) binding and non-binding peptides and T-cell epitopes. Version 3.1 of the database provides information about peptides interacting with TAP and about MHC-linked autoimmune diseases (Bhasin, M., Singh, H. and Raghava, G. P. S. (2003) Bioinformatics 19: 665). This database is also launched by the European Bioinformatics Institute (EBI), Hinxton, Cambridge, UK.

BCIPep: A collection of peptides having a role in humoral immunity. The peptides in the database have varying measures of immunogenicity. This database can assist in the development of methods for predicting B cell epitopes, in designing synthetic vaccines and in disease diagnosis. This database is also launched by the European Bioinformatics Institute (EBI), Hinxton, Cambridge, UK.

Page 28

Drug Design

History of Drug/Vaccine development
– Plants or Natural Products
  Plants and natural products were the source of medicinal substances
  Example: foxglove used to treat congestive heart failure
  Foxglove contains digitalis and cardiotonic glycosides
  Identification of active component
– Accidental Observations
  Penicillin is one good example
  Alexander Fleming observed the effect of mold
  Mold (Penicillium) produces the substance penicillin
  Discovery of penicillin led to large scale screening
  Soil microorganisms were grown and tested
  Streptomycin, neomycin, gentamicin, tetracyclines etc.
– Chemical Modification of Known Drugs
  Drug improvement by chemical modification
  Penicillin G -> Methicillin; morphine -> nalorphine

Page 29

A simple example

Protein

Small molecule drug

Protein disabled … disease cured

Page 30

Chemoinformatics (small molecule drug)

• Large databases
• Not all can be drugs
• Opportunity for data mining techniques

Bioinformatics (protein)

• Large databases
• Not all can be drug targets
• Opportunity for data mining techniques

Page 31

Drug Discovery & Development

Identify disease
Isolate protein involved in disease (2-5 years)
Find a drug effective against disease protein (2-5 years)
Preclinical testing (1-3 years)
Formulation
Human clinical trials (2-10 years)
Scale-up
FDA approval (2-3 years)
File IND
File NDA

Page 32

Technology is impacting this process

Identify disease
Isolate protein
Find drug
Preclinical testing

GENOMICS, PROTEOMICS & BIOPHARM.: Potentially producing many more targets and “personalized” targets
HIGH THROUGHPUT SCREENING: Screening up to 100,000 compounds a day for activity against a target protein
VIRTUAL SCREENING: Using a computer to predict activity
COMBINATORIAL CHEMISTRY: Rapidly producing vast numbers of compounds
MOLECULAR MODELING: Computer graphics & models help improve activity
IN VITRO & IN SILICO ADME MODELS: Tissue and computer models begin to replace animal testing

Page 33

1. Gene Chips

“Gene chips” allow us to look for changes in protein expression for different people with a variety of conditions, and to see if the presence of drugs changes that expression

Makes possible the design of drugs to target different phenotypes

compounds administered
people / conditions
e.g. obese, cancer, caucasian
expression profile
(screen for 35,000 genes)

Page 34

Biopharmaceuticals

Drugs based on proteins, peptides or natural products instead of small molecules (chemistry)

Pioneered by biotechnology companies

Biopharmaceuticals can be quicker to discover than traditional small-molecule therapies

Biotechs now pairing up with major pharmaceutical companies

Page 35

2. High-Throughput Screening

Screening perhaps millions of compounds in a corporate collection to see if any show activity against a certain disease protein

Page 36

High-Throughput Screening

Drug companies now have millions of samples of chemical compounds

High-throughput screening can test 100,000 compounds a day for activity against a protein target

Maybe tens of thousands of these compounds will show some activity for the protein

The chemist needs to intelligently select the 2 - 3 classes of compounds that show the most promise for being drugs to follow up

Page 37

Informatics Implications

Need to be able to store chemical structure and biological data for millions of datapoints
– Computational representation of 2D structure

Need to be able to organize thousands of active compounds into meaningful groups
– Group similar structures together and relate to activity

Need to learn as much information as possible from the data (data mining)
– Apply statistical methods to the structures and related information
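A minimal sketch of grouping similar structures, assuming each compound is represented as a set of structural feature keys (a stand-in for a 2D fingerprint). The Tanimoto coefficient and the greedy "leader" grouping are one common choice and are not named on the slide; the compound names, features and the 0.5 threshold are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of structural feature keys."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def group_similar(compounds, threshold=0.5):
    """Greedy leader grouping: a compound joins the first group whose leader
    it resembles at or above the threshold, otherwise it starts a new group."""
    groups = []  # list of (leader_features, [member names])
    for name, features in compounds.items():
        for leader_features, members in groups:
            if tanimoto(features, leader_features) >= threshold:
                members.append(name)
                break
        else:
            groups.append((features, [name]))
    return [members for _, members in groups]

# Hypothetical active compounds described by made-up feature keys
compounds = {
    "cpd1": {"ring", "amide", "Cl"},
    "cpd2": {"ring", "amide", "F"},
    "cpd3": {"ester", "chain"},
    "cpd4": {"ester", "chain", "OH"},
}
print(group_similar(compounds))
```

Once compounds are grouped, each group's measured activity can be summarized to relate structure classes to activity.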

Page 38

3. Computational Models of Activity

Machine Learning Methods
– E.g. Neural nets, Bayesian nets, SVMs, Kohonen nets
– Train with compounds of known activity
– Predict activity of “unknown” compounds

Scoring methods
– Profile compounds based on properties related to target

Fast Docking
– Rapidly “dock” 3D representations of molecules into 3D representations of proteins, and score according to how well they bind
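The train-then-predict workflow can be sketched with a single-layer perceptron, the simplest kind of neural net. This is an illustration under stated assumptions, not a method from the slides: the three binary descriptors and the activity labels are hypothetical.

```python
def score(weights, bias, x):
    """Weighted sum of descriptor values plus bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Train on compounds of known activity; labels are 1 (active) or 0 (inactive)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = 1 if score(weights, bias, x) > 0 else 0
            err = y - pred
            if err:  # update only on misclassified compounds
                weights = [w + lr * err * xi for w, xi in zip(weights, x)]
                bias += lr * err
    return weights, bias

def predict(model, x):
    """Predict activity (1 or 0) of an "unknown" compound."""
    weights, bias = model
    return 1 if score(weights, bias, x) > 0 else 0

# Hypothetical descriptors: [has_ring, has_amide, has_halogen]
known = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, 1]]
activity = [1, 1, 0, 0]
model = train_perceptron(known, activity)
print(predict(model, [1, 0, 0]), predict(model, [0, 1, 0]))
```

A perceptron only separates classes that are linearly separable in the chosen descriptors, which is one reason the richer methods listed on the slide are used in practice.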

Page 39

4. Combinatorial Chemistry

By combining molecular “building blocks”, we can create very large numbers of different molecules very quickly.

Usually involves a “scaffold” molecule, and sets of compounds which can be reacted with the scaffold to place different structures on “attachment points”.
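The combinatorial growth can be sketched by enumerating every combination of building blocks over the attachment points. The scaffold template string and the R-group labels below are hypothetical placeholders, not real chemistry notation.

```python
from itertools import product

def enumerate_library(scaffold, r_groups):
    """Yield one virtual compound per combination of building blocks.

    scaffold: template string with one named placeholder per attachment point
    r_groups: {attachment point name: list of building-block labels}
    """
    points = sorted(r_groups)
    for combo in product(*(r_groups[p] for p in points)):
        yield scaffold.format(**dict(zip(points, combo)))

# Hypothetical scaffold with two attachment points
scaffold = "scaffold({R1},{R2})"
r_groups = {"R1": ["methyl", "ethyl", "phenyl"], "R2": ["Cl", "OH"]}
library = list(enumerate_library(scaffold, r_groups))
print(len(library))  # 3 * 2 = 6 combinations
```

The library size is the product of the building-block counts at each attachment point, so a few hundred blocks at three points already gives millions of virtual compounds.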

Page 40

Combinatorial Chemistry Issues

Which R-groups to choose

Which libraries to make
– “Fill out” existing compound collection?
– Targeted to a particular protein?
– As many compounds as possible?

Computational profiling of libraries can help
– “Virtual libraries” can be assessed on computer

Page 41

5. Molecular Modeling

• 3D Visualization of interactions between compounds and proteins
• “Docking” compounds into proteins computationally

Page 42

3D Visualization

X-ray crystallography and NMR Spectroscopy can reveal 3D structure of protein and bound compounds

Visualization of these “complexes” of proteins and potential drugs can help scientists understand the mechanism of action of the drug and improve the design of a drug

Visualization uses computational “ball and stick” model of atoms and bonds, as well as surfaces

Stereoscopic visualization available

Page 43

“Docking” compounds into proteins computationally

Page 44

6. In Vitro & In Silico ADME models

Traditionally, animals were used for pre-human testing. However, animal tests are expensive, time consuming and ethically undesirable

ADME (Absorption, Distribution, Metabolism, Excretion) techniques help model how the drug will likely act in the body

These methods can be experimental (in vitro) using cellular tissue, or in silico, using computational models

Page 45

Size of databases

Millions of entries in databases
– CAS : 23 million
– GenBank : 5 million

Total number of drugs worldwide: 60,000

Fewer than 500 characterized molecular targets

Potential targets : 5,000-10,000

Page 46

Protein Structure Prediction

Experimental Techniques
– X-ray Crystallography
– NMR

Limitations of Current Experimental Techniques
– Protein DataBank (PDB) -> 24,000 protein structures
– SwissProt -> 100,000 proteins
– Non-Redundant (NR) -> 1,000,000 proteins

Importance of Structure Prediction
– Fill gap between known sequences and structures
– Protein engineering to alter function of a protein
– Rational Drug Design

Page 47

Protein Structures

Page 48

Techniques of Structure Prediction

Computer simulation based on energy calculation
– Based on physico-chemical principles
– Thermodynamic equilibrium with a minimum free energy
– Global minimum free energy of protein surface

Knowledge Based approaches
– Homology Based Approach
– Threading Protein Sequence
– Hierarchical Methods

Page 49

Energy Minimization Techniques

Energy minimization based methods, in their pure form, make no a priori assumptions and attempt to locate global minima.

Static Minimization Methods
– Classical; many potentials can be constructed
– Assume that atoms in the protein are static
– Problems (large number of variables and minima, and validity of potentials)

Dynamical Minimization Methods
– Motions of atoms also considered
– Monte Carlo simulation (stochastic in nature, time is not considered)
– Molecular Dynamics (time, quantum mechanical, classical equations)

Limitations
– Large number of degrees of freedom; CPU power not adequate
– Interaction potential is not good enough to model
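The stochastic character of Monte Carlo search can be sketched with the Metropolis acceptance rule on a toy one-dimensional potential. The slides do not specify a Monte Carlo variant, so Metropolis is an assumption here, as are the potential, temperature, step size and seed. A real protein energy function has thousands of degrees of freedom.

```python
import math
import random

def energy(x):
    """Toy potential with a shallow minimum near x = 0.96 and a deeper one near x = -1.04."""
    return x**4 - 2 * x**2 + 0.3 * x

def metropolis_minimize(energy_fn, x0, steps=5000, step_size=0.5,
                        temperature=0.5, seed=0):
    """Random-walk search that keeps the lowest-energy point visited."""
    rng = random.Random(seed)
    x, e = x0, energy_fn(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)
        e_new = energy_fn(candidate)
        # Metropolis criterion: always accept downhill moves,
        # accept uphill moves with probability exp(-dE / T)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = candidate, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

# Start in the shallow well; uphill moves let the walk cross the barrier
best_x, best_e = metropolis_minimize(energy, x0=1.0)
print(best_x, best_e)
```

Accepting some uphill moves is what lets the search escape local minima; a purely downhill (static) minimizer started at x = 1.0 would stay in the shallow well.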

Page 50

Knowledge Based Approaches

Homology Modelling
– Needs homologues of known protein structure
– Backbone modelling
– Side chain modelling
– Fails in absence of homology

Threading Based Methods
– New way of fold recognition
– Tries to fit the sequence into known structures
– Motif recognition
– Loop & Side chain modelling
– Fails in absence of known example

Page 51

Hierarchical Methods

Intermediate structures are predicted, instead of predicting tertiary structure of protein from amino acid sequence

Prediction of backbone structure
– Secondary structure (helix, sheet, coil)
– Beta Turn Prediction
– Super-secondary structure

Tertiary structure prediction

Limitation
– Accuracy is only 75-80 %
– Only three state prediction
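Three-state accuracy is commonly reported as the percentage of residues assigned the correct state (helix H, sheet E, coil C), often called Q3. The sketch below assumes that is the measure behind the 75-80 % figure, which the slide does not state; the two strings are made-up examples.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose three-state label (H, E or C) matches."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

# Hypothetical observed and predicted states for a 16-residue fragment
observed = "CCHHHHHHCCEEEECC"
predicted = "CCHHHHHCCCEEECCC"
print(q3_accuracy(predicted, observed))  # 14 of 16 residues match -> 87.5
```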

Page 52

Helix formation is local

residues i and i+3

THYROID hormone receptor (2nll)

Page 53

β-sheet formation is NOT local

Erabutoxin (3ebx)

Page 54

Definition of β-turn

A β-turn is defined by four consecutive residues i, i+1, i+2 and i+3 that do not form a helix, have a Cα(i)-Cα(i+3) distance less than 7 Å, and lead to a reversal in the protein chain (Richardson, 1981).

The conformation of a β-turn is defined in terms of φ and ψ of the two central residues, i+1 and i+2, and can be classified into different types on the basis of φ and ψ.

[Figure: β-turn showing residues i, i+1, i+2 and i+3, the H-bond, and the distance D < 7 Å]
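The distance and non-helix criteria in the definition can be sketched as a scan over four-residue windows. This is an illustration only: the coordinates are toy values, "do not form a helix" is interpreted here as "not all four residues are helical" (an assumption), and the chain-reversal check and the φ/ψ typing are not implemented.

```python
import math

def find_beta_turns(ca_coords, helix_flags, max_dist=7.0):
    """Return start indices i where residues i..i+3 pass both criteria.

    ca_coords: list of (x, y, z) C-alpha positions in angstroms
    helix_flags: list of bools, True where a residue is helical
    """
    turns = []
    for i in range(len(ca_coords) - 3):
        if all(helix_flags[i:i + 4]):
            continue  # the four residues form a helix
        if math.dist(ca_coords[i], ca_coords[i + 3]) < max_dist:
            turns.append(i)
    return turns

# Toy hairpin: one strand, a reversal, and a return strand 5 angstroms away
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (10.0, 2.5, 0.0),
          (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0)]
no_helix = [False] * len(coords)
print(find_beta_turns(coords, no_helix))
```

Overlapping windows can both qualify, as here, so a real tool would need a rule for merging or ranking adjacent turns.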

Page 55

Protein Structure Prediction
What we are doing?

APSSP2: Advanced Protein Secondary Structure Prediction -> This server allows prediction of the secondary structure of proteins from their amino acid sequence with high accuracy. It utilizes multiple alignment, neural network and MBR techniques. This server participates in a number of worldwide competitions such as CASP, CAFASP and EVA.

Protein Structural Classes -> It predicts whether a protein belongs to class Alpha or Beta or Alpha+Beta or Alpha/Beta (Raghava, G.P.S. (1999) J. Biosciences 24, 176)

BTeval: Benchmarking of Beta Turn prediction methods on-line via Internet (Kaur, H. and Raghava G.P.S. Bioinformatics 18:1508-14). The user can see the performance of their method or existing methods (Kaur, H. and Raghava, G.P.S. (2003) Journal of Bioinformatics and Computational Biology 1:495-504)

BetaTPred2: Prediction of Beta Turns in Proteins using Neural Network and multiple alignment techniques. This is a highly accurate method for beta turn prediction (Kaur, H. and Raghava, G.P.S. (2003) Protein Science 12:627).

GammaPred: Prediction of Gamma-turns in Proteins using Multiple Alignment and Secondary Structure Information (Kaur H. and Raghava, G.P.S. (2003) Protein Science; 12:923).

AlphaPred: Prediction of Alpha-turns in Proteins using Multiple Alignment and Secondary Structure Information (Kaur & Raghava (2004) Proteins 55:83-90).

BetaTPred: A server for predicting Beta Turns in proteins using existing statistical methods. This allows consensus prediction from various methods (Kaur H., and Raghava G.P.S. (2002) Bioinformatics 18:498)

Page 56

Protein Structure Prediction
What we are doing?

CHpredict: The CHpredict server predicts two types of interactions: C-H...O and C-H...PI interactions. For C-H...O interactions, the server predicts the residues whose backbone Calpha atoms are involved in interaction with backbone oxygen atoms, and for C-H...PI interactions, it predicts the residues whose backbone Calpha atoms are involved in interaction with the PI ring system of side chain aromatic moieties.

AR_NHPred: A web server for predicting the aromatic backbone NH interaction in a given amino acid sequence, where the pi ring of aromatic residues interacts with the backbone NH groups. The method is based on neural network training on PSI-BLAST generated position specific matrices and PSIPRED predicted secondary structure (Kaur, H. and Raghava G.P.S. (2004) FEBS Lett. 564:47-57)

TBBpred: Transmembrane Beta Barrel prediction server predicts the transmembrane beta barrel regions in a given protein sequence. The server uses a forked strategy for predicting residues which are in transmembrane beta barrel regions. Prediction can be done based only on neural networks, or based on a statistical learning technique (SVM), or a combination of the two methods (Natt et al. (2004) Proteins 56: 11-8).

Betaturns: This server allows prediction of the beta turns and their types in a protein from its amino acid sequence (Kaur, H. and Raghava G.P.S. (2004) Bioinformatics (In press)).

PEPstr: The Pepstr server predicts the tertiary structure of small peptides with sequence length varying between 7 and 25 residues. The prediction strategy is based on the realization that β-turn is an important and consistent feature of small peptides in addition to regular structures.

Page 57

Selection of Target and Classification of Proteins
What we are doing?

ESLpred: is an SVM based method for predicting subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST generated profile (Bhasin, M. and Raghava, G. P. S., 2004, Nucleic Acids Res. (In Press)). Using this server, users may learn the function of their protein based on its location in the cell.

NRpred: is an SVM based tool for the classification of nuclear receptors on the basis of amino acid composition or dipeptide composition. The overall prediction accuracy of the amino acid composition and dipeptide composition based methods is 82.6% and 97.2%, respectively (Bhasin, M. and Raghava, G. P. S., 2004, Journal of Biological Chemistry (In Press)).

GPCRpred: is a server for predicting G-protein-coupled receptors and for classifying them in families and sub-families. This server can play a vital role in drug design, as GPCRs are commonly used as drug targets (Bhasin, M. and Raghava, G. P. S., 2004, Nucleic Acids Res. (In Press))

GPCRSclass: is a dipeptide composition based method for predicting the amine type of G-protein-coupled receptors. In this method the amine type is predicted from the dipeptide composition of proteins using SVM.
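Amino acid and dipeptide composition turn a sequence of any length into a fixed-length feature vector (20 and 400 values). The sketch below shows one plausible way to compute them; it is not the servers' actual code, the example sequence is made up, and the SVM training step is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_composition(seq):
    """Percentage of each of the 20 standard residues (20 features)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return {aa: 100.0 * seq.count(aa) / len(seq) for aa in AMINO_ACIDS}

def dipeptide_composition(seq):
    """Percentage of each ordered pair of adjacent residues (400 features)."""
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard letters
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence has no valid dipeptides")
    return {pair: 100.0 * n / total for pair, n in counts.items()}

# Hypothetical 10-residue sequence
features = dipeptide_composition("MKVLAAGIVK")
print(len(features), round(features["AA"], 2))
```

Unlike amino acid composition, dipeptide composition keeps some information about local residue order, which may help explain the higher accuracy reported for NRpred's dipeptide-based method.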

Page 58

Important Database of Hapten
What we are doing?

Hapten: It is a small molecule, not immunogenic by itself, that can react with antibodies of appropriate specificity and elicit the formation of such antibodies when conjugated to a larger antigenic molecule (usually a protein, called the carrier in this context). These hapten molecules are of great importance in the production of antibodies of desired specificity, as antibody production involves activation of B lymphocytes by the hapten and helper T lymphocytes by the carrier protein.

HaptenDB: It is a collection of haptens; information is collected and compiled from published literature and web resources. Presently the database has more than 1700 entries where each entry provides comprehensive detail about a hapten molecule that include

URL: http://www.imtech.res.in/ragahva/haptendb/

Page 59

Thanks

