An overview of
Bioinformatics
Cell and Central Dogma
Source: “Post-genome Informatics” by M Kanehisa
Deduction and Analogy
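Biological System (Organism)
Reductionistic Approach (Experiments): from the biological system down to its building blocks
Synthetic Approach (Bioinformatics): from the building blocks up to the biological system
Building Blocks (Genes/Molecules)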
Source: “Post-genome Informatics” by M Kanehisa
                    Physics                Chemistry    Biology
                    Matter                 Compound     Organism
                    Elementary particles   Elements     Genes
Principles known    Yes                    Yes          No
Source: “Post-genome Informatics” by M Kanehisa
Searching and learning problems in biology

Problems in Biology: Similarity search
- Pairwise sequence alignment
- Database search for similar sequences
- Multiple sequence alignment
- Phylogenetic tree reconstruction
- Protein 3D structure alignment
Problems in Biology: Structure/function prediction (ab initio prediction)
- RNA secondary structure prediction
- RNA 3D structure prediction
- Protein 3D structure prediction
Methods in Informatics: Optimization algorithms
- Dynamic programming
- Simulated annealing
- Genetic algorithm
- Hopfield neural network
- Gibbs sampling
- Monte Carlo

Problems in Biology: Structure/function prediction (knowledge-based prediction)
- Motif extraction
- Functional site prediction
- Cellular localization prediction
- Coding region prediction
- Transmembrane segment prediction
- Protein secondary structure prediction
- Protein 3D structure prediction
Methods in Informatics: Pattern recognition and learning algorithms
- Discriminant analysis
- Hierarchical neural network
- Hidden Markov model
- Formal grammar

Problems in Biology: Molecular classification
- Superfamily classification
- Ortholog/paralog grouping of genes
- 3D fold classification
- Gene expression clustering
Methods in Informatics: Clustering algorithms
- Hierarchical cluster analysis
- Kohonen neural network
- Self-organizing map

Problems in Biology: Interaction and pathway
- Network comparison
- Pathway construction
- Dynamic analysis of network
- Control and design of system
Methods in Informatics:
- Graph theory
- Network theory
- Control theory
- System theory
Source: “Post-genome Informatics” by M Kanehisa
Sequence Comparison: Algorithms and Approaches
Homology Search
New sequence -> retrieval from the sequence database (primary data) -> similar sequences
Similar sequences -> expert knowledge -> sequence interpretation
Source: “Post-genome Informatics” by M Kanehisa
Pairwise sequence alignment by dynamic programming
Needleman-Wunsch algorithm
Source: “Post-genome Informatics” by M Kanehisa
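The dynamic-programming alignment named on this slide can be sketched as follows. This is a minimal illustration, not the formulation from the cited book: the scoring values (`match=1`, `mismatch=-1`, `gap=-2`) and the linear gap penalty are assumptions chosen for brevity, and the function name is hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Scoring values are illustrative.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Fill the table: each cell extends the best of three predecessors.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "ACT"))
```

The table costs O(nm) time and memory, which is why exhaustive dynamic programming suits pairwise alignment but becomes expensive when one query is compared against an entire database.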
Database Search for Similar Sequences
Web Lab
Motif
Source: “Introduction to Protein Structure” by Branden & Tooze
Web Lab
Motif Search
Sequence database (primary data) -> expert knowledge -> motif library (empirical rules)
New sequence -> inference with the motif library -> sequence interpretation
Source: “Post-genome Informatics” by M Kanehisa
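Motif search against a library of empirical rules can be sketched with regular expressions. The example below assumes the library stores PROSITE-style patterns and handles only the basic syntax (`x`, `[...]`, `{...}`, `(n)` or `(n,m)`, `<`, `>`); the function names and the test sequence are hypothetical. The pattern shown is the PROSITE N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # {P} means "any residue except P"; do this before introducing braces.
        element = element.replace("{", "[^").replace("}", "]")
        # (n) or (n,m) is a repeat count.
        element = element.replace("(", "{").replace(")", "}")
        # Lowercase x is the wildcard; residues are uppercase letters.
        element = element.replace("x", ".")
        # < and > anchor the pattern to the N- and C-terminus.
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


def find_motif(sequence, pattern):
    """Return (start, matched_text) for every hit, including overlapping ones."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    n_glyc = "N-{P}-[ST]-{P}."
    print(prosite_to_regex(n_glyc))
    print(find_motif("AANGTAANPSA", n_glyc))
```

A hit from such a short pattern is only a candidate site, so it should be read as a hypothesis to check rather than a confirmed function.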
Introduction to Structural Biology
Source: “Introduction to Protein Structure” by Branden & Tooze
Web Lab
Genome Project
Web Lab
Genome Sequencing and Genome Annotation
A general model of the structure of genomic sequences
Source: “Bioinformatics” by D W Mount
Microarray
Joe Sutliff for Science 291 p1224 (2001)
What kind of solution can genomics provide? High-throughput gene discovery
165 genes are up-regulated in 75% of tumors (MAPK pathway, APC, promotion of mitosis; 69 unknown)
170 genes are down-regulated in 65% of tumors (hepatocyte-specific gene products, retinoid metabolism; 75 unknown)
Hierarchical Clustering
K-means
Self-Organizing Map
Support Vector Machine
Singular Value Decomposition
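As a small illustration of one method from the list above, K-means can group genes by the similarity of their expression profiles. This is a sketch only: the profiles are made-up numbers, the function names are hypothetical, and taking the first k profiles as starting centroids is a simplification (real analyses typically use randomized or smarter initialization).

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(profiles, k, iterations=20):
    """Cluster expression profiles into k groups.

    Returns (labels, centroids). Initial centroids are simply the first k
    profiles, an assumption made here to keep the example deterministic.
    """
    centroids = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in profiles
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical log-ratio profiles: rows are genes, columns are samples.
    profiles = [
        [2.0, 2.1, 1.9],     # up-regulated
        [-2.0, -1.8, -2.2],  # down-regulated
        [1.8, 2.2, 2.0],     # up-regulated
        [-2.1, -2.0, -1.9],  # down-regulated
    ]
    labels, centroids = kmeans(profiles, k=2)
    print(labels)
```

The number of clusters k has to be chosen in advance, which is one reason to compare the result with hierarchical clustering or a self-organizing map on the same data.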
Gene Expression and Transcriptome
Web Lab
Proteomics and Functional Genomics
Source: “Post-genome Informatics” by M Kanehisa
Web Lab
Integrative Genomics
Network of physical interactions between nuclear proteins
Attributes of generic network structures
Virtual Cell
Living Cell
Perturbation: environmental change, gene disruption, gene overexpression
Dynamic response: changes in gene expression profiles, etc.
Biological knowledge: molecular and cellular biology, biochemistry, genetics, etc.
Basic principles
Practical applications
Complete Genome Sequences
Source: “Post-genome Informatics” by M Kanehisa
Take Home Message
Define the biological problem.
Why is bioinformatics important? It is a synthetic approach.
Prediction is a dangerous game. Always do your best to validate predictions at the bench.
The devil is in the details. Always try different bioinformatics tools and databases.
Your knowledge rests on your own practice.
Reference Books you will find useful:
Bioinformatics: Sequence and Genome Analysis by D W Mount
Introduction to Bioinformatics by A M Lesk
Post-genome Informatics by M Kanehisa
Evolution of molecular biology databases
Database category        Data content                Examples
1. Literature database   Bibliographic citations     MEDLINE (1971)
                         On-line journals
2. Factual database      Nucleic acid sequences      GenBank (1982), EMBL (1982), DDBJ (1984)
                         Amino acid sequences        SWISS-PROT (1986)
                         3D molecular structures     PDB (1971)
3. Knowledge base        Motif libraries             PROSITE (1988)
                         Molecular classification    SCOP (1994)
                         Biochemical pathways        KEGG (1995)