Post on 28-Jan-2016
http://creativecommons.org/licenses/by-sa/2.0/
Integrating the Data
Prof: Rui Alves
ralves@cmb.udl.es
973702406
Dept Ciencies Mediques Basiques, 1st Floor, Room 1.08
Website of the Course: http://web.udl.es/usuaris/pg193845/Courses/Bioinformatics_2007/
Course: http://10.100.14.36/Student_Server/
Outline
• Methods for reconstruction of functional protein networks
– Why is it important?
• Methods for reconstruction of physical protein interactions
Proteins do not work alone!
Methods for network reconstruction
• Using text analysis
Publication databases are a source of information
Meta-text databases create network models from publication analysis
iHOP is a sophisticated context analysis engine
How does meta-text analysis create networks?
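[Diagram: scripts combine a literature database, a gene names database (e.g. Ste20) and a language rules database (e.g. activate, inhibit, rescue). Each literature entry is checked against the gene list and the rule list. You submit your genes to the server/program and receive the list of entries mentioning your gene.]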
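The flow in the diagram can be sketched roughly as follows. This is only a toy illustration: the sentences, the gene list and the rule list are hypothetical, and real engines such as iHOP perform far more sophisticated context analysis than this simple co-occurrence check.

```python
import re
from itertools import combinations

# Hypothetical toy data standing in for the three databases in the diagram.
LITERATURE = [
    "Ste20 can activate Ste11 in the pheromone pathway.",
    "Cdc42 binds Ste20 at the membrane.",
    "Ste11 and Ste7 rescue the mutant phenotype.",
]
GENE_NAMES = {"Ste20", "Ste11", "Ste7", "Cdc42"}
RULE_VERBS = {"activate", "inhibit", "rescue"}


def extract_edges(entries, genes, verbs):
    """Return (gene, verb, gene) triples for entries that mention two
    gene names together with one of the rule verbs.

    Genes are listed alphabetically; the direction of the relation is
    NOT inferred in this sketch.
    """
    edges = set()
    for sentence in entries:
        words = re.findall(r"[A-Za-z0-9]+", sentence)
        found_genes = sorted({w for w in words if w in genes})
        found_verbs = sorted({w for w in words if w in verbs})
        for gene_1, gene_2 in combinations(found_genes, 2):
            for verb in found_verbs:
                edges.add((gene_1, verb, gene_2))
    return edges


edges = extract_edges(LITERATURE, GENE_NAMES, RULE_VERBS)
```

The second sentence yields no edge because "binds" is not in the hypothetical rule list, which shows how strongly the resulting network depends on the rule database.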
Methods for network reconstruction
• Meta text analysis
• Evolutionary based protein interaction prediction
Proteins that have coevolved share a function
• If protein A has co-evolved with protein B, they are likely to be involved in the same process
• Looking for proteins that coevolved will help predict the social networks of proteins
• There are many methods to look for co-evolution of proteins
– Phylogenetic profiling, gene neighbourhoods, gene fusion events, phylogenetic trees…
Using phylogenetic profiles to predict protein interactions
[Diagram: Your sequence (A) is submitted to a server/program that uses a database of proteins in fully sequenced genomes and a database of profiles for each protein in each organism.]

Phylogenetic profiles for the proteins of the target genome:

Protein   Homologue in Genome 1?   Homologue in Genome 2?   …
A         Y                        N                        …
B         N                        Y                        …
C         Y                        N                        …
…         …                        …                        …

Calculate the coincidence index of each profile with the profile of protein A. The index is a ratio of the form i/number of genomes and lies between 0 and 1:

Protein id   Coincidence index with A
A            1
C            0.9
…            …
B            0.11
…            …

Proteins (A and C) that are present and absent in the same set of genomes are likely to be involved in the same process and therefore to interact
Similarly, if protein A is absent in all genomes in which protein B is present, there is a likelihood that they perform the same function!
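As an illustration of the coincidence index, the sketch below assumes it is simply the fraction of genomes in which two presence/absence profiles agree; the slides do not spell out the exact definition, and the profiles used here are hypothetical.

```python
def coincidence_index(profile_a, profile_b):
    """Fraction of genomes in which two presence/absence profiles agree."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    matches = sum(1 for x, y in zip(profile_a, profile_b) if x == y)
    return matches / len(profile_a)


# Hypothetical profiles over 10 genomes (True = homologue present).
PROFILES = {
    "A": [True, False, True, True, False, True, False, True, True, False],
    "B": [False, True, False, False, True, False, True, False, False, False],
    "C": [True, False, True, True, False, True, False, True, True, True],
}

# Score every protein against the query protein A and rank the results.
scores = {name: coincidence_index(PROFILES["A"], p) for name, p in PROFILES.items()}
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Both ends of the ranking are informative: a high index (C) suggests involvement in the same process, while a very low index (B) corresponds to the complementary pattern described above.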
Synteny/Conservation of gene neighborhoods
[Figure: order of the genes for Proteins A, B, C and D along Genome 1, Genome 2, Genome 3, …]
Which of these proteins interact?
Proteins A and B are in a conserved relative position in most genomes, which is an indication that they are likely to interact
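One simple way to quantify neighbourhood conservation, assuming each genome is reduced to an ordered list of genes, is to count in how many genomes two genes are immediate neighbours. The gene orders below are hypothetical and the measure is only a sketch, not a published scoring scheme.

```python
from collections import Counter
from itertools import combinations

# Hypothetical gene orders along each genome.
GENOMES = {
    "Genome 1": ["A", "B", "C", "D"],
    "Genome 2": ["C", "A", "B", "D"],
    "Genome 3": ["D", "B", "A", "C"],
    "Genome 4": ["C", "D", "A", "B"],
}


def adjacency_conservation(genomes):
    """For every unordered gene pair, return the fraction of genomes
    in which the two genes are immediate neighbours."""
    counts = Counter()
    for order in genomes.values():
        for left, right in zip(order, order[1:]):
            counts[frozenset((left, right))] += 1
    genes = sorted({gene for order in genomes.values() for gene in order})
    return {
        (g1, g2): counts[frozenset((g1, g2))] / len(genomes)
        for g1, g2 in combinations(genes, 2)
    }


conservation = adjacency_conservation(GENOMES)
```

In this toy data only the pair (A, B) is adjacent in every genome, so it is the pair flagged as likely to interact.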
Gene fusion events
[Figure: genes for Proteins A, B, C and D in Genome 1, Genome 2, Genome 3, …; in some genomes the genes for A and B are fused.]
Which of these proteins interact?
Proteins A and B have undergone gene fusion events in at least some genomes, which is an indication that they are likely to interact
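A minimal sketch of fusion detection, assuming each gene is described by the tuple of protein families it encodes (a fused gene encodes more than one). The genomes and this representation are hypothetical simplifications.

```python
from itertools import combinations

# Hypothetical genomes: each gene is a tuple of the protein families it encodes.
GENOMES = {
    "Genome 1": [("A",), ("D",), ("C",), ("B",)],
    "Genome 2": [("A",), ("B",), ("C",), ("D",)],
    "Genome 3": [("A", "B"), ("C",), ("D",)],
    "Genome 4": [("A", "B"), ("C",), ("D",)],
}


def fusion_events(genomes):
    """Map each pair of protein families to the genomes in which
    both are encoded by a single fused gene."""
    fusions = {}
    for genome, genes in genomes.items():
        for gene in genes:
            for pair in combinations(sorted(set(gene)), 2):
                fusions.setdefault(pair, []).append(genome)
    return fusions


fusions = fusion_events(GENOMES)
```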
Building phylogenetic trees of proteins
[Figure: Proteins A, B, C and D in Genome 1, Genome 2, Genome 3, …]
Get the sequences of all homologues, align them and build a phylogenetic tree
Phylogenetic trees represent the evolutionary history of homologous genes/proteins based on their sequence
Similarity of phylogenetic trees indicates interaction between proteins
[Figure: phylogenetic trees for proteins A (A1, A2, A3, …), B (B1, B2, B3, …), C (C1, C2, C3, …) and D (D1, D2, D3, …)]
Proteins A and B have similar evolutionary trees and thus are likely to interact
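The slides do not say how tree similarity is measured. One common approach, used here as an assumption, is to correlate the pairwise distance matrices that underlie the two trees (the "mirror tree" idea). The distance values below are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


# Hypothetical distances between the homologues found in genomes 1..4.
DIST_A = [[0, 1, 4, 5], [1, 0, 4, 5], [4, 4, 0, 2], [5, 5, 2, 0]]
DIST_B = [[0, 2, 8, 10], [2, 0, 8, 10], [8, 8, 0, 4], [10, 10, 4, 0]]
DIST_D = [[0, 4, 1, 5], [4, 0, 5, 2], [1, 5, 0, 4], [5, 2, 4, 0]]

similarity_ab = pearson(upper_triangle(DIST_A), upper_triangle(DIST_B))
similarity_ad = pearson(upper_triangle(DIST_A), upper_triangle(DIST_D))
```

Here the distances for B are a scaled copy of those for A, so the two trees have the same shape and the correlation is close to 1, while D gives a much lower value.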
Methods for network reconstruction
• Using meta text analysis
• Using phylogenetic profiling
• Using omics data
Predicting gene functional interactions using microarray data
• Take two populations of cells and apply the stimulus to one of them
• Purify cDNA from each population
• Compare cDNA levels of corresponding genes in the different populations
– Genes overexpressed as a result of stimulus
– Genes underexpressed as a result of stimulus
– Genes with expression independent of stimulus
• The over- and underexpressed genes form the group of genes/proteins involved in response to the stimulus
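The comparison step can be sketched as a classification by log2 fold change. The expression levels and the two-fold threshold are hypothetical choices for illustration; real analyses also need normalisation and statistics across replicates.

```python
from math import log2

# Hypothetical cDNA levels (arbitrary units): gene -> (control, stimulated).
LEVELS = {
    "gene1": (100.0, 400.0),
    "gene2": (200.0, 50.0),
    "gene3": (150.0, 160.0),
}


def classify(levels, threshold=1.0):
    """Group genes by log2 fold change; threshold=1.0 means two-fold."""
    groups = {"overexpressed": [], "underexpressed": [], "independent": []}
    for gene, (control, stimulated) in levels.items():
        change = log2(stimulated / control)
        if change >= threshold:
            groups["overexpressed"].append(gene)
        elif change <= -threshold:
            groups["underexpressed"].append(gene)
        else:
            groups["independent"].append(gene)
    return groups


groups = classify(LEVELS)
# Genes that respond in either direction are candidates for the same process.
responders = groups["overexpressed"] + groups["underexpressed"]
```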
Predicting protein functional interactions using mass spec data
• Take two populations of cells and apply the stimulus to one of them
• Purify proteins from each population
• Identify proteins and compare protein profiles/levels in the different populations
– Proteins present as a result of stimulus
– Proteins absent as a result of stimulus
– Proteins present in both conditions
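If each experiment is reduced to the set of identified proteins, the three groups above are plain set operations. The protein identifiers below are hypothetical, and quantitative level comparisons are left out of this sketch.

```python
def compare_proteomes(control, stimulated):
    """Split identified proteins into the three groups from the comparison."""
    return {
        "present_after_stimulus": stimulated - control,
        "absent_after_stimulus": control - stimulated,
        "present_in_both": control & stimulated,
    }


# Hypothetical sets of proteins identified by mass spectrometry.
CONTROL = {"P1", "P2", "P3"}
STIMULATED = {"P2", "P3", "P4"}

comparison = compare_proteomes(CONTROL, STIMULATED)
```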
Predicting regulatory modules with ChIP-chip experiments
• Crosslink protein/DNA in cells
• Break DNA
• One branch: reverse cross link & purify DNA pieces
• Other branch: affinity purification of transcription factor (TF), then reverse cross link & purify DNA pieces bound to TF
• Compare in microarray
Predicting protein activity modulation with NMR/IR/MS Metabolomics
• Take two populations of cells and apply the stimulus to one of them
• Measure metabolites in each population
• Compare changes in metabolic levels to infer changes in protein activity
Methods for network reconstruction
• Using meta text analysis
• Using phylogenetic profiling
• Using protein docking
• Using omics data
• Using protein interaction data
Predicting protein networks using protein interaction data
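[Diagram: Your sequence (A) is submitted to a server/program that queries a database of protein interactions. The network grows outwards from A to its partners (B, C), then to their partners (D, E), then to F.]
Continue until you are satisfied or have completed the network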
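The iterative expansion can be sketched as a breadth-first search over an interaction table. The table below is hypothetical, and `max_depth` is an illustrative stand-in for "continue until you are satisfied".

```python
from collections import deque

# Hypothetical interaction database: protein -> known partners.
INTERACTIONS = {
    "A": ["B", "C"],
    "B": ["A", "D", "E"],
    "C": ["A"],
    "D": ["B"],
    "E": ["B", "F"],
    "F": ["E"],
}


def expand_network(database, start, max_depth=None):
    """Breadth-first expansion from `start`.

    Returns the set of proteins reached and the set of undirected edges.
    Proteins at `max_depth` are kept but their partners are not queried.
    """
    seen = {start}
    edges = set()
    queue = deque([(start, 0)])
    while queue:
        protein, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for partner in database.get(protein, []):
            edges.add(tuple(sorted((protein, partner))))
            if partner not in seen:
                seen.add(partner)
                queue.append((partner, depth + 1))
    return seen, edges


first_shell, first_edges = expand_network(INTERACTIONS, "A", max_depth=1)
full_network, full_edges = expand_network(INTERACTIONS, "A")
```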
Outline
• Methods for reconstruction of functional protein networks
• Methods for reconstruction of protein interactions
How do proteins work within the network?
• Assume we now have the network our protein is involved in.
• How do we further analyze the role of the protein?
Proteins work by binding
[Figure labels: Effect, DNA]
Proteins work by binding!
So what?
So, if we can predict how proteins DOCK to their ligands, then we will be able to understand how the binding allows them to work systemically
Design drugs to overcome mutations in binding sites
Design proteins to prevent/enhance other interactions
What is in silico protein docking?
• Given two molecules find their correct association using a computer:
Receptor + Ligand = Complex
[Figure label: T]
What types of in silico docking exist?
• Sequence Based Docking:
In silico two hybrid docking
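Protein A
E. coli      AGGMEYW….
S. typhi     AA – CDWY…
…
Y. pestis    AGG –DYW

Protein B
E. coli      VCHPRIIE….
S. typhi     VCH -KIIE…
…
Y. pestis    VCH –KIIE…

[Matrix: columns are the positions of Protein B (V C H P K I I E…), rows are the positions of Protein A (A G G … D …); positions are compared using the Pearson correlation.]
D/K or E/R may be involved in a salt bridge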
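A simplified sketch of the correlation step, assuming two alignments with the same species order. For each column, every pair of species gets 1 if the residues match and 0 otherwise, and the resulting vectors for a column of Protein A and a column of Protein B are compared by Pearson correlation. The toy alignments are hypothetical, and the 0/1 identity score is a simplification; a substitution-matrix similarity could be used instead.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy alignments: one row per species, same order in both.
ALIGNMENT_A = ["AGDM", "AGDM", "AGEM", "AGEM"]
ALIGNMENT_B = ["VCKI", "VCKI", "VCRI", "VCRL"]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def column_similarities(alignment, column):
    """For every pair of species: 1 if residues in this column match, else 0."""
    residues = [row[column] for row in alignment]
    return [1 if a == b else 0 for a, b in combinations(residues, 2)]


def correlated_positions(alignment_a, alignment_b):
    """Correlation for every (column of A, column of B) pair."""
    scores = {}
    for i in range(len(alignment_a[0])):
        sims_a = column_similarities(alignment_a, i)
        for j in range(len(alignment_b[0])):
            sims_b = column_similarities(alignment_b, j)
            scores[(i, j)] = pearson(sims_a, sims_b)
    return scores


scores = correlated_positions(ALIGNMENT_A, ALIGNMENT_B)
best_pair = max(scores, key=scores.get)
```

In the toy data the D/E column of Protein A changes in exactly the same species as the K/R column of Protein B, which is the kind of compensating pattern the salt-bridge remark refers to.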
What types of in silico docking exist?
• Sequence Based Docking
• In silico structural protein docking
Structure based docking
• Protein-Protein docking
– Rigid (usually)
• Protein-Ligand docking
– Rigid protein, flexible ligand
Very demanding on computational resources
Structural docking in a nutshell
• Scan molecular surfaces of protein for best surface fit
– First steric, then energetics
– Can (and should) include biologically relevant information (e.g. residue X is known from mutation experiments to be involved in the docking → discard any docking not involving this residue)
Atom based docking
• First, a surface representation is needed
Van der Waals surface
Molecular (Connolly) surface
Solvent accessible surface
Calculating the best docking
• Scan molecular surfaces of protein for best surface fit
– Calculate the position where the largest number of atoms fits together, factor in energy + biology and rank solutions according to that
Grid-based techniques
• Grid-based Techniques
– Alternative to calculating protein atom / ligand atom interactions
– More efficient (number of grid points < number of atoms)
Grid based docking
• Place grid over protein
• Calculate inter-molecular forces for each grid point
[Figure: grid over the protein with a score at each grid point (Score 1, Score 2, Score 3, Score 4)]
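A minimal sketch of the grid idea, assuming a Coulomb-like potential in arbitrary units and a grid whose origin is at (0, 0, 0). The atoms, charges and grid size are hypothetical. The protein contribution is computed once per grid point and then reused for every ligand pose by table lookup.

```python
from math import dist

# Hypothetical protein atoms: (position, partial charge).
PROTEIN_ATOMS = [((0.0, 0.0, 0.0), -1.0), ((4.0, 0.0, 0.0), 1.0)]


def build_grid(atoms, size, spacing, min_distance=0.5):
    """Precompute a Coulomb-like potential (arbitrary units) at every
    point of a size x size x size grid with the given spacing.
    Distances are clamped to min_distance to avoid division by zero."""
    grid = {}
    for i in range(size):
        for j in range(size):
            for k in range(size):
                point = (i * spacing, j * spacing, k * spacing)
                grid[(i, j, k)] = sum(
                    charge / max(dist(point, position), min_distance)
                    for position, charge in atoms
                )
    return grid


def score_pose(grid, ligand_atoms, spacing):
    """Sum charge x potential over the ligand atoms, reading the
    potential from the nearest grid point (must lie inside the grid)."""
    total = 0.0
    for position, charge in ligand_atoms:
        index = tuple(round(c / spacing) for c in position)
        total += charge * grid[index]
    return total


GRID = build_grid(PROTEIN_ATOMS, size=6, spacing=1.0)
# Two hypothetical poses of a one-atom, positively charged ligand.
score_near_negative = score_pose(GRID, [((1.0, 0.0, 0.0), 1.0)], spacing=1.0)
score_near_positive = score_pose(GRID, [((3.0, 0.0, 0.0), 1.0)], spacing=1.0)
```

With the sign convention assumed here, the lower score (the pose next to the negative atom) is the more favourable one.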
The docking function
• There are many and none is the best for all cases
• Scores will depend on the exact docking function you use
A docking function for surface matching
• Molecules a, b placed on l × m × n grid
• Match surfaces
• Fourier transform makes calculation faster

$$a_{l,m,n},\; b_{l,m,n} = \begin{cases} 1 & \text{on the surface of the molecule} \\ \text{interior value} & \text{inside the molecule} \\ 0 & \text{outside the molecule} \end{cases}$$

$$C_{a,b}(step_1, step_2, step_3) = \sum_{l'=1}^{N} \sum_{m'=1}^{N} \sum_{n'=1}^{N} a_{l',m',n'} \cdot b_{l'+step_1,\; m'+step_2,\; n'+step_3}$$

• Tabulate and rank all possible conformations
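Why the Fourier transform helps can be stated with the discrete correlation theorem. The notation below is an assumption for illustration: a and b are the real-valued grids of the two molecules on an N × N × N grid with periodic boundaries, s is the shift of b relative to a, and DFT denotes the discrete Fourier transform.

$$C(s) = \sum_{x} a(x)\, b(x + s) \quad\Longrightarrow\quad \mathrm{DFT}[C] = \overline{\mathrm{DFT}[a]} \cdot \mathrm{DFT}[b]$$

$$C = \mathrm{DFT}^{-1}\!\left[\, \overline{\mathrm{DFT}[a]} \cdot \mathrm{DFT}[b] \,\right]$$

Evaluating the sum directly for all N^3 shifts costs on the order of N^6 operations, whereas the transform route costs on the order of N^3 log N, which is what makes an exhaustive scan over translations practical.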
A docking function for electrostatics
• There are many
• They use different force field approximations to calculate the energy of electrostatic interactions.
• The basics:

$$E_{electrostatic} = \int \left[ \rho_a(r)\, \varphi_b(r) + \rho_b(r)\, \varphi_a(r) \right] dV$$

where ρ denotes the charge distributions and φ the potentials of the proteins
The full docking function
• Calculates a relative binding energy that integrates electrostatic and shape matching factors. For example:
$$E_{tot} = c_{Electrostatic}\, E_{Electrostatic} + c_{shape\ matching}\, E_{shape\ matching}$$
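A small sketch of how such a weighted sum might be used to rank candidate complexes. The energies, the weights and the convention that lower totals are better are all hypothetical assumptions made for illustration.

```python
def total_energy(e_electrostatic, e_shape, c_electrostatic=1.0, c_shape=1.0):
    """Weighted sum of the electrostatic and shape-matching terms."""
    return c_electrostatic * e_electrostatic + c_shape * e_shape


def rank_poses(candidates, c_electrostatic=1.0, c_shape=1.0):
    """Return pose names sorted from lowest (assumed best) to highest total."""
    return sorted(
        candidates,
        key=lambda name: total_energy(
            candidates[name][0], candidates[name][1], c_electrostatic, c_shape
        ),
    )


# Hypothetical candidates: pose -> (electrostatic term, shape-matching term).
CANDIDATES = {
    "pose1": (-2.0, -5.0),
    "pose2": (-4.0, -1.0),
    "pose3": (2.0, -6.0),
}

equal_weights = rank_poses(CANDIDATES)
electrostatics_heavy = rank_poses(CANDIDATES, c_electrostatic=2.0, c_shape=0.5)
```

The two rankings differ, which illustrates why scores depend on the exact docking function and weights chosen.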
Overall process of docking
• Mol 1, Mol 2
• Rigid body energy calculation: form matching and electrostatics
[Equation: energy of molecules 1 and 2 as a sum over pairs i, j of a function of (p_i, p_j)]
• List of complexes
• Re-rank using statistics of residue contact, H/bond, biological information, etc.
• Re-rank using rotamers, flexibility in protein backbone angles, molecular dynamics, etc.
• Final list of solutions
Summary
• Methods for reconstruction of functional protein networks
– Bibliomics
– Genomics
– Phenomics, etc
• Methods for reconstruction of protein interactions
– Sequence based
– Structure based
Grid-based techniques
• Grid-based Techniques
– Notes:
• Grids spaced <1 Å
– Results show very little change in error for grid spacings between 0.25 and 1 Å
Problem Importance
• Computer aided drug design – a new drug should fit the active site of a specific receptor.
• Many reactions in the cell occur through interactions between the molecules.
• No efficient techniques for crystallizing large complexes and finding their structure.