Data integration: The STITCH database of protein–small molecule interactions

Post on 27-Jun-2015

description

Chemoinformatics Course, Technical University of Denmark, Lyngby, Denmark, November 19, 2009.

transcript

Data integration: The STITCH database of protein–small molecule interactions

Lars Juhl Jensen

Kuhn et al., Nucleic Acids Research, 2010

functional associations

protein–small molecule

protein–protein

parts lists

>2.5 million proteins

630 genomes

many databases

different formats

model organism databases

Ensembl

RefSeq

PubChem compounds

>74,000 small molecules

curated knowledge

complexes

pathways

Letunic & Bork, Trends in Biochemical Sciences, 2008

high confidence

many databases

MIPS (Munich Information Center for Protein Sequences)

Gene Ontology

KEGG (Kyoto Encyclopedia of Genes and Genomes)

MetaCyc

PID (NCI-Nature Pathway Interaction Database)

Reactome

different formats

different identifiers

partially redundant

interaction data

protein–small molecule

in vitro binding assays

protein–protein

yeast two-hybrid

affinity purification

fragment complementation

Jensen & Bork, Science, 2008

genetic interactions

Beyer et al., Nature Reviews Genetics, 2007

gene coexpression

many databases

BindingDB

CTD (Comparative Toxicogenomics Database)

DrugBank

GLIDA (GPCR-Ligand Database)

PDSP Ki (Psychoactive Drug Screening Program)

PharmGKB (Pharmacogenomics Knowledge Base)

BIND (Biomolecular Interaction Network Database)

BioGRID (General Repository for Interaction Datasets)

DIP (Database of Interacting Proteins)

IntAct

MINT (Molecular Interactions Database)

HPRD (Human Protein Reference Database)

PDB (Protein Data Bank)

GEO (Gene Expression Omnibus)

different formats

different identifiers

partially redundant
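
The three problems listed here (different formats, different identifiers, partially redundant) can be illustrated with a minimal sketch. The database names come from the slides, but the records, the identifier mapping, and the helper name `merge_interactions` are hypothetical and only meant to show the idea of mapping everything to one namespace before merging.

```python
# Hypothetical mapping from source-specific identifiers to one common namespace.
ID_MAP = {
    "ACC_0001": "ENSP_X",
    "GENE_X": "ENSP_X",
    "ACC_0002": "ENSP_Y",
    "GENE_Y": "ENSP_Y",
}

def merge_interactions(records):
    """Collapse (source, id_a, id_b) records into unique unordered pairs.

    Returns a dict mapping each canonical pair to the set of sources that
    reported it. Records with an unmappable identifier are skipped.
    """
    merged = {}
    for source, id_a, id_b in records:
        a, b = ID_MAP.get(id_a), ID_MAP.get(id_b)
        if a is None or b is None:
            continue
        # Sort so that (X, Y) and (Y, X) count as the same interaction.
        pair = tuple(sorted((a, b)))
        merged.setdefault(pair, set()).add(source)
    return merged

# Made-up records: two databases report the same pair under different names.
records = [
    ("IntAct", "ACC_0001", "ACC_0002"),
    ("BioGRID", "GENE_Y", "GENE_X"),
    ("MINT", "ACC_0001", "UNMAPPED"),
]
merged = merge_interactions(records)
```

After merging, the redundant reports collapse into one pair that keeps track of which sources support it.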

literature mining

>10 km

human readable

not computer readable

different names

text corpus

MEDLINE

SGD (Saccharomyces Genome Database)

The Interactive Fly

OMIM (Online Mendelian Inheritance in Man)

dictionary

co-mentioning

NLP (Natural Language Processing)

restricted access
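
Dictionary-based co-mentioning can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used for STITCH: the dictionary entries, the canonical identifiers, the example sentences, and the function names are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical dictionary: each synonym maps to one canonical entity.
DICTIONARY = {
    "aspirin": "CHEM:aspirin",
    "acetylsalicylic acid": "CHEM:aspirin",
    "cox-1": "PROT:PTGS1",
    "ptgs1": "PROT:PTGS1",
    "cox-2": "PROT:PTGS2",
}

def entities_in(text):
    """Return the set of canonical entities whose synonyms occur in text."""
    lowered = text.lower()
    found = set()
    for name, entity in DICTIONARY.items():
        # Require non-alphanumeric boundaries so names do not match inside words.
        pattern = r"(?<![a-z0-9])" + re.escape(name) + r"(?![a-z0-9])"
        if re.search(pattern, lowered):
            found.add(entity)
    return found

def comention_counts(abstracts):
    """Count, for every pair of entities, the abstracts mentioning both."""
    counts = Counter()
    for text in abstracts:
        for pair in combinations(sorted(entities_in(text)), 2):
            counts[pair] += 1
    return counts

# Made-up example sentences standing in for a text corpus.
abstracts = [
    "Aspirin irreversibly inhibits COX-1.",
    "Acetylsalicylic acid acts on PTGS1 and COX-2.",
    "COX-2 expression is induced by inflammation.",
]
counts = comention_counts(abstracts)
```

Mapping synonyms to one identifier is what handles the "different names" problem: the first two sentences count as the same chemical–protein pair even though they share no surface string.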

genomic context

gene fusion

Korbel et al., Nature Biotechnology, 2004

conserved neighborhood

operons

Korbel et al., Nature Biotechnology, 2004

bidirectional promoters

Korbel et al., Nature Biotechnology, 2004

phylogenetic profiles

Korbel et al., Nature Biotechnology, 2004
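
A phylogenetic profile records in which genomes a gene is present; genes with similar profiles are candidates for a functional association. The sketch below uses a Jaccard index to compare profiles, which is an assumption made for illustration (the slides do not name a similarity measure), and the genomes and genes are made up.

```python
def profile(present_in, genomes):
    """Presence/absence vector of a gene across an ordered list of genomes."""
    return tuple(1 if genome in present_in else 0 for genome in genomes)

def profile_similarity(p, q):
    """Jaccard index: genomes with both genes / genomes with at least one."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0

# Hypothetical genomes and gene occurrences.
genomes = ["g1", "g2", "g3", "g4", "g5"]
gene_a = profile({"g1", "g2", "g4"}, genomes)
gene_b = profile({"g1", "g2", "g4", "g5"}, genomes)
gene_c = profile({"g3"}, genomes)

similar = profile_similarity(gene_a, gene_b)
dissimilar = profile_similarity(gene_a, gene_c)
```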

integration

many data types

not comparable

variable quality

spread over 630 genomes

quality scores

reproducibility

von Mering et al., Nucleic Acids Research, 2005

intergenic distances

Korbel et al., Nature Biotechnology, 2004

benchmarking

calibrate vs. gold standard

von Mering et al., Nucleic Acids Research, 2005

raw quality scores

probabilistic scores

orthology transfer

von Mering et al., Nucleic Acids Research, 2005

combine all evidence
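
Once every evidence type has been calibrated to a probabilistic score, the scores become comparable and can be combined. A minimal sketch, assuming the evidence channels are independent: the combined score is the probability that at least one channel is right. The channel names and values are hypothetical, and the exact scheme used in STITCH may differ.

```python
from math import prod

def combine_scores(scores):
    """Combine probabilistic scores assuming independent evidence channels.

    Returns 1 - product(1 - s_i), the probability that at least one
    channel supports a true association.
    """
    return 1.0 - prod(1.0 - s for s in scores)

# Hypothetical per-channel scores for one protein–small molecule pair.
evidence = {"experiments": 0.5, "text mining": 0.4, "databases": 0.0}
combined = combine_scores(evidence.values())
```

A channel with score 0 leaves the result unchanged, and the combined score is never lower than the best single channel.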

Acknowledgments

Michael Kuhn

Monica Campillos

Christian von Mering

Manuel Stark

Samuel Chaffron

Philippe Julien

Tobias Doerks

Jan Korbel

Berend Snel

Martijn Huynen

Peer Bork

Predicting novel targets for existing drugs using side effect information

Lars Juhl Jensen

the problem

new uses for old drugs

drug–drug network

shared target(s)

chemical similarity

Campillos & Kuhn et al., Science, 2008

similar drugs share targets

only trivial predictions
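
Chemical similarity between two drugs is commonly scored by comparing structural fingerprints. The sketch below uses the Tanimoto coefficient on sets of fingerprint bits; the slides do not state which measure was used, so treat the choice as an assumption. The fingerprints are made up.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

# Hypothetical fingerprints: indices of the bits that are set.
drug_a = {1, 4, 7, 9}
drug_b = {1, 4, 7, 12}
drug_c = {2, 3}

close = tanimoto(drug_a, drug_b)
distant = tanimoto(drug_a, drug_c)
```

Pairs like `drug_a` and `drug_b` are the "trivial" case: structurally similar drugs that are already expected to share targets.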

the idea

chemical perturbations

phenotypic readouts

drug treatment

side effects

the implementation

information on side effects

package inserts

Campillos & Kuhn et al., Science, 2008

text mining

side-effect ontology

backtracking

Campillos & Kuhn et al., Science, 2008

side-effect correlations

Campillos & Kuhn et al., Science, 2008

GSC weighting

side-effect frequencies

Campillos & Kuhn et al., Science, 2008

raw similarity score

Campillos & Kuhn et al., Science, 2008
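
The role of side-effect frequencies in the raw similarity score can be illustrated with a minimal sketch: a side effect shared by two drugs counts for more when few drugs have it. The weighting formula, the drug names, and the side effects are assumptions made for illustration, not the published scoring scheme, and the correlation-based GSC weighting named in the slides is left out.

```python
from math import log

def rarity_weights(drug_side_effects):
    """Weight each side effect by -log of the fraction of drugs that show it."""
    n_drugs = len(drug_side_effects)
    counts = {}
    for effects in drug_side_effects.values():
        for effect in effects:
            counts[effect] = counts.get(effect, 0) + 1
    return {effect: -log(count / n_drugs) for effect, count in counts.items()}

def raw_similarity(effects_a, effects_b, weights):
    """Sum of the weights of the side effects that two drugs share."""
    return sum(weights[effect] for effect in effects_a & effects_b)

# Hypothetical drugs and side effects.
drugs = {
    "drug_a": {"nausea", "headache", "tremor"},
    "drug_b": {"nausea", "tremor"},
    "drug_c": {"nausea", "headache"},
    "drug_d": {"nausea"},
}
weights = rarity_weights(drugs)
score_ab = raw_similarity(drugs["drug_a"], drugs["drug_b"], weights)
score_ad = raw_similarity(drugs["drug_a"], drugs["drug_d"], weights)
```

Here nausea occurs in every drug and gets weight zero, so sharing it says nothing, whereas sharing tremor contributes to the score.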

p-values

Campillos & Kuhn et al., Science, 2008
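
One way to turn a raw similarity score into a p-value is to compare it with a background distribution of scores, for example from randomized drug pairs. The slides do not say how the p-values were computed, so this empirical approach, the function name, and the background values are assumptions.

```python
def empirical_p_value(observed, background):
    """Fraction of background scores at least as high as the observed score.

    Adds 1 to numerator and denominator so the estimate is never exactly zero.
    """
    at_least = sum(1 for score in background if score >= observed)
    return (at_least + 1) / (len(background) + 1)

# Hypothetical background scores, e.g. from randomized drug pairs.
background = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9]
p_value = empirical_p_value(0.8, background)
```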

side-effect similarity

chemical similarity

Campillos & Kuhn et al., Science, 2008

reference set

drug–target pairs

Campillos & Kuhn et al., Science, 2008

drug–drug pairs

score bins

benchmark

Campillos & Kuhn et al., Science, 2008

fit calibration function

Campillos & Kuhn et al., Science, 2008

probabilistic scores
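
The benchmarking steps (reference set, score bins, benchmark, calibration function) can be sketched as follows. Drug pairs are binned by raw score, and the fraction of pairs in each bin that share a known target gives the benchmark points. Linear interpolation between those points is used here as a simple stand-in for a fitted calibration function; the data, bin edges, and function names are hypothetical.

```python
def bin_precision(scored_pairs, bin_edges):
    """For each score bin, return (bin midpoint, fraction sharing a target)."""
    points = []
    for low, high in zip(bin_edges, bin_edges[1:]):
        in_bin = [shares for score, shares in scored_pairs if low <= score < high]
        if in_bin:
            points.append(((low + high) / 2, sum(in_bin) / len(in_bin)))
    return points

def calibrate(score, points):
    """Map a raw score to a probability by interpolating between bin points."""
    if score <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if score <= x1:
            return y0 + (y1 - y0) * (score - x0) / (x1 - x0)
    return points[-1][1]

# Hypothetical drug pairs: (raw similarity score, shares a known target?)
scored_pairs = [
    (0.1, False), (0.2, False), (0.3, False), (0.4, True),
    (0.6, True), (0.7, False), (0.8, True), (0.9, True),
]
points = bin_precision(scored_pairs, [0.0, 0.5, 1.0])
probability = calibrate(0.5, points)
```

With real data one would use many more bins and fit a smooth monotonic curve through the points instead of interpolating.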

the results

drug–drug network

ATC codes

Campillos & Kuhn et al., Science, 2008

categorization

Campillos & Kuhn et al., Science, 2008

map onto score space

Campillos & Kuhn et al., Science, 2008

the experiments

20 drug–drug relations

in vitro binding assays

Campillos & Kuhn et al., Science, 2008

Ki<10 µM for 11 of 20

cell assays

Campillos & Kuhn et al., Science, 2008

9 of 9 showed activity

the future

SIDER

integration with STITCH

Acknowledgments

Monica Campillos

Michael Kuhn

Anne-Claude Gavin

Peer Bork
