
Prioritization of targets for Structural Genomics Peer Bork EMBL & MDC Heidelberg & Berlin...

Date post: 13-Jan-2016
Page 1: Prioritization of targets for Structural Genomics Peer Bork EMBL & MDC Heidelberg & Berlin bork@embl-heidelberg.de

Prioritization of targets for Structural Genomics

Peer Bork

EMBL & MDC

Heidelberg & Berlin

bork@embl-heidelberg.de

http://www.bork.embl-heidelberg.de/

Page 2:

Prioritising targets for Structural Genomics

Homology-based coverage

Complexes and functional modules

Candidates for complex diseases

Associating genes to diseases

Page 3:

[Figure: coverage plotted against time; annotation: Intellectual challenge]

Page 4:

Xray selection protocol (Oct 1999)

Filters: Proteins

Human, <500aa, annotated in sequence databases: 32349

Filter for 98% redundancy, splice forms, fragments: 20724

Match to clones available at German resource center: 6016

EST match protein in N-terminal region: 4755

Proteins have no homologue with known 3D (fast check): …

Distinct expression protocol

Page 5:

Xray selection protocol (Oct 1999)

Filters: Proteins

… Proteins have no homologue with known 3D (fast check): 1827

No transmembrane region or other composition bias: 1102

Proteins have no homologue with known 3D (sensitive check): 602

Functional features of the 602: known 347, unknown 255

Medical relevance likely: 71

Distinct NMR protocol
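The filter cascade on pages 4 and 5 can be tabulated to see how much each step removes. The following is a minimal sketch, not the authors' pipeline: the pairing of each count with a filter follows the order in which they appear on the slides and is an assumption, and the names `FUNNEL` and `retention` are introduced here for illustration.

```python
# Counts as they appear on the two protocol slides. The pairing of each
# count with a filter follows slide order and is an assumption.
FUNNEL = [
    ("Human, <500aa, annotated in sequence databases", 32349),
    ("Filter for 98% redundancy, splice forms, fragments", 20724),
    ("Match to clones available at German resource center", 6016),
    ("EST match protein in N-terminal region", 4755),
    ("No homologue with known 3D (fast check)", 1827),
    ("No transmembrane region or other composition bias", 1102),
    ("No homologue with known 3D (sensitive check)", 602),
]


def retention(funnel):
    """Return (label, count, fraction kept relative to the previous step)."""
    rows = []
    previous = None
    for label, count in funnel:
        fraction = 1.0 if previous is None else count / previous
        rows.append((label, count, fraction))
        previous = count
    return rows


if __name__ == "__main__":
    for label, count, fraction in retention(FUNNEL):
        print(f"{count:>6}  {fraction:6.1%}  {label}")
    first, last = FUNNEL[0][1], FUNNEL[-1][1]
    print(f"Overall: {last / first:.2%} of the starting set remains")
```

Read this way, the clone-availability step is the single largest reduction in the cascade, and fewer than 2% of the starting proteins reach the end.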

Page 6:

Criteria for target selection from sequence

- No similar sequence with known fold
- Everything that crystallizes in a given species
- Everything from certain pathways/complexes/compartments
- Everything with certain properties (e.g. thermophilic, kinase-function)
- ‘All’ disease genes
- Everything else left over ….

Page 7:

Structural Biology and Bioinformatics

Target prediction for structural genomics

Zooming out: Protein interactions

Zooming in: SNPs and 3D structures

Page 8:

Interaction prediction

Rich Copley

Berend Snel

+ Martijn Huynen

Page 9:

Function prediction via genomic context information

Gene context:

- Gene fusion as distinct neighborhood subset
- Conserved gene neighborhood in genomes
- Conserved co-occurrence of genes in species (‘phylogenetic profile’, ‘COG pattern’)
- Surrounding and shared regulatory elements

Knowledge-based context:

- Pathway data (can overrule homology!)
- Gene expression data (co-expression etc.)
- Protein interaction/localisation
- Scientific literature
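The "conserved co-occurrence" idea can be illustrated with a small sketch: two genes whose presence/absence patterns across species match closely are candidates for a functional link. Everything below is hypothetical: the gene and species names are invented, and Jaccard similarity with a 0.8 cutoff is a measure chosen here for illustration, not necessarily the one used in the talk.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity of two sets of species in which a gene occurs."""
    union = profile_a | profile_b
    if not union:
        return 0.0
    return len(profile_a & profile_b) / len(union)


# Hypothetical phylogenetic profiles: gene -> species where it is present.
PROFILES = {
    "geneA": {"sp1", "sp2", "sp4", "sp5"},
    "geneB": {"sp1", "sp2", "sp4", "sp5"},
    "geneC": {"sp1", "sp3"},
}


def co_occurring_pairs(profiles, threshold=0.8):
    """Return gene pairs whose profiles are at least `threshold` similar."""
    names = sorted(profiles)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = jaccard(profiles[first], profiles[second])
            if score >= threshold:
                pairs.append((first, second, score))
    return pairs


if __name__ == "__main__":
    for first, second, score in co_occurring_pairs(PROFILES):
        print(f"{first} and {second} co-occur (similarity {score:.2f})")
```

With these invented profiles only geneA and geneB are reported, because geneC shares a single species with each of them.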

Page 10:

Context methods in Mycoplasma: Fusion, neighborhood, co-occurrence

MG total: 480 genes

Presence in conserved operons: 213

[Figure: diagram with the labels Conserved neighborhood, Fusion, and Co-occurrence in genomes; values shown: 27, 54, 178]

Page 11:

STRING server for context retrieval

Tryptophan biosynthesis

www.bork.embl-heidelberg.de/STRING

Page 12:

Gene neighborhood reflects connections between Tryptophan and Shikimate biosynthesis

Page 13:

Modularity in “genomic association space”

[Figure: gene network with the nodes hemK, tyrA, aroB, aroE, aroC, asd, truA, hyp, hyp, 2c-rr, trpF, trpC, trpA, trpB, trpD, trpG, trpE; two labelled modules: Tryptophan synthesis pathway and Shikimate pathway]

Networks based on conserved gene neighborhood reveal ‘natural’ subsystems

Page 14:

Applications of interaction predictions

3885 interactions (involving 1995 genes) predicted based on genomic context, 27% overlap, complementary

Goal: Functional characterization of all multiprotein assemblies as fast as possible (2001), move to human

The Cellzome* yeast factory

Methods: TAP tagging/co-purification + mass-spec

Results based on ca 1400 human orthologues: 1700 genes in 230 complexes, ca 130 of them novel

*Proteomics company founded at EMBL in June 2000, currently >90 employees

Data provided by

Page 15:

Predicting candidate genes for genetically inherited diseases

Association of genes to diseases

Analysis of non-synonymous SNPs

Page 16:

Shamil Sunyaev

Page 17:

[Figure: Growth of known 3D-structures; x-axis: Years (1995 to 2000); y-axis: Number of PDB entries (0 to 15000)]

[Figure: Growth of the number of complete genomes; x-axis: Years (1995 to 2000); y-axis: Number of genomes (0 to 40)]

[Figure: Growth of SNP data; x-axis: Quarters of a year (1998 (3) to 2000 (4)); y-axis: Number of SNP submissions (0 to 2.5E+06)]

SNP data currently have the fastest growth rate

Integration with other data is the key to more understanding

Page 18:

SNPs and mutations

90% of human genetic variation is due to single nucleotide polymorphism (SNP)

• mapping tool
• association with complex phenotypes (multifactorial diseases/drug responses etc.)
• human evolution

cSNP - SNP in coding region

nonsynonymous SNP - affects amino acid sequence

SNP - allele frequency >1%

Disease mutation - usually allele frequency <<1%
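The definitions on this slide can be written as a small decision function. This is only a sketch of the terminology: the function name, its parameters, and the label "rare variant" are introduced here, and a frequency at or below 1% does not by itself imply a disease mutation (the slide says only that disease mutations usually fall far below that frequency).

```python
SNP_FREQUENCY_CUTOFF = 0.01  # "allele frequency >1%" from the slide


def classify_variant(allele_frequency, in_coding_region, changes_amino_acid):
    """Label a variant using the slide's definitions.

    allele_frequency is a fraction between 0 and 1.
    """
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele_frequency must be between 0 and 1")
    if allele_frequency <= SNP_FREQUENCY_CUTOFF:
        # Hypothetical label: too rare to count as a SNP.
        return "rare variant"
    if not in_coding_region:
        return "SNP"
    if changes_amino_acid:
        return "nonsynonymous cSNP"
    return "cSNP"


if __name__ == "__main__":
    print(classify_variant(0.06, True, True))
    print(classify_variant(0.20, False, False))
    print(classify_variant(0.0005, True, True))
```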

Page 19:

ESTs reveal SNPs and alternative splice sites...

… but also lots of errors!!!

[Figure: mRNA (5’ UTR, coding, 3’ UTR) with aligned ESTs EST1 to EST6; one aligned column shows differing bases (A, A, T, T, C, A), labelled SNP prediction; a second label reads Prediction of alternative splicing]

(>700 libraries!)

(many different tissues and age groups!)
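Because ESTs carry many sequencing errors, a single deviating base is weak evidence for a SNP. One simple guard, sketched below, is to require the minor allele to be seen in several ESTs before calling a candidate. The function name and the threshold of two supporting ESTs are assumptions made here for illustration, not the procedure used in the talk.

```python
from collections import Counter


def candidate_snp(column, min_minor_count=2):
    """Return (major, minor) alleles for one aligned position, or None.

    `column` holds the base each EST shows at that position. Characters
    other than A, C, G, T (gaps, N) are ignored.
    """
    counts = Counter(base for base in column if base in "ACGT")
    if len(counts) < 2:
        return None  # all ESTs agree
    (major, _), (minor, minor_count) = counts.most_common(2)
    if minor_count < min_minor_count:
        return None  # a lone deviating base is treated as a likely error
    return major, minor


if __name__ == "__main__":
    print(candidate_snp("AATTCA"))
    print(candidate_snp("AAAAAT"))
```

For the column drawn on the slide (A, A, T, T, C, A) this reports A/T as a candidate and discards the single C as a probable error.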

Page 20:

Mapping SNPs onto 3D: Identifying those that damage proteins

Rules taken from protein engineering and multiple sequence analysis

Page 21:

Selected polymorphic sites mapped onto 3D

Minor allele frequency:

High (≥5%)

Low (<5%)

Page 22:

Selection of Mutations for 3D mapping

Data sources: SWISSPROT, OMIM, HGBASE, Chakravarti WEB, HSSP

Filter:

- Keywords: ‘3D STRUCTURE’ and ‘DISEASE MUTATION’
- Keywords: ‘3D STRUCTURE’ and ‘POLYMORPHISM’ but not ‘DISEASE MUTATION’
- Allelic variants with frequency >1% in a pool of ‘normal’ individuals
- Blastx search against PDB
- Check all proteins identified in the resulting sets above for close homologues in other species (>90% identity) and take mutations

Resulting 3 sets:

1. 551 disease mutations (badies)
2. 86 allelic variants (‘don’t know’)
3. 225 and 261 neutral mutations between species (goodies) in proteins of set 1 and 2, respectively

Page 23:

How many sites are in structurally and functionally “important” regions?

Disease mutation sites (badies) 90%

Polymorphic sites (don’t know) 29%

Interspecies mutations (goodies) 8%

Hence: Predicting phenotypic effects of cSNPs!

‘important’ = surface accessibility <10%, active site, S-S bond

Sunyaev/Ramensky/Bork, Trends Genet. 16(00)191

Sunyaev/Ramensky/Koch/Lathe/Bork, unpubl.
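The slide's definition of an ‘important’ site translates directly into a predicate. The sketch below assumes that accessibility is given as a percentage and that active-site and disulfide membership are already known for each site; the function names and the example sites are hypothetical.

```python
def is_important(accessibility_percent, in_active_site=False,
                 in_disulfide_bond=False, cutoff=10.0):
    """True if a site is buried (<10% accessible), catalytic, or in an S-S bond."""
    return (accessibility_percent < cutoff
            or in_active_site
            or in_disulfide_bond)


def fraction_important(sites):
    """Fraction of sites flagged important; `sites` is a list of argument tuples."""
    if not sites:
        return 0.0
    flagged = sum(1 for site in sites if is_important(*site))
    return flagged / len(sites)


if __name__ == "__main__":
    # Hypothetical sites: (accessibility %, active site?, disulfide?)
    example_sites = [
        (3.0, False, False),   # buried
        (45.0, True, False),   # exposed but catalytic
        (60.0, False, True),   # exposed cysteine in an S-S bond
        (70.0, False, False),  # exposed, no special role
    ]
    print(f"{fraction_important(example_sites):.0%} of example sites are important")
```

Applied to a set of mapped variants, `fraction_important` gives the kind of percentage reported on the slide for each of the three mutation sets.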

Page 24:

Prediction of risk factors

Of 36 SNPs with predicted phenotypic effects (from a well-characterized SNPs pool), 5 are already known to be disease-associated:

Gene                            Disease risk                                   Frequency  Mutation effect
HFE                             hemochromatosis                                6%         destroyed SS-bond
Fructose-bisphosphate aldolase  fructose intolerance                           >1%        destroyed core
NAD(P)H dehydrogenase           benzene toxicity (post-chemotherapy leukemia)  4-20%      Unfavorable substitution
α-1-anti-chymotrypsin           familial obstructive lung disease              >1%        destroyed core
α-1-antitrypsin                 emphysema                                      2-4%       destroyed core

Page 25:

Structural Biology and Bioinformatics

Target prediction for structural genomics

Zooming out: Protein interactions

Zooming in: SNPs and 3D structures

Page 26:

Credits g2D

Carolina Perez, Miguel Andrade

Page 27:

Literature mining for associating genotypes to phenotypes

[Figure: data-flow diagram. MEDLINE (10 725 796 articles): MeSH C (phenotype), article, MeSH D (chemistry). RefSeq (10 329 sequences): article, Gene Ontology (gene biochemistry), gene. Counts shown: 6 023 924 pairs, 98 969 pairs; 6 992 terms, 5 070 terms, 2 379 terms]

Page 28:

Phenotype (MeSH C)

Acidosis, Renal Tubular

Acidosis

Hypokalemia

Nephrocalcinosis

Sjogren’s Syndrome

Alkalosis

Kidney Diseases

Kidney Failure, Chronic

Nephritis, Interstitial

Fanconi Syndrome

GO (Gene Ontology)

Carbonate dehydratase

Hydrogen-transporting ATP synthase

Hydrogen/potassium-exchanging ATPase

Hydrogen-transporting two-sector ATPase

Proton transport

Vacuolar hydrogen-transporting ATPase (synonym: VATPase)

Pyruvate carboxylase

Aminobutyrate catabolism

Succinate-semialdehyde dehydrogenase

[Figure labels: MeSH D; MEDLINE; RefSeq; LocusLink; Golden Path; steps I and II; chromosomal region 7q33-q34]

Page 29:

Association to Craniofrontonasal dysplasia

[Figure: three columns of terms connected by weighted links]

MeSH C (symptoms, manifestations):

• Craniosynostoses [15]
• Craniofacial Dysostosis [7]
• Hypertelorism [6]
• Mental Retardation [4]
• Bone Diseases, Developmental [3]
• etc...

MeSH D (chemicals, proteins, drugs):

• Receptors, Fibroblast Growth Factor
• Fibroblast Growth Factor
• DNA probes
• Chondroitin
• Keratan sulfate
• Collagen
• Bone morphogenetic proteins

GO (functions):

• 0.0130 fibroblast growth factor receptor (function)
• 0.0241 FGF receptor signaling pathway (process)
• 0.0061 MAPKKK cascade (process)
• 0.0001 integral plasma membrane protein (component)
• 0.0011 skeletal development (process)
• 0.0000 signal transduction (process)

Link weights shown in the figure, in order of appearance: 0.0905, 0.0526, 0.0014, 0.0112, 0.2500, 0.4615, 0.0285, 0.1176, 0.0092, 0.0058, 0.0010, 0.0588, 0.0109, 0.0153, 0.0075, 0.0046, 0.0052, 0.0119, 0.0546, 0.0434, 0.0215, 0.0032, 0.0017, 0.0531, 0.0092, 0.0322

Page 30:

From homology to disease association

chromosome X, band Xp22

RefSeq sequences with GO-scores:

0.0123 NP_002002 fibroblast growth factor receptor 4, isoform 1 precursor - Human
- 0.0130 fibroblast growth factor receptor (function)
- 0.0241 FGF receptor signaling pathway (process)
- 0.0000 integral plasma membrane protein (component)

0.0083 NP_006644 suc1-associated neurotrophic factor target 2 - Human
- 0.0000 signal transduction (process)
- 0.0241 FGF receptor signaling pathway (process)
- 0.0009 peripheral plasma membrane protein (component)

0.0075 NP_000595 fibroblast growth factor receptor 1, isoform 1 precursor - Human
- 0.0130 fibroblast growth factor receptor (function)
- 0.0061 MAPKKK cascade (process)
- 0.0011 skeletal development (process)
- 0.0007 oncogenesis (process)
- 0.0241 FGF receptor signaling pathway (process)
- 0.0000 integral plasma membrane protein (component)

0.0026 NP_034336 fibroblast growth factor receptor 1 - Mouse
- 0.0000 ATP binding (function)
- 0.0000 membrane fraction (component)
- 0.0000 signal transduction (process)
- 0.0000 protein tyrosine kinase (function)
- 0.0130 fibroblast growth factor receptor (function)

...
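The four sequence scores listed on this slide appear consistent with the arithmetic mean of each sequence's GO-term scores. That is an observation about the printed numbers, not a confirmed description of the scoring method, and the names used in the sketch below are introduced here for illustration.

```python
# GO-term scores per RefSeq entry, copied from the slide.
GO_SCORES = {
    "NP_002002": [0.0130, 0.0241, 0.0000],
    "NP_006644": [0.0000, 0.0241, 0.0009],
    "NP_000595": [0.0130, 0.0061, 0.0011, 0.0007, 0.0241, 0.0000],
    "NP_034336": [0.0000, 0.0000, 0.0000, 0.0000, 0.0130],
}

# Sequence scores as printed next to each entry on the slide.
LISTED_SCORES = {
    "NP_002002": 0.0123,
    "NP_006644": 0.0083,
    "NP_000595": 0.0075,
    "NP_034336": 0.0026,
}


def mean_go_score(scores):
    """Arithmetic mean of a sequence's GO-term scores."""
    return sum(scores) / len(scores)


def ranked_sequences(go_scores):
    """Sequence identifiers sorted by mean GO score, highest first."""
    return sorted(go_scores, key=lambda name: mean_go_score(go_scores[name]),
                  reverse=True)


if __name__ == "__main__":
    for name in ranked_sequences(GO_SCORES):
        mean = mean_go_score(GO_SCORES[name])
        print(f"{name}: mean {mean:.4f}, listed {LISTED_SCORES[name]:.4f}")
```

Each computed mean agrees with the printed score to within 0.0001, and ranking by the mean reproduces the order in which the sequences are listed.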

Page 31:

Benchmark of 100 disease genes

[Figure: x-axis: Log R-score (1.E-04 to 1.E+00); y-axis: Rank of true gene (0 to 80); series: bench 10, bench 100, not annotated]

Score correlates with prediction accuracy

