Centre for Integrative Bioinformatics VU

RF-PPI interface prediction, 20 May 2017

Seeing the Trees through the Forest: Sequence-based Homo- and Heteromeric Protein-protein Interaction sites prediction using Random Forest

Qingzhen Hou, Paul de Geest, Wim Vranken, Jaap Heringa and K. Anton Feenstra

CBSB 2017 – Cincinnati


Protein Function ⇒ understanding interactions, e.g.

• SNP/SNV calling

• New viruses
  • every week helps!

Wilhelm et al. 2014 “... vesicle trafficking proteins.” Science 344:1023-1028 doi: 10.1126/science.1252884.


Levels of Protein Interaction

• Influence (like gene-gene interactions)
  • cascade
  • mutual dependence

• Direct/physical interaction
  • Contact
  • Heteromeric (different proteins)
  • Homomeric (same protein)
  • Interface sites

[Diagrams: schematic proteins A, B and C illustrating each level]


Types of methods to calculate protein-protein interactions (PPIs)

• Sequence-based, e.g. Mirror tree (e.g. Pazos & Valencia 2001 Prot Eng 14:609)
  • Fast
  • No information on interaction strength

• Protein-protein docking
  • Slow
  • No quantitative interaction strength

• Molecular dynamics simulations
  • Even slower
  • Quantitative calculation of interaction strength

[Side axis of the slide: from knowledge-based to first principles]


Obtaining PPI interface from sequence data:

Juan, Pazos & Valencia Nat Rev Genet 2013

DCA (Direct Coupling Analysis):
  • Huge multiple testing problem (all vs. all residues)
  • Predicts for protein family (many sequences needed)

• Our approach is different: directly from one sequence

• May use predicted interface as filter for DCA method (future work)
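To give a feel for the "all vs. all residues" multiple testing burden noted for DCA, the sketch below counts the candidate inter-protein residue pairs and applies a Bonferroni correction. The protein lengths, the 0.05 significance level, and the choice of Bonferroni as the correction are illustrative assumptions, not values or methods from the talk.

```python
def inter_protein_pairs(len_a: int, len_b: int) -> int:
    """Number of residue pairs (i in protein A, j in protein B) in an all-vs-all scan."""
    return len_a * len_b


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Hypothetical example: two proteins of 300 residues each.
n_pairs = inter_protein_pairs(300, 300)
print(n_pairs)                               # 90000
print(bonferroni_threshold(0.05, n_pairs))   # per-pair threshold after correction
```

Restricting the scan to predicted interface residues on each side would shrink `n_pairs` quadratically, which is the motivation for using an interface predictor as a filter.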


Ingredients for a good classifier:

• Distinguish ‘exposed’ from ‘binding’ surface
  • ‘Buried’ is easy

• Features:
  • Conservation
  • Solvent Accessibility and Secondary Structure – NetSurfP; Petersen et al. BMC Struct Biol 2009
  • Backbone dynamics – Dynamine; Cilia et al. Nat Commun 2013 & N.A.R. 2014
  • Protein length

• Dataset(s)
  • Homodimers – from dimers in the PDB (Hou et al. BMC Bioinf 2015); large, high-confidence set (1593)
  • Heteromers – Murakami & Mizuguchi, Bioinf 2010; high confidence, but smaller set (258)
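As an illustration of the conservation feature, here is a minimal sketch that scores each alignment column by its Shannon entropy (low entropy means high conservation). It assumes entropy is taken over residue frequencies with gaps ignored; the talk does not spell out the exact definition, and the toy alignment is hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (in bits) of the residues in one alignment column, ignoring gaps."""
    residues = [aa for aa in column if aa != gap]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    # p * log2(1/p) summed over the residue types observed in this column
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Toy alignment (hypothetical sequences): rows are sequences, columns are positions.
alignment = ["MKVL", "MRVI", "MKV-", "MQVL"]
columns = ["".join(seq[i] for seq in alignment) for i in range(len(alignment[0]))]
entropies = [column_entropy(col) for col in columns]
print(entropies)
```

Fully conserved columns (the first and third here) score zero; the more evenly mixed a column is, the higher its entropy.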


Encoding evolutionary information

• Homologs: PSI-BLAST (lenient: e < 0.001)

• Alignment: Muscle (because it is fast)

• Each feature is calculated for each sequence
  • Feature value for the query sequence
  • Also the average (‘typical’) and std. dev. (variability) of the feature over the homologs in the alignment
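The encoding described above can be sketched as follows: for every alignment column where the query has a residue, keep the query's own feature value plus the mean and standard deviation over all aligned sequences. The function name, the use of `None` for gaps, and the choice of the population standard deviation are assumptions made for this illustration.

```python
from statistics import mean, pstdev


def homolog_features(values_by_sequence, query_index=0):
    """Return (query value, mean, std. dev.) per alignment column.

    values_by_sequence[s][c] is the feature value of sequence s at alignment
    column c, or None where sequence s has a gap. Columns that are a gap in
    the query are skipped, since there is no query residue to describe.
    """
    rows = []
    for c in range(len(values_by_sequence[query_index])):
        query_value = values_by_sequence[query_index][c]
        if query_value is None:
            continue
        observed = [seq[c] for seq in values_by_sequence if seq[c] is not None]
        rows.append((query_value, mean(observed), pstdev(observed)))
    return rows


# Hypothetical per-residue feature values (e.g. predicted dynamics) for a query
# (first row) and two homologs; None marks a gap in the alignment.
values = [
    [0.2, 0.8, 0.5],
    [0.4, None, 0.5],
    [0.6, 0.6, 0.5],
]
for row in homolog_features(values):
    print(row)
```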


Which features work?

Feature               AUC ROC
Entropy (En)          0.480 ± 0.009
Dynamics (DM)         0.506 ± 0.006
En+len+win            0.536
DM+len+win            0.578 ± 0.008
En+DM+len+win         0.616 ± 0.015
Solvent Acc. (ASA)    0.587
En+DM+l+ASA+SS        0.666 ± 0.008
En+DM+l+ASA           0.636*
En+DM+len             0.558*

What next?

Hou, et al. Bioinformatics 2017 https://doi.org/10.1093/bioinformatics/btx005
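The "win" entries in the feature table refer to window features. A minimal sketch of one common form is given here, assuming the window feature is the mean of a per-residue value over neighbouring positions; the window size and the truncation at the sequence ends are illustrative choices, not details from the talk.

```python
def window_mean(values, half_width):
    """Mean of the values in a window of +/- half_width around each position.

    Windows are truncated at the sequence ends, so terminal residues are
    averaged over fewer neighbours.
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


per_residue = [0.0, 1.0, 0.0, 1.0, 0.0]   # hypothetical per-residue feature values
print(window_mean(per_residue, half_width=1))
```

The smoothed values give the classifier local sequence context for each residue in addition to the residue's own value.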


How to improve further

Features       Training                  Test          AUC ROC
All            HM_479 train              HM_479 val    0.666 ± 0.008
All + window   HM_479 train              HM_479 val    0.710 ± 0.011
All + window   HM_479 train (balanced)   HM_479 val    0.728 ± 0.008
All + window   HM_479 train (balanced)   HM_479 test   0.720 ± 0.007

Feat/Train/Test    Accuracy  Sensitivity  Precision  Specificity  F1
All/train/val      0.790     0.025        0.487      0.992        0.047
All+W/train/val    0.795     0.016        0.896      0.999        0.032
All+W/bal/val      0.688     0.614        0.355      0.707        0.450
All+W/bal/test     0.695     0.581        0.373      0.722        0.454

Hou, et al. Bioinformatics 2017 https://doi.org/10.1093/bioinformatics/btx005
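In the tables above, training on a balanced set raises sensitivity from 0.016 to 0.614 at the cost of specificity. How the balancing was done is not stated on the slide; the sketch below shows one plausible approach, random undersampling of the majority class, purely as an assumption.

```python
import random


def undersample_majority(samples, labels, seed=0):
    """Return a class-balanced subset by randomly dropping majority-class samples."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = sorted((positives, negatives), key=len)
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return [samples[i] for i in kept], [labels[i] for i in kept]


# Toy data: 2 interface residues (label 1) among 8 non-interface residues (label 0).
labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
samples = [[float(i)] for i in range(len(labels))]   # placeholder feature vectors
balanced_x, balanced_y = undersample_majority(samples, labels)
print(sum(balanced_y), len(balanced_y))   # 2 4
```

Only the training set would be balanced in this way; validation and test sets keep their natural class ratio so that the reported metrics reflect realistic conditions.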


How good is this?

But: this is the homodimer test set; the other methods are trained on heteromeric interactions.

We let them play ‘our’ game… (of course we are better at it)

[Figure A: ROC curves, true positive rate vs. false positive rate. AUC: RF_homo (0.720), SPPIDER (0.601), PSIVER (0.546)]

[Figure B: precision-recall curves, precision vs. recall. Legend values: RF_homo (0.436), SPPIDER (0.314), PSIVER (0.255)]

* default threshold

The only two really sequence-only methods (!)

Hou, et al. Bioinformatics 2017 https://doi.org/10.1093/bioinformatics/btx005
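For readers less familiar with the AUC ROC values quoted in this comparison, the sketch below computes the area under the ROC curve as the probability that a randomly chosen positive outscores a randomly chosen negative (the Mann-Whitney formulation). The scores and labels are hypothetical, and the pairwise loop is written for clarity rather than speed.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Ties between a positive and a negative score count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical per-residue scores and true labels (1 = interface residue).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 0, 1, 1, 0, 0]
print(roc_auc(scores, labels))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of interface from non-interface residues.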


What if we play ‘their’ game – heteromers!

[Figure A: ROC curves, true positive rate vs. false positive rate. AUC: RF_homo (0.619), PSIVER (0.613), RF_hetero (0.652)]

[Figure B: precision-recall curves, precision vs. recall. Legend values: RF_homo (0.137), PSIVER (0.128), RF_hetero (0.162)]

(So, we can play their game as well.) But, since we’re now playing one game or the other: can we play both?

* default threshold

Hou, et al. Bioinformatics 2017 https://doi.org/10.1093/bioinformatics/btx005


Predicting both Homo and Hetero interactions

Predictor    Training               Test         Recall  Precision  Specificity  MCC    F1     AUC ROC
RF_homo      HM_479 train           HM_479 test  0.581   0.373      0.722        0.265  0.454  0.720
RF_hetero    Dset_119               HM_479 test  0.343   0.263      0.727        0.064  0.297  0.552
RF_combined  HM_479 train+Dset_119  HM_479 test  0.581   0.383      0.734        0.277  0.462  0.724
PSIVER       Dset_186               HM_479 test  0.315   0.262      0.743        0.054  0.286  0.546
SPPIDER      homo+hetero            HM_479 test  0.073   0.361      0.958        0.062  0.121  0.601

RF_homo      HM_479 train           Dset_48      0.446   0.140      0.716        0.103  0.213  0.619
RF_hetero    Dset_119               Dset_48      0.547   0.146      0.667        0.131  0.230  0.652
RF_combined  HM_479 train+Dset_119  Dset_48      0.500   0.146      0.695        0.122  0.226  0.636
PSIVER       Dset_186               Dset_48      0.668   0.119      0.493        0.094  0.203  0.614

Hou, et al. Bioinformatics 2017 https://doi.org/10.1093/bioinformatics/btx005
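The columns of this table follow the standard confusion-matrix definitions, sketched below. The counts passed to `binary_metrics` are hypothetical; the final lines recompute F1 from the recall and precision reported in three table rows as a consistency check.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Recall, precision, specificity, F1 and MCC from confusion-matrix counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"recall": recall, "precision": precision,
            "specificity": specificity, "f1": f1, "mcc": mcc}


def f1_from(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
print(binary_metrics(tp=40, fp=10, tn=45, fn=5))

# F1 recomputed from the recall and precision reported in the table.
print(round(f1_from(0.373, 0.581), 3))   # 0.454, RF_homo on HM_479 test
print(round(f1_from(0.383, 0.581), 3))   # 0.462, RF_combined on HM_479 test
print(round(f1_from(0.146, 0.547), 3))   # 0.23, RF_hetero on Dset_48 (table: 0.230)
```

MCC is useful here because interface residues are a minority class: unlike accuracy, it stays near zero for a predictor that labels almost everything as non-interface.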


Complementary value

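[Venn diagram, hetero Dset_48: overlap between RF_hetero predictions, PSIVER predictions and the true interface (True IF, PDB). Region counts as extracted: 3684, 17551922, 420, 361224, 187]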
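The kind of overlap analysis behind a three-set comparison of two predictors and the true interface can be sketched with plain set operations. The residue indices below are hypothetical and the region names are made up for this illustration; they are not the counts from the slide.

```python
def venn_regions(pred_a, pred_b, truth):
    """Sizes of the seven regions of a three-set Venn diagram of residue positions."""
    a, b, t = set(pred_a), set(pred_b), set(truth)
    return {
        "a_only": len(a - b - t),              # false positives unique to A
        "b_only": len(b - a - t),              # false positives unique to B
        "truth_only": len(t - a - b),          # interface residues missed by both
        "a_and_b_only": len((a & b) - t),      # shared false positives
        "a_and_truth_only": len((a & t) - b),  # found by A but not by B
        "b_and_truth_only": len((b & t) - a),  # found by B but not by A
        "all_three": len(a & b & t),           # found by both predictors
    }


# Hypothetical residue indices, for illustration only.
regions = venn_regions(pred_a={1, 2, 3, 4}, pred_b={3, 4, 5, 6}, truth={2, 3, 7})
print(regions)
```

Non-zero counts in the "found by A but not by B" and "found by B but not by A" regions are what indicate complementary value between two predictors.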


Why does it work? – Feature importance

[Figure: feature importance. Labelled feature groups: length, dynamics (2x), solvent accessibility, dynamics, conservation, secondary structure (α-helix – β-sheet – coil)]


Main points

• Prediction of protein interface from sequence
  • Including ‘evolutionary neighborhood’ (PSI-BLAST)
  • But to predict SS and ASA, we need to get profiles using BLAST again – up to 500 times – this can be slow…
  • We have a webserver: www.ibi.vu.nl/programs/serendipwww/ (but please be gentle – and a little bit patient ;-)

• Prediction performance stable, for homodimer as well as heteromeric interactions
  • As far as we know, this hasn’t been done before

• Better than other (sequence-only) predictors
  • Of course, if you can get structure information, you’d be silly not to use that – there are many methods to use in that case


Random Forest training scheme and (external) predicted features

• Homomeric: homodimer dataset ¹ (1593) → CD-HIT <30% ID → homodimer dataset (610) → BLASTClust <25% ID → HM_479 (479)
  • 60% training, 20% validate (5-fold), 20% test
  • 27,763 IF, 101,917 non-IF

• Heteromeric test: Dset_72 ² (72) → BLASTClust <25% ID → Dset_48 (48)
  • 1,313 IF, 12,743 non-IF

• Heteromeric training: Dset_186 ² (186) → BLASTClust <25% ID → Dset_119 (119)
  • 3,641 IF, 20,687 non-IF

• Dynamine training set ³ ⇒ predict backbone dynamics

• NetSurfP training set ⁴ ⇒ predict secondary structure & solvent accessibility

¹ Hou et al. BMC Bioinf. 16:325 2015; ² Murakami & Mizuguchi. Bioinformatics 26:1841 2010; ³ Cilia et al. Nat. Commun. 4:2741 2013 & Cilia et al. Nucleic Acids Res. 42:W264 2014; ⁴ Petersen et al. BMC Struct. Biol. 9:51 2009

Hou, et al. Bioinformatics 2017 https://doi.org/10.1093/bioinformatics/btx005
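The 60% / 20% / 20% division of the homomeric set can be sketched as a shuffled split over whole proteins. Splitting at the protein level, the fixed seed, and the placeholder IDs are assumptions for this illustration, and the 5-fold part of the scheme is not modelled.

```python
import random


def split_proteins(protein_ids, test_frac=0.2, val_frac=0.2, seed=0):
    """Shuffle protein IDs and split them into train/validation/test lists.

    Splitting whole proteins (not individual residues) keeps all residues of
    one protein in the same subset.
    """
    ids = list(protein_ids)
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_frac)
    n_val = int(len(ids) * val_frac)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test


# Placeholder IDs standing in for the 479 proteins of HM_479.
train, val, test = split_proteins([f"protein_{i}" for i in range(479)])
print(len(train), len(val), len(test))   # 289 95 95
```

With 479 proteins, a 20% test fraction rounded down gives 95 proteins.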


What works well?

[Figure: panels for homo HM_479 testing and hetero Dset_48; site categories: all sites, ‘true’ IF sites; predictors: RF_combined, RF_hetero, RF_homo]


What works well?

• Dset42
  • 32 PDB structures
  • Only three cases where no correct IF position is predicted (once for the ‘homo’ predictor and twice for the ‘hetero’ predictor)
  • 17 with predictions for both chains

• HM_479 testing (20%)
  • 95 PDB structures
  • Only one case where no correct IF position is predicted (for the ‘combined’ predictor)


Are we biased to (a few) large proteins?


Are we biased to (a few) large protein families?


Which features to use ⇒ Conservation?

NO!

[Figure: Interface vs. Surface]

Qingzhen Hou, et al., PLoS ONE (2016)


Comparing homodimers and homologous monomers (interacting vs. non-interacting) ⇒ Specificity?

YES!

[Figure: Interface vs. Surface]

and better for longer alignments

Qingzhen Hou, et al., PLoS ONE (2016)

