Learning Molecular Fingerprints from the Graph Upduvenaud/talks/neuralfps.pdf · Learning Molecular...

Date post: 27-Sep-2018

Learning Molecular Fingerprints from the Graph Up

David Duvenaud, Dougal Maclaurin,

Jorge Aguilera-Iparraguirre, Rafael Gómez-Bombarelli,

Timothy Hirzel, Alán Aspuru-Guzik, Ryan P. Adams

Harvard University

September 30, 2015


Motivation

• Want to do regression on molecules

• For virtual screening of drugs, materials, etc.

• Problem: Molecules can be any size and shape

• Only know how to learn from fixed-size examples.

• How to take a molecule in and produce a fixed-size vector?


Circular Fingerprints

• Standard method lists all substructures below a certain size

• Can do this by combining hashes of each atom with those of its bonded neighbors

• Hash value indexes into a fixed-size vector

• Problem: can’t optimize with gradients
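
The hashing procedure on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the implementation behind the talk: the helper names (`stable_hash`, `circular_fingerprint`), the use of SHA-256 as the hash, the sorting of neighbor identifiers, and the synchronous per-layer update are all assumptions made for the example.

```python
import hashlib

def stable_hash(values):
    """Deterministic integer hash of a sequence of ints (assumed stand-in hash)."""
    data = ",".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def circular_fingerprint(atoms, bonds, radius, length):
    """atoms: list of integer atom features (here, atomic numbers).
    bonds: list of (i, j) index pairs.
    Returns a binary list of size `length`."""
    neighbors = [[] for _ in atoms]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    fingerprint = [0] * length
    ids = [stable_hash((a,)) for a in atoms]  # initial atom identifiers
    for _ in range(radius):
        new_ids = []
        for a in range(len(atoms)):
            # Assumption: sort neighbor ids so the result ignores atom ordering.
            v = [ids[a]] + sorted(ids[n] for n in neighbors[a])
            new_ids.append(stable_hash(v))
        ids = new_ids
        for r in ids:
            fingerprint[r % length] = 1  # hash value indexes into the vector
    return fingerprint

# Toy molecule: the C-C-O heavy-atom chain, written with two atom orderings.
fp_a = circular_fingerprint([6, 6, 8], [(0, 1), (1, 2)], radius=2, length=16)
fp_b = circular_fingerprint([8, 6, 6], [(0, 1), (1, 2)], radius=2, length=16)
```

Both orderings of the toy molecule give the same bit vector. Every step (hashing, `%`, writing a 1) is discrete, which is why this construction offers no gradient to optimize.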


What would Ryan do?

• Maybe we can build a message-passing network

• Same function is applied to each node (atom) and its neighbors

• Like a convolutional net

• At the top, add all nodes’ vectors together

• If we use a softmax, this generalizes circular fingerprints


Continuous-izing Circular Fingerprints

Circular fingerprints

 1: Input: molecule, radius R, fingerprint length S
 2: Initialize: fingerprint vector f ← 0_S
 3: for each atom a in molecule do
 4:     r_a ← g(a)                        ▷ lookup atom features
 5: for L = 1 to R do                     ▷ for each layer
 6:     for each atom a in molecule do
 7:         r_1 ... r_N = neighbors(a)
 8:         v ← [r_a, r_1, ..., r_N]      ▷ concatenate
 9:         r_a ← hash(v)                 ▷ hash function
10:         i ← mod(r_a, S)               ▷ convert to index
11:         f_i ← 1                       ▷ write 1 at index
12: Return: binary vector f

Neural graph fingerprints

 1: Input: molecule, radius R, hidden weights H_1^1 ... H_R^5, output weights W_1 ... W_R
 2: Initialize: fingerprint vector f ← 0_S
 3: for each atom a in molecule do
 4:     r_a ← g(a)                        ▷ lookup atom features
 5: for L = 1 to R do                     ▷ for each layer
 6:     for each atom a in molecule do
 7:         r_1 ... r_N = neighbors(a)
 8:         v ← r_a + Σ_{i=1}^{N} r_i     ▷ sum
 9:         r_a ← σ(v H_L^N)              ▷ smooth function
10:         i ← softmax(r_a W_L)          ▷ sparsify
11:         f ← f + i                     ▷ add to fingerprint
12: Return: real-valued vector f

Every non-differentiable operation is replaced with a differentiable analog.
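
To make the neural variant concrete, here is a small pure-Python sketch of the forward pass. It is an illustration under stated assumptions rather than the authors' code: `tanh` stands in for the smooth function σ, weights are drawn at random, atom features are one-hot vectors, all atom vectors are updated synchronously within a layer, and the names `matvec`, `random_matrix`, and `neural_fingerprint` are made up for this example. `H[layer][degree]` plays the role of H_L^N, with one hidden matrix per neighbor count.

```python
import math
import random

def matvec(v, M):
    """Row vector v (length n) times matrix M (n x m)."""
    return [sum(v[k] * M[k][j] for k in range(len(v))) for j in range(len(M[0]))]

def softmax(x):
    m = max(x)
    e = [math.exp(xi - m) for xi in x]
    s = sum(e)
    return [ei / s for ei in e]

def random_matrix(rows, cols, scale, rng):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def neural_fingerprint(features, bonds, H, W):
    """features: per-atom feature vectors; bonds: list of (i, j) pairs.
    H[layer][degree]: hidden weights; W[layer]: output weights."""
    n = len(features)
    neighbors = [[] for _ in range(n)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    f = [0.0] * len(W[0][0])          # fingerprint of length S
    r = [list(x) for x in features]
    for layer in range(len(W)):
        new_r = []
        for a in range(n):
            v = list(r[a])            # sum replaces concatenation
            for nb in neighbors[a]:
                v = [vi + ri for vi, ri in zip(v, r[nb])]
            h = matvec(v, H[layer][len(neighbors[a])])
            new_r.append([math.tanh(x) for x in h])  # assumed choice of sigma
        r = new_r
        for a in range(n):
            i = softmax(matvec(r[a], W[layer]))      # soft index replaces mod
            f = [fi + ii for fi, ii in zip(f, i)]    # add replaces write
    return f

rng = random.Random(0)
D, S, R = 4, 8, 2                     # feature size, fingerprint length, radius
H = [{deg: random_matrix(D, D, 0.5, rng) for deg in range(1, 6)} for _ in range(R)]
W = [random_matrix(D, S, 0.5, rng) for _ in range(R)]
one_hot = {6: [1.0, 0.0, 0.0, 0.0], 8: [0.0, 1.0, 0.0, 0.0]}

bonds = [(0, 1), (1, 2)]              # toy C-C-O chain
fp = neural_fingerprint([one_hot[6], one_hot[6], one_hot[8]], bonds, H, W)
fp_rev = neural_fingerprint([one_hot[8], one_hot[6], one_hot[6]], bonds, H, W)
```

Each atom contributes one softmax vector per layer, so the entries of `fp` sum to (number of atoms) × R. Because neighbors are summed rather than concatenated, reversing the atom order leaves the result unchanged up to floating-point rounding.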


Generalizing Circular Fingerprints

• If we generalize existing fingerprints, we can’t not win (unless we overfit)

• Large random weights make neural nets act like hash functions

• Looked at similarities between pairwise distances.

[Figure: "Neural vs Circular distances", r = 0.823. x-axis: circular fingerprint distances (0.5 to 1.0); y-axis: neural fingerprint distances (0.5 to 1.0).]
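
One way to see why large weights push the network toward hash-like behavior is to look at the softmax step alone: as the scale of its inputs grows, the output moves from nearly uniform toward a single dominant entry, which resembles writing a 1 at one index. The sketch below is only an illustration of that effect; the atom vector `r_a`, the weight matrix `W`, and the two scales are arbitrary values chosen for the example.

```python
import math
import random

def softmax(x):
    m = max(x)
    e = [math.exp(xi - m) for xi in x]
    s = sum(e)
    return [ei / s for ei in e]

rng = random.Random(0)
r_a = [rng.uniform(-1.0, 1.0) for _ in range(4)]                  # hypothetical atom vector
W = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]   # hypothetical output weights

def soft_index(scale):
    """Softmax over r_a W with all weights multiplied by `scale`."""
    logits = [scale * sum(r_a[k] * W[k][j] for k in range(4)) for j in range(8)]
    return softmax(logits)

small = soft_index(0.1)    # small weights: mass spread over many entries
large = soft_index(100.0)  # large weights: mass concentrated on few entries

print("largest entry, small weights:", round(max(small), 3))
print("largest entry, large weights:", round(max(large), 3))
```

The largest entry of `large` always exceeds that of `small`, since increasing the scale sharpens the distribution without changing which entry is biggest.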


Generalizing Circular Fingerprints

• If we generalize existing fingerprints, we can’t not win (unless we overfit)

• Large random weights make neural nets act like hash functions

• Looked at performance of random weights.

[Figure: RMSE (log Mol/L, 0.8 to 2.0) against fingerprint radius (0 to 6). Series: circular fingerprints; random conv with large parameters; random conv with small parameters.]


Performance

Dataset                        Solubility     Drug efficacy   Photovoltaic efficiency
Units                          log Mol/L      EC50 in nM      percent

Predict mean                   4.29 ± 0.40    1.47 ± 0.07     6.40 ± 0.09
Circular FPs + linear layer    1.84 ± 0.08    1.13 ± 0.03     2.62 ± 0.07
Circular FPs + neural net      1.40 ± 0.15    1.24 ± 0.03     2.04 ± 0.07
Neural FPs + linear layer      0.74 ± 0.09    1.16 ± 0.03     2.71 ± 0.13
Neural FPs + neural net        0.53 ± 0.07    1.17 ± 0.03     1.44 ± 0.11

• Could also try varying depth of neural net on top (used one hidden layer here)


Interpretability

• Circular fingerprints activate for a single substructure

• No generalization

• No notion of similarity

• Let’s put a linear layer on top of neural fingerprints and examine which fragments activate the most predictive features.


Interpretability: Solubility

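Fragments activating the feature most predictive of solubility:

[Figure: molecular fragments; surviving atom labels include O, OH, and NH]

Fragments activating the feature most predictive of insolubility:

[Figure: molecular fragments]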


Interpretability: Toxicity

Fragments most activated by toxicity feature on SR-MMP dataset:

Fragments most activated by toxicity feature on NR-AHR dataset:


Future Work

• Limitation: Slow because of so many weight transforms

• Could use low-rank weight matrices

• Limitation: All features are local

• Could learn to “parse” molecules

• But how to take gradients?


Delaney, John S. ESOL: Estimating aqueous solubility directly from molecular structure. Journal of Chemical Information and Computer Sciences, 44(3):1000–1005, 2004.

Gamo, Francisco-Javier, Sanz, Laura M, Vidal, Jaume, de Cozar, Cristina, Alvarez, Emilio, Lavandera, Jose-Luis, Vanderwall, Dana E, Green, Darren VS, Kumar, Vinod, Hasan, Samiul, et al. Thousands of chemical starting points for antimalarial lead identification. Nature, 465(7296):305–310, 2010.

Hachmann, Johannes, Olivares-Amaya, Roberto, Atahan-Evrenk, Sule, Amador-Bedolla, Carlos, Sánchez-Carrera, Roel S, Gold-Parker, Aryeh, Vogt, Leslie, Brockway, Anna M, and Aspuru-Guzik, Alán. The Harvard clean energy project: large-scale computational screening and design of organic photovoltaics on the world community grid. The Journal of Physical Chemistry Letters, 2(17):2241–2251, 2011.


Tox21 Challenge. National Center for Advancing Translational Sciences. http://tripod.nih.gov/tox21/challenge, 2014. [Online; accessed 2-June-2015].
