Junction Tree Variational Autoencoder for Molecular Graph Generation
Wengong Jin, Regina Barzilay and Tommi Jaakkola
Presentation adapted from slides by: Wengong Jin Presenter: Yevgeny Tkach
[email protected]
https://qdata.github.io/deep2Read/
Executive Summary
• Molecule generation with a VAE; encoding and decoding are based on a spatial graph message passing algorithm.
• Instead of generating the molecule node by node, which can be viewed as “character level” generation, this work builds a higher-level vocabulary based on a tree decomposition of the molecule graph.
• Using proper “words/parts of speech” helps ensure that the final molecule is valid.
Drug Discovery
[Figure: example drug-like molecules ranked by a property score, from 5.30 down to 3.50]
Generate molecules with high potency
Drug Discovery
Modify molecules to increase potency
[Figure: molecule optimization trajectories; property scores improve from -2.05 to 6.80, -2.21 to 5.69, and -1.92 to 4.00]
Molecular Variational Autoencoder
Encoder Decoder
Potency Prediction
Bayesian optimization over latent space
Find “best” drugs
Gradient ascent over latent space
Make “better” drugs
[1] Gomez-Bombarelli et al., Automatic chemical design using a data-driven continuous representation of molecules, 2016
How to generate graphs?
[Figure: node-by-node generation; many intermediate graphs are chemically invalid, and many more steps are needed before a valid molecule is reached]
Node by Node
• Not every graph is chemically valid
• Invalid intermediate states are hard to validate
• Very long action sequences are difficult to train (Li et al., 2018)
[2] Li et al., Learning Deep Generative Models of Graphs, 2018
How to generate graphs?
[Figure: node-by-node generation passes through invalid intermediates over many steps, while group-by-group generation stays valid at every step]
Group by Group:
• Shorter action sequence
• Easy to check validity
Tree Decomposition
[Figure: a molecule and its junction tree; clusters (rings and bonds) become tree nodes, each labeled from a cluster vocabulary]
• Generate the junction tree, then generate the graph group by group
• Vocabulary size: fewer than 800 clusters for 250K molecules
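The decomposition can be illustrated with a toy sketch (not the authors' implementation; extracting clusters from a real molecule would use RDKit ring perception): given the clusters as atom tuples, junction-tree edges connect clusters that share atoms.

```python
# Toy sketch of junction-tree construction from precomputed clusters
# (rings and non-ring bonds). Hypothetical simplification; the paper's
# code also merges rings that share more than two atoms.

def junction_tree_edges(clusters):
    """Connect every pair of clusters that shares at least one atom."""
    edges = []
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if set(clusters[i]) & set(clusters[j]):
                edges.append((i, j))
    return edges

# A 6-ring (atoms 0-5) with one substituent bond (5, 6):
# the junction tree has two nodes joined by one edge.
clusters = [(0, 1, 2, 3, 4, 5), (5, 6)]
print(junction_tree_edges(clusters))  # [(0, 1)]
```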
Our Approach
[Diagram: the molecular graph G and its junction tree T (clusters Ci, Cj from tree decomposition) are each encoded and decoded separately, yielding latent vectors zG and zT]
Graph Encoding
[3] Dai et al., Discriminative embeddings of latent variable models for structured data, 2016
Messages are passed over the 1-hop, then 2-hop neighborhood graphs. The message from node u to node v at iteration t combines u's node feature, the edge feature, and the incoming messages from u's other neighbors w:

$\nu_{uv}^{(t)} = \tau\big(W_1 x_u + W_2 x_{uv} + W_3 \sum_{w \in N(u) \setminus v} \nu_{wu}^{(t-1)}\big)$

After the final iteration, the node embedding $h_u$ aggregates the messages arriving at u.
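A minimal numpy sketch of one such loopy message-passing iteration (the weights W1, W2 and the relu nonlinearity are stand-ins for the paper's parameters, and edge features are omitted for brevity):

```python
import numpy as np

def message_step(x, nbrs, msgs, W1, W2):
    """One message-passing iteration: the new message u -> v combines
    u's node feature with incoming messages from u's other neighbors."""
    new = {}
    for u, neighbors in nbrs.items():
        for v in neighbors:
            agg = sum((msgs[(w, u)] for w in neighbors if w != v),
                      np.zeros(W2.shape[1]))
            new[(u, v)] = np.maximum(0.0, W1 @ x[u] + W2 @ agg)
    return new

# Toy path graph a-b-c with 2-d node features and identity weights.
x = {n: np.ones(2) for n in "abc"}
nbrs = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
msgs = {e: np.zeros(2) for e in [("a","b"),("b","a"),("b","c"),("c","b")]}
W1 = W2 = np.eye(2)
msgs = message_step(x, nbrs, msgs, W1, W2)  # iteration 1
msgs = message_step(x, nbrs, msgs, W1, W2)  # iteration 2
print(msgs[("b", "c")])  # [2. 2.] -- b's feature plus the message a -> b
```

After two iterations the message b -> c already carries information from a, illustrating how the receptive field grows one hop per iteration.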
Tree Decoder
[Figure: the junction tree is generated node by node in depth-first order (nodes 1-8), passing a message vector along each edge]
[4] Alvarez-Melis & Jaakkola, Tree-structured decoding with doubly-recurrent neural networks, 2017
At every step the decoder makes two predictions:
1. Topological Prediction: whether to expand a child or backtrack
2. Label Prediction: what is the label of the node?
Tree Decoder
The message from node i to node j is a GRU over node i's label and the incoming messages from its other neighbors:

$h_{ij} = \mathrm{GRU}(x_i, \{h_{ki}\}_{k \in N_t(i) \setminus j})$

$h_{ij}$ encodes the entire subtree behind the current state; together with $z_T$ it is fed to a feedforward NN for label prediction.
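During training the decoder is supervised with the action sequence of a depth-first traversal of the ground-truth tree. A toy sketch of extracting that sequence (function and action names are hypothetical, not from the repo):

```python
# Ground-truth action sequence for a DFS tree decoder:
# ('expand', child) when a child node is generated,
# ('backtrack',) when the decoder returns to the parent.

def decoder_actions(children, root):
    actions = []
    def dfs(node):
        for child in children.get(node, []):
            actions.append(("expand", child))
            dfs(child)
            actions.append(("backtrack",))
    dfs(root)
    return actions

# Root 1 with children 2 and 3; node 2 has child 4.
tree = {1: [2, 3], 2: [4]}
print(decoder_actions(tree, 1))
# [('expand', 2), ('expand', 4), ('backtrack',), ('backtrack',),
#  ('expand', 3), ('backtrack',)]
```

The topological predictor is trained on the expand/backtrack decisions and the label predictor on the identity of each expanded child.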
Graph Decoder
1. Enumerate how clusters can be merged together, giving candidate subgraphs $G_i$
2. Encode each candidate graph with the graph encoder
3. Score each candidate: $f^{a_i}(G_i) = h_{G_i} \cdot z_G$
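The scoring step can be sketched as a dot product between each candidate's graph embedding and z_G (the embeddings below are arbitrary vectors, not a real encoder's output):

```python
import numpy as np

def best_attachment(candidate_embeddings, z_G):
    """Score each enumerated candidate subgraph by h_Gi . z_G and
    return the index of the highest-scoring one, plus all scores."""
    scores = [h @ z_G for h in candidate_embeddings]
    return int(np.argmax(scores)), scores

# Three hypothetical candidate embeddings and a latent vector z_G.
cands = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.5, 0.5])]
z_G = np.array([0.2, 0.9])
idx, scores = best_attachment(cands, z_G)
print(idx)  # 1 -- the candidate most aligned with z_G
```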
Training? VAE?
• The KL divergence term on the latent space is not discussed in the paper.
• zG is only used for ranking generated subgraphs, so it is not clear how it fits into the VAE paradigm.
• From the code, training uses KL annealing following the “Generating Sentences from a Continuous Space” paper by Bowman et al.
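KL annealing ramps the weight β on the KL term from 0 to its full value over a warm-up period, so the decoder learns to reconstruct before the posterior is pulled toward the prior. A minimal sketch (the linear schedule and hyperparameter values here are illustrative, not the repo's exact settings):

```python
def kl_weight(step, warmup_steps=10000, beta_max=1.0):
    """Linear KL annealing: 0 at step 0, beta_max after warm-up."""
    return min(beta_max, beta_max * step / warmup_steps)

# Training objective per step (schematically):
#   loss = reconstruction_loss + kl_weight(step) * kl_divergence
print(kl_weight(0), kl_weight(5000), kl_weight(20000))  # 0.0 0.5 1.0
```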
Experiments
• Data: 250K compounds from ZINC dataset
• Molecule Generation: How many molecules are valid when sampled from Gaussian prior?
• Molecule Optimization
• Global: Find the best molecule in the entire latent space.
• Local: Modify a molecule to increase its potency
Baselines
SMILES string based:
1. Grammar VAE (GVAE) (Kusner et al., 2017)
2. Syntax-directed VAE (SD-VAE) (Dai et al., 2018)
Graph based:
1. Graph VAE (Simonovsky & Komodakis, 2018)
2. DeepGMG (Li et al., 2018)
[2] Li et al., Learning Deep Generative Models of Graphs, 2018
[5] Kusner et al., Grammar Variational Autoencoder, 2017
[6] Dai et al., Syntax-directed Variational Autoencoder for structured data, 2018
[7] Simonovsky & Komodakis, GraphVAE: Towards generation of small graphs using variational autoencoders
Molecule Generation (Validity)
Percentage of valid molecules when sampled from the Gaussian prior:
GVAE: 7.2, GraphVAE: 13.5, SD-VAE: 43.5, DeepGMG: 89.2, Ours (w/o checking): 93.5, Ours: 100
Sampled Molecules
[Figure: a grid of valid molecules sampled from the Gaussian prior]
Molecule Optimization (Global)
Setup: a Gaussian Process property predictor is trained over the latent space, and Bayesian Optimization searches for the best latent point.
Property: Solubility + Ease of Synthesis
Property score of the best molecule found: CVAE 1.98, GVAE 2.94, SD-VAE 4.04, Ours 5.30
Molecule Optimization (Global)
[Figure: the three best molecules found, with property scores 5.30, 4.93, and 4.49]
Molecule Optimization (Local)
Setup: a Neural Network property predictor is trained over the latent space, and Gradient Ascent modifies a molecule's latent point.
Average improvement vs. preservation (similarity constraint): 1.91 at 0, 1.68 at 0.2, 0.84 at 0.4, 0.21 at 0.6
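The gradient-ascent step can be sketched on a toy differentiable property f(z) (in the real setting the gradient comes from a trained neural predictor, and the updated latent point is decoded back into a molecule):

```python
import numpy as np

def latent_ascent(z0, grad_fn, lr=0.1, steps=50):
    """Gradient ascent over the latent space on a property predictor."""
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z = z + lr * grad_fn(z)
    return z

# Toy concave property f(z) = -(z - 3)^2 with gradient -2 (z - 3);
# ascent should move z toward the maximizer z = 3.
z = latent_ascent([0.0], lambda z: -2.0 * (z - 3.0))
print(z)  # approximately [3.]
```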
Molecule Optimization (Local)
[Figure: example molecule modifications at preservation ≈ 0.6, with property scores improving from -2.05 to 6.80, -2.21 to 5.69, and -1.92 to 4.00]
Molecule Optimization (Local)
[Figure: example molecule modifications at preservation ≈ 0.4]
Discussion
• “Word level” prediction can offer significant improvement by shortening the decision process.
• Latent space optimization is an interesting and powerful technique.
• “Teacher forcing” introduces data bias, which can be reduced via RL techniques and a GAN-style evaluation of the complete graph.
• Similar to SMILES, this paper samples a random order over the graph's tree structure: using an arbitrary minimal spanning tree, choosing an arbitrary node as the root of the tree, and choosing a random ordering of the children of each tree node.
Thanks
Original code is available at: https://github.com/wengong-jin/icml18-jtnn
[Appendix figure: input molecule → junction tree → tree & graph vectors → reconstructed tree → reconstructed molecule]