Electronic supplementary information
Content
Fig. S1 The distribution of MCC (A), overall accuracy (B), and AUC (C) values based
on different proportions of the training set and test set using ECFP_4 and LCFP_4
fingerprints.
Fig. S2 Performance of the top-10 models with external test validation.
Fig. S3 Compounds from the NCI-60 actives that are closest to the structurally novel
anticancer agents identified in the present study. All calculations were done with
Discovery Studio 3.5.
Fig. S4 Cellular toxicity dose−response curves for compound G03 in HeLa and
MDA-MB-231 cells. Values were generated using at least seven concentrations of G03 (μM),
with triplicate determinations at each concentration.
Fig. S5 (A) Four distinct ligand binding sites located on tubulin. Crystal structures of
tubulin bound with taxol (PDB ID: 1JFF), vinblastine (PDB ID: 5BMV), colchicine
(PDB ID: 3E22), zampanolide (PDB ID: 4I4T), laulimalide (PDB ID: 4O4H), and
peloruside A (PDB ID: 4O4J) are superimposed. α-Tubulin, β-tubulin, and RB3 are
shown as green, red, and yellow ribbons, respectively. Vinblastine is represented by
blue space-filling models, GTP and GDP by white and orange sticks, respectively,
laulimalide and peloruside A by white space-filling models, taxol and
zampanolide by green space-filling models, and colchicine by
yellow space-filling models. Graphic images were prepared with MOE
(version 2017). (B) Chemical structures of antitubulin agents.
Fig. S6 Horizontal (A) and vertical (B) views of tubulin in the blind docking box. The
tubulin protein was centered in the blind docking box (center_x = 18.442, center_y
= 56.715, center_z = 29.226) with dimensions of 140 × 140 × 140 (x × y × z, Å).
Fig. S7 Predicted binding sites of G03 on tubulin. A total of 20 docking poses of G03
generated with QVina-W are represented by green sticks. Graphic images were prepared
with MOE (version 2017).
Fig. S8 The best hypothesized binding modes of G03 at the Laulimalide/Peloruside site
(A and B), Site B (C and D), and the Vinblastine site (E and F) of tubulin. Left and right
panels show the corresponding two- and three-dimensional ligand-protein interactions.
Graphic images were prepared with MOE (version 2017).
Electronic Supplementary Material (ESI) for Organic & Biomolecular Chemistry. This journal is © The Royal Society of Chemistry 2019
Table S1. Comprehensive NCI-60 data set (Table S1.xlsx).
Table S2. Results of structural diversity comparison of the compounds from the
NCI-60 dataset and DrugBank and WDI databases.
Table S3. Top-ten in silico cell-based models used for evaluating the external test set.
Table S4. Four independent external testing sets (Table S4.xlsx).
Table S5. The 55 virtual hits purchased for experimental validation (Table
S5.xlsx).
Table S6. Reported representative crystal structures of tubulin-ligand complexes.
Table S7. The distribution of 20 blind docking poses of G03 in six predicted binding
sites of tubulin.
Fig. S1 The distribution of MCC (A), overall accuracy (B), and AUC (C) values based
on different proportions of the training set and test set using ECFP_4 and LCFP_4
fingerprints.
Fig. S2 Performance of the top-10 models with external test validation.
Fig. S3 Compounds from the NCI-60 actives that are closest to the structurally novel
anticancer agents identified in the present study. All calculations were done with
Discovery Studio 3.5.
Fig. S4 Cellular toxicity dose−response curves for compound G03 in HeLa and MDA-MB-231 cells. Values were generated using at least seven concentrations of G03 (μM), with triplicate determinations at each concentration.
Fig. S5 (A) Four distinct ligand binding sites located on tubulin. Crystal structures of
tubulin bound with taxol (PDB ID: 1JFF), vinblastine (PDB ID: 5BMV), colchicine
(PDB ID: 3E22), zampanolide (PDB ID: 4I4T), laulimalide (PDB ID: 4O4H), and
peloruside A (PDB ID: 4O4J) are superimposed. α-Tubulin, β-tubulin, and RB3 are
shown as green, red, and yellow ribbons, respectively. Vinblastine is represented by
blue space-filling models, GTP and GDP by white and orange sticks, respectively,
laulimalide and peloruside A by white space-filling models, taxol and
zampanolide by green space-filling models, and colchicine by
yellow space-filling models. Graphic images were prepared with MOE
(version 2017). (B) Chemical structures of antitubulin agents.
Fig. S6 Horizontal (A) and vertical (B) views of tubulin in the blind docking box. The
tubulin protein was centered in the blind docking box (center_x = 18.442, center_y
= 56.715, center_z = 29.226) with dimensions of 140 × 140 × 140 (x × y × z, Å).
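The box parameters in the Fig. S6 caption can be turned into explicit coordinate bounds and a search-box configuration fragment. The following is a minimal sketch, not the authors' actual input file: the key names (center_x, size_x, and so on) follow the AutoDock Vina configuration-file convention, which QVina-W is assumed here to share, and the helper names box_bounds and vina_style_config are illustrative. Receptor and ligand file names and all other docking options are omitted because the caption does not state them.

```python
# Box centre and edge lengths (in Å) as given in the Fig. S6 caption.
CENTER = (18.442, 56.715, 29.226)
SIZE = (140.0, 140.0, 140.0)


def box_bounds(center, size):
    """Return (min, max) along x, y, z for a box centred at `center`."""
    return [(c - s / 2.0, c + s / 2.0) for c, s in zip(center, size)]


def vina_style_config(center, size):
    """Format the search box as Vina-style 'key = value' lines (assumed format)."""
    lines = [f"center_{axis} = {c}" for axis, c in zip("xyz", center)]
    lines += [f"size_{axis} = {s}" for axis, s in zip("xyz", size)]
    return "\n".join(lines)


bounds = box_bounds(CENTER, SIZE)
config_text = vina_style_config(CENTER, SIZE)

for axis, (lo, hi) in zip("xyz", bounds):
    print(f"{axis}: {lo:.3f} to {hi:.3f} Å")
print(config_text)
```

Each edge of the box extends 70 Å on either side of the centre, so the whole protein is enclosed by construction only if its atomic coordinates fall inside these bounds; that check would need the structure file, which is not part of this sketch.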
Fig. S7 Predicted binding sites of G03 on tubulin. A total of 20 docking poses of G03
generated with QVina-W are represented by green sticks. Graphic images were prepared
with MOE (version 2017).
Fig. S8 The best hypothesized binding modes of G03 at the Laulimalide/Peloruside site
(A and B), Site B (C and D), and the Vinblastine site (E and F) of tubulin. Left and right
panels show the corresponding two- and three-dimensional ligand-protein interactions.
Graphic images were prepared with MOE (version 2017).
Table S2 Results of structural diversity comparison of the compounds from the NCI-60
dataset and DrugBank and WDI databases
Data set    Compounds    Scaffolds    Diversity (Scaffolds/Compounds)
NCI-60      18219        7867         43.18%
DrugBank    6516         2784         42.70%
WDI         70555        24557        34.80%
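The diversity column in Table S2 is the ratio of scaffolds to compounds, expressed as a percentage. A minimal sketch that recomputes it from the tabulated counts is given below; the function name scaffold_diversity and the dictionary layout are illustrative and are not taken from the original work.

```python
def scaffold_diversity(n_scaffolds: int, n_compounds: int) -> float:
    """Return the scaffolds/compounds ratio as a percentage."""
    if n_compounds <= 0:
        raise ValueError("n_compounds must be positive")
    return 100.0 * n_scaffolds / n_compounds


# (compounds, scaffolds) counts copied from Table S2.
TABLE_S2 = {
    "NCI-60": (18219, 7867),
    "DrugBank": (6516, 2784),
    "WDI": (70555, 24557),
}

diversity = {
    name: scaffold_diversity(scaffolds, compounds)
    for name, (compounds, scaffolds) in TABLE_S2.items()
}

for name, value in diversity.items():
    print(f"{name}: {value:.2f}%")
```

The recomputed ratios agree with the tabulated percentages to within about 0.03 percentage points; the small differences for DrugBank and WDI are consistent with the table values having been rounded to three significant figures.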
Table S3 Top-ten in silico cell-based models used for evaluating the external test set
Training set
Model            TP     FN    TN     FP    SE     SP     MCC    Q      AUC
NB-LCFP_8        5713   590   7121   240   0.906  0.967  0.878  0.939  0.904
NB-LCFP_10       5767   536   7121   240   0.915  0.967  0.886  0.943  0.902
NB-LCFP_6        5536   767   7131   230   0.878  0.969  0.855  0.927  0.902
NB-FCFP_8        5632   671   7062   299   0.894  0.959  0.858  0.929  0.898
NB-ECFP_12       5754   549   7176   185   0.913  0.975  0.893  0.946  0.901
NB-ECFP_8        5691   612   7168   193   0.903  0.974  0.883  0.941  0.905
RF-LCFP_6-38     5696   607   6763   598   0.904  0.919  0.823  0.912  0.9733
RF-LCFP_10-34    5685   618   6753   608   0.902  0.917  0.819  0.910  0.9722
RF-LCFP_12-34    5685   618   6753   608   0.902  0.917  0.819  0.910  0.9722
RF-LCFP_8-34     5688   615   6750   611   0.902  0.917  0.819  0.910  0.9723
Test set
Model            TP     FN    TN     FP    SE     SP     MCC    Q      AUC
NB-LCFP_8        1697   422   2164   272   0.801  0.888  0.694  0.848  0.91
NB-LCFP_10       1692   427   2156   280   0.798  0.885  0.688  0.845  0.908
NB-LCFP_6        1696   423   2152   284   0.800  0.883  0.688  0.845  0.909
NB-FCFP_8        1655   464   2187   249   0.781  0.898  0.686  0.843  0.906
NB-ECFP_12       1620   499   2216   220   0.765  0.910  0.685  0.842  0.907
NB-ECFP_8        1667   452   2175   261   0.787  0.893  0.686  0.843  0.911
RF-LCFP_6-38     1744   375   2108   328   0.823  0.865  0.690  0.846  0.917
RF-LCFP_10-34    1748   371   2102   334   0.825  0.863  0.689  0.845  0.917
RF-LCFP_12-34    1748   371   2102   334   0.825  0.863  0.689  0.845  0.917
RF-LCFP_8-34     1743   376   2103   333   0.823  0.863  0.687  0.844  0.916
NB: naïve Bayesian; ST: single tree; RF: random forest. The best tree depth is 22 for ST models
(LPFP_8), 29 for ST models (LPFP_6), and 32 for ST models (EPFP_4). The best tree depth is 34
for RF models (LPFP_10 and LPFP_12) and 38 for RF models (LPFP_6). TP: true positives; TN:
true negatives; FP: false positives; FN: false negatives; SE: sensitivity; SP: specificity; Q: overall
predictive accuracy; MCC: Matthews correlation coefficient; AUC: area under the receiver
operating characteristic curve.
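The SE, SP, Q, and MCC columns of Table S3 can be recomputed from the confusion-matrix counts. The sketch below assumes the standard textbook definitions of these metrics, since the supplementary text names them but does not print the formulas; the function name classification_metrics is illustrative. It also assumes both classes are non-empty, so that the SE and SP denominators are nonzero. The example uses the test-set counts of the NB-LCFP_8 model.

```python
import math


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute SE, SP, Q and MCC from confusion-matrix counts."""
    se = tp / (tp + fn)                   # sensitivity: recall on actives
    sp = tn / (tn + fp)                   # specificity: recall on inactives
    q = (tp + tn) / (tp + fn + tn + fp)   # overall predictive accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SE": se, "SP": sp, "Q": q, "MCC": mcc}


# Test-set counts for NB-LCFP_8, copied from Table S3.
metrics = classification_metrics(tp=1697, fn=422, tn=2164, fp=272)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Rounded to three decimals, these values reproduce the tabulated test-set entries for NB-LCFP_8 (SE 0.801, SP 0.888, Q 0.848, MCC 0.694). AUC is not covered because it requires the per-compound prediction scores, which are not part of the table.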
Table S6 Reported representative crystal structures of tubulin-ligand complexes
PDB ID    Ligand          Resolution (Å)    Binding site                     Mechanism of action
4I4T      Zampanolide     1.8               Taxol site                       MT-stabilizing agents
3E22      Colchicine      3.8               Colchicine site                  MT-destabilizing agents
1JFF      Taxol           3.5               Taxol site                       MT-stabilizing agents
5BMV      Vinblastine     2.5               Vinblastine site                 MT-stabilizing/MT-destabilizing agents
4O4H      Laulimalide     2.1               Laulimalide/Peloruside A site    MT-stabilizing agents
4O4J      Peloruside A    2.2               Laulimalide/Peloruside A site    MT-stabilizing agents
Table S7 The distribution of 20 blind docking poses of G03 in six predicted binding
sites of tubulin
Docking pose    Docking score (kcal/mol)    Predicted binding site
1               -9.3                        Laulimalide/Peloruside site
2               -8.8                        Laulimalide/Peloruside site
3               -8.2                        Site A
4               -8.2                        Site B
5               -8.1                        Laulimalide/Peloruside site
6               -8.1                        Vinblastine site
7               -8.0                        Site B
8               -8.0                        Site B
9               -8.0                        Vinblastine site
10              -7.8                        Taxol site
11              -7.8                        Vinblastine site
12              -7.8                        Vinblastine site
13              -7.7                        Site C
14              -7.6                        Vinblastine site
15              -7.6                        Laulimalide/Peloruside site
16              -7.6                        Vinblastine site
17              -7.5                        Site B
18              -7.5                        Vinblastine site
19              -7.4                        Vinblastine site
20              -7.3                        Vinblastine site
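The distribution described in the Table S7 title can be summarized by counting poses per site and taking the most favorable (lowest) docking score in each. The sketch below uses only the rows of Table S7; the variable names and the choice of summary statistics are illustrative and do not represent the authors' analysis procedure.

```python
from collections import Counter

# (pose rank, docking score in kcal/mol, predicted site), copied from Table S7.
POSES = [
    (1, -9.3, "Laulimalide/Peloruside site"),
    (2, -8.8, "Laulimalide/Peloruside site"),
    (3, -8.2, "Site A"),
    (4, -8.2, "Site B"),
    (5, -8.1, "Laulimalide/Peloruside site"),
    (6, -8.1, "Vinblastine site"),
    (7, -8.0, "Site B"),
    (8, -8.0, "Site B"),
    (9, -8.0, "Vinblastine site"),
    (10, -7.8, "Taxol site"),
    (11, -7.8, "Vinblastine site"),
    (12, -7.8, "Vinblastine site"),
    (13, -7.7, "Site C"),
    (14, -7.6, "Vinblastine site"),
    (15, -7.6, "Laulimalide/Peloruside site"),
    (16, -7.6, "Vinblastine site"),
    (17, -7.5, "Site B"),
    (18, -7.5, "Vinblastine site"),
    (19, -7.4, "Vinblastine site"),
    (20, -7.3, "Vinblastine site"),
]

# Number of poses that landed in each predicted site.
site_counts = Counter(site for _, _, site in POSES)

# Most favorable (most negative) score observed in each site.
best_score = {}
for _, score, site in POSES:
    best_score[site] = min(score, best_score.get(site, score))

for site, count in site_counts.most_common():
    print(f"{site}: {count} poses, best score {best_score[site]} kcal/mol")
```

In this tally the Vinblastine site collects the most poses (9 of 20), while the single best score (-9.3 kcal/mol) falls in the Laulimalide/Peloruside site; Site A, Site C, and the Taxol site each receive one pose.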