W O M B A T
WOrld of Molecular BioAcTivity
Tudor Oprea
Sunset Molecular Discovery LLC
http://www.sunsetmolecular.com
Daylight MUG 18, Santa Fe, NM, 02/24/04
Copyright © Tudor I. Oprea 2004. All rights reserved
Axes: MW, LogP, LogSw
Bioactivity Distribution By Target Type
All targets: 0.001 – 6: 38%; 6 – 8: 43%; 8 – 14.4: 18%; Inactives/SingleDose: 1%
Receptors (56.2%): 0.0 – 6: 33%; 6 – 8: 44%; 8 – 14.4: 22%; Inactives/SingleDose: 1%
Proteins (4.8%): 0.0 – 6: 43%; 6 – 8: 45%; 8 – 14.4: 11%; Inactives/SingleDose: 0%
Enzymes (39%): 0.0 – 6: 44%; 6 – 8: 41%; 8 – 14.4: 14%; Inactives/SingleDose: 2%
• Enzymes tend to have a higher rate of inactives/low actives
• Receptors tend to have more medium/high actives
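The per-target-type percentages can be cross-checked against the unlabelled overall figures (38%, 43%, 18%, 1%) with a weighted average. The following is a minimal sketch, assuming the percentage in parentheses after each target type is that type's share of all activities; the bin names `low`, `medium`, `high` and `inactive` are shorthand introduced here, not labels from the slide.

```python
# Values transcribed from the slide; the dictionary layout is an assumption of this sketch.
type_share = {"receptor": 56.2, "protein": 4.8, "enzyme": 39.0}  # % of all activities
activity_by_type = {
    "receptor": {"low": 33, "medium": 44, "high": 22, "inactive": 1},
    "protein": {"low": 43, "medium": 45, "high": 11, "inactive": 0},
    "enzyme": {"low": 44, "medium": 41, "high": 14, "inactive": 2},
}


def overall_distribution(shares, by_type):
    """Weight each target type's distribution by its share of all activities."""
    total = sum(shares.values())
    bins = next(iter(by_type.values())).keys()
    return {
        b: sum(shares[t] * by_type[t][b] for t in shares) / total
        for b in bins
    }


overall = overall_distribution(type_share, activity_by_type)
for bin_name, percent in overall.items():
    print(f"{bin_name}: {percent:.1f}%")
```

Rounded to whole percent, the weighted averages agree with the overall figures on the slide, which supports reading the first block of numbers as the distribution over all targets.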
Target Type Distribution By Activity
Activity bin         enzyme   protein   receptor
Inactives (1%)       55%      1%        44%
Low Act. (38%)       45%      5%        49%
Medium Act. (43%)    37%      5%        58%
High Act. (18%)      29%      3%        68%
• Enzymes dominate the inactive/low activity bins
• Receptors clearly dominate the medium/high activity bins
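The by-activity view is the by-target-type view inverted: the share of a target type within an activity bin is proportional to (share of that type) × (fraction of that type's activities in the bin). A minimal sketch of that inversion follows, assuming the parenthesised percentages from the "By Target Type" slide are shares of all activities; the bin names are shorthand introduced here.

```python
# Figures transcribed from the two slides; names are assumptions of this sketch.
type_share = {"enzyme": 39.0, "protein": 4.8, "receptor": 56.2}
activity_given_type = {
    "enzyme": {"low": 44, "medium": 41, "high": 14},
    "protein": {"low": 43, "medium": 45, "high": 11},
    "receptor": {"low": 33, "medium": 44, "high": 22},
}
slide_type_given_activity = {
    "low": {"enzyme": 45, "protein": 5, "receptor": 49},
    "medium": {"enzyme": 37, "protein": 5, "receptor": 58},
    "high": {"enzyme": 29, "protein": 3, "receptor": 68},
}


def type_given_activity(bin_name):
    """Bayes-style inversion: percent of each target type within one activity bin."""
    weights = {t: type_share[t] * activity_given_type[t][bin_name] for t in type_share}
    total = sum(weights.values())
    return {t: 100 * w / total for t, w in weights.items()}


for bin_name, published in slide_type_given_activity.items():
    derived = type_given_activity(bin_name)
    for t in published:
        print(f"{bin_name:>6} {t:>8}: derived {derived[t]:5.1f}%  slide {published[t]}%")
```

For the low, medium and high bins the derived shares land within one percentage point of the slide values. The inactive bin is left out because its inputs (0% to 2%) are rounded too coarsely to invert reliably.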
WOMBAT History
• SADB5 (May 2002):
  • Project funded initially by AstraZeneca
  • 21700 structures (includes duplicates)
  • 36738 activities on 324 targets
  • 837 papers indexed (JMC 1996-1999)
  • 39.56% Ki, 53.52% IC50
  • 5.54% D2 or EC50
• WOMBAT 2003.2 (September ‘03):
  • 53126 entries (47872 unique structures), 98662 activities on 506 unique targets, plus 236 inactives, 7982 ‘smaller than’ & 159 ‘greater than’ values
  • 2143 papers (2148 series) indexed (JMC 1994-1999)
  • 35.5% Ki, 56.6% IC50, 4.85% D2 or EC50
  • literature coverage included BMCL (2002), QSAR (2000-2001)
Romanian Academy, Institute of Chemistry Timisoara
WOMBAT 2004.1
• 76,165 entries (68,543 unique SMILES) covering 3039 series (over 3000 papers) and ~143,000 activities on ~630 targets
• Activities now include inactives (635), < (8916), > (259), @ (578 – single dose)
• 37.1% Ki (& variations), 55.85% IC50 (& variations), 4.44% D2 or EC50, 0.9% Kb and Kd, 0.1% MIC, 0.04% ED50
• Literature coverage: Biochem. Pharmacol. 2001 [partial coverage], Bioorg. Med. Chem. Lett. 2002 [1-24], Chembiochem 2002 [partial], Eur. J. Med. Chem. 2001 [partial], J. Amer. Chem. Soc. 1975, 1992, 1993 [partial], J. Health Sci. 2003 [partial], J. Med. Chem. 1991 [partial], 1992-2000 [complete], 2003 [partial], Quant. Struct.-Act. Relat. 1998-2000 [partial]
• Fully integrated FEDORA server (Metaphorics LLC)
• New features include SwissProt IDs for most Targets and links (via the DOI format for 1737 entries) to PDF files for all literature entries
http://www.sunsetmolecular.com/products/?id=4
Activity Profile for WOMBAT 2004.1
Target Class                   Compounds (a)   Percent
G-Protein Coupled Receptors    28973           38.04
Nuclear Hormone Receptors      688             0.90
Integrins                      1772            2.33
Ion Channels                   9008            11.83
Aspartyl Proteases             3351            4.40
Serine Proteases               1459            1.92
Kinases                        2842            3.73
Cysteine Proteases             704             0.92
Phosphodiesterases             1689            2.22
Oxidoreductases                2010            2.64
Oxygenases                     2829            3.71
Transporters                   2264            2.97
Others                         18576           24.39
(a) number of structures active at least once/target, % of total entries
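The Percent column can be reproduced from the compound counts and the 76,165 entries reported for WOMBAT 2004.1. A minimal sketch, with the counts transcribed from the table above and all variable names chosen for this illustration only:

```python
TOTAL_ENTRIES = 76165  # WOMBAT 2004.1 entries, as stated on the release slide

# Compound counts per target class, transcribed from the activity profile table.
compounds = {
    "G-Protein Coupled Receptors": 28973,
    "Nuclear Hormone Receptors": 688,
    "Integrins": 1772,
    "Ion Channels": 9008,
    "Aspartyl Proteases": 3351,
    "Serine Proteases": 1459,
    "Kinases": 2842,
    "Cysteine Proteases": 704,
    "Phosphodiesterases": 1689,
    "Oxidoreductases": 2010,
    "Oxygenases": 2829,
    "Transporters": 2264,
    "Others": 18576,
}

# Percent of total entries, rounded to two decimals as in the table.
percent = {name: round(100 * n / TOTAL_ENTRIES, 2) for name, n in compounds.items()}

for name, n in compounds.items():
    print(f"{name:<30}{n:>7}{percent[name]:>8.2f}")
print(f"{'Sum':<30}{sum(compounds.values()):>7}")
```

The class counts add up to the total number of entries, so in this table the classes behave as a partition of the database, with "Others" as the remainder.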
References Are Stored Separately
WOMBAT Quality Control
• Chirality: what chemists can interpret, computers cannot always read (the “above/below the plane” convention must be strictly enforced)
  Figure: structure drawings labelled "Not machine-readable" and "Machine-readable"
• Missing/altered atoms/substituents – overall error rate above 9%
• Incorrectly drawn or written structures (3.4%); incorrect molecular formula or molecular weight (3.4%);
• Unspecified binding position for substituents or ambiguous numbering scheme for the heterocyclic backbone (0.91%);
• Structures with the incorrect backbone (0.71%);
• Incorrect generic names or chemical names (0.24%);
• Incorrect biological activity (0.34%);
• Incorrect references (0.2%).
WOMBAT Quality Control…
JMC Errors… 1
Columns: Reference, Comment (Published Structure and Corrected Structure drawings not reproduced)
JMC 37-476 chart 1                   rolipram: incorrect N atom position
JMC 43-2217 chart 1                  A-85380: incorrect ring size
JMC 43-2217 chart 1 & JMC 36-2645    tropisetron: extra methyl group
JMC 43-2217 chart 1                  DAU-6285: missing methoxy; N instead of O
JMC 37-758 chart 1                   Ro-15-4513: methyl group missing
JMC 37-787 figure 1                  epalrestat: E/Z configuration, E instead of Z
JMC Errors… 2
Columns: Reference, Comment (Published Structure and Corrected Structure drawings not reproduced)
JMC 35-1969 chart 1                  bicuculline: incorrect chirality; incorrect ring fusion
JMC 35-1969 chart 1                  (+)-tubocurarine: incorrect N atom position; substitution position
JMC 37-1769 chart 1                  haloperidol: Br instead of Cl
JMC 38-16 scheme 1                   divaplon: missing nitrogen atom
JMC 43-71 figure 2                   triazolam: missing chlorine atom
JMC Errors… 3
Columns: Reference, Comment (Published Structure and Corrected Structure drawings not reproduced)
JMC 43-1793                          argatroban: missing double-bonded oxygen atom; missing chirality
JMC 41-1943 chart 1                  LY-297524: completely different structures
JMC 41-4196                          SB-203580: missing S=O double bond
JMC 38-3645 figure 1                 tacrine: missing two double bonds
JMC 38-3094 figure 1                 levonantradol: methyl instead of hydroxy,methyl; plus an extra double bond
JMC Errors… 4
Columns: Reference, Comment (Published Structure and Corrected Structure drawings not reproduced)
JMC 35-4509 table II                 40, 41: extra THP group
JMC 35-3858 table IV                 53/R2: imidazoyl instead of imidazolyl
JMC 43-236 table 1                   6b: incorrect substituent
JMC 43-236 table 1                   9: extra double-bonded O atom
JMC 39-3636 table 1                  28xiii/R3: pent-4-yl instead of hept-4-yl; confirmed from chemical name
JMC 40-1049 table 6                  69/R: wrong substituent; confirmed from chemical name
Other Errors… SciFinder
WOMBAT: RB-370
Registry Number: 187454-94-0
The correct structure has a 13-member ring
Other Errors… Merck Index
"Carisoprodol": Merck Index 13th ed #1854
Carisoprodol - correct structure: Merck Index 13th ed has correct name
F E D O R A
http://www.metaphorics.com
WOMBAT@FEDORA
WOMBAT Patterns
• Dave Weininger wrote a SMARTS generator starting from a SMILES that was hand-picked by Vera Povolna to match a specific (not the maximum common) substructure for each WOMBAT series
• These SMARTS are intended to capture the unique biological profile for each series – on occasion 2 such SMARTS were defined; note that hydrogens are matched exactly as defined in the series
[CH3]-[OH0]-[cH0]:1:[cH1,cH0]:[cH0]:2-[CH2]-[NH0](-[NH0]=[CH0](-[cH0]:2:[cH1]:[cH1]:1)-[CH2]-[cH0]:3:[cH0](:[cH1]:[nH0]:[cH1]:[cH0]:3-[ClH0])-[ClH0])-[CH0,SH0,CH1]=[OH0]
[CH2]-[CH2]-[NH0](-[CH2]-[CH2])-[CH2]-[CH2]-[OH0,SH0]-[cH0]:1:[cH1]:[cH1]:[cH0](:[cH1]:[cH1]:1)-[CH1]-2-[CH1](-[CH0,CH2]-[OH0]-[cH0]:3:[cH1]:[cH0](:[cH1]:[cH1]:[cH0]-2:3)-[OH0,OH1])-[cH0]:4:[cH1]:[cH1]:[cH1]:[cH1]:[cH1]:4
[OH1]-[CH0](=[OH0])-[CH2]-[CH1,CH2]-[NH1]-[CH0](=[OH0])-[CH2]-[NH1,NH0]-[CH0](=[OH0])-[CH2,CH1,NH0]-[CH2]-[CH2]-[cH0]:1:[nH0]:[cH0]:2-[NH1]-[CH2]-[CH2]-[CH2]-[cH0]:2:[cH1]:[cH1]:1
[OH1]-[CH0](=[OH0])-[CH2]-[CH1,CH2]-[NH1]-[CH0](=[OH0])-[CH2]-[NH0]-1-[CH0](-[CH1](-[CH2]-[CH2]-1)-[CH2]-[CH2]-[cH0]:2:[nH0]:[cH0]:3-[NH1]-[CH2]-[CH2]-[CH2]-[cH0]:3:[cH1]:[cH1]:2)=[OH0]
[NH2]-[CH2]-[CH2]-[CH2]-[NH1]-[CH2]-[CH2]-[CH2]-[CH2]-[NH1]-[CH2]-[CH2]-[CH2]-[NH1]-[CH0,SH0]=[OH0]
• Provides interesting associations in FEDORA
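To make the notation in the patterns above concrete: every atom is written as a bracket atom with an explicit total hydrogen count (`H0`, `H1`, ...), and comma-separated alternatives inside one bracket mean "either of these". The sketch below rebuilds the last, linear pattern from a list of atoms. It is only an illustration of the notation, not the generator written by Dave Weininger; the helper names `atom` and `linear_chain` are hypothetical.

```python
def atom(*alternatives):
    """Bracket atom with explicit H counts, e.g. atom(("C", 0), ("S", 0)) -> "[CH0,SH0]"."""
    return "[" + ",".join(f"{symbol}H{h}" for symbol, h in alternatives) + "]"


def linear_chain(atoms, bonds):
    """Join bracket atoms with explicit bond symbols into an unbranched SMARTS string."""
    if len(bonds) != len(atoms) - 1:
        raise ValueError("need exactly one bond between each pair of neighbouring atoms")
    pattern = atoms[0]
    for bond, next_atom in zip(bonds, atoms[1:]):
        pattern += bond + next_atom
    return pattern


ch2 = atom(("C", 2))
nh1 = atom(("N", 1))

# Polyamine chain ending in an acyl or sulfonyl group, as in the last pattern above.
atoms = (
    [atom(("N", 2))] + [ch2] * 3
    + [nh1] + [ch2] * 4
    + [nh1] + [ch2] * 3
    + [nh1, atom(("C", 0), ("S", 0)), atom(("O", 0))]
)
bonds = ["-"] * 14 + ["="]

polyamine = linear_chain(atoms, bonds)
print(polyamine)
```

Because each hydrogen count is fixed, a pattern written this way does not match analogues that carry an extra substituent on one of the listed atoms, which is what "hydrogens are matched exactly" implies.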
Increased MW does not warrant higher activity
Plot axis: MW; 67210 structures, 138401 activities
Increased LogP does not warrant higher activity
Plot axis: ELogP; 66824 structures, 137766 activities
How Small Can Active Compounds Be?
Histogram: Binned ELogP (bins: Less, -3, -1, 1, 3, More; count axis 10 to 90)
Compound       MW    LogP    Activity
Carbachol      143   -3.8    IC50 = 8.2 (m)
Dopamine       153   -1.0    IC50 = 8.7 (D2)
LY-379268      187   -4.6    EC50 = 8.6 (mGLU2)
Nicotine       162   1.2     Ki = 9.0 (nACh)
Medetomidine   200   3.8     EC50 = 8.5 (α2)
Histamine      111   -0.7    Ki = 8.2 (H3)
CGP-27492      123   -1.7    IC50 = 8.6 (GABA-B)
L-670548       179   0.77    Ki = 9.7 (m1)
Tacrine        198   2.7     IC50 = 8.2 (BChE)
192 unique structures, 46 targets, 252 activities ≤ 10 nM
MW ≤ 200 amu
176 are likely to be charged at pH 7.4
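The selection behind this slide (MW ≤ 200 amu, activity ≤ 10 nM) can be sketched as a simple filter. The sketch assumes the activity values shown for each compound are negative log10 molar values, so that 8.0 corresponds to 10 nM; that reading is inferred from the "≤ 10 nM" cutoff and is not stated on the slide. Only the nine example compounds listed above are included, and the function and variable names are illustrative.

```python
# (name, MW, LogP, activity type, -log10 molar activity, target) as listed on the slide.
examples = [
    ("Carbachol", 143, -3.8, "IC50", 8.2, "m"),
    ("Dopamine", 153, -1.0, "IC50", 8.7, "D2"),
    ("LY-379268", 187, -4.6, "EC50", 8.6, "mGLU2"),
    ("Nicotine", 162, 1.2, "Ki", 9.0, "nACh"),
    ("Medetomidine", 200, 3.8, "EC50", 8.5, "α2"),
    ("Histamine", 111, -0.7, "Ki", 8.2, "H3"),
    ("CGP-27492", 123, -1.7, "IC50", 8.6, "GABA-B"),
    ("L-670548", 179, 0.77, "Ki", 9.7, "m1"),
    ("Tacrine", 198, 2.7, "IC50", 8.2, "BChE"),
]


def to_nanomolar(p_activity):
    """Convert a -log10(molar) activity to nanomolar (assumed scale, see lead-in)."""
    return 10 ** (9 - p_activity)


def small_and_potent(rows, max_mw=200, max_nm=10.0):
    """Keep compounds at or below the MW cutoff with activity at or below max_nm."""
    return [r for r in rows if r[1] <= max_mw and to_nanomolar(r[4]) <= max_nm]


hits = small_and_potent(examples)
for name, mw, logp, kind, p_act, target in hits:
    print(f"{name:<13} MW {mw:>3}  {kind} {to_nanomolar(p_act):6.2f} nM ({target})")
```

Under that assumption all nine listed examples pass both cutoffs, consistent with their being shown as representatives of the 192 structures.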
Acknowledgments
• Maria Mracec, Liliana Ostopovici, Ramona Rad, Alina Bora, Ionela Olah, Marius Olah, Magdalena Banda (Timisoara Institute of Chemistry of the Romanian Academy) and TIO introduced data in WOMBAT
• Marius Olah wrote the database interfaces
• Maria Mracec, Marius Olah and TIO did the keyword characterization
• Marius Olah, Maria Mracec, Cristian Bologa and TIO performed structural error checking
• Vera Povolna and David Weininger (Metaphorics) for implementing WOMBAT@FEDORA
The contents of this talk are copyright © Tudor I. Oprea 2004. All rights reserved
EuroQSAR 2004
15th European Symposium on Quantitative Structure-Activity Relationships
Istanbul / Turkey, 05-10 September 2004
http://www.euro-qsar2004.org
Chair: Prof. Dr. Esin AKI ŞENER, sener@pharmacy.ankara.edu.tr
Co-Chair: Prof. Dr. İsmail YALÇIN, yalcin@pharmacy.ankara.edu.tr
Address for Correspondence: Armoria Congress, armoria@euro-qsar2004.org