Date post: | 24-Dec-2015 |
Upload: | cathleen-morrison |
Private to Public Domain Transfer
• Five year strategic award from Wellcome Trust
• Large-scale Drug Discovery Structure Activity Relationship (SAR) data
• Linking small molecule structures to ‘targets’ and pharmacological activities – Chemogenomics/Chemical Biology
• ‘Open Access’, ‘User Friendly’, ‘Translational’, ‘Free’
• Multiple access mechanisms
• Full database download, web front-ends, web services
• Actively support ad hoc sabbaticals (academic and commercial) at EMBL-EBI
ChEMBL Research Strategy
• Comprehensively catalogue historical drug discovery
• Include successes and failures
• Drugs can be small molecules, recombinant proteins, siRNA, etc.
• Derive rules for drug discovery ‘success’ from these data
• Target selection and prioritisation
• Lead discovery, optimisation, candidate selection
Drug Discovery Process (simplified)
Discovery → Development → Use
• Target Discovery: target identification, microarray profiling, target validation, assay development, biochemistry, clinical/animal disease models
• Lead Discovery: high-throughput screening (HTS), fragment-based screening, focused libraries, screening collection
• Lead Optimisation: medicinal chemistry, structure-based drug design, selectivity screens, ADMET screens, cellular/animal disease models, pharmacokinetics
• Preclinical Development: toxicology, in vivo safety pharmacology, formulation, dose prediction
• Clinical Trials: Phase 1 (PK, tolerability), Phase 2 (efficacy), Phase 3 (safety & efficacy)
• Launch: indication discovery & expansion
• Data layers: Med. Chem. SAR, Clinical Candidates, Drugs
• Scale: >450,000 distinct compounds; ~25,000 distinct lead series; ~12,000 candidates; ~1,300 drugs
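The attrition implied by the approximate counts on this slide can be worked out directly. A minimal Python sketch, assuming the four counts can be read as successive stages of one funnel (the slide does not say exactly how they map onto the stages):

```python
# Approximate counts quoted on the slide. Treating them as successive
# stages of a single funnel is an assumption made for illustration;
# ">450,000" is a lower bound, so ratios that divide by it are upper bounds.
FUNNEL = [
    ("distinct compounds", 450_000),
    ("distinct lead series", 25_000),
    ("candidates", 12_000),
    ("drugs", 1_300),
]


def stage_ratios(funnel):
    """Return (from_stage, to_stage, count ratio) for each adjacent pair."""
    return [
        (a_name, b_name, b_count / a_count)
        for (a_name, a_count), (b_name, b_count) in zip(funnel, funnel[1:])
    ]


for src, dst, ratio in stage_ratios(FUNNEL):
    print(f"{src} -> {dst}: {ratio:.3f}")

# Roughly 3 drugs per 1,000 distinct compounds under these assumptions.
overall = FUNNEL[-1][1] / FUNNEL[0][1]
print(f"drugs per distinct compound: {overall:.4f}")
```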
ChEMBL: Launched Drugs
• Database of all approved drugs
• Chemistry and sequence ‘aware’
• Contents:
• Small molecules and biological therapeutics
• USANs, INNs, research codes, other synonyms
• Pharmaceutical properties, prodrugs, dosage, form, etc.
• PK data and metabolites, black box warnings, etc.
• 1,378 chemically distinct ‘drugs’, 324 distinct molecular targets
• Controlled vocabulary indications dictionary and hierarchy
New Drugs 2006-2009
[Figure: new drugs by type: enzyme, mAb, peptide, other protein, natural product, synthetic small molecule]
Nat. Rev. Drug Disc., 5, pp. 993-996 (2006)
ChEMBL: Drug Dosage
[Figure: histogram of drug doses; x-axis binned log10 mole dose (-8.4 to -2.32, bins of 0.32), spanning nmol to mmol; y-axis frequency (0 to 80)]
• Annotated dose range: ~150-200 µmol
• Lowest doses: steroids, thyroids
• Highest doses: metformin, hydroxyurea
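Doses in this figure are expressed on a molar scale rather than in milligrams, which makes compounds of different molecular weight comparable. A minimal sketch of the conversion and binning, assuming the axis labels are lower bin edges and using an illustrative metformin dose expressed as free base; neither assumption comes from the slide:

```python
import math
from typing import Optional

# Figure axis: bins labelled from -8.4 to -2.32 in steps of 0.32.
# Treating the labels as lower bin edges is an assumption.
BIN_START = -8.4
BIN_WIDTH = 0.32
N_BINS = 20


def log10_mole_dose(dose_mg: float, mol_weight: float) -> float:
    """log10 of a dose in moles; mol_weight is in g/mol."""
    moles = (dose_mg / 1000.0) / mol_weight
    return math.log10(moles)


def dose_bin(log_dose: float) -> Optional[int]:
    """Index of the histogram bin containing log_dose, or None if outside."""
    idx = math.floor((log_dose - BIN_START) / BIN_WIDTH)
    return idx if 0 <= idx < N_BINS else None


# Illustrative: 500 mg of metformin as free base (C4H11N5, ~129.16 g/mol).
x = log10_mole_dose(500, 129.16)
print(round(x, 2), dose_bin(x))  # -2.41 18
```

Working in log10 moles puts a microgram-scale hormone and a gram-scale biguanide on the same axis, which is why the distribution spans about six orders of magnitude.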
Affinity of Drugs for Their Targets
• Retrieved Ki, Kd, IC50, EC50, pA2, … endpoints for drugs against their ‘efficacy targets’
[Figure: histogram of drug affinities; x-axis -log10 affinity (2 to 12, i.e. 10 mM to 1 pM); y-axis frequency (0 to 400)]
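The histogram pools several endpoint types on one -log10 molar scale. A minimal sketch of that conversion, assuming affinities are reported in M, mM, uM, nM or pM; pooling Ki, Kd, IC50 and EC50 this way is a simplification, since these endpoints are not strictly equivalent:

```python
import math

# Molar conversion factors for common affinity units.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def p_affinity(value: float, unit: str) -> float:
    """Convert a Ki, Kd, IC50 or EC50 value to -log10(molar).

    pA2 values are already on a -log10 scale and need no conversion.
    """
    if value <= 0:
        raise ValueError("affinity must be positive")
    return -math.log10(value * UNIT_TO_MOLAR[unit])


for value, unit in [(10, "mM"), (1, "uM"), (1, "nM"), (4.5, "nM")]:
    print(f"{value} {unit} -> {p_affinity(value, unit):.2f}")
```

Each tenfold gain in potency adds one unit, so 10 mM maps to 2 and 1 pM to 12, matching the ends of the figure's axis.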
Function for Drug Efficacy/Affinity
• Empirical function that estimates the probability of in vivo activity for a compound with acceptable PK characteristics as a function of target affinity
[Figure: P(efficacy), 0.0 to 1.0, plotted against -log10 affinity, 0 to 12 (mM, µM, nM and pM marked)]
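The slide does not give the form or parameters of the empirical function. Purely to illustrate the shape of such a curve, a logistic function of -log10 affinity can be sketched; `MIDPOINT` and `SLOPE` below are hypothetical placeholders, not values from ChEMBL:

```python
import math

# Hypothetical parameters: the slides do not give the fitted form or values.
MIDPOINT = 6.0  # -log10 affinity at which P(efficacy) = 0.5 (placeholder)
SLOPE = 1.5     # steepness per log unit (placeholder)


def p_efficacy(p_affinity: float, midpoint: float = MIDPOINT, slope: float = SLOPE) -> float:
    """Logistic estimate of P(in vivo efficacy) given -log10 affinity,
    for a compound assumed to have acceptable PK."""
    return 1.0 / (1.0 + math.exp(-slope * (p_affinity - midpoint)))


for pa in (3, 6, 9):  # 1 mM, 1 uM, 1 nM
    print(pa, round(p_efficacy(pa), 3))
```

With real outcome data, the two parameters could be estimated by logistic regression instead of being set by hand.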
ChEMBL: Clinical Candidates
• Database of clinical development candidates
• Contains ~10,000 2-D structures
• Estimated size ~35,000-45,000 compounds
• Work in progress
• Deeper coverage of key gene families
• e.g. protein kinases, 184 distinct clinical candidates
[Figure: kinase clinical candidates by highest phase (Launched, III, II, I) and clinical candidates by target (VEGFR, PDGFR, p38α, c-Kit, CDK, ErbB, Aurora)]
Industry Productivity
[Figure: file registration number (0 to 800,000) against USAN date (1960 to 2010)]
[Figure: USANs and drugs per file registration number range, in ranges of 100,000 (1-100,000 through 700,001-800,000)]
64 USANs/100,000 compounds
1.9 USANs/100,000 compounds
16 Drugs/100,000 compounds
0.4 Drugs/100,000 compounds
USAN (United States Adopted Name) assignment typically occurs at entry to Phase 3
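The productivity rates above are normalised per 100,000 file registration numbers. A minimal sketch of that normalisation, assuming every registration number in a range corresponds to one registered compound and that each range holds 100,000 numbers; the function names are hypothetical. The fold changes printed at the end assume that the higher rates belong to the earliest range and the lower rates to the latest:

```python
from collections import Counter

BIN_SIZE = 100_000  # width of each file registration number range


def bin_range(reg_number: int) -> tuple:
    """Map a registration number to its range, e.g. 150000 -> (100001, 200000)."""
    b = (reg_number - 1) // BIN_SIZE
    return (b * BIN_SIZE + 1, (b + 1) * BIN_SIZE)


def events_per_100k(event_reg_numbers):
    """Events (e.g. USANs) per 100,000 compounds in each range, assuming
    every registration number in a range belongs to a registered compound."""
    counts = Counter(bin_range(n) for n in event_reg_numbers)
    return {r: c * 100_000 / BIN_SIZE for r, c in sorted(counts.items())}


# Fold declines implied by the rates quoted on the slide.
print(f"USANs: {64 / 1.9:.1f}-fold fall")  # 33.7-fold fall
print(f"Drugs: {16 / 0.4:.1f}-fold fall")  # 40.0-fold fall
```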
ChEMBL: SAR data
• Bioactive compounds
• Link through to validated synthetic routes and assay protocols
• Bidirectionally linking compounds to/from targets
• Built from 12 primary journals: J.Med.Chem., Bioorg.Med.Chem., PNAS, JBC, Bioorg.Med.Chem.Lett., Eur.J.Med.Chem., DMD, Xenobiotica, Nature, Science, AACR, J.Nat.Prod.
• StARlite 1 – June 2001
• StARlite 31 – August 2008
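Linking compounds to targets bidirectionally amounts to indexing each measured activity from both ends. A minimal in-memory sketch; the `Activity` and `BioactivityIndex` names and the example IDs and values are hypothetical and do not reflect the actual StARlite/ChEMBL schema:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    compound_id: str
    target_id: str
    endpoint: str  # e.g. "Ki", "IC50"
    value_nm: float


class BioactivityIndex:
    """Store each activity once, indexed from both the compound side and the
    target side, so either can be used as the query."""

    def __init__(self):
        self._by_compound = defaultdict(list)
        self._by_target = defaultdict(list)

    def add(self, act: Activity) -> None:
        self._by_compound[act.compound_id].append(act)
        self._by_target[act.target_id].append(act)

    def targets_of(self, compound_id: str) -> set:
        return {a.target_id for a in self._by_compound.get(compound_id, [])}

    def compounds_for(self, target_id: str) -> set:
        return {a.compound_id for a in self._by_target.get(target_id, [])}


index = BioactivityIndex()
index.add(Activity("CPD-1", "thrombin", "Ki", 4.5))  # hypothetical records
index.add(Activity("CPD-1", "trypsin", "Ki", 1200.0))
index.add(Activity("CPD-2", "thrombin", "IC50", 30.0))
print(sorted(index.targets_of("CPD-1")))         # ['thrombin', 'trypsin']
print(sorted(index.compounds_for("thrombin")))   # ['CPD-1', 'CPD-2']
```

In a relational database the same idea is usually an activities table with indexes on both the compound and target foreign keys.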
[Figure: StARLITe bioactivity record linking a compound (2-D structure) to its target, thrombin (Homo sapiens), with Ki = 4.5 nM; the target sequence follows]
>Thrombin (Homo sapiens) MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLERECVEETCSYEEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGTNYRGHVNITRSGIECQLWRSRYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYTTDPTVRRQECSIPVCGQDQVTVAMTPRSEGSSVNLSPPLEQCVPDRGQQYQGRLAVTTHGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRNPDGDEEGVWCYVAGKPGDFGYCDLNYCEEAVEEETGDGLDEDSDRAIEGRTATSEYQTFFNPRTFGSGEADCGLRPLFEKKSLEDKTERELLESYIDGRIVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISDRWVLTAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYNWRENLDRDIALMKLKKPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTANVGKGQPSVLQVVNLPIVERPVCKDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGGPFVMKSPFNNRWYQMGIVSWGEGCDRDGKYGFYTHVFRLKKWIQKVIDQFGE
[Figure: 2-D structures of an optimisation series: prototype, 1st, 2nd, 3rd and 4th generation]
Drug Optimisation
[Figure: 2-D structures tracing the nitroimidazole and azole drugs back to a natural-product lead]
• Azomycin (1956): Streptomyces natural product, trichomonacidal, ‘toxic’
• Nitroimidazoles: Metronidazole (1962), Tinidazole (1970)
• Imidazoles: Clotrimazole (1970), Miconazole (1970), Econazole (1972), Ketoconazole (1978), Sulconazole (1980), Bifonazole (1981)
• Triazoles: Terconazole (1980), Itraconazole (1984), Fluconazole (1988), Voriconazole (2002), Fosfluconazole (2004), Posaconazole (2005)
After W. Sneader
ChEMBL SAR Contents
• Abstracted from 26,299 papers from 12 journals
• Monthly update cycle: optimised curation pipeline
• Autocuration tools – clean up and index other large SAR datasets
• Updates and ongoing curation are applied to all data, not simply to data from new articles
• 521,237 compound records
• 440,055 distinct compound structures
• 5,439 targets
• 3,512 protein molecular targets
• ~2,200 orthologous targets (1,644 human)
• 1,936,969 experimental bioactivities (~1.9 million)
Counts refer to StARlite release 31
Interface and Searching
Rule-based Optimisation – Bioisosteres
• Identify data-driven ‘rational’ lead-optimisation strategies
• Useful in automated design
• e.g. replacement of carboxylic acid
• Reflect synthetic ease and expectation for functional effect
• Workflow: search StARLITe for the functional group (carboxylic acid); search for all ‘contexts’ where the acid has been replaced; retrieve the assay values (IC50)
[Figure: effect on affinity (change in -log10 IC50, -6 to 6) against frequency (%) for carboxylic acid replacements: tetrazole, sulphonamide, ester, sulphonic acid]
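The workflow above compares each acid with its replacement in the same context, in the style of a matched-pair analysis. A minimal sketch, assuming the matched pairs have already been extracted with pIC50 values for both members; the data below are hypothetical and the bins mirror the figure's -6 to 6 axis:

```python
import math
from collections import defaultdict
from statistics import mean


def affinity_shifts(pairs):
    """pairs: (replacement, acid_pIC50, replacement_pIC50) for each matched
    context. Returns replacement -> list of changes in -log10 IC50."""
    shifts = defaultdict(list)
    for group, acid_p, repl_p in pairs:
        shifts[group].append(repl_p - acid_p)
    return dict(shifts)


def histogram_percent(deltas, lo=-6, hi=6, width=1):
    """Percentage of deltas in each integer-width bin of [lo, hi);
    values outside the range are dropped."""
    counts = [0] * ((hi - lo) // width)
    for d in deltas:
        i = math.floor((d - lo) / width)
        if 0 <= i < len(counts):
            counts[i] += 1
    total = sum(counts)
    return [100.0 * c / total if total else 0.0 for c in counts]


# Hypothetical matched pairs, not ChEMBL data.
pairs = [
    ("tetrazole", 7.0, 7.5),
    ("tetrazole", 6.0, 5.8),
    ("ester", 8.0, 6.0),
]
for group, d in affinity_shifts(pairs).items():
    print(group, len(d), round(mean(d), 2))
```

A positive shift means the replacement was more potent than the acid it replaced in that context.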
Typical Compound Collection - Novartis
Ertl, Koch and Roggo, Novartis
[Figure: 2-D structures of the most common ring systems]
Ring systems shown: benzene, pyridine, piperidine, piperazine, cyclohexane, pyrimidine, indole, imidazole, naphthalene, morpholine, thiophene, pyrazole, pyrrolidine, thiazole, furan, quinoline, cyclopropane, benzimidazole, imidazoline, pyrrole, cyclopentane, pyran, quinazoline, benzthiazole, benzodioxole, isoxazole, purine, tetrahydrofuran, tetrazole, triazine, isoquinoline, tetrahydroisoquinoline, benzofuran, triazole, adamantane
Screening File Comparison - Novartis
[Figure: StARLITe rank against Novartis rank (0 to 35) for each ring system, with enriched and depleted fragments marked; labelled: benzene, pyridine, piperidine, pyrrolidine, tetrahydrofuran, purine, tetrazole, pyrimidine, morpholine, pyrazole]
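The screening-file comparison ranks each ring system within each collection and compares the two ranks. A minimal sketch with hypothetical counts; ranking by descending count and breaking ties by name are assumptions:

```python
def ranks(counts):
    """Rank ring systems by descending count (1 = most frequent); ties by name."""
    ordered = sorted(counts, key=lambda name: (-counts[name], name))
    return {name: i + 1 for i, name in enumerate(ordered)}


def rank_shift(collection_a, collection_b):
    """Rank in B minus rank in A for rings present in both.
    Positive: ring is relatively more common in A (enriched); negative: depleted."""
    ra, rb = ranks(collection_a), ranks(collection_b)
    return {name: rb[name] - ra[name] for name in ra.keys() & rb.keys()}


# Hypothetical ring counts for two collections.
starlite = {"benzene": 900, "tetrazole": 120, "morpholine": 80, "pyridine": 400}
screening = {"benzene": 5000, "pyridine": 2500, "morpholine": 1500, "tetrazole": 90}
for ring, shift in sorted(rank_shift(starlite, screening).items(), key=lambda kv: -kv[1]):
    print(f"{ring:12s} {shift:+d}")
```

Comparing ranks rather than raw counts sidesteps the large difference in size between a literature-derived set and a corporate screening file.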
Genome-Scale Druggability Assessment
• Now possible to rapidly map chemical intervention points onto genomic data
• In ‘real time’ as the gene model is developed
• Develop therapeutic hypotheses for expert review/analysis/validation
• Reuse existing drugs/clinical candidates in new contexts
• Anticipate required optimisation (comparative modelling, etc.)
Nature, 460, pp. 352-358 (2009); Nat. Rev. Drug Disc., 7, pp. 900-907 (2008)
Indication Discovery
Marks et al., Lancet, 367, pp. 668-678 (2006)
• Map chemical biology/pharmacology data onto microarray datasets
• Rapid path to clinic and patient benefit
• Develop therapeutic hypotheses for expert review/analysis/validation
• Reuse existing drugs/clinical candidates in new contexts
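Mapping pharmacology onto expression data can be reduced to filtering genes by expression change and keeping those that already have known ligands. A minimal sketch; the gene and compound names, the log2 fold-change input and the threshold of 1.0 are all hypothetical choices, and the output is a list of hypotheses for expert review, not predictions:

```python
def candidate_hypotheses(expression, target_to_compounds, min_abs_log2fc=1.0):
    """expression: gene -> log2 fold change from a microarray experiment.
    target_to_compounds: gene -> set of known drugs/clinical candidates.
    Returns (gene, log2fc, compounds) for genes that pass the threshold and
    already have ligands, largest absolute change first."""
    hits = [
        (gene, lfc, sorted(target_to_compounds[gene]))
        for gene, lfc in expression.items()
        if abs(lfc) >= min_abs_log2fc and gene in target_to_compounds
    ]
    return sorted(hits, key=lambda hit: -abs(hit[1]))


# Hypothetical inputs.
expression = {"GENE_A": 2.5, "GENE_B": -1.2, "GENE_C": 0.3, "GENE_D": 3.0}
known_ligands = {
    "GENE_A": {"drug_x"},
    "GENE_B": {"candidate_y", "drug_z"},
    "GENE_C": {"drug_w"},
}
for gene, lfc, compounds in candidate_hypotheses(expression, known_ligands):
    print(gene, lfc, compounds)
```

Here GENE_D is dropped despite the largest change because no ligand is known, and GENE_C is dropped because its change falls below the threshold.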
The ChEMBL-og - www.chemblog.org