1
PharmID: A New Algorithm for Pharmacophore Identification
Stan Young, Jun Feng, and Ashish Sanil
NISS
MPDM
3 June 2005
2
X-ray Structure
Protein surface
Bound drug
Zinc
H2O’s hiding around
Note zinc ion
3
Outline
Background
Computational Procedure and Algorithm
Examples
Conclusions
4
Conformation Generation
OMEGA® generates thousands of conformers in a few seconds.
It is able to reproduce bioactive conformations.
Boström, Greenwood, and Gottfries. J. Mol. Graph. Mod., 2003, 21, 449-462
5
Many feature combinations
Exhaustive enumeration of pharmacophore hypotheses
No. of Features    Possible combinations
4                  5
5                  16
6                  42
7                  99
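The counts in the table are consistent with counting every subset of at least three features out of n. That reading is an inference from the numbers, not something the slide states. A minimal Python check under that assumption (the function name is illustrative):

```python
from math import comb

def hypothesis_count(n_features, min_size=3):
    """Number of feature subsets with at least `min_size` members."""
    return sum(comb(n_features, k) for k in range(min_size, n_features + 1))

# Assumption: a hypothesis needs at least three features.
counts = {n: hypothesis_count(n) for n in range(4, 8)}
print(counts)
```

The growth is roughly exponential in the number of features, which is why exhaustive enumeration becomes expensive.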
6
Pharmacophore Identification
Active molecules are known, receptor unknown.
Assume that all molecules bind in a common manner to the biological target.
Difficulties:
Conformational flexibility
Many different combinations of pharmacophoric groups
Two very large search spaces: conformations and feature combinations.
7
Work Flow for Pharmacophore Identification
Single conformer SDF or SMILES
External Conformation Generation Program
PharmID
Different Pharmacophore Hypotheses
8
Our Strategy
To superimpose the molecules in 3D, we first align the bit string for each conformer in 1D.
Ideally, the important features and best conformers will be picked out at the same time.
Our search is many to one, not many to many!
9
Computation Procedure
1. Pharmacophore bit string generation
2. Bit string alignment/assessment
3. Hypothesis generation
4. Refinement
10
Feature Definition
Predefined pharmacophore features:
HD : Hydrogen Bond Donor
HA : Hydrogen Bond Acceptor
POS: Positive Charge Center
NEG: Negative Charge Center
ARC: Aromatic Center
HYP: Hydrophobic Center
User defined groups:
Any functional groups can be defined using Daylight® SMARTS strings.
11
Bit String Generation
[Figure: a molecule and its conformers (Conf. 1, Conf. 2, Conf. 3, ...), each encoded as a bit string over features F1 ... Fm]
1 0 1 0 1 ... 1 0 0
0 0 1 0 0 ... 1 0 0
1 0 0 0 0 ... 1 0 0
1 0 0 0 1 ... 1 0 0
3D Atom (group) – Distance – Atom (group) features.
12
Definition of Distance Bins
Homogeneous non-overlapped: 0-1, 1-2, 2-3, 4-5, 5-6, 6-7, 7-8, 8-9, 9-10, 11-12, 12 Å and above.
Heterogeneous non-overlapped: 1-2, 2-5, 5-8, 8-12, 13 Å and above.
Overlapped: 1-3, 2-4, 3-5, 4-6, 5-7, 6-8, 7-9, 8-10, 9-11, 10-12 Å.
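One way to picture how feature pairs and distance bins become a bit string is the sketch below. It assumes one bit per (feature-type pair, distance bin), uses the overlapped bins listed above, and treats each bin as a half-open interval. The bit ordering, the half-open convention, and the names `conformer_bits` and `BIT_INDEX` are illustrative assumptions, not PharmID's actual layout.

```python
from itertools import combinations, combinations_with_replacement
from math import dist

FEATURE_TYPES = ["HD", "HA", "POS", "NEG", "ARC", "HYP"]
# Overlapped bins from the slide: 1-3, 2-4, ..., 10-12 angstroms.
BINS = [(lo, lo + 2) for lo in range(1, 11)]

# Assumption: one bit per (type pair, bin); this ordering is arbitrary.
TYPE_PAIRS = list(combinations_with_replacement(FEATURE_TYPES, 2))
BIT_INDEX = {}
for pair in TYPE_PAIRS:
    for b in range(len(BINS)):
        BIT_INDEX[(pair, b)] = len(BIT_INDEX)

def conformer_bits(features):
    """features: list of (type, (x, y, z)) for one conformer."""
    bits = [0] * len(BIT_INDEX)
    for (t1, p1), (t2, p2) in combinations(features, 2):
        # Order the two types canonically so (HA, HD) and (HD, HA) share bits.
        pair = tuple(sorted((t1, t2), key=FEATURE_TYPES.index))
        d = dist(p1, p2)
        for b, (lo, hi) in enumerate(BINS):
            if lo <= d < hi:  # overlapped bins can set two bits per pair
                bits[BIT_INDEX[(pair, b)]] = 1
    return bits

# Hypothetical conformer with three features.
example = [("HD", (0.0, 0.0, 0.0)), ("HA", (2.5, 0.0, 0.0)), ("ARC", (0.0, 4.5, 0.0))]
bits = conformer_bits(example)
print(len(bits), sum(bits))
```

With overlapped bins a single distance usually sets two bits, which makes the encoding more tolerant of small conformational differences at the cost of a denser string.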
13
Data Structure for Input
One row per conformer (MiCj = molecule i, conformer j):
M1C1   0 0 1 0 0 ... 0 0 0 1 0 0 0 0 1 0
M1C2   0 0 1 0 1 ... 1 1 0 1 0 1 1 0 1 0
M1C3   0 0 0 0 0 ... 0 0 0 0 0 1 0 0 1 0
       0 1 0 1 0 ... 1 0 0 0 1 1 0 0 1 0
....   ......
M2C1   1 0 1 0 0 ... 0 0 0 0 0 0 1 0 0 0
M2C2   0 0 0 0 0 ... 1 0 0 0 0 0 1 0 1 0
M2C3   0 0 1 0 0 ... 1 0 0 1 0 0 0 0 1 0
       0 0 0 0 0 ... 0 1 0 0 0 0 1 0 1 0
....   ......
14
The Trick
If you know the correct conformation for each molecule, then it is relatively easy to identify the key features.
If you know the correct features and distances, then it is easy to identify the correct conformation.
Guess one, predict the other, iterate.
15
Given the features, easy to find the conformations
Features  0 0 1 0 0 ... 1 0 0 1 0 0 0 0 1 0

M1C1      0 0 1 0 0 ... 0 0 0 1 0 0 0 0 1 0
M1C2      0 0 1 0 1 ... 1 1 0 1 0 1 1 0 1 0   <- contains all feature bits
M1C3      0 0 0 0 0 ... 0 0 0 0 0 1 0 0 1 0
          0 1 0 1 0 ... 1 0 0 0 1 1 0 0 1 0
....      ......
M2C1      1 0 1 0 0 ... 0 0 0 0 0 0 1 0 0 0
M2C2      0 0 0 0 0 ... 1 0 0 0 0 0 1 0 1 0
M2C3      0 0 1 0 0 ... 1 0 0 1 0 0 0 0 1 0   <- contains all feature bits
          0 0 0 0 0 ... 0 1 0 0 0 0 1 0 1 0
....      ......
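The step on this slide can be sketched as follows: given a feature mask, score each conformer of a molecule by how many of the mask's bits it contains and keep the best one. The overlap score, the tie-breaking (first best wins), and the toy 8-bit data are assumptions chosen for illustration; the slide's own bit strings are elided and are not reproduced here.

```python
def best_conformer(conformers, feature_mask):
    """Index of the conformer sharing the most set bits with the mask."""
    overlaps = [sum(b & m for b, m in zip(bits, feature_mask))
                for bits in conformers]
    return max(range(len(overlaps)), key=overlaps.__getitem__)

# Hypothetical 8-bit data: the mask marks the "important" features.
mask = [0, 0, 1, 0, 0, 1, 0, 1]
molecule_1 = [
    [0, 0, 1, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 1, 1, 0, 1],  # contains every mask bit
    [0, 0, 0, 0, 0, 0, 0, 1],
]
molecule_2 = [
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 1],  # contains every mask bit
]
picks = [best_conformer(m, mask) for m in (molecule_1, molecule_2)]
print(picks)
```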
16
Given the conformations, easy to find the features.
Selected  Conformer  Bit string
0         M1C1       0 0 1 0 0 ... 0 0 0 1 0 0 0 0 1 0
1         M1C2       0 0 1 0 1 ... 1 1 0 1 0 1 1 0 1 0
0         M1C3       0 0 0 0 0 ... 0 0 0 0 0 1 0 0 1 0
0                    0 1 0 1 0 ... 1 0 0 0 1 1 0 0 1 0
..        ....       ......
0         M2C1       1 0 1 0 0 ... 0 0 0 0 0 0 1 0 0 0
0         M2C2       0 0 0 0 0 ... 1 0 0 0 0 0 1 0 1 0
1         M2C3       0 0 1 0 0 ... 1 0 0 1 0 0 0 0 1 0
0                    0 0 0 0 0 ... 0 1 0 0 0 0 1 0 1 0
..        ....       ......
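The reverse direction can be sketched just as briefly: once one conformer per molecule is fixed, the shared features are the bits set in every selected string. Using a strict AND is an assumption made for clarity; a frequency threshold would be the softer variant. The data are hypothetical.

```python
def common_features(selected):
    """Bits set in every selected conformer (column-wise AND)."""
    return [int(all(column)) for column in zip(*selected)]

def feature_frequency(selected):
    """Fraction of selected conformers that set each bit."""
    return [sum(column) / len(selected) for column in zip(*selected)]

# One chosen conformer per molecule (toy 8-bit strings).
selected = [
    [0, 0, 1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 1],
]
shared = common_features(selected)
freq = feature_frequency(selected)
print(shared, freq)
```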
17
Bioinformatics Motif Finding using Gibbs Sampling.
1. Remove one sequence.
2. Randomly select one position for each sequence.
3. Calculate probabilities for all positions for the motif “window”.
4. Using the “window”, compute probabilities for the removed sequence motif position.
5. Repeat the above steps for all sequences until converged.
This will be easier to see with pictures.
18
Objective Function
$$F = \sum_{i=1}^{W} \sum_{j=1}^{J} c_{i,j} \log \frac{q_{i,j}}{p_j}$$
• W : bit string length
• c_{i,j} : count of residue j in position i
• q_{i,j} : residue frequencies, position i, residue j
• p_j : residue background frequencies
• J : residue types, 20 for protein, 4 for DNA, RNA
Window: W x 20
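As a worked illustration, the objective can be evaluated directly from a count matrix. The sketch below assumes q_{i,j} is the plain observed frequency c_{i,j} / n at each position (no pseudocounts) and treats 0 · log 0 as 0; both are simplifying assumptions, and `motif_score` is an illustrative name.

```python
from math import log

def motif_score(counts, background):
    """F = sum_i sum_j c[i][j] * log(q[i][j] / p[j]).

    counts[i][j] is the count of residue j at window position i;
    background[j] is the background frequency of residue j.
    """
    total = 0.0
    for row in counts:
        n = sum(row)
        for c, p in zip(row, background):
            if c > 0:  # a zero count contributes nothing
                total += c * log((c / n) / p)
    return total

# Binary residues (0, 1), window of width 3, four aligned strings.
counts = [[0, 4], [2, 2], [4, 0]]
score = motif_score(counts, background=[0.5, 0.5])
print(score)
```

Positions where all strings agree contribute the most, and a position that matches the background contributes zero, so F rewards windows that are both conserved and unusual.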
19
Alignment Algorithm
Mostly used in sequence alignment to find the common motif.
TCAGAACCAGTTATAAATTTATCATTTCCTTCTCCACTCCTGCCTCAGGATCCAGCACACATTATCACAAACTTAGTGTCCATCCATCACTGCTGACCCT…………..
Fast and sensitive, less likely to fall into a local minimum.
Lawrence, et al. (1993) Science, 262, 208-214
W x 20
20
PharmID Algorithm using Gibbs.
1. Remove one compound.
2. Start with a random conformer for the other compounds.
3. Calculate probabilities for feature importance.
4. Compute conformation probabilities for the omitted compound.
5. Repeat steps 1-4 until convergence.
Again, pictures will make this clear.
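The five steps above can be sketched as a small sampler. This is a simplified illustration under stated assumptions, not the PharmID implementation: it uses a single background frequency for set bits, pseudocounts of 0.5, a fixed number of sweeps instead of a convergence test, and a likelihood-ratio weight for each conformer. The function name, parameters, and toy data are all hypothetical.

```python
import math
import random

def gibbs_select(molecules, n_sweeps=50, pseudo=0.5, seed=0):
    """molecules: list of molecules, each a list of equal-length bit lists
    (one per conformer). Returns one conformer index per molecule."""
    rng = random.Random(seed)
    n_bits = len(molecules[0][0])
    all_confs = [c for mol in molecules for c in mol]
    ones = sum(sum(c) for c in all_confs)
    # Background probability that any bit is set (assumed uniform over bits).
    p1 = (ones + pseudo) / (len(all_confs) * n_bits + 2 * pseudo)
    # Step 2: start from a random conformer for every molecule.
    choice = [rng.randrange(len(mol)) for mol in molecules]
    for _ in range(n_sweeps):
        for held_out, mol in enumerate(molecules):  # step 1: leave one out
            others = [molecules[m][choice[m]]
                      for m in range(len(molecules)) if m != held_out]
            # Step 3: per-bit frequency among the currently chosen conformers.
            q1 = [(sum(c[i] for c in others) + pseudo) / (len(others) + 2 * pseudo)
                  for i in range(n_bits)]
            # Step 4: log likelihood ratio (model vs. background) per conformer.
            logs = []
            for conf in mol:
                logs.append(sum(
                    math.log(q1[i] / p1) if bit
                    else math.log((1 - q1[i]) / (1 - p1))
                    for i, bit in enumerate(conf)))
            top = max(logs)
            weights = [math.exp(v - top) for v in logs]  # avoids overflow
            choice[held_out] = rng.choices(range(len(mol)), weights=weights)[0]
    return choice

molecules = [
    [[1, 0, 0, 1, 0, 0], [0, 1, 1, 0, 0, 1], [0, 0, 1, 0, 1, 0]],
    [[0, 1, 1, 0, 0, 1], [1, 0, 0, 0, 1, 0]],
    [[0, 1, 1, 0, 0, 1]],  # rigid molecule: a single conformer
    [[0, 0, 0, 1, 1, 0], [1, 1, 0, 0, 0, 0], [0, 1, 1, 0, 0, 1]],
]
choice = gibbs_select(molecules)
print(choice)
```

Because conformers are sampled rather than chosen greedily, the search can leave a poor early selection, which is the property the slides credit for avoiding local minima.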
21
Gibbs Sampling: Fingerprints Movement
        Conf_1     Conf_2     Conf_3     possible
Mol_1   010000000  100010001  010100100  0, 9, 18
Mol_2   000100000  010000100  100010001  0, 9, 18
Mol_3   101010001  000100100             0, 9
Mol_4   001000100  100110001  010101001  0, 9, 18

1_2     010000000  100010001  010100100
2_3     000100000  010000100  100010001
3_1     101010001  000100100
4_2     001000100  100110001  010101001
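One reading of the "possible" column, offered as an interpretation rather than something the slide spells out: each molecule's 9-bit conformer fingerprints are laid end to end like a sequence, and the motif window may only start at a conformer boundary. A short sketch using the fingerprints from the table (function names are illustrative):

```python
BITS = 9  # fingerprint length per conformer in the table

mol_1 = ["010000000", "100010001", "010100100"]
mol_3 = ["101010001", "000100100"]  # only two conformers

def possible_starts(conformers, width=BITS):
    """Window start offsets: one per conformer, at multiples of the width."""
    return [k * width for k in range(len(conformers))]

def window(conformers, start, width=BITS):
    """Fingerprint seen by a window placed at `start` in the joined string."""
    return "".join(conformers)[start:start + width]

print(possible_starts(mol_1), possible_starts(mol_3))
print(window(mol_1, 9))
```

Under this reading, moving the window from offset 0 to offset 9 is the same as switching from conformer 1 to conformer 2.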
22
Bit String Alignment
Only 2 residue types (0, 1)
Rigid molecules that have only 1 or a few conformers can speed up the alignment and help to determine the best set of features.
23
Hypothesis Generation
Why?
Features may not be part of the same pharmacophore.
How?
Clique Detection. (Bron-Kerbosch Algorithm)
A clique is a set of points that are ALL connected to one another.
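A minimal sketch of the basic Bron-Kerbosch recursion (without pivoting) is shown below. Treating pharmacophore features as nodes and retained two-point pharmacophores as edges is an assumption about how the graph is built; the feature labels A to E are hypothetical.

```python
def bron_kerbosch(r, p, x, adj, out):
    """Collect every maximal clique extending r, using candidates p
    and already-processed vertices x."""
    if not p and not x:
        out.append(sorted(r))
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}

def maximal_cliques(edges):
    """edges: iterable of (a, b) pairs; returns sorted maximal cliques."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    out = []
    bron_kerbosch(set(), set(adj), set(), adj, out)
    return sorted(out)

# Nodes are features; an edge is a two-point pharmacophore that was kept.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
cliques = maximal_cliques(edges)
print(cliques)
```

Here A, B and C form a fully connected triangle and so qualify as a three-point hypothesis, while D and E only take part in two-point cliques.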
24
Hypothesis Generation in Selected Conformers : Clique Detection
[Figure: example molecule (N, O, OH groups) with its pharmacophore features and the two-point pharmacophores identified by Gibbs sampling]
A pharmacophore hypothesis should be an all-connected graph.
Discarded two-point pharmacophores
25
Hypothesis Generation: Output
Pharmacophore 1
Members: 1 2 3 5 … (Mol. ID)
Features: Hydrogen Bond Donor, Hydrogen Bond Acceptor, …
Pharmacophore 2
Members: 4 6 8 …
Features: …
……
26
Refinement
For all molecules
    For all conformers
        For all hypotheses generated
            Test each qualified conformer against each hypothesis
        End For
        If new hypothesis found
            Insert the new hypothesis into the list
        End If
    End For
End For
27
Benchmarking: Test Datasets
1. Bit string alignment
20 20-bit strings
2. Single binding mode
Angiotensin-Converting Enzyme (ACE) inhibitors
3. Multiple binding modes/mechanisms
Dopamine receptor inhibitors (D2/D4)
28
Example 1: A Toy Dataset (Gibbs Sampling Only)
20 x 20 bit strings,
mimic 20 molecules,
each with 20 conformers.
Each bit string is 20 bits long.
Computation time:
<1 sec.
Result:
1_14 10001000010000000100
2_14 10001000010000000100
3_15 10001000010000000100
4_12 10001000010000000000
5_15 10001000010010000100
6_7 10001000010000000000
7_8 10001000010000000000
8_19 10001000010000000100 …
29
Example 2: ACE Inhibitors
78 active compounds.
OMEGA® from OpenEye® is used to generate multiple conformers.
Two RMSD cutoffs used:
2.0 Å : 4,613 conformers generated.
1.0 Å : 46,268 conformers generated.
30
ACE inhibitors Results
Using 4,613 conformers, 55/78 molecules contain the expected pharmacophore.
Using 46,268 conformers, 65/78 molecules contain the expected pharmacophore.
31
Example 2: ACE inhibitors: Best Identified Pharmacophore
[Figure: pharmacophore with three O features and the Zn binding site; inter-feature distance ranges 2.84 ~ 4.50 Å, 4.51 ~ 5.70 Å, 4.99 ~ 6.77]
32
Example 2: ACE inhibitors: Other possible pharmacophore
33
Example 3: Testing on Multiple Binding Modes (D2, D4 ligands)
34
Example 3: Dopamine antagonists
Two pharmacophores were extracted from one data set!
35
Conclusion
Traditional Methods:
Exhaustive enumeration of pharmacophores, limited coverage of conformational space.
“Many to many” limits search.
PharmID:
Selective enumeration of pharmacophores, better coverage of conformational space.
Each search is “many to one”.
36
Acknowledgements
Coworkers: Stan Young, Jun Feng, Ashish Sanil
OMEGA is a product from OpenEye Scientific Software Inc.
Support from Hereditary Disease Foundation.
Become a NISS affiliate!