1

PharmID: A New Algorithm for Pharmacophore Identification

Stan Young

Jun Feng and Ashish Sanil

NISS

MPDM

3 June 2005

2

X-ray Structure

Protein surface

Bound drug

Zinc

H2O's hiding around

Note zinc ion

3

Outline

Background

Computational Procedure and Algorithm

Examples

Conclusions

4

Conformation Generation

OMEGA® generates thousands of conformers in a few seconds.

It is able to reproduce bioactive conformations.

Boström, Greenwood, and Gottfries. J. Mol. Graph. Mod., 2003, 21, 449-462

5

Many feature combinations

Exhaustive enumeration of pharmacophore hypotheses

No. of Features    Possible combinations
4                  5
5                  16
6                  42
7                  99
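The counts in this table are consistent with counting every subset of at least three features. That reading is an inference from the numbers, not something the slide states, so the sketch below treats the minimum size of three as an assumption. `hypothesis_count` is a hypothetical helper name.

```python
from math import comb


def hypothesis_count(n_features: int, min_size: int = 3) -> int:
    """Count feature subsets with at least `min_size` members.

    Assumption: a hypothesis needs at least three features; this is
    inferred from the table, not stated in the slides.
    """
    return sum(comb(n_features, k) for k in range(min_size, n_features + 1))


for n in (4, 5, 6, 7):
    print(n, hypothesis_count(n))
```

Under this reading the count roughly doubles with each added feature, which is why exhaustive enumeration becomes expensive quickly.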

6

Pharmacophore Identification

Active molecules are known, receptor unknown.

Assume that all molecules bind in a common manner to the biological target.

Difficulties:

Conformational flexibility

Many different combinations of pharmacophoric groups

Two very large search spaces: conformations and feature combinations.

7

Work Flow for Pharmacophore Identification

Single conformer SDF or SMILES

External Conformation Generation Program

PharmID

Different Pharmacophore Hypotheses

8

Our Strategy

To superimpose the molecules in 3D, we first align the bit string for each conformer in 1D.

Ideally, the important features and best conformers will be picked out at the same time.

Our search is many to one, not many to many!

9

Computation Procedure

1. Pharmacophore bit string generation

2. Bit string alignment/assessment

3. Hypothesis generation

4. Refinement

10

Feature Definition

Predefined pharmacophore features:

HD : Hydrogen Bond Donor
HA : Hydrogen Bond Acceptor
POS: Positive Charge Center
NEG: Negative Charge Center
ARC: Aromatic Center
HYP: Hydrophobic Center

User defined groups:

Any functional groups can be defined using Daylight® SMARTS strings.

11

Bit String Generation

F1 … Fm

           1 0 1 0 1 ... 1 0 0

Conf. 1    0 0 1 0 0 ... 1 0 0
Conf. 2    1 0 0 0 0 ... 1 0 0
Conf. 3    1 0 0 0 1 ... 1 0 0
...

[Figure: example molecule containing NH and N groups.]

3D Atom (group) – Distance – Atom (group) features.
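A minimal sketch of how an Atom (group) – Distance – Atom (group) bit string could be built for one conformer. The bit layout (one bit per feature-type pair and distance bin), the bin edges in `UPPER_EDGES`, and the helper names are all assumptions made for illustration; the slides do not specify them.

```python
from bisect import bisect_right
from itertools import combinations, combinations_with_replacement
from math import dist

FEATURE_TYPES = ["HD", "HA", "POS", "NEG", "ARC", "HYP"]
UPPER_EDGES = [2.0, 5.0, 8.0, 12.0]  # hypothetical bin edges in Å; last bin is open-ended
N_BINS = len(UPPER_EDGES) + 1

# One slot per unordered pair of feature types, e.g. ("HD", "HA").
PAIR_TYPES = list(combinations_with_replacement(FEATURE_TYPES, 2))
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIR_TYPES)}


def bin_index(distance: float) -> int:
    """Index of the non-overlapping distance bin that holds `distance`."""
    return bisect_right(UPPER_EDGES, distance)


def conformer_bits(features):
    """features: list of (feature_type, (x, y, z)) for a single conformer."""
    bits = [0] * (len(PAIR_TYPES) * N_BINS)
    for (t1, p1), (t2, p2) in combinations(features, 2):
        pair = (t1, t2) if (t1, t2) in PAIR_INDEX else (t2, t1)
        bits[PAIR_INDEX[pair] * N_BINS + bin_index(dist(p1, p2))] = 1
    return bits


example = [("HD", (0.0, 0.0, 0.0)), ("HA", (3.0, 0.0, 0.0)), ("ARC", (0.0, 4.0, 0.0))]
bits = conformer_bits(example)
print(len(bits), sum(bits))
```

Each conformer of a molecule yields one such bit string, so a flexible molecule contributes many rows to the alignment input.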

12

Definition of Distance Bins

Homogeneous, non-overlapped: 0-1, 1-2, 2-3, 4-5, 5-6, 6-7, 7-8, 8-9, 9-10, 11-12, 12 Å and above.

Heterogeneous, non-overlapped: 1-2, 2-5, 5-8, 8-12, 13 Å and above.

Overlapped: 1-3, 2-4, 3-5, 4-6, 5-7, 6-8, 7-9, 8-10, 9-11, 10-12 Å.
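The practical difference between the binning schemes is that, with overlapped bins, one distance can set more than one bit, which makes the encoding more tolerant of distances near a bin boundary. The sketch below illustrates this; treating each bin as a half-open interval [low, high) is an assumption, and `bins_for` is a hypothetical helper.

```python
def bins_for(distance, bins):
    """Indices of every [low, high) bin that contains `distance`."""
    return [i for i, (low, high) in enumerate(bins) if low <= distance < high]


# Overlapped scheme from the slide: 1-3, 2-4, ..., 10-12 Å.
OVERLAPPED = [(low, low + 2) for low in range(1, 11)]
# Bounded bins of the heterogeneous scheme: 1-2, 2-5, 5-8, 8-12 Å.
HETEROGENEOUS = [(1, 2), (2, 5), (5, 8), (8, 12)]

print(bins_for(3.5, OVERLAPPED))     # two bins contain 3.5 Å
print(bins_for(3.5, HETEROGENEOUS))  # exactly one bin contains 3.5 Å
```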

13

Data Structure for Input

M1C1    0 0 1 0 0 ... 0 0 0 1 0 0 0 0 1 0
M1C2    0 0 1 0 1 ... 1 1 0 1 0 1 1 0 1 0
M1C3    0 0 0 0 0 ... 0 0 0 0 0 1 0 0 1 0
        0 1 0 1 0 ... 1 0 0 0 1 1 0 0 1 0
......
M2C1    1 0 1 0 0 ... 0 0 0 0 0 0 1 0 0 0
M2C2    0 0 0 0 0 ... 1 0 0 0 0 0 1 0 1 0
M2C3    0 0 1 0 0 ... 1 0 0 1 0 0 0 0 1 0
        0 0 0 0 0 ... 0 1 0 0 0 0 1 0 1 0
......

14

The Trick

If you know the correct conformation for each molecule, then it is relatively easy to identify the key features.

If you know the correct features and distances, then it is easy to identify the correct conformation.

Guess one, predict the other, iterate.

15

Given the features, easy to find the conformations


Features    0 0 1 0 0 ... 1 0 0 1 0 0 0 0 1 0

M1C1        0 0 1 0 0 ... 0 0 0 1 0 0 0 0 1 0
M1C2 *      0 0 1 0 1 ... 1 1 0 1 0 1 1 0 1 0
M1C3        0 0 0 0 0 ... 0 0 0 0 0 1 0 0 1 0
            0 1 0 1 0 ... 1 0 0 0 1 1 0 0 1 0
......
M2C1        1 0 1 0 0 ... 0 0 0 0 0 0 1 0 0 0
M2C2        0 0 0 0 0 ... 1 0 0 0 0 0 1 0 1 0
M2C3 *      0 0 1 0 0 ... 1 0 0 1 0 0 0 0 1 0
            0 0 0 0 0 ... 0 1 0 0 0 0 1 0 1 0
......

(* rows highlighted on the slide: conformers that contain every feature bit.)
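A minimal sketch of the "given the features, find the conformations" direction: a conformer qualifies when every bit set in the feature pattern is also set in its bit string. The 15-bit rows are hypothetical toy data loosely modeled on the slide, with the elided middle bits dropped, and the function names are illustrative.

```python
def contains(pattern: str, bits: str) -> bool:
    """True if every 1 in `pattern` is also a 1 in `bits`."""
    return all(b == "1" for p, b in zip(pattern, bits) if p == "1")


def select_conformers(pattern, library):
    """Map each molecule to the conformers that contain the feature pattern."""
    return {
        mol: [name for name, bits in confs.items() if contains(pattern, bits)]
        for mol, confs in library.items()
    }


pattern = "001001001000010"
library = {
    "M1": {"C1": "001000001000010", "C2": "001011101011010", "C3": "000000000010010"},
    "M2": {"C1": "101000000001000", "C2": "000001000001010", "C3": "001001001000010"},
}
print(select_conformers(pattern, library))
```

A conformer may carry extra bits beyond the pattern (as M1/C2 does here); only the pattern bits have to be present.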

16

Given the conformations, easy to find the features.


Selected    Bit string

0    M1C1   0 0 1 0 0 ... 0 0 0 1 0 0 0 0 1 0
1    M1C2   0 0 1 0 1 ... 1 1 0 1 0 1 1 0 1 0
0    M1C3   0 0 0 0 0 ... 0 0 0 0 0 1 0 0 1 0
0           0 1 0 1 0 ... 1 0 0 0 1 1 0 0 1 0
..   ......
0    M2C1   1 0 1 0 0 ... 0 0 0 0 0 0 1 0 0 0
0    M2C2   0 0 0 0 0 ... 1 0 0 0 0 0 1 0 1 0
1    M2C3   0 0 1 0 0 ... 1 0 0 1 0 0 0 0 1 0
0           0 0 0 0 0 ... 0 1 0 0 0 0 1 0 1 0
..   ......
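A minimal sketch of the "given the conformations, find the features" direction: once one conformer per molecule is fixed, the shared features are the bits set in all of them. The strict bitwise AND used here is a simplifying assumption (the Gibbs procedure described later works with probabilities instead), and the two bit strings are hypothetical toy data.

```python
def common_features(selected):
    """Bitwise AND over one chosen conformer bit string per molecule."""
    width = len(selected[0])
    return "".join(
        "1" if all(bits[i] == "1" for bits in selected) else "0"
        for i in range(width)
    )


chosen = ["001011101011010", "001001001000010"]  # e.g. M1/C2 and M2/C3
print(common_features(chosen))
```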

17

Bioinformatics Motif Finding using Gibbs Sampling.

1. Remove one sequence.

2. Randomly select one position for each sequence.

3. Calculate probabilities for all positions for the motif “window”.

4. Using the “window”, compute probabilities for the motif position in the removed sequence.

5. Repeat the above steps for all sequences until converged.

This will be easier to see with pictures.

18

Objective Function

\[ F = \sum_{i=1}^{W} \sum_{j=1}^{J} c_{i,j} \log \frac{q_{i,j}}{p_j} \]

• W : bit string length

• c_{i,j} : count of residue j in position i

• q_{i,j} : residue frequencies, position i, residue j

• p_j : residue background frequencies

• J : residue types, 20 for protein, 4 for DNA, RNA

Window: W x 20
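The objective can be evaluated directly from a W x J count matrix. In this sketch the frequencies q are estimated as counts divided by the column total, which is an assumption (the slide does not say how q is obtained), and terms with a zero count are taken to contribute nothing. For bit strings, J = 2 (residues 0 and 1).

```python
from math import log


def objective(counts, background):
    """F = sum_i sum_j c[i][j] * log(q[i][j] / p[j]).

    counts: W rows, each holding J residue counts for one position.
    background: J background frequencies p[j].
    Assumption: q[i][j] = c[i][j] / sum_j c[i][j].
    """
    total = 0.0
    for row in counts:
        n = sum(row)
        for c, p in zip(row, background):
            if c > 0:  # a zero count contributes zero to the sum
                total += c * log((c / n) / p)
    return total


# Two positions, two residue types (0 and 1), four selected bit strings.
counts = [[4, 0], [2, 2]]
background = [0.5, 0.5]
print(objective(counts, background))
```

A position where the selected strings all agree scores highly, while a position that matches the background distribution scores zero.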

19

Alignment Algorithm

Mostly used in sequence alignment to find the common motif.

TCAGAACCAGTTATAAATTTATCATTTCCTTCTCCACTCCTGCCTCAGGATCCAGCACACATTATCACAAACTTAGTGTCCATCCATCACTGCTGACCCT…

Fast and sensitive, less likely to fall into a local minimum.

Lawrence, et al. (1993) Science, 262, 208-214

W x 20

20

PharmID Algorithm using Gibbs.

1. Remove one compound.

2. Start with a random conformer for each of the other compounds.

3. Calculate probabilities for feature importance.

4. Compute conformation probabilities for the omitted compound.

5. Repeat steps 1-4 until convergence.

Again, pictures will make this clear.
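The five steps can be sketched as a small Gibbs sampler over conformer choices. This is an illustration under stated assumptions, not the PharmID implementation: each bit is scored with an independent log-odds term against a background frequency, pseudocounts keep the probabilities away from 0 and 1, and a fixed number of sweeps stands in for a convergence test. `gibbs_select` and its parameters are hypothetical names.

```python
import random
from math import exp, log


def gibbs_select(molecules, n_sweeps=50, pseudo=0.5, seed=0):
    """molecules: one list of equal-length bit strings (conformers) per compound.

    Returns the index of the chosen conformer for each compound.
    """
    rng = random.Random(seed)
    n_bits = len(molecules[0][0])
    all_confs = [c for mol in molecules for c in mol]
    # Background probability of a 1 at each bit, over every conformer.
    p = [(sum(c[i] == "1" for c in all_confs) + pseudo) / (len(all_confs) + 2 * pseudo)
         for i in range(n_bits)]
    # Step 2: start from a random conformer for every compound.
    chosen = [rng.randrange(len(mol)) for mol in molecules]
    for _ in range(n_sweeps):  # Step 5: fixed sweeps instead of a convergence test
        for m, mol in enumerate(molecules):
            # Step 1: leave compound m out.
            others = [molecules[k][chosen[k]] for k in range(len(molecules)) if k != m]
            # Step 3: probability of a 1 at each bit among the other chosen conformers.
            q = [(sum(s[i] == "1" for s in others) + pseudo) / (len(others) + 2 * pseudo)
                 for i in range(n_bits)]
            # Step 4: log-odds score for each conformer of m, then sample one.
            scores = [sum(log(q[i] / p[i]) if c[i] == "1" else log((1 - q[i]) / (1 - p[i]))
                          for i in range(n_bits))
                      for c in mol]
            top = max(scores)
            weights = [exp(s - top) for s in scores]
            chosen[m] = rng.choices(range(len(mol)), weights=weights)[0]
    return chosen


toy = [["110010", "000111"], ["110000", "001100", "010011"], ["110011"]]
print(gibbs_select(toy))
```

Sampling a conformer in proportion to its score, rather than always taking the best one, is what lets the search move away from a poor starting assignment.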

21

Gibbs Sampling: Fingerprints Movement

         Conf_1     Conf_2     Conf_3     possible
Mol_1    010000000  100010001  010100100  0, 9, 18
Mol_2    000100000  010000100  100010001  0, 9, 18
Mol_3    101010001  000100100             0, 9
Mol_4    001000100  100110001  010101001  0, 9, 18

1_2      010000000  100010001  010100100
2_3      000100000  010000100  100010001
3_1      101010001  000100100
4_2      001000100  100110001  010101001
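One way to read the "possible" column is as the start offsets of each 9-bit conformer fingerprint within a molecule's concatenated string, so that choosing a conformer is analogous to choosing a motif position in a sequence. That reading is an assumption, and `split_conformers` is a hypothetical helper written to illustrate it.

```python
def split_conformers(concatenated: str, width: int = 9):
    """Map each possible start offset to the conformer fingerprint found there."""
    return {
        offset: concatenated[offset:offset + width]
        for offset in range(0, len(concatenated), width)
    }


mol_1 = "010000000" + "100010001" + "010100100"  # three conformers
mol_3 = "101010001" + "000100100"                # only two conformers
print(split_conformers(mol_1))
print(split_conformers(mol_3))
```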

22

Bit String Alignment

Only 2 residue types (0, 1)

Rigid molecules that have only 1 or a few conformers can speed up the alignment and help to determine the best set of features.

23

Hypothesis Generation

Why?

Features may not be part of the same pharmacophore.

How?

Clique detection (Bron-Kerbosch algorithm). A clique is a set of points that are ALL connected to one another.
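A compact sketch of the basic Bron-Kerbosch recursion (without pivoting) for listing maximal cliques. Treating features as nodes and two-point pharmacophores as edges is an illustrative reading of the slides, and the example edge list is hypothetical.

```python
def bron_kerbosch(r, p, x, adj, out):
    """Collect every maximal clique that extends `r` using candidates in `p`."""
    if not p and not x:
        out.append(sorted(r))
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}  # v is fully explored
        x = x | {v}  # exclude v from later cliques to avoid duplicates


def maximal_cliques(edges):
    """edges: iterable of (node, node) pairs of an undirected graph."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    out = []
    bron_kerbosch(set(), set(adj), set(), adj, out)
    return sorted(out)


# Hypothetical two-point pharmacophores between four features.
edges = [("HD", "HA"), ("HA", "ARC"), ("HD", "ARC"), ("ARC", "HYP")]
print(maximal_cliques(edges))
```

In this example HD, HA and ARC are mutually connected and form a three-feature clique, while HYP is linked only to ARC and so cannot join it.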

24

Hypothesis Generation in Selected Conformers : Clique Detection

[Figure: example molecule (N, O and OH groups) with its pharmacophore features. Legend: pharmacophore features; two-point pharmacophores identified by Gibbs sampling; discarded two-point pharmacophores.]

A pharmacophore hypothesis should be an all-connected graph.

25

Hypothesis Generation: Output

Pharmacophore 1
Members: 1 2 3 5 … (Mol. ID)
Features: Hydrogen Bond Donor, Hydrogen Bond Acceptor, …

Pharmacophore 2
Members: 4 6 8 …
Features: …

…

26

Refinement

For all molecules
    For all conformers
        For all hypotheses generated
            Test each qualified conformer against each hypothesis
        End For
        If new hypothesis found
            Insert the new hypothesis into the list
    End For
End For

27

Benchmarking: Test Datasets

1. Bit string alignment

20 20-bit strings

2. Single binding mode

Angiotensin-Converting Enzyme (ACE) inhibitors

3. Multiple binding modes/mechanisms

Dopamine receptor inhibitors (D2/D4)

28

Example 1: A Toy Dataset (Gibbs Sampling Only)

20 x 20 bit strings,

mimic 20 molecules,

each with 20 conformers.

Each bit string is 20 bits long.

Computation time:

<1 sec.

Result:

1_14 10001000010000000100

2_14 10001000010000000100

3_15 10001000010000000100

4_12 10001000010000000000

5_15 10001000010010000100

6_7 10001000010000000000

7_8 10001000010000000000

8_19 10001000010000000100 …

29

Example 2: ACE Inhibitors

78 active compounds.

OMEGA® from OpenEye® is used to generate multiple conformers.

Two RMSD cutoffs used:
2.0 Å : 4,613 conformers generated.
1.0 Å : 46,268 conformers generated.

30

ACE inhibitors Results

Using 4,613 conformers, 55/78 molecules contain the expected pharmacophore.

Using 46,268 conformers, 65/78 molecules contain the expected pharmacophore.

31

Example 2: ACE inhibitors: Best Identified Pharmacophore

[Figure: best identified pharmacophore. Labels: Zn Binding Site, O, O, O. Distance ranges: 2.84 ~ 4.50 Å, 4.51 ~ 5.70 Å, 4.99 ~ 6.77.]

32

Example 2: ACE inhibitors: Other possible pharmacophore

33

Example 3: Testing on Multiple Binding Modes (D2, D4 ligands)

34

Example 3: Dopamine antagonists

Two pharmacophores were extracted from one data set!

35

Conclusion

Traditional Methods:
Exhaustive enumeration of pharmacophores, limited coverage of conformational space.

“Many to many” limits search.

PharmID:
Selective enumeration of pharmacophores, better coverage of conformational space.

Each search is “many to one”.

36

Acknowledgements

Coworkers
Stan Young, Jun Feng, Ashish Sanil

OMEGA is a product from OpenEye Scientific Software Inc.

Support from Hereditary Disease Foundation.

Become a NISS affiliate!
