In silico prediction of solubility: Solid progress but no solution?

Date post: 02-Jan-2016
In silico prediction of solubility: Solid progress but no solution? Dr John Mitchell, University of St Andrews. Given accurately measured solubilities of 100 molecules, can you predict the solubilities of 32 similar ones?
Transcript
Page 1: In silico  prediction of solubility:  Solid progress but no solution?

In silico prediction of solubility:

Solid progress but no solution?

Dr John Mitchell, University of St Andrews

Page 2: In silico  prediction of solubility:  Solid progress but no solution?
Page 3: In silico  prediction of solubility:  Solid progress but no solution?

Given accurately measured solubilities of 100 molecules, can you predict the solubilities of 32 similar ones?

Page 4: In silico  prediction of solubility:  Solid progress but no solution?

For this study Toni Llinàs measured 132 solubilities using the CheqSol method.

He used a Sirius glpKa instrument

Page 5: In silico  prediction of solubility:  Solid progress but no solution?

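Equilibria for an ionisable acid AH (solid to solution, governed by $K_0$; ionisation in solution, governed by $K_a$):

$$\mathrm{AH_s} \rightleftharpoons \mathrm{AH}, \qquad K_0 = \frac{[\mathrm{AH}]}{[\mathrm{AH_s}]}, \qquad S_0 = [\mathrm{AH}]$$

$$\mathrm{AH} \rightleftharpoons \mathrm{A^-} + \mathrm{H^+}, \qquad K_a = \frac{[\mathrm{A^-}][\mathrm{H^+}]}{[\mathrm{AH}]}$$

Intrinsic solubility of an ionisable compound is the thermodynamic solubility of the free acid or base form (Horter, D., Dressman, J. B., Adv. Drug Deliv. Rev., 1997, 25, 3-14).

[Scheme: the sodium salt ANa, the dissociated ions A⁻ and Na⁺, and the neutral form AH.]

$$S_t = [\mathrm{AH}] + [\mathrm{A^-}] = S_0 + \frac{K_a S_0}{[\mathrm{H^+}]} = S_0\left(1 + \frac{K_a}{[\mathrm{H^+}]}\right)$$

[Figure: Solubility versus pH profile; logS (−5 to −1) against pH on the concentration scale (2 to 9).]

$S_0$ is essentially the solubility of the neutral form only.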
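The solubility versus pH profile can be sketched numerically from the relation S_t = S_0 (1 + K_a/[H+]) for a monoprotic acid. The snippet below is only an illustration: the pKa and log S_0 values are hypothetical, the function name is my own, and the formula ignores any upper limit set by the solubility of the salt.

```python
import math


def log_total_solubility(pH, pKa, log_S0):
    """log10 of S_t = S0 * (1 + Ka/[H+]) for a monoprotic acid.

    Ka/[H+] equals 10**(pH - pKa), so no explicit concentrations are needed.
    """
    return log_S0 + math.log10(1.0 + 10.0 ** (pH - pKa))


# Hypothetical values chosen only for illustration (not measured data).
PKA = 4.0
LOG_S0 = -5.0

profile = [(pH, log_total_solubility(pH, PKA, LOG_S0)) for pH in range(2, 10)]
for pH, log_s in profile:
    print(f"pH {pH}: logS = {log_s:.2f}")
```

Well below the pKa the curve is flat at log S_0, which is why S_0 can be read as the solubility of the neutral form; above the pKa logS rises by roughly one unit per pH unit.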

Page 6: In silico  prediction of solubility:  Solid progress but no solution?

Diclofenac

[Figure: chemical structures of diclofenac as the sodium salt, as the anion with Na+, and as the free acid; panel labels: In Solution, Powder.]

● We continue “Chasing equilibrium” until a specified number of crossing points have been reached.

● A crossing point represents the moment when the solution switches from a saturated solution to a subsaturated solution: no change in pH, gradient zero, no re-dissolving nor precipitating.

SOLUTION IS IN EQUILIBRIUM

Random error less than 0.05 log units !!!!
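As a rough illustration of what a crossing point means, the sketch below locates the times at which a sampled dpH/dt trace changes sign, using linear interpolation between neighbouring samples. This is an assumption about how one might post-process such a trace, not the instrument's actual algorithm; the function name and the data are hypothetical.

```python
def crossing_times(times, dph_dt):
    """Return interpolated times at which dpH/dt changes sign.

    Samples with a gradient of exactly zero are skipped, so a crossing is
    reported once, between the nearest non-zero samples on either side.
    """
    crossings = []
    prev_t, prev_y = None, None  # last sample with a non-zero gradient
    for t, y in zip(times, dph_dt):
        if y == 0.0:
            continue
        if prev_y is not None and prev_y * y < 0:
            # Linear interpolation to the zero of the gradient.
            crossings.append(prev_t + (t - prev_t) * (-prev_y) / (y - prev_y))
        prev_t, prev_y = t, y
    return crossings


# Hypothetical trace: time in minutes, dpH/dt in pH units per minute.
times = [20, 21, 22, 23, 24]
gradient = [0.004, -0.004, -0.002, 0.002, 0.006]
print(crossing_times(times, gradient))
```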

[Figure: dpH/dt versus time; dpH/dt (−0.008 to 0.008) against time in minutes (20 to 45). Regions labelled Supersaturated Solution and Subsaturated Solution. Annotation: 8 intrinsic solubility values.]

* A. Llinàs, J. C. Burley, K. J. Box, R. C. Glen and J. M. Goodman. Diclofenac solubility: independent determination of the intrinsic solubility of three crystal forms. J. Med. Chem. 2007, 50(5), 979-983

● First precipitation: Kinetic Solubility (Not in Equilibrium)

● Thermodynamic Solubility through “Chasing Equilibrium”: Intrinsic Solubility (In Equilibrium)

Supersaturation Factor: $\mathrm{SSF} = S_{\mathrm{kin}} - S_0$

“CheqSol”

Page 7: In silico  prediction of solubility:  Solid progress but no solution?

Caveat: the official results are used in the following slides, but most of the interpretation is my own.

Page 8: In silico  prediction of solubility:  Solid progress but no solution?

A prediction was considered correct if it was within 0.5 log units

Page 9: In silico  prediction of solubility:  Solid progress but no solution?

Not a very generous margin of error!

Page 10: In silico  prediction of solubility:  Solid progress but no solution?

Solubility Challenge Results

[Figure: histogram of the number of entries (0 to 14) against the number of correct predictions out of 32 molecules per entry (4 to 21).]

A “null prediction” based on predicting everything to have the mean training set solubility would have got 9/32 correct
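The scoring rule and the null prediction can be sketched as follows. This is a minimal illustration with hypothetical logS values; the function names are my own, and treating a difference of exactly 0.5 as correct is an assumption, since the slides do not say whether the margin is inclusive.

```python
def count_correct(predicted, measured, tolerance=0.5):
    """Count predictions within `tolerance` log units of the measurement."""
    return sum(1 for p, m in zip(predicted, measured) if abs(p - m) <= tolerance)


def null_prediction(train_values, n_test):
    """Predict the mean training-set solubility for every test molecule."""
    mean = sum(train_values) / len(train_values)
    return [mean] * n_test


# Hypothetical logS values, for illustration only.
train_logS = [-2.0, -3.0, -4.0, -5.0]
test_logS = [-3.2, -3.9, -1.0, -6.5]

baseline = null_prediction(train_logS, len(test_logS))
print(count_correct(baseline, test_logS), "of", len(test_logS), "correct")
```

Any entry should be compared against this baseline before it is called successful.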

Page 11: In silico  prediction of solubility:  Solid progress but no solution?

Correlation Achieved by Entrants

[Figure: histogram of the number of entries (0 to 30) against R² (0.0 to 0.9).]

Using an R² threshold of 0.500, only 18/99 entries were good.
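The slides do not say which definition of R² was used; a common choice is the squared Pearson correlation between predicted and measured logS, which is what the sketch below assumes. The function name and example numbers are hypothetical.

```python
def r_squared(predicted, measured):
    """Squared Pearson correlation between predictions and measurements.

    Raises ZeroDivisionError if either list is constant (zero variance),
    as happens for a null prediction that repeats one value.
    """
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    return cov * cov / (var_p * var_m)


# Hypothetical example: an entry counts as "good" at the 0.500 threshold.
r2 = r_squared([1.0, 2.0, 3.0], [1.0, 3.0, 2.0])
print(r2, r2 >= 0.5)
```

Note that this measure rewards correlation only: a systematically offset set of predictions can score a high R² while getting few molecules within 0.5 log units.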

Page 12: In silico  prediction of solubility:  Solid progress but no solution?

Number of Correct Predictions vs R squared

[Figure: scatter plot of R² (0.000 to 0.700) against the number of correct predictions (4 to 22), one point per entry.]

Page 13: In silico  prediction of solubility:  Solid progress but no solution?

Number of Correct Predictions vs R squared

[Figure: the same scatter plot of R² (0.000 to 0.700) against the number of correct predictions (4 to 22), with regions labelled GOOD and BAD.]

Page 14: In silico  prediction of solubility:  Solid progress but no solution?

Number of Correct Predictions vs R squared

[Figure: the same scatter plot of R² (0.000 to 0.700) against the number of correct predictions (4 to 22), with 3 “WINNERS” highlighted.]

3 Pareto optimal entries which I think of as “winners”. These combine best R² with most correct predictions.
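Pareto optimality on two criteria can be sketched as below: an entry is kept if no other entry is at least as good on both the number of correct predictions and R², and strictly better on one. The entry names and scores are hypothetical, since the real entries were anonymous.

```python
def pareto_optimal(entries):
    """Return names of entries not dominated on (n_correct, r2).

    `entries` is a list of (name, n_correct, r2) tuples; higher is better
    on both criteria.
    """
    front = []
    for name, n, r2 in entries:
        dominated = any(
            (n_other >= n and r2_other >= r2) and (n_other > n or r2_other > r2)
            for _, n_other, r2_other in entries
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical entries: (name, correct predictions out of 32, R squared).
entries = [
    ("A", 20, 0.55),
    ("B", 18, 0.64),
    ("C", 15, 0.60),
    ("D", 20, 0.50),
    ("E", 10, 0.65),
]
print(pareto_optimal(entries))
```

Here C is dominated by B and D by A, leaving three non-dominated entries.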

Page 15: In silico  prediction of solubility:  Solid progress but no solution?

Some molecules proved much harder to predict than others – the most insoluble were amongst the most difficult.

Page 16: In silico  prediction of solubility:  Solid progress but no solution?

Conclusions from Solubility Challenge

• My opinion is that the overall standard was rather poor;

• It’s obvious that some entries were much better than others;

• But entries were anonymous;

• So we can’t judge between specific researchers or between their methods;

• We can only rely on the “official” summary …

Page 17: In silico  prediction of solubility:  Solid progress but no solution?

Conclusions from Solubility Challenge

• We can only rely on the “official” summary …

… “a variety of methods and combinations of methods all perform about equally well.”

Page 18: In silico  prediction of solubility:  Solid progress but no solution?

How should we approach the prediction/estimation/calculation of the aqueous solubility of druglike molecules?

Two (apparently) fundamentally different approaches: theoretical chemistry & informatics.

Page 19: In silico  prediction of solubility:  Solid progress but no solution?

What Error is Acceptable?

• For typically diverse sets of druglike molecules, a “good” QSPR will have an RMSE ≈ 0.7 logS units.

• An RMSE > 1.0 logS unit is probably unacceptable.

• These RMSE values of 0.7 to 1.0 logS units correspond to an error range of 4.0 to 5.7 kJ/mol in ΔG_sol.
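The conversion between an error in logS and an error in free energy follows from ΔG = RT ln(10) × ΔlogS. The sketch below assumes a temperature of 298.15 K, which the slides do not state; the function name is my own.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)


def logs_error_to_kj_per_mol(delta_log_s, temperature=298.15):
    """Free-energy error (kJ/mol) equivalent to an error in log10(S)."""
    return R_KJ * temperature * math.log(10) * delta_log_s


for err in (0.7, 1.0):
    print(f"{err} logS units ~ {logs_error_to_kj_per_mol(err):.1f} kJ/mol")
```

One log unit is worth about 5.7 kJ/mol at room temperature, so the free energy of solution has to be computed to within a few kJ/mol to be useful.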

Page 20: In silico  prediction of solubility:  Solid progress but no solution?

What Error is Acceptable?

• A useless model would have an RMSE close to the SD of the test set logS values: ~ 1.4 logS units;

• The best possible model would have an RMSE close to the SD resulting from the experimental error in the underlying data: ~ 0.5 logS units?
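The "useless model" bound can be checked directly: a model that predicts the mean of the test set for every molecule has an RMSE exactly equal to the (population) standard deviation of the test values, and any constant offset from that mean only makes the RMSE larger. The data below are hypothetical and the helper names are my own.

```python
import math


def rmse(predicted, measured):
    """Root-mean-square error between predictions and measurements."""
    n = len(measured)
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)


def population_sd(values):
    """Standard deviation using a divisor of n (not n - 1)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


# Hypothetical test-set logS values, for illustration only.
test_logS = [-1.0, -2.5, -3.0, -4.5, -6.0]
test_mean = sum(test_logS) / len(test_logS)

mean_model = [test_mean] * len(test_logS)           # predicts the test mean
offset_model = [test_mean + 0.5] * len(test_logS)   # constant, but biased

print(rmse(mean_model, test_logS), population_sd(test_logS))
print(rmse(offset_model, test_logS))
```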

Page 22: In silico  prediction of solubility:  Solid progress but no solution?

Theoretical Chemistry

“The problem is difficult, but by making suitable approximations we can solve it at reasonable cost based on our understanding of physics and chemistry”

Page 23: In silico  prediction of solubility:  Solid progress but no solution?

Theoretical Chemistry

• Calculations and simulations based on real physics.

• Calculations are either quantum mechanical or use parameters derived from quantum mechanics.

• Attempt to model or simulate reality.

• Usually Low Throughput.

Page 24: In silico  prediction of solubility:  Solid progress but no solution?

Drug Discov. Today, 10 (4), 289 (2005)

Page 26: In silico  prediction of solubility:  Solid progress but no solution?

Thermodynamic Cycle

Crystal

Gas

Solution
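Written out, the cycle splits the free energy of solution into a sublimation step (crystal to gas) and a hydration step (gas to solution). The relation to logS below is a sketch: it assumes S is expressed relative to the standard-state concentration, and the exact form depends on the standard-state convention, which the slides do not specify. The subscripts are my own labels for the quantities named on the slides.

$$\Delta G_{\mathrm{sol}} = \Delta G_{\mathrm{sub}} + \Delta G_{\mathrm{hyd}}$$

$$\log_{10} S = -\frac{\Delta G_{\mathrm{sol}}}{RT \ln 10}$$

Because the two terms are large and of opposite sign, errors of a few kJ/mol in either one translate into errors of the order of a log unit in S.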

Page 30: In silico  prediction of solubility:  Solid progress but no solution?

Sublimation Free Energy

Crystal

Gas

Calculating ΔG_sub is a standard procedure in crystal structure prediction.

Page 31: In silico  prediction of solubility:  Solid progress but no solution?

Crystal Structure Prediction

• Given the structural diagram of an organic molecule, predict the 3D crystal structure.

[Figure: structural diagram of an example organic molecule.]

Slide after SL Price, Int. Sch. Crystallography, Erice, 2004

Page 32: In silico  prediction of solubility:  Solid progress but no solution?

CSP Methodology

• Based around minimising lattice energy of trial structures.

• Enthalpy comes from lattice energy and intramolecular energy (DFT), which need to be well calibrated against each other: trade-off of lattice vs conformational energy.

• Entropy comes from phonon modes (crystal vibrations); can get Free Energy.

Page 33: In silico  prediction of solubility:  Solid progress but no solution?

CSP Methodology

• DFT calculation on monomer to obtain DMA electrostatics.

• Generate many plausible crystal structures using different space groups.

• Minimise lattice energy using DMA + repulsion-dispersion potential.

• Many structures may have similar energies.
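The last step of the search, ranking minimised trial structures and seeing how many lie close to the global minimum, can be sketched as below. The labels, energies, and the 1 kJ/mol window are all hypothetical; real workflows obtain the energies from the lattice energy minimisation described in the list above.

```python
def low_energy_candidates(structures, window_kj_mol=1.0):
    """Rank structures by lattice energy and keep those near the minimum.

    `structures` is a list of (label, lattice_energy_kj_mol) tuples.
    Returns (label, energy relative to the minimum) pairs, lowest first.
    """
    e_min = min(energy for _, energy in structures)
    ranked = sorted(structures, key=lambda s: s[1])
    return [
        (label, energy - e_min)
        for label, energy in ranked
        if energy - e_min <= window_kj_mol
    ]


# Hypothetical minimised trial structures: (space group and index, kJ/mol).
trial_structures = [
    ("P21/c #1", -73.6),
    ("P-1 #1", -72.9),
    ("Pbca #1", -70.1),
    ("P212121 #1", -73.1),
]
for label, relative_energy in low_energy_candidates(trial_structures):
    print(f"{label}: +{relative_energy:.1f} kJ/mol")
```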

Page 34: In silico  prediction of solubility:  Solid progress but no solution?

[Figure: lattice energy (kJ/mol, −74 to −69) against volume per molecule (Å³, 149 to 155) for predicted structures, keyed by space group: P1, P-1, P21, P21/c, Cc, C2, C2/c, Pm, P2/c, P21/m, P21212, Pc, P212121, Pca21, Pna21, Pbcn, Pbca, Pmn21, Pma21; further labels: ALPHA, BETA, GAMMA.]

These methods can get relative lattice energies of different structures correct, probably to within a few kJ/mol. Absolute energies are harder.

Page 35: In silico  prediction of solubility:  Solid progress but no solution?

[Figure: the same plot of lattice energy (kJ/mol) against volume per molecule (Å³) as on the previous slide.]

Additional possible benefit for solubility: if we don’t know the crystal structure, we could reasonably use the best structure from CSP.

Page 36: In silico  prediction of solubility:  Solid progress but no solution?

Other approaches to Lattice Energy

• Periodic DFT calculations on a lattice are an alternative to the model potential approach.

• Advantageous to optimise intra- and intermolecular energies simultaneously using the same method.

• Disadvantage: it’s hard to get accurate dispersion.

Page 37: In silico  prediction of solubility:  Solid progress but no solution?

Empirical routes to ΔG_sub

• Alternatively one could estimate sublimation energy from QSPR (no crystal structure needed, but no obvious benefit over direct informatics approach to solubility).

Page 38: In silico  prediction of solubility:  Solid progress but no solution?

Thermodynamic Cycle

Crystal

Gas

Solution

Page 40: In silico  prediction of solubility:  Solid progress but no solution?

Hydration Free Energy

We expect that hydration will be harder to model than sublimation, because the solution has an inexactly known and dynamic structure, and because both solute and solvent are important.

Page 41: In silico  prediction of solubility:  Solid progress but no solution?
Page 42: In silico  prediction of solubility:  Solid progress but no solution?

Simulation: MD/FEP

Parameterised continuum models

Page 43: In silico  prediction of solubility:  Solid progress but no solution?

… and of course the parameterised RISM work of our hosts.

Quoted RMS error ~5 kJ/mol or 0.9 log units.

Page 44: In silico  prediction of solubility:  Solid progress but no solution?

… and this one both calculates solubility directly and is simulation based: FEP or Monte Carlo.

Page 45: In silico  prediction of solubility:  Solid progress but no solution?

Luder et al.’s results correspond to an RMS error of about 6 kJ/mol, or 1 logS unit, but only when an empirical “correction” is applied …

Page 46: In silico  prediction of solubility:  Solid progress but no solution?

… their uncorrected results are less impressive.

Page 47: In silico  prediction of solubility:  Solid progress but no solution?

Hydration Energy

Our current methodology here is just to try the various different PCM continuum models available in Gaussian.

Page 48: In silico  prediction of solubility:  Solid progress but no solution?

We observe that our thermodynamic (TD) cycle method, based on lattice energy minimisation for sublimation and a PCM continuum model of hydration, correlates reasonably with experiment, but is not quantitatively predictive (at least without arbitrary correction).

Caveat: currently only a small sample of molecules.

Page 49: In silico  prediction of solubility:  Solid progress but no solution?
Page 50: In silico  prediction of solubility:  Solid progress but no solution?

An alternative route is via octanol, then using logP.
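The octanol route can be written as a one-line relation, sketched here under the assumption that the partition coefficient P approximates the ratio of the saturated concentrations in the two solvents (which neglects the mutual saturation of octanol and water). The symbols $S_{\mathrm{w}}$ and $S_{\mathrm{oct}}$ are my own labels for the solubilities in water and in octanol.

$$P = \frac{[\mathrm{X}]_{\mathrm{oct}}}{[\mathrm{X}]_{\mathrm{w}}}, \qquad \log_{10} S_{\mathrm{w}} \approx \log_{10} S_{\mathrm{oct}} - \log_{10} P$$

On this route the hard hydration step is replaced by a solvation step in octanol plus an experimental or predicted logP.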

Page 52: In silico  prediction of solubility:  Solid progress but no solution?

Using a training-test set split to optimise parameters & validate:

RMSE(te) = 0.71; r²(te) = 0.77

Ntrain = 34; Ntest = 26

Page 53: In silico  prediction of solubility:  Solid progress but no solution?

Informatics Approaches

Page 54: In silico  prediction of solubility:  Solid progress but no solution?

Informatics

“The problem is too difficult to solve using physics and chemistry, so we will design a black box to link structure and solubility”

Page 55: In silico  prediction of solubility:  Solid progress but no solution?

Informatics and Empirical Models

• In general, Informatics methods represent phenomena mathematically, but not in a physics-based way.

• Inputs and outputs are linked by an empirically parameterised equation or a more elaborate mathematical model.

• Do not attempt to simulate reality.

• Usually High Throughput.

Page 56: In silico  prediction of solubility:  Solid progress but no solution?

Machine Learning Method

Random Forest

Page 57: In silico  prediction of solubility:  Solid progress but no solution?

Random Forest: Solubility Results

Test set: RMSE(te) = 0.69; r²(te) = 0.89; Bias(te) = −0.04

Training set: RMSE(tr) = 0.27; r²(tr) = 0.98; Bias(tr) = 0.005

Out-of-bag: RMSE(oob) = 0.68; r²(oob) = 0.90; Bias(oob) = 0.01

D. S. Palmer et al., J. Chem. Inf. Model., 47, 150-158 (2007). Ntrain = 658; Ntest = 300

Page 58: In silico  prediction of solubility:  Solid progress but no solution?

Random Forest: Replicating Solubility Challenge (post hoc)

RMSE(te) = 1.09; r²(te) = 0.39; 10/32 correct within 0.5 logS units. Ntrain = 100; Ntest = 32

CDK descriptors

Page 59: In silico  prediction of solubility:  Solid progress but no solution?

Support Vector Machine

Page 60: In silico  prediction of solubility:  Solid progress but no solution?

SVM: Solubility Results

et al.,

Ntrain = 150 + 50; Ntest = 87; RMSE(te) = 0.94; r²(te) = 0.79

Page 61: In silico  prediction of solubility:  Solid progress but no solution?

What can we Learn from Informatics?

Page 62: In silico  prediction of solubility:  Solid progress but no solution?

What Descriptors Correlate with logS?

…amongst the solubility challenge training set, once intercorrelated descriptors with R² > 0.64 are removed?
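One way to remove intercorrelated descriptors is a greedy filter: walk through the descriptors in order and keep one only if its squared Pearson correlation with every descriptor already kept is at most 0.64 (that is, |r| ≤ 0.8). The slides do not say which procedure was actually used, so this is an assumed sketch; the descriptor names and values are hypothetical.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation; fails for zero-variance (constant) inputs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov * cov / (var_x * var_y)


def filter_intercorrelated(descriptors, threshold=0.64):
    """Keep the first descriptor of each intercorrelated group.

    `descriptors` maps a name to its list of values over the molecules;
    dict insertion order decides which member of a group survives.
    """
    kept = []
    for name, values in descriptors.items():
        if all(pearson_r2(values, descriptors[k]) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical descriptor values for five molecules.
descriptors = {
    "desc_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "desc_b": [2.1, 3.9, 6.2, 8.0, 9.9],  # nearly proportional to desc_a
    "desc_c": [3.0, 1.0, 2.0, 3.0, 1.0],
}
print(filter_intercorrelated(descriptors))
```

Ordering the dictionary by correlation with logS before filtering would keep the most informative member of each group.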

Page 63: In silico  prediction of solubility:  Solid progress but no solution?
Page 65: In silico  prediction of solubility:  Solid progress but no solution?

The first 21 are all negatively correlated with logS …

… things that reduce solubility.

Some of this is meaningful: aromatic groups reduce solubility.

Some is accidental: logP happens to be defined as octanol:water, rather than water:octanol.

Page 66: In silico  prediction of solubility:  Solid progress but no solution?

Future Work

• Explore different models of hydration: PCM, simulation (MD/FEP), RISM …

• Route: Direct to water or via octanol?

• Machine Learning (Random Forest, SVM etc.) for hybrid experimental/parameterised models.

• Consistent training and validation sets and methodologies to compare methods: e.g., solubility challenge {100+32}.

Page 67: In silico  prediction of solubility:  Solid progress but no solution?

Conclusions thus far…

Page 68: In silico  prediction of solubility:  Solid progress but no solution?

Solubility has proved a difficult property to calculate.

It involves different phases (solid & solution) and different substances (solute and solvent), and both enthalpy & entropy are important.

The theoretical approaches are generally based around thermodynamic cycles and involve some empirical element.

Page 69: In silico  prediction of solubility:  Solid progress but no solution?

Thanks

• Pfizer & PIPMS

• Gates Cambridge Trust

• SULSA

• Dr Dave Palmer, Laura Hughes, Dr Toni Llinàs

• James McDonagh, Dr Tanja van Mourik

• James Taylor, Simon Hogan, Gregor McInnes, Callum Kirk, William Walton (U/G project)

