
Cambridge Structural Database Analysis of Molecular Complementarity in Cocrystals

László Fábián*

Pfizer Institute for Pharmaceutical Materials Science, Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge, U.K., CB2 1EZ

Received August 6, 2008; Revised Manuscript Received December 10, 2008

ABSTRACT: A set of complete, reliable cocrystal structures was extracted from the Cambridge Structural Database, and molecular descriptors, usually used in quantitative structure-activity relationship studies, were calculated for each molecule. The resulting database describes pairs of molecules that form cocrystals with each other in terms of their calculated molecular properties. Statistical analysis of the data was performed to identify properties that tend to be similar or complementary for such pairs of molecules. The strongest descriptor correlations found relate to the shape and polarity of cocrystal formers. Hydrogen bond donor and acceptor counts of cocrystal formers, on the other hand, show no obvious statistical relationship.

Introduction

The design of cocrystals has been a field of intensive research in recent years.1,2 With reliable design strategies, cocrystals could offer a modular approach to developing materials with desirable properties.3 Pharmaceutical cocrystals4-7 are of particular interest, since the molecular structure of an active pharmaceutical ingredient (API) is determined by its required biological activity. The unfavorable physical properties of a potential solid drug product thus cannot be tackled by modifying the API molecules, but only by changing formulations. Even though salt formation is the most widely used method to change a drug formulation,8 the lack of suitable acidic or basic groups in the API or problems with the physical properties of the salts (e.g., their tendency to form variable solvates) may preclude the use of salt forms. Cocrystals can provide a viable alternative in such cases, as demonstrated by cocrystals of model APIs with improved dissolution characteristics,5 hydration stabilities,6 or melting points.7

The rational design of cocrystals is usually based on supramolecular synthons.9 If the molecules are able to associate by utilizing different, competing synthons, a design strategy must be concerned with the hierarchy of the synthons, that is, which of the possible synthons are formed at the expense of others. For relatively strong, specific interactions, such as hydrogen bonds and halogen bonds, synthon hierarchies can be established and successfully exploited.2 Nevertheless, the multitude of weaker, nonspecific interactions seriously limits our ability to design cocrystals. Homologous compounds (with the same functional groups and the same possible synthons) often exhibit different reactivity toward cocrystal formation, while some molecules are able to form cocrystals without any obvious synthons connecting them.10 These limitations are usually handled by cocrystal screening,11 a trial-and-error procedure. For practical applications, development costs will depend on the number of screening experiments needed before a suitable cocrystal former is found. It would therefore be important to identify further factors beyond synthon matching that influence the success or failure of screening experiments. The aim of this work is to find such factors by the statistical analysis of data on cocrystals from the Cambridge Structural Database12 (CSD, version 5.29, November 2007).

Experimental Methods

Cocrystal Database Creation. The CSD was searched for ordered, error-free organic crystal structures (at least one C atom; only C, H, N, O, S, P, F, Cl, Br, or I atoms allowed). Duplicates and unreliable or incomplete structures were filtered out by using the “best representative” list of van de Streek.13 The remaining structures were exported from the CSD to mol2 files, which were used for further processing and calculations. Sum formulas, formal charges (as stored in the CSD), and InChI identifiers14 were calculated for each residue. Cocrystals were defined as structures containing at least two neutral residues with different InChI identifiers (i.e., structural formulas) that do not appear in a list of common solvents.15 Cocrystals of molecules that occur at least 10 times in the data set were excluded to avoid the possible bias caused by the specific requirements of popular cocrystal formers (Table 1). The resulting database contains 974 cocrystal structures formed by 1949 molecules.
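The selection rules above can be expressed as two small filters. The sketch below is illustrative only: the input format, the function names `is_cocrystal` and `drop_frequent_formers`, and the short solvent set are hypothetical stand-ins (readable labels replace InChI strings), not the scripts used in this work.

```python
from collections import Counter

# Hypothetical stand-in for the solvent list of ref 15; real entries would be InChI strings.
COMMON_SOLVENTS = {"water", "methanol", "ethanol", "acetone"}


def is_cocrystal(residues, solvents=COMMON_SOLVENTS):
    """Return True if a structure has at least two different neutral, non-solvent residues.

    residues: list of (identifier, formal_charge) tuples, one per residue (assumed format).
    """
    neutral_components = {
        ident for ident, charge in residues
        if charge == 0 and ident not in solvents
    }
    return len(neutral_components) >= 2


def drop_frequent_formers(structures, threshold=10):
    """Remove structures containing any component that occurs in >= threshold structures.

    structures: dict mapping a structure label to the set of its component identifiers.
    """
    counts = Counter(ident for comps in structures.values() for ident in comps)
    return {
        label: comps for label, comps in structures.items()
        if all(counts[ident] < threshold for ident in comps)
    }


print(is_cocrystal([("A", 0), ("B", 0), ("water", 0)]))  # True
print(is_cocrystal([("A", 0), ("water", 0)]))            # False: the only partner is a solvent
```

A salt such as `[("A", 1), ("B", -1)]` is rejected by the same test, because neither residue is neutral.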

Calculation of Molecular Descriptors. The complete set of quantitative structure-activity relationship (QSAR) type descriptors available in our software tools was used to characterize the 1949 molecules, without any prior consideration of their importance in cocrystal formation. Altogether 131 molecular descriptors were calculated for each molecule by using locally written Perl scripts and the programs RPluto,16 JOElib2,17 and Sybyl.18 The 131 descriptors include simple atom, bond, and group counts, hydrogen bond donor and acceptor counts, size and shape descriptors, surface area descriptors (with partitioned and charge-weighted variants), and molecular electrostatic descriptors (see Table S1, Supporting Information, for a complete list). Partial atomic charges for the calculation of electrostatic descriptors were assigned by using the Gasteiger-Hückel method in Sybyl.18

Statistical Analysis. Molecules that were found in the same cocrystal were combined into pairs. Each pair of molecules corresponds to a set of 2 × 131 molecular descriptors. As a first approximation, we analyzed descriptors in pairs, that is, only one descriptor per molecule was considered at a time. (In other words, the analysis was performed in 2 × 1-dimensional projections of the 2 × 131-dimensional parameter space.) If a particular pair of descriptors refers to molecular properties that influence cocrystal formation, then the descriptors are expected to assume favorable combinations of values more frequently than unfavorable ones. Consequently, pairs of descriptors that indicate some form of complementarity should be correlated. To find such correlations, correlation coefficients were calculated for all possible pairs of descriptors (131 × 130/2 = 8515 pairs).
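A minimal sketch of the pairing step and of the descriptor-pair count, assuming each cocrystal is given as a list of component identifiers and each molecule's descriptors as a dictionary. The helper names are hypothetical, and because the paper does not state how "mol 1" and "mol 2" are assigned, the sketch simply sorts the identifiers within each pair.

```python
from itertools import combinations
from math import comb


def molecule_pairs(cocrystals):
    """Combine molecules found in the same cocrystal into unordered pairs.

    cocrystals: list of component-identifier lists, one list per structure.
    """
    pairs = []
    for components in cocrystals:
        # sorted() gives a deterministic (mol 1, mol 2) order; the paper's convention is not stated
        pairs.extend(combinations(sorted(set(components)), 2))
    return pairs


def descriptor_columns(pairs, descriptors, p, q):
    """Return matched lists: descriptor p of mol 1 and descriptor q of mol 2 for every pair."""
    x = [descriptors[m1][p] for m1, m2 in pairs]
    y = [descriptors[m2][q] for m1, m2 in pairs]
    return x, y


N_DESCRIPTORS = 131
print(comb(N_DESCRIPTORS, 2))  # 8515, i.e. 131 * 130 / 2
```

The two lists returned by `descriptor_columns` are what a correlation coefficient is computed on for one descriptor pair.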

* To whom correspondence should be addressed. E-mail: fabian@ccdc.cam.ac.uk; tel.: +44 1223 763498; fax: +44 1223 336033.

CRYSTAL GROWTH & DESIGN 2009, VOL. 9, NO. 3, 1436–1443
10.1021/cg800861m CCC: $40.75 © 2009 American Chemical Society
Published on Web 01/14/2009

The distribution of descriptor values among the molecules is far from a normal distribution, which limits the usability of the most common statistical parameters, such as mean value and standard deviation (Figure 1). Therefore, nonparametric statistical descriptors, which are meaningful irrespective of the shape of the distributions, were also used. Distributions were summarized by median, lower quartile, and upper quartile values, rather than by mean and standard deviation. (The median is the value that “splits” a data set such that 50% of the data values are lower and 50% are higher than the median. Quartiles are defined analogously as the values that are higher than 25% (lower quartile) and 75% (upper quartile) of the data set, respectively.) In addition to the more common Pearson’s correlation coefficient (r, based on mean and standard deviation), Spearman’s nonparametric correlation coefficient (ρ, based on the ranking of values) was calculated for each molecular descriptor pair.
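The summary statistics and the two correlation coefficients described above can be sketched in plain Python. This is an illustration, not the R code used in the study: the paper does not say which quartile definition was used, so the sketch assumes linear interpolation (`method="inclusive"`, equivalent to R's default quantile type), and tied values receive average ranks in the Spearman calculation.

```python
import math
from statistics import quantiles


def quartile_summary(values):
    """Lower quartile, median, upper quartile, using linear interpolation between data points."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return q1, med, q3


def pearson(x, y):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of the values."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic but not linear
print(spearman(x, y))   # 1.0
print(pearson(x, y) < 1)  # True
```

The example shows why ρ suits skewed data: any strictly monotonic relationship gives ρ = 1, while r also depends on how linear the relationship is.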

Table 1. List of the Most Frequent Cocrystal Formers in the Cocrystal Data Set(a)

no. of structures(b)  compound name
109  4,4′-bipyridine
85  tetracyano-p-quinodimethane
70  hydroquinone
65  18-crown-6
60  urea
47  (E)-4,4′-diazastilbene
47  2,2′-dihydroxy-1,1′-binaphthyl
46  cholic acid
39  phenazine
38  triphenylphosphine oxide
37  1,3,5-trinitrobenzene
36  isonicotinamide
33  tetrathiafulvalene
31  fumaric acid
31  succinic acid
29  hexamethylenetetramine
28  iodine
27  p-benzoquinone
27  pyrene
27  1,2-bis(4-pyridyl)ethane
26  oxalic acid
26  tetracyanoethylene
25  3α,7α,12α-trihydroxy-5β-cholamide
24  anthracene
24  4-aminobenzoic acid
24  3,5-dinitrobenzoic acid
24  pyrazine
24  1,1,6,6-tetraphenylhexa-2,4-diyne-1,6-diol
22  C-methylcalix[4]resorcinarene
22  5,5-diethylbarbituric acid
22  2-pyridone
20  2-aminopyrimidine
20  1,4-diiodotetrafluorobenzene
20  hexafluorobenzene
20  pyromellitic dianhydride
20  1,2,4,5-tetracyanobenzene
17  1,1′-bis(4-hydroxyphenyl)cyclohexane
16  caffeine
16  thiourea
16  2,5-bis(4-pyridyl)-1,3,4-oxadiazole
15  acridine
15  benzene-1,3,5-tricarboxylic acid
15  picric acid
15  resorcinol
14  adipic acid
14  glutaric acid
14  carbamazepine
14  4-nitrobenzoic acid
14  4-nitrophenol
14  1,10-phenanthroline
13  4,4′-biphenol
13  2,2′-bipyridine
13  4,4′-dipyridyl N,N′-dioxide
13  1,4-diazabicyclo[2.2.2]octane
13  fullerene (C60)
13  naphthalene
13  (R)-(1-naphthyl)glycyl-(R)-phenylglycine
13  tetrachloro-p-benzoquinone
12  cis,cis-1,3,5-cyclohexanetricarboxylic acid
12  octafluoronaphthalene
12  phenol
12  1,4-phenylenediamine
12  tetrabromomethane
12  theophylline
12  trans-1,5-dichloro-9,10-diethynyl-9,10-dihydroanthracene-9,10-diol
11  benzoic acid
11  cis-anti-cis-dicyclohexano-18-crown-6
11  trans-1,2-diaminocyclohexane
11  2,5-piperazinedione
11  tetraiodoethene
11  trans-2,3-bis(1,1-diphenylhydroxymethyl)-1,4-dioxaspiro[4.5]decane
11  trans-4,5-bis(diphenylhydroxymethyl)-2,2-dimethyl-1,3-dioxolane
11  O,O′-dibenzoyl-tartaric acid
11  hexamethylbenzene
11  sebacic acid
10  chloranilic acid
10  4-hydroxybenzoic acid
10  terephthalic acid
10  perylene
10  1,3-bis(4-pyridyl)propane
10  2,3,5,6-tetrafluoro-7,7,8,8-tetracyanoquinodimethane

(a) Common solvents15 are not included in the list. (b) The number of structures refers to the subset of complete and ordered organic structures (see Experimental Methods). The total number of known cocrystals with a given compound may be significantly higher than listed here, especially for cavitands and inclusion host compounds.

Figure 1. (a) Histogram of the number of heavy atoms in the molecules of the cocrystal data set. The continuous line represents the normal distribution with the same mean (17.4) and standard deviation (12.7) as the observed distribution. Lower quartile (9), median (14), and upper quartile (21) provide a better description of the observed distribution than mean and standard deviation. (b) The same distribution shown as a box plot.

For descriptor pairs with a correlation coefficient of at least 0.25, two-dimensional density plots and box plots were created to present their relationship visually. The usual method of showing the relationship of two variables in a data set is a scatter plot, such as the one shown in Figure 2a. The large number of data points and their overlap, however, make the interpretation of Figure 2a difficult. It is easier to see the underlying trends if the individual data points are replaced by a function that describes how many data points fall in a specific region of the plot. Density plots (Figure 2b) provide such a representation. The two axes of a density plot are the same as those of the corresponding scatter plot, while the number of observations (data points) in each area of the plot is indicated by color-coding. Lighter colors represent areas with more data points; darker colors represent areas with fewer data points. Smoothed density plots were generated by applying two-dimensional kernel density estimates.19 The color scale of Figure 2b thus refers to the estimated two-dimensional probability density function. The number of data points in any area of Figure 2a can be obtained by integrating the probability density function (Figure 2b) over the given area and multiplying the result by the total number of data points, 1949.
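The link between the density estimate and counts of data points can be illustrated with a small kernel density sketch. The Gaussian product kernel, the bandwidths, the function names, and the toy data are assumptions for illustration; the kernel density method of ref 19 is not reproduced here.

```python
import math


def gaussian_kde_2d(points, hx, hy):
    """Two-dimensional kernel density estimate with a Gaussian product kernel.

    points: list of (x, y) observations; hx, hy: bandwidths (assumed, not from the paper).
    The returned function integrates to 1 over the plane.
    """
    norm = 1.0 / (len(points) * 2.0 * math.pi * hx * hy)

    def density(x, y):
        return norm * sum(
            math.exp(-0.5 * (((x - px) / hx) ** 2 + ((y - py) / hy) ** 2))
            for px, py in points
        )

    return density


def expected_count(density, n_points, x0, x1, y0, y1, steps=200):
    """Approximate n_points * (integral of density over [x0, x1] x [y0, y1]) by the midpoint rule."""
    dx, dy = (x1 - x0) / steps, (y1 - y0) / steps
    total = 0.0
    for i in range(steps):
        for j in range(steps):
            total += density(x0 + (i + 0.5) * dx, y0 + (j + 0.5) * dy)
    return n_points * total * dx * dy


# Toy data: (heavy atoms of mol 1, heavy atoms of mol 2) for three made-up pairs
pairs = [(10, 12), (14, 9), (20, 21)]
f = gaussian_kde_2d(pairs, hx=2.0, hy=2.0)
# Integrating over a region that holds essentially all of the density recovers about 3 pairs
print(round(expected_count(f, len(pairs), -10, 40, -10, 40), 3))
```

Integrating over a smaller rectangle gives the estimated number of pairs falling inside it, which is the operation described for Figure 2.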

Box plots represent the distribution of a variable, so they can be considered as a simplified alternative to histograms (Figure 1b). The top and bottom of the box correspond to the upper and lower quartiles, the thick horizontal line in the box represents the median, and the whiskers attached to the box stretch toward the minimum and maximum of the distribution, with individual outliers beyond the whiskers marked as dots. Box plots (e.g., Figure 3b) are used to compare the distribution of a variable for different groups of data, with each group being represented as a box. In the box plots presented here, groups are defined by ranges of a descriptor that describes one molecule (FPV mol 1 in Figure 3b). The individual boxes then show the distribution of another descriptor that refers to the cocrystal-forming partner of the reference molecule (FPV mol 2 in Figure 3b). If the two molecular descriptors are correlated, then a clear trend in the position of the boxes is seen, showing that the distribution of the second variable shifts gradually as the first variable changes. The stronger the correlation, the less adjacent boxes overlap.
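The grouping behind box plots such as Figure 3b can be sketched as binning pairs by the mol 1 value and summarizing the mol 2 values in each bin. The function name, bin edges, and data below are hypothetical, and the whisker and outlier calculations are omitted.

```python
from statistics import quantiles


def grouped_quartiles(pairs, edges):
    """Box-plot statistics of mol 2 values, grouped by ranges of the mol 1 value.

    pairs: (value_mol1, value_mol2) tuples; edges: ascending bin edges for mol 1.
    Returns one (lower quartile, median, upper quartile) tuple per bin [edges[k], edges[k+1]),
    or None when a bin holds fewer than two observations.
    """
    summary = []
    for lo, hi in zip(edges, edges[1:]):
        partner_values = [y for x, y in pairs if lo <= x < hi]
        if len(partner_values) < 2:
            summary.append(None)
        else:
            summary.append(tuple(quantiles(partner_values, n=4, method="inclusive")))
    return summary


# Made-up FPV pairs; the bin edges are illustrative, not the four ranges used in Figure 3b
fpv_pairs = [(0.05, 0.10), (0.10, 0.20), (0.15, 0.30), (0.30, 0.50), (0.35, 0.60), (0.40, 0.70)]
for box in grouped_quartiles(fpv_pairs, [0.0, 0.2, 0.5]):
    print(box)
```

In this toy data the two boxes do not overlap at all, which is the pattern a very strong correlation would produce; real descriptor pairs show substantial overlap.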

All statistical calculations were performed, and all statistical figures were created, with the R software package.20

Results

The initial density plots obtained for a variety of descriptors suggested that our data are composed of two subsets that show different behaviors. In particular, a significant negative correlation of ρ = -0.26 was found between the numbers of heavy (i.e., non-hydrogen) atoms in the two molecules. This would indicate a preference of small molecules to cocrystallize with large ones. A density plot (Figure 2), however, reveals that this behavior is exhibited by only a small part of the data set. Molecules with ca. 5-25 heavy atoms show little discrimination for the size of their partners, while those larger than 30 heavy atoms cocrystallize predominantly with smaller molecules (fewer than 10 heavy atoms). The latter group is formed by classic inclusion compounds: crystals of a large host molecule with an awkward shape that cannot pack efficiently and a small guest molecule that fills the voids inside or between the host molecules. Since these molecule pairs showed a distinct behavior and we are interested in cocrystal formation of molecules without major packing frustration, inclusion compounds were excluded from further analysis. The remaining data set contains 710 cocrystal structures, each formed by molecules with 6-30 heavy atoms. The highest correlations for the reduced data set (Table S2, Supporting Information) form three groups, which reveal three qualitative trends. These trends and the correlations between hydrogen bond donor/acceptor counts are discussed in the following sections.
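One simple way to express the resulting restriction is a heavy-atom window applied to every component of a cocrystal. This is an assumption-based sketch: the paper reports the 6-30 heavy-atom range of the remaining data set but does not spell out the exact rule used to identify inclusion compounds, and the names and counts below are hypothetical.

```python
def without_inclusion_compounds(cocrystals, heavy_atoms, low=6, high=30):
    """Keep cocrystals whose components all have low..high heavy atoms (inclusive).

    cocrystals: list of component-identifier lists; heavy_atoms: identifier -> heavy-atom count.
    """
    return [
        components for components in cocrystals
        if all(low <= heavy_atoms[mol] <= high for mol in components)
    ]


heavy_atoms = {"host": 48, "guest": 4, "A": 12, "B": 15}  # made-up counts
print(without_inclusion_compounds([["host", "guest"], ["A", "B"]], heavy_atoms))
# [['A', 'B']]
```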

Figure 2. The number of heavy atoms in pairs of molecules taken from cocrystals, shown (a) as a scatter plot and (b) as a density plot. Lighter colors indicate more observed molecule pairs. The contour levels show the value of the two-dimensional probability density function.

Figure 3. The relationship of the fractional polar volumes (FPV) of pairs of molecules that formed cocrystals, shown (a) as a density plot and (b) as a box plot.


Molecular Polarity. The strongest correlations found are related to the polarity of the molecules. The positive sign of the strongest correlation coefficients (Table 2) suggests that molecules preferably form cocrystals with partners of similar polarity. Molecular polarity is not a rigorously defined term, so a number of descriptors can be associated with it. It is apparent from Table 2 that these descriptors are not equivalent. The highest correlation found in this analysis (ρ = 0.41) relates the fractional polar volumes (FPV) of the cocrystallized molecules. FPV is defined as the fraction of the molecular volume that belongs to polar atoms (N, O, and S atoms, and H atoms bonded to N, O, or S). A simpler alternative to FPV is the descriptor FNO, which is obtained by dividing the total number of N and O atoms by the number of heavy atoms in the molecule. FNO still shows a relatively strong correlation (Table 2), and it can be easily calculated from the molecular formula. The dipole moments of molecules in cocrystals show a similarly strong relationship. (The large difference between the r and ρ values for dipoles in Table 2 is caused by the strongly skewed distribution of dipole moments: small dipoles are much more frequent in the data set than large ones.) Polar surface area (PSA) and the octanol-water partition coefficient (log P) are frequently used in drug discovery to quantify molecular polarity,22 but they seem to have little relevance for cocrystal formation.
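Because FNO needs only the molecular formula, it is easy to compute directly. The sketch below assumes a plain sum formula without brackets, charges, or isotope labels; the function names are illustrative.

```python
import re

# Element symbol followed by an optional count, e.g. "C6", "H", "Cl2"
ELEMENT_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def element_counts(formula):
    """Parse a simple sum formula without brackets or charges, e.g. 'C8H10N4O2'."""
    counts = {}
    for symbol, number in ELEMENT_TOKEN.findall(formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def fno(formula):
    """FNO = (no. of N atoms + no. of O atoms) / no. of heavy (non-H) atoms."""
    counts = element_counts(formula)
    heavy = sum(n for symbol, n in counts.items() if symbol != "H")
    return (counts.get("N", 0) + counts.get("O", 0)) / heavy


print(fno("CH4N2O"))  # urea: 3 / 4 = 0.75
print(fno("C6H6O2"))  # hydroquinone: 2 / 8 = 0.25
```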

Correlation coefficients are useful in selecting the interesting descriptors, but they do not show how reliably we can use them for cocrystal design. As illustrated in Figure 3, density plots and box plots are helpful in this regard. The density of observed cocrystals (and that of the corresponding pairs of FPV values) is the highest along the diagonal of Figure 3a, and it decreases gradually with increasing distance from the diagonal. The box plot in Figure 3b compares molecules in four different ranges of their FPV values (mol 1) in terms of the distribution of the FPV values of their cocrystal-forming partners (mol 2 axis in the graph). The median values (indicated by the horizontal line in the box) show the same trend as the density plot. The degree of overlap between adjacent boxes gives a semiquantitative measure of the significance of this trend. (If the data were normally distributed, the boxes would span the median ± 0.67σ interval.) Since the whiskers stretch over almost the complete range of FPV values, not even the highest correlation can be regarded as the manifestation of a strict rule. Nevertheless, there is an obvious trend, and it can be judged from Figure 3 whether and how much molecular polarity favors cocrystal formation by two molecules with particular FPV values.
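The 0.67σ figure follows from the quartiles of the normal distribution, which can be checked with Python's standard library:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)
z_upper = standard_normal.inv_cdf(0.75)  # upper quartile of N(0, 1), in units of sigma
print(round(z_upper, 2))                 # 0.67
# Half of a normal distribution lies within median ± z_upper * sigma
print(round(standard_normal.cdf(z_upper) - standard_normal.cdf(-z_upper), 6))  # 0.5
```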

Shape and Size. Simple descriptors of molecular shape and size were defined by following the ideas from the box model of crystal packing.23 In this model, the van der Waals volume of the molecule is enclosed in a rectangular box, and the long, medium, and short axes of this box are denoted L, M, and S, respectively. While L, M, and S refer to the size of the molecule, their ratios provide information about molecular shape. For example, S/L is small for planar molecules, and M/L is small for rod-shaped ones. Indeed, these axis ratios show much stronger correlations than axis lengths (Table 3), suggesting that matching of molecular shapes is more important for cocrystal formation than the matching of absolute molecular dimensions. Nonetheless, both the short and the long axes appear to be more influential than other frequently used size descriptors, such as molecular weight.
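A minimal sketch of the axis ratios, assuming the atomic coordinates have already been rotated into the frame of the enclosing box (choosing that orientation is the nontrivial part of the box model and is not reproduced here). The functions and the three-atom example are hypothetical.

```python
def box_axes(coords, radii):
    """Edge lengths (L, M, S) of a rectangular box around van der Waals spheres.

    coords: (x, y, z) atom positions, assumed to be already rotated into the box frame;
    radii: van der Waals radius of each atom, in the same units.
    """
    edges = []
    for axis in range(3):
        low = min(c[axis] - r for c, r in zip(coords, radii))
        high = max(c[axis] + r for c, r in zip(coords, radii))
        edges.append(high - low)
    long_axis, medium_axis, short_axis = sorted(edges, reverse=True)
    return long_axis, medium_axis, short_axis


def shape_ratios(L, M, S):
    """S/L is small for planar molecules, M/L is small for rod-shaped ones."""
    return {"S/L": S / L, "M/L": M / L, "S/M": S / M}


# A made-up flat, three-atom "molecule" lying in the xy plane
L, M, S = box_axes([(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 2.0, 0.0)], [1.0, 1.0, 1.0])
print(L, M, S)                        # 6.0 4.0 2.0
print(shape_ratios(L, M, S)["S/M"])   # 0.5
```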

The corresponding density plots (Figure 4) show that S/L and M/L correlations can be interpreted similarly to the FPV correlations: the frequency of observed molecule pairs decreases gradually with increasing difference between the axis ratios. Both descriptors show marked skewness: S/L is biased toward small and M/L toward large values. An interesting feature of the M/L graph (Figure 4b) is that it is much narrower at low M/L ratios. Qualitatively, this means that the more elongated a molecule is, the less likely it is to form a cocrystal with a molecule of different shape. The S/M density plot (Figure 4c) is complicated by the peculiar distribution of S/M values. The molecules in the database form two distinct groups, with S/M ratios around 0.5 and 0.75, respectively. Members of both groups form cocrystals more frequently with molecules from the same group than with molecules from the other group. The wide lobes between the two maxima in Figure 4c, however, indicate that there are several examples showing a marked deviation from the overall trend.

The correlations found for the short (S) and long (L) axis dimensions are related to the shape correlations. The strong S/L shape correlation means that molecules of a flat shape tend to form cocrystals with other flat molecules. With approximately half of the molecules in the data set being planar (S < 5 Å), the shape correlation directly translates to similar S values (i.e., to cocrystals of planar molecules with planar molecules). The correlation between the long axis values is very weak, but the shape of the density distribution (Figure 4d) suggests a stronger correlation for larger L values. This is confirmed by the correlation coefficients calculated separately for structures with L1 < 14 Å and for those with L1 > 14 Å: r(L1, L2) = 0.08 for the former and 0.14 for the latter subset. The molecules with L > 14 Å typically exhibit an M/L ratio of ca. 0.5, so the correlation of the long axes is explained by the stronger tendency of elongated molecules to cocrystallize with partners of similar shape.

Globularity is a shape descriptor that relates the surface area of a molecule to its volume. Globularity is small for molecules with a smooth surface, while bumps and hollows of the molecular shape increase its value. The correlation seen for this descriptor (Table 3) is linked to the packing frustration that could arise in a cocrystal formed by a bumpy and a smooth molecule. This shape relationship appears to be stronger for smooth molecules (i.e., those with lower values of globularity), which are predominantly planar molecules (see Supporting Information for figures).
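Using the definition of globularity given in the footnote to Table 3 (molecular surface area divided by the surface of a sphere with the same volume), the calculation reduces to one formula. The sketch assumes that the molecular surface area and volume have already been computed; the function name is illustrative.

```python
import math


def globularity(surface_area, volume):
    """Surface area divided by the surface area of a sphere of equal volume.

    For a sphere, V = 4/3 * pi * r**3 and A = 4 * pi * r**2, which gives
    A_sphere = (36 * pi) ** (1/3) * V ** (2/3).
    """
    sphere_surface = (36.0 * math.pi) ** (1.0 / 3.0) * volume ** (2.0 / 3.0)
    return surface_area / sphere_surface


r = 2.0
print(round(globularity(4 * math.pi * r**2, 4 / 3 * math.pi * r**3), 6))  # 1.0 for a sphere
print(round(globularity(6.0, 1.0), 2))                                    # unit cube: 1.24
```

A sphere gives the minimum value of 1, and any departure from a spherical shape increases the ratio.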

Table 2. Correlation Coefficients for Molecular Descriptors Related to Polarity

descriptor (p)(a)   dipole   PV     FPV    FNO    PSA     FPSA    log P (calcd)
r(p1,p2)            0.28     0.22   0.37   0.30   -0.14   -0.01   0.08
ρ(p1,p2)            0.39     0.30   0.41   0.31   -0.08    0.01   0.10

(a) Descriptor definitions: PV: polar volume, the volume of N, O, and S atoms, and of H atoms bonded to these atoms, in the molecule; FPV = PV/molecular volume; FNO = (no. of N atoms + no. of O atoms)/no. of heavy atoms; PSA: polar surface area (defined analogously to PV); FPSA = PSA/molecular surface area; log P: logarithm of the octanol-water partition coefficient, calculated using the method of ref 21.

Negative Molecular Surface and Hydrogen Bond Donors. The strongest relationships between different descriptors for two cocrystal-forming molecules link the negative surface area of a molecule to the number of hydrogen bond donors in the other (Figure 5, Table S2, Supporting Information). Donor H atoms have a positive partial charge, so they increase the positive surface area of molecules. Since electrostatic complementarity would pair the positive donor H atoms of one molecule with the negatively charged surface of its partner, a positive correlation between donor counts and negative surface area descriptors would be expected. Surprisingly, the sign of the actual correlation is negative: ρ(ASAN1, Dplu2) = -0.32. (ASAN is the accessible surface area of atoms with negative partial charge, computed using a probe radius of 1.5 Å, while Dplu is the total number of donor H atoms.) The box plot in Figure 5 reveals that the negative correlation is due mainly to a specific group of cocrystals, formed by a molecule with no donors and by another with a large negative surface area. A manual survey of the corresponding cocrystals resolved the apparent contradiction: most of them are charge-transfer complexes, which are formed by an aromatic hydrocarbon or a tetrathiafulvalene analogue (no hydrogen bond donors) and by another planar molecule with delocalized electrons that is made π-electron deficient by several electron-withdrawing substituents (large negative surface area).

Hydrogen Bond Donors and Acceptors. The success of cocrystal design by utilizing hydrogen-bonded supramolecular synthons clearly shows the importance of hydrogen bonds in forming cocrystals. One may thus expect that donor/acceptor counts in our data set should reflect the distinguished role of such interactions. Figure 6, however, shows that neither absolute donor/acceptor counts nor their differences show the expected trend. (The number of donors is defined as the number of polar H atoms, while the number of acceptors is given by the number of possible acceptor heteroatoms, that is, by the number of O and N atoms except for N atoms with more than three bonds.) The results remain the same even if the charge-transfer complexes and cocrystals with a stoichiometry other than 1:1 are removed from the data set and/or if simple donor/acceptor counts are replaced by the average number of donor and acceptor hydrogen bonds a functional group forms in the CSD (Figure S6, Supporting Information).24
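The counting rules given in parentheses above can be sketched on an explicit-hydrogen molecular graph. Two readings are assumptions: "polar H" is taken as H bonded to N, O, or S (the polar atoms of the FPV definition), and the "bonds" of an N atom are counted as the sum of its bond orders. The function name and input format are hypothetical.

```python
def hb_counts(elements, bonds):
    """Simple donor/acceptor counts from a molecular graph.

    elements: element symbol of each atom (hydrogens included explicitly);
    bonds: (i, j, order) tuples, with aromatic bonds given order 1.5.
    Donors: H atoms bonded to N, O, or S (assumed meaning of "polar H").
    Acceptors: all O atoms, plus N atoms whose bond orders sum to at most 3
    (assumed reading of the rule excluding "N atoms with more than three bonds").
    """
    neighbours = {i: [] for i in range(len(elements))}
    valence = [0.0] * len(elements)
    for i, j, order in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
        valence[i] += order
        valence[j] += order
    donors = sum(
        1 for i, symbol in enumerate(elements)
        if symbol == "H" and any(elements[k] in ("N", "O", "S") for k in neighbours[i])
    )
    acceptors = sum(
        1 for i, symbol in enumerate(elements)
        if symbol == "O" or (symbol == "N" and valence[i] <= 3)
    )
    return donors, acceptors


# Urea, O=C(NH2)2, with explicit hydrogens
urea_atoms = ["C", "O", "N", "N", "H", "H", "H", "H"]
urea_bonds = [(0, 1, 2), (0, 2, 1), (0, 3, 1), (2, 4, 1), (2, 5, 1), (3, 6, 1), (3, 7, 1)]
print(hb_counts(urea_atoms, urea_bonds))  # (4, 3)
```

Counts like these carry no information about donor or acceptor strength, which is the limitation discussed next.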

The contradiction between these results and the known importance of hydrogen bonds is only an apparent one. What these results show is that counting donors and acceptors is insufficient to describe their complementarity. The formation of synthons is governed by the strength of hydrogen bonds between cocrystal formers rather than by the number of available groups. Modeling of hydrogen bond donor/acceptor strengths, however, requires more sophisticated calculations25,26 than those applied in the current analysis. We plan to extend this work in the near future by using the logit hydrogen bond propensity model26 to identify the "best" (i.e., the most likely) homo- and heteromolecular hydrogen bond(s) that can be formed by the donors and acceptors of the molecules in a cocrystal.

Table 3. Correlation Coefficients for Molecular Descriptors Related to the Shape and Size of the Molecules

descriptor (p)^a  L      M      S      S/L    S/M    M/L    mol weight  volume  globularity
r(p1,p2)          0.17   0.04   0.19   0.38   0.38   0.41   -0.02       -0.09   0.25
F(p1,p2)          0.16   0.03   0.22   0.40   0.38   0.38   -0.05       -0.04   0.21

^a Descriptor definitions: L, M, S: long, medium, and short axis of an enclosing box; globularity: molecular surface area divided by the surface area of a sphere with the same volume as the molecule.

Figure 4. Density plots showing the shape relationship of molecules in cocrystals. S, M, and L are the short, medium, and long axes of a rectangular box that encloses the molecule.

Figure 5. Accessible surface area of atoms with negative partial charge (ASAN) in molecules that form a cocrystal with partners having 0, 1, 2, 3, or more hydrogen bond donors.

1440 Crystal Growth & Design, Vol. 9, No. 3, 2009 Fabian
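The shape descriptors of Table 3 and the correlation coefficient r(p1,p2) can be illustrated with a short Python sketch. It assumes that the three edge lengths of the enclosing box, the molecular surface area, and the molecular volume are already available; how the box is oriented around the molecule is not specified here. It uses the fact that a sphere of volume V has surface area (36πV²)^(1/3). Globularity is therefore 1 for a sphere and greater than 1 for any other shape. The Pearson coefficient is taken over molecule pairs, with one molecule of each pair supplying p1 and the other p2. This section does not state how the two molecules of a pair were assigned to p1 and p2, so that ordering is an assumption. Function names and numbers are hypothetical.

```python
import math


def sphere_area_for_volume(volume: float) -> float:
    """Surface area of a sphere with the given volume: (36*pi*V^2)^(1/3)."""
    return (36.0 * math.pi * volume ** 2) ** (1.0 / 3.0)


def globularity(surface_area: float, volume: float) -> float:
    """Molecular surface area divided by that of a sphere of equal volume."""
    return surface_area / sphere_area_for_volume(volume)


def axis_ratios(box_dimensions):
    """S/L, S/M and M/L ratios from the three edge lengths of an enclosing box."""
    long_axis, medium_axis, short_axis = sorted(box_dimensions, reverse=True)
    return {
        "S/L": short_axis / long_axis,
        "S/M": short_axis / medium_axis,
        "M/L": medium_axis / long_axis,
    }


def pearson_r(p1_values, p2_values):
    """Pearson correlation between descriptor values of the two molecules in each pair."""
    n = len(p1_values)
    mean1 = sum(p1_values) / n
    mean2 = sum(p2_values) / n
    cov = sum((x - mean1) * (y - mean2) for x, y in zip(p1_values, p2_values))
    var1 = sum((x - mean1) ** 2 for x in p1_values)
    var2 = sum((y - mean2) ** 2 for y in p2_values)
    return cov / math.sqrt(var1 * var2)


print(axis_ratios((4.0, 12.0, 8.0)))  # S/L = 1/3, S/M = 1/2, M/L = 2/3
print(globularity(6.0, 1.0))          # unit cube: about 1.24

# Hypothetical S/L values for the two molecules of four cocrystal pairs.
s_over_l_mol1 = [0.30, 0.45, 0.60, 0.80]
s_over_l_mol2 = [0.35, 0.40, 0.65, 0.75]
print(round(pearson_r(s_over_l_mol1, s_over_l_mol2), 3))
```

Dimensionless ratios such as S/L compare molecules of different size on the same scale. This is consistent with the ratios correlating more strongly in Table 3 than the raw axis lengths, molecular weight, or volume.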

Discussion

The above statistical analysis showed that the majority of cocrystals in the CSD are formed by molecules of similar polarities and shapes, but deviations from these overall trends have also been observed. Analysis of representative structures that either follow or do not follow these trends may thus help their qualitative chemical interpretation.

Most cocrystals are obtained by solution crystallization, so the preference for similar molecular polarities could be a consequence of the comparable solubilities of the cocrystal formers in the crystallization solvent. The most important polarity descriptors in the QSPR prediction of solubility values are often log P and various surface area descriptors.22b The lack of correlation in these descriptors (Table 2) thus suggests that solubility is not the only reason behind the preference for similar polarities.

Figure 6. The relationship of hydrogen bond donor/acceptor counts in molecule pairs from cocrystals: (a) donors in one molecule vs acceptors in the other molecule; (b) difference between the number of acceptors and donors in both molecules. The number of donors is given as the number of donor H atoms, while each available acceptor atom is counted as one acceptor.

Figure 7. Cocrystal structures from the CSD. (a) 2,6-Bis(((6-methylpyrid-2-yl)amino)carbonyl)-naphthalene 1,12-dodecanedicarboxylic acid, JOHPUR;27 (b) 3-(2,6-dimethylphenyl)pyrimido(4,5-b)-1,8-naphthyridine-2,4(1H,3H)-dione N-n-butyl-N′-(4-methylpyridin-2-yl)urea, IXUDIO;28 (c) bis(1,2,5)-thiadiazolotetracyanoquinodimethanide m-divinylbenzene, HEJHOT;29 (d) n-heptadecanoic acid nicotinamide, FIFLAI.30 Atom colors: red - oxygen, blue - nitrogen, yellow - sulfur, gray - carbon, white - hydrogen.

Figure 7 shows examples of cocrystals27-30 formed by molecules with both similar (Figure 7a,b) and dissimilar (Figure 7c,d) polarities in terms of the FPV and FNO descriptors. (For example, FNO = 4 O atoms/18 heavy atoms = 0.22 for the acid, and FNO = (2 O atoms + 4 N atoms)/30 heavy atoms = 0.20 for the amide in Figure 7a.) In three of these four cocrystals, the molecules are arranged such that distinct hydrophobic and hydrophilic slabs can be identified. (The hydrophobic region is in the middle of Figure 7a,b, and on the left-hand side of Figure 7d.) Polar and apolar regions are often segregated in crystals, and the topology of the separate regions (layers, rods, spheres) depends on the polar/apolar volume ratio.31 Consequently, molecules with dissimilar FPV values are expected to favor different topologies, which, in turn, may make the formation of a cocrystal with the preferred separation of hydrophobic and hydrophilic regions more difficult.
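The FNO values quoted for Figure 7a come from counting O and N atoms among all non-hydrogen atoms. The Python sketch below reproduces that arithmetic from molecular formulas. The formula parser is a hypothetical helper that handles only flat formulas such as C14H26O4 (no parentheses or charges). The two formulas, C14H26O4 for 1,12-dodecanedicarboxylic acid and C24H20N4O2 for the bis-amide, match the 18 and 30 heavy atoms stated in the text. FPV depends on a calculated polar volume and is not covered by this sketch.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Element counts from a flat formula such as 'C14H26O4' (no brackets, no charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def f_no(formula: str) -> float:
    """Fraction of heavy (non-H) atoms that are N or O."""
    counts = parse_formula(formula)
    heavy_atoms = sum(n for element, n in counts.items() if element != "H")
    return (counts["N"] + counts["O"]) / heavy_atoms


acid = "C14H26O4"     # 1,12-dodecanedicarboxylic acid: 4 O / 18 heavy atoms
amide = "C24H20N4O2"  # bis-amide of Figure 7a: (2 O + 4 N) / 30 heavy atoms
print(round(f_no(acid), 2), round(f_no(amide), 2))  # 0.22 0.2
```

Because FNO counts atoms rather than surface or volume, it is cheap to compute. However, it ignores how the polar atoms are distributed over the molecule, which is the distinction the FIFLAI example below relies on.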

Exceptions to the similar polarity rule can be expected when specific favorable interactions link polar and apolar groups to each other or when the orientation of functional groups is such that it allows effective segregation without matching overall molecular polarities. The first case is illustrated by HEJHOT (Figure 7c), where charge transfer interaction generates stacks of alternating polar and apolar molecules. The second type of exception is demonstrated by FIFLAI (Figure 7d). Although the nicotinamide molecules are polar and the margaric acid molecules are mostly apolar, the carboxyl group is attached to the end of the long hydrophobic chain, so the two molecules can hydrogen bond without the occurrence of unfavorable hydrophobic-hydrophilic contacts.

Three of the example cocrystals (Figure 7a-c) are formed by molecules of similar shapes. The margaric acid nicotinamide cocrystal (FIFLAI, Figure 7d) shows an example of cocrystal formation by molecules of different shapes. Close packing of this cocrystal is possible because of the specific mutual orientation of the hydrogen-bonded molecules. The nicotinamide molecule has a width and depth similar to those of the alkyl chain, so its attachment to the end of the chain in this specific orientation does not prevent close packing of the alkyl chains. If the shapes of the molecules are more similar (Figure 7a-c), then close packing puts less restriction on their relative orientation. The molecules of Figure 7a, for example, could form a close-packed array with any relative shift between the long axes of the acid and amide molecules (horizontal in the figure), so the arrangement that optimizes hydrogen bonding is easily realized.

The charge transfer complex HEJHOT (Figure 7c) illustrates why such complexes generate a negative correlation between negative surface area and hydrogen bond donors (Figure 5). The electron-rich hydrocarbon molecule (with no hydrogen bond donors) interacts with an electron acceptor that has several N atoms in the electron-withdrawing substituents (contributing to its negative surface area).

As in the overwhelming majority of cocrystals, none of the molecules in the example structures of Figure 7 has more hydrogen bond donors than acceptors. Consequently, the abundance of acceptors in one molecule of a cocrystal cannot be compensated for by an abundance of donors in the other. The apparent hydrogen-bonded heterosynthons in Figure 7a,b,d illustrate that such reliable interactions will occur irrespective of any donor/acceptor "imbalance" in the molecules.

Conclusion

Statistical analysis of known cocrystals in the CSD has led to the identification of molecular properties that influence cocrystal formation. The shapes and polarities of molecules that form cocrystals tend to be similar, while there is no indication of complementarity with regard to the numerical "imbalance" of hydrogen bond donors and acceptors. Inclusion compounds and charge transfer complexes have been revealed as distinct subsets of cocrystals. The observed relationships may provide useful qualitative guidelines for the rational design of cocrystals and, by using the simple molecular descriptors presented here, may form the basis of a semiquantitative predictive model.

Understanding the relationship of these ideas with specific supramolecular synthons is an important aspect of their practical utility. Initial experimental results suggest that obtaining a cocrystal is likely if both the molecular descriptors discussed here and the available supramolecular heterosynthons favor its formation.32 Further experiments to elucidate this relationship and the development of a predictive computational model using molecular descriptors and hydrogen bond propensities are in progress.

Acknowledgment. The author is grateful to David Palmer (University of Cambridge) for his help with QSAR descriptors, to Samuel Motherwell (CCDC) for advising on shape descriptors and on using RPluto, and to Frank Allen (CCDC) for his comments on the manuscript. William Jones (University of Cambridge), Neil Feeder (Pfizer), and Pete Marshall (Pfizer) are acknowledged for useful discussions. The financial support of Pfizer Inc. is gratefully acknowledged.

Supporting Information Available: Complete list of molecular descriptors used, descriptor pairs with the highest correlations, additional density and box plots. This material is available free of charge via the Internet at http://pubs.acs.org.

References

(1) (a) Bhogala, B. R.; Basavoju, S.; Nangia, A. CrystEngComm 2005, 7, 551–562. (b) Du, M.; Zhang, Z.-H.; Zhao, X.-J.; Cai, H. Cryst. Growth Des. 2006, 6, 114–121. (c) Childs, S. L.; Hardcastle, K. I. CrystEngComm 2007, 9, 364–367. (d) Thalladi, V. R.; Dabros, M.; Gehrke, A.; Weiss, H.-C.; Boese, R. Cryst. Growth Des. 2007, 7, 598–599.

(2) (a) Saha, B. K.; Nangia, A.; Jaskolski, M. CrystEngComm 2005, 7, 355–358. (b) Aakeroy, C. B.; Desper, J.; Helfrich, B. A.; Metrangolo, P.; Pilati, T.; Resnati, G.; Stevenazzi, A. Chem. Commun. 2007, 4236–4238. (c) Bouchmella, K.; Boury, B.; Dutremez, S. G.; van der Lee, A. Chem. Eur. J. 2007, 13, 6130–6138. (d) Aakeroy, C. B.; Hussain, I.; Forbes, S.; Desper, J. CrystEngComm 2007, 9, 46–54.

(3) (a) Friscic, T.; MacGillivray, L. R. Croat. Chem. Acta 2006, 79, 327–333. (b) Horiuchi, S.; Kumai, R.; Tokura, Y. Chem. Commun. 2007, 2321–2329. (c) Maspoch, D.; Domingo, N.; Roques, N.; Wurst, K.; Tejada, J.; Rovira, C.; Ruiz-Molina, D.; Veciana, J. Chem. Eur. J. 2007, 13, 8153–8163.

(4) (a) Almarsson, O.; Zaworotko, M. J. Chem. Commun. 2004, 1889–1896. (b) Vishweshwar, P.; McMahon, J. A.; Bis, J. A.; Zaworotko, M. J. J. Pharm. Sci. 2006, 95, 499–516. (c) Reddy, L. S.; Babu, N. J.; Nangia, A. Chem. Commun. 2006, 1369–1371.

(5) (a) Remenar, J. F.; Morissette, S. L.; Peterson, M. L.; Moulton, B.; MacPhee, J. M.; Guzman, H. R.; Almarsson, O. J. Am. Chem. Soc. 2003, 125, 8456–8457. (b) Childs, S. L.; Chyall, L. J.; Dunlap, J. T.; Smolenskaya, V. N.; Stahly, B. C.; Stahly, G. P. J. Am. Chem. Soc. 2004, 126, 13335–13342. (c) Li, Z. J.; Abramov, Y.; Bordner, J.; Leonard, J.; Medek, A.; Trask, A. V. J. Am. Chem. Soc. 2006, 128, 8199–8210.

(6) (a) Trask, A. V.; Motherwell, W. D. S.; Jones, W. Cryst. Growth Des. 2005, 5, 1013–1021. (b) Trask, A. V.; Motherwell, W. D. S.; Jones, W. Int. J. Pharm. 2006, 320, 114–123. (c) Friscic, T.; Fabian, L.; Burley, J. C.; Reid, D. G.; Duer, M. J.; Jones, W. Chem. Commun. 2008, 1644–1646.

(7) (a) Walsh, R. D. B.; Bradner, M. W.; Fleischman, S.; Morales, L. A.; Moulton, B.; Rodríguez-Hornedo, N.; Zaworotko, M. J. Chem. Commun. 2003, 186–187. (b) Fleischman, S. G.; Kuduva, S. S.; McMahon, J. A.; Moulton, B.; Walsh, R. D. B.; Rodríguez-Hornedo, N.; Zaworotko, M. J. Cryst. Growth Des. 2003, 3, 909–919.

(8) (a) Haleblian, J. K. J. Pharm. Sci. 1975, 64, 1269–1288. (b) Stahl, P. H.; Wermuth, C. G., Eds. Handbook of Pharmaceutical Salts: Properties, Selection and Use; Wiley-VCH/VHCA: Weinheim/Zurich, 2002.

(9) Desiraju, G. R. Angew. Chem., Int. Ed. Engl. 1995, 34, 2311–2327.

(10) Siegler, M. A.; Fu, Y.; Simpson, G. H.; King, D. P.; Parkin, S.; Brock, C. P. Acta Crystallogr., Sect. B 2007, 63, 912–925.

(11) (a) Morissette, S. L.; Almarsson, O.; Peterson, M. L.; Remenar, J. F.; Read, M. J.; Lemmo, A. V.; Ellis, S.; Cima, M. J.; Gardner, C. R. Adv. Drug Delivery Rev. 2004, 56, 275–300. (b) Stahly, G. P. Cryst. Growth Des. 2007, 7, 1007–1026.


(12) Allen, F. H. Acta Crystallogr., Sect. B 2002, 58, 380–388.

(13) van de Streek, J. Acta Crystallogr., Sect. B 2006, 62, 567–579.

(14) (a) http://www.iupac.org/inchi/ (b) Stein, S. E.; Heller, S. R.; Tchekhovski, D. An Open Standard for Chemical Structure Representation: The IUPAC Chemical Identifier. In Proceedings of the 2003 International Chemical Information Conference (Nimes); pp 131–143.

(15) (a) Gorbitz, C. H.; Hersleth, H.-P. Acta Crystallogr., Sect. B 2000, 56, 526–534. (b) Nangia, A.; Desiraju, G. R. Chem. Commun. 1999, 605–606.

(16) RPluto: http://www.ccdc.cam.ac.uk/free_services/rpluto.

(17) JOElib2 - a Java based computational chemistry package, http://joelib.sourceforge.net.

(18) Sybyl 7.0; Tripos Inc.: St. Louis, MO, USA.

(19) Venables, W. N.; Ripley, B. D. Modern Applied Statistics; Springer: New York, 2002.

(20) (a) R Development Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2006. (b) http://www.r-project.org/

(21) Wildman, S. A.; Crippen, G. M. J. Chem. Inf. Comput. Sci. 1999, 39, 868–873.

(22) (a) Manly, C. J.; Louise-May, S.; Hammer, J. D. Drug Discov. Today 2001, 6, 1101–1110. (b) Hughes, L. D.; Palmer, D. S.; Nigsch, F.; Mitchell, J. B. O. J. Chem. Inf. Model. 2008, 48, 220–232.

(23) Pidcock, E.; Motherwell, W. D. S. Chem. Commun. 2003, 3028–3029.

(24) Infantes, L.; Motherwell, W. D. S. Chem. Commun. 2004, 1166–1167.

(25) Hunter, C. A. Angew. Chem., Int. Ed. 2004, 43, 5310–5324.

(26) Galek, P. T. A.; Fabian, L.; Motherwell, W. D. S.; Allen, F. H.; Feeder, N. Acta Crystallogr., Sect. B 2007, 63, 768–782.

(27) Garcia-Tellado, F.; Geib, S. J.; Goswami, S.; Hamilton, A. D. J. Am. Chem. Soc. 1991, 113, 9265–9269.

(28) Quinn, J. R.; Zimmerman, S. C. Org. Lett. 2004, 6, 1649–1652.

(29) Suzuki, T.; Fukushima, T.; Yamashita, Y.; Miyashi, T. J. Am. Chem. Soc. 1994, 116, 2793–2803.

(30) Amai, M.; Kamijo, M.; Nagase, H.; Endo, T.; Ueda, H. Anal. Sci.: X-Ray Struct. Anal. Online 2005, 21, x9.

(31) Ward, M. D.; Horner, M. J. CrystEngComm 2004, 6, 401–407.

(32) Friscic, T. private communication.

CG800861M
