
Published: July 25, 2011

© 2011 American Chemical Society 3900 dx.doi.org/10.1021/ef200795j | Energy Fuels 2011, 25, 3900–3908

ARTICLE

pubs.acs.org/EF

Flash Point and Cetane Number Predictions for Fuel Compounds Using Quantitative Structure Property Relationship (QSPR) Methods

Diego Alonso Saldana,† Laurie Starck,† Pascal Mougin,† Bernard Rousseau,‡ Ludivine Pidol,† Nicolas Jeuland,† and Benoit Creton*,†

†IFP Energies nouvelles, 1 et 4 Avenue de Bois-Préau, 92852 Rueil-Malmaison, France
‡Laboratoire de Chimie Physique, Université Paris Sud 11, UMR 8000 CNRS, 91405 Orsay Cedex, France

Supporting Information

ABSTRACT: In the present work, we report the development of models for the prediction of two fuel properties, flash points (FPs) and cetane numbers (CNs), using quantitative structure property relationship (QSPR) approaches. Compounds within the scope of the QSPR models are those likely to be found in alternative jet and diesel fuels, i.e., hydrocarbons, alcohols, and esters. A database containing FPs and CNs for these types of molecules has been built using experimental data available in the literature. Various approaches have been used, ranging from those leading to linear models, such as genetic function approximation and partial least squares, to those leading to nonlinear models, such as feed-forward artificial neural networks, general regression neural networks, support vector machines, and graph machines. Except for the graph machine method, for which the only inputs are the simplified molecular input line entry specification (SMILES) formulas, the approaches listed above were applied to molecular descriptors and functional group count descriptors to build specific models for FPs and CNs. For each property, the predictive models return slightly different responses for each molecular structure. Thus, final models, labeled “consensus models”, were built by averaging the predicted values of selected individual models. Predicted results were compared with experimental data and with the predictions of existing models in the literature. The models were used to predict FPs and CNs of molecules for which, to the best of our knowledge, no experimental data are available in the literature. Using information in the database, the evolution of properties with an increasing number of carbon atoms within families of compounds was studied.

1. INTRODUCTION

The problem of climate change implies that greenhouse gas emissions should be reduced. Moreover, fuel availability at a reasonable cost seems increasingly uncertain. While energy demand remains high, peak oil is expected in the coming years.1 In that context, alternative fuels, and especially biofuels meeting sustainability criteria, seem to be a promising solution. Biofuels can be thought of as mixtures of “renewable” molecules, such as normal and isoparaffins, naphthenic and aromatic compounds, normal and iso-olefins, alcohols, and/or esters. Industrial processes, such as Fischer–Tropsch (FT) synthesis and hydrotreatment of vegetable oils (HVO), can provide normal and isoparaffinic compounds.2,3

Naphthenic and aromatic compounds can be obtained from the liquefaction or pyrolysis of biomass.4,5 Fermentation processes lead to the synthesis of alcohols, and transesterification processes enable the production of esters from raw vegetable oils, such as rapeseed, sunflower, palm, or jatropha.6,7 Introducing these molecules, whose chemistry differs from that of petroleum derivatives, requires a large amount of research and development work. In particular, it seems essential to understand how these compounds affect the physical properties of alternative fuels. Knowledge of fuel properties is extremely important because they determine the conditions for storage and transportation, as well as combustion quality. Depending on the fuel (gasoline, jet, or diesel), the relevant properties and their required specifications can differ.

Safety-related properties are crucial for jet fuel, as reflected by the limit on the flash point (FP) in the specification. The FP is the lowest temperature at which the vapor of a volatile liquid ignites when brought into contact with a flame. In the specific case of jet fuels, and more precisely Jet A-1, the ASTM (American Society for Testing and Materials) standard D1655 requires the FP to be at least 38 °C.8

Liaw et al. have proposed a methodology to predict the FP of mixtures based on Le Chatelier’s rule, the Antoine equation, and the composition of the vapor phase.9 During the past decade, it has been shown that mixtures can exhibit non-ideal FP behaviors (with a minimum or maximum), and a large effort has been devoted to evaluating activity coefficients, mainly using the universal functional activity coefficient (UNIFAC), universal quasichemical (UNIQUAC), Wilson, or non-random two-liquid (NRTL) models, to account for this non-ideal behavior.10–14 For a blend of many compounds, such as a fuel, it has been observed that the FP of the mixture is, to a first approximation, governed by that of the molecule with the lowest FP.15 Thus, knowledge of the FP of individual compounds remains essential during the formulation of alternative fuels. Measurements can be difficult to carry out for complex and/or difficult-to-isolate molecules. Correlative models involving various mathematical forms, with experimental properties as coefficients, have been developed to predict the FP of pure compounds and

Received: May 31, 2011; Revised: July 22, 2011


mixtures.16,17 Most of the predictive models developed during the last few years are based on quantitative structure property relationship (QSPR) approaches.18 Linear regressions have been established to predict the FP of esters using a genetic algorithm (GA) approach.19,20 Recent studies have shown that artificial neural network (ANN) approaches significantly improve models for predicting the FP.18,21–26 These models differ mainly in the chemical families considered; e.g., some predictive models are trained over data sets containing only hydrocarbons,22,23 while others are intended to estimate the FP of diverse organic compounds, including heteroatoms, such as nitrogen, oxygen, sulfur, etc.21,24–26 A neural network model based on group contributions has been developed by Gharagheizi et al. to predict the FP, considering 1294 diverse organic compounds in the data set and reporting a squared correlation coefficient, R², of 0.97 computed over a test set.27 Carroll et al. have recently proposed a method to predict the FP of hydrocarbons using experimental boiling points and additive groups, reporting an average absolute deviation of 2.9 K for their model.28 Batov et al. have proposed an additive group technique to estimate the FP of alcohols, ketones, and esters; however, the authors considered only 89 compounds to fit the parameters.29 Carroll et al. have very recently adapted their model to account for various chemical compounds using a database containing 1000 molecules.30 Moreover, Gharagheizi et al. have developed a similar empirical method trained over a database containing 1471 compounds.31 The absolute average errors reported by Carroll et al. and Gharagheizi et al. are 2.5 and 2.4 K, respectively. Nevertheless, these empirical models require boiling point values as input, and no external test set of molecules seems to have been used to validate them, which precludes any discussion of their predictive power.
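For comparison with the QSPR route, the mixture approach of Liaw et al. cited above combines Le Chatelier’s rule with Antoine vapor pressures, which leads to the condition Σ_i x_i γ_i P_i^sat(T) / P_i^sat(FP_i) = 1, whose root T is the mixture FP. The Python sketch below solves this condition by bisection. It is a minimal illustration under stated assumptions, not the code of Liaw et al.: activity coefficients are held constant (γ_i = 1 is the ideal case, whereas UNIFAC, UNIQUAC, Wilson, or NRTL models make them depend on composition and temperature), and the Antoine coefficients in the usage lines are hypothetical placeholders.

```python
"""Minimal sketch: mixture flash point from Le Chatelier's rule (after Liaw et al.).

Assumption: the mixture flashes at the temperature T where
    sum_i x_i * gamma_i * Psat_i(T) / Psat_i(FP_i) = 1.
Activity coefficients gamma_i are constants here (1.0 = ideal solution).
"""


def antoine_psat(T, A, B, C):
    """Saturation pressure from the Antoine form log10(P) = A - B / (C + T)."""
    return 10.0 ** (A - B / (C + T))


def mixture_flash_point(x, fp, antoine, gamma=None, t_lo=200.0, t_hi=600.0, tol=1e-6):
    """Return the mixture flash point in K, found by bisection on [t_lo, t_hi].

    x       -- liquid mole fractions of the components
    fp      -- pure-component flash points (K)
    antoine -- (A, B, C) tuples with T in K; A cancels in the ratio below
    gamma   -- constant activity coefficients (defaults to 1.0 for every component)
    """
    n = len(x)
    gamma = gamma if gamma is not None else [1.0] * n

    def residual(T):
        # Le Chatelier sum minus one; it increases with T when B > 0 and C + T > 0.
        total = sum(
            x[i] * gamma[i] * antoine_psat(T, *antoine[i]) / antoine_psat(fp[i], *antoine[i])
            for i in range(n)
        )
        return total - 1.0

    lo, hi = t_lo, t_hi
    if residual(lo) > 0.0 or residual(hi) < 0.0:
        raise ValueError("the flash point is not bracketed by [t_lo, t_hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical Antoine coefficients, for illustration only (not fitted data).
    coeffs = [(9.0, 1500.0, -50.0), (9.2, 1800.0, -45.0)]
    print(mixture_flash_point([1.0, 0.0], [300.0, 350.0], coeffs))  # pure component 1: about 300 K
    print(mixture_flash_point([0.5, 0.5], [300.0, 350.0], coeffs))  # between 300 and 350 K
```

With these placeholder coefficients, the equimolar mixture flashes at roughly 312 K, much closer to the lower pure-component FP, which illustrates why the most volatile compound tends to govern the mixture FP.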

With regard to diesel fuel, one of the most stringent properties linked to combustion is the control of ignition, which is expressed by the cetane number (CN). The CN reflects the tendency of a molecule to autoignite when exposed to heat and pressure, as happens in a diesel engine under working conditions. Two standard compounds define the CN scale: isocetane (2,2,4,4,6,8,8-heptamethylnonane, also called HMN) and cetane (n-hexadecane), which are assigned CNs of 15 and 100, respectively. The CN, which is used to quantify combustion quality in diesel engines, is required to be at least 40 by ASTM D975 (ref 32) and at least 51 by EN 590 (ref 33). CN measurements are time-consuming and require a large volume of sample (about 1 L) when obtained by running the product in a single-cylinder cooperative fuel research (CFR) engine according to ASTM D613 (ref 34) or EN ISO 5165 (ref 35). Empirical equations whose parameters are physical property values have been developed to predict the CN of diesel fuels.36,37 Because of the variety of compounds in fuels, recent studies have focused on simplifying the composition, leading to surrogate fuels.38,39 The CNs of these mixtures are estimated using a linear volume fraction mixing rule, which requires knowing the CN of each surrogate component.40 Some models dedicated to pure hydrocarbons have been reported in the literature.18,41–44 Yang et al. have developed a predictive model for the CN using a neural network trained over 21 n- and isoparaffins, reporting an R² of 0.97.41 A few years later, Santana et al. extended the database to 147 hydrocarbons and developed neural network models to predict the CN.42 The hydrocarbons in the database were divided into two subgroups, saturated and unsaturated compounds, with both models showing standard errors of 8 CN units. Recently, a QSPR approach based on genetic function approximation was used by Creton et al. to predict the CN of hydrocarbons.43 To improve the prediction (absolute average error between 2.3 and 6.9 CN units), the authors divided the database into four classes of compounds, each corresponding to a specific chemical family: (i) n- and isoparaffins, (ii) naphthenes, (iii) aromatics, and (iv) n- and iso-olefins. To the best of our knowledge, only one of the models presented in the literature accounts for oxygenated compounds, which are thought to be the main components of alternative fuels.44
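The linear volume fraction mixing rule used for surrogate fuels reads CN_mix = Σ_i v_i CN_i. The short Python sketch below implements it; the function and constant names are illustrative. Applied to the two reference compounds (cetane = 100, HMN = 15), it reproduces the usual reading of the CN scale, in which a blend of the reference fuels has a CN equal to the volume percent of cetane plus 0.15 times the volume percent of HMN. For blends of arbitrary compounds, the linear rule is an approximation and real mixtures can deviate from it.

```python
def blend_cetane_number(volume_fractions, cetane_numbers):
    """Linear volume-fraction mixing rule: CN_mix = sum_i v_i * CN_i."""
    if len(volume_fractions) != len(cetane_numbers):
        raise ValueError("one cetane number is needed per component")
    total = sum(volume_fractions)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"volume fractions must sum to 1, got {total}")
    return sum(v * cn for v, cn in zip(volume_fractions, cetane_numbers))


# Reference compounds that fix the CN scale.
CN_CETANE = 100.0  # n-hexadecane
CN_HMN = 15.0      # 2,2,4,4,6,8,8-heptamethylnonane (isocetane)

if __name__ == "__main__":
    # 50/50 (v/v) cetane/HMN: 0.5 * 100 + 0.5 * 15 = 57.5
    print(blend_cetane_number([0.5, 0.5], [CN_CETANE, CN_HMN]))
```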

The aim of the present work is to develop models for the prediction of the FP and CN of compounds envisaged for alternative jet fuels and diesel fuels, respectively. Thus, the FP and CN models presented hereafter are dedicated to hydrocarbons and oxygenated compounds. The models have been established using various QSPR approaches. The paper is organized as follows: in section 2, we detail the databases and present the QSPR methodologies followed; in section 3, we detail the predictive models and discuss the calculated values, comparing them to other models and/or experimental results when available. Section 4 gives the conclusions.

2. MATERIALS AND METHODS

2.1. Experimental Data. The predictive quality of QSPR models is largely influenced by the size and quality of the database used to train them. Our FP database was built on the basis of experimental data gathered from QSPR studies available in the literature,28,45,46 and additional experimental FPs were gathered from sources such as the Design Institute for Physical Properties (DIPPR), websites of chemical companies, the chemical database of the University of Akron, and Yaws’ handbook.47–52 Only experimental FPs of molecules belonging to a family of interest were considered. In some cases, the experimental FP values reported for a molecule vary depending upon the data source; only one value was retained for each molecule, with priority given to values reported as “accepted” by the DIPPR staff members. The complete database, containing FPs of 625 hydrocarbons and oxygenated compounds, is available in the Supporting Information.
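Keeping a single FP per molecule, with priority given to DIPPR “accepted” values, can be sketched as follows. The record layout (`smiles`, `fp_k`, `source`, `accepted`) is hypothetical, and so is the fallback used when no accepted DIPPR value exists (the median of the reported values); the text does not say how such conflicts were resolved.

```python
from collections import defaultdict
from statistics import median


def consolidate_flash_points(records):
    """Keep one flash point (K) per molecule.

    records -- iterable of dicts with the hypothetical keys
               'smiles', 'fp_k', 'source', and 'accepted' (DIPPR "accepted" flag).
    A DIPPR value flagged as accepted takes priority; otherwise (an assumption
    of this sketch) the median of all reported values is kept.
    """
    by_molecule = defaultdict(list)
    for rec in records:
        by_molecule[rec["smiles"]].append(rec)

    consolidated = {}
    for smiles, recs in by_molecule.items():
        accepted = [r for r in recs if r["source"] == "DIPPR" and r["accepted"]]
        if accepted:
            consolidated[smiles] = accepted[0]["fp_k"]
        else:
            consolidated[smiles] = median(r["fp_k"] for r in recs)
    return consolidated


if __name__ == "__main__":
    # Illustrative numbers only, not values from the paper's database.
    records = [
        {"smiles": "CCO", "fp_k": 286.0, "source": "DIPPR", "accepted": True},
        {"smiles": "CCO", "fp_k": 290.0, "source": "vendor", "accepted": False},
        {"smiles": "CCCCO", "fp_k": 310.0, "source": "vendor", "accepted": False},
        {"smiles": "CCCCO", "fp_k": 314.0, "source": "handbook", "accepted": False},
    ]
    print(consolidate_flash_points(records))  # {'CCO': 286.0, 'CCCCO': 312.0}
```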

The database used by Creton et al. and the compendium by Murphy et al. were used to build the CN databases of hydrocarbons and oxygenated compounds, respectively.43,53 However, because most of the data were acquired several decades ago, the CN measurements for many of these compounds are uncertain.42 This uncertainty is mainly due to (i) impurities in the chemical products, (ii) the way the CN was measured (pure molecule or in a mixture with a fuel), and (iii) in some cases, CNs calculated from reported cetene or octane number values. For many compounds, multiple reported values may differ by 5 or 10 CN units;53 in such cases, the median value was used. A complete database containing CNs of 299 hydrocarbons was established, and all of these data are available in the Supporting Information. Compounds whose reported experimental values are highly doubtful were set aside, and their values will be compared with the predictions of our model.

2.2. Molecular Structures. Simplified molecular input line entry specification (SMILES) formulas were assigned to each molecule. Most of the SMILES were extracted from the DIPPR database; for molecules outside the DIPPR database, as well as for molecules with particularly complex specifications, such as chirality or configurations around double bonds, SMILES formulas were created manually. The Python library Pybel was used to convert all SMILES formulas to canonical SMILES forms.54

Three-dimensional (3D) structures of molecules were automatically generated from the SMILES formulas using Pybel. All geometries were then optimized using the Forcite module of Materials Studio 5.0 software55 with the condensed-phase optimized molecular potentials for atomistic simulation studies (COMPASS) force field, and atomic charges were attributed using the Gasteiger method.56–58 Finally, all structures were


translated and rotated to minimize the moments of inertia around the x, y, and z axes of the Cartesian coordinate system.

2.3. Molecular Descriptor Calculation and Selection. On the basis of the minimized geometries, the Materials Studio software was used to compute a large number of one-dimensional (1D), two-dimensional (2D), and 3D molecular descriptors.43 Because a full search over descriptor subsets is impossible, heuristic methods have been developed to select descriptors of particular importance.59 The main goals of this preselection stage can be summarized as (i) maximizing the usefulness of the descriptors and (ii) minimizing the presence of excessively intercorrelated descriptors. Linear correlation coefficients, such as the well-known Pearson correlation coefficient, are commonly used and included in off-the-shelf QSPR software packages, such as the QSAR module of Materials Studio. Katritzky et al. have successfully used multilinear regression to explore the descriptor space in QSPR problems.60 For nonlinear dependencies, a nonlinear model of a single variable can replace the linear correlation.61 A different family of approaches, known as wrapper methods, consists of descriptor selection methods “wrapped around” learning algorithms.62 Because of the convenience of support vector machines (SVMs),63 a number of SVM-based wrapper methods, such as recursive feature elimination (RFE)64 and GAs,65 have been reported in the literature. Moreover, some simpler descriptor selection methods, such as forward selection and backward elimination, are considered effective.66
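The preselection goal of discarding excessively intercorrelated descriptors is often implemented with pairwise Pearson correlation coefficients. The Python sketch below walks through the descriptors in their given order and drops any descriptor whose absolute correlation with an already-kept one exceeds a threshold. The 0.95 threshold, the greedy keep-first order, and the descriptor names in the usage lines are hypothetical choices; this is not the procedure implemented in Materials Studio or used by the authors.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        # A constant descriptor is treated as uncorrelated here; in practice
        # such descriptors would be removed in a separate step.
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def drop_intercorrelated(descriptors, threshold=0.95):
    """Greedy filter over {name: values}; returns the names that are kept."""
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson(values, descriptors[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Hypothetical descriptor columns for four molecules.
    desc = {
        "n_carbons": [1, 2, 3, 4],
        "mol_weight": [16, 30, 44, 58],  # exactly linear in n_carbons, so it is dropped
        "n_oh": [1, 0, 1, 0],
    }
    print(drop_intercorrelated(desc))  # ['n_carbons', 'n_oh']
```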

Some of the learning algorithms used and described hereafter include built-in feature selection or dimensionality reduction. For methods without implicit feature selection, such as ANNs, SVMs, and general regression neural networks (GRNNs), forward selection wrapped around an SVM fit was applied to the descriptors to find the best set of descriptors. Descriptions of the selected molecular descriptors and their values computed for each molecule in the databases are given in the Supporting Information.
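Forward selection wrapped around a learner can be sketched generically: starting from an empty set, the descriptor whose addition most lowers a cross-validated error is added at each step, until no remaining candidate lowers it. In the Python sketch below, the `score` callable stands in for the cross-validated error of an SVM fit; no SVM is implemented, and the stopping rule, the `max_features` cap, and the toy scorer in the usage lines are assumptions of this sketch.

```python
from typing import Callable, List, Sequence


def forward_selection(
    candidates: Sequence[str],
    score: Callable[[List[str]], float],
    max_features: int = 10,
) -> List[str]:
    """Greedy forward selection; `score(subset)` returns an error to minimize."""
    selected: List[str] = []
    best_error = float("inf")
    remaining = list(candidates)
    while remaining and len(selected) < max_features:
        # Evaluate every one-descriptor extension of the current subset.
        trial_errors = {name: score(selected + [name]) for name in remaining}
        best_name = min(trial_errors, key=trial_errors.get)
        if trial_errors[best_name] >= best_error:
            break  # no candidate lowers the error: stop
        selected.append(best_name)
        best_error = trial_errors[best_name]
        remaining.remove(best_name)
    return selected


if __name__ == "__main__":
    # Hypothetical scorer: base error 10, descriptors "a" and "b" reduce it,
    # and every added descriptor costs 0.5 (a crude stand-in for overfitting).
    useful = {"a": 3.0, "b": 2.0}

    def toy_score(subset):
        return 10.0 - sum(useful.get(n, 0.0) for n in subset) + 0.5 * len(subset)

    print(forward_selection(["a", "b", "c"], toy_score))  # ['a', 'b']
```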

The decomposition of molecular structures into functional groups is also a well-known technique, used to generate so-called functional group count descriptors.67 Starting from the functional groups listed by Pan et al.,67 we identified 28 functional groups as relevant by analyzing the chemical structures in our databases. These 28 functional groups are given in Table 1. The substructure search was carried out by applying SMARTS patterns to the SMILES formula of each compound in the database. The results of the substructure search can be found in the Supporting Information.

2.4. Data Partitioning. Before building the QSPR models, the main database was split into three data subsets: (i) The training set is used to train each learning algorithm; during cross-validation, it is split into several folds, which are used as cross-validation training and test sets. (ii) The validation set serves as an additional tool to estimate the predictive error of individual QSPR models, as well as an indicator of which models generalize better. (iii) The test set is used to estimate the predictive error of the final model.
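The random partition into training, validation, and test sets (70, 20, and 10% of the database in this work), together with the split of the training set into cross-validation folds, can be sketched as follows. The authors used the random sampling functions of R; this Python version relies on the standard `random` module instead, and the seed, the rounding of subset sizes, and the interleaved fold assignment are assumptions.

```python
import random


def split_train_val_test(ids, fractions=(0.70, 0.20, 0.10), seed=0):
    """Randomly partition ids into training, validation, and test lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(fractions[0] * len(shuffled))
    n_val = round(fractions[1] * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder, about 10%
    return train, val, test


def k_folds(train_ids, k=10, seed=0):
    """Yield (cv_train, cv_test) pairs for k-fold cross-validation."""
    shuffled = list(train_ids)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # interleaved, near-equal sizes
    for i in range(k):
        cv_test = folds[i]
        cv_train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield cv_train, cv_test


if __name__ == "__main__":
    train, val, test = split_train_val_test(range(625))  # 625 = size of the FP database
    print(len(train), len(val), len(test))
    for cv_train, cv_test in k_folds(train, k=10):
        pass  # fit on cv_train, evaluate on cv_test
```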

The total number of compounds in the FP and CN databases allows us to build training, validation, and test sets consisting of 70, 20, and 10% of the entire database, respectively. The split was carried out using the random sampling functions implemented in the R statistical environment. A random selection was preferred over other methods in order not to favor any one of the approaches. Figure 1 shows that the distributions of FPs and CNs are similar for all of the data subsets, which indicates that all subsets are representative of the original data set in terms of FP and CN values.

2.5. Multivariate Analysis.

2.5.1. Linear Models. Genetic function approximation (GFA), as implemented in Accelrys Materials Studio,55 was chosen because of its ability to find linear relationships over large numbers of features without overfitting. GFA uses a series of selection and reproduction cycles coupled with an objective function, known as the Friedman lack-of-fit (LOF) score, to choose the best-fitting models while controlling the number of descriptors. Partial least squares (PLS) regression, a well-known multivariate data analysis technique, was implemented using R functionalities. PLS builds new variables, known as latent variables, by constructing orthogonal sets of linear combinations of descriptors, resulting in a new space with fewer dimensions.
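The Friedman lack-of-fit score used by GFA inflates the fitting error by a penalty that grows with model size. One commonly quoted form is LOF = (SSE/M) / (1 − (c + d·p)/M)², where M is the number of training samples, c the number of terms, p the number of descriptors, and d a user-set smoothing parameter. Normalizations and defaults differ between implementations, so the Python sketch below is an assumption for illustration and should not be read as the exact formula used in Materials Studio.

```python
def friedman_lof(y_true, y_pred, n_terms, n_features, smoothing=1.0):
    """Lack-of-fit score: mean squared error inflated by a model-size penalty.

    n_terms    -- c, number of terms (basis functions) in the model
    n_features -- p, number of descriptors used
    smoothing  -- d, user-set smoothing parameter (1.0 is an arbitrary default here)
    """
    m = len(y_true)
    sse = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    penalty = 1.0 - (n_terms + smoothing * n_features) / m
    if penalty <= 0.0:
        raise ValueError("model too large for the number of samples")
    return (sse / m) / penalty ** 2


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0]
    y_hat = [1.1, 1.9, 3.2, 3.8]
    # Same residuals, larger model: the score rises, discouraging extra terms.
    print(friedman_lof(y, y_hat, n_terms=1, n_features=1))
    print(friedman_lof(y, y_hat, n_terms=2, n_features=1))
```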

2.5.2. Nonlinear Models. ANNs are widely used not only in QSPR but also in machine learning in general.68 An ANN consists of a number of nodes, also called neurons, connected by edges, also called synapses. ANNs can be used for both supervised classification and regression tasks. There are many different types of ANN models; in this study, the specific types used were feed-forward artificial neural networks69 (FF-ANNs), graph machines70 (GMs), and GRNNs, a kernel-based method.71 Another widely used nonlinear regression method, SVM regression, was also used to build predictive models.
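Because a GRNN is kernel based, its prediction for a query molecule is a Gaussian-kernel-weighted average of the training targets (equivalently, a Nadaraya–Watson kernel regression estimate) controlled by a single smoothing width σ. The dependency-free Python sketch below shows the idea on descriptor vectors. The σ values and data in the usage lines are arbitrary, and tuning σ by cross-validation, as one would in practice, is left out.

```python
import math


def grnn_predict(x_train, y_train, x_query, sigma):
    """General regression neural network prediction for one query vector.

    x_train -- list of descriptor vectors; y_train -- matching property values
    sigma   -- Gaussian kernel width (the only parameter to tune)
    """
    # Pattern layer: squared Euclidean distance from the query to each training vector.
    d2 = [sum((a - b) ** 2 for a, b in zip(x, x_query)) for x in x_train]
    # Shifting by the minimum distance multiplies every weight by the same factor,
    # which cancels in the ratio below but avoids underflow of all weights.
    d2_min = min(d2)
    weights = [math.exp(-(d - d2_min) / (2.0 * sigma ** 2)) for d in d2]
    # Summation and output layers: weighted average of the training targets.
    return sum(w * y for w, y in zip(weights, y_train)) / sum(weights)


if __name__ == "__main__":
    xs = [[0.0], [1.0], [2.0]]
    ys = [10.0, 20.0, 40.0]
    print(grnn_predict(xs, ys, [1.0], sigma=0.01))  # narrow kernel: nearest target, 20.0
    print(grnn_predict(xs, ys, [1.0], sigma=1e3))   # wide kernel: close to the mean, 23.33...
```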

3. RESULTS AND DISCUSSION

Tables 2 and 4 show the values of various statistical parameters computed over the validation set for predictive models of FP and CN obtained using GFA, PLS, FF-ANN, GRNN, SVM, and GM with two families of descriptors: molecular descriptors and functional group count descriptors. For all models except GFA and GM, 10-fold cross-validation was applied both for fine-tuning of model parameters and for predictive error estimation.72 For GFA and GM, cross-validation took place in the form of leave-one-out (LOO)

Table 1. Functional Groups Identified To Be Relevant(a)

No. | Group | SMARTS notation
1 | –H | [H]
2 | –CH3 | [CX4H3]
3 | –CH2– | [CX4H2]
4 | >CH– | [CX4H1]
5 | >C< | [CX4H0]
6 | =CH2 | [CX3H2]
7 | =CH– | [CX3H1]
8 | =C< | [CX3H0]
9 | ≡CH | [CX2H1]
10 | ≡C– | [CX2H0]
11 | (–CH2–)R | [CX4H2R]
12 | (>CH–)R | [CX4H1R]
13 | (>C<)R | [CX4H0R]
14 | (=CH–)R | [CX3H1R]
15 | (=C<)R | [CX3H0R]
16 | aCHa | [cX3H1](:*):*
17 | saCa | [cX3H0](:*)(:*)*
18 | –OH (alcohol) | [OX2H1]
19 | –OH (phenol) | [OX2H1][cX3]:[c]
20 | –O– | [OX2H0]
21 | (–O–)R | [OX2H0R]
22 | aOa | [oX2H0](:*):*
23 | >C=O | [CX3H0]=[O]
24 | (>C=O)R | [CX3H0R]=[O]
25 | –CHO | [CX3H1]=[O]
26 | –COOH | [CX3H0](=[O])[OX2H1]
27 | –COO– | [CX3H0](=[O])[OX2H0]
28 | aaCa | [cX3H0](:*)(:*):*

(a) Symbols are defined as follows: s, a, and R stand for simple bond, aromatic bond, and aliphatic ring, respectively.


validation using the cross-validated R² statistic and virtual-leave-one-out (V-LOO) cross-validation,70 respectively.

3.1. FP Predictions.

3.1.1. Selection of the Best Predictive Models. The statistical parameters presented in Table 2 indicate that three of our models stand out in terms of predictive power: FF-ANN–MD, FF-ANN–FGCD, and SVM–FGCD, where MD and FGCD denote molecular descriptors and functional group count descriptors, respectively. The three models are roughly equivalent, exhibiting R² values of about 0.95 and the lowest values of root-mean-square error (RMSE), average absolute error (AAE), and average absolute relative error (AARE) among all predictive models. Except for the FF-ANN and SVM approaches, the use of functional group count descriptors seems to lead to less accurate models than the use of molecular descriptors. The statistical coefficients obtained for GM are not far from those of the best models; however, its RMSE (15.1 K) is markedly higher than those of the three selected models (10.7–11.6 K), which is why we set this model aside.

3.1.2. Consensus Modeling. The QSPR models mentioned above are based on a variety of approaches, each giving a slightly different FP response for a given molecular structure. Some of these QSPR models may underestimate the experimental data, while others may give accurate or overestimated predictions. Thus, a final model, hereafter labeled the “consensus model”, was built by averaging the predicted values of the models. Consensus modeling has previously been shown to improve generalization and prediction compared to individual predictive models.73,74
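The statistics used throughout this section, and the consensus search (averaging the predictions of a subset of individual models and keeping the subset that minimizes RMSE + AAE + |bias| on the validation set), can be sketched in Python as an exhaustive loop over model combinations. Two assumptions are made: R² is taken as the coefficient of determination 1 − SS_res/SS_tot (the text does not state which definition was used, and the squared Pearson correlation can differ for biased predictions), and the bias is the mean signed error, prediction minus experiment. The model names and values in the usage lines are hypothetical.

```python
import math
from itertools import combinations


def regression_stats(y_true, y_pred):
    """R² (coefficient of determination), RMSE, AAE, AARE (%), and mean bias."""
    n = len(y_true)
    residuals = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "AAE": sum(abs(r) for r in residuals) / n,
        # Relative error assumes strictly positive reference values (e.g., FPs in K).
        "AARE": 100.0 * sum(abs(r) / t for r, t in zip(residuals, y_true)) / n,
        "bias": sum(residuals) / n,
    }


def consensus_score(y_true, y_pred):
    """Selection criterion described in the text: RMSE + AAE + |bias|."""
    s = regression_stats(y_true, y_pred)
    return s["RMSE"] + s["AAE"] + abs(s["bias"])


def best_consensus(y_val, predictions):
    """Average every non-empty subset of models; keep the best-scoring subset.

    predictions -- {model_name: validation-set predictions aligned with y_val}
    Returns (subset_of_names, score).
    """
    names = list(predictions)
    best_subset, best_score = None, float("inf")
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            averaged = [
                sum(predictions[m][i] for m in subset) / size
                for i in range(len(y_val))
            ]
            score = consensus_score(y_val, averaged)
            if score < best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score


if __name__ == "__main__":
    y_val = [300.0, 350.0, 400.0]
    preds = {
        "over": [310.0, 360.0, 410.0],   # hypothetical model biased upward
        "under": [290.0, 340.0, 390.0],  # hypothetical model biased downward
        "noisy": [330.0, 320.0, 430.0],
    }
    print(best_consensus(y_val, preds))  # (('over', 'under'), 0.0)
```

For n candidate models, the loop evaluates 2^n − 1 averages; with the 11 individual FP models listed in Table 2, that is 2047 combinations, which is cheap to search exhaustively.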

A script was written to find the best consensus model by minimizing the sum of RMSE, AAE, and absolute bias over all combinations of individual predictive models. The validation set was used to select the best consensus model. The consensus model thus obtained is defined as the average of the predicted values of the following models: FF-ANN–MD, FF-ANN–FGCD, SVM–FGCD, and PLS–MD. The prediction ability of the consensus model is summarized in Table 3 through statistical parameters and is illustrated in Figure 2. The RMSE obtained for the consensus model over the validation set is smaller than the RMSE values of all individual models. None of the RMSE values computed for the three subsets greatly exceeds the experimental uncertainty, which is commonly assumed to be 10 K, and the RMSE obtained for the test set is close to that obtained for the training set. AAE values are well below the experimental uncertainty. These results confirm that consensus modeling improves generalization with respect to the individual models.

3.1.3. Comparisons to Models of the Literature. Figure 2 shows a comparison between experimental and predicted FPs for the three data subsets. Predicted values are in good agreement with experimental data, with the exception of a few molecules, e.g., the molecule having the highest experimental FP (533 K), 2,2-bis(hydroxymethyl)propane-1,3-diol. Although this molecule belongs to the training set, the four selected models

Figure 1. Distributions of molecules in the FP and CN databases and in their respective data subsets.

Table 2. Statistical Parameters Computed over the Validation Set for the FP Predictive Models

Molecular descriptors
Statistic | GFA | PLS | FF-ANN | GRNN | SVM
R² | 0.902 | 0.930 | 0.957 | 0.772 | 0.907
RMSE (K) | 16.3 | 13.7 | 10.7 | 22.8 | 16.3
AAE (K) | 10.7 | 9.4 | 7.7 | 13.8 | 8.4
AARE (%) | 3.3 | 2.9 | 2.4 | 4.2 | 2.5

Functional group count descriptors
Statistic | GFA | PLS | FF-ANN | GRNN | SVM
R² | 0.775 | 0.811 | 0.957 | 0.427 | 0.951
RMSE (K) | 23.1 | 22.3 | 10.9 | 31.4 | 11.6
AAE (K) | 16.9 | 15.7 | 7.6 | 23.3 | 7.6
AARE (%) | 5.5 | 5.0 | 2.3 | 7.4 | 2.3

GM
Statistic | GM
R² | 0.922
RMSE (K) | 15.1
AAE (K) | 10.4
AARE (%) | 3.2


individually underestimate the experimental FP, and the consensus model leads to 470 ± 7 K. For 2,2-bis(hydroxymethyl)propane-1,3-diol, ref 48 indicates only a rough value (FP greater than 423 K), other websites such as www.chemblink.com and www.chemicalbook.com return 513 K, and ref 50 gives a value of 473 K. A large experimental uncertainty therefore exists for the FP of this molecule, and our model, after learning over all compounds of the database, indicates a value lower than that accepted by the DIPPR staff members. With regard to methylcyclopentadiene (experimental, 322 K; predicted, 263 K), the latest version of the DIPPR database advocates the use of a predicted value of 255 K, in agreement with the value obtained with our consensus model. One can also mention the case of methyl (Z)-docos-13-enoate (experimental, 306 K; predicted, 372 K); the website www.chemnet.com reports an experimental value of 357.75 K, in better agreement with our prediction. These comparisons suggest that the consensus model is robust to noise in the experimental data.

Table 3 presents statistical parameters computed for specific chemical families of the database. For hydrocarbons, our model gives an RMSE of 8.1 K and an AAE of 5.1 K on the test set. Using the values presented by Tetteh et al., we found that their neural network model leads to an RMSE of 44.0 K and an AAE of 8.1 K for their test set, which contained 19 hydrocarbons.24 Pan et al. have reported, for a neural network model developed specifically for n- and isoparaffins, an absolute average error of 4.8 K computed over the 15 molecules of a test set.23 Comparing the experimental and predicted values given in the revised work by Patel et al. leads to a high RMSE of 101.9 K and an AAE of 19.2 K when only the hydrocarbons of their test set are considered.46 These values are high compared to those of our model and the other models. Among the 10 compounds in the test set of Patel et al., the predictions for two molecules (tricosane and but-2-yne) differ strongly from the proposed experimental values. Because we found experimental data in the literature that agree better with the predicted values for these two compounds, we set them aside as outliers; the RMSE and AAE of the model by Patel et al. then become 22.1 and 6.8 K, respectively. The AAE comparisons show that the predictive power of our model is in line with that of models intended specifically for the FPs of hydrocarbons. Because the RMSE weights large errors more heavily than the AAE, the RMSE values further suggest that consensus modeling improves the stability of predictions by limiting large errors.

In the case of alcohols, Table 3 shows that the RMSE and AAE of our model for the test set are 11.0 and 9.6 K, respectively. Using the experimental and predicted data proposed in the paper by Tetteh et al., the RMSE and AAE are 41.3 and 7.3 K, respectively, for the 14 alcohols of their test set.24 From the experimental and predicted data presented in the revised work by Patel et al., the computed RMSE and AAE are 83.1 and 17.3 K, respectively, for the compounds of their test set.46 The AAE values show that our model has a predictive power similar to that of the model by Tetteh et al., while consensus modeling leads to a much lower RMSE.

In the case of esters, Table 3 indicates that our model leads to an RMSE of 21.2 K and an AAE of 15.8 K for the compounds of the test set. These values, higher than those obtained for hydrocarbons and alcohols, can be related to the larger measurement uncertainty observed for fatty acid methyl esters (reproducibility of about 15 K).75 Using the experimental and predicted values proposed in the paper by Tetteh et al., we computed an RMSE of 34.2 K and an AAE of 7.2 K for the esters of their test set.24 Our consensus model thus leads to a higher average error than the model by Tetteh et al., which may partly reflect the wider temperature range considered in our study (253–466 K, against 253–421 K in the study by Tetteh et al.); however, the higher RMSE of their model suggests that it is less stable than ours. Khajeh et al. have developed QSPR models intended only for the FPs of esters, using two approaches: GFA and an adaptive neuro-fuzzy inference system (ANFIS).76 The resulting models have similar RMSE and AAE values: 21.8 and 16.7 K (GFA) and 20.2 and 16.2 K (ANFIS), respectively. The statistical parameters of our model, although it was trained on several chemical families, are similar to those of the ester-specific models by Khajeh et al.

Table 3. Statistical Parameters for the FP Consensus Model

                  training  validation  test    total
Consensus Model (All Compounds)
R²                0.960     0.967       0.944   0.959
RMSE (K)          10.9      9.4         13.2    10.9
AAE (K)           7.2       6.3         8.4     7.1
AARE (%)          2.2       1.9         2.5     2.2
bias (K)          0.3       0.7         −0.9    0.2
n                 437       125         63      625
Consensus Model (Hydrocarbons)
R²                0.970     0.977       0.965   0.971
RMSE (K)          8.2       5.8         8.1     7.8
AAE (K)           5.3       4.3         5.1     5.1
AARE (%)          1.9       1.5         1.7     1.8
bias (K)          0.1       −0.1        −1.5    −0.1
n                 111       35          16      162
Consensus Model (Alcohols)
R²                0.883     0.930       0.917   0.896
RMSE (K)          15.4      11.1        11.0    14.4
AAE (K)           10.0      8.0         9.6     9.5
AARE (%)          2.6       2.0         2.9     2.5
bias (K)          −1.1      −3.4        −9.6    −2.1
n                 49        15          4       68
Consensus Model (Esters)
R²                0.900     0.905       0.857   0.896
RMSE (K)          13.3      13.8        21.2    14.3
AAE (K)           9.4       9.5         15.8    10.0
AARE (%)          2.8       2.8         4.4     2.9
bias (K)          1.5       3.9         3.3     2.1
n                 227       63          38      328

Figure 2. Comparison between experimental and predicted FPs using the consensus model intended for hydrocarbons, alcohols, and esters. The dashed blue line is the ideal prediction, and the gap between the blue lines denotes the experimental uncertainty on measurements (commonly assumed to be 10 K).

3.1.4. Prediction of FPs for Some Compounds

The consensus model was used to predict the FPs of compounds for which, to the best of our knowledge, no experimental value exists in the literature. The evolution of the FP with increasing n is presented in Figures 3 and 4, where n denotes either the number of carbon atoms along the principal chain or the position of a specific functional group. Figures 3 and 4 were drawn using information contained in the database (see the Supporting Information), and because of the uncertainty of the predictions, the data have been fit using polynomial equations. The resulting trends presented in Figure 3 show that moving a group (methyl or ethyl) from position 2 to positions 3, 4, and beyond on the main chain has scarcely any impact on the FP value. Figure 3 also shows that the FP of nC-cyclohexane (a single linear chain attached to a cyclohexane ring) is slightly lower than that of nC-benzene when n is lower than 4, while the opposite holds for larger values of n. We have drawn iso-carbon-number curves using the FPs of the molecules considered in Figure 3. The FP appears to depend mainly upon the total number of carbon atoms in the molecule, in agreement with empirical correlations linking the FP with the boiling point, which is itself correlated to the number of carbon atoms.18 For jet fuel applications, hydrocarbons satisfying an FP of at least 311.15 K (38 °C, the limit required by ASTM D1655) are those containing a total of 10 or more carbon atoms. Figure 4, which is devoted to alcohols, indicates that, for short principal chains, the FPs of linear alcohols are higher than those of n-paraffins. Figure 4 also shows that, at a fixed number of carbon atoms, the FP of a primary alcohol > the FP of a secondary alcohol > the FP of a tertiary alcohol.
Furthermore, molecules with two alcohol functions are above the limit of 311.15 K (38 °C), and their FP seems to increase with the distance between the two alcohol functions. Because most experimental and/or predicted FPs of esters are greater than the limit fixed by the ASTM, we have chosen not to include FP trends of esters in this paper. Although triesters are not represented in the training set, we have attempted to predict FPs for some of them, such as 2,3-di(dodecanoyloxy)propyl dodecanoate (predicted FP of 563 ± 231 K) and 2,3-bis[[(Z)-octadec-9-enoyl]oxy]propyl (Z)-octadec-9-enoate (predicted FP of 571 ± 388 K). These huge uncertainties come from the PLS–MD model, which leads to an FP of about 1000 K for these molecules. Considering only the predictions of the FF-ANN–MD, FF-ANN–FGCD, and SVM–FGCD models, the predicted FPs are 448 ± 45 and 379 ± 72 K for 2,3-di(dodecanoyloxy)propyl dodecanoate and 2,3-bis[[(Z)-octadec-9-enoyl]oxy]propyl (Z)-octadec-9-enoate, respectively.

3.2. CN Predictions

3.2.1. Selection of the Best Predictive Models

Statistical parameters presented in Table 4 indicate the predictive power of the QSPR models developed using the various approaches. Because some experimental CN values in the database are equal to 0, and a relative error is undefined for a zero reference value, we have chosen not to consider AARE for this property. Among all models presented in Table 4, six stand out in terms of predictive power: SVM–MD, GM, FF-ANN–MD, GFA–MD, SVM–FGCD, and FF-ANN–FGCD. The GFA approach applied to molecular descriptors appears to be an accurate way to develop a QSPR model for the CN. This is in line with the work by Creton et al., who chose this kind of approach to model the CN of pure hydrocarbons and reported absolute average errors lower than 6.9.43 The AAE of 9.3 obtained with our GFA–MD model can be attributed (i) to the inclusion of oxygenated compounds in the database and (ii) to the fact that the database was not subdivided into subsets representative of chemical families. These explanations are supported by the work of Taylor et al., who used a GA approach on a database including hydrocarbons and oxygenated compounds and reported a standard error of 9.1 for their predictive model.44

Taylor et al. obtained lower standard errors after classifying their database by chemical family. Table 4 shows that nonlinear approaches, such as GM, SVM, and FF-ANN, lead to absolute average errors of about 8.4 CN units. Taking into account that our database includes oxygenated compounds, the predictive ability of our nonlinear models developed using the GM, SVM, and FF-ANN methods is consistent with that of models previously obtained by Santana et al.42 Indeed, these authors reported an AAE of 8 CN units for an ANN model intended for the CNs of hydrocarbons.

Figure 3. Trends of the FP evolution for some hydrocarbons with increasing n. The parameter n denotes the number of carbon atoms along the principal chain. The iso-carbon-number curves are shown as dashed gray lines.

Figure 4. Trends of the FP evolution for some alcohols with the parameter n. The parameter n denotes either the number of carbon atoms along the principal chain or the position of a specific functional group. The dotted line presents FP values for n-paraffins.

3.2.2. Consensus Modeling

We used our dedicated script to search for the best consensus model among all linear combinations of individual predictive models. The consensus model was selected by comparing predicted and experimental data in the validation set. The resulting consensus model averages the predicted values of the FF-ANN–MD, GRNN–MD, SVM–MD, SVM–FGCD, and GM models. Its predictive ability is summarized through statistical parameters in Table 5 and illustrated in Figure 5. None of the RMSE values computed for the three subsets is far greater than the experimental uncertainty, which is commonly assumed to be 5 CN units for hydrocarbons and larger for oxygenated compounds, and the RMSE obtained for the test set is close to that computed for the training set. The AAE values are comparable to the experimental uncertainty and lower than those reported in refs 42 and 44, which are 8 and 9 CN units, respectively. These results confirm that consensus modeling improves generalization with respect to individual models.

3.2.3. Prediction of CN for Some Compounds

Figure 5 presents a comparison between experimental and predicted CNs for the three data subsets, plus a fourth set gathering all compounds set aside because of doubts about their reported experimental values. The agreement between the results of the consensus model and the experimental values is good. However, large standard deviations can be observed for some compounds, which indicates that the individual predictive models do not agree. Notably, the CNs of both isocetane (2,2,4,4,6,8,8-heptamethylnonane) and cetane (n-hexadecane) are well predicted by our model. The predicted CNs of compounds placed in the prediction set seem to agree with the reported experimental values. However, some large deviations between predicted and experimental values are observed at the extremes of Figure 5. For example, for propane, the reported experimental value is −20, while our model leads to 29 ± 11 CN units. Considering all experimental CNs of n-paraffins, a CN of 29 seems more reliable than the reported experimental value (see Figure 6). Octyl oleate and dodecyl oleate were set aside because of their large reported experimental CNs of 131 and 134, respectively. The consensus model, like all of its constituent models, gives values well below the experimental ones for octyl oleate and dodecyl oleate (86 ± 7 and 94 ± 12, respectively).
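The selection and averaging steps described in Sections 3.2.2 and 3.2.3 can be sketched as below. This is a minimal illustration, not the authors' script: it assumes that each individual model's validation-set predictions are available as NumPy arrays, restricts the search to equal-weight averages of model subsets (the form of the final consensus model described above), and reports each consensus value as the mean ± population standard deviation across the selected models. The model labels and numbers are hypothetical.

```python
from itertools import combinations

import numpy as np


def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root-mean-square error between experimental and predicted values."""
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def best_consensus(predictions: dict, y_val: np.ndarray):
    """Search every equal-weight average of a non-empty subset of models.

    `predictions` maps a model label to its predictions on the validation set.
    Returns the subset (tuple of labels) whose averaged prediction has the
    lowest validation RMSE, together with that RMSE.
    """
    names = sorted(predictions)
    best_subset, best_score = None, float("inf")
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            averaged = np.mean([predictions[m] for m in subset], axis=0)
            score = rmse(y_val, averaged)
            if score < best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score


def consensus_predict(predictions: dict, subset):
    """Per-compound mean and spread (population std) over the chosen models."""
    stacked = np.array([predictions[m] for m in subset])
    return stacked.mean(axis=0), stacked.std(axis=0)


# Hypothetical validation data and model outputs, for illustration only.
y_val = np.array([40.0, 55.0, 70.0])
preds = {
    "model_a": np.array([42.0, 53.0, 74.0]),
    "model_b": np.array([38.0, 57.0, 66.0]),
    "model_c": np.array([50.0, 60.0, 80.0]),
}
subset, score = best_consensus(preds, y_val)
mean, spread = consensus_predict(preds, subset)
print(subset, score)
for m, s in zip(mean, spread):
    print(f"{m:.0f} ± {s:.0f}")
```

An exhaustive search over M models evaluates 2^M − 1 subsets, which stays cheap for the eleven models of Table 4 (2047 subsets). Because the subset is chosen on the validation set, an independent test set is still needed to judge generalization, which is the role of the test column in Table 5.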

Table 4. Statistical Parameters Computed over the Validation Set for the Global CN Predictive Models

                  GFA     PLS     FF-ANN  GRNN    SVM
Molecular Descriptors
R²                0.794   0.635   0.845   0.435   0.897
RMSE              12.0    15.1    11.3    16.0    8.6
AAE               9.3     11.4    8.2     11.6    6.9
Functional Group Count Descriptors
R²                0.123   0.428   0.733   0.601   0.802
RMSE              16.9    15.3    12.5    18.1    12.4
AAE               13.7    11.9    9.3     14.2    8.4
GM
R²                0.852
RMSE              11.0
AAE               8.1

Table 5. Statistical Parameters for the Consensus Global Model of CN

                  training  validation  test    total
R²                0.950     0.883       0.913   0.934
RMSE              5.5       8.7         6.5     6.3
AAE               4.1       6.5         5.7     4.7
bias              0.0       −0.5        0.4     0.0
n                 164       46          19      229

Figure 5. Comparison between experimental and predicted CNs using the consensus model intended for hydrocarbons, alcohols, and esters. The dashed blue line is the ideal prediction, and the gap between the blue lines denotes the experimental uncertainty on measurements (commonly assumed to be 5 CN units).

Figure 6. Trends of the CN evolution for some families of compounds with the parameter n. The parameter n denotes either the number of carbon atoms along the principal chain or the position of a specific functional group.
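The statistics reported in Tables 3–5 can be computed as in the sketch below. The definitions are assumed from common QSPR usage rather than taken from the paper: errors are predicted minus experimental values (so a positive bias indicates overprediction), and R² is the coefficient of determination. The function returns NaN for AARE whenever an experimental value is zero, which is why AARE is not reported for CN. The example data are hypothetical and show that a single large error inflates RMSE more than AAE, the property relied upon when RMSE is used to judge the stability of predictions.

```python
import numpy as np


def fit_statistics(y_exp, y_pred) -> dict:
    """Goodness-of-fit statistics of the kind reported in Tables 3-5.

    Definitions are assumed (common QSPR usage), not taken from the paper:
    errors are predicted minus experimental, and R2 is the coefficient of
    determination 1 - SS_res / SS_tot.
    """
    y_exp = np.asarray(y_exp, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_exp
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_exp - y_exp.mean()) ** 2)
    stats = {
        "R2": float(1.0 - ss_res / ss_tot),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "AAE": float(np.mean(np.abs(err))),
        "bias": float(np.mean(err)),  # > 0 means systematic overprediction
        "n": int(y_exp.size),
    }
    # AARE divides by the experimental value, so it is undefined as soon as
    # one experimental value is zero (as happens for some CNs).
    if np.any(y_exp == 0):
        stats["AARE (%)"] = float("nan")
    else:
        stats["AARE (%)"] = float(100.0 * np.mean(np.abs(err) / np.abs(y_exp)))
    return stats


# Hypothetical FPs (K): four errors of 5 K, then the same set with one 45 K error.
y_exp = [300.0, 310.0, 320.0, 330.0]
clean = fit_statistics(y_exp, [305.0, 305.0, 325.0, 325.0])
noisy = fit_statistics(y_exp, [305.0, 305.0, 325.0, 375.0])
print(clean["RMSE"], clean["AAE"])  # equal when all errors have the same size
print(noisy["RMSE"], noisy["AAE"])  # the single outlier inflates RMSE more than AAE
```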


The consensus model was used to predict the CNs of compounds for which, to the best of our knowledge, no experimental value exists in the literature. The predicted CNs show that a methyl group in position 2 on the principal chain leads to a slightly higher CN than when this group is in positions 3, 4, and beyond, e.g., for methyloctane, methylnonane, or methyldecane.43 The evolution of the CN with increasing n is presented in Figure 6, where n denotes either the number of carbon atoms along the principal chain or the position of a specific functional group on the principal chain. Figure 6 was drawn using information contained in the database (see the Supporting Information). Because of the uncertainty associated with the predictions, the data have been fit with polynomial equations. The resulting trend for n-paraffinic compounds, i.e., the evolution of the CN with an increasing number of carbon atoms, appears similar to previously published curves.43,77 Figure 6 also compares the CNs of nC-cyclohexane and nC-benzene compounds, with n in the range of 1–20. For each value of n, the difference between the CNs of these two families is roughly constant, about 25 CN units. Figure 6 further shows the impact of an alcohol group in a linear paraffin: the CNs of nC-1-ol compounds are about 20 CN units lower than that of the corresponding n-paraffin, and a 5 CN unit difference is observed when the OH group is moved from position 1 to 2. Predictions in the database show that, at a fixed number of carbon atoms, the CN of primary alcohols (nC-1-ol) > the CN of secondary alcohols (nC-2-ol) > the CN of tertiary alcohols (2-methyl-nC-2-ol or 3-methyl-nC-3-ol). For molecules with two alcohol groups, the position of the groups does not seem to affect the CN value, e.g., for C5–1,n-diol compounds. With regard to esters, which can be written as ROC(=O)R′, Figure 6 shows that the CN does not depend upon the location of the main carbon chain, i.e., whether nC is located on R or R′.
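The trend curves of Figures 3, 4, and 6 were obtained by fitting polynomial equations to predicted values. A minimal sketch with NumPy follows. The polynomial degree (2 here) is an assumption, since the text does not state it, and the data are synthetic points generated from a known quadratic so that the fit can be checked. The second series is offset by a constant 25 units, mimicking the roughly constant gap reported between the nC-cyclohexane and nC-benzene families; evaluating both fitted trends recovers that gap.

```python
import numpy as np
from numpy.polynomial import Polynomial


def fit_trend(n_values, values, degree: int = 2) -> Polynomial:
    """Least-squares polynomial trend of a property versus the parameter n."""
    return Polynomial.fit(np.asarray(n_values, dtype=float),
                          np.asarray(values, dtype=float), degree)


# Synthetic, noise-free points from CN = 40 + 3n - 0.05n^2 (illustration only).
n = np.arange(1, 21)
family_a = 40.0 + 3.0 * n - 0.05 * n**2
family_b = family_a - 25.0  # hypothetical second family, offset by 25 units

trend_a = fit_trend(n, family_a)
trend_b = fit_trend(n, family_b)

# Coefficients in powers of n (lowest order first) after undoing the
# internal domain scaling that Polynomial.fit applies for conditioning.
coef_a = trend_a.convert().coef
# Gap between the two fitted trends at each n; constant here by construction.
gap = trend_a(n) - trend_b(n)
print(np.round(coef_a, 4), np.round(gap.mean(), 2))
```

With real, noisy predictions the fitted coefficients are only a smoothed summary; the scatter around the curve reflects the prediction uncertainty that motivated the fit in the first place.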

4. CONCLUSION

QSPR models have been developed to predict the FP and CN of molecules likely to be found in alternative fuels, i.e., hydrocarbons, alcohols, and esters. Experimental values of FP and CN were gathered from different sources and/or data available in the literature. For both properties, various approaches were investigated, from linear modeling methods, such as GAs and PLS, to nonlinear methods, such as FF-ANNs, GRNNs, SVMs, and GMs. For both properties, none of the obtained models was significantly more accurate than the others; thus, consensus modeling was used, because this methodology is known to improve generalization and predictive power compared to individual predictive models.

For the two properties of interest, the results computed with the predictive consensus models are in good agreement with experimental data, and average absolute deviations are similar to the experimental uncertainties. For FP, the predictive power of our consensus model was compared to that of models in the literature. We have shown that our model reproduces the FPs of molecules belonging to the target families of compounds at least as well as other QSPR models in the literature specifically developed for each chemical family. Moreover, deviations for each chemical family matched their respective experimental uncertainty, particularly in the case of hydrocarbons and esters. Consequently, we have used our model to estimate the FPs of compounds for which, to the best of our knowledge, no experimental FP can be found in the literature. Using these values, we have extracted information about the evolution of the FP when increasing the number of carbon atoms or moving the position of a specific functional group. The FP thus appears to depend mainly upon the total number of carbon atoms in the molecule. It is noteworthy that correlations between the FP and boiling point have been established, with the latter property being correlated to the number of carbon atoms or molecular weight.18 In the case of CN, only a few works in the literature deal with the prediction of the CNs of pure hydrocarbons and oxygenated compounds. The predictive power of our consensus model is at least the same as that of models in the literature, which are often devoted only to hydrocarbons. Predictions were made for compounds for which, to our knowledge, no experimental CNs have been measured. Using this information, we have shown how the CN evolves when adding one or two alcohol groups to a carbon chain and when moving these groups along the carbon chain.

This work shows that, when good-quality databases and various QSPR approaches are combined with consensus modeling, the resulting predictive models are powerful tools for estimating the property values of complex and/or difficult-to-isolate hydrocarbons and oxygenated compounds. Such a tool can be useful for selecting a molecule according to its properties. For example, the first linear alcohol that satisfies the limit of 311.15 K (38 °C) is pentan-1-ol. Ethanol, which is used as an alternative fuel for automotive applications, has an FP of 286 K and is therefore not suitable for aeronautic uses. This study is to be extended to other stringent properties of gasoline, jet, and diesel fuels.
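The selection use case mentioned above can be sketched as a simple screen. This is a hypothetical helper, not part of the authors' tool: it compares a flash point in kelvin with the 38 °C (311.15 K) minimum quoted from ASTM D1655 and can optionally require a safety margin, for instance a multiple of the ± spread of a consensus prediction. Apart from the ethanol value of 286 K given in the text, the candidate names and values are placeholders.

```python
JET_FUEL_MIN_FP_C = 38.0  # minimum flash point quoted in the text for ASTM D1655 (°C)


def celsius_to_kelvin(t_c: float) -> float:
    """Convert a temperature from degrees Celsius to kelvin."""
    return t_c + 273.15


def passes_flash_point(fp_k: float, margin_k: float = 0.0,
                       limit_c: float = JET_FUEL_MIN_FP_C) -> bool:
    """True if the flash point, reduced by a safety margin, meets the limit."""
    return fp_k - margin_k >= celsius_to_kelvin(limit_c)


def screen(candidates: dict, margin_k: float = 0.0) -> list:
    """Sorted names of candidates whose flash point (K) passes the limit."""
    return sorted(name for name, fp_k in candidates.items()
                  if passes_flash_point(fp_k, margin_k))


candidates = {
    "ethanol": 286.0,      # FP quoted in the text
    "candidate_x": 330.0,  # hypothetical placeholder
    "candidate_y": 315.0,  # hypothetical placeholder
}
print(screen(candidates))                 # nominal values only
print(screen(candidates, margin_k=10.0))  # stricter: requires FP - 10 K >= 311.15 K
```

Using a margin tied to the consensus spread is one way to avoid accepting a molecule whose predicted FP only barely clears the limit, given the prediction uncertainties discussed in Section 3.1.4.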

ASSOCIATED CONTENT

Supporting Information. Brief description of the molecular descriptors in QSPR models and values of molecular descriptors computed for each molecule belonging to the training, validation, and test sets. This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION

Corresponding Author
*E-mail: [email protected].

REFERENCES

(1) International Energy Agency (IEA). World Energy Outlook 2010; IEA: Paris, France, 2010; http://www.iea.org.

(2) Schulz, H. Appl. Catal., A 1999, 186, 3.

(3) Murata, K.; Liu, Y.; Inaba, M.; Takahara, I. Energy Fuels 2010, 24, 2404.

(4) Weiss, W.; Dulot, H.; Quignard, A.; Charon, N.; Courtiade, M. Proceedings of the International Pittsburgh Coal Conference; Istanbul, Turkey, Oct 11–14, 2010.

(5) Carlson, T. R.; Tompsett, G. A.; Conner, W. C.; Huber, G. W. Top. Catal. 2009, 52, 241.

(6) Demirbas, A. Energy Convers. Manage. 2009, 50, 14.

(7) Demirbas, M. F. Appl. Energy 2009, 86, S151.

(8) American Society for Testing and Materials (ASTM). ASTM D1655, Standard Specification for Aviation Turbine Fuels; ASTM: West Conshohocken, PA, 2011.

(9) Liaw, H.-J.; Lee, Y.-H.; Tang, C.-L.; Hsu, H.-H.; Liu, J.-H. J. Loss Prev. Process Ind. 2002, 15, 429.

(10) Vidal, M.; Rogers, W. J.; Mannan, M. S. Process Saf. Environ. Prot. 2006, 84, 1.

(11) Catoire, L.; Paulmier, S.; Naudet, V. Process Saf. Prog. 2006, 25, 33.


(12) Liaw, H.-J.; Lin, S.-C. J. Hazard. Mater. 2007, 140, 155.

(13) Liaw, H.-J.; Gerbaud, V.; Chiu, C.-Y. J. Chem. Eng. Data 2010, 55, 134.

(14) Liaw, H.-J.; Gerbaud, V.; Li, Y.-H. Fluid Phase Equilib. 2011, 300, 70.

(15) Pidol, L.; Lecointe, B.; Jeuland, N. SAE Int. J. Fuels Lubr. 2009, No. 2009-01-1807.

(16) Vidal, M.; Rogers, W. J.; Holste, J. C.; Mannan, M. S. Process Saf. Prog. 2004, 23, 47.

(17) Liu, X.; Liu, Z. J. Chem. Eng. Data 2010, 55, 2943.

(18) Katritzky, A. R.; Kuanar, M.; Slavov, S.; Hall, C. D. Chem. Rev. 2010, 110, 5714.

(19) Gramatica, P.; Navas, N.; Todeschini, R. Trends Anal. Chem. 1999, 18, 461.

(20) Khajeh, A.; Modarress, H. J. Hazard. Mater. 2010, 179, 715.

(21) Zhokhova, N. I.; Baskin, I. I.; Palyulin, V. A.; Zefirov, A. N.; Zefirov, N. S. Russ. Chem. Bull. 2003, 52, 1885.

(22) Mathieu, D. J. Hazard. Mater. 2010, 179, 1161.

(23) Pan, Y.; Jiang, J.; Wang, Z. J. Hazard. Mater. 2007, 147, 424.

(24) Tetteh, J.; Suzuki, T.; Metcalfe, E.; Howells, S. J. Chem. Inf. Comput. Sci. 1999, 39, 491.

(25) Katritzky, A. R.; Stoyanova-Slavova, I. B.; Dobchev, D. A.; Karelson, M. J. Mol. Graphics Modell. 2007, 26, 529.

(26) Gharagheizi, F.; Alamdari, R. F.; Angaji, M. T. Energy Fuels 2008, 22, 1628.

(27) Gharagheizi, F.; Abbasi, R. Ind. Eng. Chem. Res. 2010, 49, 12685.

(28) Carroll, F. A.; Lin, C.-Y.; Quina, F. H. Energy Fuels 2010, 24, 4854.

(29) Batov, D. V.; Mochalova, T. A.; Petrov, A. V. Russ. J. Appl. Chem. 2011, 84, 54.

(30) Carroll, F. A.; Lin, C.-Y.; Quina, F. H. Ind. Eng. Chem. Res. 2011, 50, 4796.

(31) Gharagheizi, F.; Eslamimanesh, A.; Mohammadi, A. H.; Richon, D. Ind. Eng. Chem. Res. 2011, 50, 5877.

(32) American Society for Testing and Materials (ASTM). ASTM D975, Standard Specification for Diesel Fuel Oils; ASTM: West Conshohocken, PA, 2011.

(33) European Standards Organization (CEN). EN590 Standard Specification on the Quality of European Diesel Fuel; CEN: Brussels, Belgium, 2009.

(34) American Society for Testing and Materials (ASTM). ASTM D613, Standard Test Method for Cetane Number of Diesel Fuel Oil; ASTM: West Conshohocken, PA, 2010.

(35) European Standards Organization (CEN). EN ISO 5165, Standard Test Method for Cetane Number of Diesel Fuel Oil; CEN: Brussels, Belgium, 2009.

(36) De la Paz, C.; Rodríguez, J. E.; Valentin, C. P.; Ramos, E. R. Pet. Sci. Technol. 2007, 25, 1225.

(37) Özdemir, D. Pet. Sci. Technol. 2008, 26, 101.

(38) Pitz, W. J.; Mueller, C. J. Prog. Energy Combust. Sci. 2011, 37, 330.

(39) Huber, M. L.; Lemmon, E. W.; Bruno, T. J. Energy Fuels 2010, 24, 3565.

(40) Ghosh, P.; Jaffe, S. B. Ind. Eng. Chem. Res. 2006, 45, 346.

(41) Yang, H.; Fairbridge, C.; Ring, Z. Pet. Sci. Technol. 2001, 19, 573.

(42) Santana, R. C.; Do, P. T.; Santikunaporn, M.; Alvarez, W. E.; Taylor, J. D.; Sughrue, E. L.; Resasco, D. E. Fuel 2006, 85, 643.

(43) Creton, B.; Dartiguelongue, C.; de Bruin, T.; Toulhoat, H. Energy Fuels 2010, 24, 5396.

(44) Taylor, J. D.; McCormick, R. L.; Clark, W. Report on the Relationship between Molecular Structure and Compression Ignition Fuels, Both Conventional and HCCI; National Renewable Energy Laboratory (NREL): Golden, CO, 2004; MP-540-36726.

(45) Catoire, L.; Naudet, V. J. Phys. Chem. Ref. Data 2004, 33, 1083.

(46) (a) Patel, S. J.; Ng, D.; Mannan, M. S. Ind. Eng. Chem. Res. 2009, 48, 7378. (b) Patel, S. J.; Ng, D.; Mannan, M. S. Ind. Eng. Chem. Res. 2010, 49, 8282.

(47) Rowley, R. L.; Wilding, W. V.; Oscarson, J. L.; Yang, Y.; Zundel, N. A.; Daubert, T. E.; Danner, R. P. Design Institute for Physical Properties (DIPPR) Data Compilation of Pure Compound Properties; DIPPR, American Institute of Chemical Engineers (AIChE): New York, 2003.

(48) http://www.sigmaaldrich.com/france.html

(49) http://www.alfa.com/fr/gh100w.pgm

(50) http://www.lookchem.com/

(51) http://ull.chemistry.uakron.edu/erd/

(52) Yaws, C. L. Chemical Properties Handbook: Physical, Thermodynamic, Environmental, Transport, Safety and Health Related Properties for Organic and Inorganic Chemicals; McGraw-Hill: New York, 1999.

(53) Murphy, M. J.; Taylor, J. D.; McCormick, R. L. Compendium of Experimental Cetane Number Data; National Renewable Energy Laboratory (NREL): Golden, CO, 2004; SR-540-36805.

(54) O'Boyle, N. M.; Morley, C.; Hutchison, G. R. Chem. Cent. J. 2008, 2, 5.

(55) Accelrys Software, Inc. Materials Studio, Release 5.0; Accelrys Software, Inc.: San Diego, CA, 2009.

(56) Sun, H. J. Phys. Chem. B 1998, 102, 7338.

(57) Sun, H.; Ren, P.; Fried, J. R. Comput. Theor. Polym. Sci. 1998, 8, 229.

(58) Gasteiger, J.; Marsili, M. Tetrahedron 1980, 36, 3219.

(59) Katritzky, A. R.; Lobanov, V.; Karelson, M. CODESSA: Reference Manual; University of Florida: Gainesville, FL, 1996.

(60) Katritzky, A. R.; Kuanar, M.; Dobchev, D. A.; Vanhoecke, B. W. A.; Karelson, M.; Parmar, V. S.; Stevens, C. V.; Bracke, M. E. Bioorg. Med. Chem. 2006, 14, 6933.

(61) da Costa Couto, M. P. Neural Comput. Appl. 2009, 18, 891–901.

(62) Kohavi, R.; John, G. Artif. Intell. 1997, 97 (1–2), 273–324.

(63) (a) Vapnik, V. N. The Nature of Statistical Learning Theory; Springer: Berlin, Germany, 1995. (b) Vapnik, V. N. Statistical Learning Theory; John Wiley and Sons: New York, 1998.

(64) Xue, Y.; Li, Z. R.; Yap, C. W.; Sun, L. Z.; Chen, X.; Chen, Y. Z. J. Chem. Inf. Comput. Sci. 2004, 44, 1630–1638.

(65) Pan, Y.; Jiang, J.; Wang, R.; Cao, H.; Cui, Y. J. Hazard. Mater. 2009, 168, 962–969.

(66) Guyon, I.; Elisseeff, A. J. Mach. Learn. Res. 2003, 3, 1157.

(67) Pan, Y.; Jiang, J.; Wang, R.; Cao, H.; Zhao, J. QSAR Comb. Sci. 2008, 27, 1013.

(68) Simoes, M. IEEE Trans. Ind. Electron. Control Instrum. 2003, 50, 585.

(69) Billings, S. Int. J. Control 1992, 56, 319.

(70) Goulon, A.; Duprat, A.; Dreyfus, G. Lect. Notes Comput. Sci. 2006, 4135, 1.

(71) Niwa, T. J. Chem. Inf. Comput. Sci. 2003, 43, 113.

(72) Ambroise, C.; McLachlan, G. J. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 6562.

(73) Tropsha, A. Mol. Inf. 2010, 29, 476.

(74) Gramatica, P.; Giani, E.; Papa, E. J. Mol. Graphics Modell. 2007, 25, 755.

(75) American Society for Testing and Materials (ASTM). ASTM D3828, Standard Test Method for Flash Point; ASTM: West Conshohocken, PA, 2009.

(76) Khajeh, A.; Modarress, H. J. Hazard. Mater. 2010, 79, 715.

(77) Ghosh, P. Energy Fuels 2008, 22, 1073.

