White Paper Prepared for the ACC LRI on Chemical Space Analysis: An Approach for Evaluating Similarity of Substances for Exposure Estimations¹

Executive Summary

With thousands of chemicals in commerce, there is a need for tools that can rapidly generate data for prioritizing human and environmental risk due to chemical exposures. While high-throughput tools exist for filling data gaps, the use of human-relevant fate and transport tools is advantageous for providing more realistic exposure characterizations. This paper describes the development of chemical space analysis to organize 13,000 chemicals from the CERAPP project using physicochemical properties that are important for identifying likely exposure scenarios. Random Forest analysis was used to predict three exposure proximity classes, near-field (NF), far-field (FF), and pharmaceutical (Rx), based on use categories from the U.S. EPA Aggregated Computational Toxicology Online Resource (ACToR), after which a principal component analysis was employed using the highest performing decision algorithm. A 5-fold cross-validation classification accuracy of 75% was achieved when simultaneously using two descriptor sets: (1) physicochemical property descriptors, including OPERA-predicted properties, and (2) structural signatures in the form of DSSTox ToxPrint chemotypes. The model was then applied to over 3,600 chemicals that lacked original ACToR categories but for which product use information compiled in the Chemical Product Categories Database (CPCat) could be mapped to NF, FF, and Rx classes. The classification accuracy for this set was 53%, indicating that there exists a combination of physicochemical parameters that is predictive of chemical proximity for a subset of CERAPP chemicals. This approach should provide a useful and efficient framework for prioritizing efforts, ensuring that the differential properties of thousands of compounds undergoing further exposure assessments are adequately considered based on their use and proximity.

¹ December 30, 2019. This white paper was prepared by ScitoVation scientists with support provided by ACC LRI. The contributing scientists included Chantel I. Nicolas, Saad Haider, Kamel Mansouri, Jeremy Fitzpatrick, Salil N. Pendse, Marjory Moreau, Cory Strope, Patrick D. McMullen, and Harvey J. Clewell. ScitoVation scientists thank Katherine A. Phillips and John F. Wambaugh for helpful discussions.

Chemical Space Analysis ACC LRI White Paper

Background

There is increasing recognition of the need for efficient approaches to assess the risk of large numbers of chemicals within a short timeframe. The use of a tiered paradigm has been proposed for prioritizing thousands of chemicals for further study. This tiered testing approach needs to account for currently available in vitro, in vivo, and in silico hazard and exposure data in order to identify chemicals that may require more immediate focus using limited resources (1, 2). This strategy moves from lower tiers focused on rapid triage and prioritization of chemicals to higher tiers with increased accuracy and reduced uncertainty for predicting the risk of high-priority chemicals (3). The integration of in vitro and in silico approaches in a tiered testing strategy is an important component in the overall plan to evaluate the thousands of chemicals regulated under the Toxic Substances Control Act (TSCA) and to reduce the requirement for animal testing by targeting subsequent in vivo toxicity testing (4).

With limited human-relevant hazard and exposure data, high-throughput (HT) tools have been applied to fill data gaps for risk prioritization. Thresholds of toxicological concern (TTCs) have been demonstrated to be a useful tool for filling gaps in hazard surrogate data (5, 6). A TTC is a level of exposure that is considered to be of no appreciable risk to human health despite the absence of chemical-specific toxicity data (7, 8). Similarly, the HT predictions of the U.S. EPA ExpoCast program (9) have provided an accessible exposure context for large-scale risk prioritization of chemicals for further investigation (8). At this point, there is a need to better characterize risk at lower tiers of prioritization by using fit-for-purpose tools that provide a mass balance and environmental fate and transport approach for exposure evaluations (10, 11). Towards this end, there have been efforts to predict the functional use of thousands of chemicals in products based on their structures (12).

Quantitative data on the chemical composition of products are necessary for characterizing exposure. Therefore, two databases, the Aggregated Computational Toxicology Resource (ACToR) and the Chemical Product Categories Database (CPCat), were used in this work to provide information on the kinds of exposure the population may experience (12, 13). ACToR is a database and set of software applications that bring into one central location many types and sources of data (information on chemical structure, in vitro bioassays, and in vivo toxicology assays) on environmental chemicals from around the world (EPA, Centers for Disease Control (CDC), government agencies in Canada, the World Health Organization (WHO), and more) (13). CPCat is a database that maps chemicals to a set of terms categorizing their usage or function across 16,000 consumer product types (e.g., shampoo, soap) based on the chemicals those products contain (14). The knowledge of chemical uses provided by these two databases can inform the use of fit-for-purpose exposure tools and cheminformatics approaches to predict whether chemical stressors are likely to come from proximate sources.

A set of 13,292 chemicals used in this study was identified from compounds appearing in both the Collaborative Estrogen Receptor Activity Prediction Project (CERAPP) and ACToR (15). This compilation was derived from different lists of chemicals that the EPA has identified as man-made chemicals to which humans might be significantly exposed. These chemicals were grouped into three exposure proximity classes: near-field (NF), far-field (FF), or pharmaceutical (Rx). These proximity classes are important considerations for risk prioritization because they provide context for potential exposure scenarios. Chemicals that are considered near-field tend to persist in the indoor environment for longer periods of time. Some near-field chemicals are semi-volatile organic compounds (SVOCs), which can persist in the air and adhere to material surfaces, dust, and airborne particles, leading to ingestion, inhalation, or dermal contact (16). SVOCs such as polybrominated diphenyl ethers (PBDEs) persist in the indoor environment and may cause endocrine system disruption (17, 18). Because people tend to spend more time indoors than outdoors, higher exposures to potentially toxic compounds are likely to occur in these NF exposures (19, 20).

While near-field consumer exposure can be seen as a bubble surrounding the user that moves throughout the room with the user, far-field population exposure refers to aggregated intakes via environmental emissions.

To provide a risk prioritization context for the chemical space analysis, we applied machine learning techniques, specifically a Random Forest model, to predict NF, FF, and Rx classes. The descriptor set encompasses 19 predicted physicochemical properties from the OPEn structure-activity/property Relationship App (OPERA), including octanol-water partition coefficient, water solubility, and vapor pressure, and 729 chemically unique U.S. EPA ToxPrint chemotypes (21, 22). Chemical bioactivity and pharmacokinetics are heavily influenced by physicochemical properties including volatility, water solubility, and lipophilicity. These properties can be thought of as dimensions within which compounds can be categorized (Figure 1). Compounds with similar properties can be grouped together, and data for compounds with similar properties can be used to fill gaps in the knowledge base for other compounds in the same classification group. For example, a previous study evaluated the possibility of predicting the in vivo kinetics of volatile organic compounds (VOCs) using PBPK models derived solely on the basis of physiological data and quantitative structure-property relationship (QSPR) modelling (23, 24). That study found that acceptable predictions could be made for the inhalation of lipophilic VOCs, such as trichloroethylene, but the necessary QSPR algorithms were not available to apply the methodology to water-soluble VOCs such as acetone. As high-throughput approaches become increasingly important for rapid chemical risk prioritization, we look to extend this approach to more diverse groups of chemistries.

The first objective of this study is to provide a three-dimensional visualization scheme for qualitatively summarizing characteristics of 13,000 chemicals from the U.S. EPA CERAPP list using lipophilicity, water solubility, and volatility predicted with validated quantitative structure-property relationship models (15, 21). We then use these and other physicochemical property estimates to predict likely proximity classes, in order to provide insight into which lower-throughput exposure modeling tools might be used to provide realistic exposure estimates.

Figure 1. Conceptual illustration of three-dimensional characterization of chemical properties.

METHODS

Data curation

A set of 13,292 chemicals used in this study was identified from compounds appearing in both the Collaborative Estrogen Receptor Activity Prediction Project (CERAPP) and ACToR (13, 15). This compilation was derived from different lists of chemicals that the EPA has identified as man-made chemicals to which humans might be significantly exposed. These chemicals cover a variety of use classes, including consumer products, food additives, and human and veterinary drugs. The chemicals were then processed by a KNIME structure-standardization workflow (25). The workflow begins by parsing the SDF files and removing inorganic compounds. It then standardizes the structures and converts them into SMILES before removing duplicates. At this point the structures are checked in a node that removes those with unacceptable atom types. From here, QSAR-ready SMILES, 2D descriptors, and 3D coordinates are all created and written as output files (26).

To characterize differences between the chemicals in this list, we defined the pharmaceutical, near-field, and far-field environmental chemical classes by using the ACToR use categories via CPCat (13). The ACToR use categories are built using chemical data from federal, state, and international regulatory listings for chemicals falling into specific classes. Chemicals classed as near-field were categorized by any of the following terms: fragrance, personal care, consumer use, colorant, food additive, flame retardant, and antimicrobial. Far-field chemicals were identified by the ACToR categories industrial, manufacturing, petrochemical, chemical warfare, fertilizer, herbicide, inert, and pesticide. Pharmaceuticals were the chemicals categorized as drugs in the ACToR database. Individual chemicals may be annotated with multiple classes and thus appear in multiple categories. To simplify the model, the three proximity classes were therefore collapsed into one column, with the classes ranked as follows: Rx > NF > FF. For example, if a chemical was binned into both the NF and FF classes, it was classified as NF.
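The class assignment and ranking described above can be sketched as follows. This is a minimal illustration, not the authors' workflow: the term strings are taken from the lists in the text, but their exact spelling in ACToR, the function name `proximity_class`, and the example inputs are assumptions.

```python
# Use-category terms listed in the text; exact ACToR spellings are assumed.
NF_TERMS = {"fragrance", "personal care", "consumer use", "colorant",
            "food additive", "flame retardant", "antimicrobial"}
FF_TERMS = {"industrial", "manufacturing", "petrochemical", "chemical warfare",
            "fertilizer", "herbicide", "inert", "pesticide"}
RX_TERMS = {"drug"}

# Ranking stated in the text: Rx > NF > FF.
PRIORITY = ("Rx", "NF", "FF")


def proximity_class(use_categories):
    """Collapse a chemical's use categories into one proximity class.

    Returns "Rx", "NF", or "FF", or None if no listed term matches.
    """
    terms = {t.strip().lower() for t in use_categories}
    hits = set()
    if terms & RX_TERMS:
        hits.add("Rx")
    if terms & NF_TERMS:
        hits.add("NF")
    if terms & FF_TERMS:
        hits.add("FF")
    # Return the highest-ranked class among those that matched.
    for label in PRIORITY:
        if label in hits:
            return label
    return None


# Hypothetical annotations, for illustration only.
print(proximity_class(["fragrance", "industrial"]))  # NF outranks FF
print(proximity_class(["drug", "food additive"]))    # Rx outranks NF
print(proximity_class(["pesticide"]))                # FF only
```

Keeping the ranking in a single tuple makes it easy to rerun the collapse under a different priority order.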

OPERA, a free, open-source and open-data suite of QSAR models for physicochemical properties and environmental fate endpoints, was used to generate physicochemical properties for the dataset (21). Additional details can be accessed from the command-line output files or online through the Computational Toxicology Dashboard of the U.S. EPA (https://comptox.epa.gov/dashboard/).

For the CERAPP chemicals that were not mapped to ACToR (3,813 chemicals), CPCat consumer use and functional use keywords were used to determine likely exposure categories. This set of chemicals may be used to validate proximity predictions. If the term “drug” was among the consumer or functional uses of the chemical, then it was classed as a pharmaceutical. A chemical was classed as near-field if any of the following keywords were listed: "consumer_use", "food_contact", "fragrance", "colorant", "furniture", "cosmetics", "personal_care", "child_use", "toys", "sports_equipment", "electronics", and "hair_dye". All others were classed as far-field.
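The keyword rule for the validation set can be written compactly. The sketch below assumes that each chemical's CPCat consumer and functional use keywords are available as a list of strings; the function name `classify_from_cpcat` and the example inputs are hypothetical.

```python
# Near-field keywords exactly as listed in the text.
NF_KEYWORDS = frozenset({
    "consumer_use", "food_contact", "fragrance", "colorant", "furniture",
    "cosmetics", "personal_care", "child_use", "toys", "sports_equipment",
    "electronics", "hair_dye",
})


def classify_from_cpcat(keywords):
    """Assign Rx, NF, or FF from CPCat use keywords.

    "drug" takes precedence, then any near-field keyword; every other
    chemical defaults to far-field, as described in the text.
    """
    kws = set(keywords)
    if "drug" in kws:
        return "Rx"
    if kws & NF_KEYWORDS:
        return "NF"
    return "FF"


# Hypothetical keyword lists, for illustration only.
print(classify_from_cpcat(["drug", "fragrance"]))
print(classify_from_cpcat(["toys", "manufacturing"]))
print(classify_from_cpcat(["manufacturing"]))
```

Because far-field is the default, any chemical whose keywords match neither rule is labeled FF under this scheme, including chemicals with no keywords at all.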

For visualization, chemicals were designated as VOCs or SVOCs based on a vapor pressure (VP) threshold: a chemical with VP ≥ 0.1 was designated a VOC, and a chemical with VP < 0.1 was designated an SVOC (California Air Resources Board, 2013). Chemicals were further grouped into classes based on the presence of strings in their names (e.g., if “phthalate” appears in the name, the chemical belongs to the phthalate class). The same was done for esters, dioxins, siloxanes, alcohols, and amides. To simplify this initial application of the methodology, chemicals that did not fall into any of the name-based groupings were binned into a group called “other”. EPA ToxPrint chemotypes were added to the database as binary indicators: 1 if the structural signature is present in the chemical and 0 if it is not. The model had a total of 748 descriptors, consisting of 19 OPERA-predicted physicochemical property descriptors and 729 chemotypes. The final dataset is provided in supplemental spreadsheet Table 1.
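The grouping rules in this paragraph lend themselves to a short sketch. Several details here are assumptions, not taken from the source: the order in which name strings are checked (the text does not say how a name matching more than one string is resolved), the function names, and the example inputs. The VP threshold is used as given in the text, in whatever units the source data use.

```python
VP_THRESHOLD = 0.1  # threshold stated in the text (units as in the source data)

# Name strings from the text; the order of precedence is an assumption.
NAME_GROUPS = ("phthalate", "ester", "dioxin", "siloxane", "alcohol", "amide")

N_OPERA = 19        # OPERA-predicted physicochemical descriptors
N_CHEMOTYPES = 729  # ToxPrint chemotype indicators


def volatility_class(vp):
    """VOC if VP >= 0.1, otherwise SVOC."""
    return "VOC" if vp >= VP_THRESHOLD else "SVOC"


def name_group(name):
    """Return the first name-based group whose string occurs in the name."""
    lowered = name.lower()
    for group in NAME_GROUPS:
        if group in lowered:
            return group
    return "other"


def chemotype_vector(present, all_chemotypes):
    """Binary indicators: 1 if the chemotype is present, else 0."""
    return [1 if c in present else 0 for c in all_chemotypes]


# Hypothetical inputs, for illustration only.
print(volatility_class(0.05))
print(name_group("Di(2-ethylhexyl) phthalate"))
print(chemotype_vector({"ct_b"}, ["ct_a", "ct_b", "ct_c"]))
print(N_OPERA + N_CHEMOTYPES)  # total descriptor count
```

Substring matching on names is simple but, as the Discussion notes, it misses chemicals whose names do not contain the class string.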

Principal Component Analysis

We performed a principal component analysis on all predictors (physicochemical properties and chemotypes) and analyzed the contribution of the predictors to the first two principal components to understand the correlation of variables and their effect on the data. Kernel density functions were computed for the first two principal components of the predictors of NF, FF, and Rx chemicals. The Frobenius distance between the kernel distributions was used to assess the differences and similarities between chemicals in different proximity classes.

Random Forest Predictions

The ToxPrint chemotypes and physicochemical properties were used both separately and together as Random Forest descriptors to classify the chemicals as NF, FF, or Rx (27). A covariance-based variable importance measure was used in the Random Forest classifier in order to arrive at the best descriptors (28). The Random Forest model was created in MATLAB using the fitensemble function. Options were set as follows: “Bag”, 100, “Tree”, “Type”, “Classification”. These options generate a forest of 100 trees using bagging and specify a classification model as opposed to a regression model.

RESULTS

Three-Dimensional Visualization of Chemical Space

After compiling the final chemical descriptor set, several three-dimensional visualization tools were designed for organizing chemicals of interest based on lipophilicity (logP), solubility (logWS), and volatility (logVP). A tool for this purpose is freely available online at http://pmcmullen.com/chemspace/. Figure 2a illustrates one such tool, which uses orange, brown, and blue circles to indicate whether a chemical has been classified as near-field (NF), far-field (FF), or pharmaceutical (Rx), respectively. To distinguish the Rx and NF groups more clearly, they were also plotted separately (Figure 2b).


Figure 2. a) Three-dimensional orientation of the variation of chemical space for different exposure categories: blue, brown, and orange circles indicate whether chemicals are considered to be pharmaceuticals (Rx), far-field (FF), or near-field (NF), respectively, and b) Rx and NF only.

Figure 3 provides perspective on how chemicals in particular classes are arranged in three-dimensional space. Chemicals that were classed as phthalates and esters were flagged by yellow and red circles, respectively. Phthalates and esters were also parsed into NF, FF, and Rx by color (orange, brown, and blue, respectively). Similar comparisons were done for amides, alcohols, and siloxanes in multiple combinations and can be found in supplementary figures.


Figure 3. Three-dimensional orientation of the variation of chemical space for chemicals of interest with a subset grouped by their chemical classes based on their names: a) phthalates, esters, alcohols, amides, and siloxanes and b) phthalates and esters in terms of NF, FF, Rx classes.

Principal Components Analysis

Figure 4 shows the distribution of chemicals across the first two principal components of chemical space based on physicochemical properties and ToxPrint chemotypes. The near-field and far-field classes show very similar coverage of the first and second components in Figure 4. To confirm this observation, we computed kernel density functions over the first two components for all three proximity classes and compared them using the Frobenius norm of the difference between the distributions. The resulting distances (Rx-NF = 1.95, Rx-FF = 1.85, NF-FF = 0.52) support the conclusion that the Rx distribution is distinctly different from the other two.
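The comparison described above can be illustrated with a small standard-library sketch: evaluate a two-dimensional Gaussian kernel density estimate for each class on a shared grid, then take the Frobenius norm of the difference between the gridded densities. The kernel, bandwidth, grid, and principal component scores below are all assumptions chosen for illustration; the source does not report these settings, so the sketch will not reproduce the distances quoted above.

```python
import math


def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def gaussian_kde_grid(points, grid_x, grid_y, bandwidth):
    """2D Gaussian kernel density estimate evaluated on a rectangular grid.

    points: list of (pc1, pc2) scores. Returns rows indexed by grid_y.
    """
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)
    density = []
    for gy in grid_y:
        row = []
        for gx in grid_x:
            total = 0.0
            for px, py in points:
                d2 = (gx - px) ** 2 + (gy - py) ** 2
                total += math.exp(-d2 / (2.0 * bandwidth ** 2))
            row.append(norm * total)
        density.append(row)
    return density


def frobenius_distance(a, b):
    """Frobenius norm of the element-wise difference of two equal-size grids."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


# Hypothetical PC1/PC2 scores, for illustration only.
scores = {
    "Rx": [(3.0, 3.0), (3.5, 2.5), (2.5, 3.5)],
    "NF": [(0.0, 0.0), (0.5, -0.5), (-0.5, 0.5)],
    "FF": [(0.2, 0.1), (0.6, -0.3), (-0.4, 0.4)],
}
grid = linspace(-2.0, 5.0, 29)  # assumed grid, shared by all classes
dens = {k: gaussian_kde_grid(v, grid, grid, bandwidth=0.5)
        for k, v in scores.items()}

pairs = [("Rx", "NF"), ("Rx", "FF"), ("NF", "FF")]
distances = {p: frobenius_distance(dens[p[0]], dens[p[1]]) for p in pairs}
for (a, b), d in distances.items():
    print(f"{a}-{b}: {d:.3f}")
```

The absolute distances depend on the grid resolution and bandwidth, so only the relative ordering of the pairwise distances is meaningful in a comparison of this kind.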


Figure 4. Principal components analysis of near-field (NF), far-field (FF) and pharmaceutical (Rx) exposure proximities with a) NF, FF, and Rx in gray, orange, and dark blue circles, respectively, b) FF vs. all classes together (light blue circles), c) Rx vs. all classes together, and d) NF vs. all classes together.

Figure 5 compares the prediction performance of the Random Forest model when using physicochemical properties, chemotypes, or both as descriptors.

Figure 5. Performance of Random Forest model to predict exposure proximity classes based on physicochemical properties and ToxPrint chemotypes.

Table 1 provides a breakdown of the number of chemicals that fall into each of the proximity classes by group: alcohols, amides, dioxins, esters, phthalates, siloxanes, VOCs, SVOCs, and other. Out of 13,292 chemicals, a total of 740 belong to one of the name-based groups, identified by a query of character strings found in their preferred molecular names in the CERAPP database. For each category, Table 1 reports the Random Forest classification accuracy obtained with the combined chemotype and physicochemical property descriptors. Out of 20 siloxanes, one is reported as a pharmaceutical compound. It should be emphasized that a chemical may fall into any of the three exposure proximity classes. For simplicity of the analysis, each chemical was pooled into a single class in the descriptor set according to the priority ranking given in Methods. While alcohols and phthalates show poor performance (46% each), the overall classification accuracy (5-fold cross-validation) for all chemicals was 75%.

Table 1. Summary of chemical categories in terms of how they bin into the three exposure proximity classes (NF, FF, and Rx) and Random Forest classification performance.

Chemical Category    Total     NF     FF     Rx   Classification Accuracy (%)
Alcohols                35     19     11      5   46
Amides                 324     61     86    177   68
Dioxins                 10      0      9      1   90
Esters                 316    145    143     28   77
Phthalates              35     16     16      3   46
Siloxanes               20     15      4      1   75
Other                12552   3449   3217   5886   75
VOCs                  2227   1377    568    282   74
SVOCs                11065   2328   2918   5819   75

The Random Forest model was validated using a set of 3,659 chemicals lacking original ACToR proximity classes, for which classes were assigned from product use information in CPCat. When the model was applied to this independent set, 53% of the 3,659 chemicals were accurately classified. Rx classes were predicted more accurately (74%) than the other two classes (NF: 54%; FF: 49%). A summary of validation results is provided in Table 2. Note that all chemicals that have both ACToR and CPCat use categories had the same proximity classifications.

Table 2. Summary of validation of the Random Forest model using 3,659 chemicals lacking original ACToR categories but for which there existed product use information to assign proximity classes. 1,944 out of 3,659 chemicals were predicted accurately using the Random Forest model.

Proximity Class   Total   Accurately Predicted by RF   Classification Accuracy (%)
Rx                  359                          265   74
NF                 1334                          721   54
FF                 1966                          958   49
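As a worked check, the per-class and overall accuracies can be recomputed from the counts in Table 2. The counts are taken from the table; the variable and function names are illustrative.

```python
# (total, accurately predicted) per proximity class, from Table 2.
validation = {
    "Rx": (359, 265),
    "NF": (1334, 721),
    "FF": (1966, 958),
}


def accuracy_pct(total, correct):
    """Classification accuracy as a percentage."""
    return 100.0 * correct / total


per_class = {k: round(accuracy_pct(t, c)) for k, (t, c) in validation.items()}
n_total = sum(t for t, _ in validation.values())
n_correct = sum(c for _, c in validation.values())
overall = round(accuracy_pct(n_total, n_correct))

print(per_class)           # {'Rx': 74, 'NF': 54, 'FF': 49}
print(n_total, n_correct)  # 3659 1944
print(overall)             # 53
```

The overall figure is a count-weighted average, so it sits closest to the accuracy of the FF class, which makes up more than half of the validation set.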

DISCUSSION

3D Visualization of Chemical Space

In general, chemical space analysis is a visualization approach that can be used in a variety of bioinformatics applications to increase the transparency and accessibility of results for both experimental and regulatory scientists. As expected, chemicals characterized as phthalates, dioxins, or siloxanes collectively tend to have relatively high predicted lipophilicities (-0.05 < logP < 10.1). The percentages of NF, FF, and Rx compounds among all the chemicals were 28%, 26%, and 46%, respectively. For phthalates, the breakdown of the exposure proximities NF, FF, and Rx was 46%, 46%, and 8%, respectively. Because of their physicochemical properties, it is not unexpected that phthalates in general do not meet the “druggability” criteria (i.e., high water solubility, low lipophilicity, and low volatility). However, copper phthalate is a compound that does meet pharmaceutical standards through its use as a redox-active therapeutic (29). In total, 16 phthalates were classified as NF chemicals. One of these, di(2-ethylhexyl) phthalate (DEHP), is a near-field SVOC that tends to persist in the indoor environment.

Given that it is very costly to evaluate so many chemicals at a given time and that chemical risk assessment is moving towards alternative testing approaches, it is imperative that high-throughput methods be pursued in order to prioritize data-poor chemicals for more expensive or labor-intensive testing. It is important to note that not all chemicals were successfully binned into groups based on their preferred DSSTox names (e.g., not every ester will have an ‘ester’ string in its name). Therefore, to gain a wider perspective, the EPA ToxPrint chemotype database may be further queried using structural signatures that match a specified class.

Exposure Proximity Predictions

Our Random Forest model predicted the three exposure proximities (NF, FF, and Rx) at a 75% accuracy rate (5-fold cross-validation), with pharmaceuticals having the highest relative accuracy rate (85%) and far-field having the lowest (55%). The highest accuracy rate (overall 75%) was achieved when physicochemical property descriptors were combined with ToxPrint chemotypes, while each descriptor set alone yielded a classification accuracy of ~70%. With the advancement of valuable data-curation techniques, accurate chemical structures are publicly available and updated regularly (15, 30). The use of structural signatures provides access to a number of chemicals that may not be officially documented by current sources, such as the Chemical Abstracts Service (CAS). Therefore, unknown chemicals and potential industrial substitutes may benefit from analysis as described here. Similar analyses could also be used to compare complex mixtures (e.g., petroleum product streams) in order to assess their degree of similarity for categorization and read-across decisions. While the analysis demonstrated here focused on physicochemical properties, the same approach can be applied to biochemical properties (e.g., metabolism and transport) and toxicity (e.g., genomic responses).

Prediction of the proximity class of a chemical can serve as a first pass for prioritization before implementing low-throughput methods for highly prioritized substances. Since physicochemical properties are good predictors of proximity class and many high-priority chemicals are not suitable for existing high-throughput QSAR models, our chemical space analysis is a step towards developing novel analytical methods for determining the physicochemical properties of priority chemicals.

CONCLUSION

To further examine the complexities of chemical space, a platform is currently under development that will provide real-time interactive 3D visualization. To provide a measure for prioritizing environmental compounds for further exposure evaluation, chemicals ranked by their margins of exposure using rapid estimates of hazard and exposure surrogates may be prioritized for the use of more fit-for-purpose tools (9, 13). For chemicals that are predicted to have both a near-field proximity class and a relatively high margin of exposure, further analysis of both the efficacy of current in vitro assay methods and the likely exposure scenarios would be necessary to evaluate these compounds more thoroughly. This framework can also be used to prioritize in-house industrial candidate chemistries for further evaluation with appropriate testing.

REFERENCES

1. Embry MR, Bachman AN, Bell DR, Boobis AR, Cohen SM, Dellarco M, et al. Risk assessment in the 21st century: roadmap and matrix. Crit Rev Toxicol. 2014;44 Suppl 3:6-16.

2. NRC. Using 21st Century Science to Improve Risk-Related Evaluations. National Academy Press: Washington D.C.; 2017.

3. Andersen M, McMullen PD, Phillips MB, Yoon M, Pendse SN, Clewell HJ, et al. Developing context appropriate toxicity testing approaches using new alternative methods (NAMs). Altex. 2019:532-4.

4. US-EPA. Strategic Plan to Promote the Development and Implementation of Alternative Test Methods Within the TSCA Program. 2018.

5. Hartung T. Thresholds of Toxicological Concern: Setting a threshold for testing below which there is little concern. Alternatives to animal experimentation: ALTEX. 2017;34(3):331-51.

6. Patlewicz G WJ, Felter S, Simon TW, Becker RA. Utilizing Threshold of Toxicological Concern (TTC) with High Throughput Exposure Predictions (HTE) as a Risk-Based Prioritization Approach for thousands of chemicals. Computational Toxicology 2018; in press.

7. Munro IC, Renwick AG, Danielewska-Nikiel B. The Threshold of Toxicological Concern (TTC) in risk assessment. Toxicol Lett. 2008;180(2):151-6.

8. Patlewicz G WJ, Felter S, Simon TW, Becker RA. Utilizing Threshold of Toxicological Concern (TTC) with High Throughput Exposure Predictions (HTE) as a Risk-Based Prioritization Approach for thousands of chemicals. in press. 2018.

9. Wambaugh JF, Setzer RW, Reif DM, Gangwal S, Mitchell-Blackwood J, Arnot JA, et al. High-throughput models for exposure-based chemical prioritization in the ExpoCast project. Environmental science & technology. 2013;47(15):8479-88.

10. Isaacs KK, Glen WG, Egeghy P, Goldsmith MR, Smith L, Vallero D, et al. SHEDS-HT: an integrated probabilistic exposure model for prioritizing exposures to chemicals with near-field and dietary sources. Environ Sci Technol. 2014;48(21):12750-9.

11. Moreau M, Leonard J, Phillips KA, Campbell J, Pendse SN, Nicolas C, et al. Using exposure prediction tools to link exposure and dosimetry for risk-based decisions: A case study with phthalates. Chemosphere. 2017;184:1194-201.

12. Phillips KA, Wambaugh JF, Grulke CM, Dionisio KL, Isaacs KK. High-throughput screening of chemicals as functional substitutes using structure-based classification models. Green Chem. 2017;19(4):1063-74.

13. Judson R, Richard A, Dix D, Houck K, Elloumi F, Martin M, et al. ACToR--Aggregated Computational Toxicology Resource. Toxicol Appl Pharmacol. 2008;233(1):7-13.

14. Dionisio KL, Frame AM, Goldsmith MR, Wambaugh JF, Liddell A, Cathey T, et al. Exploring consumer exposure pathways and patterns of use for chemicals in the environment. Toxicol Rep. 2015;2:228-37.

15. Mansouri K, Abdelaziz A, Rybacka A, Roncaglioni A, Tropsha A, Varnek A, et al. CERAPP: Collaborative Estrogen Receptor Activity Prediction Project. Environmental health perspectives. 2016;124(7):1023-33.

16. Weschler CJ, Beko G, Koch HM, Salthammer T, Schripp T, Toftum J, et al. Transdermal Uptake of Diethyl Phthalate and Di(n-butyl) Phthalate Directly from Air: Experimental Verification. Environ Health Perspect. 2015;123(10):928-34.

17. Little JC, Weschler CJ, Nazaroff WW, Liu Z, Cohen Hubal EA. Rapid methods to estimate potential exposure to semivolatile organic compounds in the indoor environment. Environ Sci Technol. 2012;46(20):11171-8.

18. Knudsen TB, Martin MT, Kavlock RJ, Judson RS, Dix DJ, Singh AV. Profiling the activity of environmental chemicals in prenatal developmental toxicity studies using the U.S. EPA's ToxRefDB. Reprod Toxicol. 2009;28(2):209-19.

19. Lioy PJ, Wallace L, Pellizzari E. Indoor/outdoor, and personal monitor and breath analysis relationships for selected volatile organic compounds measured at three homes during New Jersey TEAM-1987. J Expo Anal Environ Epidemiol. 1991;1(1):45-61.

20. Wallace LN, W.; WESTERDAHL, D. Personal exposures, indoor-outdoor air concentrations, and breath concentrations of 25 volatile organic compounds. J Expo Anal Environ Epidemiol. 1991;1:157-92.

21. Mansouri K, Grulke CM, Judson RS, Williams AJ. OPERA models for predicting physicochemical properties and environmental fate endpoints. J Cheminform. 2018;10(1):10.

22. Yang C, Tarkhov A, Marusczyk J, Bienfait B, Gasteiger J, Kleinoeder T, et al. New publicly available chemical query language, CSRML, to support chemotype representations for application to data mining and modeling. J Chem Inf Model. 2015;55(3):510-28.

23. Liao C, Nicklaus MC. Comparison of nine programs predicting pK(a) values of pharmaceutical substances. J Chem Inf Model. 2009;49(12):2801-12.

24. Liao KH, Tan YM, Conolly RB, Borghoff SJ, Gargas ML, Andersen ME, et al. Bayesian estimation of pharmacokinetic and pharmacodynamic parameters in a mode-of-action-based cancer risk assessment for chloroform. Risk Anal. 2007;27(6):1535-51.

25. Strope CL, Mansouri K, Clewell HJ, 3rd, Rabinowitz JR, Stevens C, Wambaugh JF. High-throughput in-silico prediction of ionization equilibria for pharmacokinetic modeling. Sci Total Environ. 2018;615:150-60.

26. Mansouri K, Grulke CM, Richard AM, Judson RS, Williams AJ. An automated curation procedure for addressing chemical errors and inconsistencies in public datasets used in QSAR modelling. SAR QSAR Environ Res. 2016;27(11):939-65.

27. Breiman L. Random forests. Machine learning. 2001;45(1):5-32.

28. Haider S, Rahman R, Ghosh S, Pal R. A Copula Based Approach for Design of Multivariate Random Forests for Drug Sensitivity Prediction. PLoS One. 2015;10(12):e0144490.

29. Slator C, Barron N, Howe O, Kellett A. [Cu(o-phthalate)(phenanthroline)] Exhibits Unique Superoxide-Mediated NCI-60 Chemotherapeutic Action through Genomic DNA Damage and Mitochondrial Dysfunction. ACS Chem Biol. 2016;11(1):159-71.

30. Richard AM, Williams CR. Distributed structure-searchable toxicity (DSSTox) public database network: a proposal. Mutat Res. 2002;499(1):27-52.
