Application of rule induction techniques for detecting the possible
impact of endocrine disruptors on the North Sea ecosystems
Tim Verslycke1, Peter Goethals1,2, Gert Vandenbergh1, Karen Callebaut3 & Colin Janssen1
1 Laboratory of Environmental Toxicology and Aquatic Ecology, Ghent University
2 Institute for Forestry and Game Management
3 Ecolas n.v.
Outline
Introduction on endocrine disruptors
ED-North project
Database set-up
Data mining and rule induction
Practical application on the ED-North database
Conclusions
Endocrine disruptors, pseudo-hormones, endocrine modulators, xeno-hormones, …
Compounds that interfere with the endocrine system, resulting in (negative) effects on health and/or reproduction of organisms
Since the 1990s: one of the fastest-growing research domains in environmental toxicology
Dozens of lists, hundreds of compounds
Worldwide involvement: industry, government, academia
Endocrine disruptors ??
Endocrine disruption in marine environments ??
Sea: final sink for many chemicals
North Sea and its estuaries are under a heavy pollution load
Indications of potential endocrine disruption in these ecosystems
Need for a better overview of potential endocrine disruption in the North Sea and Scheldt estuary: the ED-North project
ED-North project ~ Goals
Critical evaluation of the literature on endocrine disruptors
Build a reference list and database of chemicals with (potential) endocrine disruptive activity
Evaluation of the described and suspected effects of endocrine disruptors on marine organisms
Prioritize the selected chemicals
If enough information: preliminary risk assessment
Formulation of the research needs and policy actions (overview of the Belgian expertise)
ED-North project ~ Methods
Literature study
- electronic databases: Poltox, Medline, Current Contents, CAB abstracts, Agris, Agricola, Web of Science,…
- world wide web: USEPA, OECD, WWF, CEFIC, IEH,…
- grey literature
Database
MS Access (relational database)
ED-North project ~ Results
General overview of endocrine disruption in humans and other mammals, birds, reptiles, fish and invertebrates
Situation in Belgium and The Netherlands
Expertise in Belgium
Emission of synthetic and natural hormones in Belgium
Sources, effects and occurrence of endocrine disruptors in the North Sea + prioritization
Database of (potential) endocrine disruptors for the North Sea ecosystem
CHEMICALS (765)
Chemical ID
Chemical Name Nl
Chemical Name E
CAS
UN
Chemical Formula
Molecular Weight
Boiling Point
Melting Point
Density
Pressure
Solubility
Log Kow
Phase
Notes
ENDOCRINE
Endocrine ID
Chemical ID
Reference ID
Group Name
Organism
Tissue
Age
In vivo
Lab
Flow
Duration
Route
Temperature
Concentration
Notes
EFFECT (3516)
Effect ID
Hormone Name
Endocrine ID
Effect Code
Effect description
REFERENCES (423)
Reference ID
Authors
Year
Title
Source
GROUP
Group Name
HORMONE
Hormone Name
EFFECT CODE
Effect Code
Relational database: anthropogenic (potential) endocrine disruptors
Table: Endocrine
EndocrinID | ChemID | RefID | Group | Organism | Tissue | Age | InVivo | Lab | Duration | Concentration | Notes
2598 | 240 | 26 | mammalian | Human | MCF-7 cells | | In vitro | Laboratory | 6 days | 10 µM | Technical grade; E-screen

Table: Chemicals
ChemID | ChemNameNl | CAS | ChemForm | Molweight | BP | MP | Pressure | Solubility | LogKow | Phase
240 | DDT | 50-29-3 | C14H9Cl5 | 354.49 | 260°C | 108°C | 1.9E-7 mm Hg at 20°C | 3.1-3.4 µg/l | 6.19 | Solid

Table: References
RefID | Authors | Year | Source
26 | Soto, A.M., Chung, K.L., Sonnenschein, C. | 1994 | Environ. Health Perspect., 102:380-383
Relational database
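The way the three example tables link through ChemID and RefID can be sketched with Python's built-in sqlite3 module. This is only an illustration: the project database was built in MS Access, the SQL column types and the shortened column lists are assumptions, and the table name `refs` is a hypothetical stand-in for the References table. The inserted row values are the DDT example from the tables.

```python
import sqlite3

# In-memory database. Column names loosely follow the example tables;
# the SQL types and the reduced column set are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE chemicals (chem_id INTEGER PRIMARY KEY, name TEXT, cas TEXT, log_kow REAL);
CREATE TABLE refs (ref_id INTEGER PRIMARY KEY, authors TEXT, year INTEGER, source TEXT);
CREATE TABLE endocrine (
    endocrin_id INTEGER PRIMARY KEY,
    chem_id INTEGER REFERENCES chemicals(chem_id),
    ref_id INTEGER REFERENCES refs(ref_id),
    organism TEXT, tissue TEXT, duration TEXT, concentration TEXT
);
""")

# The DDT example record from the slide.
conn.execute("INSERT INTO chemicals VALUES (240, 'DDT', '50-29-3', 6.19)")
conn.execute(
    "INSERT INTO refs VALUES (26, 'Soto, A.M., Chung, K.L., Sonnenschein, C.', "
    "1994, 'Environ. Health Perspect., 102:380-383')"
)
conn.execute(
    "INSERT INTO endocrine VALUES (2598, 240, 26, 'Human', 'MCF-7 cells', '6 days', '10 µM')"
)


def lookup(endocrin_id):
    """Return (chemical name, organism, publication year) for one endocrine record."""
    return conn.execute(
        """SELECT c.name, e.organism, r.year
           FROM endocrine e
           JOIN chemicals c ON c.chem_id = e.chem_id
           JOIN refs r ON r.ref_id = e.ref_id
           WHERE e.endocrin_id = ?""",
        (endocrin_id,),
    ).fetchone()


print(lookup(2598))  # ('DDT', 'Human', 1994)
```

Storing each chemical and each reference once and pointing to them by ID is what lets one compound carry many endocrine records without repeating its properties.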
Rule induction techniques
Data mining (analysis) techniques:
1) Clustering methods (which data are related or ‘similar’), e.g. cluster analysis
2) Classification methods (how variables are related, using only classes, numerical or not, i.e. rules among variables), e.g. decision trees
3) Regression methods (quantitative description of the relation between variables), e.g. multivariate regression
Rule induction techniques
Classification and decision trees: induction of rules from datasets
• which variables are related, e.g. which variables are mainly related to endocrine disruptive effects in animals
• how variables are related (quantitative rules making use of threshold values or classes), e.g. when the hormone concentration is higher than value A, estrogenic effects of type X will occur
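How a threshold rule of the form "value higher than A, then effect X" can be induced from data may be illustrated with a minimal one-split sketch. It simply picks the threshold that classifies the most instances correctly; it is not the algorithm used by WEKA's tree learners, which rely on information-based split criteria. The function name and the toy data are hypothetical.

```python
def best_threshold(values, labels):
    """Find the split 'value <= t' that classifies the most instances correctly.

    Each side of the split predicts its majority label. Returns
    (threshold, label_if_at_or_below, label_if_above, n_correct).
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = None
    for low, high in zip(distinct, distinct[1:]):
        t = (low + high) / 2  # candidate threshold midway between neighbours
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        # Majority label on each side (ties broken alphabetically).
        left_label = max(sorted(set(left)), key=left.count)
        right_label = max(sorted(set(right)), key=right.count)
        correct = left.count(left_label) + right.count(right_label)
        if best is None or correct > best[3]:
            best = (t, left_label, right_label, correct)
    return best


# Hypothetical toy data: concentrations and observed effect classes.
concentrations = [1, 2, 3, 10, 12, 15]
effects = ["no", "no", "no", "yes", "yes", "yes"]
print(best_threshold(concentrations, effects))  # (6.5, 'no', 'yes', 6)
```

A full decision tree repeats this kind of search recursively on each side of the split and over all candidate variables.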
Rule induction techniques
WEKA data mining software: runs from a DOS command window, but also offers a graphical Java interface
Induced rule set
Rule set performance indicators
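One performance indicator reported for the rule sets in the following slides is CCI, the percentage of correctly classified instances. A minimal sketch of that calculation is given here; the function name and the example labels are hypothetical, and WEKA reports further indicators that are not covered.

```python
def cci(actual, predicted):
    """Percentage of correctly classified instances (CCI)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * correct / len(actual)


# Hypothetical example: three of four instances are predicted correctly.
actual = ["effect", "effect", "no effect", "no effect"]
predicted = ["effect", "no effect", "no effect", "no effect"]
print(cci(actual, predicted))  # 75.0
```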
Applications on ED-North database
Example on crustacean data
1) Prediction of endocrine disruptive effects based on physical/chemical properties of chemicals
2) Prediction of estrogenic effect of chemicals on the crustaceans in the database
3) Which factors (flow, concentration, duration, ...) affect this estrogenicity
1) Which molecular characteristics are related to estrogenic effects
Estrogenic effects in crustaceans (89 cases)
Tested variables: effects, molecular weight, boiling point, temperature, Log Kow, solubility
Induced rule set:
LogKow <= 3.74: Estrogenic effect
LogKow > 3.74
| Solubility <= 0.00033: No Estrogenic effect
| Solubility > 0.00033: Estrogenic effect
Reliability (CCI, correctly classified instances): 63 %
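Read as code, the induced rule set is a small nested condition. The sketch below assumes that the first branch of each split is the "<=" complement of the ">" branch shown, and that solubility is given in the same (unstated) unit as in the database; the function name and the input values in the usage lines are hypothetical.

```python
def predict_estrogenic(log_kow, solubility):
    """Apply the induced rule set to one compound.

    Assumption: the first branch of each split is '<=', the complement
    of the '>' branch listed in the rule set.
    """
    if log_kow <= 3.74:
        return "Estrogenic effect"
    if solubility <= 0.00033:
        return "No Estrogenic effect"
    return "Estrogenic effect"


# Hypothetical input values, for illustration only.
print(predict_estrogenic(2.5, 0.1))     # Estrogenic effect
print(predict_estrogenic(5.0, 0.0001))  # No Estrogenic effect
print(predict_estrogenic(5.0, 0.01))    # Estrogenic effect
```

With a reported reliability of 63 %, such a rule is a starting point for expert review rather than a predictor to rely on.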
2) Which estrogenic effects are related to particular compounds in the environment
Estrogenic effects in crustaceans
Tested variables: effects, compounds
Induced rule set (23 rules, one for each compound):
CHEMID = 4-nonylphenol (p-nonylphenol): Estrogenic effect
CHEMID = ...
...
CHEMID = 20-hydroxyecdysone: No Estrogenic effect
Reliability (CCI): 60 %
2) Which estrogenic effects are related to particular compounds in the environment
Estrogenic effects in crustaceans
Tested variables: effects, organisms, compounds
Induced rule set (13 rules, one for each organism):
Organism = Balanus amphitrite: No estrogenic effect
Organism = Daphnia magna: Estrogenic effect
...
Reliability (CCI): 74 %
3) Which factors affect the estrogenic effects
Estrogenic effects in crustaceans
Tested variables: effects, organisms, compounds, age, flow, in vitro/in vivo, duration
Induced rule set (16 rules, one for each age class and for larval also one for each organism type):
Age = Juvenile: No estrogenic effect
Age = Larval
| Organism = Balanus amphitrite : Estrogenic effect
| Organism = ...
Age = Adult: Estrogenic effect
Age = Egg: Estrogenic effect
Reliability (CCI): 78 %
General discussion
This exercise on the ED-North database illustrated that data mining can help to find relations between:
Type of organisms
Test and environmental conditions
Estrogenic effects
Compounds and their structure
General discussion
Data mining helps to find errors and outliers in the data set, and creates insights to improve further data collection and the development of databases
Interaction between data miners and domain experts (ecologist, ecotoxicologist) is very important:
1) ‘reliable nonsense’ rules are easily found when important variables are excluded from the analysis (need for the expertise of an ecotoxicologist)
2) the parameter settings, and insight into how to tune them, have a major impact on the richness of the outcome of the data mining exercise (need for data mining expertise)
General discussion
The collected data set itself strongly influences the outcome of the analysis:
1) it is important to collect data that cover the whole range (variables and their values/classes), and stratification of the instances is necessary
2) the selection of variable classes can strongly affect the results (e.g. larval-adult problem, number of effect classes, ...)
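One common reading of "stratification of the instances" is that every subset used for training or testing should keep the class proportions of the full data set. A minimal sketch of such a stratified assignment to folds follows; the function name, the seed and the toy labels are hypothetical, and this is not claimed to be the procedure used in the project.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign instance indices to k folds so each fold mirrors the class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin over the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Hypothetical toy labels: six 'effect' and three 'no effect' instances.
labels = ["effect"] * 6 + ["no effect"] * 3
for fold in stratified_folds(labels, k=3):
    print(sorted(labels[i] for i in fold))
```

With these toy labels, every fold holds two "effect" and one "no effect" instance, so no fold is missing a class.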
Conclusions
Data mining reveals which gaps exist in the database and delivers information for sustainable data collection and management
Data mining delivers insight into the dataset: generation of knowledge from data
Highly unpredictable parts of the dataset indicate where to focus further research
Generally reliable rules are promising for decision support in environmental management
Important: data mining explores correlations, not causal relations! Control by experts or further research (validation) is always necessary
Data mining adds more colour to our data
Acknowledgements
Federal Office for Scientific, Technical and Cultural Affairs (OSTC)
The Flemish Institute for the Promotion of Scientific and Technological Research in Industry (IWT)
Ward Vanden Berghe (VLIZ)
Thesis students