2nd Summer School in Computational Biology
Current Trends in Environmental Modelling with
Uncertainty
Jiří Hřebíček, Michal Hejč
Masaryk University, Brno, Czech Republic
Content
• Introduction
• Modified uncertainty analysis
• Uncertainty analysis with Maple
• Case study: Air pollution by transport in the Czech Republic
• Case study: Assessment of waste management indicators in South Moravia region
• Conclusions
Introduction
• An important characteristic feature of environmental modelling is the complexity and uncertainty of its mathematical representation (uncertainty of formula).
• Imprecision of the input data is another characteristic feature: the influences of primary monitoring (e.g. gaps in data, errors of measuring devices, the human factor) cannot be neglected.
Uncertainty
Uncertainties in the scientific sense are a component of all aspects of the environmental modelling process.
They describe the lack of knowledge about models, their parameters, constants, data, and beliefs.
Sources of uncertainty
• the science underlying a model,
• uncertainty in model parameters,
• scientific constants and input data,
• monitoring and observation errors,
• and implementation uncertainty.
Modified uncertainty analysis
The uncertainty analysis of environmental models and their algorithmic implementation consists of the following stages:
• characterization of input uncertainties,
• uncertainty propagation,
• characterization of model uncertainty,
• characterization of the uncertainties in algorithm predictions.
Types of approaches
The proposed uncertainty analysis of environmental models builds on the following approaches:
– interval arithmetic,
– fuzzy theory,
– probabilistic analysis,
– Checkland's methodology.
Checkland's SSM iterative approach
Seven stages forming the life cycle of a mathematical model:
1. Finding out about the environmental problem situation. This is basic research into the problem area.
2. Expressing the situation through a so-called "Rich Picture".
3. Selecting how to view the situation and producing root definitions.
4. Building conceptual environmental models of what the system must do for each root definition. We have the basic "Whats" from the root definitions; now we begin to define the "Hows".
5. Comparing the conceptual environmental models with the real world. We compare the results from steps 4 and 2.
6. Identifying feasible and desirable changes. Are there ways of improving the situation?
7. Recommending actions to improve the environmental problem situation.
Appropriate ICT tools: CAS
Computer algebra based systems (CAS) involve:
• direct symbolic and algebraic computation (SAC) of the governing equations of mathematical models, the analytical/numerical solution of the environmental problem, and its visualization/presentation;
• estimation of the sensitivity and uncertainty of model outputs with respect to model inputs.
Appropriate ICT tools: CAS
The process of environmental modelling using CAS follows the spiral cycle:
IDENTIFY – DEVELOP – IMPLEMENT – SOLVE – ANALYZE – MODIFY,
which shows how complex CAS automate all phases of environmental modelling.
[Figure: spiral cycle around the central "Environmental Problem" with six phases: IDENTIFY model components, DEVELOP equations & algorithms, IMPLEMENT computational model, SOLVE computational model, ANALYZE and publish solutions, MODIFY model for optimization.]
CAS features listed around the cycle:
• Extensive user community; many journals, books, conferences, etc.; SAC user groups; SAC e-resources (problems & algorithms & benchmark data)
• Symbolic equations; specialized math functions; matrix formulation; automatic differentiation; optimization; curve fitting
• Mathematical dictionary; efficient user interface; programming languages; connectivity to SAC e-sources; unique data structure; interactive debugger; syntax checking utilities; advanced help system
• Analytical solution; high precision numerical solution; Fortran, C, Java, MS Visual Basic code generation
• Interactive graphics; specialized plots; 2D & 3D animations; natural equation display; "What if" analysis; HTML, XML & MathML export; LaTeX, RTF, PDF export
• Interactive worksheets; multiple document interface; reusable sub-expressions; maintains symbolic relationships; technical document processing; connectivity to SAC e-sources
The known SAC systems
Yacas, HartMath, The OpenXM project, Prologie, GiNaC, ArtLandia, Axiom, CoCoA, Derive, Algebra Domain Constructor, Fermat, GAP, GANITH, GRG, GRTensor, LiDIA, GNU DOE Maxima, Magma, Maple, Mathematica, Mathomatic, MathSoft, MATLAB, MathTensor, Milo, MP, MuPAD, NTL, Pari, Reduce, Schur, Singular, SymbMath, TI-92 Calculator, and TI-92 Plus.
Uncertainty Analysis with Maple
• Precise values of scientific constants (Scientific Constants package).
• Implementation of interval arithmetic (Tolerances package).
• Implementation of fuzzy theory (Fuzzy Sets toolbox).
• Implementation of probabilistic analysis (ScientificErrorAnalysis package).
• Checkland's SSM is not automatically implemented in Maple.
Getting online data and program codes (www.maplesoft.com):
The Sockets package of Maple allows data and program codes for the computation to be obtained online from the web. In particular, it enables two independent Maple processes running on different computers on a network to communicate with one another.
Scientific constants implementation
• The Scientific Constants package of Maple provides access to the values of various constant physical quantities, for example, the velocity of light and the atomic weight of sodium. These values are required to solve equations in fields such as chemistry and physics.
• The Scientific Constants package also provides the units for each of the constant values, allowing for greater understanding of the equation as well as units matching for error checking of the solution.
• The quantities available in the Scientific Constants package are divided into two distinct categories:
– physical constants,
– properties of the chemical elements (and their isotopes).
Interval arithmetic implementation
The Tolerances package of Maple provides basic data types and operations for interval arithmetic as well as additional features for further interval computation.
It contains type-checking functions, all arithmetic functions including powers, trigonometric and hyperbolic functions, set operations on intervals, range operations for a given function, complex number support, and some basic numerical methods such as Newton's method for finding a root of an uncertain function.
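To make the idea of interval arithmetic concrete, here is a minimal sketch in Python (not Maple, and not the package's actual implementation). The class name `Interval` and the sample values are illustrative assumptions. The sketch ignores outward rounding of floating-point bounds, which a rigorous interval library would apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] standing for an uncertain real value."""
    lo: float
    hi: float

    def __post_init__(self):
        if self.lo > self.hi:
            raise ValueError("lower bound must not exceed upper bound")

    def __add__(self, other):
        # Smallest sum is lo + lo, largest is hi + hi.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # Smallest difference subtracts the other's upper bound.
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # Signs may differ, so take the extremes of all four endpoint products.
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def contains(self, x):
        return self.lo <= x <= self.hi


# Hypothetical uncertain inputs (illustrative numbers only).
a = Interval(1.0, 2.0)
b = Interval(-3.0, 4.0)

total = a + b      # Interval(lo=-2.0, hi=6.0)
diff = a - b       # Interval(lo=-3.0, hi=5.0)
product = a * b    # Interval(lo=-6.0, hi=8.0)
```

The result of each operation is guaranteed (up to rounding) to enclose every value obtainable from inputs inside the operand intervals, which is what makes the approach suitable for propagating bounded input uncertainty.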
Fuzzy theory implementation
The Fuzzy Sets toolbox of Maple allows constructing and working with fuzzy subsets of both the real line and of user-defined finite sets.
Its modules automatically generate fuzzy controllers from a collection of user-defined rules. This allows modelling, testing, and modifying fuzzy systems in the interactive Maple worksheet environment.
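As an illustration of a fuzzy subset of the real line, the following Python sketch (not the Maple toolbox API) evaluates a triangular membership function and its alpha-cut. The function names and the numbers are hypothetical; the triangular shape is an assumption chosen for simplicity.

```python
def triangular_membership(x, left, peak, right):
    """Degree of membership of x in the triangular fuzzy number (left, peak, right)."""
    if not left < peak < right:
        raise ValueError("expected left < peak < right")
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)    # rising edge
    return (right - x) / (right - peak)      # falling edge


def alpha_cut(alpha, left, peak, right):
    """Interval of all x whose membership is at least alpha, for 0 < alpha <= 1."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return (left + alpha * (peak - left), right - alpha * (right - peak))


# Hypothetical fuzzy quantity "about 4", supported on [2, 8].
membership = triangular_membership(3.0, 2.0, 4.0, 8.0)   # 0.5
cut = alpha_cut(0.5, 2.0, 4.0, 8.0)                      # (3.0, 6.0)
```

Each alpha-cut is an ordinary interval, so fuzzy quantities can be processed level by level with interval arithmetic.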
Probabilistic Analysis Implementation
The ScientificErrorAnalysis package of Maple provides representation and construction of numerical quantities that have a central value and an associated uncertainty or error, which is a measure of the degree of precision to which the quantity's value is known.
The associated uncertainty can be specified in absolute, relative, or units-in-the-least-digit form. In the returned object, the uncertainty is quantified in absolute form.
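The idea of a quantity carrying a central value and an absolute uncertainty can be sketched in Python as follows. This is not the Maple package's API: the class `Quantity` and its methods are hypothetical, and the sketch assumes independent (uncorrelated) errors combined by first-order propagation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """Central value with an absolute uncertainty."""
    value: float
    error: float

    @classmethod
    def from_relative(cls, value, relative_error):
        # A relative specification is converted to absolute form on construction.
        return cls(value, abs(value) * relative_error)

    def __add__(self, other):
        # Independent errors add in quadrature.
        return Quantity(self.value + other.value,
                        math.hypot(self.error, other.error))

    def __mul__(self, other):
        # First-order propagation: each error is weighted by the
        # partial derivative of the product with respect to that factor.
        return Quantity(self.value * other.value,
                        math.hypot(other.value * self.error,
                                   self.value * other.error))


# Hypothetical measurements (illustrative numbers only).
x = Quantity(10.0, 0.3)
y = Quantity(20.0, 0.4)

total = x + y                           # value 30.0, error about 0.5
product = x * y                         # value 200.0, error sqrt(52)
z = Quantity.from_relative(50.0, 0.02)  # error about 1.0 in absolute form
```

Correlated inputs would need covariance terms, which this sketch deliberately leaves out.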
Case Study: Air Pollution by Transport
The emissions from transport in the Czech Republic have been analyzed with respect to uncertainties using the ICT tools of Maple and SSM. The mathematical model of transport air emissions implemented in Maple is derived from the well-known mathematical model COPERT III.
COPERT III
We rearrange the model COPERT III in the following way:
• Treating the emission factors, fuel consumptions and transport performances as uncertain.
• Unifying the formulas for various pollutants.
• Unifying the formulas for various transport types.
Checkland's SSM
• Of course, not all of these changes need be desirable, but Checkland's SSM iterative approach allowed us to revise the model afterwards, taking into account those of its properties that did not correspond to the situation in the real world.
• For selected emission factors that are based on measured values, the probabilistic approach was used; furthermore, the direct dependence on transport performances given in passenger-kilometres or tonne-kilometres was eliminated.
[Figure: Carbon dioxide (CO2) emissions (kg/inhabitant) generated in the Czech Republic by all types of transport, for the years 1995 and 2000 to 2004; vertical axis from 0 to 800 kg/inhabitant. Series: individual road passenger, public road passenger, road goods, urban public, rail (diesel traction), waterway, air.]
[Figure: VOC emissions (kg/inhabitant) generated in the Czech Republic by all types of transport, for the years 1995 and 2000 to 2004; vertical axis from 0 to 6 kg/inhabitant. Series: individual road passenger transport, public road passenger transport, road goods transport, urban public transport, rail transport (diesel traction), waterway transport.]
Assessment of waste management indicators in South Moravia region
This case study presents the use of the modular approach, which divides the annually collected waste management data into particular modules.
Every module processes one portion of the data (portions may overlap) and uses one method for uncertainty processing (interval arithmetic, probabilistic approach, fuzzy approach, or a combination of them).
Modules’ dependencies
[Figure: dependency diagram of the modules METADATA, KNOWLEDGE, TRUSTWORTHINESS, MODELS, RESULT, CHECK FUNCTION and OPTIMUM.]
Module models
The module model uses the number of inhabitants in a district and evaluates the coefficients of production per capita in the district from its primary collected data; it also uses the archived coefficients of production per capita and the yearly average production in the district d from the last year, etc.
The output of this model is the estimated annual regional municipal waste production and the annual regional municipal waste production from the last year.
$$I = \sum_{d=1} p_d N_d$$
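The module model above can be read as a sum over districts of the per-capita production coefficient multiplied by the number of inhabitants. A minimal Python sketch under that reading follows. The names `District` and `regional_production`, the units, and the district figures are made-up placeholders, not data from South Moravia.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class District:
    name: str
    inhabitants: int               # number of inhabitants in the district
    production_per_capita: float   # assumed unit: kg per inhabitant per year


def regional_production(districts):
    """Estimated annual regional waste production as a sum over districts."""
    return sum(d.production_per_capita * d.inhabitants for d in districts)


# Placeholder districts with invented numbers, for illustration only.
districts = [
    District("A", 100_000, 300.0),
    District("B", 50_000, 280.0),
]

estimate = regional_production(districts)   # 44_000_000.0 kg per year
```

Replacing the plain floats by interval, fuzzy or probabilistic quantities would carry the chosen uncertainty representation through the same sum.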
Modules Trustworthiness, Result
The module Trustworthiness estimates the trustworthiness of the primary collected data (it can be of fuzzy, interval or probabilistic type). It takes into consideration the primary data from the past three years, the models, the knowledge, and some parameters that are suitable for optimization.
The module Result computes the result from the primary waste management data, its trustworthiness and the model.
Result takes all their optimization parameters, and its output is a value that will substitute the original value in the waste database, subject to confirmation by the responsible government officers in the case of sensitive data (e.g. demolition waste, hazardous waste, etc.). Substituted values are mostly the same as in the primary data (the more uncertain the data, the more values are substituted). Applying this function to all values produces the new database.
Module Check function
The new database calculated by the module Result is then evaluated. Where possible, evaluation criteria (which again may depend on some parameters) are used during the evaluation. The evaluated database is checked by the check function (which is a function of the Result and some knowledge). In accordance with the evaluation criteria, the optimum database is found in a sufficient number of iterations, generally in cooperation with decision makers from the government of South Moravia.
Conclusions
• With current ICT tools, uncertainty handling is no longer so problematic in environmental modelling.
• Deeper knowledge of the mathematical model and the data, together with uncertainty and sensitivity analysis, can show how much the input uncertainty influences the outcome of the model.
Conclusions (cont.)
• Classification of the parameters and the data into clusters (some of which need to be known only roughly and some more accurately) can divide the problem of uncertainty into parts, each solved by a different approach (interval arithmetic, fuzzy or probabilistic theory).
Conclusions (cont.)
• A further open question is the benefit of using the SSM, because it now seems that we need at least two iterations to get results as good as those of the original model.
• Such an approach makes environmental modelling a little slower, but the iteration guarantees that there are no useless formulas and keeps the model complexity at the lower bound corresponding to the results we have.
Thank you for your attention
Prof. Dr. Jiří Hřebíček
Acknowledgment
Supported by project MSM0021622412
INCHEMBIOL