Research Article
A Conceptual Approach to Complex Model Management with Generalized Modelling Patterns and Evolutionary Identification
Sergey V. Kovalchuk,1 Oleg G. Metsker,1 Anastasia A. Funkner,1 Ilia O. Kisliakovskii,1 Nikolay O. Nikitin,1 Anna V. Kalyuzhnaya,1 Danila A. Vaganov,1,2 and Klavdiya O. Bochenina1
1 ITMO University, Saint Petersburg, Russia
2 University of Amsterdam, Amsterdam, Netherlands
Correspondence should be addressed to Sergey V. Kovalchuk; sergeyvkovalchuk@gmail.com
Received 1 June 2018; Accepted 17 September 2018; Published 1 November 2018
Guest Editor: Rafael Gomez-Bombarelli
Copyright © 2018 Sergey V. Kovalchuk et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Complex systems' modeling and simulation are powerful ways to investigate a multitude of natural phenomena, providing extended knowledge on their structure and behavior. However, enhanced modeling and simulation require integration of various data and knowledge sources, models of various kinds (data-driven models, numerical models, simulation models, etc.), and intelligent components in one composite solution. The growing complexity of such a composite model leads to the need for specific approaches to its management. This need grows when the model itself becomes a complex system. One of the important aspects of complex model management is dealing with uncertainty of various kinds (context, parametric, structural, and input/output) to control the model. In the situation where a system being modeled or the modeling requirements change over time, specific methods and tools are needed to perform modeling and application procedures (metamodeling operations) in an automatic manner. To support automatic building and management of complex models, we propose a general evolutionary computation approach which enables managing complexity and uncertainty of various kinds. The approach is based on an evolutionary investigation of the model phase space to identify the best model's structure and parameters. Examples from different areas (healthcare, hydrometeorology, and social network analysis) were elaborated with the proposed approach and solutions.
1. Introduction
Today, the area of modeling and simulation of complex systems evolves rapidly. A complex system [1] is usually characterized by a large number of elements, complex long-distance interaction between elements, and multiscale variety. One of the results of the area's development is the growing complexity of the models used for investigation of complex systems. As a result, a contemporary model of a complex system could easily be characterized by the same features as a natural complex system. Usually, the complexity of a model is considered in tight relation to the complexity of the modeled system. Nevertheless, in many cases, the complexity of a model does not mimic the complexity of the system under investigation (at least not exactly). This leads to additional issues in managing a complex model during identification, calibration, data assimilation, verification, validation, and application. One of the core reasons for these issues is the uncertainty of various kinds [2, 3] applied on the levels of system, data, and model. In addition, complexity is even more extended within multidisciplinary models and models which incorporate additional complex and/or third-party submodels. From the application point of view, complex models are often difficult to support and integrate with a practical solution because of a low level of automation and the high modeling skills needed to support and adapt a model to changing conditions.
On the other hand, evolutionary approaches have recently become popular for solving various types of model-centered operations like model identification [4], equation-free methods [5], ensemble management [6], data assimilation [7], and others. Evolutionary computation (EC) provides the ability to implement automatic optimization and dynamic adaptation of the system within a complex state space. Still, most of the solutions remain tightly coupled to a particular application and modeling system.
Hindawi, Complexity, Volume 2018, Article ID 5870987, 15 pages, https://doi.org/10.1155/2018/5870987
Figure 1: Basic concepts of complex modeling on the model ($M$), data ($D$), and system ($S$) layers. The diagram relates parameters ($\Xi$), functional characteristics ($\Phi$), and structure ($\Sigma$) across the layers through operators ($\Gamma^{A}_{L}$) realized by F-models, DD-models, A-models, complicated transitions, and expert knowledge.
Within the current research, we are trying to develop a unified conceptual and technological approach to support core operations with a complex model by distinguishing concepts and operations on the model, data, and system levels. We consider a combination of EC and data-driven approaches as a tool for building intelligent solutions for more precise and systematic managing (and lowering) of uncertainty and for providing the required level of automation, adaptability, and extensibility.
2. Conceptual Basis
The proposed approach is based on several key ideas aimed at extending uncertainty management in complex system modeling and simulation.
(1) Disjoint consideration of model, data, and system in terms of structure, behavior, and quality is aimed toward a system-level review of the modeling and simulation process and distinguishes between the uncertainties of various kinds originating from different levels [8].
(2) Intelligent technologies like data mining, process mining, machine learning, and knowledge-based approaches are to be employed to fill the gap in automation of modeling and simulation. Key sources for the development of such a solution include formalization of various knowledge within composite solutions [9] and data-driven technologies to support the identification of model components.
(3) EC approaches are widespread in modeling and simulation of complex systems [8, 10]. We believe that systematization of this process with separate consideration of spaces for a system (with its subsystems) and a model (with its submodels) could enhance such solutions significantly.
(4) The aim of the approach's development is twofold. First, it is aimed towards automation of modeling operations to extend the functionality of possible model-based applications. Second, working with a combination of EC and intelligent data-driven technologies could be considered as an additional knowledge source for system and model analysis.
Further, this section considers the conceptual basis of the proposed approach with a special focus on the role of EC algorithms and data-driven intelligent technologies for building and exploiting complex models.
2.1. Core Concepts. To distinguish between main modeling concepts and operations, we propose a conceptual framework (see Figure 1) for consideration of key processes and operations during modeling of a complex system. The framework may be considered as a generalization and extension of a framework [11, 12] previously defined and used by the authors for ensemble-based simulation. The current research extends the concept beyond ensemble-based simulation. It is mainly focused on complex modeling in general, with identification of key model management procedures and important artifacts which can be used for model development and application.
The proposed framework considers three main layers of complex systems' modeling, namely, model ($M$), data ($D$), and system ($S$). Main operations (arrows on the diagram) within the framework are defined within three concepts: quantitative parameters ($\Xi$), functional characteristics ($\Phi$), and structure ($\Sigma$). We denote operations by $\Gamma^{A}_{L}$, where $A$ and $L$ stand for the concept and the layer (respectively) involved in the operation. Transitions between concepts and between layers are denoted by $A_1 \to A_2$ and $L_1 \to L_2$, respectively; e.g., operator $\Gamma^{\Xi}_{S \to D}$ reflects observation of quantitative parameters, and operator $\Gamma^{\Xi}_{D \to M}$ stands for basic data assimilation. Also, a set of operators may refer to a single modeling operation; e.g., operators $\Gamma^{\Phi \to \Xi}_{M}$ and $\Gamma^{\Sigma \to \Xi}_{M}$ are often implemented within a single monolithic model. Mainly, operators are related to a specific submodel within a complex model. We consider three key classes of models. F-models are usually classical continuous models developed with knowledge of a system. DD-models are data-driven models based on analysis of available data sets with corresponding techniques (statistics, data mining, process mining, etc.). A-models are mainly intelligent components of a system, usually based on machine learning or knowledge-based approaches. Also, we consider EC-based components as belonging to the A-models class.
A key problem within complex system modeling and simulation is related to the absent or at least significantly limited possibility to observe the structure and functional characteristics of the system (operators $\Gamma^{\Phi}_{S \to D}$ and $\Gamma^{\Sigma}_{S \to D}$) directly. The general solution usually includes implicit substitution of the operators with the expertise of the modeler (operators $\Gamma^{\Phi}_{S \to M}$ and $\Gamma^{\Sigma}_{S \to M}$). Still, the more complex the system under investigation and the model are, the more limited those operations are. To overcome this issue, additional DD-models are involved (operators $\Gamma^{\Xi \to \Sigma}_{D}$ and $\Gamma^{\Xi \to \Phi}_{D}$ for mining in available data, and $\Gamma^{\Phi \to \Xi}_{D \to M}$ for extended discovery of model parameters for various functional characteristics). Also, A-models are employed to extend expert knowledge in discovery of $M$-layer concepts with either formalized knowledge or knowledge discovered in data with machine learning approaches (operators $\Gamma^{\Xi \to \Sigma}_{D \to M}$, $\Gamma^{\Xi \to \Phi}_{D \to M}$, $\Gamma^{\Sigma}_{D \to M}$, and $\Gamma^{\Phi}_{D \to M}$ for direct discovery of structure and functional characteristics, and operators $\Gamma^{\Sigma \to \Phi}_{D}$, $\Gamma^{\Phi \to \Sigma}_{D}$, $\Gamma^{\Sigma \to \Phi}_{M}$, and $\Gamma^{\Phi \to \Sigma}_{M}$ for interconnection of discovered characteristics in available data and within the used model). In the proposed approach, primary attention is paid to these kinds of solutions, where DD- and A-models enable enhancement of the complex modeling process with an additional level of automation, adaptation, and knowledge providing.
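To make the operator notation easier to follow, the indices of $\Gamma^{A_1 \to A_2}_{L_1 \to L_2}$ can be encoded as a small data structure. The following is only an illustrative sketch: the class name `Operator`, its fields, and the ASCII labels are assumptions introduced here and are not part of the original framework.

```python
from dataclasses import dataclass
from typing import Optional

CONCEPTS = {"Xi", "Phi", "Sigma"}   # parameters, functional characteristics, structure
LAYERS = {"S", "D", "M"}            # system, data, model


@dataclass(frozen=True)
class Operator:
    """Hypothetical encoding of an operator Gamma^{A1->A2}_{L1->L2}."""
    concept_from: str
    layer_from: str
    concept_to: Optional[str] = None   # None: the concept does not change
    layer_to: Optional[str] = None     # None: the layer does not change

    def __post_init__(self):
        # Validate indices against the three concepts and three layers.
        for concept in (self.concept_from, self.concept_to):
            if concept is not None and concept not in CONCEPTS:
                raise ValueError(f"unknown concept: {concept}")
        for layer in (self.layer_from, self.layer_to):
            if layer is not None and layer not in LAYERS:
                raise ValueError(f"unknown layer: {layer}")

    def label(self) -> str:
        """Render the operator in an ASCII form of the notation used in the text."""
        concept = (self.concept_from if self.concept_to is None
                   else f"{self.concept_from}->{self.concept_to}")
        layer = (self.layer_from if self.layer_to is None
                 else f"{self.layer_from}->{self.layer_to}")
        return f"Gamma^{{{concept}}}_{{{layer}}}"


observation = Operator("Xi", "S", layer_to="D")              # observing parameters
assimilation = Operator("Xi", "D", layer_to="M")             # basic data assimilation
structure_mining = Operator("Xi", "D", concept_to="Sigma")   # mining structure in data
print(observation.label(), assimilation.label(), structure_mining.label())
```

Such an encoding makes it straightforward to list which operators a given submodel implements, which is how the patterns in Section 2.2 are described.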
2.2. Complex Modeling Patterns. Considering the defined conceptual framework, we identify several patterns of modeling and simulation of a complex system (see Figure 2). The patterns are defined as combinations in the context of the framework described previously (3 layers, 3 concepts). An essential idea of the proposed patterns is systematization of complex model management approaches with combinations of expertise, intelligent solutions (A-models), DD-models, and EC.
The patterns extend the operators described in Section 2.1 for model building with operators for model application (i.e., modelling and simulation) and results analysis (e.g., assessing model quality) required for automated model identification. These additional operators are denoted by $\Gamma'^{A}_{L}$, with similar notation for indices.
P1. Regular modeling of a system (Figure 2(a)) is a basic pattern usually applied to discover new knowledge on the system under investigation. A model is built using (a) expertise of the modeler for identification of structure and functional characteristics of the model ($\Gamma^{\Phi}_{S \to M}$ and $\Gamma^{\Sigma}_{S \to M}$); (b) available input data, usually representing quantitative parameters of a system, considered as a static input of the model or a source for data assimilation (DA) via operators $\Gamma^{\Xi}_{S \to D}$ and $\Gamma^{\Xi}_{D \to M}$. Results of model application ($\Gamma'^{\Xi}_{M \to D}$, $\Gamma'^{\Sigma}_{M \to D}$, and $\Gamma'^{\Phi}_{M \to D}$) could be considered from a descriptive (mainly structural or quantitative characteristics) or a predictive (often forecasting or other functional characteristics) point of view. The obtained results are analyzed in comparison to available information about the investigated system ($\Gamma'^{\Xi}_{D \to S}$, $\Gamma'^{\Sigma}_{D \to S}$, and $\Gamma'^{\Phi}_{D \to S}$), forming an optimization loop which can be considered within the scope of all three concepts. Certain limitations of this pattern, when applied to complex system modeling and simulation, are introduced by two factors. First, working with complex structural and functional characteristics of the model requires a high level of expertise, which limits the extensibility and automation of model operation. Second, performing optimization in a loop with most algorithms requires multiple runs of a model. As a result, computationally intensive models have limitations in optimization-based operations (identification, calibration, etc.) due to performance reasons.
P2. Data-driven modeling (Figure 2(b)) provides an extension to the modeling operation, describing the relationship between data attributes. Application of data-driven models may be considered as a replacement of the actual "full" model by (a) providing information about the structure of system and model with data mining (DM) and process mining (PM) techniques ($\Gamma^{\Xi \to \Sigma}_{D}$); (b) generating surrogate models for functional characteristics ($\Gamma^{\Xi \to \Phi}_{D}$); (c) providing estimation of investigated parameters with machine learning (ML) algorithms and models ($\Gamma'^{\Xi \to \Xi}_{D}$). In contrast to the previous pattern, data-driven models usually operate quickly (although training the model could require significant time). Still, such models have lower quality than the original "full" models. Nevertheless, combining this pattern with others provides significant enhancement in functionality and performance; e.g., data-driven models can be used in an optimization loop (see the previous pattern).
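The use of a data-driven model inside an optimization loop can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy function `full_model_error` stands in for an expensive "full" model, and a piecewise-linear interpolant plays the role of the surrogate (the paper does not prescribe a particular surrogate type).

```python
import bisect


def full_model_error(x: float) -> float:
    """Stand-in for an expensive 'full' model run (a toy function, not from the paper)."""
    return (x - 1.3) ** 2 + 0.5


def build_surrogate(xs, ys):
    """Return a piecewise-linear interpolant through the sorted training points."""
    def surrogate(x: float) -> float:
        i = bisect.bisect_left(xs, x)
        i = min(max(i, 1), len(xs) - 1)   # clamp to a valid segment
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return surrogate


# "Training": a handful of expensive full-model runs.
train_x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
train_y = [full_model_error(x) for x in train_x]
surrogate = build_surrogate(train_x, train_y)

# The optimization loop queries only the cheap surrogate.
candidates = [i / 100 for i in range(301)]
best_x = min(candidates, key=surrogate)

# A single extra full run checks the surrogate's suggestion.
print(best_x, surrogate(best_x), full_model_error(best_x))
```

In this toy setting the surrogate suggests `best_x = 1.5`, while the true optimum of the stand-in model is at 1.3: the loop costs only seven full runs, at the price of the lower quality noted above.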
P3. Ensemble-based modeling (Figure 2(c)) extends P1 for working with sets of objects (models, data sets, and states) reflecting uncertainty, variability, or alternative solutions (e.g., models). Previously [11], we identified 5 classes of ensembles (see E1-E5 in Figure 2(c)): decomposition ensemble, alternative models ensemble, data-driven ensemble, parameter diversity ensemble, and metaensemble. All these patterns can be applied within the context of the proposed framework. Still, an extension of the ensemble structure increases the structural complexity of the model and thus leads to the need for additional (automatic) control procedures. Moreover, the performance issues of P1 become even worse in ensemble modeling.
Figure 2: Complex modeling patterns: (a) regular modeling; (b) data-driven modeling; (c) ensemble-based modeling; (d) data-driven support of complex modeling; (e) EC in hybrid complex modeling; (f) evolutionary space discovery in hybrid complex modeling. Legend: data processing; modeling and simulation; expertise, intelligent solution; data-driven solution, EC.
P4. One of the key ideas of the proposed approach is an implementation of data-driven analysis of model states, structure, and behavior. To implement it within the conceptual framework, we propose a pattern for data-driven complex modeling (Figure 2(d)). It includes identification and prediction of a model structure through DM and PM techniques ($\Gamma^{\Xi \to \Sigma}_{D \to M}$) and generation of surrogate models for injection into the complex model ($\Gamma^{\Xi \to \Phi}_{D \to M}$). In addition, it is possible to use data-driven techniques to predict the quality of the considered model and use it for model optimization ($\Gamma'^{\Xi}_{D \to S}$, $\Gamma'^{\Xi \to \Sigma}_{D \to S}$, and $\Gamma'^{\Xi \to \Phi}_{D \to S}$).
P5. A key pattern for EC implementation is presented in Figure 2(e). Here, EC is used to identify a model structure ($\Gamma^{\Xi \to \Sigma}_{D \to M}$) and surrogate submodels ($\Gamma^{\Xi \to \Phi}_{D \to M}$) with consideration of a population of models. As a result, the modeling result is also (as in P3) presented in multiple instances, which may be analyzed, filtered, and evolved within consecutive iterations over changing time (with processing of incoming observations of the system) or within a single timestamp (with fixed observation data).
P6. Finally, the last presented pattern (Figure 2(f)) is aimed at investigation of the system phase space using DD-models and/or EC to reflect the unobservable landscape for estimation of model positioning, assessing its quality in inferring (sub-)optimal structural ($\Gamma'^{\Xi \to \Sigma}_{D \to S}$ and $\Gamma'^{\Sigma}_{M \to D}$) and functional ($\Gamma'^{\Xi \to \Phi}_{D \to S}$ and $\Gamma'^{\Phi}_{M \to D}$) characteristics of the actual system.
These patterns could be easily combined to obtain better results within a specific application. From the point of view of EC, the patterns where a set of model (or submodel) instances is considered (P5, P6) are of special interest. It is possible to consider ensemble-based techniques (P3) in the fashion of EC, but within our approach, we prefer consideration of an ensemble as a composite model with several submodels. In that case, ensemble management refers to the concept of complex model structure.
Figure 3: Artifacts and procedures within a typical composite solution.
Several important goals may be reached within the presented patterns:
(i) automation of complex model management with intelligent solutions, DD-models, and EC
(ii) optimization of model structure and application under defined limitations in precision and performance
(iii) enhanced ways of domain knowledge discovery for applications and general investigation of a system
2.3. Composite Solution Development. The proposed structure of core concepts and patterns may be applied in various ways to form a solution which combines operators either implemented originally within the solution or implemented as external model calls. Figure 3 shows the essential elements (artifacts and procedures) in a typical composite solution within the proposed conceptual layers ($S$, $D$, and $M$). The $S$-layer includes the actual system's state, which can be assessed through the observation procedure and described by explicit domain knowledge. The $D$-layer includes datasets divided into observation data and simulation/modeling data, with procedures for data processing and data assimilation. Finally, the $M$-layer includes a set of available basic models $M_1, \ldots, M_N$, which may be identified and calibrated with available data, yielding tuned models $M'_1, \ldots, M'_N$ as a result. Here, the essential elements are model composition (which may be performed either automatically or by the modeler) and application of the model.
The key benefit of the approach is an application of a combination of EC, data-driven, and intelligent procedures to manage the whole composite solution, including data processing, modeling, and simulation, to lower uncertainty in $\Sigma \times \Xi \times \Phi$. Within the shown structure, these procedures may be applied
(i) to rank and select alternative models
(ii) to support model identification, calibration, composition, and application
(iii) to manage artifacts on various conceptual layers in a systematic way
(iv) to infer implicit knowledge from available data and explicitly presented domain knowledge
The shown example gives a brief view of the composite solution development, while the particular details may differ depending on the application. The key procedures within the proposed composite solution, namely, the implementation of intelligent procedures to support model identification and the systematic management of the composite model, are considered in Sections 2.4 and 2.5.
2.4. Evolutionary Model Identification. When implementing evolution of models within a complex modeling task, structure, functional characteristics, and quantitative parameters are usually considered as the genotype, whereas the model output (data layer) is considered as the phenotype. Within the proposed approach, we can adapt the definition of basic EC operations within genotype-phenotype mapping [13]:
(i) epigenesis as model application: $f_1 : S \times M \to D$
(ii) selection: $f_2 : S \times D \to D$
(iii) genotype survival: $f_3 : S \times D \to M$
(iv) mutation: $f_4 : M \to M$
In addition, we consider quality assessment, usually treated as fitness for selection and survival (or, in more complex algorithms, for controlling other operations like mutation):
(i) data quality: $q_d : S \times D \to Q_d$
(ii) model quality: $q_m : S \times M \to Q_m$
Here, $Q_d$ and $Q_m$ are often considered as $\mathbb{R}^N$ with some quantitative quality metrics. Model quality is usually considered through data quality, i.e., $q_m \sim q_d(s, f_1(s, m))$, but within our approach, this separation is considered important because, in addition, we introduce supporting operations with data-driven procedures. In complex modeling, many of these functions (first of all, $f_1$, $q_m$, and $q_d$) are significantly difficult to apply directly (some of these issues are considered in relation to the patterns). Data-driven operations (first of all, $f_1$ and $q_m$) can be introduced as substitutes for the previously introduced basic operations (see also patterns P2, P4, and P6):
(i) epigenesis as DD-model application: $f^d_1 : S \times M \to D$
(ii) model generation: $g^d : S \times D \to M$
(iii) model quality prediction: $q^d_m : S \times M \to Q_m$
(iv) space discovery: $w^d : S \times D \to S$
Operation $w^d$ could be used as an intelligent extension within the selection or survival operations ($f_2$ and $f_3$). It becomes especially important in the case of a lack of knowledge of the system's structure or functional characteristics. Operation $g^d$, at the same time, could be used as a part of the mutation operation $f_4$ (or of initial population generation). Having this extension, we can implement enhanced versions of EC algorithms (e.g., genetic algorithms, evolution strategies, and evolutionary programming) with data-driven operations to overcome, or at least to lower, complex modeling issues.
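The basic operations $f_1$-$f_4$ and the quality functions $q_d$ and $q_m$ can be arranged into a simple evolutionary loop. The sketch below is a minimal illustration under stated assumptions: the "system" is a toy data set generated by $y = 2x + 1$, a model is a pair of parameters $(a, b)$ of a linear function, and selection and survival are merged into one truncation step. None of these choices come from the paper.

```python
import random

random.seed(0)

# Observations of the system S (toy data generated by y = 2x + 1; an assumption).
XS = [0.0, 1.0, 2.0, 3.0, 4.0]
OBSERVED = [2.0 * x + 1.0 for x in XS]


def f1_epigenesis(model):
    """f1: S x M -> D. Apply a model (genotype) to obtain output data (phenotype)."""
    a, b = model
    return [a * x + b for x in XS]


def q_d(data):
    """q_d: S x D -> Q_d. Data quality as RMSE against observations (lower is better)."""
    return (sum((y - o) ** 2 for y, o in zip(data, OBSERVED)) / len(OBSERVED)) ** 0.5


def q_m(model):
    """q_m ~ q_d(s, f1(s, m)): model quality assessed through the data it produces."""
    return q_d(f1_epigenesis(model))


def f2_f3_select_survivors(population, keep):
    """f2 and f3 merged: rank models by phenotype quality, keep the best genotypes."""
    return sorted(population, key=q_m)[:keep]


def f4_mutation(model, sigma=0.3):
    """f4: M -> M. Gaussian perturbation of the quantitative parameters."""
    return tuple(p + random.gauss(0.0, sigma) for p in model)


population = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(20)]
initial_best = min(q_m(m) for m in population)

for _ in range(40):
    parents = f2_f3_select_survivors(population, keep=5)
    offspring = [f4_mutation(random.choice(parents)) for _ in range(15)]
    population = parents + offspring   # elitism: the best models are retained

final_best = min(q_m(m) for m in population)
print(initial_best, final_best)
```

In this arrangement, a data-driven substitute such as $q^d_m$ or $f^d_1$ would simply replace `q_m` or `f1_epigenesis` with a cheaper predictor, and $g^d$ could replace the random initialization of `population`.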
2.5. Model Management Approach and Algorithm. By model management, we mean operations with models within problem domain solution development and application. This includes identification, calibration, DA, optimization, prediction, and forecasting. To systematize model management in the presented patterns, we propose an approach for explicit consideration of the spaces $S$, $D$, and $M$ within hybrid modeling with EC and DD-modeling. To summarize complex modeling procedures within the approach, we developed a high-level algorithm which includes a series of steps to be implemented within the context of complex model management.
Step 1 (space discovery). This step identifies the description of the phase space (in most cases, $S$) in the case of a lack of knowledge or for automation purposes. For example, the step could be applied in the discovery of the system state space or model structure. A space description may include (a) distance metrics, (b) proximity structure (e.g., graph, clustering, hierarchy, and density), and (c) a positioning function. One of the possible ways to perform this step is an application of DM and EC algorithms to available data (see pattern P6).
Step 2 (identification of supplementary functions). Data-driven functions ($\Phi$) are applied to work in model evolution with consideration of the space (landscape) representation as available information.
Step 3 (evolutionary processing of a set of models). This step is described by a combination of basic EC operations (population initialization, epigenesis, selection, mutation, and survival) with supplementary functions. The form of the combination depends on (a) the selected EC algorithm, (b) application requirements and restrictions, and (c) model-based issues (e.g., performance, quality of surrogate models, etc.).
Step 4 (assimilation of updated data and knowledge). This step is applied for automatic adaptation purposes and implements a DA algorithm. DA can be applied to (a) the set of models, (b) EC operations (e.g., affecting the selection function), (c) supplementary functions (as they are mainly data-driven), and (d) the phase space description (if the descriptive structure is identified from changed data and/or knowledge).
The steps can be repeated in various combinations depending on the application and the implemented pattern. Also, the steps are general and could be implemented in various ways. Several examples are provided in Section 3.
2.6. Available Building Blocks of a Composite Solution. EC offers a flexible and robust solution to identify complex model structures within a complex landscape, with possible adaptation towards changing conditions and the system's state (including new states without prior observation). A significant additional benefit is the ability to manage alternative solutions simultaneously, with possible switching between them and various combinations of them depending on the current needs. Still, within the task of model identification and management, EC (like many metaheuristics) has certain drawbacks which require additional steps to implement the approach within particular conditions:
(i) high computational cost due to the multiple runs of a model
(ii) low reproducibility and interpretability of obtained results due to the randomized nature of the searching procedure
(iii) complicated tuning of hyperparameters for better EC convergence
(iv) indistinct definition of genotype boundaries
(v) complicated mapping of genotype to phenotype space
To overcome these issues, the proposed approach involves two options. First, the intelligent procedures may be used to tune EC hyperparameters (P5); to predict features of genotype-phenotype mapping, boundaries, etc. (P4); and to discover interpretable states and filters (for system, data, and model) to control convergence and adaptation of the population (P2, P4, and P5) with interpretable and reproducible results (through the defined control procedure). Second, the composite model may use various approaches, methods, and elements to obtain better quality and performance of the solution:
(i) surrogate models (P2, P4, P5), which may increase performance (for example, within preliminary and intermediate optimization steps)
(ii) ensemble models (P3), which may be considered as an interpretable and controllable population
(iii) interpretation and formal inference using explicit domain-specific knowledge and results of data mining to feed procedures of EC and infer parameters in both models and EC
(iv) controllable space decomposition (P6) with predictive models for possible areas and directions of population migration in EC to explicitly lower uncertainty and obtain additional interpretability
Finally, an essential feature of the proposed approach is a holistic analysis of a composite solution with possible coevolution of models (submodels within a composite model) and data processing procedures.
3. Application Examples
This section presents several practical examples where the proposed approach, patterns, or some of their elements were applied. The examples were intentionally selected from diverse problem domains to demonstrate the generality of the approach. The considered problems are developed in separate projects which are at various stages. Problem 1 (ensemble metocean simulation) was investigated in a series of projects (see, e.g., [11, 14, 15]). Within this research, we are trying to extend model calibration and DA with EC techniques to develop more flexible and accurate multimodel ensembles. Problem 2 (clinical pathways (CPs) modelling) is important in several ongoing projects aimed at model-based decision support in healthcare (see, e.g., [16-18]). The proposed approach plays an important role by enabling deeper analysis of clinical pathways in various scenarios (interactive analysis of available CPs with identification of clusters of similar patients, DA in predictive modelling of ongoing cases, etc.). Finally, Problem 3 shows very early results of a recently started project in online social network analysis.
3.1. Problem 1: Evolution in Models for Metocean Simulation. Environmental simulation systems usually contain several numerical models serving different purposes (complementary simulation processes, improving the reliability of a system by producing alternative results, etc.). Each model typically can be described by a large number of quantitative parameters and functional characteristics that should be adjusted by an expert or using intelligent automated methods (e.g., EC). Alternative models inside the environmental simulation system can be joined in an ensemble according to the complex modeling pattern based on evolutionary computing (a combination of the P3 and P5 patterns). In the current case study, we introduce an example illustrating the ensemble concept in the forms of the alternative models ensemble, the parameter diversity ensemble, and the metaensemble. For identification of the parameters of the proposed ensembles, the least squares method (in the case of model linearity) or optimization methods (in the case of nonlinearity) can be used. As we need to take into account not only the functional space $\Phi$ and the space of parameters $\Xi$ for a single model but also the optimal coexistence of models in the system (i.e., $\Sigma$), evolutionary and coevolutionary approaches seem to be applicable techniques for this task. It is worth mentioning that the coevolutionary approach can be applied to independent model realizations through an ensemble as a connection element. In this case, parameters (weights) in the ensemble can be estimated separately from the coevolution procedure, in a constant form or dynamically. As a case study of complex environmental modeling, we design an ensemble model that consists of the SWAN (http://swanmodel.sourceforge.net) model for ocean wave simulation based on two different surface forcings, by NCEP (https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html) and ERA Interim (https://www.ecmwf.int/en/forecasts/datasets/archive-datasets/reanalysis-datasets/era-interim). Thus, different implementations of the SWAN model were connected in the form of an alternative models ensemble with least-squares-calculated coefficients defining the structure of the complex model. Two parameters, wind drag and whitecapping rate (WCR), were calibrated using evolutionary and coevolutionary algorithms implementing $\Gamma^{\Xi \to \Phi}_{D \to M}$ in P5 (for a detailed sensitivity analysis of SWAN, see [19]). The coevolutionary approach can be represented in the form of a parameter diversity ensemble, where each population is constructed as an ensemble of alternative model results with different parameters. Also, we can add ensemble weights to the model parameter diversity and get a metaensemble that can be identified in the frame of the coevolutionary approach.
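The least-squares estimation of ensemble coefficients mentioned above can be sketched for the two-member case. This is an illustration under assumptions, not the authors' implementation: the wave height values are invented, the ensemble is taken to be a weighted sum without an intercept, and the 2x2 normal equations are solved by Cramer's rule.

```python
def least_squares_weights(h1, h2, obs):
    """Minimize sum((w1*h1 + w2*h2 - obs)^2) by solving the 2x2 normal equations."""
    a11 = sum(x * x for x in h1)
    a12 = sum(x * y for x, y in zip(h1, h2))
    a22 = sum(y * y for y in h2)
    b1 = sum(x * o for x, o in zip(h1, obs))
    b2 = sum(y * o for y, o in zip(h2, obs))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("model outputs are collinear; weights are not unique")
    w1 = (b1 * a22 - a12 * b2) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return w1, w2


# Hypothetical wave heights (m) from two SWAN runs with different forcing.
swan_ncep = [1.0, 2.0, 1.5, 3.0]
swan_era = [1.4, 1.8, 2.1, 2.6]
# Synthetic "observations" built as a known mixture, so the result can be checked.
observed = [0.6 * a + 0.4 * b for a, b in zip(swan_ncep, swan_era)]

w_ncep, w_era = least_squares_weights(swan_ncep, swan_era, observed)
ensemble = [w_ncep * a + w_era * b for a, b in zip(swan_ncep, swan_era)]
print(round(w_ncep, 6), round(w_era, 6))
```

Because the synthetic observations are an exact mixture of the two members, the recovered weights are 0.6 and 0.4 up to rounding; with real measurements the weights would only minimize the residual.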
In the process of model identification and verification, measurements from several wave stations in the Kara Sea were used. The fitness function represents the mean error (RMSE) over all wave stations. For verification of the results, the MAE (mean absolute error) and DTW (dynamic time warping) metrics were used.
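The three metrics named above can be written compactly. The sketch below is a plain reference version under assumptions: the fitness is read as the RMSE averaged over stations (one possible interpretation of the description), DTW uses the absolute difference as the local cost, and the station names and values are invented.

```python
import math


def rmse(sim, obs):
    """Root mean square error between simulated and observed series."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def mae(sim, obs):
    """Mean absolute error between simulated and observed series."""
    return sum(abs(s - o) for s, o in zip(sim, obs)) / len(obs)


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with absolute-difference cost."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def fitness(sim_by_station, obs_by_station):
    """Mean RMSE over all stations (an assumed reading of the fitness function)."""
    errors = [rmse(sim_by_station[k], obs_by_station[k]) for k in obs_by_station]
    return sum(errors) / len(errors)


# Hypothetical wave heights (m) at two stations.
obs = {"station_1": [1.0, 2.0, 3.0], "station_2": [2.0, 2.0, 2.0]}
sim = {"station_1": [1.0, 2.0, 5.0], "station_2": [2.0, 2.0, 2.0]}
print(fitness(sim, obs), mae(sim["station_1"], obs["station_1"]))
print(dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0]))
```

The last call shows why DTW is useful for verification: a series that is merely delayed by one step has zero DTW distance, whereas RMSE and MAE would penalize the shift.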
Figure 4(a) shows the surface (landscape) of RMSE in the space of the two parameters introduced above (drag and WCR) for the implementations SWAN+ERA and SWAN+NCEP. It can be seen that the evolutionary-obtained results converge to the minimum of the error landscape. The landscape was obtained by running the model with all parameter variants from a full 30x30 grid (i.e., 900 runs), while the evolutionary algorithm converged in 5 generations with 10 individuals (parameter sets) in the population (50 runs), which allows identification to be performed two orders faster. The convergence of coevolution for the SWAN+ERA+NCEP case is presented in Figure 5.

Figure 4: Metocean simulation: (a) error landscape for wave height simulation results using ERA and NCEP reanalysis as input data (axes: drag, log(WCR), RMSE (m); series: NCEP, ERA) and (b) Pareto frontier of coevolution results for all generations (axes: drag, WCR, log(RMSE); series: coNCEP, coERA, coEnsemble).

Figure 5: Coevolution convergence of the parameter diversity ensemble for metocean models (axes: generation, RMSE (m)).
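The difference in the number of model runs between the full grid and the evolutionary search described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: `toy_error` is a hypothetical smooth surface standing in for a SWAN run, the parameter bounds are invented, and the simple truncation-selection scheme with Gaussian mutation is not claimed to be the algorithm used in the study.

```python
import random

GRID_RUNS = 30 * 30  # full grid over (drag, WCR): 900 model runs
BOUNDS = [(0.5, 3.0), (-20.0, 0.0)]  # hypothetical ranges for drag and log(WCR)

def toy_error(drag, log_wcr):
    """Hypothetical error surface; a real run would call the wave model instead."""
    return (drag - 1.2) ** 2 + 0.05 * (log_wcr + 10.0) ** 2 + 0.3

def evolve(error_fn, bounds, pop_size=10, generations=5, seed=0):
    rng = random.Random(seed)
    population = [tuple(rng.uniform(lo, hi) for lo, hi in bounds)
                  for _ in range(pop_size)]
    survivors = []  # (error, individual) pairs kept between generations
    runs = 0
    best_per_generation = []
    for _ in range(generations):
        scored = [(error_fn(*ind), ind) for ind in population]
        runs += len(population)  # one model run per evaluated individual
        survivors = sorted(survivors + scored)[: pop_size // 2]
        best_per_generation.append(survivors[0][0])
        # Next generation: Gaussian mutation of randomly chosen survivors.
        population = []
        for _ in range(pop_size):
            _, parent = rng.choice(survivors)
            child = tuple(
                min(hi, max(lo, gene + rng.gauss(0.0, 0.1 * (hi - lo))))
                for gene, (lo, hi) in zip(parent, bounds)
            )
            population.append(child)
    return survivors[0], runs, best_per_generation

best, runs, history = evolve(toy_error, BOUNDS)
print(f"grid runs: {GRID_RUNS}, evolutionary runs: {runs}")
```

Because survivors are carried over between generations, the best error recorded per generation never increases, while the evaluation budget stays at 10 individuals times 5 generations.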
Although the error landscapes for the pair of implementations SWAN+ERA and SWAN+NCEP are close to each other, separate evolution does not consider optimization of the ensemble result. For this purpose, we apply the coevolutionary approach, which produces a set of Pareto-optimal solutions for each generation. Figure 4(b) shows that the error of each model in the ensemble is significant (coNCEP and coERA for the models alone), but the error of the whole ensemble (coEnsemble) converges to the minimum very fast.
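Selecting the Pareto-optimal solutions of a generation, as mentioned above, can be sketched as a simple non-dominated filter. The objective vectors below are hypothetical (error of the ERA-driven model, error of the NCEP-driven model); all objectives are assumed to be minimized.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical (rmse_era, rmse_ncep) pairs for one generation.
candidates = [(0.9, 1.4), (1.0, 1.0), (1.3, 0.8), (1.2, 1.2), (1.4, 1.5)]
front = pareto_front(candidates)
print(front)
```

The quadratic scan is adequate for populations of the size reported here (tens of individuals); larger populations would call for a sorting-based procedure.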
The obtained result can be analyzed from the uncertainty reduction point of view. Optimization of model parameters helps to reduce parameter uncertainty, which can be estimated through the error function. When we apply an ensemble approach to the evolutionary optimized results, it is also appropriate to speak of reducing the uncertainty connected with the input data sources (NCEP and ERA). Moreover, the metaensemble approach allowed reduction of the uncertainty connected with the ensemble parameters.
Figure 6: Graph-based representation of processes space in healthcare (interactive view). (Demonstration available at https://www.youtube.com/watch?v=EH74f1w6EeY.)
Summarizing the results of the metocean case study, we can note that the EC approach shows a significant efficiency gain, up to 120 times compared with grid search, without accuracy losses. According to this experimental study, the quality of the ensemble with evolutionary optimized models is similar to the results of the grid search: the MAE metric is equal to 0.24 m and the DTW metric to 51. We can also mention that the coevolutionary approach provides a 10% accuracy gain compared with the results of single evolution of model implementations, but this is still similar to the ensemble result with evolutionary optimized models. Nevertheless, the coevolutionary approach allowed a 200-fold acceleration to be achieved. Within the context of the proposed approach, space $\Phi$ was investigated using the defined structure of the model in space $\Sigma$ for the purpose of model calibration.
3.2. Problem 2: Modeling Health Care Process. Modeling of healthcare processes usually involves enormous uncertainty and variability, even when a single disease is modeled. One of the ways to identify a model of such a process is PM [20]. Still, direct implementation of PM methods does not remove a major part of the uncertainty. Within the current research, we applied the proposed approach for identification purposes both in the analysis of historical cases and in the prediction of single process development. Here we consider the processes of providing health care in acute coronary syndrome (ACS) cases; ACS is usually considered one of the major causes of death in the world. We used a set of 3434 ACS cases collected during 2010-2015 in the Almazov National Medical Research Centre, one of the leading cardiological centers in Russia. The data set contains the electronic health records of these patients with all registered events and patient characteristics.
To simplify consideration of the multidimensional space of possible processes ($\Gamma'^{\Xi\to\Sigma}_{D\to S}\,\Gamma^{\Xi}_{S\to D}$ for analysis of $\Sigma$ on layer $S$), we introduced a graph-based representation of this space, with vertices representing cases and edges representing proximity of cases. Analysis of such a structure enables easy discovery of common cases (e.g., as communities in the graph). Such discovery enables explicit, interpretable structuring of the space and representation of a further landscape for EC in terms of the P6 pattern. Moreover, direct interactive investigation of the visual representation of such a structure (see Figure 6) provides significant insights for medical researchers.
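A graph of cases connected by proximity can be sketched as follows. This is a simplified illustration, not the representation used in the study: cases are hypothetical label sequences, proximity is assumed to be edit distance with a fixed threshold, and groups of common cases are approximated by connected components instead of a community detection method.

```python
from itertools import combinations

def edit_distance(a, b):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def proximity_graph(cases, max_distance):
    """Adjacency sets: vertices are case indices, edges link close cases."""
    edges = {i: set() for i in range(len(cases))}
    for i, j in combinations(range(len(cases)), 2):
        if edit_distance(cases[i], cases[j]) <= max_distance:
            edges[i].add(j)
            edges[j].add(i)
    return edges

def connected_components(edges):
    seen, components = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(edges[node] - component)
        seen |= component
        components.append(sorted(component))
    return components

# Hypothetical cases: each symbol denotes a key event of a process.
cases = ["AFIE", "AFNE", "AFIED", "DNFE", "DNFEE"]
graph = proximity_graph(cases, max_distance=1)
groups = connected_components(graph)
print(groups)
```

With these five sequences the graph splits into two groups, which illustrates how typical cases may emerge as densely connected regions.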
We have developed an evolutionary-based algorithm for pattern identification and clustering in such a representation, with two criteria to be optimized (see Figure 7). Here, processes were represented by a sequence of labels (symbols) denoting key events in the PM model. Typical patterns were then selected for the Pareto frontier. The convergence process is demonstrated in Figure 8 (the 10 best individuals from the Pareto frontier according to the integral criterion were selected). As a result, this solution may refer to the P5 pattern and operator $\Gamma^{\Xi\to\Sigma}_{D\to M}$ while discovering the model structure. Figure 9 shows an example of a typical process model (i.e., a structural characteristic of the model) for one of the identified clusters. A detailed description of the approach, algorithms, and results on CP discovery, clustering, and analysis, including a performance comparison of three versions of CP discovery algorithms, can be found in [10]. An important outcome of applying the approach in this application is the interpretability of the clusters and identified patterns. For example, 10 clusters and the corresponding CPs were interpreted by cardiologists from the Almazov National Medical Research Centre. The obtained interpretation and further discovery and application of the CP structure are presented in [17]. Another important benefit of such space structure discovery is lowering the uncertainty of a patient's treatment trajectory by hierarchical positioning of an evolving process (selection of a cluster and selection of a position within the cluster). For example, the discrete-event simulation model described in [17] provides a more appropriate length-of-stay distribution when the simulation uses the discovered classes of CPs (the Kolmogorov-Smirnov statistic decreased by 51%, from 0.255 to 0.124).

Figure 7: Pareto frontier for CP pattern discovery (axes: number of non-arranged sequences (normed), length of pattern (normed)).

Figure 8: Evolutionary convergence during CP pattern discovery (axes: generation, normed sum error).

Figure 9: Example of a process model showing transfers between the hospital's departments (Cluster 7).

Figure 10: Evolution of synthetic CPs: (a) CP population convergence (axes: step, distance (symbols); series: Cluster 0 to Cluster 6), (b) evolution of possible CP (legend: historical CPs, synthetic CPs, target CP; demonstration available at https://www.youtube.com/watch?v=twvfX9zKsY8), (c) number of synthetic CPs by step, (d) % of CPs in the correct cluster by step.
Further, we propose an algorithm to dynamically generate possible developments of a healthcare process using the identified graph-based space representation, with evolutionary strategies assimilating incoming data (events) within a case ($\Gamma'^{\Sigma}_{M\to D}$ in P5 and $\Gamma^{\Xi\to\Sigma}_{D}$ in P2). We consider the convergence (Figure 10(a)) of the introduced synthetic continuations of the processes to the right class (the identified clusters of typical cases were used), with mapping to the graph-based space representation with proximity measures (Figure 10(b)). As a result, the appearance of new CP events decreases the number of synthetic CPs and increases the percentage of CPs positioned in the correct cluster (see an example in Figure 10(c) and Figure 10(d), respectively). This enables interpretable positioning and lowers uncertainty in predicting the further development of a CP for a particular patient.
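The effect described above, where each observed event shrinks the population of synthetic continuations and raises the share positioned in the correct cluster, can be illustrated with a deliberately simplified sketch. Here the evolutionary strategy is replaced by a plain prefix filter, and the population, event labels, and cluster numbers are all hypothetical.

```python
def assimilate(population, observed_prefix):
    """Keep only synthetic CPs consistent with the events observed so far."""
    size = len(observed_prefix)
    return [cp for cp in population if cp["events"][:size] == observed_prefix]

def share_in_cluster(population, cluster):
    """Percentage of the population positioned in the given cluster."""
    if not population:
        return 0.0
    hits = sum(cp["cluster"] == cluster for cp in population)
    return 100.0 * hits / len(population)

# Hypothetical synthetic population and target process.
population = [
    {"events": "AFIED", "cluster": 7},
    {"events": "AFIE", "cluster": 7},
    {"events": "AFNE", "cluster": 3},
    {"events": "ANFE", "cluster": 3},
    {"events": "DNFE", "cluster": 5},
]
target_events, target_cluster = "AFIED", 7

counts, shares = [], []
for step in range(len(target_events) + 1):
    remaining = assimilate(population, target_events[:step])
    counts.append(len(remaining))
    shares.append(share_in_cluster(remaining, target_cluster))
print(counts, shares)
```

A full implementation would also generate new continuations at each step; the filter alone only shows why incoming events reduce the set of plausible developments.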
Here, a combination of patterns P2, P5, and P6 in the implementation of the proposed algorithm (see Section 2.5) enables interactive investigation of the process space and data assimilation into a population of possible continuations of a single process as it evolves. This solution can be applied in exploratory modeling and simulation of patient flow processing, as well as in decision support in specialized medical centers.
3.3. Problem 3: Mining Social Media. Nowadays, social media analysis (which began with static network models emphasizing the topology of connections between users) strives to explore the dynamic behavioral patterns of individuals, which can be recovered from their digital traces on the web. The prediction of social media activities requires combining analytical and data-driven models, as well as identifying the optimal structure and parameters of these models according to the available data. Here we show an example of a problem in this field involving evolutionary identification of a model.
Figure 11: Distribution of posts, reposts, and comments on the personal walls of subscribers of a bank community (axes: activity count (log), fraction of users (log); series: repost, post, comment).

Figure 12: Example of a process model (a) with expanded cycles (Cluster 2) and (b) with collapsed cycles (Cluster 1).
A digital trace of a user in an online social network (OSN) is a sequence (chain) of observed activities separated by time gaps. Each OSN supports different types of "hidden" and observable activities. For example, in the largest Russian social network, vk.com (further denoted as VK), a user has a personal page (wall) with three types of activities: post (P), when the user creates a record; repost (R), when the user copies the record of another user or community to his or her wall; and comment (C), when the user comments on a record on his or her wall. Figure 11 illustrates the distribution of these activities for subscribers of a large Russian bank community in VK. The collected dataset consists of the last 100 (or fewer, if unavailable) entries (posts or reposts) and the comments on those entries for 8K user walls in the period January 2017-December 2017. Comments are much less common than posts and reposts. The distributions of posts and reposts are similar, but there is a group of "spreaders" with a significant number of reposts.
We applied the technique described in Section 3.2 to analyze the processes. Still, the considered process has a significantly different structure. By default, it is continuous with random repetition of events, while the healthcare process in ACS cases is finite and has a "stronger" structure. Figure 12 shows a typical process structure identified with the EC-based approach and visualized with expanded cycles (a) and with collapsed cycles (b). The second one could be considered more relevant than the first, which is significantly affected by the length of the selected history. It is natural to consider the process as a random process or a state-transition model. In that case, the three identified clusters (characterized by various frequencies of transitions) could be interpreted as typical behavior models.

Figure 13: Example of a process model for cluster 3 using n-gram analysis (Cluster 3, 1120 users).

Table 1: Mean of activities' combinations for users' clusters.

Cluster  Size  CCC  CCP  CCR  CPP  CPR  CRR  PPP   PPR   PRR   RRR
1        5238  052  074  024  1    072  042  668   64    756   853
2        2110  011  013  016  016  036  059  209   395   1232  603
3        1120  091  138  014  362  075  018  5002  1178  5     277
N-gram analysis is often used to detect patterns in people's behavior [21, 22]. It is based on counting the frequencies of combinations or sequences of activities. We collected all sorted 3-grams (so-called 3-sets) for each user's sequence to analyze the frequency of event combinations. As a result, three clusters of vectors with 3-set chains were identified with the k-means clustering method. Figure 13 shows all combinations and transitions between them for cluster 3 as an example. Using Figure 12 and Table 1, it is possible to see that cluster 3 includes users who often make new records (P) and sometimes comment on records (C). So cluster 3 mostly consists of "bloggers". Cluster 2 includes "spreaders" who frequently copy other records (R). The biggest cluster, cluster 1, consists of people who make new records and copy other ones equally, but less intensively compared to the other clusters. That may be considered typical behavior for an OSN user. N-gram analysis allows detecting typical behavioral patterns and obtaining process models for social media activities using chains of different lengths as input data. Thus, this type of data-driven modeling is more appropriate for researching continuous processes. Figure 14 shows a graph-based representation of the process space with all users' patterns.
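The construction of sorted 3-grams (3-sets) described above can be sketched as follows. The ten combinations over the activities C, P, and R coincide with the column labels of Table 1; the example sequence is hypothetical, and the resulting vectors are what a clustering routine such as k-means would receive as input.

```python
from collections import Counter
from itertools import combinations_with_replacement

ACTIVITIES = "CPR"  # comment, post, repost
# All sorted 3-sets: CCC, CCP, CCR, CPP, CPR, CRR, PPP, PPR, PRR, RRR
THREE_SETS = ["".join(c) for c in combinations_with_replacement(ACTIVITIES, 3)]

def three_set_counts(sequence):
    """Count sorted 3-grams in one user's activity sequence."""
    counts = Counter(
        "".join(sorted(sequence[i:i + 3])) for i in range(len(sequence) - 2)
    )
    return [counts.get(key, 0) for key in THREE_SETS]

# Hypothetical user trace: post, post, repost, comment, post.
vector = three_set_counts("PPRCP")
print(dict(zip(THREE_SETS, vector)))
```

Sorting each 3-gram discards the order of events inside the window, so "PRC" and "RCP" both contribute to the CPR column.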
This subsection provides very early results. The next steps in applying the proposed approach to this application include extending the process model structure with (a) temporal labeling (gaps between events), (b) consideration of the process within a sliding time window to obtain more structured processes, (c) linking of the model with causal inference, and (d) introduction of DM techniques for EC positioning of ongoing processes in the model space. We believe that these extensions could enhance discovery of the model structure (P4) and provide deeper insight into social media activity.
Figure 14: Graph-based representation of processes' space for three clusters in social media activity (Cluster 1, Cluster 2, Cluster 3).

4. Conclusion and Future Work

The development of the proposed approach is still an ongoing project. We aim for further systematization and detailing of the proposed concepts, methods, and algorithms, as well as more comprehensive and deeper implementation of EC-based applications. Further work includes the following directions:
(i) dualization of the role of data-driven and intelligent operations in the proposed approach and the described patterns

(ii) extended analysis of various EC techniques applicable within the approach

(iii) investigation of EC-based discovery for models of complex systems with lacking or inconsistent observations

(iv) detailed formalization of expertise and knowledge-based methods within the approach

(v) extending the approach with interactive user-centered modeling and phase space analysis

(vi) development of a multilayered approach for decision support and control of the system and process $S$, available data $D$, and complex model $M$
Data Availability
The data used in the examples presented in Section 3 were initially obtained within complementary projects performed by the authors of the paper. The data are available from the corresponding author upon request, after an explicit statement of the purpose and plan of the requested data usage, to check for possible violation of the corresponding projects' rules.
Conflicts of Interest
The authors declare that they have no conflicts of interest
Acknowledgments
This paper presents an extension and further development of the work [23]. This research is financially supported by the Russian Scientific Foundation, Agreement 14-11-00823 (15.07.2014).
References
[1] N. Boccara, Modeling Complex Systems, Graduate Texts in Physics, Springer, New York, 2nd edition, 2010.

[2] H. McManus and D. Hastings, "A framework for understanding uncertainty and its mitigation and exploitation in complex systems," IEEE Engineering Management Review, vol. 34, no. 3, pp. 81–94, 2006.

[3] W. Walker, P. Harremoes, J. Rotmans et al., "Defining uncertainty: a conceptual basis for uncertainty management in model-based decision support," Integrated Assessment, vol. 4, no. 1, pp. 5–17, 2003.

[4] J. Yan and J. R. Deller, "NARMAX model identification using a set-theoretic evolutionary approach," Signal Processing, vol. 123, pp. 30–41, 2016.

[5] I. G. Kevrekidis, C. W. Gear, and G. Hummer, "Equation-free: The computer-aided analysis of complex multiscale systems," AIChE Journal, vol. 50, no. 7, pp. 1346–1355, 2004.

[6] H. Ihshaish, A. Cortes, and M. A. Senar, "Parallel Multi-level Genetic Ensemble for Numerical Weather Prediction Enhancement," Procedia Computer Science, vol. 9, pp. 276–285, 2012.

[7] G. Dumedah, "Formulation of the Evolutionary-Based Data Assimilation and its Implementation in Hydrological Forecasting," Water Resources Management, vol. 26, no. 13, pp. 3853–3870, 2012.

[8] V. V. Kashirin, A. A. Lantseva, S. V. Ivanov, S. V. Kovalchuk, and A. Boukhanovsky, "Evolutionary simulation of complex networks' structures with specific functional properties," Journal of Applied Logic, vol. 24, part A, pp. 39–49, 2017.

[9] S. V. Kovalchuk, P. A. Smirnov, K. V. Knyazkov, A. S. Zagarskikh, and A. V. Boukhanovsky, "Knowledge-Based Expressive Technologies Within Cloud Computing Environments," in Practical Applications of Intelligent Systems, vol. 279 of Advances in Intelligent Systems and Computing, pp. 1–11, Springer, Berlin, Germany, 2014.

[10] A. A. Funkner, A. N. Yakovlev, and S. V. Kovalchuk, "Towards evolutionary discovery of typical clinical pathways in electronic health records," Procedia Computer Science, vol. 119, pp. 234–244, 2017.

[11] S. V. Kovalchuk and A. V. Boukhanovsky, "Towards Ensemble Simulation of Complex Systems," Procedia Computer Science, vol. 51, pp. 532–541, 2015.
[12] S. V. Kovalchuk, A. V. Krikunov, K. V. Knyazkov, and A. V. Boukhanovsky, "Classification issues within ensemble-based simulation: application to surge floods forecasting," Stochastic Environmental Research and Risk Assessment, vol. 31, no. 5, pp. 1183–1197, 2017.

[13] D. Fogel, "Phenotypes, genotypes, and operators in evolutionary computation," in Proceedings of the 1995 IEEE International Conference on Evolutionary Computation, p. 193, Perth, WA, Australia.

[14] S. V. Ivanov, S. V. Kovalchuk, and A. V. Boukhanovsky, "Workflow-based Collaborative Decision Support for Flood Management Systems," Procedia Computer Science, vol. 18, pp. 2213–2222, 2013.

[15] A. Gusarov, A. Kalyuzhnaya, and A. Boukhanovsky, "Spatially adaptive ensemble optimal interpolation of in-situ observations into numerical vector field models," in Proceedings of the 6th International Young Scientist Conference on Computational Science, YSC 2017, pp. 325–333, Finland, November 2017.

[16] A. V. Krikunov, E. V. Bolgova, E. Krotov, T. M. Abuhay, A. N. Yakovlev, and S. V. Kovalchuk, "Complex data-driven predictive modeling in personalized clinical decision support for Acute Coronary Syndrome episodes," in Proceedings of the International Conference on Computational Science, ICCS 2016, pp. 518–529, USA, June 2016.

[17] S. V. Kovalchuk, A. A. Funkner, O. G. Metsker, and A. N. Yakovlev, "Simulation of patient flow in multiple healthcare units using process and data mining techniques for model identification," Journal of Biomedical Informatics, vol. 82, pp. 128–142, 2018.

[18] A. Yakovlev, O. Metsker, S. Kovalchuk, and E. Bologova, "Prediction of in-hospital mortality and length of stay in acute coronary syndrome patients using machine-learning methods," Journal of the American College of Cardiology, vol. 71, no. 11, p. A242, 2018.

[19] A. Nikishova, A. Kalyuzhnaya, A. Boukhanovsky, and A. Hoekstra, "Uncertainty quantification and sensitivity analysis applied to the wind wave model SWAN," Environmental Modelling & Software, vol. 95, pp. 344–357, 2017.

[20] E. Rojas, J. Munoz-Gama, M. Sepulveda, and D. Capurro, "Process mining in healthcare: A literature review," Journal of Biomedical Informatics, vol. 61, pp. 224–236, 2016.

[21] T. Sinha, P. Jermann, N. Li, and P. Dillenbourg, "Your click decides your fate: Inferring Information Processing and Attrition Behavior from MOOC Video Clickstream Interactions," in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 3–14, Doha, Qatar, October 2014.

[22] C. Marceau, "Characterizing the Behavior of a Program Using Multiple-Length N-Grams," Defense Technical Information Center, 2005.

[23] S. V. Kovalchuk, O. G. Metsker, A. A. Funkner et al., "Towards management of complex modeling through a hybrid evolutionary identification," in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 255-256, Kyoto, Japan, July 2018.
Figure 1: Basic concepts of complex modeling on model ($M$), data ($D$), and system ($S$) layers. (Diagram labels: parameters $\Xi$, functional characteristics $\Phi$, structure $\Sigma$; layers $S$, $D$, $M$; operators $\Gamma^{A}_{L}$; legend: F-models, DD-models, A-models, complicated transitions, expert knowledge.)
of the system within a complex state space. Still, most of the solutions are tightly related to the application and the modeling system.
Within the current research, we are trying to develop a unified conceptual and technological approach to support core operations with a complex model by distinguishing concepts and operations on the model, data, and system levels. We consider a combination of EC and data-driven approaches as a tool for building intelligent solutions for more precise and systematic management (and lowering) of uncertainty and for providing the required level of automation, adaptability, and extensibility.
2. Conceptual Basis
The proposed approach is based on several key ideas aimed at extending uncertainty management in complex system modeling and simulation.

(1) Disjoint consideration of the model, data, and system in terms of structure, behavior, and quality is aimed toward a system-level review of the modeling and simulation process and distinguishes between uncertainties of various kinds originating from different levels [8].

(2) Intelligent technologies such as data mining, process mining, machine learning, and knowledge-based approaches are to be employed to fill the gap in automation of modeling and simulation. Key sources for the development of such solutions include formalization of various knowledge within composite solutions [9] and data-driven technologies to support the identification of model components.

(3) EC approaches are widespread in modeling and simulation of complex systems [8, 10]. We believe that systematization of this process, with separate consideration of the spaces for a system (with its subsystems) and a model (with its submodels), could enhance such solutions significantly.

(4) The aim of the approach's development is twofold. First, it is aimed at automating modeling operations to extend the functionality of possible model-based applications. Second, working with a combination of EC and intelligent data-driven technologies could be considered an additional knowledge source for system and model analysis.

Further, this section considers the conceptual basis of the proposed approach with a special focus on the role of EC algorithms and data-driven intelligent technologies in building and exploiting complex models.
2.1. Core Concepts. To distinguish between the main modeling concepts and operations, we propose a conceptual framework (see Figure 1) for consideration of the key processes and operations during modeling of a complex system. The framework may be considered a generalization and extension of a framework [11, 12] previously defined and used by the authors for ensemble-based simulation. The current research extends the concept beyond ensemble-based simulation. It is mainly focused on complex modeling in general, with identification of key model management procedures and important artifacts which can be used for model development and application.
The proposed framework considers three main layers of complex systems' modeling, namely, model ($M$), data ($D$), and system ($S$). The main operations (arrows on the diagram) within the framework are defined within three concepts: quantitative parameters ($\Xi$), functional characteristics ($\Phi$), and structure ($\Sigma$). We denote operations by $\Gamma^{A}_{L}$, where $A$ and $L$ stand for the concepts and layer (respectively) involved in the operation. Transitions between concepts and between layers are denoted by $A_1 \to A_2$ and $L_1 \to L_2$, respectively; e.g., operator $\Gamma^{\Xi}_{S\to D}$ reflects observation of quantitative parameters, and operator $\Gamma^{\Xi}_{D\to M}$ stands for basic data assimilation. Also, a set of operators may refer to a single modeling operation; e.g., operators $\Gamma^{\Phi\to\Xi}_{M}$ and $\Gamma^{\Sigma\to\Xi}_{M}$ are often implemented within a single monolithic model. Mainly, operators are related to a specific submodel within a complex model. We consider three key classes of models. F-models are usually classical continuous models developed with knowledge of a system. DD-models are data-driven models based on analysis of available data sets with corresponding techniques (statistics, data mining, process mining, etc.). A-models are mainly intelligent components of a system, usually based on machine learning or knowledge-based approaches. We also consider EC-based components as belonging to the A-models class.
A key problem within complex system modeling and simulation is related to the absent, or at least significantly limited, possibility to observe the structure and functional characteristics of the system (operators $\Gamma^{\Phi}_{S\to D}$ and $\Gamma^{\Sigma}_{S\to D}$) directly. The general solution usually includes implicit substitution of the operators with the expertise of the modeler (operators $\Gamma^{\Phi}_{S\to M}$ and $\Gamma^{\Sigma}_{S\to M}$). Still, the more complex the system under investigation and the model are, the more limited those operations are. To overcome this issue, additional DD-models are involved (operators $\Gamma^{\Xi\to\Sigma}_{D}$ and $\Gamma^{\Xi\to\Phi}_{D}$ for mining in available data, and $\Gamma^{\Phi\to\Xi}_{D\to M}$ for extended discovery of model parameters for various functional characteristics). Also, A-models are employed to extend expert knowledge in the discovery of $M$-layer concepts with either formalized knowledge or knowledge discovered in data with machine learning approaches (operators $\Gamma^{\Xi\to\Sigma}_{D\to M}$, $\Gamma^{\Xi\to\Phi}_{D\to M}$, $\Gamma^{\Sigma}_{D\to M}$, and $\Gamma^{\Phi}_{D\to M}$ for direct discovery of structure and functional characteristics, and operators $\Gamma^{\Sigma\to\Phi}_{D}$, $\Gamma^{\Phi\to\Sigma}_{D}$, $\Gamma^{\Sigma\to\Phi}_{M}$, and $\Gamma^{\Phi\to\Sigma}_{M}$ for interconnection of discovered characteristics in available data and within the used model). In the proposed approach, primary attention is paid to these kinds of solutions, where DD- and A-models enhance the complex modeling process with an additional level of automation, adaptation, and knowledge provision.
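The operator notation introduced in this subsection can be captured in a small data structure, which may help when cataloguing which operators a given submodel implements. The class below is purely illustrative: the name `Operator`, its fields, and the `latex` helper are assumptions and not part of the original framework.

```python
from dataclasses import dataclass

CONCEPTS = {"Xi": r"\Xi", "Phi": r"\Phi", "Sigma": r"\Sigma"}
LAYERS = ("S", "D", "M")

@dataclass(frozen=True)
class Operator:
    concepts: tuple            # ("Xi",) or a transition such as ("Xi", "Sigma")
    layers: tuple              # ("D",) or a transition such as ("D", "M")
    application: bool = False  # True for primed (application and analysis) operators

    def __post_init__(self):
        if not self.concepts or any(c not in CONCEPTS for c in self.concepts):
            raise ValueError(f"unknown concept in {self.concepts}")
        if not self.layers or any(l not in LAYERS for l in self.layers):
            raise ValueError(f"unknown layer in {self.layers}")

    def latex(self):
        sup = r" \to ".join(CONCEPTS[c] for c in self.concepts)
        sub = r" \to ".join(self.layers)
        prime = "'" if self.application else ""
        return rf"\Gamma{prime}^{{{sup}}}_{{{sub}}}"

observation = Operator(("Xi",), ("S", "D"))
data_assimilation = Operator(("Xi",), ("D", "M"))
structure_mining = Operator(("Xi", "Sigma"), ("D",))
print(observation.latex(), data_assimilation.latex(), structure_mining.latex())
```

Making the dataclass frozen keeps operators hashable, so they can be stored in sets or used as dictionary keys when mapping operators to submodels.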
2.2. Complex Modeling Patterns. Considering the defined conceptual framework, we identify several patterns of modeling and simulation of a complex system (see Figure 2). The patterns are defined as combinations in the context of the framework described previously (3 layers, 3 concepts). An essential idea of the proposed patterns is the systematization of complex model management approaches with combinations of expertise, intelligent solutions (A-models), DD-models, and EC.
The patterns extend the operators described in Section 2.1 for model building with operators for model application (i.e., modeling and simulation) and results analysis (e.g., assessing model quality) required for automated model identification. These additional operators are denoted by $\Gamma'^{A}_{L}$, with similar notation for the indices.

P1. Regular modeling of a system (Figure 2(a)) is a basic pattern usually applied to discover new knowledge about the system under investigation. A model is built using (a) the expertise of the modeler for identification of the structure and functional characteristics of the model ($\Gamma^{\Phi}_{S\to M}$ and $\Gamma^{\Sigma}_{S\to M}$) and (b) available input data, usually representing quantitative parameters of the system, considered as a static input of the model or a source for data assimilation (DA) via operators $\Gamma^{\Xi}_{S\to D}$ and $\Gamma^{\Xi}_{D\to M}$. Results of model application ($\Gamma'^{\Xi}_{M\to D}$, $\Gamma'^{\Sigma}_{M\to D}$, and $\Gamma'^{\Phi}_{M\to D}$) could be considered from a descriptive (mainly structural or quantitative characteristics) or predictive (often forecasting or other functional characteristics) point of view. The obtained results are analyzed in comparison with the available information about the investigated system ($\Gamma'^{\Xi}_{D\to S}$, $\Gamma'^{\Sigma}_{D\to S}$, and $\Gamma'^{\Phi}_{D\to S}$), forming an optimization loop which can be considered within the scope of all three concepts. Two factors limit the application of this pattern to complex system modeling and simulation. First, working with complex structural and functional characteristics of the model requires a high level of expertise, which limits the extensibility and automation of model operation. Second, performing optimization in a loop with most algorithms requires multiple runs of a model. As a result, computationally intensive models have limitations in optimization-based operations (identification, calibration, etc.) due to performance reasons.
P2. Data-driven modeling (Figure 2(b)) extends the modeling operation by describing the relationships between data attributes. Data-driven models may be considered as a replacement for the actual "full" model, (a) providing information about the structure of the system and the model with data mining (DM) and process mining (PM) techniques ($\Gamma^{\Xi \to \Sigma}_{D}$); (b) generating surrogate models for functional characteristics ($\Gamma^{\Xi \to \Phi}_{D}$); (c) providing estimates of the investigated parameters with machine learning (ML) algorithms and models ($\Gamma'^{\Xi \to \Xi}_{D}$). In contrast to the previous pattern, data-driven models usually operate quickly (although training the model may require significant time). Still, such models have lower quality than the original "full" models. Nevertheless, combining this pattern with others significantly enhances functionality and performance; e.g., data-driven models can be used in the optimization loop (see the previous pattern).
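As a minimal sketch of how a data-driven surrogate can stand in for the "full" model inside an optimization loop, consider the following Python fragment. It is an illustration under stated assumptions, not the implementation used in this work: `full_model`, the sampling grids, and the piecewise-linear form of the surrogate are all hypothetical.

```python
import bisect


def full_model(x):
    """Stand-in for an expensive 'full' model (hypothetical): returns an error value."""
    return (x - 0.3) ** 2 + 0.1


def build_surrogate(xs, ys):
    """Piecewise-linear interpolation through sampled (x, y) pairs; xs must be sorted."""
    def surrogate(x):
        # Clamp to the sampled range, then interpolate between neighbours.
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect.bisect_right(xs, x)
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return surrogate


# Training: a handful of expensive runs on a coarse grid (11 full-model runs).
train_x = [i / 10 for i in range(11)]
train_y = [full_model(x) for x in train_x]
surrogate = build_surrogate(train_x, train_y)

# Optimization: many cheap surrogate evaluations on a fine grid, no further full runs.
candidates = [i / 1000 for i in range(1001)]
best_x = min(candidates, key=surrogate)
```

The point of the sketch is the split of costs: the full model is called only while building the surrogate, whereas the search itself touches the surrogate alone. The price is the lower quality noted above, since the optimum is only as good as the surrogate near it.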
P3. Ensemble-based modeling (Figure 2(c)) extends P1 to work with sets of objects (models, data sets, and states) reflecting uncertainty, variability, or alternative solutions (e.g., models). Previously [11], we identified five classes of ensembles (see E1-E5 in Figure 2(c)): decomposition ensemble, alternative models ensemble, data-driven ensemble, parameter diversity ensemble, and metaensemble. All these patterns can be applied within the context of the proposed framework. Still, extending the ensemble structure increases the structural complexity of the model and thus leads to the need for additional (automatic) control procedures. Moreover, the performance issues of P1 become even worse in ensemble modeling.

Figure 2: Complex modeling patterns: (a) regular modeling; (b) data-driven modeling; (c) ensemble-based modeling; (d) data-driven support of complex modeling; (e) EC in hybrid complex modeling; (f) evolutionary space discovery in hybrid complex modeling.
P4. One of the key ideas of the proposed approach is the data-driven analysis of model states, structure, and behavior. To implement it within the conceptual framework, we propose a pattern for data-driven complex modeling (Figure 2(d)). It includes identification and prediction of the model structure through DM and PM techniques ($\Gamma^{\Xi \to \Sigma}_{D \to M}$) and generation of surrogate models for injection into the complex model ($\Gamma^{\Xi \to \Phi}_{D \to M}$). In addition, data-driven techniques can be used to predict the quality of the considered model, and this prediction can be used for model optimization ($\Gamma'^{\Xi}_{D \to S}$, $\Gamma'^{\Xi \to \Sigma}_{D \to S}$, and $\Gamma'^{\Xi \to \Phi}_{D \to S}$).
P5. A key pattern for EC implementation is presented in Figure 2(e). Here, EC is used to identify the model structure ($\Gamma^{\Xi \to \Sigma}_{D \to M}$) and surrogate submodels ($\Gamma^{\Xi \to \Phi}_{D \to M}$) while considering a population of models. As a result, the modeling result is also (as in P3) presented in multiple instances, which may be analyzed, filtered, and evolved within subsequent iterations over changing time (with processing of incoming observations of the system) or within a single timestamp (with fixed observation data).
P6. Finally, the last presented pattern (Figure 2(f)) is aimed at investigating the system phase space using DD-models and/or EC to reflect the unobservable landscape, in order to estimate the positioning of the model and to assess its quality in inferring the (sub-)optimal structural ($\Gamma'^{\Xi \to \Sigma}_{D \to S}$ and $\Gamma'^{\Sigma}_{M \to D}$) and functional ($\Gamma'^{\Xi \to \Phi}_{D \to S}$ and $\Gamma'^{\Phi}_{M \to D}$) characteristics of the actual system.

These patterns can easily be combined to obtain better results within a specific application. From the point of view of EC, the patterns where a set of model (or submodel) instances is considered (P5, P6) are of special interest. It is possible to treat ensemble-based techniques (P3) in the fashion of EC, but within our approach we prefer to consider an ensemble as a composite model with several submodels. In that case, ensemble management refers to the concept of complex model structure.

Figure 3: Artifacts and procedures within a typical composite solution.
Several important goals may be reached within the presented patterns:

(i) automation of complex model management with intelligent solutions, DD-models, and EC;
(ii) optimization of model structure and application under defined limitations in precision and performance;
(iii) enhanced ways of domain knowledge discovery for applications and general investigation of a system.
2.3. Composite Solution Development. The proposed structure of core concepts and patterns may be applied in various ways to form a solution which combines operators implemented originally within the solution or as external model calls. Figure 3 shows the essential elements (artifacts and procedures) of a typical composite solution within the proposed conceptual layers ($S$, $D$, and $M$). The $S$-layer includes the actual system's state, which can be assessed through the observation procedure and described by explicit domain knowledge. The $D$-layer includes datasets, divided into observation data and simulation/modeling data, with procedures for data processing and data assimilation. Finally, the $M$-layer includes a set of available basic models $M_1, \ldots, M_N$, which may be identified and calibrated with available data, resulting in tuned models $M'_1, \ldots, M'_N$. Here, the essential elements are model composition (which may be performed either automatically or by the modeler) and application of the model.
The key benefit of the approach is the application of a combination of EC, data-driven, and intelligent procedures to manage the whole composite solution, including data processing, modeling, and simulation, to lower uncertainty in $\Sigma \times \Xi \times \Phi$. Within the shown structure, these procedures may be applied

(i) to rank and select alternative models;
(ii) to support model identification, calibration, composition, and application;
(iii) to manage artifacts on various conceptual layers in a systematic way;
(iv) to infer implicit knowledge from available data and explicitly presented domain knowledge.

The shown example gives a brief view of composite solution development, while particular details may differ depending on the application. The key procedures within the proposed composite solution, namely the intelligent procedures supporting model identification and the systematic management of the composite model, are considered in Sections 2.4 and 2.5.
2.4. Evolutionary Model Identification. When evolution of models is implemented within a complex modeling task, structural, functional, and quantitative parameters are usually considered as the genotype, whereas the model output (data layer) is considered as the phenotype. Within the proposed approach, we can adapt the definition of basic EC operations within genotype-phenotype mapping [13]:

(i) epigenesis as model application: $f_1 : S \times M \to D$;
(ii) selection: $f_2 : S \times D \to D$;
(iii) genotype survival: $f_3 : S \times D \to M$;
(iv) mutation: $f_4 : M \to M$.
In addition, we consider quality assessment, usually treated as fitness for selection and survival (or, in more complex algorithms, for controlling other operations like mutation):

(i) data quality: $q_d : S \times D \to Q_d$;
(ii) model quality: $q_m : S \times M \to Q_m$.

Here, $Q_d$ and $Q_m$ are often considered as $\mathbb{R}^N$ with some quantitative quality metrics. Model quality is usually considered through data quality, i.e., $q_m \sim q_d(s, f_1(s, m))$, but within our approach this separation is important because we additionally introduce supporting operations based on data-driven procedures: in complex modeling, many of these functions (first of all $f_1$, $q_m$, and $q_d$) are difficult to apply directly (some of these issues are considered in relation to the patterns). Data-driven operations can be introduced as substitutes for the previously introduced basic operations (first of all $f_1$ and $q_m$; see also patterns P2, P4, and P6):
(i) epigenesis as DD-model application: $f^d_1 : S \times M \to D$;
(ii) model generation: $g^d : S \times D \to M$;
(iii) model quality prediction: $q^d_m : S \times M \to Q_m$;
(iv) space discovery: $w^d : S \times D \to S$.

Operation $w^d$ could be used as an intelligent extension within the selection or survival operations ($f_2$ and $f_3$). It becomes especially important when knowledge of the system's structure or functional characteristics is lacking. Operation $g^d$, at the same time, could be used as a part of the mutation operation $f_4$ (or of initial population generation). With this extension, we can implement enhanced versions of EC algorithms (e.g., genetic algorithms, evolution strategies, and evolutionary programming) with data-driven operations to overcome, or at least to reduce, complex modeling issues.
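To make the roles of these operations more concrete, the following Python fragment gives a minimal sketch of an elitist evolutionary loop in which epigenesis, data quality, and mutation are passed in as functions, and an optional data-driven quality predictor filters offspring before the (possibly expensive) model run. All names, the toy one-parameter model, and the merging of selection and survival into one truncation step are assumptions made for illustration only.

```python
import random


def evolve(population, s, f1, q_d, f4, generations=20, q_m_pred=None, rng=None):
    """Minimal elitist evolutionary loop over models m (genotypes).

    f1(s, m) -> d      : epigenesis, i.e. running the model to obtain output data
    q_d(s, d) -> float : data quality (lower is better in this sketch)
    f4(m, rng) -> m    : mutation
    q_m_pred(s, m)     : optional data-driven quality predictor used to discard
                         unpromising offspring before f1 is called
    """
    rng = rng or random.Random(0)
    size = len(population)
    scored = [(q_d(s, f1(s, m)), m) for m in population]
    for _ in range(generations):
        offspring = [f4(m, rng) for _, m in scored]
        if q_m_pred is not None:
            # Keep only the better-looking half according to the cheap predictor.
            offspring.sort(key=lambda m: q_m_pred(s, m))
            offspring = offspring[: max(1, size // 2)]
        scored += [(q_d(s, f1(s, m)), m) for m in offspring]
        # Selection and survival merged: keep the best `size` genotypes.
        scored.sort(key=lambda pair: pair[0])
        scored = scored[:size]
    return scored[0]


def f1(s, m):
    """Toy 'model run' (hypothetical): output of a one-parameter model."""
    return m * m


def q_d(s, d):
    """Distance between the model output d and the observed system value s."""
    return abs(d - s)


def f4(m, rng):
    """Gaussian mutation of the single parameter."""
    return m + rng.gauss(0.0, 0.3)


initial = [0.0, 1.0, 5.0]
best_q, best_m = evolve(initial, 4.0, f1, q_d, f4, generations=30)
best_q_dd, best_m_dd = evolve(initial, 4.0, f1, q_d, f4, generations=30,
                              q_m_pred=lambda s, m: abs(m * m - s))
```

Because survivors are never replaced by worse individuals, the best quality cannot deteriorate between generations; with the predictor enabled, only half of the offspring reach the full model run in each generation.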
2.5. Model Management Approach and Algorithm. By model management we mean operations with models within problem domain solution development and application. This includes identification, calibration, DA, optimization, prediction, and forecasting. To systematize model management in the presented patterns, we propose an approach for explicit consideration of the spaces $S$, $D$, and $M$ within hybrid modeling with EC and DD-modeling. To summarize complex modeling procedures within the approach, we developed a high-level algorithm which includes a series of steps to be implemented within the context of complex model management.

Step 1 (space discovery). This step identifies the description of the phase space (in most cases $S$) when knowledge is lacking or for automation purposes. For example, the step could be applied in the discovery of the system state space or model structure. The space description may include (a) distance metrics, (b) proximity structure (e.g., graph, clustering, hierarchy, and density), (c) a positioning function. One possible way to perform this step is to apply DM and EC algorithms to the available data (see pattern P6).

Step 2 (identification of supplementary functions). Data-driven functions ($\Phi$) are applied to work in model evolution, with the space (landscape) representation considered as available information.

Step 3 (evolutionary processing of a set of models). This step is described by a combination of basic EC operations (population initialization, epigenesis, selection, mutation, and survival) with supplementary functions. The form of the combination depends on (a) the selected EC algorithm, (b) application requirements and restrictions, (c) model-based issues (e.g., performance, quality of surrogate models, etc.).

Step 4 (assimilation of updated data and knowledge). This step is applied for automatic adaptation purposes and implements a DA algorithm. DA can be applied to (a) the set of models, (b) EC operations (e.g., affecting the selection function), (c) supplementary functions (as they are mainly data-driven), (d) the phase space description (if the descriptive structure is identified from changed data and/or knowledge).

The steps can be repeated in various combinations depending on the application and the implemented pattern. Also, the steps are general and could be implemented in various ways. Several examples are provided in Section 3.
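One possible ordering of Steps 1-4 can be sketched as a simple control loop over incoming data batches. The Python fragment below is only a skeleton under strong assumptions: the four callables and their signatures are hypothetical placeholders, and the toy implementations (a value range as the "space", a batch mean as the supplementary function) serve only to make the skeleton executable.

```python
def manage(models, batches, discover, fit_helpers, evolve_step, assimilate):
    """Run Steps 1-4 once per data batch and return the final models and space."""
    space = None
    for batch in batches:
        space = discover(batch, space)                # Step 1: space discovery
        helpers = fit_helpers(batch, space)           # Step 2: supplementary functions
        models = evolve_step(models, space, helpers)  # Step 3: evolutionary processing
        models = assimilate(models, batch)            # Step 4: assimilation of new data
    return models, space


log = []  # records the order in which the steps are executed


def discover(batch, space):
    log.append("discover")
    return (min(batch), max(batch))       # toy space description: observed range


def fit_helpers(batch, space):
    log.append("helpers")
    return sum(batch) / len(batch)        # toy supplementary function: batch mean


def evolve_step(models, space, helpers):
    log.append("evolve")
    return [0.5 * (m + helpers) for m in models]  # move each model toward the mean


def assimilate(models, batch):
    log.append("assimilate")
    lo, hi = min(batch), max(batch)
    return [min(max(m, lo), hi) for m in models]  # clamp models to the new data range


final_models, final_space = manage(
    [0.0, 10.0], [[1.0, 3.0], [2.0, 4.0]],
    discover, fit_helpers, evolve_step, assimilate,
)
```

Passing the steps as functions keeps the loop independent of the chosen EC algorithm and DA scheme, which matches the statement that the steps are general and can be combined differently per application.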
2.6. Available Building Blocks of a Composite Solution. EC offers a flexible and robust solution for identifying complex model structures within a complex landscape, with possible adaptation to changing conditions and the system's state (including new states without prior observation). A significant additional benefit is the ability to manage alternative solutions simultaneously, with possible switching between them and various combinations of them depending on current needs. Still, within the task of model identification and management, EC (like many metaheuristics) has certain drawbacks, which require additional steps to implement the approach under particular conditions:
(i) high computational cost due to the multiple runs of amodel
(ii) low reproducibility and interpretability of obtainedresults due to randomized nature of the searchingprocedure
(iii) complicated tuning of hyperparameters for better ECconvergence
(iv) indistinct definition of genotype boundaries
(v) complicated mapping of genotype to phenotypespace
To overcome these issues, the proposed approach involves two options. First, intelligent procedures may be used to tune EC hyperparameters (P5), to predict features of the genotype-phenotype mapping, boundaries, etc. (P4), and to discover interpretable states and filters (for the system, data, and model) in order to control convergence and adaptation of the population (P2, P4, and P5) in an interpretable and reproducible way (through the defined control procedure). Second, the composite model may use various approaches, methods, and elements to obtain better quality and performance of the solution:
(i) surrogate models (P2 P4 P5) which may increaseperformance (for example within preliminary andintermediate optimization steps)
(ii) ensemble models (P3) which may be considered asinterpretable and controllable population
(iii) interpretation and formal inference using explicitdomain-specific knowledge and results of data min-ing to feed procedures of EC and infer parameters inboth models and EC
(iv) controllable space decomposition (P6) with predic-tive models for possible areas and directions of popu-lation migration in EC to explicitly lower uncertaintyand obtain additional interpretability
Finally, an essential feature of the proposed approach is a holistic analysis of a composite solution with possible coevolution of models (submodels within a composite model) and data processing procedures.
3. Application Examples

This section presents several practical examples where the proposed approach, patterns, or some of their elements were applied. The examples were intentionally selected from diverse problem domains to demonstrate the generality of the approach. The considered problems are developed in separate projects, which are at various stages. Problem 1 (ensemble metocean simulation) was investigated in a series of projects (see, e.g., [11, 14, 15]). Within this research, we are trying to extend model calibration and DA with EC techniques to develop more flexible and accurate multimodel ensembles. Problem 2 (clinical pathways (CPs) modelling) is important in several ongoing projects aimed at model-based decision support in healthcare (see, e.g., [16–18]). The proposed approach plays an important role by enabling deeper analysis of clinical pathways in various scenarios (interactive analysis of available CPs with identification of clusters of similar patients, DA in predictive modelling of ongoing cases, etc.). Finally, Problem 3 shows very early results of a recently started project in online social network analysis.
3.1. Problem 1: Evolution in Models for Metocean Simulation. Environmental simulation systems usually contain several numerical models serving different purposes (complementary simulation processes, improving the reliability of a system by providing alternative results, etc.). Each model can typically be described by a large number of quantitative parameters and functional characteristics that should be adjusted by an expert or using intelligent automated methods (e.g., EC). Alternative models inside the environmental simulation system can be joined in an ensemble according to the complex modeling pattern based on evolutionary computing (a combination of the P3 and P5 patterns). In the current case study, we introduce an example illustrating the ensemble concept in the forms of the alternative models ensemble, the parameter diversity ensemble, and the metaensemble. To identify the parameters of the proposed ensembles, the least squares method (in the case of model linearity) or optimization methods (in the case of nonlinearity) can be used. As we need to take into account not only the functional space $\Phi$ and the space of parameters $\Xi$ for a single model but also to achieve optimal coexistence of models in the system (i.e., $\Sigma$), evolutionary and coevolutionary approaches seem to be applicable techniques for this task. It is worth mentioning that the coevolutionary approach can be applied to independent model realizations, with the ensemble as a connecting element. In this case, the parameters (weights) of the ensemble can be estimated separately from the coevolution procedure, in a constant form or dynamically. As a case study of complex environmental modeling, we designed an ensemble model that consists of the SWAN (http://swanmodel.sourceforge.net) model for ocean wave simulation based on two different surface forcings, by NCEP (https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html) and ERA Interim (https://www.ecmwf.int/en/forecasts/datasets/archive-datasets/reanalysis-datasets/era-interim). Thus, different implementations of the SWAN model were connected in the form of an alternative models ensemble, with least-squares-calculated coefficients defining the structure of the complex model.

Two parameters, wind drag and whitecapping rate (WCR), were calibrated using evolutionary and coevolutionary algorithms implementing $\Gamma^{\Xi \to \Phi}_{D \to M}$ in P5 (for a detailed sensitivity analysis of SWAN, see [19]). The coevolutionary approach can be represented in the form of a parameter diversity ensemble, where each population is constructed as an ensemble of alternative model results with different parameters. We can also add ensemble weights to the model parameter diversity and obtain a metaensemble that can be identified within the coevolutionary approach.
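For the linear case, the least-squares ensemble coefficients of two alternative model outputs can be obtained in closed form from the normal equations. The Python sketch below assumes a two-member ensemble without an intercept term; the function name and the synthetic series are hypothetical and only illustrate the calculation, not the configuration used for SWAN+ERA and SWAN+NCEP.

```python
def ensemble_weights(a, b, obs):
    """Least-squares weights (w_a, w_b) minimizing sum((w_a*a + w_b*b - obs)^2)."""
    saa = sum(x * x for x in a)
    sbb = sum(y * y for y in b)
    sab = sum(x * y for x, y in zip(a, b))
    sao = sum(x * o for x, o in zip(a, obs))
    sbo = sum(y * o for y, o in zip(b, obs))
    # Solve the 2x2 normal equations by Cramer's rule.
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("model outputs are collinear; weights are not unique")
    w_a = (sao * sbb - sbo * sab) / det
    w_b = (sbo * saa - sao * sab) / det
    return w_a, w_b


# Synthetic check: observations built from known weights are recovered.
model_a = [1.0, 2.0, 3.0, 4.0]   # e.g. wave heights from one forcing (made-up values)
model_b = [2.0, 1.0, 4.0, 3.0]   # e.g. wave heights from another forcing (made-up values)
observed = [0.6 * x + 0.4 * y for x, y in zip(model_a, model_b)]
w_a, w_b = ensemble_weights(model_a, model_b, observed)
```

When the two model outputs are nearly collinear the determinant approaches zero and the weights become unstable, which is one practical reason to estimate them separately from the coevolution procedure.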
In the process of model identification and verification, measurements from several wave stations in the Kara Sea were used. The fitness function represents the mean error (RMSE, root-mean-square error) for all wave stations. For verification of the results, the MAE (mean absolute error) and DTW (dynamic time warping) metrics were used.
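The three metrics can be written compactly as follows. This Python sketch uses the textbook definitions; the aggregation of RMSE over stations as a plain average, the absolute difference as the local DTW cost, and the station names in the example are assumptions, since these details are not specified here.

```python
import math


def rmse(sim, obs):
    """Root-mean-square error between simulated and observed series."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def mae(sim, obs):
    """Mean absolute error between simulated and observed series."""
    return sum(abs(s - o) for s, o in zip(sim, obs)) / len(obs)


def dtw(sim, obs):
    """Classic O(n*m) dynamic time warping distance with |a - b| local cost."""
    n, m = len(sim), len(obs)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(sim[i - 1] - obs[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def fitness(stations):
    """Average of per-station RMSE; `stations` maps a name to (sim, obs)."""
    return sum(rmse(sim, obs) for sim, obs in stations.values()) / len(stations)


# Hypothetical two-station example with made-up wave heights (m).
example = {"st1": ([1.0, 2.0], [1.0, 2.0]), "st2": ([0.0, 0.0], [3.0, 4.0])}
example_fitness = fitness(example)
```

Unlike RMSE and MAE, DTW tolerates small phase shifts between the simulated and observed series, which is why it is useful as an independent verification metric for time series such as wave height.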
Figure 4(a) represents the surface (landscape) of RMSE in the space of the mentioned parameters (drag and WCR) for the implementations SWAN+ERA and SWAN+NCEP. It can be seen that the evolutionary-obtained results converge to the minimum of the error landscape. The landscape was obtained by running the model with all parameter variants from the full 30x30 grid (i.e., 900 runs), while the evolutionary algorithm converged in 5 generations with 10 individuals (parameter sets) in the population (50 runs), which allows identification to be performed two orders faster. The convergence of coevolution for the SWAN+ERA+NCEP case is presented in Figure 5.

Figure 4: Metocean simulation: (a) error landscape (RMSE, m, over drag and log(WCR)) for wave height simulation results using ERA and NCEP reanalysis as input data and (b) Pareto frontier of coevolution results for all generations (log(RMSE) over drag and WCR for coNCEP, coERA, and coEnsemble).

Figure 5: Coevolution convergence (RMSE, m, by generation) of diversity parameters ensemble for metocean models.
Although the error landscapes for the pair of implementations SWAN+ERA and SWAN+NCEP are close to each other, separate evolution does not consider optimization of the ensemble result. For this purpose, we apply the coevolutionary approach, which produces a set of Pareto-optimal solutions for each generation. Figure 4(b) shows that the error of each model in the ensemble is significant (coNCEP and coERA for the models alone), but the error of the whole ensemble (coEnsemble) converges to the minimum very fast.
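Selecting the Pareto-optimal solutions of a generation amounts to discarding every candidate that is dominated by another one. A minimal Python sketch, assuming all objectives are minimized and each solution is represented as a tuple of error values (the example tuples are made up):

```python
def dominates(p, q):
    """True if p is no worse than q in every objective and better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (error of model 1, error of model 2) pairs for one generation.
generation = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0), (4.0, 4.0)]
front = pareto_front(generation)
```

The quadratic scan is sufficient for populations of the size discussed here (tens of individuals); larger populations would call for a sorting-based procedure.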
The obtained result can be analyzed from the uncertainty reduction point of view. Optimization of model parameters helps to reduce parameter uncertainty, which can be estimated through the error function. But when we apply an ensemble approach to evolutionary optimized results, we can also speak of reducing the uncertainty connected with the input data sources (NCEP and ERA). Moreover, the metaensemble approach allowed reduction of the uncertainty connected with the ensemble parameters.
Figure 6: Graph-based representation of processes space in healthcare (interactive view). (Demonstration available at https://www.youtube.com/watch?v=EH74f1w6EeY.)
Summarizing the results of the metocean case study, we can note that the EC approach shows a significant efficiency gain, up to 120 times compared with grid search, without accuracy losses. According to this experimental study, the quality of the ensemble with evolutionary optimized models is similar to the results of the grid search: the MAE metric is equal to 0.24 m and the DTW metric to 51. We can also mention that the coevolutionary approach provides a 10% accuracy gain compared with the results of single evolution of model implementations, but this is still similar to the ensemble result with evolutionary optimized models. Nevertheless, the coevolutionary approach allowed a 200 times acceleration to be achieved. Within the context of the proposed approach, space $\Phi$ was investigated using the defined structure of the model in space $\Sigma$ for the purpose of model calibration.
3.2. Problem 2: Modeling Health Care Process. Modeling of healthcare processes usually involves enormous uncertainty and variability, even when a single disease is modeled. One of the ways to identify a model of such a process is PM [20]. Still, direct implementation of PM methods does not remove a major part of the uncertainty. Within the current research, we applied the proposed approach for identification purposes both in the analysis of historical cases and in the prediction of single process development. Here, we consider processes of providing health care in acute coronary syndrome (ACS) cases; ACS is usually considered one of the major causes of death in the world. We used a set of 3434 ACS cases collected during 2010-2015 in Almazov National Medical Research Centre, one of the leading cardiological centers in Russia. The data set contains electronic health records of these patients with all registered events and characteristics of a patient.
To simplify consideration of the multidimensional space of possible processes ($\Gamma'^{\Xi \to \Sigma}_{D \to S}$, $\Gamma^{\Xi}_{S \to D}$ for analysis of $\Sigma$ on layer $S$), we introduced a graph-based representation of this space, with vertices representing cases and edges representing proximity of cases. Analysis of such a structure enables easy discovery of common cases (e.g., as communities in the graph). Such discovery enables explicit, interpretable structuring of the space and representation of the further landscape for EC in terms of the P6 pattern. Moreover, direct interactive investigation of a visual representation of such a structure (see Figure 6) provides significant insights for medical researchers.
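A graph of this kind can be sketched in a few lines of Python. The fragment below is an illustration under explicit assumptions rather than the method used in this study: cases are encoded as strings of event labels, proximity is measured with the Levenshtein (edit) distance, edges connect cases within a hypothetical threshold `max_dist`, and connected components serve as the crudest stand-in for graph communities.

```python
def levenshtein(a, b):
    """Edit distance between two label sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def proximity_graph(cases, max_dist):
    """Adjacency sets: vertices are case indices, edges join cases within max_dist."""
    edges = {i: set() for i in range(len(cases))}
    for i in range(len(cases)):
        for j in range(i + 1, len(cases)):
            if levenshtein(cases[i], cases[j]) <= max_dist:
                edges[i].add(j)
                edges[j].add(i)
    return edges


def components(edges):
    """Connected components of the graph, each returned as a sorted list of vertices."""
    seen, groups = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            v = stack.pop()
            group.append(v)
            for u in edges[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        groups.append(sorted(group))
    return groups


# Made-up event sequences; the letters are arbitrary event labels.
cases = ["AFIE", "AFIED", "AFNE", "NDDD", "NDDE"]
groups = components(proximity_graph(cases, max_dist=1))
```

In this toy example the first three sequences form one group and the last two another. A real analysis would replace connected components with a proper community detection method and choose the distance and threshold from the data.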
We have developed an evolutionary-based algorithm for pattern identification and clustering in such a representation, with two criteria to be optimized (see Figure 7). Here, processes were represented by a sequence of labels (symbols) denoting key events in the PM model. Typical patterns were then selected from the Pareto frontier. The convergence process is demonstrated in Figure 8 (the 10 best individuals from the Pareto frontier according to the integral criterion were selected). As a result, this solution may refer to the P5 pattern and the operator $\Gamma^{\Xi \to \Sigma}_{D \to M}$ while discovering the model structure. Figure 9 shows an example of a typical process model (i.e., a structural characteristic of the model) for one of the identified clusters. A detailed description of the approach, algorithms, and results on CP discovery, clustering, and analysis, including a performance comparison of three versions of CP discovery algorithms, can be found in [10]. An important outcome of the approach in this application is the interpretability of the clusters and identified patterns. For example, 10 clusters and the corresponding CPs were interpreted by cardiologists from Almazov National Medical Research Centre. The obtained interpretation and further discovery and application with the CP structure are presented in [17]. Another important benefit of such space structure discovery is lowering the uncertainty of a patient's treatment trajectory by hierarchical positioning of an evolving process (selection of a cluster and selection of a position within the cluster). For example, the discrete-event simulation model described in [17] provides a more appropriate length-of-stay distribution when simulating with the discovered classes of CPs (the Kolmogorov-Smirnov statistic decreased by 51%, from 0.255 to 0.124).

Figure 7: Pareto frontier for CP patterns discovery (axes: number of non-arranged sequences (normed) and length of pattern (normed)).

Figure 8: Evolutionary convergence during CP pattern discovery (normed sum error by generation).

Figure 9: Example of process model showing transfers between hospital's departments (Cluster 7).

Figure 10: Evolution of synthetic CPs: (a) CP population convergence; (b) evolution of possible CP (demonstration available at https://www.youtube.com/watch?v=twvfX9zKsY8); (c) number of synthetic CPs; (d) % of CPs in correct cluster.
Furthermore, we propose an algorithm to dynamically generate the possible development of a process in healthcare using the identified graph-based space representation, with evolutionary strategies assimilating incoming data (events) within a case ($\Gamma'^{\Sigma}_{M \to D}$ in P5 and $\Gamma^{\Xi \to \Sigma}_{D}$ in P2). We consider convergence (Figure 10(a)) of the introduced synthetic continuations of the processes to the right class (identified clusters of typical cases were used), with mapping to the graph-based space representation with proximity measures (Figure 10(b)). As a result, the appearance of the CP's events decreases the number of synthetic CPs and increases the percentage of CPs positioned in the correct cluster (see an example in Figure 10(c) and Figure 10(d), respectively). This enables interpretable positioning and uncertainty lowering in predicting the further development of a CP for a particular patient.

Here, a combination of patterns P2, P5, and P6 in the implementation of the proposed algorithm (see Section 2.5) enables interactive investigation of the processes space and data assimilation into a population of possible continuations of a single process as it evolves. This solution can be applied in exploratory modeling and simulation of patient flow processing as well as in decision support in specialized medical centers.
3.3. Problem 3: Mining Social Media. Nowadays, social media analysis (which began with static network models emphasizing the topology of connections between users) strives to explore dynamic behavioral patterns of individuals, which can be recovered from their digital traces on the web. The prediction of social media activities requires combining analytical and data-driven models as well as identifying the optimal structure and parameters of these models according to the available data. Here we show an example of a problem in this field involving evolutionary identification of a model.
Figure 11: Distribution of posts, reposts, and comments on personal walls of subscribers of bank community (fraction of users (log) by activity count (log)).

Figure 12: Example of process model (a) with expanded cycles (Cluster 2) and (b) with collapsed cycles (Cluster 1).
A digital trace of a user in an online social network (OSN) is a sequence (chain) of observed activities separated by time gaps. Each OSN supports different types of "hidden" and observable activities. For example, in the largest Russian social network, vk.com (further denoted as VK), a user has a personal page (wall) with three types of activities: post (P), when a user makes a record by himself or herself; repost (R), when the user copies the record of another user or community to his or her wall; and comment (C), when the user comments on a record on his or her wall. Figure 11 illustrates the distribution of these activities for subscribers of a large Russian bank community in VK. The collected dataset consists of the last 100 (or fewer if unavailable) entries (posts or reposts) and the comments on these entries for 8K user walls in the period January 2017–December 2017. Comments are much less common than posts and reposts. The distributions of posts and reposts are similar, but there is a group of "spreaders" with a significant number of reposts.
We applied the technique described in Section 3.2 to analyze the processes. Still, the considered process has a significantly different structure. By default, it is continuous with random repetition of events, while the healthcare process in ACS cases has a finite and more "strong" structure. Figure 12 shows a typical process structure identified with the EC-based approach and visualized with expanded cycles (a) and with collapsed cycles (b). The second one could be considered more relevant than the first one, which is significantly affected by the length of the selected history. It is natural to consider it as a random process or state-transition model. In that case, the three identified clusters (characterized by various frequencies of transitions) could be interpreted as typical behavior models.

Figure 13: Example of process model for cluster 3 (1120 users) using n-gram analysis.

Table 1: Mean of activities' combinations for users' clusters.

Cluster  Size  CCC  CCP  CCR  CPP  CPR  CRR  PPP   PPR   PRR   RRR
1        5238  052  074  024  1    072  042  668   64    756   853
2        2110  011  013  016  016  036  059  209   395   1232  603
3        1120  091  138  014  362  075  018  5002  1178  5     277
N-gram analysis is often used to detect patterns in people's behaviors [21, 22]. N-gram analysis is based on counting frequencies of combinations or sequences of activities. We collected all sorted 3-grams (so-called 3-sets) for each user's sequence to analyze the frequency of event combinations. As a result, three clusters of vectors with 3-set chains were identified with the k-means clustering method. Figure 13 shows all combinations and transitions between them for cluster 3 as an example. Using Figure 12 and Table 1, it is possible to see that cluster 3 includes users who often make new records (P) and sometimes comment on records (C). So, cluster 3 mostly consists of "bloggers." Cluster 2 includes "spreaders," who copy other records (R) frequently. The biggest cluster, cluster 1, consists of people who make new records and copy other ones equally but less intensively compared to the other clusters. That may be considered typical behavior for a user of an OSN. N-gram analysis allows detecting typical behavioral patterns and obtaining process models for social media activities using chains of different lengths as input data. Thus, this type of data-driven modeling is more appropriate for studying continuous processes. Figure 14 shows a graph-based representation of the process space with all users' patterns.
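The counting of sorted 3-grams described above can be illustrated with a minimal sketch. It assumes that each user's history is encoded as a string over the activity codes C, P, and R and that 3-grams are taken with a sliding window of step one; the function name `three_set_vector` and the example sequence are hypothetical and are not taken from the original study.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Activity codes as used in the text: C = comment, P = new record, R = copied record.
ACTIVITIES = "CPR"

# The 10 possible sorted 3-sets: CCC, CCP, CCR, CPP, CPR, CRR, PPP, PPR, PRR, RRR.
THREE_SETS = ["".join(t) for t in combinations_with_replacement(ACTIVITIES, 3)]


def three_set_vector(sequence):
    """Count sorted 3-grams (3-sets) in one user's activity sequence.

    Assumption: a sliding window of length 3 and step 1 is used.
    """
    counts = Counter(
        "".join(sorted(sequence[i:i + 3])) for i in range(len(sequence) - 2)
    )
    return [counts[s] for s in THREE_SETS]


# Hypothetical user history: windows PPR, PRC, RCP give 3-sets PPR, CPR, CPR.
vector = three_set_vector("PPRCP")
print(dict(zip(THREE_SETS, vector)))
```

Vectors of this kind, one per user, could then be passed to any k-means implementation (for example, `KMeans` from scikit-learn) to obtain clusters such as those summarized in Table 1.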
This subsection provides very early results. The next steps in applying the proposed approach to this problem include an extension of the process model structure (a) with temporal labeling (gaps between events), (b) considering the process within a sliding time window to get more structured processes, (c) linking the model with causal inference, and (d) introduction of DM techniques for EC positioning of ongoing processes in model space. We believe that these extensions could enhance discovery of model structure (P4) and provide deeper insight into social media activity.
4. Conclusion and Future Work
Figure 14: Graph-based representation of processes' space for three clusters in social media activity (panels: Cluster 1, Cluster 2, Cluster 3).

The development of the proposed approach is still an ongoing project. We aim at further systematization and detailing of the proposed concepts, methods, and algorithms, as well as more comprehensive and deeper implementation of EC-based applications. Further development includes the following directions:
(i) dualization on the role of data-driven and intelligent operations in the proposed approach and the described patterns

(ii) extended analysis of various EC techniques applicable within the approach

(iii) investigation of EC-based discovery for models of complex systems with lacking or inconsistent observations

(iv) detailed formalization of expertise and knowledge-based methods within the approach

(v) extending the approach with interactive user-centered modelling and phase space analysis

(vi) development of a multilayered approach for decision support and control of system and process $S$, available data $D$, and complex model $M$
Data Availability
The data used in the examples presented in Section 3 were initially obtained within complementary projects performed by the authors of the paper. The data are available from the corresponding author upon request, provided that the purpose and plan of the requested data usage are explicitly stated so that possible violations of the corresponding projects' rules can be checked.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This paper presents an extension and further development of the work [23]. This research is financially supported by the Russian Scientific Foundation, Agreement 14-11-00823 (15.07.2014).
References
[1] N. Boccara, Modeling Complex Systems, Graduate Texts in Physics, Springer, New York, 2nd edition, 2010.

[2] H. McManus and D. Hastings, "A framework for understanding uncertainty and its mitigation and exploitation in complex systems," IEEE Engineering Management Review, vol. 34, no. 3, pp. 81-94, 2006.

[3] W. Walker, P. Harremoes, J. Rotmans et al., "Defining uncertainty: a conceptual basis for uncertainty management in model-based decision support," Integrated Assessment, vol. 4, no. 1, pp. 5-17, 2003.

[4] J. Yan and J. R. Deller, "NARMAX model identification using a set-theoretic evolutionary approach," Signal Processing, vol. 123, pp. 30-41, 2016.

[5] I. G. Kevrekidis, C. W. Gear, and G. Hummer, "Equation-free: The computer-aided analysis of complex multiscale systems," AIChE Journal, vol. 50, no. 7, pp. 1346-1355, 2004.

[6] H. Ihshaish, A. Cortes, and M. A. Senar, "Parallel Multi-level Genetic Ensemble for Numerical Weather Prediction Enhancement," Procedia Computer Science, vol. 9, pp. 276-285, 2012.

[7] G. Dumedah, "Formulation of the Evolutionary-Based Data Assimilation and its Implementation in Hydrological Forecasting," Water Resources Management, vol. 26, no. 13, pp. 3853-3870, 2012.

[8] V. V. Kashirin, A. A. Lantseva, S. V. Ivanov, S. V. Kovalchuk, and A. Boukhanovsky, "Evolutionary simulation of complex networks' structures with specific functional properties," Journal of Applied Logic, vol. 24, part A, pp. 39-49, 2017.

[9] S. V. Kovalchuk, P. A. Smirnov, K. V. Knyazkov, A. S. Zagarskikh, and A. V. Boukhanovsky, "Knowledge-Based Expressive Technologies Within Cloud Computing Environments," in Practical Applications of Intelligent Systems, vol. 279 of Advances in Intelligent Systems and Computing, pp. 1-11, Springer, Berlin, Germany, 2014.

[10] A. A. Funkner, A. N. Yakovlev, and S. V. Kovalchuk, "Towards evolutionary discovery of typical clinical pathways in electronic health records," Procedia Computer Science, vol. 119, pp. 234-244, 2017.

[11] S. V. Kovalchuk and A. V. Boukhanovsky, "Towards Ensemble Simulation of Complex Systems," Procedia Computer Science, vol. 51, pp. 532-541, 2015.
[12] S. V. Kovalchuk, A. V. Krikunov, K. V. Knyazkov, and A. V. Boukhanovsky, "Classification issues within ensemble-based simulation: application to surge floods forecasting," Stochastic Environmental Research and Risk Assessment, vol. 31, no. 5, pp. 1183-1197, 2017.

[13] D. Fogel, "Phenotypes, genotypes, and operators in evolutionary computation," in Proceedings of the 1995 IEEE International Conference on Evolutionary Computation, p. 193, Perth, WA, Australia.

[14] S. V. Ivanov, S. V. Kovalchuk, and A. V. Boukhanovsky, "Workflow-based Collaborative Decision Support for Flood Management Systems," Procedia Computer Science, vol. 18, pp. 2213-2222, 2013.

[15] A. Gusarov, A. Kalyuzhnaya, and A. Boukhanovsky, "Spatially adaptive ensemble optimal interpolation of in-situ observations into numerical vector field models," in Proceedings of the 6th International Young Scientist Conference on Computational Science, YSC 2017, pp. 325-333, Finland, November 2017.

[16] A. V. Krikunov, E. V. Bolgova, E. Krotov, T. M. Abuhay, A. N. Yakovlev, and S. V. Kovalchuk, "Complex data-driven predictive modeling in personalized clinical decision support for Acute Coronary Syndrome episodes," in Proceedings of the International Conference on Computational Science, ICCS 2016, pp. 518-529, USA, June 2016.

[17] S. V. Kovalchuk, A. A. Funkner, O. G. Metsker, and A. N. Yakovlev, "Simulation of patient flow in multiple healthcare units using process and data mining techniques for model identification," Journal of Biomedical Informatics, vol. 82, pp. 128-142, 2018.

[18] A. Yakovlev, O. Metsker, S. Kovalchuk, and E. Bologova, "Prediction of in-hospital mortality and length of stay in acute coronary syndrome patients using machine-learning methods," Journal of the American College of Cardiology, vol. 71, no. 11, p. A242, 2018.

[19] A. Nikishova, A. Kalyuzhnaya, A. Boukhanovsky, and A. Hoekstra, "Uncertainty quantification and sensitivity analysis applied to the wind wave model SWAN," Environmental Modelling & Software, vol. 95, pp. 344-357, 2017.

[20] E. Rojas, J. Munoz-Gama, M. Sepulveda, and D. Capurro, "Process mining in healthcare: A literature review," Journal of Biomedical Informatics, vol. 61, pp. 224-236, 2016.

[21] T. Sinha, P. Jermann, N. Li, and P. Dillenbourg, "Your click decides your fate: Inferring Information Processing and Attrition Behavior from MOOC Video Clickstream Interactions," in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 3-14, Doha, Qatar, October 2014.

[22] C. Marceau, "Characterizing the Behavior of a Program Using Multiple-Length N-Grams," Defense Technical Information Center, 2005.

[23] S. V. Kovalchuk, O. G. Metsker, A. A. Funkner et al., "Towards management of complex modeling through a hybrid evolutionary identification," in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 255-256, Kyoto, Japan, July 2018.
artifacts which can be used for model development and application.
The proposed framework considers three main layers of complex systems' modeling, namely, model ($M$), data ($D$), and system ($S$). Main operations (arrows on the diagram) within the framework are defined within three concepts: quantitative parameters ($\Xi$), functional characteristics ($\Phi$), and structure ($\Sigma$). We denote operations by $\Gamma^{A}_{L}$, where $A$ and $L$ stand for the concepts and the layer (respectively) involved in the operation. Transitions between concepts and between layers are denoted by $A_1 \to A_2$ and $L_1 \to L_2$, respectively; e.g., operator $\Gamma^{\Xi}_{S \to D}$ reflects observation of quantitative parameters, and operator $\Gamma^{\Xi}_{D \to M}$ stands for basic data assimilation. Also, a set of operators may refer to a single modeling operation; e.g., operators $\Gamma^{\Phi \to \Xi}_{M}$ and $\Gamma^{\Sigma \to \Xi}_{M}$ are often implemented within a single monolithic model. Mainly, operators are related to a specific submodel within a complex model. We consider three key classes of models. F-models are usually classical continuous models developed with knowledge of a system. DD-models are data-driven models based on analysis of available data sets with corresponding techniques (statistics, data mining, process mining, etc.). A-models are mainly intelligent components of a system, usually based on machine learning or knowledge-based approaches. Also, we consider EC-based components as belonging to the A-models class.
A key problem within complex system modeling and simulation is related to the absent or at least significantly limited possibility to observe the structure and functional characteristics of the system (operators $\Gamma^{\Phi}_{S \to D}$ and $\Gamma^{\Sigma}_{S \to D}$) directly. The general solution usually includes implicit substitution of the operators with the expertise of the modeler (operators $\Gamma^{\Phi}_{S \to M}$ and $\Gamma^{\Sigma}_{S \to M}$). Still, the more complex the system under investigation and the model are, the more limited those operations are. To overcome this issue, additional DD-models are involved (operators $\Gamma^{\Xi \to \Sigma}_{D}$ and $\Gamma^{\Xi \to \Phi}_{D}$ for mining in available data, and $\Gamma^{\Phi \to \Xi}_{D \to M}$ for extended discovery of model parameters for various functional characteristics). Also, A-models are employed to extend expert knowledge in discovery of $M$-layer concepts with either formalized knowledge or knowledge discovered in data with machine learning approaches (operators $\Gamma^{\Xi \to \Sigma}_{D \to M}$, $\Gamma^{\Xi \to \Phi}_{D \to M}$, $\Gamma^{\Sigma}_{D \to M}$, and $\Gamma^{\Phi}_{D \to M}$ for direct discovery of structure and functional characteristics, and operators $\Gamma^{\Sigma \to \Phi}_{D}$, $\Gamma^{\Phi \to \Sigma}_{D}$, $\Gamma^{\Sigma \to \Phi}_{M}$, and $\Gamma^{\Phi \to \Sigma}_{M}$ for interconnection of discovered characteristics in available data and within the used model). In the proposed approach, primary attention is paid to these kinds of solutions, where DD- and A-models enable enhancement of the complex modeling process with an additional level of automation, adaptation, and knowledge provision.
2.2. Complex Modeling Patterns. Considering the defined conceptual framework, we identify several patterns of modeling and simulation of a complex system (see Figure 2). The patterns are defined as combinations in the context of the framework described previously (3 layers, 3 concepts). An essential idea of the proposed patterns is systematization of complex model management approaches with combinations of expertise, intelligent solutions (A-models), DD-models, and EC.
The patterns extend the operators described in Section 2.1 for model building with operators for model application (i.e., modelling and simulation) and results analysis (e.g., assessing model quality) required for automated model identification. These additional operators are denoted by $\Gamma'^{A}_{L}$, with similar notation for indices.

P1: Regular modeling of a system (Figure 2(a)) is a basic pattern usually applied to discover new knowledge on the system under investigation. A model is built using (a) expertise of the modeler for identification of structure and functional characteristics of the model ($\Gamma^{\Phi}_{S \to M}$ and $\Gamma^{\Sigma}_{S \to M}$) and (b) available input data, usually representing quantitative parameters of a system, considered as a static input of the model or a source for data assimilation (DA) via operators $\Gamma^{\Xi}_{S \to D}$ and $\Gamma^{\Xi}_{D \to M}$. Results of model application ($\Gamma'^{\Xi}_{M \to D}$, $\Gamma'^{\Sigma}_{M \to D}$, and $\Gamma'^{\Phi}_{M \to D}$) could be considered from a descriptive (mainly structural or quantitative characteristics) or predictive (often forecasting or other functional characteristics) point of view. The obtained results are analyzed in comparison to available information about the investigated system ($\Gamma'^{\Xi}_{D \to S}$, $\Gamma'^{\Sigma}_{D \to S}$, and $\Gamma'^{\Phi}_{D \to S}$), forming an optimization loop which can be considered within the scope of all three concepts. Certain limitations of this pattern, when applied to complex system modeling and simulation, are introduced by two factors. First, working with complex structural and functional characteristics of the model requires a high level of expertise, which limits the extensibility and automation of model operation. Second, performing optimization in a loop with most algorithms requires multiple runs of a model. As a result, computationally intensive models have limitations in optimization-based operations (identification, calibration, etc.) due to performance reasons.
P2: Data-driven modeling (Figure 2(b)) provides an extension to the modeling operation describing the relationship between data attributes. Application of data-driven models may be considered as a replacement of the actual "full" model by (a) providing information about the structure of the system and model with data mining (DM) and process mining (PM) techniques ($\Gamma^{\Xi \to \Sigma}_{D}$), (b) generating surrogate models for functional characteristics ($\Gamma^{\Xi \to \Phi}_{D}$), and (c) providing estimation of investigated parameters with machine learning (ML) algorithms and models ($\Gamma'^{\Xi \to \Xi}_{D}$). In contrast to the previous pattern, data-driven models usually operate quickly (although it could require significant time to train the model). Still, such models have lower quality than the original "full" models. Nevertheless, combining this pattern with others provides significant enhancement in functionality and performance; e.g., data-driven models can be used in an optimization loop (see the previous pattern).
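The use of a data-driven model inside an optimization loop can be sketched as follows. This is only an illustration under strong assumptions: the "full" model is replaced by a hypothetical one-parameter error function `full_model_error`, and the surrogate is a simple piecewise-linear interpolant rather than any of the ML models used in the applications. All names are introduced here for illustration.

```python
from bisect import bisect_right


def full_model_error(x):
    """Hypothetical stand-in for an expensive simulation run with parameter x."""
    return (x - 1.3) ** 2 + 0.5


def build_surrogate(xs, ys):
    """Return a piecewise-linear interpolant through (xs, ys); xs must be sorted."""
    def surrogate(x):
        # Locate the interpolation segment, clamping to the first/last one.
        i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + t * (ys[i + 1] - ys[i])
    return surrogate


# "Training": a handful of runs of the full model.
train_x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
train_y = [full_model_error(x) for x in train_x]
surrogate = build_surrogate(train_x, train_y)

# Optimization loop that queries only the cheap surrogate.
candidates = [i / 100 for i in range(301)]
best_x = min(candidates, key=surrogate)

print("surrogate optimum:", best_x, "full-model runs used:", len(train_x))
```

In this toy setting the surrogate finds the optimum with 7 runs of the full model instead of 301, but only up to the resolution of the training grid (the true minimum is at 1.3), which mirrors the trade-off between speed and quality noted above.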
P3: Ensemble-based modeling (Figure 2(c)) extends P1 for working with sets of objects (models, data sets, and states) reflecting uncertainty, variability, or alternative solutions (e.g., models). Previously [11], we identified 5 classes of ensembles (see E1-E5 in Figure 2(c)): decomposition ensemble, alternative models ensemble, data-driven ensemble, parameter diversity ensemble, and metaensemble. All these patterns can be applied within the context of the proposed framework. Still, an extension of the ensemble structure increases the structural complexity of the model and thus leads to the need for additional (automatic) control procedures. Moreover, the performance issues of P1 become even worse in ensemble modeling.

Figure 2: Complex modeling patterns: (a) regular modeling; (b) data-driven modeling; (c) ensemble-based modeling; (d) data-driven support of complex modeling; (e) EC in hybrid complex modeling; (f) evolutionary space discovery in hybrid complex modeling.
P4: One of the key ideas of the proposed approach is an implementation of data-driven analysis of model states, structure, and behavior. To implement it within the conceptual framework, we propose a pattern for data-driven complex modeling (Figure 2(d)). It includes identification and prediction of a model structure through DM and PM techniques ($\Gamma^{\Xi \to \Sigma}_{D \to M}$) and generation of surrogate models for injection into the complex model ($\Gamma^{\Xi \to \Phi}_{D \to M}$). In addition, it is possible to use data-driven techniques to predict the quality of the considered model and use it for model optimization ($\Gamma'^{\Xi}_{D \to S}$, $\Gamma'^{\Xi \to \Sigma}_{D \to S}$, and $\Gamma'^{\Xi \to \Phi}_{D \to S}$).
P5: A key pattern for EC implementation is presented in Figure 2(e). Here, EC is used to identify a model structure ($\Gamma^{\Xi \to \Sigma}_{D \to M}$) and surrogate submodels ($\Gamma^{\Xi \to \Phi}_{D \to M}$) with consideration of a population of models. As a result, the modeling result is also (as in P3) presented in multiple instances, which may be analyzed, filtered, and evolved within subsequent iterations over changing time (and processing of incoming observations of the system) or within a single timestamp (and fixed observation data).
P6: Finally, the last presented pattern (Figure 2(f)) is aimed at investigation of system phase space using DD-models and/or EC to reflect the unobservable landscape for estimation of model positioning, assessing its quality in inferring (sub-)optimal structural ($\Gamma'^{\Xi \to \Sigma}_{D \to S}$ and $\Gamma'^{\Sigma}_{M \to D}$) and functional ($\Gamma'^{\Xi \to \Phi}_{D \to S}$ and $\Gamma'^{\Phi}_{M \to D}$) characteristics of the actual system.

These patterns could be easily combined to obtain better results within a specific application. From the point of view of EC, of special interest are the patterns where a set of model (or submodel) instances is considered (P5, P6). It is possible to consider ensemble-based techniques (P3) in the fashion of EC, but within our approach we prefer to consider an ensemble as a composite model with several submodels. In that case, ensemble management refers to the concept of complex model structure.

Figure 3: Artifacts and procedures within a typical composite solution.
Several important goals may be reached within the presented patterns:

(i) automation of complex model management with intelligent solutions, DD-models, and EC

(ii) optimization of model structure and application under defined limitations in precision and performance

(iii) enhanced ways of domain knowledge discovery for applications and general investigation of a system
2.3. Composite Solution Development. The proposed structure of core concepts and patterns may be applied in various ways to form a solution which combines operators either implemented originally within the solution or implemented as external model calls. Figure 3 shows the essential elements (artifacts and procedures) in a typical composite solution within the proposed conceptual layers ($S$, $D$, and $M$). The $S$-layer includes the actual system's state, which can be assessed through the observation procedure and described by explicit domain knowledge. The $D$-layer includes datasets divided into observation data and simulation/modeling data, with procedures for data processing and data assimilation. Finally, the $M$-layer includes a set of available basic models $M_1, \ldots, M_N$, which may be identified and calibrated with available data, yielding tuned models $M'_1, \ldots, M'_N$ as a result. Here, the essential elements are model composition (which may be performed either automatically or by the modeler) and application of the model.
The key benefit of the approach is the application of a combination of EC, data-driven, and intelligent procedures to manage the whole composite solution, including data processing, modeling, and simulation, to lower uncertainty in $\Sigma \times \Xi \times \Phi$. Within the shown structure, these procedures may be applied

(i) to rank and select alternative models

(ii) to support model identification, calibration, composition, and application

(iii) to manage artifacts on various conceptual layers in a systematic way

(iv) to infer implicit knowledge from available data and explicitly presented domain knowledge

The shown example gives a brief view of composite solution development, while the particular details may differ depending on the application. The key procedures within the proposed composite solution, namely, the implementation of intelligent procedures to support model identification and the systematic management of the composite model, are considered in Sections 2.4 and 2.5.
2.4. Evolutionary Model Identification. When implementing evolution of models within a complex modeling task, structure, functional, and quantitative parameters are usually considered as the genotype, whereas the model output (data layer) is considered as the phenotype. Within the proposed approach, we can adapt the definition of basic EC operations within genotype-phenotype mapping [13]:

(i) epigenesis as model application $f_1: S \times M \to D$

(ii) selection $f_2: S \times D \to D$

(iii) genotype survival $f_3: S \times D \to M$

(iv) mutation $f_4: M \to M$
In addition, we consider quality assessment, usually treated as fitness, for selection and survival (or, in more complex algorithms, for controlling other operations like mutation):

(i) data quality $q_d: S \times D \to Q_d$

(ii) model quality $q_m: S \times M \to Q_m$

Here $Q_d$ and $Q_m$ are often considered as $\mathbb{R}^N$ with some quantitative quality metrics. Model quality is usually considered through data quality, i.e., $q_m \sim q_d(s, f_1(s, m))$, but within our approach this separation is considered important because, in addition, we introduce supporting operations with data-driven procedures, as in complex modeling many of these functions (first of all $f_1$, $q_m$, and $q_d$) are significantly difficult to apply directly (some of these issues are considered in relation to the patterns). Data-driven operations (first of all for $f_1$ and $q_m$) can be introduced as substitutes for the previously introduced basic operations (see also patterns P2, P4, and P6):
(i) epigenesis as DD-model application $f^d_1: S \times M \to D$

(ii) model generation $g^d: S \times D \to M$

(iii) model quality prediction $q^d_m: S \times M \to Q_m$

(iv) space discovery $w^d: S \times D \to S$

Operation $w^d$ could be used as an intelligent extension within selection or survival operations ($f_2$ and $f_3$). It becomes especially important in the case of a lack of knowledge of the system's structure or functional characteristics. Operation $g^d$, at the same time, could be used as a part of the mutation operation $f_4$ (or of initial population generation). Having this extension, we can implement enhanced versions of EC algorithms (e.g., genetic algorithms, evolution strategies, and evolutionary programming) with data-driven operations to overcome or at least to lower complex modeling issues.
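The basic operations above can be illustrated with a minimal sketch of an evolutionary loop. Everything concrete in it is an assumption made for illustration: the system $S$ is reduced to a small set of observations, the model $M$ is a linear function whose genotype is the pair (slope, intercept), $q_d$ is chosen as RMSE, and selection and survival ($f_2$ and $f_3$) are collapsed into simple truncation of the joint parent and offspring pool. Data-driven substitutes such as $f^d_1$ or $q^d_m$ would replace `f1` and `q_m` with cheaper predictors.

```python
import random


def f1(s, m):
    """Epigenesis: apply model m = (slope, intercept) to produce data (phenotype)."""
    a, b = m
    return [a * x + b for x in s["x"]]


def q_d(s, d):
    """Data quality: RMSE between simulated data d and observations (lower is better)."""
    return (sum((di - yi) ** 2 for di, yi in zip(d, s["y"])) / len(d)) ** 0.5


def q_m(s, m):
    """Model quality expressed through data quality: q_m ~ q_d(s, f1(s, m))."""
    return q_d(s, f1(s, m))


def f4(m, rng, sigma=0.1):
    """Mutation: Gaussian perturbation of the genotype."""
    return tuple(g + rng.gauss(0.0, sigma) for g in m)


def select_and_survive(s, candidates, size):
    """f2 and f3 collapsed into truncation: keep the genotypes with the best phenotypes."""
    return sorted(candidates, key=lambda m: q_m(s, m))[:size]


def evolve(s, population, generations, rng):
    """Elitist loop: the best model quality never gets worse between generations."""
    for _ in range(generations):
        offspring = [f4(m, rng) for m in population]
        population = select_and_survive(s, population + offspring, len(population))
    return population


# Hypothetical system observations generated by y = 2x + 1.
system = {"x": [0.0, 1.0, 2.0, 3.0], "y": [1.0, 3.0, 5.0, 7.0]}
start = [(0.0, 0.0), (1.0, 0.0), (3.0, 2.0), (0.5, 0.5)]
final = evolve(system, start, generations=50, rng=random.Random(0))
print("best model:", final[0], "RMSE:", q_m(system, final[0]))
```

Keeping `q_d` and `q_m` as separate functions follows the separation emphasized in the text: a predicted model quality can be plugged in without running `f1` at all.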
2.5. Model Management Approach and Algorithm. By model management, we mean operations with models within problem domain solution development and application. This includes identification, calibration, DA, optimization, prediction, and forecasting. To systematize model management in the presented patterns, we propose an approach for explicit consideration of the spaces $S$, $D$, and $M$ within hybrid modeling with EC and DD-modeling. To summarize complex modeling procedures within the approach, we developed a high-level algorithm which includes a series of steps to be implemented within the context of complex model management.
Step 1 (space discovery). This step identifies the description of phase space (in most cases, $S$) in the case of a lack of knowledge or for automation purposes. For example, the step could be applied in the discovery of system state space or model structure. The space description may include (a) distance metrics, (b) proximity structure (e.g., graph, clustering, hierarchy, and density), and (c) a positioning function. One of the possible ways to perform this step is the application of DM and EC algorithms to available data (see pattern P6).

Step 2 (identification of supplementary functions). Data-driven functions ($\Phi$) are applied to work in model evolution, with the space (landscape) representation considered as available information.

Step 3 (evolutionary processing of a set of models). This step is described by a combination of basic EC operations (population initialization, epigenesis, selection, mutation, and survival) with supplementary functions. The form of the combination depends on (a) the selected EC algorithm, (b) application requirements and restrictions, and (c) model-based issues (e.g., performance, quality of surrogate models, etc.).

Step 4 (assimilation of updated data and knowledge). This step is applied for automatic adaptation purposes and implements a DA algorithm. DA can be applied to (a) the set of models, (b) EC operations (e.g., affecting the selection function), (c) supplementary functions (as they are mainly data-driven), and (d) the phase space description (if the descriptive structure is identified from changed data and/or knowledge).

The steps can be repeated in various combinations depending on the application and the implemented pattern. Also, the steps are general and could be implemented in various ways. Several examples are provided in Section 3.
2.6. Available Building Blocks of a Composite Solution. EC offers a flexible and robust solution to identify complex model structures within a complex landscape, with possible adaptation to changing conditions and the system's state (including new states without prior observation). A significant additional benefit is the ability to manage alternative solutions simultaneously, with possible switching between them and various combinations of them depending on the current needs. Still, within the task of model identification and management, EC (and also many metaheuristics) has certain drawbacks which require additional steps to implement the approach under particular conditions:

(i) high computational cost due to the multiple runs of a model

(ii) low reproducibility and interpretability of obtained results due to the randomized nature of the search procedure

(iii) complicated tuning of hyperparameters for better EC convergence

(iv) indistinct definition of genotype boundaries

(v) complicated mapping of genotype to phenotype space
To overcome these issues, the proposed approach involves two options. First, the intelligent procedures may be used to tune EC hyperparameters (P5), predict features of genotype-phenotype mapping, boundaries, etc. (P4), and discover interpretable states and filters (for system, data, and model) to control convergence and adaptation of the population (P2, P4, and P5) in an interpretable and reproducible (through the defined control procedure) manner. Second, the composite model may use various approaches, methods, and elements to obtain better quality and performance of the solution:

(i) surrogate models (P2, P4, P5), which may increase performance (for example, within preliminary and intermediate optimization steps)

(ii) ensemble models (P3), which may be considered as an interpretable and controllable population

(iii) interpretation and formal inference using explicit domain-specific knowledge and results of data mining to feed procedures of EC and infer parameters in both models and EC

(iv) controllable space decomposition (P6) with predictive models for possible areas and directions of population migration in EC to explicitly lower uncertainty and obtain additional interpretability
Finally, an essential feature of the proposed approach is a holistic analysis of a composite solution with possible coevolution of models (submodels within a composite model) and data processing procedures.
3. Application Examples

This section presents several practical examples where the proposed approach, patterns, or some of their elements were applied. The examples were intentionally selected from diverse problem domains to demonstrate the generality of the approach. The considered problems are developed in separate projects which are at various stages. Problem 1 (ensemble metocean simulation) was investigated in a series of projects (see, e.g., [11, 14, 15]). Within this research, we are trying to extend model calibration and DA with EC techniques to develop more flexible and accurate multimodel ensembles. Problem 2 (clinical pathways (CPs) modelling) is important in several ongoing projects aimed at model-based decision support in healthcare (see, e.g., [16-18]). The proposed approach plays an important role by enabling deeper analysis of clinical pathways in various scenarios (interactive analysis of available CPs with identification of clusters of similar patients, DA in predictive modelling of ongoing cases, etc.). Finally, Problem 3 shows very early results from a recently started project in online social network analysis.
3.1. Problem 1: Evolution in Models for Metocean Simulation. Environmental simulation systems usually contain several numerical models serving different purposes (complementary simulation processes, improving the reliability of a system by producing alternative results, etc.). Each model can typically be described by a large number of quantitative parameters and functional characteristics that should be adjusted by an expert or by intelligent automated methods (e.g., EC). Alternative models inside the environmental simulation system can be joined in an ensemble according to a complex modeling pattern based on evolutionary computing (a combination of the P3 and P5 patterns). In the current case study, we introduce an example illustrating the ensemble concept in the forms of an alternative models ensemble, a parameter diversity ensemble, and a metaensemble. To identify the parameters of the proposed ensembles, the least squares method (in the case of model linearity) or optimization methods (in the case of nonlinearity) can be used. As we need to take into account not only the functional space Φ and the space of parameters Ξ for a single model but also the optimal coexistence of models in the system (i.e., Σ), evolutionary and coevolutionary approaches seem to be applicable techniques for this task. It is worth mentioning that the coevolutionary approach can be applied to independent model realizations through an ensemble as a connecting element. In this case, parameters (weights) in the ensemble can be estimated separately from the coevolution procedure, either in a constant form or dynamically.
As a case study of complex environmental modeling, we design an ensemble model that consists of the SWAN (http://swanmodel.sourceforge.net) model for ocean wave simulation driven by two different surface forcings, NCEP (https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html) and ERA Interim (https://www.ecmwf.int/en/forecasts/datasets/archive-datasets/reanalysis-datasets/era-interim). Thus, different implementations of the SWAN model were connected in the form of an alternative models ensemble with least-squares-calculated coefficients defining the structure of the complex model. Two parameters, wind drag and whitecapping rate (WCR), were calibrated using evolutionary and coevolutionary algorithms implementing $\Gamma^{\Xi\to\Phi}_{D\to M}$ in P5 (for a detailed sensitivity analysis of SWAN, see [19]). The coevolutionary approach can be represented in the form of a parameter diversity ensemble, where each population is constructed as an ensemble of alternative model results with different parameters. We can also add ensemble weights to the model parameter diversity and obtain a metaensemble that can be identified within the coevolutionary approach.
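The least-squares step for an alternative models ensemble can be illustrated with a minimal sketch. It assumes two model outputs (standing in for SWAN+NCEP and SWAN+ERA) and a linear combination without intercept; the function name `least_squares_weights` and the toy series are hypothetical and are not taken from the study.

```python
def least_squares_weights(model_a, model_b, observed):
    """Weights (w_a, w_b) minimizing sum((w_a*a + w_b*b - y)^2).

    Solves the 2x2 normal equations directly (Cramer's rule).
    """
    saa = sum(a * a for a in model_a)
    sbb = sum(b * b for b in model_b)
    sab = sum(a * b for a, b in zip(model_a, model_b))
    say = sum(a * y for a, y in zip(model_a, observed))
    sby = sum(b * y for b, y in zip(model_b, observed))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("model outputs are collinear; weights are not unique")
    w_a = (say * sbb - sby * sab) / det
    w_b = (sby * saa - say * sab) / det
    return w_a, w_b


# Toy wave-height series (made-up numbers, metres).
swan_ncep = [1.0, 2.0, 1.5, 0.5]
swan_era = [1.2, 1.8, 1.1, 0.9]
# Synthetic "observations" built as an exact 0.3/0.7 mixture of the two models.
observed = [0.3 * a + 0.7 * b for a, b in zip(swan_ncep, swan_era)]

w_ncep, w_era = least_squares_weights(swan_ncep, swan_era, observed)
```

With real measurements the fit is not exact, and the resulting weights play the role of the ensemble coefficients mentioned above.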
In the process of model identification and verification, measurements from several wave stations in the Kara Sea were used. The fitness function is the root-mean-square error (RMSE) averaged over all wave stations. For verification of the results, the MAE (mean absolute error) and DTW (dynamic time warping) metrics were used.
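The three metrics named above can be sketched as follows. This is a plain textbook formulation under stated assumptions: DTW uses the absolute difference as the local cost and no window constraint, and the station dictionaries passed to `fitness` are a hypothetical data layout, not the one used by the authors.

```python
import math


def rmse(sim, obs):
    """Root-mean-square error between two equally long series."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def mae(sim, obs):
    """Mean absolute error between two equally long series."""
    return sum(abs(s - o) for s, o in zip(sim, obs)) / len(obs)


def dtw(sim, obs):
    """Dynamic time warping distance with |a - b| as the local cost."""
    n, m = len(sim), len(obs)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(sim[i - 1] - obs[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def fitness(sim_by_station, obs_by_station):
    """Mean RMSE over all stations (keys are hypothetical station names)."""
    errors = [rmse(sim_by_station[k], obs_by_station[k]) for k in obs_by_station]
    return sum(errors) / len(errors)
```

DTW tolerates small shifts in time between simulated and observed series, which is why it complements the pointwise RMSE and MAE.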
Figure 4: Metocean simulation. (a) Error landscape (RMSE (m) over drag and log(WCR)) for wave height simulation results using ERA and NCEP reanalysis as input data; (b) Pareto frontier of coevolution results for all generations (log(RMSE) over drag and WCR for coNCEP, coERA, and coEnsemble).
Figure 5: Coevolution convergence of the parameter diversity ensemble for metocean models (RMSE (m) versus generation).
Figure 4(a) presents the surface (landscape) of RMSE in the space of the two calibrated parameters (drag and WCR) for the implementations SWAN+ERA and SWAN+NCEP. It can be seen that the evolutionary-obtained results converge to the minimum of the error landscape. The landscape was obtained by running the model with all parameter variants from the full 30x30 grid (i.e., 900 runs), while the evolutionary algorithm converged in 5 generations with 10 individuals (parameter sets) in the population (50 runs), which allows performing identification two orders faster. The convergence of coevolution for the SWAN+ERA+NCEP case is presented in Figure 5.
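The difference in the number of model runs between the full grid and the evolutionary search can be reproduced on a toy problem. The sketch below is only an illustration: `toy_error` is a hypothetical quadratic stand-in for a SWAN run, the parameter ranges are assumptions loosely based on the axes of Figure 4(a), and the algorithm (Gaussian mutation of an elite archive) is not necessarily the one used in the study.

```python
import random


def toy_error(drag, log_wcr):
    # Hypothetical error surface with a single minimum at drag=1.5, log_wcr=-10.
    return (drag - 1.5) ** 2 + 0.01 * (log_wcr + 10.0) ** 2


def grid_search(n=30):
    """Evaluate every node of an n x n grid; returns (best, number_of_runs)."""
    runs, best = 0, None
    for i in range(n):
        for j in range(n):
            drag = 1.0 + 2.0 * i / (n - 1)
            log_wcr = -20.0 + 20.0 * j / (n - 1)
            err = toy_error(drag, log_wcr)
            runs += 1
            if best is None or err < best[0]:
                best = (err, drag, log_wcr)
    return best, runs


def evolve(pop_size=10, generations=5, elite=3, seed=0):
    """Evaluate pop_size new individuals per generation, keep an elite archive."""
    rng = random.Random(seed)
    pop = [(rng.uniform(1.0, 3.0), rng.uniform(-20.0, 0.0)) for _ in range(pop_size)]
    archive, runs, history = [], 0, []
    for _ in range(generations):
        archive += [(toy_error(d, w), d, w) for d, w in pop]
        runs += len(pop)
        archive = sorted(archive)[:elite]      # best individuals seen so far
        history.append(archive[0][0])          # best error after this generation
        pop = []
        for _ in range(pop_size):
            _, d, w = rng.choice(archive)      # pick a parent from the elite
            d_new = min(3.0, max(1.0, d + rng.gauss(0.0, 0.2)))
            w_new = min(0.0, max(-20.0, w + rng.gauss(0.0, 2.0)))
            pop.append((d_new, w_new))
    return archive[0], runs, history


best_grid, grid_runs = grid_search()          # 30 x 30 = 900 evaluations
best_evo, evo_runs, history = evolve()        # 5 x 10 = 50 evaluations
```

Because the archive always keeps the best individual found so far, the recorded best error cannot increase from one generation to the next.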
Although the error landscapes for the two implementations SWAN+ERA and SWAN+NCEP are close to each other, separate evolution does not consider optimization of the ensemble result. For this purpose, we apply the coevolutionary approach, which produces a set of Pareto-optimal solutions for each generation. Figure 4(b) shows that the error of each model in the ensemble is significant (coNCEP and coERA for the models alone), but the error of the whole ensemble (coEnsemble) converges to the minimum very fast.
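Selecting the Pareto-optimal solutions of a generation can be sketched as below, assuming each candidate is described by a tuple of errors to be minimized, for example (error of SWAN+NCEP, error of SWAN+ERA). The error values are invented for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Points that are not dominated by any other point (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (error_NCEP, error_ERA) pairs for one generation.
generation = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (3.0, 3.0)]
front = pareto_front(generation)
```

Here the last candidate is dominated by (2.0, 2.0), so only the first three remain on the frontier.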
The obtained result can be analyzed from the uncertainty reduction point of view. Optimization of model parameters helps to reduce parameter uncertainty, which can be estimated through the error function. When we apply an ensemble approach to evolutionary optimized results, it is also appropriate to speak of reducing the uncertainty connected with the input data sources (NCEP and ERA). Moreover, the metaensemble approach allows reduction of the uncertainty connected with the ensemble parameters.
Figure 6: Graph-based representation of processes space in healthcare (interactive view). (Demonstration available at https://www.youtube.com/watch?v=EH74f1w6EeY.)
Summarizing the results of the metocean case study, we can note that the EC approach shows significant efficiency (up to 120 times compared with grid search) without accuracy losses. According to this experimental study, the quality of the ensemble with evolutionary optimized models is similar to the results of the grid search: the MAE metric is equal to 0.24 m and the DTW metric to 51. We can also mention that the coevolutionary approach provides a 10% accuracy gain compared with the results of single evolution of model implementations, but this is still similar to the ensemble result with evolutionary optimized models. Nevertheless, the coevolutionary approach achieved a 200 times acceleration. Within the context of the proposed approach, space Φ was investigated using a defined structure of the model in space Σ for the purpose of model calibration.
3.2. Problem 2: Modeling Health Care Process. Modeling healthcare processes usually involves enormous uncertainty and variability, even when modeling a single disease. One of the ways to identify a model of such a process is process mining (PM) [20]. Still, direct application of PM methods does not remove a major part of the uncertainty. Within the current research, we applied the proposed approach for identification purposes, both in the analysis of historical cases and in the prediction of single process development. Here we consider the processes of providing health care in acute coronary syndrome (ACS) cases, which is usually considered one of the major causes of death in the world. We used a set of 3434 ACS cases collected during 2010-2015 in the Almazov National Medical Research Centre, one of the leading cardiological centers in Russia. The data set contains electronic health records of these patients with all registered events and patient characteristics.
To simplify consideration of the multidimensional space of possible processes ($\Gamma'^{\,\Xi\to\Sigma}_{D\to S}$, $\Gamma^{\Xi}_{S\to D}$ for analysis of Σ on layer $S$), we introduced a graph-based representation of this space, with vertices representing cases and edges representing proximity of cases. Analysis of such a structure enables easy discovery of common cases (e.g., as communities in the graph). Such discovery enables explicit, interpretable structuring of the space and representation of the further landscape for EC in terms of the P6 pattern. Moreover, direct interactive investigation of a visual representation of such a structure (see Figure 6) provides significant insights for medical researchers.
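A graph of this kind can be sketched in a few lines, assuming that each case is a string of event labels, that proximity is measured with the Levenshtein (edit) distance, and that connected components serve as a crude substitute for community detection. All three choices, the threshold `max_dist`, and the example sequences are assumptions made for illustration; the paper does not specify them here.

```python
def edit_distance(a, b):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def proximity_graph(cases, max_dist):
    """Adjacency sets: cases i and j are linked if their distance <= max_dist."""
    edges = {i: set() for i in range(len(cases))}
    for i in range(len(cases)):
        for j in range(i + 1, len(cases)):
            if edit_distance(cases[i], cases[j]) <= max_dist:
                edges[i].add(j)
                edges[j].add(i)
    return edges


def components(edges):
    """Connected components, used here as a simple proxy for communities."""
    seen, groups = set(), []
    for start in edges:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            v = stack.pop()
            group.append(v)
            for u in edges[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        groups.append(sorted(group))
    return groups


# Hypothetical cases written as sequences of event labels.
cases = ["AFIE", "AFNE", "AFIFE", "DDD", "DDN"]
groups = components(proximity_graph(cases, max_dist=1))
```

In this toy example the first three cases form one group and the last two another, which mirrors the idea of common cases appearing as densely connected regions of the graph.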
We have developed an evolutionary-based algorithm for pattern identification and clustering in such a representation, with two criteria to be optimized (see Figure 7). Here, processes were represented by sequences of labels (symbols) denoting key events in the PM model. Typical patterns were then selected from the Pareto frontier. The convergence process is demonstrated in Figure 8 (the 10 best individuals from the Pareto frontier according to the integral criterion were selected). As a result, this solution may refer to the P5 pattern and the operator $\Gamma^{\Xi\to\Sigma}_{D\to M}$ while discovering the model structure. Figure 9 shows an example of a typical process model (i.e., a structural characteristic of the model) for one of the identified clusters. A detailed description of the approach, algorithms, and results on CP discovery, clustering, and analysis, including a performance comparison of three versions of CP discovery algorithms, can be found in [10]. An important outcome of the approach in this application is the interpretability of the clusters and identified patterns. For example, 10 clusters and the corresponding CPs were interpreted by cardiologists from the Almazov National Medical Research Centre. The obtained interpretation and further discovery and application of the CP structure are presented in [17]. Another important benefit of such space structure discovery is lowering the uncertainty of a patient's treatment trajectory through hierarchical positioning of an evolving process (selection of a cluster and selection of a position within the cluster). For example, the discrete-event simulation model described in [17] provides a more appropriate length-of-stay distribution when simulating with the discovered classes of CPs (the Kolmogorov-Smirnov statistic decreased by 51%, from 0.255 to 0.124).
Figure 7: Pareto frontier for CP patterns discovery (length of pattern (normed) versus number of non-arranged sequences (normed)).
Figure 8: Evolutionary convergence during CP pattern discovery (normed sum error versus generation).
Figure 9: Example of process model showing transfers between hospital's departments (Cluster 7).
Figure 10: Evolution of synthetic CPs. (a) CP population convergence; (b) evolution of possible CP (demonstration available at https://www.youtube.com/watch?v=twvfX9zKsY8); (c) number of synthetic CPs; (d) percentage of CPs in the correct cluster.
Further, we propose an algorithm to dynamically generate possible developments of a healthcare process using the identified graph-based space representation, with evolutionary strategies assimilating incoming data (events) within a case ($\Gamma'^{\,\Sigma}_{M\to D}$ in P5 and $\Gamma^{\Xi\to\Sigma}_{D}$ in P2). We consider the convergence (Figure 10(a)) of the introduced synthetic continuations of the processes to the right class (identified clusters of typical cases were used), with mapping to the graph-based space representation with proximity measures (Figure 10(b)). As a result, the appearance of the CP's events decreases the number of synthetic CPs and increases the percentage of CPs positioned in the correct cluster (see an example in Figure 10(c) and Figure 10(d), respectively). This enables interpretable positioning and uncertainty lowering in predicting the further development of a CP for a particular patient.
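The effect described above (fewer synthetic CPs and a growing share in the correct cluster as events arrive) can be imitated with a deliberately simplified sketch. It replaces the evolutionary strategy with a plain prefix filter, which is an assumption made only to keep the example short; the synthetic sequences and cluster numbers are invented.

```python
def assimilate(population, observed_prefix):
    """Keep synthetic CPs whose beginning matches the events observed so far."""
    return [(seq, c) for seq, c in population if seq.startswith(observed_prefix)]


def share_in_cluster(population, cluster):
    """Fraction of the remaining synthetic CPs that belong to the given cluster."""
    if not population:
        return 0.0
    return sum(1 for _, c in population if c == cluster) / len(population)


# Hypothetical synthetic CPs as (event sequence, cluster id) pairs.
synthetic = [("AFIE", 7), ("AFNE", 7), ("ADDE", 2), ("DFIE", 3), ("AFIFE", 7)]
target, target_cluster = "AFIE", 7

sizes, shares = [], []
for step in range(len(target) + 1):
    alive = assimilate(synthetic, target[:step])   # events observed up to this step
    sizes.append(len(alive))
    shares.append(share_in_cluster(alive, target_cluster))
```

In this toy run the population shrinks at every step while the share of candidates in the target cluster grows, qualitatively matching the behavior reported in Figures 10(c) and 10(d).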
Here, a combination of patterns P2, P5, and P6 in the implementation of the proposed algorithm (see Section 2.5) enables interactive investigation of the process space and data assimilation into a population of possible continuations of a single process as it evolves. This solution can be applied in exploratory modeling and simulation of patient flow processing, as well as in decision support in specialized medical centers.
3.3. Problem 3: Mining Social Media. Nowadays, social media analysis (which began with static network models emphasizing the topology of connections between users) strives to explore dynamic behavioral patterns of individuals, which can be recovered from their digital traces on the web. The prediction of social media activities requires combining analytical and data-driven models, as well as identifying the optimal structure and parameters of these models according to the available data. Here we show an example of a problem in this field involving evolutionary identification of a model.
Figure 11: Distribution of posts, reposts, and comments on personal walls of subscribers of the bank community (fraction of users (log) versus activity count (log)).
Figure 12: Example of process model (a) with expanded cycles (Cluster 2) and (b) with collapsed cycles (Cluster 1).
A digital trace of a user in an online social network (OSN) is a sequence (chain) of observed activities separated by time gaps. Each OSN supports different types of "hidden" and observable activities. For example, in the largest Russian social network, vk.com (further denoted as VK), a user has a personal page (wall) with three types of activities: post (P), when the user creates a record himself or herself; repost (R), when the user copies the record of another user or community to his or her wall; and comment (C), when the user comments on a record on his or her wall. Figure 11 illustrates the distribution of these activities for subscribers of a large Russian bank community in VK. The collected dataset consists of the last 100 (or fewer, if unavailable) entries (posts or reposts) and the comments on these entries for 8K user walls in the period January 2017 to December 2017. Comments are much less common than posts and reposts. The distributions of posts and reposts are similar, but there is a group of "spreaders" with a significant number of reposts.
Figure 13: Example of process model for cluster 3 (1120 users) using n-gram analysis.
Table 1: Mean of activities' combinations for users' clusters.
Cluster  Size  CCC  CCP  CCR  CPP  CPR  CRR  PPP   PPR   PRR   RRR
1        5238  052  074  024  1    072  042  668   64    756   853
2        2110  011  013  016  016  036  059  209   395   1232  603
3        1120  091  138  014  362  075  018  5002  1178  5     277
We applied the technique described in Section 3.2 to analyze the processes. Still, the considered process has a significantly different structure. By default, it is continuous, with random repetition of events, while the healthcare process in ACS cases is finite and has a "stronger" structure. Figure 12 shows a typical process structure identified with the EC-based approach and visualized with expanded cycles (a) and with collapsed cycles (b). The second one could be considered more relevant than the first one, which is significantly affected by the length of the selected history. It is natural to consider it as a random process or a state-transition model. In that case, the three identified clusters (characterized by various frequencies of transitions) could be interpreted as typical behavior models.
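The state-transition reading of such a process can be illustrated by estimating transition frequencies from a single activity chain. This is a minimal sketch, assuming a first-order model over the three activity labels P, R, and C; the example chain is invented.

```python
from collections import Counter, defaultdict


def transition_probabilities(activities):
    """Empirical first-order transition frequencies between consecutive activities."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(activities, activities[1:]):
        counts[cur][nxt] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


# Hypothetical wall history: post, post, repost, post, comment.
probs = transition_probabilities("PPRPC")
```

Users or clusters can then be compared by these transition frequencies rather than by the full, history-dependent process graph.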
N-gram analysis is often used to detect patterns in people's behavior [21, 22]. It is based on counting the frequencies of combinations or sequences of activities. We collected all sorted 3-grams (so-called 3-sets) for each user's sequence to analyze the frequency of event combinations. As a result, three clusters of vectors built from 3-set chains were identified with the k-means clustering method. Figure 13 shows all combinations and transitions between them for cluster 3 as an example. Using Figure 12 and Table 1, it is possible to see that cluster 3 includes users who often make new records (P) and sometimes comment on records (C). So, cluster 3 mostly consists of "bloggers". Cluster 2 includes "spreaders", who frequently copy other records (R). The biggest cluster, cluster 1, consists of people who make new records and copy other ones equally, but less intensively compared with the other clusters. That may be considered typical behavior for an OSN user. N-gram analysis allows detecting typical behavioral patterns and obtaining process models for social media activities using chains of different lengths as input data. Thus, this type of data-driven modeling is more appropriate for studying continuous processes. Figure 14 shows a graph-based representation of the process space with all users' patterns.
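Counting 3-sets for one user can be sketched as follows. The sketch assumes that a 3-set is a sliding window of three consecutive activities with its labels sorted, and that the resulting vector follows the column order of Table 1; the clustering step itself (k-means in the text) is omitted, and the example chain is invented.

```python
from collections import Counter

# The ten possible sorted combinations of the labels C, P, R (Table 1 column order).
THREE_SETS = ["CCC", "CCP", "CCR", "CPP", "CPR", "CRR", "PPP", "PPR", "PRR", "RRR"]


def three_set_vector(activities):
    """Count sorted 3-grams (3-sets) over a sliding window of length three."""
    counts = Counter(
        "".join(sorted(activities[i:i + 3])) for i in range(len(activities) - 2)
    )
    return [counts[key] for key in THREE_SETS]


# Hypothetical chain: its windows PPR, PRP, RPC become the 3-sets PPR, PPR, CPR.
vector = three_set_vector("PPRPC")
```

One such vector per user is the kind of input on which a clustering method can then operate.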
This subsection provides very early results. The next steps in applying the proposed approach in this application include an extension of the process model structure (a) with temporal labeling (gaps between events), (b) considering the process within a sliding time window to get more structured processes, (c) linking the model with causal inference, and (d) introduction of DM techniques for EC positioning of ongoing processes in model space. We believe that these extensions could enhance discovery of model structure (P4) and provide deeper insight into social media activity.
Figure 14: Graph-based representation of processes' space for three clusters in social media activity.
4. Conclusion and Future Work
The development of the proposed approach is still an ongoing project. We aim at further systematization and detailing of the proposed concepts, methods, and algorithms, as well as a more comprehensive and deeper implementation of EC-based applications. Further work on the development includes the following directions:
(i) dualization on the role of data-driven and intelligent operations in the proposed approach and described patterns;
(ii) extended analysis of various EC techniques applicable within the approach;
(iii) investigation of EC-based discovery for models of complex systems with lacking or inconsistent observations;
(iv) detailed formalization of expertise and knowledge-based methods within the approach;
(v) extending the approach with interactive user-centered modelling and phase space analysis;
(vi) development of a multilayered approach for decision support and control of system and process $S$, available data $D$, and complex model $M$.
Data Availability
The data used in the examples presented in Section 3 were initially obtained within complementary projects performed by the authors of the paper. The data are available from the corresponding author upon request, after an explicit statement of the purpose and plan of the requested data usage, to check for possible violation of the corresponding projects' rules.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This paper presents an extension and further development of the work [23]. This research is financially supported by the Russian Scientific Foundation, Agreement 14-11-00823 (15.07.2014).
References
[1] N. Boccara, Modeling Complex Systems, Graduate Texts in Physics, Springer, New York, 2nd edition, 2010.
[2] H. McManus and D. Hastings, "A framework for understanding uncertainty and its mitigation and exploitation in complex systems," IEEE Engineering Management Review, vol. 34, no. 3, pp. 81-94, 2006.
[3] W. Walker, P. Harremoes, J. Rotmans et al., "Defining uncertainty: a conceptual basis for uncertainty management in model-based decision support," Integrated Assessment, vol. 4, no. 1, pp. 5-17, 2003.
[4] J. Yan and J. R. Deller, "NARMAX model identification using a set-theoretic evolutionary approach," Signal Processing, vol. 123, pp. 30-41, 2016.
[5] I. G. Kevrekidis, C. W. Gear, and G. Hummer, "Equation-free: The computer-aided analysis of complex multiscale systems," AIChE Journal, vol. 50, no. 7, pp. 1346-1355, 2004.
[6] H. Ihshaish, A. Cortes, and M. A. Senar, "Parallel Multi-level Genetic Ensemble for Numerical Weather Prediction Enhancement," Procedia Computer Science, vol. 9, pp. 276-285, 2012.
[7] G. Dumedah, "Formulation of the Evolutionary-Based Data Assimilation and its Implementation in Hydrological Forecasting," Water Resources Management, vol. 26, no. 13, pp. 3853-3870, 2012.
[8] V. V. Kashirin, A. A. Lantseva, S. V. Ivanov, S. V. Kovalchuk, and A. Boukhanovsky, "Evolutionary simulation of complex networks' structures with specific functional properties," Journal of Applied Logic, vol. 24, part A, pp. 39-49, 2017.
[9] S. V. Kovalchuk, P. A. Smirnov, K. V. Knyazkov, A. S. Zagarskikh, and A. V. Boukhanovsky, "Knowledge-Based Expressive Technologies Within Cloud Computing Environments," in Practical Applications of Intelligent Systems, vol. 279 of Advances in Intelligent Systems and Computing, pp. 1-11, Springer, Berlin, Germany, 2014.
[10] A. A. Funkner, A. N. Yakovlev, and S. V. Kovalchuk, "Towards evolutionary discovery of typical clinical pathways in electronic health records," Procedia Computer Science, vol. 119, pp. 234-244, 2017.
[11] S. V. Kovalchuk and A. V. Boukhanovsky, "Towards Ensemble Simulation of Complex Systems," Procedia Computer Science, vol. 51, pp. 532-541, 2015.
[12] S. V. Kovalchuk, A. V. Krikunov, K. V. Knyazkov, and A. V. Boukhanovsky, "Classification issues within ensemble-based simulation: application to surge floods forecasting," Stochastic Environmental Research and Risk Assessment, vol. 31, no. 5, pp. 1183-1197, 2017.
[13] D. Fogel, "Phenotypes, genotypes, and operators in evolutionary computation," in Proceedings of the 1995 IEEE International Conference on Evolutionary Computation, p. 193, Perth, WA, Australia.
[14] S. V. Ivanov, S. V. Kovalchuk, and A. V. Boukhanovsky, "Workflow-based Collaborative Decision Support for Flood Management Systems," Procedia Computer Science, vol. 18, pp. 2213-2222, 2013.
[15] A. Gusarov, A. Kalyuzhnaya, and A. Boukhanovsky, "Spatially adaptive ensemble optimal interpolation of in-situ observations into numerical vector field models," in Proceedings of the 6th International Young Scientist Conference on Computational Science, YSC 2017, pp. 325-333, Finland, November 2017.
[16] A. V. Krikunov, E. V. Bolgova, E. Krotov, T. M. Abuhay, A. N. Yakovlev, and S. V. Kovalchuk, "Complex data-driven predictive modeling in personalized clinical decision support for Acute Coronary Syndrome episodes," in Proceedings of the International Conference on Computational Science, ICCS 2016, pp. 518-529, USA, June 2016.
[17] S. V. Kovalchuk, A. A. Funkner, O. G. Metsker, and A. N. Yakovlev, "Simulation of patient flow in multiple healthcare units using process and data mining techniques for model identification," Journal of Biomedical Informatics, vol. 82, pp. 128-142, 2018.
[18] A. Yakovlev, O. Metsker, S. Kovalchuk, and E. Bologova, "Prediction of in-hospital mortality and length of stay in acute coronary syndrome patients using machine-learning methods," Journal of the American College of Cardiology, vol. 71, no. 11, p. A242, 2018.
[19] A. Nikishova, A. Kalyuzhnaya, A. Boukhanovsky, and A. Hoekstra, "Uncertainty quantification and sensitivity analysis applied to the wind wave model SWAN," Environmental Modelling & Software, vol. 95, pp. 344-357, 2017.
[20] E. Rojas, J. Munoz-Gama, M. Sepulveda, and D. Capurro, "Process mining in healthcare: A literature review," Journal of Biomedical Informatics, vol. 61, pp. 224-236, 2016.
[21] T. Sinha, P. Jermann, N. Li, and P. Dillenbourg, "Your click decides your fate: Inferring Information Processing and Attrition Behavior from MOOC Video Clickstream Interactions," in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 3-14, Doha, Qatar, October 2014.
[22] C. Marceau, "Characterizing the Behavior of a Program Using Multiple-Length N-Grams," Defense Technical Information Center, 2005.
[23] S. V. Kovalchuk, O. G. Metsker, A. A. Funkner et al., "Towards management of complex modeling through a hybrid evolutionary identification," in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 255-256, Kyoto, Japan, July 2018.
Figure 2: Complex modeling patterns. (a) Regular modeling; (b) data-driven modeling; (c) ensemble-based modeling; (d) data-driven support of complex modeling; (e) EC in hybrid complex modeling; (f) evolutionary space discovery in hybrid complex modeling.
ensemble, alternative models ensemble, data-driven ensemble, parameter diversity ensemble, and metaensemble. All these patterns can be applied within the context of the proposed framework. Still, an extension of the ensemble structure increases the structural complexity of the model and thus leads to the need for additional (automatic) control procedures. Moreover, the performance issues of P1 become even more severe in ensemble modeling.
P4. One of the key ideas of the proposed approach is the implementation of data-driven analysis of model states, structure, and behavior. To implement it within a conceptual framework, we propose a pattern for data-driven complex modeling (Figure 2(d)). It includes identification and prediction of the model structure through DM and PM techniques ($\Gamma^{\Xi\to\Sigma}_{D\to M}$) and generation of surrogate models for injection into the complex model ($\Gamma^{\Xi\to\Phi}_{D\to M}$). In addition, it is possible to use data-driven techniques to predict the quality of the considered model and use it for model optimization ($\Gamma'^{\,\Xi}_{D\to S}$, $\Gamma'^{\,\Xi\to\Sigma}_{D\to S}$, and $\Gamma'^{\,\Xi\to\Phi}_{D\to S}$).
P5. A key pattern for EC implementation is presented in Figure 2(e). Here, EC is used to identify the model structure ($\Gamma^{\Xi\to\Sigma}_{D\to M}$) and surrogate submodels ($\Gamma^{\Xi\to\Phi}_{D\to M}$) with consideration of a population of models. As a result, the modeling result is also (as in P3) presented in multiple instances, which may be analyzed, filtered, and evolved within subsequent iterations over changing time (with processing of incoming observations of the system) or within a single timestamp (with fixed observation data).
P6. Finally, the last presented pattern (Figure 2(f)) is aimed at investigation of the system phase space using DD-models and/or EC to reflect the unobservable landscape for estimation of model positioning, assessing its quality in inferring (sub)optimal structural ($\Gamma'^{\,\Xi\to\Sigma}_{D\to S}$ and $\Gamma'^{\,\Sigma}_{M\to D}$) and functional ($\Gamma'^{\,\Xi\to\Phi}_{D\to S}$ and $\Gamma'^{\,\Phi}_{M\to D}$) characteristics of the actual system.
These patterns could be easily combined to obtain better results within a specific application. Of special interest from the point of view of EC are the patterns where a set of model (or submodel) instances is considered (P5, P6). It is possible to consider ensemble-based techniques (P3) in the fashion of EC, but within our approach we prefer to consider an ensemble as a composite model with several submodels. In that case, ensemble management refers to the concept of complex model structure.
Figure 3: Artifacts and procedures within a typical composite solution.
Several important goals may be reached within the presented patterns:
(i) automation of complex model management with intelligent solutions, DD-models, and EC;
(ii) optimization of model structure and application under defined limitations in precision and performance;
(iii) enhanced ways of domain knowledge discovery for applications and general investigation of a system.
2.3. Composite Solution Development. The proposed structure of core concepts and patterns may be applied in various ways to form a solution that combines operators either implemented originally within the solution or implemented as external model calls. Figure 3 shows the essential elements (artifacts and procedures) of a typical composite solution within the proposed conceptual layers ($S$, $D$, and $M$). The $S$-layer includes the actual system's state, which can be assessed through the observation procedure and described by explicit domain knowledge. The $D$-layer includes datasets divided into observation data and simulation/modeling data, with procedures for data processing and data assimilation. Finally, the $M$-layer includes a set of available basic models $M_1, \ldots, M_N$, which may be identified and calibrated with available data, yielding tuned models $M'_1, \ldots, M'_N$ as a result. Here, the essential elements are model composition (which may be performed either automatically or by the modeler) and application of the model.
The key benefit of the approach is the application of a combination of EC, data-driven, and intelligent procedures to manage the whole composite solution, including data processing, modeling, and simulation, to lower uncertainty in $\Sigma \times \Xi \times \Phi$. Within the shown structure, these procedures may be applied
(i) to rank and select alternative models;
(ii) to support model identification, calibration, composition, and application;
(iii) to manage artifacts on various conceptual layers in a systematic way;
(iv) to infer implicit knowledge from available data and explicitly presented domain knowledge.
The shown example gives a brief view of composite solution development, while the particular details may differ depending on the application. The key procedures within the proposed composite solution, namely the implementation of intelligent procedures to support model identification and the systematic management of the composite model, are considered in Sections 2.4 and 2.5.
2.4. Evolutionary Model Identification. When implementing evolution of models within a complex modeling task, structural, functional, and quantitative parameters are usually considered as the genotype, whereas model output (data layer) is considered as the phenotype. Within the proposed approach, we can adapt the definition of basic EC operations within genotype-phenotype mapping [13]:
(i) epigenesis as model application: $f_1 : S \times M \to D$;
(ii) selection: $f_2 : S \times D \to D$;
(iii) genotype survival: $f_3 : S \times D \to M$;
(iv) mutation: $f_4 : M \to M$.
In addition, we consider quality assessment, usually treated as fitness for selection and survival (or, in more complex algorithms, for controlling other operations like mutation):
(i) data quality: $q_d : S \times D \to Q_d$;
(ii) model quality: $q_m : S \times M \to Q_m$.
Here $Q_d$ and $Q_m$ are often considered as $\mathbb{R}^N$ with some quantitative quality metrics. Model quality is usually considered through data quality, i.e., $q_m \sim q_d(s, f_1(s, m))$, but within our approach this separation is considered important because, in addition, we introduce supporting operations with data-driven procedures, as in complex modeling many of these functions (first of all, $f_1$, $q_m$, and $q_d$) are difficult to apply directly (some of these issues are considered in relation to the patterns). Data-driven operations (first of all, $f_1$ and $q_m$) can be introduced as substitutes for the previously introduced basic operations (see also patterns P2, P4, and P6):
(i) epigenesis as DD-model application: $f_1^d : S \times M \to D$
(ii) model generation: $g^d : S \times D \to M$
(iii) model quality prediction: $q_m^d : S \times M \to Q_m$
(iv) space discovery: $w^d : S \times D \to S$
Operation $w^d$ could be used as an intelligent extension within the selection or survival operations ($f_2$ and $f_3$). It becomes especially important in case of a lack of knowledge of the system's structure or functional characteristics. Operation $g^d$, at the same time, could be used as a part of the mutation operation $f_4$ (or of initial population generation). Having this extension, we can implement enhanced versions of EC algorithms (e.g., genetic algorithms, evolution strategies, and evolutionary programming) with data-driven operations to overcome, or at least to lower, complex modeling issues.
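The operations above can be read as plain function signatures. The following is a minimal sketch, not the authors' implementation: it assumes that a model is a parameter vector, that lower quality values are better, and that selection ($f_2$) and genotype survival ($f_3$) collapse into simple truncation. All names (`evolve`, `apply_model`, `sq_error`, `mutate`) and the toy line-fitting task are hypothetical. A data-driven surrogate $f_1^d$ could be passed in place of `f1` without changing the loop.

```python
import random
from typing import Callable, List

Model = List[float]  # genotype: a point in model space M (assumed: parameter vector)
Data = List[float]   # phenotype: model output in data space D


def evolve(
    s: Data,                             # observations of the system (layer S)
    population: List[Model],
    f1: Callable[[Data, Model], Data],   # epigenesis: S x M -> D
    q_d: Callable[[Data, Data], float],  # data quality: S x D -> Q_d (lower is better)
    f4: Callable[[Model], Model],        # mutation: M -> M
    generations: int = 20,
    survivors: int = 5,
) -> Model:
    """Elitist loop: score, keep the best models, refill by mutation."""
    for _ in range(generations):
        # f1 + q_d: apply every model and score its output against the system.
        scored = sorted(population, key=lambda m: q_d(s, f1(s, m)))
        # f2/f3 (simplified): truncation selection keeps the best genotypes.
        parents = scored[:survivors]
        # f4: refill the population with mutated copies of the survivors.
        children = [f4(random.choice(parents))
                    for _ in range(len(population) - survivors)]
        population = parents + children
    return min(population, key=lambda m: q_d(s, f1(s, m)))


# Toy usage (hypothetical): identify a and b in y = a*x + b.
xs = [0.0, 1.0, 2.0, 3.0]
observed = [1.0, 3.0, 5.0, 7.0]


def apply_model(s: Data, m: Model) -> Data:
    return [m[0] * x + m[1] for x in xs]


def sq_error(s: Data, d: Data) -> float:
    return sum((a - b) ** 2 for a, b in zip(s, d))


def mutate(m: Model) -> Model:
    return [p + random.gauss(0.0, 0.1) for p in m]


random.seed(0)
init = [[random.uniform(-5, 5), random.uniform(-5, 5)] for _ in range(20)]
best = evolve(observed, init, apply_model, sq_error, mutate, generations=200)
```

Because the survivors are carried over unchanged, the best error in the population never increases from one generation to the next; how fast it decreases depends on the mutation scale, which is an arbitrary choice here.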
2.5. Model Management Approach and Algorithm. By model management we mean operations with models within problem domain solution development and application. This includes identification, calibration, DA, optimization, prediction, and forecasting. To systematize model management in the presented patterns, we propose an approach for explicit consideration of the spaces $S$, $D$, and $M$ within hybrid modeling with EC and DD-modeling. To summarize complex modeling procedures within the approach, we developed a high-level algorithm which includes a series of steps to be implemented within the context of complex model management.
Step 1 (space discovery). This step identifies the description of the phase space (in most cases $S$) in case of a lack of knowledge or for automation purposes. For example, the step could be applied in the discovery of the system state space or the model structure. The space description may include (a) distance metrics, (b) proximity structure (e.g., graph, clustering, hierarchy, and density), and (c) a positioning function. One of the possible ways to perform this step is an application of DM and EC algorithms to the available data (see pattern P6).
Step 2 (identification of supplementary functions). Data-driven functions ($\Phi$) are applied to work in model evolution, with the space (landscape) representation considered as available information.
Step 3 (evolutionary processing of a set of models). This step is described by a combination of basic EC operations (population initialization, epigenesis, selection, mutation, and survival) with supplementary functions. The form of the combination depends on (a) the selected EC algorithm, (b) application requirements and restrictions, and (c) model-based issues (e.g., performance, quality of surrogate models, etc.).
Step 4 (assimilation of updated data and knowledge). This step is applied for automatic adaptation purposes and implements a DA algorithm. DA can be applied to (a) the set of models, (b) EC operations (e.g., affecting the selection function), (c) supplementary functions (as they are mainly data-driven), and (d) the phase space description (if the descriptive structure is identified from changed data and/or knowledge).
The steps can be repeated in various combinations depending on the application and the implemented pattern. Also, the steps are general and could be implemented in various ways. Several examples are provided in Section 3.
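One possible way to wire the four steps together is sketched below. This is only an illustration under strong assumptions: the steps are executed in a fixed order once per incoming data batch, and each step is an opaque callable supplied by the application. The function `manage_models` and all parameter names are hypothetical and do not come from the paper.

```python
from typing import Any, Callable, Iterable, List


def manage_models(
    population: List[Any],
    batches: Iterable[Any],
    discover_space: Callable[[Any], Any],
    fit_supplementary: Callable[[Any, Any], Any],
    evolve_step: Callable[[List[Any], Any, Any, Any], List[Any]],
    assimilate: Callable[[List[Any], Any], List[Any]],
) -> List[Any]:
    """Run Steps 1-4 once for every batch of new data."""
    for batch in batches:
        space = discover_space(batch)                  # Step 1: space discovery
        helpers = fit_supplementary(batch, space)      # Step 2: supplementary functions
        population = evolve_step(population, batch, space, helpers)  # Step 3: EC
        population = assimilate(population, batch)     # Step 4: data assimilation
    return population


# Toy usage (hypothetical): "models" are numbers pulled toward each batch.
result = manage_models(
    population=[10.0],
    batches=[[0.0, 2.0], [4.0, 6.0]],
    discover_space=lambda b: (min(b), max(b)),
    fit_supplementary=lambda b, space: sum(b) / len(b),
    evolve_step=lambda pop, b, space, h: [(p + h) / 2 for p in pop],
    assimilate=lambda pop, b: [min(max(p, min(b)), max(b)) for p in pop],
)
```

In a real solution the order and repetition of the steps would follow the chosen pattern rather than this fixed loop.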
2.6. Available Building Blocks of a Composite Solution. EC offers a flexible and robust solution to identify complex model structures within a complex landscape, with possible adaptation towards changing conditions and the system's state (including new states without prior observation). A significant additional benefit is the ability to manage alternative solutions simultaneously, with possible switching between them and various combinations of them depending on the current needs. Still, within the task of model identification and management, EC (and also many metaheuristics) has certain drawbacks which require additional steps to implement the approach under particular conditions:
(i) high computational cost due to the multiple runs of a model
(ii) low reproducibility and interpretability of obtained results due to the randomized nature of the search procedure
(iii) complicated tuning of hyperparameters for better EC convergence
(iv) indistinct definition of genotype boundaries
(v) complicated mapping of genotype to phenotype space
To overcome these issues, the proposed approach involves two options. First, the intelligent procedures may be used to tune EC hyperparameters (P5), predict features of the genotype-phenotype mapping, boundaries, etc. (P4), and discover interpretable states and filters (for system, data, and model) to control convergence and adaptation of the population (P2, P4, and P5) in an interpretable and reproducible way (through the defined control procedure). Second, the composite model may use various approaches, methods, and elements to obtain better quality and performance of the solution:
(i) surrogate models (P2, P4, P5), which may increase performance (for example, within preliminary and intermediate optimization steps)
(ii) ensemble models (P3), which may be considered as an interpretable and controllable population
(iii) interpretation and formal inference using explicit domain-specific knowledge and results of data mining to feed procedures of EC and infer parameters in both models and EC
(iv) controllable space decomposition (P6) with predictive models for possible areas and directions of population migration in EC to explicitly lower uncertainty and obtain additional interpretability
Finally, an essential feature of the proposed approach is a holistic analysis of a composite solution with possible coevolution of models (submodels within a composite model) and data processing procedures.
3. Application Examples
This section presents several practical examples where the proposed approach, patterns, or some of their elements were applied. The examples were intentionally selected from diverse problem domains to demonstrate the generality of the approach. The considered problems are developed in separate projects which are at various stages. Problem 1 (ensemble metocean simulation) was investigated in a series of projects (see, e.g., [11, 14, 15]). Within this research, we are trying to extend model calibration and DA with EC techniques to develop more flexible and accurate multimodel ensembles. Problem 2 (clinical pathways (CPs) modelling) is important in several ongoing projects aimed at model-based decision support in healthcare (see, e.g., [16–18]). The proposed approach plays an important role by enabling deeper analysis of clinical pathways in various scenarios (interactive analysis of available CPs with identification of clusters of similar patients, DA in predictive modelling of ongoing cases, etc.). Finally, Problem 3 shows very early results of a recently started project in online social network analysis.
3.1. Problem 1: Evolution in Models for Metocean Simulation. Environmental simulation systems usually contain several numerical models serving different purposes (complementary simulation processes, improving the reliability of a system by producing alternative results, etc.). Each model can typically be described by a large number of quantitative parameters and functional characteristics that should be adjusted by an expert or by intelligent automated methods (e.g., EC). Alternative models inside the environmental simulation system can be joined in an ensemble according to a complex modeling pattern based on evolutionary computing (a combination of the P3 and P5 patterns). In the current case study, we introduce an example illustrating the ensemble concept in the forms of an alternative-models ensemble, a parameter-diversity ensemble, and a metaensemble. For identification of the parameters of the proposed ensembles, the least-squares method (in the case of model linearity) or optimization methods (in the case of nonlinearity) can be used. As we need to take into account not only the functional space $\Phi$ and the space of parameters $\Xi$ for a single model but also the optimal coexistence of models in the system (i.e., $\Sigma$), evolutionary and coevolutionary approaches seem to be applicable techniques for this task. It is worth mentioning that the coevolutionary approach can be applied to independent model realizations through an ensemble as a connecting element. In this case, the parameters (weights) in the ensemble can be estimated separately from the coevolution procedure, in a constant form or dynamically. As a case study of complex environmental modeling, we design an ensemble model that consists of the SWAN (http://swanmodel.sourceforge.net) model for ocean wave simulation based on two different surface forcings, by NCEP (https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html) and ERA Interim (https://www.ecmwf.int/en/forecasts/datasets/archive-datasets/reanalysis-datasets/era-interim). Thus, different implementations of the SWAN model were connected in the form of an alternative-models ensemble with least-squares-calculated coefficients defining the structure of the complex model. Two parameters, wind drag and whitecapping rate (WCR), were calibrated using evolutionary and coevolutionary algorithms implementing $\Gamma^{\Xi \to \Phi}_{D \to M}$ in P5 (for a detailed sensitivity analysis of SWAN, see [19]). The coevolutionary approach can be represented in the form of a parameter-diversity ensemble, where each population is constructed as an ensemble of alternative model results with different parameters. Also, we can add ensemble weights to the model parameter diversity and get a metaensemble that can be identified within the coevolutionary approach.
In the process of model identification and verification, measurements from several wave stations in the Kara Sea were used. The fitness function represents the mean error (RMSE) over all wave stations. For verification of the results, the MAE (mean absolute error) and DTW (dynamic time warping) metrics were used.
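The three metrics named above can be sketched in a few lines. This is an illustrative sketch with textbook definitions, not the code used in the study: DTW is the classic dynamic-programming variant with absolute difference as local cost and no window constraint, and the aggregation in `fitness` (an unweighted mean of per-station RMSE) is an assumption, since the exact aggregation is not specified. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root-mean-square error between observed and simulated series."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Mean absolute error between observed and simulated series."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance (unconstrained, |x - y| local cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def fitness(stations: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Assumed aggregation: mean RMSE over (observed, simulated) station pairs."""
    return sum(rmse(obs, sim) for obs, sim in stations) / len(stations)
```

DTW is useful for wave series because it tolerates small phase shifts between simulated and observed peaks that RMSE and MAE penalize pointwise.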
Figure 4(a) represents the surface (landscape) of RMSE in the space of the mentioned parameters (drag and WCR) for the implementations SWAN+ERA and SWAN+NCEP. It can be seen that the evolutionary-obtained results converge to the minimum of the error landscape. The landscape was obtained by running the model with all parameter variants from the full 30x30 grid (i.e., 900 runs), while the evolutionary algorithm converged in 5 generations with 10 individuals (parameter sets) in the population (50 runs), which allows performing identification two orders faster. The convergence of coevolution for the SWAN+ERA+NCEP case is presented in Figure 5.
Although the error landscapes for the pair of implementations SWAN+ERA and SWAN+NCEP are close to each other, separate evolution does not consider optimization of the ensemble result. For this purpose, we apply a coevolutionary approach that produces a set of Pareto-optimal solutions for each generation. Figure 4(b) shows that the error of each model in the ensemble is significant (coNCEP and coERA for the models alone), but the error of the whole ensemble (coEnsemble) converges to the minimum very fast.
Figure 4: Metocean simulation: (a) error landscape (RMSE, m, over drag and log(WCR)) for wave height simulation results using ERA and NCEP reanalysis as input data and (b) Pareto frontier of coevolution results for all generations (log(RMSE) over drag and WCR for coNCEP, coERA, and coEnsemble).
Figure 5: Coevolution convergence of diversity parameters ensemble for metocean models (RMSE, m, versus generation).
The obtained result can be analyzed from the uncertainty reduction point of view. Optimization of model parameters helps to reduce parameter uncertainty, which can be estimated through the error function. But when we apply an ensemble approach to evolutionary optimized results, it is appropriate to talk about reduction of the uncertainty connected with the input data sources (NCEP and ERA) as well. Moreover, the metaensemble approach allowed reduction of the uncertainty connected with the ensemble parameters.
Figure 6: Graph-based representation of processes space in healthcare (interactive view). (Demonstration available at https://www.youtube.com/watch?v=EH74f1w6EeY.)
Summarizing the results of the metocean case study, we can note that the EC approach shows significant efficiency, up to 120 times compared with grid search, without accuracy losses. According to this experimental study, the quality of the ensemble with evolutionary optimized models is similar to the results of the grid search: the MAE metric is equal to 0.24 m and the DTW metric to 51. Also, we can mention that the coevolutionary approach provides a 10% accuracy gain compared with the results of single evolution of model implementations, but this is still similar to the ensemble result with evolutionary optimized models. Nevertheless, the coevolutionary approach allowed achieving a 200 times acceleration. Within the context of the proposed approach, the space $\Phi$ was investigated using a defined structure of the model in the space $\Sigma$ for the purpose of model calibration.
3.2. Problem 2: Modeling Health Care Process. Modeling of healthcare processes usually involves enormous uncertainty and variability, even when modeling a single disease. One of the ways to identify a model of such a process is PM [20]. Still, direct implementation of PM methods does not remove a major part of the uncertainty. Within the current research, we applied the proposed approach for identification purposes both in the analysis of historical cases and in the prediction of single process development. Here we consider processes of providing health care in acute coronary syndrome (ACS) cases, which is usually considered one of the major causes of death in the world. We used a set of 3434 ACS cases collected during 2010-2015 in Almazov National Medical Research Centre, one of the leading cardiological centers in Russia. The data set contains electronic health records of these patients with all registered events and characteristics of a patient.
To simplify consideration of the multidimensional space of possible processes (${\Gamma'}^{\Xi \to \Sigma}_{D \to S} \Gamma^{\Xi}_{S \to D}$ for analysis of $\Sigma$ on layer $S$), we introduced a graph-based representation of this space, with vertices representing cases and edges representing proximity of cases. Analysis of such a structure enables easy discovery of common cases (e.g., as communities in the graph). Such discovery enables explicit, interpretable structuring of the space and representation of the further landscape for EC in terms of the P6 pattern. Moreover, direct interactive investigation of a visual representation of such a structure (see Figure 6) provides significant insights for medical researchers.
We have developed an evolutionary-based algorithm for pattern identification and clustering in such a representation with two criteria to be optimized (see Figure 7). Here, processes were represented by a sequence of labels (symbols) denoting key events in the PM model. Typical patterns were then selected from the Pareto frontier. The convergence process is demonstrated in Figure 8 (the 10 best individuals from the Pareto frontier according to the integral criterion were selected). As a result, this solution may refer to the P5 pattern and the operator $\Gamma^{\Xi \to \Sigma}_{D \to M}$ while discovering model structure. Figure 9 shows an example of a typical process model (i.e., a structural characteristic of the model) for one of the identified clusters. A detailed description of the approach, algorithms, and results on CP discovery, clustering, and analysis, including a performance comparison of three versions of CP discovery algorithms, can be found in [10]. An important outcome of the approach in this application is the interpretability of the clusters and identified patterns. For example, 10 clusters and the corresponding CPs were interpreted by cardiologists from Almazov National Medical Research Centre. The obtained interpretation and further discovery and application of CP structure are presented in [17]. Another important benefit given by such space structure discovery is lowering the uncertainty of a patient's treatment trajectory by hierarchical positioning of an evolving process (selection of a cluster and selection of a position within the cluster). For example, the discrete-event simulation model described in [17] provides a more appropriate length-of-stay distribution within simulation with the discovered classes of CPs (the Kolmogorov-Smirnov statistic decreased by 51%, from 0.255 to 0.124).
Figure 7: Pareto frontier for CP patterns discovery (length of pattern (normed) versus number of non-arranged sequences (normed)).
Figure 8: Evolutionary convergence during CP pattern discovery (normed sum error versus generation).
Figure 9: Example of process model showing transfers between hospital's departments (cluster 7).
Figure 10: Evolution of synthetic CPs: (a) CP population convergence (distance in symbols versus step, clusters 0 to 6), (b) evolution of possible CP (historical CPs, synthetic CPs, and target CP; demonstration available at https://www.youtube.com/watch?v=twvfX9zKsY8), (c) number of synthetic CPs versus step, and (d) percentage of CPs in the correct cluster versus step.
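The selection of typical patterns from a Pareto frontier, as used in the CP pattern discovery above, can be illustrated as follows. This is a sketch under stated assumptions: both criteria are treated as minimized for illustration (the actual optimization directions are given in [10]), and the "integral criterion" is assumed here to be the unweighted sum of the two normed criteria. The names `pareto_front` and `top_by_integral` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (criterion 1, criterion 2), both assumed minimized


def dominates(q: Point, p: Point) -> bool:
    """q dominates p if it is no worse in both criteria and better in one."""
    return q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, preserving input order."""
    return [
        p for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


def top_by_integral(front: List[Point], k: int) -> List[Point]:
    """Assumed integral criterion: pick k points with the smallest sum."""
    return sorted(front, key=lambda p: p[0] + p[1])[:k]


candidates = [(1.0, 5.0), (2.0, 2.0), (5.0, 1.0), (3.0, 3.0), (4.0, 4.0)]
front = pareto_front(candidates)
best_two = top_by_integral(front, 2)
```

The quadratic scan is adequate for populations of the size shown in the figures; larger populations would call for a sort-based frontier algorithm.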
Further, we propose an algorithm to dynamically generate possible developments of the process in healthcare using the identified graph-based space representation, with evolutionary strategies assimilating incoming data (events) within a case (${\Gamma'}^{\Sigma}_{M \to D}$ in P5 and $\Gamma^{\Xi \to \Sigma}_{D}$ in P2). We consider the convergence (Figure 10(a)) of the introduced synthetic continuations of the processes to the right class (the identified clusters of typical cases were used), with mapping to the graph-based space representation with proximity measures (Figure 10(b)). As a result, the appearance of the CP's events decreases the number of synthetic CPs and increases the percentage of CPs positioned in the correct cluster (see an example in Figure 10(c) and Figure 10(d), respectively). This enables interpretable positioning and uncertainty lowering in predicting the further development of a CP for a particular patient.
Here, a combination of patterns P2, P5, and P6 in the implementation of the proposed algorithm (see Section 2.5) enables interactive investigation of the process space and data assimilation into a population of possible continuations of a single process as it evolves. This solution can be applied in exploratory modeling and simulation of patient flow processing as well as in decision support in specialized medical centers.
3.3. Problem 3: Mining Social Media. Nowadays, social media analysis (which began with static network models emphasizing the topology of connections between users) strives to explore dynamic behavioral patterns of individuals, which can be recovered from their digital traces on the web. The prediction of social media activities requires combining analytical and data-driven models as well as identifying the optimal structure and parameters of these models according to the available data. Here we show an example of a problem in this field involving evolutionary identification of a model.
Figure 11: Distribution of posts, reposts, and comments on personal walls of subscribers of bank community (fraction of users (log) versus activity count (log)).
Figure 12: Example of process model (a) with expanded cycles (cluster 2) and (b) with collapsed cycles (cluster 1).
A digital trace of a user in an online social network (OSN) is a sequence (chain) of observed activities separated by time gaps. Each OSN supports different types of "hidden" and observable activities. For example, in the largest Russian social network, vk.com (further denoted as VK), a user has a personal page (wall) with three types of activities: post (P), when a user makes a record by himself or herself; repost (R), when the user copies the record of another user or community to his or her wall; and comment (C), when the user comments on a record on his or her wall. Figure 11 illustrates the distribution of these activities for subscribers of a large Russian bank community in VK. The collected dataset consists of the 100 (or fewer if unavailable) last entries (posts or reposts) and comments on the entries for 8K user walls in the period January 2017–December 2017. Comments are much less common than posts and reposts. The distributions of posts and reposts are similar, but there is a group of "spreaders" with a significant number of reposts.
We applied the technique described in Section 3.2 to analyze the processes. Still, the considered process has a significantly different structure. By default, it is continuous with random repetition of events, while the healthcare process in ACS cases has a finite and more "strong" structure. Figure 12 shows a typical process structure identified with the EC-based approach and visualized with expanded cycles (a) and with collapsed cycles (b). The second one could be considered more relevant than the first one, which is significantly affected by the length of the selected history. It is natural to consider it as a random process or state-transition model. In that case, the three identified clusters (characterized by various frequencies of transitions) could be interpreted as typical behavior models.
Figure 13: Example of process model for cluster 3 (1120 users) using n-gram analysis.
Table 1: Mean of activities' combinations for users' clusters.
Cluster  Size  CCC  CCP  CCR  CPP  CPR  CRR  PPP   PPR   PRR   RRR
1        5238  052  074  024  1    072  042  668   64    756   853
2        2110  011  013  016  016  036  059  209   395   1232  603
3        1120  091  138  014  362  075  018  5002  1178  5     277
N-gram analysis is often used to detect patterns in people's behavior [21, 22]. N-gram analysis is based on counting frequencies of combinations or sequences of activities. We collected all sorted 3-grams (so-called 3-sets) for each user's sequence to analyze the frequency of event combinations. As a result, three clusters of vectors with 3-set chains were identified with the k-means clustering method. Figure 13 shows all combinations and transitions between them for cluster 3 as an example. Using Figure 12 and Table 1, it is possible to see that cluster 3 includes users who often make new records (P) and sometimes comment on records (C). So, cluster 3 mostly consists of "bloggers". Cluster 2 includes "spreaders" who copy other records (R) frequently. And the biggest cluster, cluster 1, consists of people who make new records and copy other ones equally but less intensively compared to the other clusters. That may be considered typical behavior for a user of an OSN. N-gram analysis allows detecting typical behavioral patterns and obtaining process models for social media activities using chains of different lengths as input data. Thus, this type of data-driven modeling is more appropriate for research into continuous processes. Figure 14 shows a graph-based representation of the process space with all users' patterns.
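The counting of sorted 3-grams (3-sets) described above can be sketched as follows. This is an illustration, not the authors' code: it assumes each user's chain is a string over the alphabet C, P, R and that every overlapping window of three activities is counted once. The names `THREE_SETS` and `three_set_vector` are hypothetical. The resulting ten-dimensional vectors are what a k-means implementation would then cluster.

```python
from collections import Counter
from itertools import combinations_with_replacement
from typing import List

# The ten possible 3-sets over comment (C), post (P), and repost (R),
# in the same order as the columns of Table 1: CCC, CCP, CCR, ..., RRR.
THREE_SETS: List[str] = [
    "".join(c) for c in combinations_with_replacement("CPR", 3)
]


def three_set_vector(chain: str) -> List[int]:
    """Count sorted 3-grams over all overlapping windows of the chain."""
    counts = Counter(
        "".join(sorted(chain[i:i + 3])) for i in range(len(chain) - 2)
    )
    return [counts[key] for key in THREE_SETS]


# Toy usage: windows PPR, PRP, RPC become the 3-sets PPR, PPR, CPR.
vector = three_set_vector("PPRPC")
```

Sorting each window discards the order of activities inside it, so the vector captures which activities co-occur rather than their exact sequence.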
This subsection provides very early results. The next step in applying the proposed approach to this problem includes an extension of the process model structure by (a) temporal labeling (gaps between events), (b) considering the process within a sliding time window to get more structured processes, (c) linking the model with causal inference, and (d) introduction of DM techniques for EC positioning of ongoing processes in model space. We believe that these extensions could enhance discovery of model structure (P4) and provide deeper insight into social media activity.
Figure 14: Graph-based representation of processes' space for three clusters in social media activity.
4. Conclusion and Future Work
The development of the proposed approach is still an ongoing project. We aim at further systematization and detailing of the proposed concepts, methods, and algorithms as well as more comprehensive and deeper implementation of EC-based applications. Further development includes the following directions:
(i) dualization on the role of data-driven and intelligent operations in the proposed approach and described patterns
(ii) extended analysis of various EC techniques applicable within the approach
(iii) investigation of EC-based discovery for models of complex systems with lacking or inconsistent observations
(iv) detailed formalization of expertise and knowledge-based methods within the approach
(v) extending the approach with interactive user-centered modelling and phase space analysis
(vi) development of a multilayered approach for decision support and control of the system and process $S$, available data $D$, and complex model $M$
Data Availability
The data used in the examples presented in Section 3 were initially obtained within complementary projects performed by the authors of the paper. The data are available from the corresponding author upon request, after an explicit statement of the purpose and plan of the requested data usage, to check for possible violation of the corresponding projects' rules.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This paper presents an extension and further development of the work [23]. This research is financially supported by the Russian Scientific Foundation, Agreement 14-11-00823 (15.07.2014).
References
[1] N. Boccara, Modeling Complex Systems, Graduate Texts in Physics, Springer, New York, 2nd edition, 2010.
[2] H. McManus and D. Hastings, “A framework for understanding uncertainty and its mitigation and exploitation in complex systems,” IEEE Engineering Management Review, vol. 34, no. 3, pp. 81–94, 2006.
[3] W. Walker, P. Harremoes, J. Rotmans et al., “Defining uncertainty: a conceptual basis for uncertainty management in model-based decision support,” Integrated Assessment, vol. 4, no. 1, pp. 5–17, 2003.
[4] J. Yan and J. R. Deller, “NARMAX model identification using a set-theoretic evolutionary approach,” Signal Processing, vol. 123, pp. 30–41, 2016.
[5] I. G. Kevrekidis, C. W. Gear, and G. Hummer, “Equation-free: The computer-aided analysis of complex multiscale systems,” AIChE Journal, vol. 50, no. 7, pp. 1346–1355, 2004.
[6] H. Ihshaish, A. Cortes, and M. A. Senar, “Parallel Multi-level Genetic Ensemble for Numerical Weather Prediction Enhancement,” Procedia Computer Science, vol. 9, pp. 276–285, 2012.
[7] G. Dumedah, “Formulation of the Evolutionary-Based Data Assimilation and its Implementation in Hydrological Forecasting,” Water Resources Management, vol. 26, no. 13, pp. 3853–3870, 2012.
[8] V. V. Kashirin, A. A. Lantseva, S. V. Ivanov, S. V. Kovalchuk, and A. Boukhanovsky, “Evolutionary simulation of complex networks' structures with specific functional properties,” Journal of Applied Logic, vol. 24, part A, pp. 39–49, 2017.
[9] S. V. Kovalchuk, P. A. Smirnov, K. V. Knyazkov, A. S. Zagarskikh, and A. V. Boukhanovsky, “Knowledge-Based Expressive Technologies Within Cloud Computing Environments,” in Practical Applications of Intelligent Systems, vol. 279 of Advances in Intelligent Systems and Computing, pp. 1–11, Springer, Berlin, Germany, 2014.
[10] A. A. Funkner, A. N. Yakovlev, and S. V. Kovalchuk, “Towards evolutionary discovery of typical clinical pathways in electronic health records,” Procedia Computer Science, vol. 119, pp. 234–244, 2017.
[11] S. V. Kovalchuk and A. V. Boukhanovsky, “Towards Ensemble Simulation of Complex Systems,” Procedia Computer Science, vol. 51, pp. 532–541, 2015.
[12] S. V. Kovalchuk, A. V. Krikunov, K. V. Knyazkov, and A. V. Boukhanovsky, “Classification issues within ensemble-based simulation: application to surge floods forecasting,” Stochastic Environmental Research and Risk Assessment, vol. 31, no. 5, pp. 1183–1197, 2017.
[13] D. Fogel, “Phenotypes, genotypes, and operators in evolutionary computation,” in Proceedings of the 1995 IEEE International Conference on Evolutionary Computation, p. 193, Perth, WA, Australia.
[14] S. V. Ivanov, S. V. Kovalchuk, and A. V. Boukhanovsky, “Workflow-based Collaborative Decision Support for Flood Management Systems,” Procedia Computer Science, vol. 18, pp. 2213–2222, 2013.
[15] A. Gusarov, A. Kalyuzhnaya, and A. Boukhanovsky, “Spatially adaptive ensemble optimal interpolation of in-situ observations into numerical vector field models,” in Proceedings of the 6th International Young Scientist Conference on Computational Science, YSC 2017, pp. 325–333, Finland, November 2017.
[16] A. V. Krikunov, E. V. Bolgova, E. Krotov, T. M. Abuhay, A. N. Yakovlev, and S. V. Kovalchuk, “Complex data-driven predictive modeling in personalized clinical decision support for Acute Coronary Syndrome episodes,” in Proceedings of the International Conference on Computational Science, ICCS 2016, pp. 518–529, USA, June 2016.
[17] S. V. Kovalchuk, A. A. Funkner, O. G. Metsker, and A. N. Yakovlev, “Simulation of patient flow in multiple healthcare units using process and data mining techniques for model identification,” Journal of Biomedical Informatics, vol. 82, pp. 128–142, 2018.
[18] A. Yakovlev, O. Metsker, S. Kovalchuk, and E. Bologova, “Prediction of in-hospital mortality and length of stay in acute coronary syndrome patients using machine-learning methods,” Journal of the American College of Cardiology, vol. 71, no. 11, p. A242, 2018.
[19] A. Nikishova, A. Kalyuzhnaya, A. Boukhanovsky, and A. Hoekstra, “Uncertainty quantification and sensitivity analysis applied to the wind wave model SWAN,” Environmental Modelling & Software, vol. 95, pp. 344–357, 2017.
[20] E. Rojas, J. Munoz-Gama, M. Sepulveda, and D. Capurro, “Process mining in healthcare: A literature review,” Journal of Biomedical Informatics, vol. 61, pp. 224–236, 2016.
[21] T. Sinha, P. Jermann, N. Li, and P. Dillenbourg, “Your click decides your fate: Inferring Information Processing and Attrition Behavior from MOOC Video Clickstream Interactions,” in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 3–14, Doha, Qatar, October 2014.
[22] C. Marceau, “Characterizing the Behavior of a Program Using Multiple-Length N-Grams,” Defense Technical Information Center, 2005.
[23] S. V. Kovalchuk, O. G. Metsker, A. A. Funkner et al., “Towards management of complex modeling through a hybrid evolutionary identification,” in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 255–256, Kyoto, Japan, July 2018.
Complexity 5
Figure 3: Artifacts and procedures within a typical composite solution. The diagram links the system, data, and model layers (S, D, M) through observation, data processing, data assimilation, model identification and calibration, composition, model selection, and modeling and simulation, all managed by EC, data-driven, and intelligent procedures.
P6). It is possible to consider ensemble-based techniques (P3) in the fashion of EC, but within our approach we prefer to consider an ensemble as a composite model with several submodels. In that case, ensemble management refers to the concept of complex model structure.
Several important goals may be reached within the presented patterns:
(i) automation of complex model management with intelligent solutions, DD-models, and EC
(ii) optimization of model structure and application under defined limitations in precision and performance
(iii) enhanced ways of domain knowledge discovery for applications and general investigation of a system
2.3. Composite Solution Development. The proposed structure of core concepts and patterns may be applied in various ways to form a solution that combines operators implemented natively within the solution or as external model calls. Figure 3 shows the essential elements (artifacts and procedures) of a typical composite solution within the proposed conceptual layers ($S$, $D$, and $M$). The $S$-layer includes the actual system's state, which can be assessed through the observation procedure and described by explicit domain knowledge. The $D$-layer includes datasets divided into observation data and simulation/modeling data, with procedures for data processing and data assimilation. Finally, the $M$-layer includes a set of available basic models $M_1, \ldots, M_N$, which may be identified and calibrated with available data, yielding tuned models $M'_1, \ldots, M'_N$ as a result. Here the essential elements are model composition (which may be performed either automatically or by the modeler) and application of the model.
The key benefit of the approach is the application of a combination of EC, data-driven, and intelligent procedures to manage the whole composite solution, including data processing, modeling, and simulation, to lower uncertainty in $\Sigma \times \Xi \times \Phi$. Within the shown structure, these procedures may be applied
(i) to rank and select alternative models
(ii) to support model identification, calibration, composition, and application
(iii) to manage artifacts on various conceptual layers in a systematic way
(iv) to infer implicit knowledge from available data and explicitly presented domain knowledge
The shown example gives a brief view of composite solution development, while the particular details may differ depending on the application. The key procedures within the proposed composite solution, namely the implementation of intelligent procedures to support model identification and the systematic management of the composite model, are considered in Sections 2.4 and 2.5.
2.4. Evolutionary Model Identification. When implementing evolution of models within a complex modeling task, structure, functional, and quantitative parameters are usually considered as the genotype, whereas model output (data layer) is considered as the phenotype. Within the proposed approach, we can adapt the definition of basic EC operations within genotype-phenotype mapping [13]:
(i) epigenesis as model application: $f_1 : S \times M \to D$
(ii) selection: $f_2 : S \times D \to D$
(iii) genotype survival: $f_3 : S \times D \to M$
(iv) mutation: $f_4 : M \to M$
In addition, we consider quality assessment, usually treated as fitness for selection and survival (or, in more complex algorithms, for controlling other operations such as mutation):
(i) data quality: $q_d : S \times D \to Q_d$
(ii) model quality: $q_m : S \times M \to Q_m$
Here $Q_d$ and $Q_m$ are often considered as $\mathbb{R}^N$ with some quantitative quality metrics. Model quality is usually considered through data quality, i.e., $q_m \sim q_d(s, f_1(s, m))$, but within our approach this separation is important because, in addition, we introduce supporting operations with data-driven procedures: in complex modeling, many of these functions (first of all $f_1$, $q_m$, and $q_d$) are difficult to apply directly (some of these issues are considered in relation to the patterns). Data-driven operations (first of all for $f_1$ and $q_m$) can be introduced as substitutes for the previously introduced basic operations (see also patterns P2, P4, and P6):
(i) epigenesis as DD-model application: $f_1^d : S \times M \to D$
(ii) model generation: $g^d : S \times D \to M$
(iii) model quality prediction: $q_m^d : S \times M \to Q_m$
(iv) space discovery: $w^d : S \times D \to S$
Operation $w^d$ could be used as an intelligent extension within the selection or survival operations ($f_2$ and $f_3$). It becomes especially important when knowledge of the system's structure or functional characteristics is lacking. Operation $g^d$, in turn, could be used as a part of the mutation operation $f_4$ (or of initial population generation). With this extension, we can implement enhanced versions of EC algorithms (e.g., genetic algorithms, evolution strategies, and evolutionary programming) with data-driven operations to overcome, or at least to lower, complex modeling issues.
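The operator signatures of this section can be illustrated with a minimal sketch. Everything in it is a hypothetical toy setting rather than the authors' implementation: a model is a pair of line parameters, the system context `s` carries inputs and observations, and selection and survival ($f_2$, $f_3$) are collapsed into a single elitist step. A data-driven surrogate in the role of $q_m^d$ could replace `q_m` without changing the loop.

```python
import random

# Hypothetical toy setting: a model m is a pair (a, b) for y = a*x + b,
# and the system context s carries inputs and observations.

def f1(s, m):
    """Epigenesis f1: S x M -> D, i.e. run model m in context s."""
    a, b = m
    return [a * x + b for x in s["x"]]

def q_d(s, d):
    """Data quality q_d: S x D -> Q_d, here the negative RMSE (higher is better)."""
    mse = sum((p - o) ** 2 for p, o in zip(d, s["obs"])) / len(d)
    return -(mse ** 0.5)

def q_m(s, m):
    """Model quality through data quality: q_m ~ q_d(s, f1(s, m))."""
    return q_d(s, f1(s, m))

def f4(m, rng, sigma=0.3):
    """Mutation f4: M -> M, a Gaussian perturbation of every parameter."""
    return tuple(p + rng.gauss(0.0, sigma) for p in m)

def evolve(s, generations=100, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(pop_size)]
    for _ in range(generations):
        offspring = [f4(m, rng) for m in population]
        # Selection and survival (f2, f3) are collapsed into one elitist step:
        # parents and offspring compete and the best pop_size models survive.
        ranked = sorted(population + offspring, key=lambda m: q_m(s, m), reverse=True)
        population = ranked[:pop_size]
    return population[0]

s = {"x": [0, 1, 2, 3, 4], "obs": [1, 3, 5, 7, 9]}  # observations follow y = 2x + 1
best = evolve(s)
print(best, q_m(s, best))
```

Keeping `q_m` separate from `q_d` mirrors the separation argued for above: the loop only ever calls `q_m`, so a cheaper predicted quality can be swapped in when running `f1` is expensive.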
2.5. Model Management Approach and Algorithm. By model management we mean operations with models within problem domain solution development and application. This includes identification, calibration, DA, optimization, prediction, and forecasting. To systematize model management in the presented patterns, we propose an approach for explicit consideration of spaces $S$, $D$, and $M$ within hybrid modeling with EC and DD-modeling. To summarize complex modeling procedures within the approach, we developed a high-level algorithm that includes a series of steps to be implemented within the context of complex model management.
Step 1 (space discovery). This step identifies the description of phase space (in most cases $S$) in case of lack of knowledge or for automation purposes. For example, the step could be applied in the discovery of system state space or model structure. Space description may include (a) distance metrics, (b) proximity structure (e.g., graph, clustering, hierarchy, and density), and (c) a positioning function. One possible way to perform this step is to apply DM and EC algorithms to available data (see pattern P6).
Step 2 (identification of supplementary functions). Data-driven functions ($\Phi$) are applied to work in model evolution, with the space (landscape) representation considered as available information.
Step 3 (evolutionary processing of a set of models). This step is described by a combination of basic EC operations (population initialization, epigenesis, selection, mutation, and survival) with supplementary functions. The form of the combination depends on (a) the selected EC algorithm, (b) application requirements and restrictions, and (c) model-based issues (e.g., performance, quality of surrogate models, etc.).
Step 4 (assimilation of updated data and knowledge). This step is applied for automatic adaptation purposes and implements a DA algorithm. DA can be applied to (a) the set of models, (b) EC operations (e.g., affecting the selection function), (c) supplementary functions (as they are mainly data-driven), and (d) the phase space description (if the descriptive structure is identified from changed data and/or knowledge).
The steps can be repeated in various combinations depending on the application and the implemented pattern. Also, the steps are general and could be implemented in various ways. Several examples are provided in Section 3.
2.6. Available Building Blocks of a Composite Solution. EC offers a flexible and robust way to identify complex model structures within a complex landscape, with possible adaptation towards changing conditions and system states (including new states without prior observation). A significant additional benefit is the ability to manage alternative solutions simultaneously, with possible switching between them and various combinations of them depending on current needs. Still, within the task of model identification and management, EC (like many metaheuristics) has certain drawbacks, which require additional steps to implement the approach under particular conditions:
(i) high computational cost due to the multiple runs of a model
(ii) low reproducibility and interpretability of obtainedresults due to randomized nature of the searchingprocedure
(iii) complicated tuning of hyperparameters for better ECconvergence
(iv) indistinct definition of genotype boundaries
(v) complicated mapping of genotype to phenotypespace
To overcome these issues, the proposed approach involves two options. First, intelligent procedures may be used to tune EC hyperparameters (P5), predict features of genotype-phenotype mapping, boundaries, etc. (P4), and discover interpretable states and filters (for system, data, and model) to control convergence and adaptation of the population (P2, P4, and P5) in an interpretable and reproducible way (through the defined control procedure). Second, the composite model may use various approaches, methods, and elements to obtain better quality and performance of the solution:
(i) surrogate models (P2, P4, P5), which may increase performance (for example, within preliminary and intermediate optimization steps)
(ii) ensemble models (P3), which may be considered as an interpretable and controllable population
(iii) interpretation and formal inference using explicit domain-specific knowledge and results of data mining to feed procedures of EC and infer parameters in both models and EC
(iv) controllable space decomposition (P6) with predictive models for possible areas and directions of population migration in EC to explicitly lower uncertainty and obtain additional interpretability
Finally, an essential feature of the proposed approach is a holistic analysis of a composite solution with possible coevolution of models (submodels within a composite model) and data processing procedures.
3. Application Examples
This section presents several practical examples where the proposed approach, its patterns, or some of their elements were applied. The examples were intentionally selected from diverse problem domains to demonstrate the generality of the approach. The considered problems are developed in separate projects that are at various stages. Problem 1 (ensemble metocean simulation) was investigated in a series of projects (see, e.g., [11, 14, 15]). Within this research, we are trying to extend model calibration and DA with EC techniques to develop more flexible and accurate multimodel ensembles. Problem 2 (clinical pathways (CPs) modelling) is important in several ongoing projects aimed at model-based decision support in healthcare (see, e.g., [16–18]). The proposed approach plays an important role by enabling deeper analysis of clinical pathways in various scenarios (interactive analysis of available CPs with identification of clusters of similar patients, DA in predictive modelling of ongoing cases, etc.). Finally, Problem 3 shows very early results from a recently started project in online social network analysis.
3.1. Problem 1: Evolution in Models for Metocean Simulation. Environmental simulation systems usually contain several numerical models serving different purposes (complementary simulation processes, improving the reliability of a system by producing alternative results, etc.). Each model can typically be described by a large number of quantitative parameters and functional characteristics that should be adjusted by an expert or by intelligent automated methods (e.g., EC). Alternative models inside the environmental simulation system can be joined in an ensemble according to a complex modeling pattern based on evolutionary computing (a combination of patterns P3 and P5). In the current case study, we introduce an example illustrating the ensemble concept in the forms of the alternative models ensemble, the parameter diversity ensemble, and the metaensemble. To identify the parameters of the proposed ensembles, the least squares method (in the case of model linearity) or optimization methods (in the case of nonlinearity) can be used. As we need to take into account not only the functional space $\Phi$ and the space of parameters $\Xi$ for a single model but also the optimal coexistence of models in the system (i.e., $\Sigma$), evolutionary and coevolutionary approaches seem to be applicable techniques for this task. It is worth mentioning that the coevolutionary approach can be applied to independent model realizations through an ensemble as a connecting element. In this case, parameters (weights) in the ensemble can be estimated separately from the coevolution procedure, in a constant form or dynamically. As a case study of complex environmental modeling, we design an ensemble model that consists of the SWAN (http://swanmodel.sourceforge.net) model for ocean wave simulation driven by two different surface forcings, NCEP (https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html) and ERA Interim (https://www.ecmwf.int/en/forecasts/datasets/archive-datasets/reanalysis-datasets/era-interim). Thus, different implementations of the SWAN model were connected in the form of an alternative models ensemble with least-squares-calculated coefficients defining the structure of the complex model. Two parameters, wind drag and whitecapping rate (WCR), were calibrated using evolutionary and coevolutionary algorithms implementing $\Gamma^{\Xi \to \Phi}_{D \to M}$ in P5 (for a detailed sensitivity analysis of SWAN, see [19]). The coevolutionary approach can be represented in the form of a parameter diversity ensemble, where each population is constructed as an ensemble of alternative model results with different parameters. We can also add ensemble weights to the model parameter diversity and obtain a metaensemble that can be identified within the coevolutionary approach.
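The least-squares ensemble coefficients mentioned above can be sketched for the two-member case. This is an illustration under stated assumptions, not the authors' code: the function name `ensemble_weights` is hypothetical, the ensemble is assumed to be a weighted sum of two forecasts without an intercept, and the numbers are made up.

```python
def ensemble_weights(f_a, f_b, obs):
    """Least-squares weights (w_a, w_b) minimizing sum((w_a*f_a + w_b*f_b - obs)^2)."""
    saa = sum(a * a for a in f_a)
    sbb = sum(b * b for b in f_b)
    sab = sum(a * b for a, b in zip(f_a, f_b))
    sao = sum(a * o for a, o in zip(f_a, obs))
    sbo = sum(b * o for b, o in zip(f_b, obs))
    # Solve the 2x2 normal equations by Cramer's rule.
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("forecasts are collinear; weights are not unique")
    w_a = (sao * sbb - sbo * sab) / det
    w_b = (sbo * saa - sao * sab) / det
    return w_a, w_b

# Made-up wave heights from two model implementations and matching observations.
swan_era = [1.0, 2.0, 3.0, 4.0]
swan_ncep = [2.0, 1.0, 4.0, 3.0]
observed = [0.3 * a + 0.7 * b for a, b in zip(swan_era, swan_ncep)]
w_era, w_ncep = ensemble_weights(swan_era, swan_ncep, observed)
print(w_era, w_ncep)
```

Because the weights come from a closed-form solve, they can be recomputed cheaply for every candidate parameter set, which is what makes it possible to estimate them separately from the (co)evolution procedure.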
In the process of model identification and verification, measurements from several wave stations in the Kara Sea were used. The fitness function represents the mean error (RMSE) over all wave stations. For verification of results, the MAE (mean absolute error) and DTW (dynamic time warping) metrics were used.
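The three metrics named above can be written down in a few lines. The sketch below uses the textbook definitions; the station data layout (one list of values per station) and the function name `fitness` are assumptions made for illustration, and the DTW variant is the classic dynamic-programming form with absolute difference as the local cost.

```python
import math

def rmse(pred, obs):
    """Root mean square error between two equally long series."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def mae(pred, obs):
    """Mean absolute error between two equally long series."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)

def dtw(a, b):
    """Dynamic time warping distance with |x - y| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def fitness(simulated_by_station, observed_by_station):
    """Mean RMSE over all stations (lower is better)."""
    errors = [rmse(sim, obs) for sim, obs in zip(simulated_by_station, observed_by_station)]
    return sum(errors) / len(errors)
```

DTW complements the pointwise metrics because it tolerates small timing shifts between simulated and observed series: a series and a time-stretched copy of it have a DTW distance of zero.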
Figure 4(a) presents the surface (landscape) of RMSE in the space of the announced parameters (drag and WCR) for the implementations SWAN+ERA and SWAN+NCEP. It can be seen that the evolutionary-obtained results converge to the minimum of the error landscape. The landscape was obtained by running the model with all parameter variants from the full 30x30 grid (i.e., 900 runs), while the evolutionary algorithm converged in 5 generations with 10 individuals (parameter sets) in the population (50 runs), which allows performing identification two orders of magnitude faster. The convergence of coevolution for the SWAN+ERA+NCEP case is presented in Figure 5.
Figure 4: Metocean simulation: (a) error landscape (RMSE, m, over drag and log(WCR)) for wave height simulation results using ERA and NCEP reanalysis as input data and (b) Pareto frontier of coevolution results for all generations (log(RMSE) over drag and WCR for coNCEP, coERA, and coEnsemble).
Figure 5: Coevolution convergence of diversity parameters ensemble for metocean models (RMSE, m, by generation).
Although the error landscapes for the implementations SWAN+ERA and SWAN+NCEP are close to each other, separate evolution does not consider optimization of the ensemble result. For this purpose, we apply the coevolutionary approach, which produces a set of Pareto-optimal solutions for each generation. Figure 4(b) shows that the error of each model in the ensemble is significant (coNCEP and coERA for the models alone), but the error of the whole ensemble (coEnsemble) converges to the minimum very fast.
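The set of Pareto-optimal solutions mentioned above can be extracted with a straightforward dominance check. The sketch below is generic rather than taken from the study: it assumes every objective is minimized (for example, the errors of two ensemble members), and the sample points are made up.

```python
def pareto_front(points):
    """Return the non-dominated points when every objective is minimized."""
    def dominates(p, q):
        # p dominates q if it is no worse in every objective and better in one.
        no_worse = all(a <= b for a, b in zip(p, q))
        better = any(a < b for a, b in zip(p, q))
        return no_worse and better
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Made-up (error of member A, error of member B) pairs for one generation.
generation = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
front = pareto_front(generation)
print(front)
```

The quadratic scan is adequate for populations of tens of individuals, as in the case study; larger populations would call for a sorting-based method.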
The obtained result can be analyzed from the point of view of uncertainty reduction. Optimization of model parameters helps to reduce parameter uncertainty, which can be estimated through the error function. But when we apply an ensemble approach to evolutionary optimized results, it is appropriate to speak of reducing the uncertainty connected with the input data sources (NCEP and ERA) as well. Moreover, the metaensemble approach allowed reduction of the uncertainty connected with ensemble parameters.
Figure 6: Graph-based representation of processes space in healthcare (interactive view). (Demonstration available at https://www.youtube.com/watch?v=EH74f1w6EeY.)
Summarizing the results of the metocean case study, we can note that the EC approach shows a significant efficiency gain, up to 120 times compared with grid search, without accuracy losses. According to this experimental study, the quality of the ensemble with evolutionary optimized models is similar to the results of the grid search: the MAE metric is equal to 0.24 m and the DTW metric to 51. We can also mention that the coevolutionary approach provides a 10% accuracy gain compared with the results of single evolution of model implementations, but this is still similar to the ensemble result with evolutionary optimized models. Nevertheless, the coevolutionary approach achieved a 200-fold acceleration. Within the context of the proposed approach, space $\Phi$ was investigated using a defined structure of the model in space $\Sigma$ for the purpose of model calibration.
3.2. Problem 2: Modeling Health Care Process. Modeling healthcare processes usually involves enormous uncertainty and variability, even when modeling a single disease. One of the ways to identify a model of such a process is PM [20]. Still, direct implementation of PM methods does not remove a major part of the uncertainty. Within the current research, we applied the proposed approach for identification purposes, both in the analysis of historical cases and in the prediction of single process development. Here we consider processes of providing health care in acute coronary syndrome (ACS) cases, ACS being usually considered one of the major causes of death in the world. We used a set of 3434 ACS cases collected during 2010-2015 in Almazov National Medical Research Centre, one of the leading cardiological centers in Russia. The data set contains electronic health records of these patients with all registered events and characteristics of a patient.
To simplify consideration of the multidimensional space of possible processes ($\Gamma'^{\,\Xi \to \Sigma}_{D \to S}$, $\Gamma^{\Xi}_{S \to D}$ for analysis of $\Sigma$ on layer $S$), we introduced a graph-based representation of this space, with vertices representing cases and edges representing proximity of cases. Analysis of such a structure enables easy discovery of common cases (e.g., as communities in the graph). Such discovery enables explicit, interpretable structuring of the space and representation of the further landscape for EC in terms of the P6 pattern. Moreover, direct interactive investigation of a visual representation of such a structure (see Figure 6) provides significant insights for medical researchers.
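The graph-based representation described above can be sketched as follows. The section does not state which proximity measure or community detection method was used, so both choices here are assumptions: cases are strings of event labels, proximity is Levenshtein distance below a hypothetical threshold `max_dist`, and connected components stand in for communities. The sample cases are made up.

```python
def edit_distance(a, b):
    """Levenshtein distance between two event sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def proximity_graph(cases, max_dist):
    """Vertices are case indices; an edge joins cases within max_dist edits."""
    edges = {i: set() for i in range(len(cases))}
    for i in range(len(cases)):
        for j in range(i + 1, len(cases)):
            if edit_distance(cases[i], cases[j]) <= max_dist:
                edges[i].add(j)
                edges[j].add(i)
    return edges

def components(edges):
    """Connected components, used here as a crude stand-in for communities."""
    seen, groups = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            v = stack.pop()
            group.append(v)
            for u in edges[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        groups.append(sorted(group))
    return groups

cases = ["AFIE", "AFNE", "AFIFE", "DNDD", "DNDE"]  # made-up event sequences
groups = components(proximity_graph(cases, max_dist=1))
print(groups)
```

The all-pairs comparison is quadratic in the number of cases, which is manageable for a few thousand pathways but would need indexing or sampling for much larger collections.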
We have developed an evolutionary-based algorithm for pattern identification and clustering in such a representation, with two criteria to be optimized (see Figure 7). Here processes were represented by a sequence of labels (symbols) denoting key events in the PM model. Typical patterns were then selected from the Pareto frontier. The convergence process is demonstrated in Figure 8 (the 10 best individuals from the Pareto frontier according to the integral criterion were selected). As a result, this solution may refer to the P5 pattern and operator $\Gamma^{\Xi \to \Sigma}_{D \to M}$ while discovering model structure. Figure 9 shows an example of a typical process model (i.e., a structural characteristic of the model) for one of the identified clusters. A detailed description of the approach, algorithms, and results on CP discovery, clustering, and analysis, including a performance comparison of three versions of CP discovery algorithms, can be found in [10]. An important outcome of the approach in this application is the interpretability of the clusters and identified patterns. For example, 10 clusters and the corresponding CPs were interpreted by cardiologists from Almazov National Medical Research Centre. The obtained interpretation and further discovery and application of the CP structure are presented in [17]. Another important benefit of such space structure discovery is the lowering of uncertainty in a patient's treatment trajectory through hierarchical positioning of an evolving process (selection of a cluster and selection of a position within the cluster). For example, the discrete-event simulation model described in [17] provides a more appropriate length-of-stay distribution when the simulation uses the discovered classes of CPs (the Kolmogorov-Smirnov statistic decreased by 51%, from 0.255 to 0.124).
Figure 7: Pareto frontier for CP patterns discovery (axes: number of non-arranged sequences, normed; length of pattern, normed).
Figure 8: Evolutionary convergence during CP pattern discovery (normed sum error by generation).
Figure 9: Example of process model showing transfers between hospital's departments (cluster 7).
Figure 10: Evolution of synthetic CPs: (a) CP population convergence (distance in symbols by step, per cluster); (b) evolution of possible CP (historical CPs, synthetic CPs, and target CP; demonstration available at https://www.youtube.com/watch?v=twvfX9zKsY8); (c) number of synthetic CPs by step; (d) % of CPs in correct cluster by step.
Further, we propose an algorithm to dynamically generate possible developments of a process in healthcare using the identified graph-based space representation, with evolutionary strategies assimilating incoming data (events) within a case ($\Gamma'^{\,\Sigma}_{M \to D}$ in P5 and $\Gamma^{\Xi \to \Sigma}_{D}$ in P2). We consider convergence (Figure 10(a)) of the introduced synthetic continuations of the processes to the right class (identified clusters of typical cases were used), with mapping to the graph-based space representation with proximity measures (Figure 10(b)). As a result, the appearance of the CP's events decreases the number of synthetic CPs and increases the percentage of CPs positioned in the correct cluster (see an example in Figure 10(c) and Figure 10(d), correspondingly). This enables interpretable positioning and uncertainty lowering in predicting further CP development for a particular patient.
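The effect described above, where each observed event shrinks the population of synthetic CPs and raises the share positioned in the correct cluster, can be illustrated with a deliberately reduced sketch. It covers only the assimilation (filtering) step and omits the evolutionary generation of new continuations; the data layout, the function names, and the sample population are all hypothetical.

```python
def assimilate(population, observed):
    """Keep synthetic CPs whose first events equal the events observed so far."""
    return [cp for cp in population if cp["events"][:len(observed)] == observed]

def share_in_cluster(population, cluster):
    """Fraction of the surviving synthetic CPs that belong to the given cluster."""
    if not population:
        return 0.0
    return sum(cp["cluster"] == cluster for cp in population) / len(population)

# Made-up synthetic continuations with the cluster each one is positioned in.
population = [
    {"events": "AFIE", "cluster": 0},
    {"events": "AFNE", "cluster": 0},
    {"events": "AFDE", "cluster": 1},
    {"events": "ANDE", "cluster": 1},
    {"events": "DNFE", "cluster": 2},
]
target_events, target_cluster = "AFI", 0
history = []
for step in range(1, len(target_events) + 1):
    survivors = assimilate(population, target_events[:step])
    history.append((step, len(survivors), share_in_cluster(survivors, target_cluster)))
print(history)
```

In this toy run the number of survivors falls while the share in the target cluster rises with every event, which is the qualitative behavior reported for Figure 10(c) and Figure 10(d).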
Here a combination of patterns P2, P5, and P6 in the implementation of the proposed algorithm (see Section 2.5) enables interactive investigation of the process space and data assimilation into a population of possible continuations of a single process as it evolves. This solution can be applied in exploratory modeling and simulation of patient flow processing, as well as in decision support in specialized medical centers.
3.3. Problem 3: Mining Social Media. Nowadays, social media analysis (which began with static network models emphasizing the topology of connections between users) strives to explore dynamic behavioral patterns of individuals, which can be recovered from their digital traces on the web. The prediction of social media activities requires combining analytical and data-driven models, as well as identifying the optimal structure and parameters of these models according to the available data. Here we show an example of a problem in this field involving evolutionary identification of a model.
Figure 11: Distribution of posts, reposts, and comments on personal walls of subscribers of bank community (fraction of users versus activity count, both on a log scale).
Figure 12: Example of process model (a) with expanded cycles (cluster 2) and (b) with collapsed cycles (cluster 1).
A digital trace of a user in an online social network (OSN) is a sequence (chain) of observed activities separated by time gaps. Each OSN supports different types of "hidden" and observable activities. For example, in the largest Russian social network, vk.com (further denoted as VK), a user has a personal page (wall) with three types of activities: post (P), when the user creates a record; repost (R), when the user copies a record of another user or community to his or her wall; and comment (C), when the user comments on a record on his or her wall. Figure 11 illustrates the distribution of these activities for subscribers of a large Russian bank community in VK. The collected dataset consists of the last 100 (or fewer if unavailable) entries (posts or reposts) and the comments on these entries for 8K user walls in the period January 2017–December 2017. Comments are much less common than posts and reposts. The distributions of posts and reposts are similar, but there is a group of "spreaders" with a significant number of reposts.
Figure 13: Example of process model for cluster 3 (1120 users) using n-gram analysis.
Table 1: Mean of activities' combinations for users' clusters.
Cluster  Size  CCC  CCP  CCR  CPP  CPR  CRR  PPP   PPR   PRR   RRR
1        5238  052  074  024  1    072  042  668   64    756   853
2        2110  011  013  016  016  036  059  209   395   1232  603
3        1120  091  138  014  362  075  018  5002  1178  5     277
We applied the technique described in Section 3.2 to analyze the processes. Still, the considered process has a significantly different structure. By default, it is continuous with random repetition of events, while the healthcare process in ACS cases is finite and has a "stronger" structure. Figure 12 shows a typical process structure identified with the EC-based approach and visualized with expanded cycles (a) and with collapsed cycles (b). The second one could be considered more relevant than the first one, which is significantly affected by the length of the selected history. It is natural to consider it as a random process or state-transition model. In that case, the three identified clusters (characterized by various frequencies of transitions) could be interpreted as typical behavior models.
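The state-transition reading of an activity chain mentioned above can be made concrete by estimating first-order transition frequencies. This is a generic sketch, not the EC-based identification used in the study; the function name and the sample chains of P, R, and C labels are made up.

```python
from collections import Counter, defaultdict

def transition_probabilities(chains):
    """Estimate P(next activity | current activity) from activity chains."""
    counts = defaultdict(Counter)
    for chain in chains:
        for cur, nxt in zip(chain, chain[1:]):
            counts[cur][nxt] += 1
    probabilities = {}
    for state, followers in counts.items():
        total = sum(followers.values())
        probabilities[state] = {nxt: c / total for nxt, c in followers.items()}
    return probabilities

# Made-up walls: P = post, R = repost, C = comment.
probs = transition_probabilities(["PPRP", "RRC"])
print(probs)
```

Clusters of users can then be compared by these frequencies, and the estimate does not depend on how many times a cycle was unrolled in the observed history, which is the concern raised about the expanded view.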
N-gram analysis is often used to detect patterns in people's behavior [21, 22]. It is based on counting the frequencies of combinations or sequences of activities. We collected all sorted 3-grams (so-called 3-sets) for each user's sequence to analyze the frequency of event combinations. As a result, three clusters of vectors with 3-set chains were identified with the k-means clustering method. Figure 13 shows all combinations and transitions between them for cluster 3 as an example. Using Figure 12 and Table 1, it is possible to see that cluster 3 includes users who often make new records (P) and sometimes comment on records (C). So cluster 3 mostly consists of "bloggers." Cluster 2 includes "spreaders," who copy other records (R) frequently. And the biggest cluster, cluster 1, consists of people who make new records and copy other ones equally but less intensively compared to the other clusters. That may be considered typical behavior for an OSN user. N-gram analysis allows detecting typical behavioral patterns and obtaining process models for social media activities using chains of different lengths as input data. Thus, this type of data-driven modeling is more appropriate for studying continuous processes. Figure 14 shows a graph-based representation of the process space with all users' patterns.
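The sorted 3-grams ("3-sets") described above can be counted as follows. The sketch covers only feature extraction; the resulting vectors would then be passed to a k-means implementation, which is not shown. The column order follows Table 1, while the function names and the sample chain are made up.

```python
from collections import Counter

# Column order of Table 1: all sorted combinations of C, P, and R of length 3.
COLUMNS = ["CCC", "CCP", "CCR", "CPP", "CPR", "CRR", "PPP", "PPR", "PRR", "RRR"]

def three_sets(chain):
    """Count sorted 3-grams over a sliding window of three activities."""
    return Counter("".join(sorted(chain[i:i + 3])) for i in range(len(chain) - 2))

def feature_vector(chain):
    """Per-user vector of 3-set counts in the column order of Table 1."""
    counts = three_sets(chain)
    return [counts[c] for c in COLUMNS]

vector = feature_vector("PRPCPP")  # windows: PRP, RPC, PCP, CPP
print(vector)
```

Sorting each window discards the order of activities inside it, so PRP and PPR fall into the same column; this keeps the vector at 10 dimensions instead of 27.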
This subsection provides very early results. The next step in applying the proposed approach here includes an extension of the process model structure (a) with temporal labeling (gaps between events), (b) by considering the process within a sliding time window to get more structured processes, (c) by linking the model with causal inference, and (d) by introducing DM techniques for EC positioning of ongoing processes in model space. We believe that these extensions could enhance the discovery of model structure (P4) and provide deeper insight into social media activity.
Figure 14: Graph-based representation of processes' space for three clusters in social media activity.
4. Conclusion and Future Work
The development of the proposed approach is still an ongoing project. We aim for further systematization and detailing of the proposed concepts, methods, and algorithms, as well as more comprehensive and deeper implementation of EC-based applications. Further work includes the following directions:
(i) dualization on the role of data-driven and intelligent operations in the proposed approach and described patterns
(ii) extended analysis of various EC techniques applicable within the approach
(iii) investigation of EC-based discovery for models of complex systems with lacking or inconsistent observations
(iv) detailed formalization of expertise and knowledge-based methods within the approach
(v) extending the approach with interactive user-centered modelling and phase space analysis
(vi) development of a multilayered approach for decision support and control of the system and process $S$, available data $D$, and complex model $M$
Data Availability
The data used in the examples presented in Section 3 were initially obtained within complementary projects performed by the authors of the paper. The data are available from the corresponding author upon request, after an explicit statement of the purpose and plan of the requested data usage, to check for possible violation of the corresponding projects' rules.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This paper presents an extension and further development of the work [23]. This research is financially supported by the Russian Scientific Foundation, Agreement 14-11-00823 (15.07.2014).
References
[1] N. Boccara, Modeling Complex Systems, Graduate Texts in Physics, Springer, New York, 2nd edition, 2010.
[2] H. McManus and D. Hastings, "A framework for understanding uncertainty and its mitigation and exploitation in complex systems," IEEE Engineering Management Review, vol. 34, no. 3, pp. 81–94, 2006.
[3] W. Walker, P. Harremoes, J. Rotmans et al., "Defining uncertainty: a conceptual basis for uncertainty management in model-based decision support," Integrated Assessment, vol. 4, no. 1, pp. 5–17, 2003.
[4] J. Yan and J. R. Deller, "NARMAX model identification using a set-theoretic evolutionary approach," Signal Processing, vol. 123, pp. 30–41, 2016.
[5] I. G. Kevrekidis, C. W. Gear, and G. Hummer, "Equation-free: The computer-aided analysis of complex multiscale systems," AIChE Journal, vol. 50, no. 7, pp. 1346–1355, 2004.
[6] H. Ihshaish, A. Cortes, and M. A. Senar, "Parallel Multi-level Genetic Ensemble for Numerical Weather Prediction Enhancement," Procedia Computer Science, vol. 9, pp. 276–285, 2012.
[7] G. Dumedah, "Formulation of the Evolutionary-Based Data Assimilation and its Implementation in Hydrological Forecasting," Water Resources Management, vol. 26, no. 13, pp. 3853–3870, 2012.
[8] V. V. Kashirin, A. A. Lantseva, S. V. Ivanov, S. V. Kovalchuk, and A. Boukhanovsky, "Evolutionary simulation of complex networks' structures with specific functional properties," Journal of Applied Logic, vol. 24, part A, pp. 39–49, 2017.
[9] S. V. Kovalchuk, P. A. Smirnov, K. V. Knyazkov, A. S. Zagarskikh, and A. V. Boukhanovsky, "Knowledge-Based Expressive Technologies Within Cloud Computing Environments," in Practical Applications of Intelligent Systems, vol. 279 of Advances in Intelligent Systems and Computing, pp. 1–11, Springer, Berlin, Germany, 2014.
[10] A. A. Funkner, A. N. Yakovlev, and S. V. Kovalchuk, "Towards evolutionary discovery of typical clinical pathways in electronic health records," Procedia Computer Science, vol. 119, pp. 234–244, 2017.
[11] S. V. Kovalchuk and A. V. Boukhanovsky, "Towards Ensemble Simulation of Complex Systems," Procedia Computer Science, vol. 51, pp. 532–541, 2015.
[12] S. V. Kovalchuk, A. V. Krikunov, K. V. Knyazkov, and A. V. Boukhanovsky, "Classification issues within ensemble-based simulation: application to surge floods forecasting," Stochastic Environmental Research and Risk Assessment, vol. 31, no. 5, pp. 1183–1197, 2017.
[13] D. Fogel, "Phenotypes, genotypes, and operators in evolutionary computation," in Proceedings of the 1995 IEEE International Conference on Evolutionary Computation, p. 193, Perth, WA, Australia.
[14] S. V. Ivanov, S. V. Kovalchuk, and A. V. Boukhanovsky, "Workflow-based Collaborative Decision Support for Flood Management Systems," Procedia Computer Science, vol. 18, pp. 2213–2222, 2013.
[15] A. Gusarov, A. Kalyuzhnaya, and A. Boukhanovsky, "Spatially adaptive ensemble optimal interpolation of in-situ observations into numerical vector field models," in Proceedings of the 6th International Young Scientist Conference on Computational Science, YSC 2017, pp. 325–333, Finland, November 2017.
[16] A. V. Krikunov, E. V. Bolgova, E. Krotov, T. M. Abuhay, A. N. Yakovlev, and S. V. Kovalchuk, "Complex data-driven predictive modeling in personalized clinical decision support for Acute Coronary Syndrome episodes," in Proceedings of the International Conference on Computational Science, ICCS 2016, pp. 518–529, USA, June 2016.
[17] S. V. Kovalchuk, A. A. Funkner, O. G. Metsker, and A. N. Yakovlev, "Simulation of patient flow in multiple healthcare units using process and data mining techniques for model identification," Journal of Biomedical Informatics, vol. 82, pp. 128–142, 2018.
[18] A. Yakovlev, O. Metsker, S. Kovalchuk, and E. Bologova, "Prediction of in-hospital mortality and length of stay in acute coronary syndrome patients using machine-learning methods," Journal of the American College of Cardiology, vol. 71, no. 11, p. A242, 2018.
[19] A. Nikishova, A. Kalyuzhnaya, A. Boukhanovsky, and A. Hoekstra, "Uncertainty quantification and sensitivity analysis applied to the wind wave model SWAN," Environmental Modelling & Software, vol. 95, pp. 344–357, 2017.
[20] E. Rojas, J. Munoz-Gama, M. Sepulveda, and D. Capurro, "Process mining in healthcare: A literature review," Journal of Biomedical Informatics, vol. 61, pp. 224–236, 2016.
[21] T. Sinha, P. Jermann, N. Li, and P. Dillenbourg, "Your click decides your fate: Inferring Information Processing and Attrition Behavior from MOOC Video Clickstream Interactions," in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 3–14, Doha, Qatar, October 2014.
[22] C. Marceau, "Characterizing the Behavior of a Program Using Multiple-Length N-Grams," Defense Technical Information Center, 2005.
[23] S. V. Kovalchuk, O. G. Metsker, A. A. Funkner et al., "Towards management of complex modeling through a hybrid evolutionary identification," in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 255-256, Kyoto, Japan, July 2018.
considered as phenotype. Within the proposed approach, we can adapt the definitions of basic EC operations within genotype-phenotype mapping [13]:

(i) epigenesis as model application $f_1: S \times M \to D$;
(ii) selection $f_2: S \times D \to D$;
(iii) genotype survival $f_3: S \times D \to M$;
(iv) mutation $f_4: M \to M$.
In addition, we consider quality assessment, usually treated as fitness, for selection and survival (or, in more complex algorithms, for controlling other operations such as mutation):

(i) data quality $q_d: S \times D \to Q_d$;
(ii) model quality $q_m: S \times M \to Q_m$.
Here $Q_d$ and $Q_m$ are often considered as $\mathbb{R}^N$ with some quantitative quality metrics. Model quality is usually considered through data quality, i.e., $q_m \sim q_d(s, f_1(s, m))$, but within our approach this separation is important because we additionally introduce supporting operations with data-driven procedures: in complex modeling, many of these functions (first of all $f_1$, $q_m$, and $q_d$) are difficult to apply directly (some of these issues are considered in relation to the patterns). Data-driven operations (first of all $f_1$ and $q_m$) can be introduced as substitutes for the previously introduced basic operations (see also patterns P2, P4, and P6):

(i) epigenesis as DD-model application $f_1^d: S \times M \to D$;
(ii) model generation $g^d: S \times D \to M$;
(iii) model quality prediction $q_m^d: S \times M \to Q_m$;
(iv) space discovery $w^d: S \times D \to S$.
Operation $w^d$ could be used as an intelligent extension within the selection or survival operations ($f_2$ and $f_3$). It becomes especially important when knowledge of the system's structure or functional characteristics is lacking. Operation $g^d$, in turn, could be used as a part of the mutation operation $f_4$ (or of initial population generation). With this extension, we can implement enhanced versions of EC algorithms (e.g., genetic algorithms, evolution strategies, and evolutionary programming) with data-driven operations to overcome, or at least reduce, complex modeling issues.
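As an illustration only, the operations above can be written as plain functions on a toy problem. The sketch assumes a one-parameter linear model; the names `observe`, `XS`, and `make_q_m_d`, and the nearest-neighbour form of the data-driven quality predictor, are hypothetical choices made for this example and are not taken from the approach itself.

```python
from typing import Callable, List, Tuple

State = float        # s in S: here the "true" slope of a toy system (assumption)
Model = float        # m in M: a candidate slope (genotype)
Data = List[float]   # d in D: a simulated or observed series (phenotype)

XS = [0.0, 1.0, 2.0, 3.0]  # fixed inputs of the toy model (assumption)


def f1(s: State, m: Model) -> Data:
    """Epigenesis: apply model m to obtain data (this toy model ignores s)."""
    return [m * x for x in XS]


def observe(s: State) -> Data:
    """Hypothetical observation of the system in state s."""
    return [s * x for x in XS]


def q_d(s: State, d: Data) -> float:
    """Data quality: negative mean squared error against the observations."""
    obs = observe(s)
    return -sum((a - b) ** 2 for a, b in zip(d, obs)) / len(obs)


def q_m(s: State, m: Model) -> float:
    """Model quality expressed through data quality: q_m ~ q_d(s, f1(s, m))."""
    return q_d(s, f1(s, m))


def make_q_m_d(samples: List[Tuple[Model, float]]) -> Callable[[State, Model], float]:
    """Data-driven quality prediction q_m^d: reuse the quality of the nearest
    previously evaluated model instead of running the model again."""
    def q_m_d(s: State, m: Model) -> float:
        nearest = min(samples, key=lambda pair: abs(pair[0] - m))
        return nearest[1]
    return q_m_d
```

A predictor built from three evaluated models, for example `make_q_m_d([(m, q_m(2.0, m)) for m in (1.0, 2.0, 3.0)])`, then answers quality queries for nearby models without calling `f1`.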
2.5. Model Management Approach and Algorithm. By model management we mean operations with models within the development and application of a problem-domain solution. This includes identification, calibration, DA, optimization, prediction, and forecasting. To systematize model management in the presented patterns, we propose an approach for explicit consideration of the spaces $S$, $D$, and $M$ within hybrid modeling with EC and DD-modeling. To summarize complex modeling procedures within the approach, we developed a high-level algorithm that includes a series of steps to be implemented within the context of complex model management.
Step 1 (space discovery). This step identifies the description of the phase space (in most cases $S$) when knowledge is lacking or for automation purposes. For example, the step could be applied in the discovery of the system state space or the model structure. The space description may include (a) distance metrics, (b) proximity structure (e.g., graph, clustering, hierarchy, and density), and (c) a positioning function. One possible way to perform this step is to apply DM and EC algorithms to the available data (see pattern P6).

Step 2 (identification of supplementary functions). Data-driven functions (Φ) are applied to work in model evolution, with the space (landscape) representation taken as available information.

Step 3 (evolutionary processing of a set of models). This step is described by a combination of basic EC operations (population initialization, epigenesis, selection, mutation, and survival) with supplementary functions. The form of the combination depends on (a) the selected EC algorithm, (b) application requirements and restrictions, and (c) model-based issues (e.g., performance, quality of surrogate models, etc.).

Step 4 (assimilation of updated data and knowledge). This step is applied for automatic adaptation purposes and implements a DA algorithm. DA can be applied to (a) the set of models, (b) EC operations (e.g., affecting the selection function), (c) supplementary functions (as they are mainly data-driven), and (d) the phase space description (if the descriptive structure is identified from changed data and/or knowledge).

The steps can be repeated in various combinations depending on the application and the implemented pattern. Also, the steps are general and could be implemented in various ways. Several examples are provided in Section 3.
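The four steps can be sketched as a loop over incoming data batches. The following is a minimal sketch under strong assumptions: a one-parameter model y = m * x, parameter bounds as the only "space description", and truncation selection with Gaussian mutation as the EC algorithm. All function names are hypothetical, and the sketch is one possible combination of the steps rather than the algorithm used in the examples of Section 3.

```python
import random
from statistics import mean


def discover_space(data):
    """Step 1: describe the search space; here only parameter bounds (assumes x != 0)."""
    ratios = [y / x for x, y in data if x != 0]
    return min(ratios), max(ratios)


def make_generator(data):
    """Step 2: a data-driven supplementary function that proposes a model from data."""
    ratios = [y / x for x, y in data if x != 0]
    return lambda rng: rng.choice(ratios)


def fitness(model, data):
    """Mean squared error of the model y = model * x (lower is better)."""
    return mean((y - model * x) ** 2 for x, y in data)


def evolve(population, data, bounds, generate, rng, generations=5):
    """Step 3: selection, mutation, and survival over a population of models."""
    low, high = bounds
    for _ in range(generations):
        ranked = sorted(population, key=lambda m: fitness(m, data))
        parents = ranked[: len(population) // 2]
        children = [min(high, max(low, p + rng.gauss(0.0, 0.1))) for p in parents]
        children[0] = generate(rng)  # inject one data-driven candidate
        population = parents + children
    return sorted(population, key=lambda m: fitness(m, data))


def manage(batches, seed=0, size=10):
    """Repeat Steps 1-4 for every incoming batch of (x, y) observations."""
    rng = random.Random(seed)
    data, population = [], None
    for batch in batches:
        data = data + batch                  # Step 4: assimilate new data
        bounds = discover_space(data)        # Step 1
        generate = make_generator(data)      # Step 2
        if population is None:
            population = [rng.uniform(*bounds) for _ in range(size)]
        population = evolve(population, data, bounds, generate, rng)  # Step 3
    return population[0]
```

Calling `manage([[(1.0, 2.1), (2.0, 3.9)], [(3.0, 6.3), (4.0, 7.8)]])` returns the best slope found after both batches have been assimilated.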
2.6. Available Building Blocks of a Composite Solution. EC offers a flexible and robust way to identify complex model structures within a complex landscape, with possible adaptation to changing conditions and system states (including new states without prior observation). A significant additional benefit is the ability to manage alternative solutions simultaneously, with possible switching between them and various combinations of them depending on current needs. Still, within the task of model identification and management, EC (and also many metaheuristics) has certain drawbacks that require additional steps to implement the approach under particular conditions:

(i) high computational cost due to the multiple runs of a model;
(ii) low reproducibility and interpretability of the obtained results due to the randomized nature of the search procedure;
(iii) complicated tuning of hyperparameters for better EC convergence;
(iv) indistinct definition of genotype boundaries;
(v) complicated mapping of genotype to phenotype space.
To overcome these issues, the proposed approach involves two options. First, intelligent procedures may be used to tune EC hyperparameters (P5), predict features of genotype-phenotype mapping, boundaries, etc. (P4), and discover interpretable states and filters (for the system, data, and model) to control the convergence and adaptation of the population (P2, P4, and P5) in an interpretable and reproducible (through the defined control procedure) manner. Second, the composite model may use various approaches, methods, and elements to obtain better quality and performance of the solution:

(i) surrogate models (P2, P4, P5), which may increase performance (for example, within preliminary and intermediate optimization steps);
(ii) ensemble models (P3), which may be considered as an interpretable and controllable population;
(iii) interpretation and formal inference using explicit domain-specific knowledge and results of data mining to feed EC procedures and infer parameters in both models and EC;
(iv) controllable space decomposition (P6) with predictive models for possible areas and directions of population migration in EC to explicitly lower uncertainty and obtain additional interpretability.

Finally, an essential feature of the proposed approach is a holistic analysis of a composite solution with possible coevolution of models (submodels within a composite model) and data processing procedures.
3. Application Examples
This section presents several practical examples where the proposed approach, its patterns, or some of their elements were applied. The examples were intentionally selected from diverse problem domains to demonstrate the generality of the approach. The considered problems are being developed in separate projects that are at various stages. Problem 1 (ensemble metocean simulation) was investigated in a series of projects (see, e.g., [11, 14, 15]). Within this research, we are trying to extend model calibration and DA with EC techniques to develop more flexible and accurate multimodel ensembles. Problem 2 (clinical pathways (CPs) modelling) is important in several ongoing projects aimed at model-based decision support in healthcare (see, e.g., [16–18]). The proposed approach plays an important role by enabling deeper analysis of clinical pathways in various scenarios (interactive analysis of available CPs with identification of clusters of similar patients, DA in predictive modelling of ongoing cases, etc.). Finally, Problem 3 shows very early results of a recently started project in online social network analysis.
3.1. Problem 1: Evolution in Models for Metocean Simulation. Environmental simulation systems usually contain several numerical models serving different purposes (complementary simulation processes, improving the reliability of a system by producing alternative results, etc.). Each model can typically be described by a large number of quantitative parameters and functional characteristics that should be adjusted by an expert or by intelligent automated methods (e.g., EC). Alternative models inside the environmental simulation system can be joined in an ensemble according to a complex modeling pattern based on evolutionary computing (a combination of patterns P3 and P5). In the current case study, we introduce an example illustrating the ensemble concept in the forms of an alternative-models ensemble, a parameter-diversity ensemble, and a metaensemble. To identify the parameters of the proposed ensembles, the least-squares method (in the case of model linearity) or optimization methods (in the case of nonlinearity) can be used. As we need to take into account not only the functional space Φ and the space of parameters Ξ for a single model but also the optimal coexistence of models in the system (i.e., Σ), evolutionary and coevolutionary approaches seem to be applicable techniques for this task. It is worth mentioning that the coevolutionary approach can be applied to independent model realizations, with the ensemble as a connecting element. In this case, the parameters (weights) of the ensemble can be estimated separately from the coevolution procedure, in a constant form or dynamically. As a case study of complex environmental modeling, we designed an ensemble model that consists of the SWAN (http://swanmodel.sourceforge.net) model for ocean wave simulation driven by two different surface forcings, NCEP (https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html) and ERA Interim (https://www.ecmwf.int/en/forecasts/datasets/archive-datasets/reanalysis-datasets/era-interim). Thus, different implementations of the SWAN model were connected in the form of an alternative-models ensemble, with least-squares-calculated coefficients defining the structure of the complex model.

Two parameters, wind drag and whitecapping rate (WCR), were calibrated using evolutionary and coevolutionary algorithms implementing $\Gamma^{\Xi \to \Phi}_{D \to M}$ in P5 (for a detailed sensitivity analysis of SWAN, see [19]). The coevolutionary case can be represented as a parameter-diversity ensemble, where each population is constructed as an ensemble of alternative model results with different parameters. We can also add ensemble weights to the model parameter diversity and obtain a metaensemble that can be identified within the coevolutionary approach.
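To make the least-squares step concrete, the sketch below computes the two coefficients of an alternative-models ensemble by solving the 2x2 normal equations directly. It assumes a linear combination without an intercept; the function names and the numbers in the usage note are hypothetical and are not SWAN output.

```python
def lsq_weights(pred_a, pred_b, observed):
    """Weights (w_a, w_b) minimizing sum((w_a*a + w_b*b - y)**2) via normal equations."""
    aa = sum(a * a for a in pred_a)
    bb = sum(b * b for b in pred_b)
    ab = sum(a * b for a, b in zip(pred_a, pred_b))
    ay = sum(a * y for a, y in zip(pred_a, observed))
    by = sum(b * y for b, y in zip(pred_b, observed))
    det = aa * bb - ab * ab
    if abs(det) < 1e-12:
        raise ValueError("model outputs are collinear; weights are not unique")
    # Cramer's rule for [[aa, ab], [ab, bb]] @ [w_a, w_b] = [ay, by]
    w_a = (ay * bb - by * ab) / det
    w_b = (by * aa - ay * ab) / det
    return w_a, w_b


def ensemble(pred_a, pred_b, weights):
    """Weighted sum of the two model outputs."""
    w_a, w_b = weights
    return [w_a * a + w_b * b for a, b in zip(pred_a, pred_b)]
```

For example, with `pred_a = [1, 2, 3, 4]`, `pred_b = [1, 0, 1, 0]`, and observations generated as `0.7 * a + 0.3 * b`, `lsq_weights` recovers weights close to 0.7 and 0.3.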
For model identification and verification, measurements from several wave stations in the Kara Sea were used. The fitness function represents the mean error (root-mean-square error, RMSE) for all wave stations. For verification of the results, the MAE (mean absolute error) and DTW (dynamic time warping) metrics were used.
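The three metrics named above can be sketched in a few lines. The DTW variant shown is the classic dynamic-programming formulation with an absolute-difference cost and no window constraint, which is an assumption, since the text does not specify the variant; averaging RMSE over stations in `fitness` is likewise one reading of "mean error for all wave stations".

```python
from math import inf, sqrt


def rmse(sim, obs):
    """Root-mean-square error between simulated and observed series."""
    return sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def mae(sim, obs):
    """Mean absolute error between simulated and observed series."""
    return sum(abs(s - o) for s, o in zip(sim, obs)) / len(obs)


def dtw(sim, obs):
    """Dynamic time warping distance, O(n*m), absolute-difference local cost."""
    n, m = len(sim), len(obs)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(sim[i - 1] - obs[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def fitness(sim_by_station, obs_by_station):
    """Mean RMSE over all stations (dictionary keys are hypothetical station names)."""
    errors = [rmse(sim_by_station[k], obs_by_station[k]) for k in obs_by_station]
    return sum(errors) / len(errors)
```

Unlike RMSE and MAE, DTW tolerates small time shifts: `dtw([1, 2, 3], [1, 2, 2, 3])` is zero although the series have different lengths.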
Figure 4(a) shows the RMSE surface (landscape) in the space of the two parameters (drag and WCR) for the SWAN+ERA and SWAN+NCEP implementations. It can be seen that the evolutionary-obtained results converge to the minimum of the error landscape. The landscape was obtained by running the model with all parameter variants from a full 30x30 grid (i.e., 900 runs), while the evolutionary algorithm converged in 5 generations with 10 individuals (parameter sets) in the population (50 runs), which allows performing the identification two orders faster. The convergence of coevolution for the SWAN+ERA+NCEP case is presented in Figure 5.

Figure 4: Metocean simulation: (a) error landscape for wave height simulation results using ERA and NCEP reanalysis as input data and (b) Pareto frontier of coevolution results for all generations. [Axes: (a) RMSE (m) over drag and log(WCR) for NCEP and ERA; (b) log(RMSE) over drag and WCR for coNCEP, coERA, and coEnsemble.]

Figure 5: Coevolution convergence of the parameter-diversity ensemble for metocean models. [Axes: generation versus RMSE (m).]
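The difference in the number of model runs between the exhaustive grid and the evolutionary search can be reproduced on a toy problem. In the sketch below, `toy_error` is a hypothetical smooth surface standing in for one model run, and the evolutionary scheme (truncation selection with Gaussian mutation, survivors re-evaluated each generation) is an assumption chosen for brevity, not the algorithm used in the case study.

```python
import random


def toy_error(drag, log_wcr):
    """Hypothetical smooth error surface standing in for one model run."""
    return (drag - 1.5) ** 2 + 0.01 * (log_wcr + 10.0) ** 2


def clip(value, low, high):
    return max(low, min(high, value))


def grid_search(n=30):
    """Exhaustive baseline: n * n model runs."""
    runs, best = 0, None
    for i in range(n):
        for j in range(n):
            drag = 0.5 + 2.5 * i / (n - 1)
            log_wcr = -20.0 + 20.0 * j / (n - 1)
            candidate = (toy_error(drag, log_wcr), drag, log_wcr)
            runs += 1
            if best is None or candidate < best:
                best = candidate
    return best, runs


def evolve(generations=5, size=10, seed=1):
    """Truncation selection plus Gaussian mutation: generations * size model runs."""
    rng = random.Random(seed)
    population = [(rng.uniform(0.5, 3.0), rng.uniform(-20.0, 0.0)) for _ in range(size)]
    runs, best = 0, None
    for _ in range(generations):
        scored = sorted((toy_error(d, w), d, w) for d, w in population)
        runs += len(population)
        if best is None or scored[0] < best:
            best = scored[0]
        parents = [(d, w) for _, d, w in scored[: size // 2]]
        children = [(clip(d + rng.gauss(0.0, 0.2), 0.5, 3.0),
                     clip(w + rng.gauss(0.0, 2.0), -20.0, 0.0)) for d, w in parents]
        population = parents + children
    return best, runs
```

With the default arguments, `grid_search()` performs 900 evaluations and `evolve()` performs 50, matching the run counts quoted above; the quality reached by the evolutionary search on a real model depends on the landscape and on the mutation settings.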
Although the error landscapes for the pair of implementations SWAN+ERA and SWAN+NCEP are close to each other, separate evolution does not consider optimization of the ensemble result. For this purpose, we apply the coevolutionary approach, which produces a set of Pareto-optimal solutions for each generation. Figure 4(b) shows that the error of each model in the ensemble is significant (coNCEP and coERA for the models alone), but the error of the whole ensemble (coEnsemble) converges to the minimum very fast.
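Selecting the Pareto-optimal solutions of a generation amounts to discarding every candidate that is dominated by another one. A minimal sketch for minimized objectives follows; the error tuples in the usage note are made-up values, not results of the study.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]
```

For the hypothetical error pairs `[(0.9, 1.2), (1.0, 1.0), (1.3, 0.8), (1.4, 1.1), (1.0, 1.3)]`, the front consists of the first three points; the last two are dominated by `(1.0, 1.0)` and `(0.9, 1.2)`, respectively.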
The obtained result can be analyzed from the uncertainty reduction point of view. Optimization of model parameters helps to reduce parameter uncertainty, which can be estimated through the error function. But when we apply an ensemble approach to evolutionary optimized results, it is appropriate to also talk about reducing the uncertainty connected with the input data sources (NCEP and ERA). Moreover, the metaensemble approach allowed reducing the uncertainty connected with ensemble parameters.
Figure 6: Graph-based representation of processes space in healthcare (interactive view). (Demonstration available at https://www.youtube.com/watch?v=EH74f1w6EeY.)
Summarizing the results of the metocean case study, we can note that the EC approach is up to 120 times more efficient than grid search without accuracy losses. According to this experimental study, the quality of the ensemble with evolutionary optimized models is similar to the results of the grid search: the MAE metric is equal to 0.24 m and the DTW metric to 51. We can also mention that the coevolutionary approach provides a 10% accuracy gain compared with the results of single evolution of the model implementations, but this is still similar to the ensemble result with evolutionary optimized models. Nevertheless, the coevolutionary approach achieved a 200-fold acceleration. Within the context of the proposed approach, the space Φ was investigated using the defined structure of the model in the space Σ for the purpose of model calibration.
3.2. Problem 2: Modeling Health Care Process. Modeling healthcare processes usually involves enormous uncertainty and variability, even when modeling a single disease. One way to identify a model of such a process is PM [20]. Still, direct implementation of PM methods does not remove a major part of the uncertainty. Within the current research, we applied the proposed approach for identification purposes, both in the analysis of historical cases and in the prediction of the development of a single process. Here we consider processes of providing health care in acute coronary syndrome (ACS) cases, which is usually considered one of the major causes of death in the world. We used a set of 3434 ACS cases collected during 2010-2015 at the Almazov National Medical Research Centre, one of the leading cardiological centers in Russia. The data set contains the electronic health records of these patients with all registered events and characteristics of a patient.
To simplify consideration of the multidimensional space of possible processes ($\Gamma'^{\,\Xi \to \Sigma}_{D \to S}\,\Gamma^{\Xi}_{S \to D}$ for analysis of Σ on layer $S$), we introduced a graph-based representation of this space, with vertices representing cases and edges representing the proximity of cases. Analysis of such a structure makes it easy to discover common cases (e.g., as communities in the graph). Such discovery enables explicit, interpretable structuring of the space and representation of the further landscape for EC in terms of pattern P6. Moreover, direct interactive investigation of a visual representation of such a structure (see Figure 6) provides significant insights for medical researchers.
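A graph of this kind can be sketched with cases as label sequences and an edge between any two cases whose distance does not exceed a threshold. The choice of Levenshtein distance, the threshold, and the use of connected components as a crude stand-in for community detection are all assumptions of this sketch; the label strings in the usage note are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two event sequences given as strings of labels."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def proximity_graph(cases, threshold):
    """Vertices are case indices; an edge links cases with distance <= threshold."""
    edges = {i: set() for i in range(len(cases))}
    for i in range(len(cases)):
        for j in range(i + 1, len(cases)):
            if edit_distance(cases[i], cases[j]) <= threshold:
                edges[i].add(j)
                edges[j].add(i)
    return edges


def components(edges):
    """Connected components of the graph, each returned as a sorted list of vertices."""
    seen, groups = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, group = [start], []
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            group.append(v)
            stack.extend(edges[v] - seen)
        groups.append(sorted(group))
    return groups
```

For the hypothetical cases `["AFIE", "AFNE", "AFIED", "DDNN", "DDNNE"]` and a threshold of 1, `components(proximity_graph(cases, 1))` yields two groups, `[0, 1, 2]` and `[3, 4]`.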
We have developed an evolutionary-based algorithm for pattern identification and clustering in such a representation, with two criteria to be optimized (see Figure 7). Here, processes were represented by a sequence of labels (symbols) denoting key events in the PM model. Typical patterns were then selected from the Pareto frontier. The convergence process is demonstrated in Figure 8 (the 10 best individuals from the Pareto frontier according to the integral criterion were selected). As a result, this solution may refer to pattern P5 and operator $\Gamma^{\Xi \to \Sigma}_{D \to M}$ while discovering the model structure. Figure 9 shows an example of a typical process model (i.e., a structural characteristic of the model) for one of the identified clusters. A detailed description of the approach, algorithms, and results on CP discovery, clustering, and analysis, including a performance comparison of three versions of the CP discovery algorithm, can be found in [10]. An important outcome of the approach in this application is the interpretability of the clusters and identified patterns. For example, 10 clusters and the corresponding CPs were interpreted by cardiologists from the Almazov National Medical Research Centre. The obtained interpretation and its further discovery and application with the CP structure are presented in [17]. Another important benefit of such space structure discovery is lowering the uncertainty of a patient's treatment trajectory through hierarchical positioning of an evolving process (selection of a cluster and selection of a position within the cluster). For example, the discrete-event simulation model described in [17] provides a more appropriate length-of-stay distribution in simulation with the discovered classes of CPs (the Kolmogorov-Smirnov statistic decreased by 51%, from 0.255 to 0.124).

Figure 7: Pareto frontier for CP patterns discovery. [Axes: number of non-arranged sequences (normed) versus length of pattern (normed).]

Figure 8: Evolutionary convergence during CP pattern discovery. [Axes: generation versus normed sum error.]

Figure 9: Example of process model showing transfers between hospital's departments (Cluster 7).

Figure 10: Evolution of synthetic CPs: (a) CP population convergence, (b) evolution of possible CP (demonstration available at https://www.youtube.com/watch?v=twvfX9zKsY8), (c) number of synthetic CPs, and (d) percentage of CPs in the correct cluster.
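The Kolmogorov-Smirnov comparison of length-of-stay distributions mentioned above can be sketched as the largest gap between two empirical distribution functions. The two-sample form used here is an assumption about how the statistic was computed.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value equal to x, then compare the CDFs.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def relative_decrease(before, after):
    """Relative reduction of a statistic, as a fraction of its initial value."""
    return (before - after) / before
```

With the values reported in the text, `relative_decrease(0.255, 0.124)` is about 0.51, which agrees with the stated 51% reduction.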
Further, we propose an algorithm to dynamically generate possible developments of a healthcare process using the identified graph-based space representation, with evolutionary strategies assimilating incoming data (events) within a case ($\Gamma'^{\,\Sigma}_{M \to D}$ in P5 and $\Gamma^{\Xi \to \Sigma}_{D}$ in P2). We consider the convergence (Figure 10(a)) of the introduced synthetic continuations of the process to the right class (identified clusters of typical cases were used), with mapping to the graph-based space representation with proximity measures (Figure 10(b)). As a result, the arrival of CP events decreases the number of synthetic CPs and increases the percentage of CPs positioned in the correct cluster (see an example in Figures 10(c) and 10(d), respectively). This enables interpretable positioning and lowers uncertainty in predicting the further development of a CP for a particular patient.
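The effect described above, fewer synthetic CPs and a growing share in the correct cluster as events arrive, can be illustrated with the assimilation step alone. The sketch keeps only the candidates whose path agrees with the observed prefix; it deliberately omits the evolutionary generation of new candidates, and the record layout (`path`, `cluster`) and all values are hypothetical.

```python
def assimilate(candidates, observed_prefix):
    """Keep only synthetic continuations consistent with the events seen so far."""
    n = len(observed_prefix)
    return [c for c in candidates if c["path"][:n] == observed_prefix]


def share_in_cluster(candidates, cluster):
    """Fraction of the remaining candidates that belong to the given cluster."""
    if not candidates:
        return 0.0
    return sum(c["cluster"] == cluster for c in candidates) / len(candidates)


def track(candidates, events, cluster):
    """Number of candidates and share in the target cluster after each event."""
    history = []
    for step in range(1, len(events) + 1):
        remaining = assimilate(candidates, events[:step])
        history.append((len(remaining), share_in_cluster(remaining, cluster)))
    return history
```

For four hypothetical candidates with paths `AFIE`, `AFNE` (cluster 7) and `ADNE`, `DDNE` (cluster 2), observing the events `A` and then `F` leaves three and then two candidates, while the share in cluster 7 rises from 2/3 to 1.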
Here, a combination of patterns P2, P5, and P6 in the implementation of the proposed algorithm (see Section 2.5) enables interactive investigation of the process space and data assimilation into a population of possible continuations of a single process as it evolves. This solution can be applied in exploratory modeling and simulation of patient flow processing as well as in decision support in specialized medical centers.
3.3. Problem 3: Mining Social Media. Nowadays, social media analysis (which began with static network models emphasizing the topology of connections between users) strives to explore dynamic behavioral patterns of individuals, which can be recovered from their digital traces on the web. The prediction of social media activities requires combining analytical and data-driven models, as well as identifying the optimal structure and parameters of these models according to the available data. Here we show an example of a problem in this field involving evolutionary identification of a model.
Figure 11: Distribution of posts, reposts, and comments on personal walls of subscribers of the bank community. [Axes: activity count (log) versus fraction of users (log); series: repost, post, comment.]

Figure 12: Example of process model (a) with expanded cycles (Cluster 2) and (b) with collapsed cycles (Cluster 1).
A digital trace of a user in an online social network (OSN) is a sequence (chain) of observed activities separated by time gaps. Each OSN supports different types of "hidden" and observable activities. For example, in the largest Russian social network, vk.com (further denoted as VK), a user has a personal page (wall) with three types of activities: post (P), when a user makes a record by himself or herself; repost (R), when the user copies the record of another user or community to his or her wall; and comment (C), when the user comments on a record on his or her wall. Figure 11 illustrates the distribution of these activities for subscribers of a large Russian bank community in VK. The collected dataset consists of the 100 (or fewer, if unavailable) last entries (posts or reposts) and the comments on these entries for 8K user walls in the period January 2017–December 2017. Comments are much less common than posts and reposts. The distributions of posts and reposts are similar, but there is a group of "spreaders" with a significant number of reposts.
We applied the technique described in Section 3.2 to analyze the processes. Still, the considered process has a significantly different structure. By default, it is continuous with random repetition of events, while the healthcare process in ACS cases is finite and has a more "strong" structure. Figure 12 shows a typical process structure identified with the EC-based approach and visualized with expanded cycles (a) and with collapsed cycles (b). The second one could be considered more relevant than the first one, which is significantly affected by the length of the selected history. It is natural to consider it as a random process or a state-transition model. In that case, the three identified clusters (characterized by various frequencies of transitions) could be interpreted as typical behavior models.

Figure 13: Example of process model for cluster 3 (1120 users) using n-gram analysis.

Table 1: Mean of activities' combinations for users' clusters.

Cluster  Size  CCC  CCP  CCR  CPP  CPR  CRR  PPP   PPR   PRR   RRR
1        5238  052  074  024  1    072  042  668   64    756   853
2        2110  011  013  016  016  036  059  209   395   1232  603
3        1120  091  138  014  362  075  018  5002  1178  5     277
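Reading a user's trace as a state-transition model, as suggested above, reduces to counting how often one activity follows another. The sketch below estimates first-order transition frequencies from a sequence of P, R, and C labels; the first-order assumption and the sample sequence in the usage note are illustrative only.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequence):
    """Estimate P(next activity | current activity) from one activity sequence."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }
```

For the hypothetical trace `"PPRPC"`, a post is followed by a post, a repost, or a comment with equal frequency (1/3 each), and a repost is always followed by a post.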
N-gram analysis is often used to detect patterns in people's behavior [21, 22]. N-gram analysis is based on counting the frequencies of combinations or sequences of activities. We collected all sorted 3-grams (so-called 3-sets) for each user's sequence to analyze the frequency of event combinations. As a result, three clusters of vectors with 3-set chains were identified with the k-means clustering method. Figure 13 shows all combinations and the transitions between them for cluster 3 as an example. Using Figure 12 and Table 1, it is possible to see that cluster 3 includes users who often make new records (P) and sometimes comment on records (C). So cluster 3 mostly consists of "bloggers". Cluster 2 includes "spreaders", who frequently copy other records (R). And the biggest cluster, cluster 1, consists of people who make new records and copy other ones equally, but less intensively compared to the other clusters. That may be considered typical behavior for an OSN user. N-gram analysis allows detecting typical behavioral patterns and obtaining process models for social media activities using chains of different lengths as input data. Thus, this type of data-driven modeling is more appropriate for researching continuous processes. Figure 14 shows a graph-based representation of the process space with all users' patterns.
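Counting 3-sets can be sketched as sliding a window of three events over a user's sequence and sorting the labels inside each window, so that, for example, PRP and PPR fall into the same combination. The ordering of `KEYS` follows the column headers of Table 1; the function names are hypothetical, and the subsequent k-means clustering of the resulting vectors is not shown.

```python
from collections import Counter

# Sorted 3-grams over the alphabet {C, P, R}, in the column order of Table 1.
KEYS = ["CCC", "CCP", "CCR", "CPP", "CPR", "CRR", "PPP", "PPR", "PRR", "RRR"]


def three_sets(sequence):
    """Count sorted 3-grams (3-sets) in a sequence of activity labels."""
    return Counter(
        "".join(sorted(sequence[i:i + 3])) for i in range(len(sequence) - 2)
    )


def to_vector(sequence):
    """Fixed-length feature vector of 3-set counts, suitable as clustering input."""
    counts = three_sets(sequence)
    return [counts.get(key, 0) for key in KEYS]
```

For the hypothetical trace `"PPRPC"`, the windows PPR, PRP, and RPC give two occurrences of the 3-set PPR and one of CPR.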
This subsection provides very early results. The next steps in applying the proposed approach to this problem include extending the process model structure by (a) temporal labeling (gaps between events), (b) considering the process within a sliding time window to get more structured processes, (c) linking the model with causal inference, and (d) introducing DM techniques for EC positioning of ongoing processes in the model space. We believe that these extensions could enhance the discovery of model structure (P4) and provide deeper insight into social media activity.
4. Conclusion and Future Work
Figure 14: Graph-based representation of processes' space for three clusters in social media activity.

The development of the proposed approach is still an ongoing project. We aim at further systematization and detailing of the proposed concepts, methods, and algorithms, as well as at a more comprehensive and deeper implementation of EC-based applications. Further work includes the following directions:
(i) dualization on the role of data-driven and intelligent operations in the proposed approach and described patterns;
(ii) extended analysis of various EC techniques applicable within the approach;
(iii) investigation of EC-based discovery for models of complex systems with lacking or inconsistent observations;
(iv) detailed formalization of expertise and knowledge-based methods within the approach;
(v) extending the approach with interactive user-centered modelling and phase space analysis;
(vi) development of a multilayered approach for decision support and control of the system and process $S$, available data $D$, and complex model $M$.
Data Availability
The data used in the examples presented in Section 3 were initially obtained within complementary projects performed by the authors of the paper. The data are available from the corresponding author upon request, after an explicit statement of the purpose and plan of the requested data usage, so that possible violations of the corresponding projects' rules can be checked.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This paper presents an extension and further development of the work [23]. This research is financially supported by The Russian Scientific Foundation, Agreement 14-11-00823 (15.07.2014).
References
[1] N. Boccara, Modeling complex systems, Graduate Texts in Physics, Springer, New York, Second edition, 2010.
[2] H. McManus and D. Hastings, "A framework for understanding uncertainty and its mitigation and exploitation in complex systems," IEEE Engineering Management Review, vol. 34, no. 3, pp. 81–94, 2006.
[3] W. Walker, P. Harremoes, J. Rotmans et al., "Defining uncertainty: a conceptual basis for uncertainty management in model-based decision support," Integrated Assessment, vol. 4, no. 1, pp. 5–17, 2003.
[4] J. Yan and J. R. Deller, "NARMAX model identification using a set-theoretic evolutionary approach," Signal Processing, vol. 123, pp. 30–41, 2016.
[5] I. G. Kevrekidis, C. W. Gear, and G. Hummer, "Equation-free: The computer-aided analysis of complex multiscale systems," AIChE Journal, vol. 50, no. 7, pp. 1346–1355, 2004.
[6] H. Ihshaish, A. Cortes, and M. A. Senar, "Parallel Multi-level Genetic Ensemble for Numerical Weather Prediction Enhancement," Procedia Computer Science, vol. 9, pp. 276–285, 2012.
[7] G. Dumedah, "Formulation of the Evolutionary-Based Data Assimilation and its Implementation in Hydrological Forecasting," Water Resources Management, vol. 26, no. 13, pp. 3853–3870, 2012.
[8] V. V. Kashirin, A. A. Lantseva, S. V. Ivanov, S. V. Kovalchuk, and A. Boukhanovsky, "Evolutionary simulation of complex networks' structures with specific functional properties," Journal of Applied Logic, vol. 24, part A, pp. 39–49, 2017.
[9] S. V. Kovalchuk, P. A. Smirnov, K. V. Knyazkov, A. S. Zagarskikh, and A. V. Boukhanovsky, "Knowledge-Based Expressive Technologies Within Cloud Computing Environments," in Practical Applications of Intelligent Systems, vol. 279 of Advances in Intelligent Systems and Computing, pp. 1–11, Springer, Berlin, Germany, 2014.
[10] A. A. Funkner, A. N. Yakovlev, and S. V. Kovalchuk, "Towards evolutionary discovery of typical clinical pathways in electronic health records," Procedia Computer Science, vol. 119, pp. 234–244, 2017.
[11] S. V. Kovalchuk and A. V. Boukhanovsky, "Towards Ensemble Simulation of Complex Systems," Procedia Computer Science, vol. 51, pp. 532–541, 2015.
[12] S. V. Kovalchuk, A. V. Krikunov, K. V. Knyazkov, and A. V. Boukhanovsky, "Classification issues within ensemble-based simulation: application to surge floods forecasting," Stochastic Environmental Research and Risk Assessment, vol. 31, no. 5, pp. 1183–1197, 2017.
[13] D. Fogel, "Phenotypes, genotypes, and operators in evolutionary computation," in Proceedings of the 1995 IEEE International Conference on Evolutionary Computation, p. 193, Perth, WA, Australia.
[14] S. V. Ivanov, S. V. Kovalchuk, and A. V. Boukhanovsky, "Workflow-based Collaborative Decision Support for Flood Management Systems," Procedia Computer Science, vol. 18, pp. 2213–2222, 2013.
[15] A. Gusarov, A. Kalyuzhnaya, and A. Boukhanovsky, "Spatially adaptive ensemble optimal interpolation of in-situ observations into numerical vector field models," in Proceedings of the 6th International Young Scientist Conference on Computational Science, YSC 2017, pp. 325–333, Finland, November 2017.
[16] A. V. Krikunov, E. V. Bolgova, E. Krotov, T. M. Abuhay, A. N. Yakovlev, and S. V. Kovalchuk, "Complex data-driven predictive modeling in personalized clinical decision support for Acute Coronary Syndrome episodes," in Proceedings of the International Conference on Computational Science, ICCS 2016, pp. 518–529, USA, June 2016.
[17] S. V. Kovalchuk, A. A. Funkner, O. G. Metsker, and A. N. Yakovlev, "Simulation of patient flow in multiple healthcare units using process and data mining techniques for model identification," Journal of Biomedical Informatics, vol. 82, pp. 128–142, 2018.
[18] A. Yakovlev, O. Metsker, S. Kovalchuk, and E. Bologova, "Prediction of in-hospital mortality and length of stay in acute coronary syndrome patients using machine-learning methods," Journal of the American College of Cardiology, vol. 71, no. 11, p. A242, 2018.
[19] A. Nikishova, A. Kalyuzhnaya, A. Boukhanovsky, and A. Hoekstra, "Uncertainty quantification and sensitivity analysis applied to the wind wave model SWAN," Environmental Modelling & Software, vol. 95, pp. 344–357, 2017.
[20] E. Rojas, J. Munoz-Gama, M. Sepulveda, and D. Capurro, "Process mining in healthcare: A literature review," Journal of Biomedical Informatics, vol. 61, pp. 224–236, 2016.
[21] T. Sinha, P. Jermann, N. Li, and P. Dillenbourg, "Your click decides your fate: Inferring Information Processing and Attrition Behavior from MOOC Video Clickstream Interactions," in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 3–14, Doha, Qatar, October 2014.
[22] C. Marceau, "Characterizing the Behavior of a Program Using Multiple-Length N-Grams," Defense Technical Information Center, 2005.
[23] S. V. Kovalchuk, O. G. Metsker, A. A. Funkner et al., "Towards management of complex modeling through a hybrid evolutionary identification," in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 255–256, Kyoto, Japan, July 2018.
(v) complicated mapping of genotype to phenotype space
To overcome these issues, the proposed approach involves two options. First, intelligent procedures may be used to tune EC hyperparameters (P5), to predict features of genotype-phenotype mapping, boundaries, etc. (P4), and to discover interpretable states and filters (for system, data, and model) that control convergence and adaptation of the population (P2, P4, and P5) in an interpretable and reproducible way (through the defined control procedure). Second, the composite model may use various approaches, methods, and elements to obtain better quality and performance of the solution:
(i) surrogate models (P2, P4, P5), which may increase performance (for example, within preliminary and intermediate optimization steps)
(ii) ensemble models (P3), which may be considered as an interpretable and controllable population
(iii) interpretation and formal inference using explicit domain-specific knowledge and results of data mining to feed procedures of EC and infer parameters in both models and EC
(iv) controllable space decomposition (P6) with predictive models for possible areas and directions of population migration in EC to explicitly lower uncertainty and obtain additional interpretability
Finally, an essential feature of the proposed approach is a holistic analysis of a composite solution with possible coevolution of models (submodels within a composite model) and data processing procedures.
3. Application Examples
This section presents several practical examples in which the proposed approach, its patterns, or some of their elements were applied. The examples were intentionally selected from diverse problem domains to demonstrate the generality of the approach. The considered problems are being developed in separate projects that are at various stages. Problem 1 (ensemble metocean simulation) was investigated in a series of projects (see, e.g., [11, 14, 15]). Within this research, we are trying to extend model calibration and DA with EC techniques to develop more flexible and accurate multimodel ensembles. Problem 2 (clinical pathways (CPs) modelling) is important in several ongoing projects aimed at model-based decision support in healthcare (see, e.g., [16–18]). The proposed approach plays an important role by enabling deeper analysis of clinical pathways in various scenarios (interactive analysis of available CPs with identification of clusters of similar patients, DA in predictive modelling of ongoing cases, etc.). Finally, Problem 3 shows very early results of a recently started project on online social network analysis.
3.1. Problem 1: Evolution in Models for Metocean Simulation. Environmental simulation systems usually contain several numerical models serving different purposes (complementary simulation processes, improving the reliability of a system by providing alternative results, etc.). Each model can typically be described by a large number of quantitative parameters and functional characteristics that should be adjusted by an expert or by intelligent automated methods (e.g., EC). Alternative models inside the environmental simulation system can be joined in an ensemble according to the complex modeling pattern based on evolutionary computing (a combination of the P3 and P5 patterns). In the current case study, we introduce an example illustrating the ensemble concept in the forms of an alternative models ensemble, a parameter diversity ensemble, and a metaensemble. To identify the parameters of the proposed ensembles, the least squares method (in the case of model linearity) or optimization methods (in the case of nonlinearity) can be used. As we need to take into account not only the functional space Φ and the space of parameters Ξ for a single model but also the optimal coexistence of models in the system (i.e., Σ), evolutionary and coevolutionary approaches seem to be applicable techniques for this task. It is worth mentioning that the coevolutionary approach can be applied to independent model realizations, with the ensemble acting as a connecting element. In this case, the parameters (weights) of the ensemble can be estimated separately from the coevolution procedure, either in a constant form or dynamically.

As a case study of complex environmental modeling, we designed an ensemble model that consists of the SWAN (http://swanmodel.sourceforge.net) model for ocean wave simulation driven by two different surface forcings, NCEP (https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html) and ERA Interim (https://www.ecmwf.int/en/forecasts/datasets/archive-datasets/reanalysis-datasets/era-interim). Thus, different implementations of the SWAN model were connected in the form of an alternative models ensemble with least-squares-calculated coefficients defining the structure of the complex model. Two parameters, wind drag and whitecapping rate (WCR), were calibrated using evolutionary and coevolutionary algorithms implementing $\Gamma^{\Xi \to \Phi}_{D \to M}$ in P5 (for a detailed sensitivity analysis of SWAN, see [19]). The coevolutionary approach can be represented in the form of a parameter diversity ensemble, where each population is constructed as an ensemble of alternative model results with different parameters. We can also add ensemble weights to the model parameter diversity and obtain a metaensemble that can be identified within the coevolutionary approach.
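The least-squares step for an alternative models ensemble can be illustrated with a minimal sketch. It assumes a linear two-member ensemble without an intercept; the function name `least_squares_weights` and all wave-height values are hypothetical and serve only to show how the coefficients follow from the normal equations.

```python
def least_squares_weights(member_a, member_b, observed):
    """Weights (w_a, w_b) minimising sum((observed - w_a*a - w_b*b)**2).

    Solves the 2x2 normal equations with Cramer's rule.
    """
    saa = sum(a * a for a in member_a)
    sbb = sum(b * b for b in member_b)
    sab = sum(a * b for a, b in zip(member_a, member_b))
    say = sum(a * y for a, y in zip(member_a, observed))
    sby = sum(b * y for b, y in zip(member_b, observed))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("ensemble members are (nearly) collinear")
    w_a = (say * sbb - sby * sab) / det
    w_b = (sby * saa - say * sab) / det
    return w_a, w_b


# Hypothetical wave heights (m) from two model implementations.
swan_era = [1.0, 2.0, 3.0, 4.0]
swan_ncep = [1.5, 1.0, 3.5, 3.0]
# Synthetic "observations" built as 0.6 * ERA + 0.4 * NCEP for the demo.
observed = [0.6 * a + 0.4 * b for a, b in zip(swan_era, swan_ncep)]

print(least_squares_weights(swan_era, swan_ncep, observed))
```

With more than two members, or with an intercept term, the same normal equations would be solved with a general linear solver instead of the closed form used here.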
For model identification and verification, measurements from several wave stations in the Kara Sea were used. The fitness function is the root-mean-square error (RMSE) averaged over all wave stations. For verification of the results, the MAE (mean absolute error) and DTW (dynamic time warping) metrics were used.
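The three metrics named above have standard definitions, sketched below in plain Python. The station-wise averaging in `fitness` is an assumption about how the per-station RMSE values are combined, and the absolute difference used as the local DTW cost is likewise an illustrative choice rather than a detail stated in the text.

```python
import math


def rmse(simulated, observed):
    """Root-mean-square error between two equally long series."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed))


def mae(simulated, observed):
    """Mean absolute error between two equally long series."""
    return sum(abs(s - o) for s, o in zip(simulated, observed)) / len(observed)


def dtw_distance(x, y):
    """Classic dynamic time warping distance with |a - b| as local cost."""
    inf = float("inf")
    n, m = len(x), len(y)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def fitness(simulated_by_station, observed_by_station):
    """Assumed fitness: RMSE averaged over all stations (lower is better)."""
    errors = [rmse(simulated_by_station[name], observed_by_station[name])
              for name in observed_by_station]
    return sum(errors) / len(errors)
```

DTW tolerates small phase shifts between simulated and measured series, which is why it complements the pointwise RMSE and MAE.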
Figure 4: Metocean simulation: (a) error landscape for wave height simulation results using ERA and NCEP reanalysis as input data (axes: drag, log(WCR), RMSE (m)) and (b) Pareto frontier of coevolution results for all generations (axes: drag, WCR, log(RMSE); series: coNCEP, coERA, coEnsemble).

Figure 5: Coevolution convergence of the parameter diversity ensemble for metocean models (RMSE (m) versus generation).

Figure 4(a) presents the surface (landscape) of RMSE in the space of the mentioned parameters (drag and WCR) for the SWAN+ERA and SWAN+NCEP implementations. It can be seen that the evolutionary-obtained results converge to the minimum of the error landscape. The landscape was obtained by running the model with all parameter variants from the full 30x30 grid (i.e., 900 runs), while the evolutionary algorithm converged in 5 generations with 10 individuals (parameter sets) in the population (50 runs), which allows performing identification two orders of magnitude faster. The convergence of coevolution for the SWAN+ERA+NCEP case is presented in Figure 5.
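The difference in the number of model runs between the full grid and the evolutionary search can be reproduced with a toy sketch. Everything below is an assumption made for illustration: the error surface `toy_error` stands in for a SWAN run, the parameter bounds are invented, and the simple elitist scheme with Gaussian mutation is not the algorithm used in the study. Only the budget (30x30 grid versus 5 generations of 10 individuals) comes from the text.

```python
import random


def toy_error(drag, log_wcr):
    """Hypothetical smooth error surface standing in for one model run."""
    return (drag - 1.8) ** 2 + 0.05 * (log_wcr + 11.0) ** 2 + 0.3


BOUNDS = ((0.5, 3.0), (-20.0, 0.0))  # assumed ranges for drag and log(WCR)


def clip(value, low, high):
    return max(low, min(high, value))


def grid_search(steps=30):
    """Evaluate every node of a steps x steps grid; return best result and run count."""
    runs, best = 0, None
    for i in range(steps):
        for j in range(steps):
            point = tuple(lo + (hi - lo) * k / (steps - 1)
                          for (lo, hi), k in zip(BOUNDS, (i, j)))
            err = toy_error(*point)
            runs += 1
            if best is None or err < best[0]:
                best = (err, point)
    return best, runs


def evolve(pop_size=10, generations=5, seed=0):
    """Elitist evolutionary search: keep the best half, mutate to refill."""
    rng = random.Random(seed)
    runs = 0
    population = []
    for _ in range(pop_size):
        point = tuple(rng.uniform(lo, hi) for lo, hi in BOUNDS)
        population.append((toy_error(*point), point))
        runs += 1
    history = [min(population)[0]]
    for _ in range(generations - 1):
        parents = sorted(population)[: pop_size // 2]
        children = []
        for _ in range(pop_size):
            parent = rng.choice(parents)[1]
            child = tuple(clip(p + rng.gauss(0.0, 0.1 * (hi - lo)), lo, hi)
                          for p, (lo, hi) in zip(parent, BOUNDS))
            children.append((toy_error(*child), child))
            runs += 1
        population = sorted(parents + children)[:pop_size]
        history.append(population[0][0])
    return population[0], runs, history


grid_best, grid_runs = grid_search()
ea_best, ea_runs, ea_history = evolve()
print("grid runs:", grid_runs, "evolutionary runs:", ea_runs)
```

Because the parents survive into the next generation, the best error in `ea_history` never increases; how close it gets to the grid optimum depends on the surface and the random seed.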
Although the error landscapes for the SWAN+ERA and SWAN+NCEP implementations are close to each other, separate evolution does not consider optimization of the ensemble result. For this purpose, we apply the coevolutionary approach, which produces a set of Pareto-optimal solutions for each generation. Figure 4(b) shows that the error of each model in the ensemble is significant (coNCEP and coERA for the models alone), but the error of the whole ensemble (coEnsemble) converges to the minimum very fast.
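Selecting the Pareto-optimal solutions of a generation amounts to filtering out dominated individuals. The sketch below assumes that all objectives are errors to be minimized; the candidate tuples are hypothetical and the function names are not taken from the original study.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates (all objectives minimised)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (error of model 1, error of model 2) pairs for one generation.
candidates = [(1.0, 2.0), (2.0, 1.0), (2.0, 2.0), (0.5, 3.0), (1.5, 1.5)]
print(pareto_front(candidates))
```

The pairwise check is quadratic in the population size, which is unproblematic for populations of a few dozen individuals.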
The obtained result can be analyzed from the uncertainty reduction point of view. Optimization of model parameters helps to reduce parameter uncertainty, which can be estimated through the error function. When we apply an ensemble approach to evolutionary optimized results, we can also speak of reducing the uncertainty connected with the input data sources (NCEP and ERA). Moreover, the metaensemble approach allows reducing the uncertainty connected with ensemble parameters.
Figure 6: Graph-based representation of the process space in healthcare (interactive view; demonstration available at https://www.youtube.com/watch?v=EH74f1w6EeY).
Summarizing the results of the metocean case study, we note that the EC approach is significantly more efficient, up to 120 times faster than grid search, without accuracy losses. According to this experimental study, the quality of the ensemble with evolutionary optimized models is similar to the results of the grid search: the MAE metric equals 0.24 m and the DTW metric equals 51. We can also mention that the coevolutionary approach provides a 10% accuracy gain compared with the results of single evolution of model implementations, but this is still similar to the ensemble result with evolutionary optimized models. Nevertheless, the coevolutionary approach allowed achieving a 200-fold acceleration. Within the context of the proposed approach, the space Φ was investigated using the defined structure of the model in the space Σ for the purpose of model calibration.
3.2. Problem 2: Modeling Health Care Process. Modeling healthcare processes usually involves enormous uncertainty and variability, even when modeling a single disease. One of the ways to identify a model of such a process is PM [20]. Still, direct implementation of PM methods does not remove a major part of the uncertainty. Within the current research, we applied the proposed approach for identification purposes both in the analysis of historical cases and in the prediction of the development of a single process. Here, we consider processes of providing health care in acute coronary syndrome (ACS) cases; ACS is usually considered one of the major causes of death in the world. We used a set of 3434 ACS cases collected during 2010–2015 in Almazov National Medical Research Centre, one of the leading cardiological centers in Russia. The data set contains electronic health records of these patients with all registered events and patient characteristics.
To simplify consideration of the multidimensional space of possible processes ($\Gamma'^{\,\Xi \to \Sigma}_{D \to S}\,\Gamma^{\Xi}_{S \to D}$ for analysis of Σ on layer $S$), we introduced a graph-based representation of this space, with vertices representing cases and edges representing proximity of cases. Analysis of such a structure enables easy discovery of common cases (e.g., as communities in the graph). Such discovery enables explicit, interpretable structuring of the space and representation of the further landscape for EC in terms of the P6 pattern. Moreover, direct interactive investigation of a visual representation of such a structure (see Figure 6) provides significant insights for medical researchers.
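A graph of this kind can be sketched in a few lines. The example below is a simplification under stated assumptions: cases are label strings, proximity is measured with the Levenshtein edit distance, an edge is drawn when the distance does not exceed a threshold, and connected components stand in for the community detection mentioned in the text. The case strings and the threshold are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def proximity_graph(cases, max_distance):
    """Adjacency sets: cases i and j are linked if their distance is small enough."""
    edges = {i: set() for i in range(len(cases))}
    for i in range(len(cases)):
        for j in range(i + 1, len(cases)):
            if edit_distance(cases[i], cases[j]) <= max_distance:
                edges[i].add(j)
                edges[j].add(i)
    return edges


def connected_components(edges):
    """Groups of mutually reachable cases (a crude stand-in for communities)."""
    seen, groups = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbour in edges[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(group))
    return groups


# Hypothetical cases written as sequences of event labels.
cases = ["AFINE", "AFNE", "AFIE", "DNFD", "DNF"]
print(connected_components(proximity_graph(cases, max_distance=1)))
```

A real analysis would likely use a domain-specific proximity measure and a proper community detection method; the sketch only shows how cases, edges, and groups relate.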
Figure 7: Pareto frontier for CP patterns discovery (axes: number of non-arranged sequences (normed) and length of pattern (normed)).

Figure 8: Evolutionary convergence during CP pattern discovery (normed sum error versus generation).

Figure 9: Example of a process model showing transfers between the hospital's departments (cluster 7).

Figure 10: Evolution of synthetic CPs: (a) CP population convergence; (b) evolution of a possible CP (demonstration available at https://www.youtube.com/watch?v=twvfX9zKsY8); (c) number of synthetic CPs; (d) percentage of CPs in the correct cluster.

We have developed an evolutionary-based algorithm for pattern identification and clustering in such a representation with two criteria to be optimized (see Figure 7). Here, processes were represented by sequences of labels (symbols) denoting key events in the PM model. Typical patterns were then selected from the Pareto frontier. The convergence process is demonstrated in Figure 8 (the 10 best individuals from the Pareto frontier according to the integral criterion were selected). As a result, this solution may refer to the P5 pattern and the operator $\Gamma^{\Xi \to \Sigma}_{D \to M}$ while discovering the model structure. Figure 9 shows an example of a typical process model (i.e., a structural characteristic of the model) for one of the identified clusters. A detailed description of the approach, algorithms, and results on CP discovery, clustering, and analysis, including a performance comparison of three versions of the CP discovery algorithm, can be found in [10]. An important outcome of applying the approach in this application is the interpretability of the clusters and identified patterns. For example, 10 clusters and the corresponding CPs were interpreted by cardiologists from Almazov National Medical Research Centre. The obtained interpretation and further discovery and application of the CP structure are presented in [17]. Another important benefit of such space structure discovery is lowering the uncertainty of a patient's treatment trajectory by hierarchical positioning of an evolving process (selection of a cluster and selection of a position within the cluster). For example, the discrete-event simulation model described in [17] provides a more appropriate length of stay distribution within a simulation with discovered classes of CPs (the Kolmogorov-Smirnov statistic decreased by 51%, from 0.255 to 0.124).
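The Kolmogorov-Smirnov statistic quoted above is the largest gap between two empirical distribution functions. A minimal two-sample version is sketched below; the length-of-stay samples are invented for illustration and are not the data of the study.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:  # the maximum is attained at one of the sample points
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical lengths of stay (days): observed versus simulated.
observed_los = [3, 5, 6, 8, 9, 12, 14, 21]
simulated_los = [4, 5, 7, 8, 10, 11, 15, 19]
print(ks_statistic(observed_los, simulated_los))
```

A smaller value means that the simulated length-of-stay distribution follows the observed one more closely, which is the sense in which the decrease from 0.255 to 0.124 is an improvement.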
Furthermore, we propose an algorithm to dynamically generate possible developments of a process in healthcare using the identified graph-based space representation, with evolutionary strategies assimilating incoming data (events) within a case ($\Gamma'^{\,\Sigma}_{M \to D}$ in P5 and $\Gamma^{\Xi \to \Sigma}_{D}$ in P2). We consider the convergence (Figure 10(a)) of the introduced synthetic continuations of the process to the right class (identified clusters of typical cases were used), with mapping to the graph-based space representation with proximity measures (Figure 10(b)). As a result, the appearance of the CP's events decreases the number of synthetic CPs and increases the percentage of CPs positioned in the correct cluster (see an example in Figures 10(c) and 10(d), respectively). This enables interpretable positioning and lowers uncertainty in predicting the further development of the CP for a particular patient.
Here, a combination of patterns P2, P5, and P6 in the implementation of the proposed algorithm (see Section 2.5) enables interactive investigation of the process space and data assimilation into a population of possible continuations of a single process as it evolves. This solution can be applied in exploratory modeling and simulation of patient flow processing as well as in decision support in specialized medical centers.
3.3. Problem 3: Mining Social Media. Nowadays, social media analysis (which began with static network models emphasizing the topology of connections between users) strives to explore dynamic behavioral patterns of individuals that can be recovered from their digital traces on the web. The prediction of social media activities requires combining analytical and data-driven models as well as identifying the optimal structure and parameters of these models according to the available data. Here, we show an example of a problem in this field involving evolutionary identification of a model.
Figure 11: Distribution of posts, reposts, and comments on personal walls of subscribers of the bank community (fraction of users versus activity count, both on log scales).

Figure 12: Example of a process model (a) with expanded cycles (cluster 2) and (b) with collapsed cycles (cluster 1).
A digital trace of a user in an online social network (OSN) is a sequence (chain) of observed activities separated by time gaps. Each OSN supports different types of "hidden" and observable activities. For example, in the largest Russian social network vk.com (further denoted as VK), a user has a personal page (wall) with three types of activities: post (P), when the user makes a record by himself or herself; repost (R), when the user copies a record of another user or community to his or her wall; and comment (C), when the user comments on a record on his or her wall. Figure 11 illustrates the distribution of these activities for subscribers of a large Russian bank community in VK. The collected dataset consists of the last 100 (or fewer if unavailable) entries (posts or reposts) and the comments on these entries for 8K user walls in the period January 2017–December 2017. Comments are much less common than posts and reposts. The distributions of posts and reposts are similar, but there is a group of "spreaders" with a significant number of reposts.
Figure 13: Example of a process model for cluster 3 (1120 users) using n-gram analysis.

Table 1: Mean of activities' combinations for users' clusters.

Cluster  Size  CCC  CCP  CCR  CPP  CPR  CRR  PPP   PPR   PRR   RRR
1        5238  052  074  024  1    072  042  668   64    756   853
2        2110  011  013  016  016  036  059  209   395   1232  603
3        1120  091  138  014  362  075  018  5002  1178  5     277

We applied the technique described in Section 3.2 to analyze the processes. Still, the considered process has a significantly different structure. By default, it is continuous with random repetition of events, while the healthcare process in ACS cases is finite and has a "stronger" structure. Figure 12 shows a typical process structure identified with the EC-based approach and visualized with expanded cycles (a) and with collapsed cycles (b). The second one could be considered more relevant than the first one, which is significantly affected by the length of the selected history. It is natural to consider it as a random process or a state-transition model. In that case, the three identified clusters (characterized by various frequencies of transitions) could be interpreted as typical behavior models.
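Reading the collapsed-cycle view as a state-transition model suggests estimating transition frequencies directly from the activity chains. The sketch below is one possible reading, not the authors' procedure: the artificial "Entrance" and "End" states mirror the node labels visible in Figure 12, while the function names and the example chains are hypothetical.

```python
from collections import Counter, defaultdict


def transition_counts(chains):
    """Count transitions between consecutive activities, with entry and exit states."""
    counts = defaultdict(Counter)
    for chain in chains:
        states = ["Entrance", *chain, "End"]
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += 1
    return counts


def transition_probabilities(counts):
    """Normalise each row of counts into empirical transition probabilities."""
    return {src: {dst: n / sum(row.values()) for dst, n in row.items()}
            for src, row in counts.items()}


# Hypothetical activity chains (P = post, R = repost, C = comment).
chains = ["PPR", "PR", "PCP"]
counts = transition_counts(chains)
print(transition_probabilities(counts)["P"])
```

Clusters with different typical behavior would then show up as different transition matrices, for example a high repost-to-repost probability for users who mostly copy records.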
[22] C Marceau ldquoCharacterizing the Behavior of a Program UsingMultiple-Length N-Gramsrdquo Defense Technical InformationCenter 2005
[23] S V Kovalchuk O G Metsker A A Funkner et al ldquoTowardsmanagement of complex modeling through a hybrid evolu-tionary identificationrdquo in Proceedings of the the Genetic andEvolutionary Computation Conference Companion pp 255-256Kyoto Japan July 2018
Figure 4: Metocean simulation: (a) error landscape for wave height simulation results using ERA and NCEP reanalysis as input data and (b) Pareto frontier of coevolution results for all generations. [Panel (a) axes: Drag, log(WCR), RMSE (m); series NCEP and ERA. Panel (b) axes: Drag, WCR, Log(RMSE); series coNCEP, coERA, coEnsemble.]
Figure 5: Coevolution convergence of diversity parameters ensemble for metocean models. [Axes: generation versus RMSE (m).]
from the full 30x30 grid (i.e., 900 runs), while the evolutionary algorithm converged in 5 generations with 10 individuals (parameter sets) in the population (50 runs), which allows performing identification two orders faster. The convergence of coevolution for the SWAN+ERA+NCEP case is presented in Figure 5.
Although the error landscapes for the pair of implementations SWAN+ERA and SWAN+NCEP are close to each other, separate evolution does not consider optimization of the ensemble result. For this purpose, we apply a coevolutionary approach that produces a set of Pareto-optimal solutions for each generation. Figure 4(b) shows that the error of each model in the ensemble is significant (coNCEP and coERA for the models alone), but the error of the whole ensemble (coEnsemble) converges to the minimum very fast.
The obtained result can be analyzed from the uncertainty reduction point of view. Optimization of model parameters helps to reduce parameter uncertainty, which can be estimated through the error function. But when we apply an ensemble approach to evolutionary optimized results, it is also appropriate to talk about reducing the uncertainty connected with the input data sources (NCEP and ERA). Moreover, the metaensemble approach allowed reducing the uncertainty connected with ensemble parameters.
Figure 6: Graph-based representation of processes space in healthcare (interactive view). (Demonstration available at https://www.youtube.com/watch?v=EH74f1w6EeY.)
Summarizing the results of the metocean case study, we can note that the EC approach is significantly more efficient (up to 120 times) than grid search, without accuracy losses. According to this experimental study, the quality of the ensemble with evolutionary optimized models is similar to the results of the grid search: the MAE metric is equal to 0.24 m and the DTW metric is 51. We can also mention that the coevolutionary approach provides a 10% accuracy gain compared with the results of single evolution of model implementations, but this is still similar to the ensemble result with evolutionary optimized models. Nevertheless, the coevolutionary approach allowed achieving a 200 times acceleration. Within the context of the proposed approach, space Φ was investigated using the defined structure of the model in space Σ for the purpose of model calibration.
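The difference in evaluation budget between the two calibration strategies can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the synthetic `error` function stands in for a real model run, the parameter ranges are read off the axes of Figure 4, and the selection and mutation scheme is a generic one, not the algorithm used in the study. Only the budgets (a 30x30 grid versus 10 individuals over 5 generations) come from the text.

```python
import random

def error(drag, log_wcr):
    """Synthetic stand-in for one model run (hypothetical smooth error surface)."""
    return (drag - 1.8) ** 2 + 0.01 * (log_wcr + 9.0) ** 2

def grid_search(n=30):
    """Evaluate the error on a full n x n grid; returns (best_error, runs)."""
    drags = [1.0 + 2.0 * i / (n - 1) for i in range(n)]        # assumed range [1, 3]
    log_wcrs = [-20.0 + 20.0 * j / (n - 1) for j in range(n)]  # assumed range [-20, 0]
    best = min(error(d, w) for d in drags for w in log_wcrs)
    return best, n * n

def evolve(pop_size=10, generations=5, seed=0):
    """Truncation selection plus Gaussian mutation; returns (best_error, runs, history)."""
    rng = random.Random(seed)
    pop = [(rng.uniform(1.0, 3.0), rng.uniform(-20.0, 0.0)) for _ in range(pop_size)]
    best, runs, history = float("inf"), 0, []
    for gen in range(generations):
        scored = sorted((error(d, w), d, w) for d, w in pop)
        runs += len(pop)                      # one model run per individual
        best = min(best, scored[0][0])        # best error seen so far
        history.append(best)
        parents = scored[: pop_size // 2]     # keep the better half as parents
        sigma = 0.5 ** (gen + 1)              # shrink the mutation step each generation
        pop = []
        for i in range(pop_size):
            _, d, w = parents[i % len(parents)]
            d = min(3.0, max(1.0, d + rng.gauss(0.0, sigma)))
            w = min(0.0, max(-20.0, w + rng.gauss(0.0, 10.0 * sigma)))
            pop.append((d, w))
    return best, runs, history

if __name__ == "__main__":
    print("grid search runs:", grid_search()[1])
    print("evolutionary runs:", evolve()[1])
```

The sketch only counts model evaluations; how close the evolutionary result comes to the grid optimum depends on the real error landscape and is not something this toy surface can confirm.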
3.2. Problem 2: Modeling Health Care Process. Modeling healthcare processes usually involves enormous uncertainty and variability, even when modeling a single disease. One of the ways to identify a model of such a process is process mining (PM) [20]. Still, direct implementation of PM methods does not remove a major part of the uncertainty. Within the current research, we applied the proposed approach for identification purposes both in the analysis of historical cases and in the prediction of single process development. Here we consider processes of providing health care in acute coronary syndrome (ACS) cases, which is usually considered one of the major causes of death in the world. We used a set of 3434 ACS cases collected during 2010–2015 in Almazov National Medical Research Centre, one of the leading cardiological centers in Russia. The data set contains electronic health records of these patients with all registered events and characteristics of a patient.
To simplify consideration of the multidimensional space of possible processes ($\Gamma'^{\,\Xi\to\Sigma}_{D\to S}\,\Gamma^{\Xi}_{S\to D}$ for analysis of $\Sigma$ on layer $S$), we introduced a graph-based representation of this space, with vertices representing cases and edges representing proximity of cases. Analysis of such a structure enables easy discovery of common cases (e.g., as communities in the graph). Such discovery enables explicit, interpretable structuring of the space and representation of a further landscape for EC in terms of the P6 pattern. Moreover, direct interactive investigation of a visual representation of such a structure (see Figure 6) provides significant insights for medical researchers.
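A graph of this kind can be sketched as follows. This is an illustrative reading only: the text does not state which proximity measure was used, so edit distance between event-label sequences is an assumption, as are the threshold `max_dist`, the example sequences, and the use of connected components as a crude substitute for community detection.

```python
from itertools import combinations

def edit_distance(a, b):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]

def proximity_graph(cases, max_dist=1):
    """Vertices are case indices; an edge joins cases within max_dist of each other."""
    graph = {i: set() for i in range(len(cases))}
    for i, j in combinations(range(len(cases)), 2):
        if edit_distance(cases[i], cases[j]) <= max_dist:
            graph[i].add(j)
            graph[j].add(i)
    return graph

def components(graph):
    """Connected components, used here as a simple stand-in for communities."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            v = stack.pop()
            group.append(v)
            for u in graph[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        groups.append(sorted(group))
    return groups

if __name__ == "__main__":
    # Hypothetical cases written as strings of event labels.
    cases = ["AFIE", "AFNE", "AFIFE", "DDNE", "DDNI"]
    print(components(proximity_graph(cases)))
```

For real data a dedicated community detection method would likely replace `components`, since a single weak link is enough to merge two otherwise distinct groups here.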
We have developed an evolutionary-based algorithm for pattern identification and clustering in such a representation, with two criteria to be optimized (see Figure 7). Here, processes were represented by a sequence of labels (symbols) denoting key events in the PM model. Typical patterns were then selected for the Pareto frontier. The convergence process is demonstrated in Figure 8 (the 10 best individuals from the Pareto frontier according to the integral criterion were selected). As a result, this solution may refer to the P5 pattern and operator $\Gamma^{\Xi\to\Sigma}_{D\to M}$ while discovering the model structure. Figure 9 shows an example of a typical process model (i.e., a structural characteristic of the model) for one of the identified clusters. A detailed description of the approach, algorithms, and results on CP discovery, clustering, and analysis, including a performance comparison of three versions of CP discovery algorithms, can be found in [10]. An important outcome of the approach in this application is the interpretability of the clusters and identified patterns. For example, 10 clusters and the corresponding CPs were interpreted by cardiologists from Almazov National Medical Research Centre. The obtained interpretation and further discovery and application of the CP structure are presented in [17]. Another important benefit of such space structure discovery is lowering the uncertainty of a patient's treatment trajectory by hierarchical positioning of an evolved process (selection of a cluster and selection of a position within the cluster). For example, the discrete-event simulation model described in [17] provides a more appropriate length of stay distribution within the simulation with discovered classes of CPs (the Kolmogorov-Smirnov statistic decreased by 51%, from 0.255 to 0.124).
Figure 7: Pareto frontier for CP patterns discovery. [Axes: number of non-arranged sequences (normed) versus length of pattern (normed).]
Figure 8: Evolutionary convergence during CP pattern discovery. [Axes: generation versus normed sum error.]
Figure 9: Example of process model showing transfers between hospital's departments. [Cluster 7.]
Figure 10: Evolution of synthetic CPs: (a) CP population convergence, (b) evolution of possible CP (demonstration available at https://www.youtube.com/watch?v=twvfX9zKsY8), (c) number of synthetic CPs, and (d) % of CPs in correct cluster. [Panel (a) axes: step versus distance in symbols, clusters 0 to 6; panels (c) and (d) are plotted against step.]
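The two-criterion selection used in CP pattern discovery can be illustrated with a generic Pareto filter. The sketch makes several assumptions that the text does not confirm: both criteria (here a normed share of non-arranged sequences and a normed pattern length) are treated as values to minimize, the "integral criterion" is taken to be their plain sum, and the candidate patterns and scores are invented for the example.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """candidates maps a pattern to its tuple of objectives; returns the non-dominated subset."""
    return {
        name: objs
        for name, objs in candidates.items()
        if not any(dominates(other, objs)
                   for other_name, other in candidates.items()
                   if other_name != name)
    }

def top_by_integral(front, k=10):
    """Rank front members by the sum of their objectives (assumed integral criterion)."""
    return sorted(front, key=lambda name: sum(front[name]))[:k]

if __name__ == "__main__":
    # Hypothetical patterns with (non-arranged share, normed length) scores.
    candidates = {
        "AFNE": (0.2, 0.7),
        "AFIFE": (0.5, 0.5),
        "DDNE": (0.6, 0.6),
        "FE": (0.9, 0.2),
        "AFNIFE": (0.2, 0.9),
    }
    front = pareto_front(candidates)
    print(sorted(front), top_by_integral(front, k=2))
```

In an evolutionary loop this filter would be applied to each generation, with the top-ranked front members carried forward; the direction of each criterion should be checked against [10] before reuse.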
Furthermore, we propose an algorithm to dynamically generate possible developments of the process in healthcare using the identified graph-based space representation, with evolutionary strategies assimilating incoming data (events) within a case ($\Gamma'^{\,\Sigma}_{M\to D}$ in P5 and $\Gamma^{\Xi\to\Sigma}_{D}$ in P2). We consider the convergence (Figure 10(a)) of the introduced synthetic continuations of the process to the right class (identified clusters of typical cases were used), with mapping to the graph-based space representation with proximity measures (Figure 10(b)). As a result, the appearance of the CP's events decreases the number of synthetic CPs and increases the percentage of CPs positioned in the correct cluster (see an example in Figure 10(c) and Figure 10(d), respectively). This enables interpretable positioning and lowers uncertainty in predicting the further development of the CP for a particular patient.
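The effect described here, fewer synthetic CPs and a growing share in the correct cluster as events arrive, can be reproduced in a deliberately simplified form. The sketch below replaces the evolutionary strategy with a plain prefix filter, which is an assumption made only to keep the example short; the population, cluster labels, and target CP are hypothetical.

```python
def assimilate(population, observed):
    """Keep only synthetic CPs whose beginning matches the events observed so far."""
    return [(seq, cluster) for seq, cluster in population if seq.startswith(observed)]

def track(population, target_seq, target_cluster):
    """Return (step, number of surviving CPs, share in the target cluster) for each step."""
    history = []
    for step in range(len(target_seq) + 1):
        alive = assimilate(population, target_seq[:step])
        share = (sum(c == target_cluster for _, c in alive) / len(alive)) if alive else 0.0
        history.append((step, len(alive), share))
    return history

if __name__ == "__main__":
    # Hypothetical synthetic CPs as (event sequence, cluster id) pairs.
    population = [("AFIE", 7), ("AFNE", 7), ("AFIFE", 7),
                  ("ADNE", 2), ("DDNE", 2), ("DFIE", 3)]
    for row in track(population, "AFIE", 7):
        print(row)
```

A population-based version would also generate new candidates near the survivors at every step instead of only discarding mismatches.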
Here, a combination of patterns P2, P5, and P6 in the implementation of the proposed algorithm (see Section 2.5) enables interactive investigation of the process space and data assimilation into a population of possible continuations of a single process as it evolves. This solution can be applied in exploratory modeling and simulation of patient flow processing, as well as in decision support in specialized medical centers.
3.3. Problem 3: Mining Social Media. Nowadays, social media analysis (which began with static network models emphasizing the topology of connections between users) strives to explore dynamic behavioral patterns of individuals, which can be recovered from their digital traces on the web. The prediction of social media activities requires combining analytical and data-driven models, as well as identifying the optimal structure and parameters of these models according to the available data. Here we show an example of a problem in this field involving evolutionary identification of a model.
Figure 11: Distribution of posts, reposts, and comments on personal walls of subscribers of bank community. [Axes: activity count (log) versus fraction of users (log); series: Repost, Post, Comment.]
Figure 12: Example of process model (a) with expanded cycles and (b) with collapsed cycles. [Panel (a): Cluster 2; panel (b): Cluster 1.]
A digital trace of a user in an online social network (OSN) is a sequence (chain) of observed activities separated by time gaps. Each OSN supports different types of "hidden" and observable activities. For example, in the largest Russian social network vk.com (further denoted as VK), a user has a personal page (wall) with three types of activities: post (P), when a user makes a record by himself or herself; repost (R), when the user copies the record of another user or community to his or her wall; and comment (C), when the user comments on a record on his or her wall. Figure 11 illustrates the distribution of these activities for subscribers of a large Russian bank community in VK. The collected dataset consists of the 100 (or fewer, if unavailable) last entries (posts or reposts) and comments on those entries for 8K user walls in the period from January 2017 to December 2017. Comments are much less common than posts and reposts. The distributions of posts and reposts are similar, but there is a group of "spreaders" with a significant number of reposts.
We applied the technique described in Section 3.2 to analyze the processes. Still, the considered process has a significantly different structure. By default, it is continuous with random repetition of events, while the healthcare process in ACS cases is finite and has a "stronger" structure. Figure 12 shows a typical process structure identified with the EC-based approach and visualized with expanded cycles (a) and with collapsed cycles (b). The second one could be considered more relevant than the first one, which is significantly affected by the length of the selected history. It is natural to consider it as a random process or a state-transition model. In that case, the three identified clusters (characterized by various frequencies of transitions) could be interpreted as typical behavior models.
Figure 13: Example of process model for cluster 3 (1120 users) using n-gram analysis.
Table 1: Mean of activities' combinations for users' clusters.
Cluster  Size  CCC  CCP  CCR  CPP  CPR  CRR  PPP   PPR   PRR   RRR
1        5238  052  074  024  1    072  042  668   64    756   853
2        2110  011  013  016  016  036  059  209   395   1232  603
3        1120  091  138  014  362  075  018  5002  1178  5     277
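Reading a digital trace as a state-transition model amounts to estimating how often each activity follows another. A minimal sketch, assuming a first-order model and a trace written as a string over P, R, and C (the example trace is hypothetical):

```python
from collections import Counter

def transition_frequencies(trace):
    """Estimate first-order transition probabilities from one activity trace."""
    pair_counts = Counter(zip(trace, trace[1:]))  # consecutive (from, to) pairs
    from_counts = Counter(trace[:-1])             # how often each state is left
    return {(a, b): n / from_counts[a] for (a, b), n in pair_counts.items()}

if __name__ == "__main__":
    for (a, b), p in sorted(transition_frequencies("PPRPRRC").items()):
        print(f"{a} -> {b}: {p:.2f}")
```

Averaging such tables over the users of one cluster would give the per-cluster transition frequencies mentioned above; how the original study weighted users is not stated, so that step is left out.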
N-gram analysis is often used to detect patterns in people's behavior [21, 22]. It is based on counting frequencies of combinations or sequences of activities. We collected all sorted 3-grams (so-called 3-sets) for each user's sequence to analyze the frequency of event combinations. As a result, three clusters of vectors with 3-set chains were identified with the k-means clustering method. Figure 13 shows all combinations and transitions between them for cluster 3 as an example. Using Figure 12 and Table 1, it is possible to see that cluster 3 includes users who often make new records (P) and sometimes comment on records (C). So cluster 3 mostly consists of "bloggers". Cluster 2 includes "spreaders" who copy other records (R) frequently. The biggest cluster, cluster 1, consists of people who make new records and copy other ones equally but less intensively compared to the other clusters. That may be considered typical behavior for a user of an OSN. N-gram analysis allows detecting typical behavioral patterns and obtaining process models for social media activities using chains of different lengths as input data. Thus, this type of data-driven modeling is more appropriate for researching continuous processes. Figure 14 shows a graph-based representation of the process space with all users' patterns.
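The 3-set vector of a single user can be computed as sketched below. The column order follows the header of Table 1; the function name and the example trace are hypothetical, and the k-means step that would follow is not shown.

```python
from collections import Counter
from itertools import combinations_with_replacement

ACTIVITIES = "CPR"  # comment, post, repost
# All sorted 3-grams: CCC, CCP, CCR, CPP, CPR, CRR, PPP, PPR, PRR, RRR
THREE_SETS = ["".join(c) for c in combinations_with_replacement(ACTIVITIES, 3)]

def three_set_vector(trace):
    """Count sorted 3-grams (3-sets) over a sliding window of length 3."""
    counts = Counter("".join(sorted(trace[i:i + 3])) for i in range(len(trace) - 2))
    return [counts[key] for key in THREE_SETS]

if __name__ == "__main__":
    print(THREE_SETS)
    print(three_set_vector("PPRPC"))
```

Sorting each window discards the order of events inside it, which is why PPR and PRP fall into the same column; the vectors of all users would then be passed to a clustering routine.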
This subsection provides very early results. The next steps in applying the proposed approach in this application include (a) extending the process model structure with temporal labeling (gaps between events), (b) considering the process within a sliding time window to get more structured processes, (c) linking the model with causal inference, and (d) introducing DM techniques for EC positioning of ongoing processes in the model space. We believe that these extensions could enhance the discovery of model structure (P4) and provide deeper insight into social media activity.
Figure 14: Graph-based representation of processes' space for three clusters in social media activity.
4. Conclusion and Future Work
The development of the proposed approach is still an ongoing project. We aim for further systematization and detailing of the proposed concepts, methods, and algorithms, as well as a more comprehensive and deeper implementation of EC-based applications. Further work includes the following directions:
(i) dualization on the role of data-driven and intelligent operations in the proposed approach and described patterns;
(ii) extended analysis of various EC techniques applicable within the approach;
(iii) investigation of EC-based discovery for models of complex systems with lacking or inconsistent observations;
(iv) detailed formalization of expertise and knowledge-based methods within the approach;
(v) extending the approach with interactive user-centered modelling and phase space analysis;
(vi) development of a multilayered approach for decision support and control of the system and process $S$, available data $D$, and complex model $M$.
Data Availability
The data used in the examples presented in Section 3 were initially obtained within complementary projects performed by the authors of the paper. The data are available from the corresponding author upon request, after an explicit statement of the purpose and plan of the requested data usage, to check for possible violation of the corresponding projects' rules.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This paper presents an extension and further development of the work [23]. This research is financially supported by the Russian Scientific Foundation, Agreement 14-11-00823 (15.07.2014).
References
[1] N. Boccara, Modeling complex systems, Graduate Texts in Physics, Springer, New York, Second edition, 2010.
[2] H. McManus and D. Hastings, "A framework for understanding uncertainty and its mitigation and exploitation in complex systems," IEEE Engineering Management Review, vol. 34, no. 3, pp. 81–94, 2006.
[3] W. Walker, P. Harremoes, J. Rotmans et al., "Defining uncertainty: a conceptual basis for uncertainty management in model-based decision support," Integrated Assessment, vol. 4, no. 1, pp. 5–17, 2003.
[4] J. Yan and J. R. Deller, "NARMAX model identification using a set-theoretic evolutionary approach," Signal Processing, vol. 123, pp. 30–41, 2016.
[5] I. G. Kevrekidis, C. W. Gear, and G. Hummer, "Equation-free: The computer-aided analysis of complex multiscale systems," AIChE Journal, vol. 50, no. 7, pp. 1346–1355, 2004.
[6] H. Ihshaish, A. Cortes, and M. A. Senar, "Parallel Multi-level Genetic Ensemble for Numerical Weather Prediction Enhancement," Procedia Computer Science, vol. 9, pp. 276–285, 2012.
[7] G. Dumedah, "Formulation of the Evolutionary-Based Data Assimilation and its Implementation in Hydrological Forecasting," Water Resources Management, vol. 26, no. 13, pp. 3853–3870, 2012.
[8] V. V. Kashirin, A. A. Lantseva, S. V. Ivanov, S. V. Kovalchuk, and A. Boukhanovsky, "Evolutionary simulation of complex networks' structures with specific functional properties," Journal of Applied Logic, vol. 24, no. part A, pp. 39–49, 2017.
[9] S. V. Kovalchuk, P. A. Smirnov, K. V. Knyazkov, A. S. Zagarskikh, and A. V. Boukhanovsky, "Knowledge-Based Expressive Technologies Within Cloud Computing Environments," in Practical Applications of Intelligent Systems, vol. 279 of Advances in Intelligent Systems and Computing, pp. 1–11, Springer, Berlin, Germany, 2014.
[10] A. A. Funkner, A. N. Yakovlev, and S. V. Kovalchuk, "Towards evolutionary discovery of typical clinical pathways in electronic health records," Procedia Computer Science, vol. 119, pp. 234–244, 2017.
[11] S. V. Kovalchuk and A. V. Boukhanovsky, "Towards Ensemble Simulation of Complex Systems," Procedia Computer Science, vol. 51, pp. 532–541, 2015.
[12] S. V. Kovalchuk, A. V. Krikunov, K. V. Knyazkov, and A. V. Boukhanovsky, "Classification issues within ensemble-based simulation: application to surge floods forecasting," Stochastic Environmental Research and Risk Assessment, vol. 31, no. 5, pp. 1183–1197, 2017.
[13] D. Fogel, "Phenotypes, genotypes, and operators in evolutionary computation," in Proceedings of the 1995 IEEE International Conference on Evolutionary Computation, p. 193, Perth, WA, Australia.
[14] S. V. Ivanov, S. V. Kovalchuk, and A. V. Boukhanovsky, "Workflow-based Collaborative Decision Support for Flood Management Systems," Procedia Computer Science, vol. 18, pp. 2213–2222, 2013.
[15] A. Gusarov, A. Kalyuzhnaya, and A. Boukhanovsky, "Spatially adaptive ensemble optimal interpolation of in-situ observations into numerical vector field models," in Proceedings of the 6th International Young Scientist Conference on Computational Science, YSC 2017, pp. 325–333, Finland, November 2017.
[16] A. V. Krikunov, E. V. Bolgova, E. Krotov, T. M. Abuhay, A. N. Yakovlev, and S. V. Kovalchuk, "Complex data-driven predictive modeling in personalized clinical decision support for Acute Coronary Syndrome episodes," in Proceedings of the International Conference on Computational Science, ICCS 2016, pp. 518–529, USA, June 2016.
[17] S. V. Kovalchuk, A. A. Funkner, O. G. Metsker, and A. N. Yakovlev, "Simulation of patient flow in multiple healthcare units using process and data mining techniques for model identification," Journal of Biomedical Informatics, vol. 82, pp. 128–142, 2018.
[18] A. Yakovlev, O. Metsker, S. Kovalchuk, and E. Bologova, "Prediction of in-hospital mortality and length of stay in acute coronary syndrome patients using machine-learning methods," Journal of the American College of Cardiology, vol. 71, no. 11, p. A242, 2018.
[19] A. Nikishova, A. Kalyuzhnaya, A. Boukhanovsky, and A. Hoekstra, "Uncertainty quantification and sensitivity analysis applied to the wind wave model SWAN," Environmental Modelling & Software, vol. 95, pp. 344–357, 2017.
[20] E. Rojas, J. Munoz-Gama, M. Sepulveda, and D. Capurro, "Process mining in healthcare: A literature review," Journal of Biomedical Informatics, vol. 61, pp. 224–236, 2016.
[21] T. Sinha, P. Jermann, N. Li, and P. Dillenbourg, "Your click decides your fate: Inferring Information Processing and Attrition Behavior from MOOC Video Clickstream Interactions," in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 3–14, Doha, Qatar, October 2014.
[22] C. Marceau, "Characterizing the Behavior of a Program Using Multiple-Length N-Grams," Defense Technical Information Center, 2005.
[23] S. V. Kovalchuk, O. G. Metsker, A. A. Funkner et al., "Towards management of complex modeling through a hybrid evolutionary identification," in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 255-256, Kyoto, Japan, July 2018.
Complexity 9
Figure 6 Graph-based representation of processes space in healthcare (interactive view) (Demonstration available at httpswwwyoutubecomwatchv=EH74f1w6EeY)
Summarizing results of the metocean case study we candenote that EC approach shows significant efficiency up to120 times compared with grid search without accuracy lossesAccording to this experimental study quality of ensemblewith evolutionary optimized models is similar to results ofthe grid search and MAE metric is equal to 024 m andDTWmetric ndash 51 Also we can mention that coevolutionaryapproach provides 10 accuracy gain compared with resultsof single evolution of model implementations but this isstill similar to ensemble result with evolutionary optimizedmodels Nevertheless coevolutionary approach allowed toachieve 200 times acceleration Within the context of theproposed approach space Φ were investigated using definedstructure of the model in space Σ for the purpose of modelcalibration
32 Problem 2 Modeling Health Care Process Modelinghealthcare processes are usually related to the enormousuncertainty and variability even when modeling single dis-ease One of the ways to identify a model of such processis PM [20] Still direct implementation of PM methodsdoes not remove a major part of the uncertainty Withincurrent research we applied the proposed approach foridentification purposes both in the analysis of historicalcases and prediction of single process development Here weconsider processes of providing health care in acute coronarysyndrome (ACS) cases which is usually considered as one ofthe major death causes in the world We used a set of 3434ACS cases collected during 2010-2015 in Almazov NationalMedical Research Centre one of the leading cardiologicalcenters in Russia The data set contains electronic healthrecords of these patients with all registered events andcharacteristics of a patient
To simplify consideration of multidimensional space ofpossible processes (Γ1015840Ξ997888rarrΣ
119863997888rarr119878ΓΞ119878997888rarr119863
for analysis of Σ on layer 119878)
we introduced graph-based representation of this space withvertices representing cases and edges representing proximityof cases Analysis of such structure enables easy discoveringof common cases (eg as communities in graph) Suchdiscovering enables explicit interpretable structuring of thespace and representation of further landscape for EC in termsof P6 pattern Moreover direct interactive investigation ofvisual representation of such structure (see Figure 6) providessignificant insights for medical researchers
We have developed evolutionary-based algorithm forpatterns identification and clustering in such representationwith two criteria to be optimized (see Figure 7) Hereprocesses were represented by a sequence of labels (symbols)denoting key events in PM model Typical patterns werethen selected for Pareto frontier The convergence process isdemonstrated in Figure 8 (10 best individuals from Paretofrontier according to the integral criterion were selected) Asa result this solution may refer to P5 pattern and operatorΓΞ997888rarrΣ119863997888rarr119872
while discovering model structure Figure 9 shows anexample of typical process model (ie structural characteris-tic of the model) for one of the identified clusters Detaileddescription of the approach algorithms and results on CPsdiscovering clustering and analysis including comparison ofthree version of CP discovery algorithms with performancecomparison can be found in [10] An important outcomeof the approach being applied in this application is inter-pretability of the clusters and identified patterns For example10 clusters and corresponding CPs obtained interpretationby cardiologists from Almazov National Medical ResearchCentre The obtained interpretation and further discoveringand application with CP structure are presented in [17]Another important benefit given by such space structurediscovering is lowering uncertainty of patientrsquos treatmenttrajectory by a hierarchical positioning of an evolved process(selection of a cluster and selection of position withinthe cluster) For example discrete-event simulation model
10 Complexity
00 02 04 06 08 10 Number of non-arranged sequences (normed)
12
35
30
20
15
10
25
05
00
minus05
Leng
th o
f pat
tern
(nor
med
)
0
0
1
1
2
2
3
3
4
4
5
5
AFIFNEAFNIFEIFDNEAFENIFEENIFEDNFIEFEAFNIFEDINIFENDDFFEIDNFEEDFIFEAEFNINFDEDFNEIDFDFNEFDIEDNFNEIFDDNIEFFDEAFANIFADIFEFNIEFDEDEINFDEFDEDFFDNIFENDFIEIDAFEEDEIFEDNA
Figure 7 Pareto frontier for CP patterns discovery
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 300
Generation
10
08
06
04
02
Nor
med
sum
erro
r
Figure 8 Evolutionary convergence during CP pattern discovery
Entrance289 AD
289Discharge
died13 transferred9
home119 cure138OR
170
IC104
CC13
CD2
AD OD
4
CD
48
IC225
104 12
1
12
2 9
CD1111
230
2 39
14
172
Cluster 7
Figure 9 Example of process model showing transfers between hospitalrsquos departments
Complexity 11
Cluster 0Cluster 1Cluster 2
Cluster 3Cluster 4
Cluster 5Cluster 6
0 2 4 6 8 10 12 14
50
100
150
200
250
0
Step
Dist
ance
sym
bols
(a)
Historical CPs Synthetic CPs Target CP
(b)
0
20
40
60
80
100
Num
ber o
f syn
thet
ic C
Ps
Step0 1 2 3 4 5 6 7 8 9
(c)
Step0 1 2 3 4 5 6 7 8 9
o
f CPs
in co
rrec
t clu
ster
100
90
80
70
60
50
40
30
20
(d)
Figure 10 Evolution of synthetic CPs (a) CP population convergence (b) evolution of possible CP (demonstration available at httpswwwyoutubecomwatchv=twvfX9zKsY8) (c) number of synthetic CPs (d) of CPs in correct cluster
described in [17] provides a more appropriate length-of-stay distribution within simulation with the discovered classes of CPs (the Kolmogorov-Smirnov statistic decreased by 51%, from 0.255 to 0.124).
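The reported improvement can be checked by hand: (0.255 - 0.124) / 0.255 ≈ 0.51. The sketch below is not the authors' code; it only illustrates how a two-sample Kolmogorov-Smirnov statistic can be computed for observed and simulated length-of-stay samples. The function names and the toy samples are assumptions made for illustration.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    # The largest gap between two empirical CDFs occurs at an observed value.
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


def relative_decrease(before, after):
    """Fractional reduction of a statistic, e.g. from 0.255 to 0.124."""
    return (before - after) / before


# Hypothetical length-of-stay samples (days), for illustration only.
observed = [3, 5, 5, 7, 9, 12]
simulated = [4, 5, 6, 8, 9, 15]

d = ks_statistic(observed, simulated)
drop = relative_decrease(0.255, 0.124)  # statistics reported in the text
print(f"KS = {d:.3f}, relative decrease = {drop:.0%}")
```

A smaller statistic means that the simulated distribution lies closer to the observed one, which is the sense in which the length-of-stay distribution is described as more appropriate.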
Furthermore, we propose an algorithm that dynamically generates possible developments of a healthcare process using the identified graph-based space representation, with evolutionary strategies assimilating incoming data (events) within a case (${\Gamma'}^{\Sigma}_{M \to D}$ in P5 and $\Gamma^{\Xi \to \Sigma}_{D}$ in P2). We consider the convergence (Figure 10(a)) of the introduced synthetic continuations of the process to the right class (identified clusters of typical cases were used), with mapping to the graph-based space representation with proximity measures (Figure 10(b)). As a result, the appearance of new CP events decreases the number of synthetic CPs and increases the percentage of CPs positioned in the correct cluster (see an example in Figure 10(c) and Figure 10(d), respectively). This enables interpretable positioning and lowers uncertainty in predicting the further development of a CP for a particular patient.
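The assimilation step can be illustrated with a deliberately simplified, selection-only sketch. It is not the authors' algorithm: the typical CPs, the department alphabet, the mutation rate, and the exact-prefix matching rule are all assumptions chosen to keep the example small. Synthetic CPs are generated as mutated copies of typical CPs, and each incoming event removes the candidates that no longer agree with the observed prefix.

```python
import random

# Hypothetical typical CPs per cluster; letters stand for hospital departments.
TYPICAL = {0: "AIFE", 1: "AIFNE", 2: "ADFE"}
ALPHABET = "ADEFIN"


def mutate(seq, rng, rate=0.2):
    """Replace each symbol with a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else s for s in seq)


def initial_population(size, rng):
    """Synthetic CPs: mutated copies of typical CPs, tagged with their cluster."""
    clusters = sorted(TYPICAL)
    return [
        (c, mutate(TYPICAL[c], rng))
        for c in (rng.choice(clusters) for _ in range(size))
    ]


def assimilate(population, observed_prefix, max_mismatch=0):
    """Selection step: keep candidates consistent with the events seen so far."""
    n = len(observed_prefix)
    kept = []
    for cluster, seq in population:
        if len(seq) < n:
            continue  # candidate is shorter than the observed process
        mismatches = sum(x != y for x, y in zip(seq, observed_prefix))
        if mismatches <= max_mismatch:
            kept.append((cluster, seq))
    return kept


def share_in_cluster(population, cluster):
    """Fraction of candidates that belong to the given cluster."""
    if not population:
        return 0.0
    return sum(c == cluster for c, _ in population) / len(population)


rng = random.Random(0)
population = initial_population(200, rng)
target_cluster, target_cp = 1, "AIFNE"  # the observed case (assumed)

sizes = [len(population)]
shares = [share_in_cluster(population, target_cluster)]
for step in range(1, len(target_cp) + 1):
    population = assimilate(population, target_cp[:step])
    sizes.append(len(population))
    shares.append(share_in_cluster(population, target_cluster))

print(sizes)
print([round(s, 2) for s in shares])
```

In this toy setting the population can only shrink as events arrive, which mirrors the qualitative behavior described for Figure 10(c). A full evolutionary strategy would also replenish the population by mutation and crossover after each selection step.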
Here, a combination of patterns P2, P5, and P6 in the implementation of the proposed algorithm (see Section 2.5) enables interactive investigation of the process space and data assimilation into a population of possible continuations of a single process as it evolves. This solution can be applied in exploratory modeling and simulation of patient flow processing, as well as in decision support in specialized medical centers.
3.3. Problem 3: Mining Social Media. Nowadays, social media analysis (which began with static network models emphasizing the topology of connections between users) strives to explore dynamic behavioral patterns of individuals, which can be recovered from their digital traces on the web. The prediction of social media activities requires combining analytical and data-driven models, as well as identifying the optimal structure and parameters of these models according to the available data. Here we show an example of a problem in this field involving evolutionary identification of a model.
Figure 11: Distribution of posts, reposts, and comments on personal walls of subscribers of bank community. Axes: activity count (log) versus fraction of users (log).
Figure 12: Example of process model (a) with expanded cycles (Cluster 2) and (b) with collapsed cycles (Cluster 1).
A digital trace of a user in an online social network (OSN) is a sequence (chain) of observed activities separated by time gaps. Each OSN supports different types of "hidden" and observable activities. For example, in the largest Russian social network, vk.com (hereafter VK), a user has a personal page (wall) with three types of activities: post (P), when a user creates a record themselves; repost (R), when the user copies a record of another user or community to his or her wall; and comment (C), when the user comments on a record on his or her wall. Figure 11 illustrates the distribution of these activities for subscribers of a large Russian bank community in VK. The collected dataset consists of the last 100 (or fewer if unavailable) entries (posts or reposts) and the comments on those entries for 8K user walls in the period January 2017–December 2017. Comments are much less common than posts and reposts. The distributions of posts and reposts are similar, but there is a group of "spreaders" with a significant number of reposts.
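A distribution of the kind plotted in Figure 11 can be derived from such traces as follows. This is a minimal sketch under stated assumptions: each trace is encoded as a string over the codes P, R, and C, and the user identifiers and traces are invented for illustration.

```python
from collections import Counter

# Hypothetical digital traces: one string of activity codes per user wall
# (P = post, R = repost, C = comment).
traces = {
    "user_1": "PPRPC",
    "user_2": "RRRRR",
    "user_3": "PRPRC",
    "user_4": "PPPPR",
}


def activity_distribution(traces, activity):
    """Map activity count -> fraction of users with exactly that many events."""
    per_user = Counter(trace.count(activity) for trace in traces.values())
    total = len(traces)
    return {count: users / total for count, users in sorted(per_user.items())}


for code in "PRC":
    print(code, activity_distribution(traces, code))
```

Plotting these fractions against the activity count on logarithmic axes would give one curve per activity type.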
We applied the technique described in Section 3.2 to analyze the processes. Still, the considered process has a significantly different structure. By default, it is continuous with random repetition of events, while the healthcare process in ACS cases has a finite and more "strong" structure. Figure 12 shows a typical process structure identified with the EC-based approach
Figure 13: Example of process model for cluster 3 (1120 users) using n-gram analysis.
Table 1: Mean of activities' combinations for users' clusters.
Cluster  Size  CCC  CCP  CCR  CPP  CPR  CRR  PPP   PPR   PRR   RRR
1        5238  052  074  024  1    072  042  668   64    756   853
2        2110  011  013  016  016  036  059  209   395   1232  603
3        1120  091  138  014  362  075  018  5002  1178  5     277
and visualized with expanded cycles (a) and with collapsed cycles (b). The second one could be considered more relevant than the first, which is significantly affected by the length of the selected history. It is natural to consider it as a random process or a state-transition model. In that case, the three identified clusters (characterized by various frequencies of transitions) could be interpreted as typical behavior models.
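The state-transition reading of the collapsed-cycle model can be sketched as a first-order transition table estimated from activity chains. The code below is illustrative only: the node names "Entrance" and "End" follow the labels visible in Figure 12, while the function name and the example chains are assumptions.

```python
from collections import defaultdict


def transition_probabilities(sequences):
    """Estimate first-order transition probabilities from activity sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        # Wrap every chain with artificial start and end states.
        path = ["Entrance", *seq, "End"]
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


# Hypothetical activity chains of three users in one cluster.
sequences = ["PPR", "PRR", "RC"]
model = transition_probabilities(sequences)
for state, targets in model.items():
    print(state, targets)
```

Clusters with different behavior would then differ in their transition frequencies, for example in the weight of the repost-to-repost transition.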
N-gram analysis is often used to detect patterns in people's behavior [21, 22]. It is based on counting the frequencies of combinations or sequences of activities. We collected all sorted 3-grams (so-called 3-sets) for each user's sequence to analyze the frequency of event combinations. As a result, three clusters of vectors with 3-set chains were identified with the k-means clustering method. Figure 13 shows all combinations and transitions between them for cluster 3 as an example. Using Figure 12 and Table 1, it is possible to see that cluster 3 includes users who often make new records (P) and sometimes comment on records (C). So cluster 3 mostly consists of "bloggers". Cluster 2 includes "spreaders", who copy other records (R) frequently. The biggest cluster, cluster 1, consists of people who make new records and copy other ones equally but less intensively compared to the other clusters. This may be considered typical behavior for an OSN user. N-gram analysis allows detecting typical behavioral patterns and obtaining process models for social media activities using chains of different lengths as input data. Thus, this type of data-driven modeling is more appropriate for studying continuous processes. Figure 14 shows a graph-based representation of the process space with all users' patterns.
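The 3-set construction and the clustering step can be sketched as follows. The example is a minimal illustration rather than the authors' implementation: the traces are invented, and k-means is written as plain Lloyd iterations with hand-picked initial centroids (an assumption; in practice a library implementation with a proper initialization would normally be used). The ten 3-sets are generated in the same order as the columns of Table 1.

```python
from collections import Counter
from itertools import combinations_with_replacement

# The ten possible 3-sets over {C, P, R}: CCC, CCP, ..., RRR.
THREE_SETS = ["".join(t) for t in combinations_with_replacement("CPR", 3)]


def three_set_vector(trace):
    """Count sorted 3-grams (order inside each window is ignored)."""
    counts = Counter(
        "".join(sorted(trace[i:i + 3])) for i in range(len(trace) - 2)
    )
    return [counts[s] for s in THREE_SETS]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, centroids, iterations=10):
    """Plain Lloyd's algorithm; initial centroids are supplied by the caller."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for v in vectors:
            distances = [squared_distance(v, c) for c in centroids]
            groups[distances.index(min(distances))].append(v)
        # Recompute each centroid as the mean of its group (keep it if empty).
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
    labels = [
        min(range(len(centroids)), key=lambda k: squared_distance(v, centroids[k]))
        for v in vectors
    ]
    return labels, centroids


# Hypothetical traces: two "bloggers", two "spreaders", two mixed users.
traces = ["PPPPPPC", "PPPPPP", "RRRRRR", "RRRRRP", "PRPRPR", "RPRPRP"]
vectors = [three_set_vector(t) for t in traces]
labels, centroids = kmeans(vectors, [vectors[0], vectors[2], vectors[4]])
print(labels)
```

Averaging the vectors within each cluster gives per-cluster means of the same form as the rows of Table 1.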
This subsection provides very early results. The next steps in applying the proposed approach to this problem include extending the process model structure with (a) temporal labeling (gaps between events), (b) consideration of the process within a sliding time window to obtain more structured processes, (c) linking the model with causal inference, and (d) introduction of DM techniques for EC positioning of ongoing processes in the model space. We believe that these extensions could enhance the discovery of model structure (P4) and provide deeper insight into social media activity.
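Extensions (a) and (b) are described only as planned work, so the following is a hypothetical sketch of what temporal labeling and a sliding time window could look like on a timestamped trace. The event list, the function names, and the window parameters are assumptions and do not come from the paper.

```python
from datetime import datetime, timedelta

# Hypothetical timestamped trace: (time, activity code).
events = [
    (datetime(2017, 1, 1, 9, 0), "P"),
    (datetime(2017, 1, 1, 9, 30), "R"),
    (datetime(2017, 1, 3, 18, 0), "C"),
    (datetime(2017, 1, 10, 12, 0), "P"),
]


def with_gaps(events):
    """Label each transition with the time gap between consecutive events."""
    return [
        (a, b, t_b - t_a)
        for (t_a, a), (t_b, b) in zip(events, events[1:])
    ]


def sliding_windows(events, width, step):
    """Yield the activity chain observed in each window [start, start + width)."""
    if not events:
        return
    start, last = events[0][0], events[-1][0]
    while start <= last:
        yield "".join(code for t, code in events if start <= t < start + width)
        start += step


for src, dst, gap in with_gaps(events):
    print(src, "->", dst, gap)

weekly = list(sliding_windows(events, width=timedelta(days=7), step=timedelta(days=7)))
print(weekly)
```

Each windowed chain is finite, so it could be processed in the same way as the finite healthcare processes of Section 3.2.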
4. Conclusion and Future Work
The development of the proposed approach is still an ongoing project. We aim for further systematization and detailing
Figure 14: Graph-based representation of processes' space for three clusters in social media activity.
of the proposed concepts, methods, and algorithms, as well as a more comprehensive and deeper implementation of EC-based applications. Further development includes the following directions:
(i) dualization on the role of data-driven and intelligent operations in the proposed approach and described patterns;
(ii) extended analysis of various EC techniques applicable within the approach;
(iii) investigation of EC-based discovery for models of complex systems with lacking or inconsistent observations;
(iv) detailed formalization of expertise and knowledge-based methods within the approach;
(v) extending the approach with interactive user-centered modelling and phase space analysis;
(vi) development of a multilayered approach for decision support and control of the system and process $S$, available data $D$, and complex model $M$.
Data Availability
The data used in the examples presented in Section 3 were initially obtained within complimentary projects performed by the authors of the paper. The data are available from the corresponding author upon request, after an explicit statement of the purpose and plan of the requested data usage, to check for possible violation of the corresponding projects' rules.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This paper presents an extension and further development of the work [23]. This research is financially supported by The Russian Scientific Foundation, Agreement 14-11-00823 (15.07.2014).
References
[1] N. Boccara, Modeling complex systems, Graduate Texts in Physics, Springer, New York, Second edition, 2010.
[2] H. McManus and D. Hastings, "A framework for understanding uncertainty and its mitigation and exploitation in complex systems," IEEE Engineering Management Review, vol. 34, no. 3, pp. 81–94, 2006.
[3] W. Walker, P. Harremoes, J. Rotmans et al., "Defining uncertainty: a conceptual basis for uncertainty management in model-based decision support," Integrated Assessment, vol. 4, no. 1, pp. 5–17, 2003.
[4] J. Yan and J. R. Deller, "NARMAX model identification using a set-theoretic evolutionary approach," Signal Processing, vol. 123, pp. 30–41, 2016.
[5] I. G. Kevrekidis, C. W. Gear, and G. Hummer, "Equation-free: The computer-aided analysis of complex multiscale systems," AIChE Journal, vol. 50, no. 7, pp. 1346–1355, 2004.
[6] H. Ihshaish, A. Cortes, and M. A. Senar, "Parallel Multi-level Genetic Ensemble for Numerical Weather Prediction Enhancement," Procedia Computer Science, vol. 9, pp. 276–285, 2012.
[7] G. Dumedah, "Formulation of the Evolutionary-Based Data Assimilation and its Implementation in Hydrological Forecasting," Water Resources Management, vol. 26, no. 13, pp. 3853–3870, 2012.
[8] V. V. Kashirin, A. A. Lantseva, S. V. Ivanov, S. V. Kovalchuk, and A. Boukhanovsky, "Evolutionary simulation of complex networks' structures with specific functional properties," Journal of Applied Logic, vol. 24, no. part A, pp. 39–49, 2017.
[9] S. V. Kovalchuk, P. A. Smirnov, K. V. Knyazkov, A. S. Zagarskikh, and A. V. Boukhanovsky, "Knowledge-Based Expressive Technologies Within Cloud Computing Environments," in Practical Applications of Intelligent Systems, vol. 279 of Advances in Intelligent Systems and Computing, pp. 1–11, Springer, Berlin, Germany, 2014.
[10] A. A. Funkner, A. N. Yakovlev, and S. V. Kovalchuk, "Towards evolutionary discovery of typical clinical pathways in electronic health records," Procedia Computer Science, vol. 119, pp. 234–244, 2017.
[11] S. V. Kovalchuk and A. V. Boukhanovsky, "Towards Ensemble Simulation of Complex Systems," Procedia Computer Science, vol. 51, pp. 532–541, 2015.
[12] S. V. Kovalchuk, A. V. Krikunov, K. V. Knyazkov, and A. V. Boukhanovsky, "Classification issues within ensemble-based simulation: application to surge floods forecasting," Stochastic Environmental Research and Risk Assessment, vol. 31, no. 5, pp. 1183–1197, 2017.
[13] D. Fogel, "Phenotypes, genotypes, and operators in evolutionary computation," in Proceedings of the 1995 IEEE International Conference on Evolutionary Computation, p. 193, Perth, WA, Australia.
[14] S. V. Ivanov, S. V. Kovalchuk, and A. V. Boukhanovsky, "Workflow-based Collaborative Decision Support for Flood Management Systems," Procedia Computer Science, vol. 18, pp. 2213–2222, 2013.
[15] A. Gusarov, A. Kalyuzhnaya, and A. Boukhanovsky, "Spatially adaptive ensemble optimal interpolation of in-situ observations into numerical vector field models," in Proceedings of the 6th International Young Scientist Conference on Computational Science, YSC 2017, pp. 325–333, Finland, November 2017.
[16] A. V. Krikunov, E. V. Bolgova, E. Krotov, T. M. Abuhay, A. N. Yakovlev, and S. V. Kovalchuk, "Complex data-driven predictive modeling in personalized clinical decision support for Acute Coronary Syndrome episodes," in Proceedings of the International Conference on Computational Science, ICCS 2016, pp. 518–529, USA, June 2016.
[17] S. V. Kovalchuk, A. A. Funkner, O. G. Metsker, and A. N. Yakovlev, "Simulation of patient flow in multiple healthcare units using process and data mining techniques for model identification," Journal of Biomedical Informatics, vol. 82, pp. 128–142, 2018.
[18] A. Yakovlev, O. Metsker, S. Kovalchuk, and E. Bologova, "Prediction of in-hospital mortality and length of stay in acute coronary syndrome patients using machine-learning methods," Journal of the American College of Cardiology, vol. 71, no. 11, p. A242, 2018.
[19] A. Nikishova, A. Kalyuzhnaya, A. Boukhanovsky, and A. Hoekstra, "Uncertainty quantification and sensitivity analysis applied to the wind wave model SWAN," Environmental Modelling & Software, vol. 95, pp. 344–357, 2017.
[20] E. Rojas, J. Munoz-Gama, M. Sepulveda, and D. Capurro, "Process mining in healthcare: A literature review," Journal of Biomedical Informatics, vol. 61, pp. 224–236, 2016.
[21] T. Sinha, P. Jermann, N. Li, and P. Dillenbourg, "Your click decides your fate: Inferring Information Processing and Attrition Behavior from MOOC Video Clickstream Interactions," in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 3–14, Doha, Qatar, October 2014.
[22] C. Marceau, "Characterizing the Behavior of a Program Using Multiple-Length N-Grams," Defense Technical Information Center, 2005.
[23] S. V. Kovalchuk, O. G. Metsker, A. A. Funkner et al., "Towards management of complex modeling through a hybrid evolutionary identification," in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 255-256, Kyoto, Japan, July 2018.
Complexity 11
Figure 10: Evolution of synthetic CPs: (a) CP population convergence (distance in symbols versus step, for clusters 0 to 6); (b) evolution of possible CP, showing historical CPs, synthetic CPs, and the target CP (demonstration available at https://www.youtube.com/watch?v=twvfX9zKsY8); (c) number of synthetic CPs versus step; (d) percentage of CPs in the correct cluster versus step.
described in [17] provides a more appropriate length-of-stay distribution within the simulation when the discovered classes of CPs are used (the Kolmogorov-Smirnov statistic decreased by 51%, from 0.255 to 0.124).
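For orientation, the two-sample Kolmogorov-Smirnov statistic mentioned above is the largest gap between two empirical cumulative distribution functions. The sketch below is a minimal pure-Python illustration, not the procedure of [17]; the function names `ks_statistic` and `relative_decrease` are hypothetical, and only the two reported values (0.255 and 0.124) come from the text.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    worst = 0.0
    # Both ECDFs are step functions, so the gap only changes at observed points.
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)  # share of sample_a that is <= x
        f_b = bisect_right(b, x) / len(b)  # share of sample_b that is <= x
        worst = max(worst, abs(f_a - f_b))
    return worst


def relative_decrease(before, after):
    """Relative reduction of a statistic, e.g. 0.255 -> 0.124."""
    return (before - after) / before


# Values reported in the text: the decrease is about 51%.
reported = relative_decrease(0.255, 0.124)
```

A smaller statistic means that the simulated length-of-stay distribution lies closer to the observed one.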
Furthermore, we propose an algorithm to dynamically generate possible developments of a healthcare process using the identified graph-based space representation, with evolutionary strategies assimilating incoming data (events) within a case (${\Gamma'}^{\Sigma}_{M \rightarrow D}$ in P5 and $\Gamma^{\Xi \rightarrow \Sigma}_{D}$ in P2). We consider the convergence (Figure 10(a)) of the introduced synthetic continuations of the processes to the right class (identified clusters of typical cases were used), with mapping to the graph-based space representation using proximity measures (Figure 10(b)). As a result, the appearance of new CP events decreases the number of synthetic CPs and increases the percentage of CPs positioned in the correct cluster (see the examples in Figure 10(c) and Figure 10(d), respectively). This enables interpretable positioning and reduces uncertainty in predicting the further development of a CP for a particular patient.
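The effect described above (fewer synthetic CPs and a larger share in the correct cluster as events arrive) can be illustrated with a deliberately simplified sketch. It only filters a fixed pool of candidate continuations by the observed prefix and assigns each candidate to the nearest cluster template; it omits the mutation and selection steps of an evolutionary strategy. The edit distance is an assumed proximity measure, and all names, event codes, and templates are hypothetical.

```python
def levenshtein(s, t):
    """Edit distance between two event sequences (unit costs)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                 # delete cs
                           cur[j - 1] + 1,              # insert ct
                           prev[j - 1] + (cs != ct)))   # substitute
        prev = cur
    return prev[-1]


def nearest_cluster(seq, templates):
    """Key of the template closest to seq (ties go to the smallest key)."""
    return min(sorted(templates), key=lambda k: levenshtein(seq, templates[k]))


def assimilate(candidates, observed_prefix, templates, true_cluster):
    """Keep candidates consistent with the observed events and score them."""
    kept = [c for c in candidates if c.startswith(observed_prefix)]
    hits = sum(nearest_cluster(c, templates) == true_cluster for c in kept)
    share = hits / len(kept) if kept else 0.0
    return kept, share


# Hypothetical data: one-letter event codes and two cluster templates.
templates = {0: "ABCD", 1: "AXYZ"}
pool = ["ABCD", "ABCE", "AXYZ", "AXYD", "ABYZ"]
target = "ABCD"  # the observed case, assumed to belong to cluster 0

history = []
for step in range(len(target) + 1):
    kept, share = assimilate(pool, target[:step], templates, true_cluster=0)
    history.append((step, len(kept), share))
```

In this toy run, the pool shrinks from five candidates to one while the share of candidates in the correct cluster grows from 0.4 to 1.0, which mirrors the trend of Figure 10(c) and Figure 10(d) qualitatively only.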
Here, a combination of patterns P2, P5, and P6 in the implementation of the proposed algorithm (see Section 2.5) enables interactive investigation of the process space and data assimilation into a population of possible continuations of a single process as it evolves. This solution can be applied in exploratory modeling and simulation of patient flow processing, as well as in decision support in specialized medical centers.
3.3. Problem 3: Mining Social Media. Nowadays, social media analysis (which began with static network models emphasizing the topology of connections between users) strives to explore dynamic behavioral patterns of individuals, which can be recovered from their digital traces on the web. The prediction of social media activities requires combining analytical and data-driven models, as well as identifying the optimal structure and parameters of these models according to the available data. Here we show an example of a problem in this field that involves evolutionary identification of a model.
Figure 11: Distribution of posts, reposts, and comments on personal walls of subscribers of the bank community (fraction of users versus activity count, both on logarithmic axes).
Figure 12: Example of process model (a) with expanded cycles (cluster 2) and (b) with collapsed cycles (cluster 1). Nodes are Entrance, Post, Repost, Comment, and End.
A digital trace of a user in an online social network (OSN) is a sequence (chain) of observed activities separated by time gaps. Each OSN supports different types of "hidden" and observable activities. For example, in the largest Russian social network, vk.com (hereafter VK), a user has a personal page (wall) with three types of activities: post (P), when the user creates a record personally; repost (R), when the user copies a record of another user or community to his or her wall; and comment (C), when the user comments on a record on his or her wall. Figure 11 illustrates the distribution of these activities for subscribers of a large Russian bank community in VK. The collected dataset consists of the last 100 entries (posts or reposts), or fewer if unavailable, and the comments on those entries for 8K user walls in the period January 2017–December 2017. Comments are much less common than posts and reposts. The distributions of posts and reposts are similar, but there is a group of "spreaders" with a significant number of reposts.
We applied the technique described in Section 3.2 to analyze the processes. Still, the considered process has a significantly different structure. By default, it is continuous with random repetition of events, while the healthcare process in ACS cases has a finite and more "strong" structure. Figure 12 shows a typical process structure identified with the EC-based approach and visualized with expanded cycles (a) and with collapsed cycles (b). The second representation could be considered more relevant than the first, which is significantly affected by the length of the selected history. It is natural to consider the process as a random process or a state-transition model. In that case, the three identified clusters (characterized by different frequencies of transitions) could be interpreted as typical behavior models.

Figure 13: Example of process model for cluster 3 (1120 users) using n-gram analysis. Nodes are the 3-sets CCC, CCP, CCR, CPP, CPR, CRR, PPP, PPR, PRR, and RRR.

Table 1: Mean of activities' combinations for users' clusters.
Cluster  Size  CCC   CCP   CCR   CPP   CPR   CRR   PPP    PPR    PRR    RRR
1        5238  0.52  0.74  0.24  1     0.72  0.42  6.68   6.4    7.56   8.53
2        2110  0.11  0.13  0.16  0.16  0.36  0.59  2.09   3.95   12.32  60.3
3        1120  0.91  1.38  0.14  3.62  0.75  0.18  50.02  11.78  5      2.77
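The state-transition reading of the process (the collapsed-cycle view of Figure 12(b)) can be sketched as a table of transition frequencies between activity types. This is a minimal illustration under stated assumptions: traces are encoded as strings over P, R, and C, artificial "Entrance" and "End" states are added to each trace, and the function names and sample traces are hypothetical rather than taken from the dataset.

```python
from collections import Counter


def transition_counts(traces):
    """Count transitions between consecutive states over all traces."""
    counts = Counter()
    for trace in traces:
        states = ["Entrance", *trace, "End"]
        counts.update(zip(states, states[1:]))
    return counts


def transition_probabilities(counts):
    """Normalize counts per source state into relative frequencies."""
    totals = Counter()
    for (src, _dst), n in counts.items():
        totals[src] += n
    return {(src, dst): n / totals[src] for (src, dst), n in counts.items()}


# Hypothetical traces: P = post, R = repost, C = comment.
traces = ["PPR", "RRC", "PR"]
counts = transition_counts(traces)
probs = transition_probabilities(counts)
```

Because repeated activities are folded into self-loops such as (P, P), the resulting structure does not grow with the length of the selected history, unlike the expanded-cycle view.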
N-gram analysis is often used to detect patterns in people's behavior [21, 22]. It is based on counting the frequencies of combinations or sequences of activities. We collected all sorted 3-grams (so-called 3-sets) for each user's sequence to analyze the frequency of event combinations. As a result, three clusters of vectors of 3-set frequencies were identified with the k-means clustering method. Figure 13 shows all combinations and the transitions between them for cluster 3 as an example. Using Figure 12 and Table 1, it is possible to see that cluster 3 includes users who often make new records (P) and sometimes comment on records (C). So, cluster 3 mostly consists of "bloggers." Cluster 2 includes "spreaders" who frequently copy other records (R). The biggest cluster, cluster 1, consists of people who make new records and copy other ones equally, but less intensively compared to the other clusters. This may be considered typical behavior for an OSN user. N-gram analysis allows detecting typical behavioral patterns and obtaining process models for social media activities using chains of different lengths as input data. Thus, this type of data-driven modeling is more appropriate for researching continuous processes. Figure 14 shows a graph-based representation of the process space with all users' patterns.
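The 3-set construction and the clustering step described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the traces are invented, the plain Lloyd iteration stands in for whichever k-means variant was used, and the deterministic farthest-point initialization is an assumption made only to keep the example reproducible. The ten 3-sets are generated in the same order as the columns of Table 1.

```python
from collections import Counter
from itertools import combinations_with_replacement

# All 10 unordered triples over C, P, R: CCC, CCP, CCR, ..., RRR.
THREE_SETS = ["".join(c) for c in combinations_with_replacement("CPR", 3)]


def three_set_vector(trace):
    """Count sorted 3-grams of a trace (order inside each window is ignored)."""
    counts = Counter("".join(sorted(trace[i:i + 3]))
                     for i in range(len(trace) - 2))
    return [counts[key] for key in THREE_SETS]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd iteration with farthest-point initialization (assumed)."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical traces: two "bloggers", two "spreaders", two mixed users.
traces = ["PPPPPPPP", "PPPPPPCP", "RRRRRRRR", "RRRRRRPR",
          "PRPRPRPR", "RPRPRPRP"]
vectors = [three_set_vector(t) for t in traces]
labels, centers = kmeans(vectors, k=3)
```

Each returned center is the per-cluster mean of the 3-set counts, which is the kind of quantity reported in Table 1.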
This subsection provides very early results. The next steps in applying the proposed approach to this problem include extending the process model structure by (a) temporal labeling (gaps between events), (b) considering the process within a sliding time window to get more structured processes, (c) linking the model with causal inference, and (d) introducing DM techniques for EC positioning of ongoing processes in the model space. We believe that these extensions could enhance the discovery of model structure (P4) and provide deeper insight into social media activity.
Figure 14: Graph-based representation of processes' space for three clusters in social media activity.

4. Conclusion and Future Work

The development of the proposed approach is still an ongoing project. We aim at further systematization and detailing of the proposed concepts, methods, and algorithms, as well as a more comprehensive and deeper implementation of EC-based applications. Further work includes the following directions:
(i) dualization of the roles of data-driven and intelligent operations in the proposed approach and the described patterns;
(ii) extended analysis of various EC techniques applicable within the approach;
(iii) investigation of EC-based discovery for models of complex systems with lacking or inconsistent observations;
(iv) detailed formalization of expertise and knowledge-based methods within the approach;
(v) extension of the approach with interactive user-centered modelling and phase space analysis;
(vi) development of a multilayered approach for decision support and control of the system and process $S$, available data $D$, and complex model $M$.
Data Availability
The data used in the examples presented in Section 3 were initially obtained within complimentary projects performed by the authors of the paper. The data are available from the corresponding author upon request, after an explicit statement of the purpose and plan of the requested data usage, to check for possible violation of the corresponding projects' rules.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This paper presents an extension and further development of the work [23]. This research is financially supported by the Russian Scientific Foundation, Agreement 14-11-00823 (15.07.2014).
References
[1] N. Boccara, Modeling complex systems, Graduate Texts in Physics, Springer, New York, Second edition, 2010.
[2] H. McManus and D. Hastings, "A framework for understanding uncertainty and its mitigation and exploitation in complex systems," IEEE Engineering Management Review, vol. 34, no. 3, pp. 81–94, 2006.
[3] W. Walker, P. Harremoes, J. Rotmans et al., "Defining uncertainty: a conceptual basis for uncertainty management in model-based decision support," Integrated Assessment, vol. 4, no. 1, pp. 5–17, 2003.
[4] J. Yan and J. R. Deller, "NARMAX model identification using a set-theoretic evolutionary approach," Signal Processing, vol. 123, pp. 30–41, 2016.
[5] I. G. Kevrekidis, C. W. Gear, and G. Hummer, "Equation-free: The computer-aided analysis of complex multiscale systems," AIChE Journal, vol. 50, no. 7, pp. 1346–1355, 2004.
[6] H. Ihshaish, A. Cortes, and M. A. Senar, "Parallel Multi-level Genetic Ensemble for Numerical Weather Prediction Enhancement," Procedia Computer Science, vol. 9, pp. 276–285, 2012.
[7] G. Dumedah, "Formulation of the Evolutionary-Based Data Assimilation and its Implementation in Hydrological Forecasting," Water Resources Management, vol. 26, no. 13, pp. 3853–3870, 2012.
[8] V. V. Kashirin, A. A. Lantseva, S. V. Ivanov, S. V. Kovalchuk, and A. Boukhanovsky, "Evolutionary simulation of complex networks' structures with specific functional properties," Journal of Applied Logic, vol. 24, no. part A, pp. 39–49, 2017.
[9] S. V. Kovalchuk, P. A. Smirnov, K. V. Knyazkov, A. S. Zagarskikh, and A. V. Boukhanovsky, "Knowledge-Based Expressive Technologies Within Cloud Computing Environments," in Practical Applications of Intelligent Systems, vol. 279 of Advances in Intelligent Systems and Computing, pp. 1–11, Springer, Berlin, Germany, 2014.
[10] A. A. Funkner, A. N. Yakovlev, and S. V. Kovalchuk, "Towards evolutionary discovery of typical clinical pathways in electronic health records," Procedia Computer Science, vol. 119, pp. 234–244, 2017.
[11] S. V. Kovalchuk and A. V. Boukhanovsky, "Towards Ensemble Simulation of Complex Systems," Procedia Computer Science, vol. 51, pp. 532–541, 2015.
[12] S. V. Kovalchuk, A. V. Krikunov, K. V. Knyazkov, and A. V. Boukhanovsky, "Classification issues within ensemble-based simulation: application to surge floods forecasting," Stochastic Environmental Research and Risk Assessment, vol. 31, no. 5, pp. 1183–1197, 2017.
[13] D. Fogel, "Phenotypes, genotypes, and operators in evolutionary computation," in Proceedings of the 1995 IEEE International Conference on Evolutionary Computation, p. 193, Perth, WA, Australia.
[14] S. V. Ivanov, S. V. Kovalchuk, and A. V. Boukhanovsky, "Workflow-based Collaborative Decision Support for Flood Management Systems," Procedia Computer Science, vol. 18, pp. 2213–2222, 2013.
[15] A. Gusarov, A. Kalyuzhnaya, and A. Boukhanovsky, "Spatially adaptive ensemble optimal interpolation of in-situ observations into numerical vector field models," in Proceedings of the 6th International Young Scientist Conference on Computational Science, YSC 2017, pp. 325–333, Finland, November 2017.
[16] A. V. Krikunov, E. V. Bolgova, E. Krotov, T. M. Abuhay, A. N. Yakovlev, and S. V. Kovalchuk, "Complex data-driven predictive modeling in personalized clinical decision support for Acute Coronary Syndrome episodes," in Proceedings of the International Conference on Computational Science, ICCS 2016, pp. 518–529, USA, June 2016.
[17] S. V. Kovalchuk, A. A. Funkner, O. G. Metsker, and A. N. Yakovlev, "Simulation of patient flow in multiple healthcare units using process and data mining techniques for model identification," Journal of Biomedical Informatics, vol. 82, pp. 128–142, 2018.
[18] A. Yakovlev, O. Metsker, S. Kovalchuk, and E. Bologova, "Prediction of in-hospital mortality and length of stay in acute coronary syndrome patients using machine-learning methods," Journal of the American College of Cardiology, vol. 71, no. 11, p. A242, 2018.
[19] A. Nikishova, A. Kalyuzhnaya, A. Boukhanovsky, and A. Hoekstra, "Uncertainty quantification and sensitivity analysis applied to the wind wave model SWAN," Environmental Modelling & Software, vol. 95, pp. 344–357, 2017.
[20] E. Rojas, J. Munoz-Gama, M. Sepulveda, and D. Capurro, "Process mining in healthcare: A literature review," Journal of Biomedical Informatics, vol. 61, pp. 224–236, 2016.
[21] T. Sinha, P. Jermann, N. Li, and P. Dillenbourg, "Your click decides your fate: Inferring Information Processing and Attrition Behavior from MOOC Video Clickstream Interactions," in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 3–14, Doha, Qatar, October 2014.
[22] C. Marceau, "Characterizing the Behavior of a Program Using Multiple-Length N-Grams," Defense Technical Information Center, 2005.
[23] S. V. Kovalchuk, O. G. Metsker, A. A. Funkner et al., "Towards management of complex modeling through a hybrid evolutionary identification," in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 255-256, Kyoto, Japan, July 2018.
Complexity 13
CPP152
PPR
187
PRR
79
CRR
82
CCP
56
CCR34
213
1994
87
PPP
1054
660
160
122
67953701
2195
1168
3705
50512
66
2244
2180
RRR
989
49
484654
CCC
736
25056
601
274549
48
60
1013
1999
Cluster 3 1120 users
228
CPR
Figure 13 Example of process model for cluster 3 using n-gram analysis
Table 1 Mean of activitiesrsquo combinations for usersrsquo clusters
Cluster Size CCC CCP CCR CPP CPR CRR PPP PPR PRR RRR1 5238 052 074 024 1 072 042 668 64 756 8532 2110 011 013 016 016 036 059 209 395 1232 6033 1120 091 138 014 362 075 018 5002 1178 5 277
and visualized with expanded cycles (a) and with collapsedcycles (b) The second one could be considered as morerelevant than the first one which is significantly affected bya length of selected history It is natural to consider it as arandom process or state-transition model In that case threeidentified clusters (characterized by various frequencies oftransitions) could be interpreted as typical behavior models
N-grams analysis is often used to detect patterns in peo-plersquos behaviors [21 22] N-grams analysis is based on countingfrequencies of combinations or sequences of activities Wecollected all sorted 3-grams (so called 3-sets) for each userrsquossequence to analyze the frequency of event combinationsAs a result three clusters of vectors with 3-sets chains wereidentified with k-means clustering method Figure 13 showsall combinations and transitions between them for cluster 3as an example Using Figure 12 and Table 1 it is possible to seethat cluster 3 includes users who oftenmake new records (P)and sometimes comment records (C) So cluster 3 mostlyconsists of ldquobloggersrdquo Cluster 2 includes ldquospreadersrdquo whocopy other records (R) frequently And the biggest cluster1 consists of people who make new records and copy otherones equally but less intensively comparing to other clusters
That may be considered as a typical behavior for user ofOSN N-grams analysis allows detecting typical behavioralpatterns and obtaining process models for social mediaactivities using chains of different lengths as input data Thusthis type of data-driven modeling is more appropriate toresearch continuous processes Figure 14 shows a graph-basedrepresentation of process space with of all usersrsquo patterns
This subsection provides very early results Next stepwithin application of the proposed approach in this appli-cation includes an extension of process model structure (a)with temporal labeling (gaps between events) (b) consideringprocess within a sliding time window to get more structuredprocesses (c) linking the model with causal inference (d)introduction ofDM techniques for EC positioning of ongoingprocesses in model space We believe that these extensionscould enhance discovery of model structure (P4) and providedeeper insight on social media activity investigation
Figure 14: Graph-based representation of processes' space for three clusters (cluster 1, cluster 2, and cluster 3) in social media activity.

4. Conclusion and Future Work

The development of the proposed approach is still an ongoing project. We aim for further systematization and detailing of the proposed concepts, methods, and algorithms, as well as a more comprehensive and deeper implementation of EC-based applications. Further development includes the following directions:
(i) dualization on the role of data-driven and intelligent operations in the proposed approach and the described patterns;
(ii) extended analysis of various EC techniques applicable within the approach;
(iii) investigation of EC-based discovery for models of complex systems with lacking or inconsistent observations;
(iv) detailed formalization of expertise and knowledge-based methods within the approach;
(v) extension of the approach with interactive user-centered modelling and phase space analysis;
(vi) development of a multilayered approach for decision support and control of the system and process $S$, available data $D$, and complex model $M$.
Data Availability
The data used in the examples presented in Section 3 were initially obtained within complementary projects performed by the authors of the paper. The data are available from the corresponding author upon request, after an explicit statement of the purpose and plan of the requested data usage, to check for possible violation of the corresponding projects' rules.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This paper presents an extension and further development of the work [23]. This research is financially supported by the Russian Scientific Foundation, Agreement 14-11-00823 (15.07.2014).
References
[1] N. Boccara, Modeling Complex Systems, Graduate Texts in Physics, Springer, New York, 2nd edition, 2010.
[2] H. McManus and D. Hastings, "A framework for understanding uncertainty and its mitigation and exploitation in complex systems," IEEE Engineering Management Review, vol. 34, no. 3, pp. 81–94, 2006.
[3] W. Walker, P. Harremoes, J. Rotmans et al., "Defining uncertainty: a conceptual basis for uncertainty management in model-based decision support," Integrated Assessment, vol. 4, no. 1, pp. 5–17, 2003.
[4] J. Yan and J. R. Deller, "NARMAX model identification using a set-theoretic evolutionary approach," Signal Processing, vol. 123, pp. 30–41, 2016.
[5] I. G. Kevrekidis, C. W. Gear, and G. Hummer, "Equation-free: The computer-aided analysis of complex multiscale systems," AIChE Journal, vol. 50, no. 7, pp. 1346–1355, 2004.
[6] H. Ihshaish, A. Cortes, and M. A. Senar, "Parallel Multi-level Genetic Ensemble for Numerical Weather Prediction Enhancement," Procedia Computer Science, vol. 9, pp. 276–285, 2012.
[7] G. Dumedah, "Formulation of the Evolutionary-Based Data Assimilation and its Implementation in Hydrological Forecasting," Water Resources Management, vol. 26, no. 13, pp. 3853–3870, 2012.
[8] V. V. Kashirin, A. A. Lantseva, S. V. Ivanov, S. V. Kovalchuk, and A. Boukhanovsky, "Evolutionary simulation of complex networks' structures with specific functional properties," Journal of Applied Logic, vol. 24, part A, pp. 39–49, 2017.
[9] S. V. Kovalchuk, P. A. Smirnov, K. V. Knyazkov, A. S. Zagarskikh, and A. V. Boukhanovsky, "Knowledge-Based Expressive Technologies Within Cloud Computing Environments," in Practical Applications of Intelligent Systems, vol. 279 of Advances in Intelligent Systems and Computing, pp. 1–11, Springer, Berlin, Germany, 2014.
[10] A. A. Funkner, A. N. Yakovlev, and S. V. Kovalchuk, "Towards evolutionary discovery of typical clinical pathways in electronic health records," Procedia Computer Science, vol. 119, pp. 234–244, 2017.
[11] S. V. Kovalchuk and A. V. Boukhanovsky, "Towards Ensemble Simulation of Complex Systems," Procedia Computer Science, vol. 51, pp. 532–541, 2015.
[12] S. V. Kovalchuk, A. V. Krikunov, K. V. Knyazkov, and A. V. Boukhanovsky, "Classification issues within ensemble-based simulation: application to surge floods forecasting," Stochastic Environmental Research and Risk Assessment, vol. 31, no. 5, pp. 1183–1197, 2017.
[13] D. Fogel, "Phenotypes, genotypes, and operators in evolutionary computation," in Proceedings of the 1995 IEEE International Conference on Evolutionary Computation, p. 193, Perth, WA, Australia.
[14] S. V. Ivanov, S. V. Kovalchuk, and A. V. Boukhanovsky, "Workflow-based Collaborative Decision Support for Flood Management Systems," Procedia Computer Science, vol. 18, pp. 2213–2222, 2013.
[15] A. Gusarov, A. Kalyuzhnaya, and A. Boukhanovsky, "Spatially adaptive ensemble optimal interpolation of in-situ observations into numerical vector field models," in Proceedings of the 6th International Young Scientist Conference on Computational Science, YSC 2017, pp. 325–333, Finland, November 2017.
[16] A. V. Krikunov, E. V. Bolgova, E. Krotov, T. M. Abuhay, A. N. Yakovlev, and S. V. Kovalchuk, "Complex data-driven predictive modeling in personalized clinical decision support for Acute Coronary Syndrome episodes," in Proceedings of the International Conference on Computational Science, ICCS 2016, pp. 518–529, USA, June 2016.
[17] S. V. Kovalchuk, A. A. Funkner, O. G. Metsker, and A. N. Yakovlev, "Simulation of patient flow in multiple healthcare units using process and data mining techniques for model identification," Journal of Biomedical Informatics, vol. 82, pp. 128–142, 2018.
[18] A. Yakovlev, O. Metsker, S. Kovalchuk, and E. Bologova, "Prediction of in-hospital mortality and length of stay in acute coronary syndrome patients using machine-learning methods," Journal of the American College of Cardiology, vol. 71, no. 11, p. A242, 2018.
[19] A. Nikishova, A. Kalyuzhnaya, A. Boukhanovsky, and A. Hoekstra, "Uncertainty quantification and sensitivity analysis applied to the wind wave model SWAN," Environmental Modelling & Software, vol. 95, pp. 344–357, 2017.
[20] E. Rojas, J. Munoz-Gama, M. Sepulveda, and D. Capurro, "Process mining in healthcare: A literature review," Journal of Biomedical Informatics, vol. 61, pp. 224–236, 2016.
[21] T. Sinha, P. Jermann, N. Li, and P. Dillenbourg, "Your click decides your fate: Inferring Information Processing and Attrition Behavior from MOOC Video Clickstream Interactions," in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 3–14, Doha, Qatar, October 2014.
[22] C. Marceau, "Characterizing the Behavior of a Program Using Multiple-Length N-Grams," Defense Technical Information Center, 2005.
[23] S. V. Kovalchuk, O. G. Metsker, A. A. Funkner et al., "Towards management of complex modeling through a hybrid evolutionary identification," in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 255-256, Kyoto, Japan, July 2018.