Date post: | 08-Apr-2017 |
Category: | Education |
Upload: | marlon-dumas |
Offline Process Mining
[Diagram labels: event log, event log’, input model; operations: Discovery, Conformance, Deviance, Performance; outputs: discovered model, difference diagnostics, enhanced model]
Offline Process Mining: The Apromore Approach
[Diagram repeated from the previous slide, annotated with the tools below]
BPMN Miner
Log Delta Analysis
Behavioral Alignment
All integrated into: http://apromore.org
Automated Process Discovery
CID Task Time Stamp …
13219 Enter Loan Application 2007-11-09 T 11:20:10 -
13219 Retrieve Applicant Data 2007-11-09 T 11:22:15 -
13220 Enter Loan Application 2007-11-09 T 11:22:40 -
13219 Compute Installments 2007-11-09 T 11:22:45 -
13219 Notify Eligibility 2007-11-09 T 11:23:00 -
13219 Approve Simple Application 2007-11-09 T 11:24:30 -
13220 Compute Installments 2007-11-09 T 11:24:35 -
… … … …
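A common first step when discovering a model from a log like the one above is to group events by case ID and count which task directly follows which. The sketch below illustrates only that step, under the assumption that ISO-formatted timestamps give the order within a case. It is not the BPMN Miner algorithm, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Rows copied from the log excerpt: (case ID, task, timestamp).
# The task name "Compute Installments" is spelled consistently here.
events = [
    ("13219", "Enter Loan Application", "2007-11-09T11:20:10"),
    ("13219", "Retrieve Applicant Data", "2007-11-09T11:22:15"),
    ("13220", "Enter Loan Application", "2007-11-09T11:22:40"),
    ("13219", "Compute Installments", "2007-11-09T11:22:45"),
    ("13219", "Notify Eligibility", "2007-11-09T11:23:00"),
    ("13219", "Approve Simple Application", "2007-11-09T11:24:30"),
    ("13220", "Compute Installments", "2007-11-09T11:24:35"),
]


def traces_by_case(rows):
    """Group events by case ID and order each case by timestamp."""
    cases = defaultdict(list)
    for case_id, task, timestamp in rows:
        cases[case_id].append((timestamp, task))
    # ISO timestamps sort correctly as plain strings.
    return {cid: [task for _, task in sorted(evts)] for cid, evts in cases.items()}


def directly_follows(traces):
    """Count how often task a is immediately followed by task b in a case."""
    counts = Counter()
    for trace in traces.values():
        counts.update(zip(trace, trace[1:]))
    return counts


traces = traces_by_case(events)
dfg = directly_follows(traces)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

The resulting counts are the edges of a directly-follows graph, which discovery techniques typically filter and refine into a process model.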
Conformance Checking with Behavioral Alignment
[Pipeline diagram: the input model is unfolded into an event structure (PESM) and the event log is merged into an event structure (PESL); the two are compared via their Partially Synchronized Product (PSP), from which difference statements are extracted]
Desired conformance output:
• task C is optional in the log
• the cycle including IGDF is not observed in the log
Log traces: ABCDEH, ACBDEH, ABCDFH, ACBDFH, ABDEH, ABDFH
L. Garcia-Banuelos, N.R. van Beest, M. Dumas, M. La Rosa, W. Mertens, Complete and Interpretable Conformance Checking of Business Processes, Technical Report, IEEE Transactions on Software Engineering, in press.
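The six log traces can be inspected with a few lines of code to see where the two desired statements come from. This is a naive sketch, not the behavioral alignment technique itself. It assumes the model's tasks are the letters A to I (the slide only names the cycle through I, G, D, F), and all variable names are hypothetical.

```python
# Log traces from the slide, one letter per task.
traces = ["ABCDEH", "ACBDEH", "ABCDFH", "ACBDFH", "ABDEH", "ABDFH"]

# Assumption: the input model contains tasks A to I.
model_tasks = set("ABCDEFGHI")

# Tasks that occur in at least one trace.
observed = set().union(*map(set, traces))

# Model tasks that never occur in the log.
never_observed = sorted(model_tasks - observed)

# For every observed task, the traces in which it does not occur.
skipped = {}
for task in sorted(observed):
    missing = [t for t in traces if task not in t]
    if missing:
        skipped[task] = missing

print("never observed:", never_observed)
for task, missing in skipped.items():
    print(f"{task} is absent from {missing}")
```

Tasks I and G never occur, which is consistent with the unobserved cycle, and C is absent from two traces. E and F are also absent from some traces, so this check alone cannot tell an optional task from a branch of a choice. That distinction needs the comparison against the model's behavior.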
Deviance Mining
Given two logs, find the differences and root causes for variation or deviance between them.
Example logs: "simple claims and quick" vs. "simple claims and slow"
[Figure: model]
S. Suriadi et al.: Understanding Process Behaviours in a Large Insurance Company in Australia: A Case Study. CAiSE 2013
Deviance Mining via Sequence Classification
• Apply discriminative sequence mining methods to extract features characteristic of one class
• Build classification models (e.g. decision trees)
• Extract difference diagnostics from the classification model
C. Sun et al. Mining explicit rules for software process evaluation. ICSSP’2013.
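The three steps can be illustrated on a toy scale. The sketch below swaps in two simplifications, both assumptions: it uses raw activity counts per trace instead of discriminative sequence mining, and a one-level decision stump instead of a full decision tree. The function name and the toy logs are hypothetical.

```python
from collections import Counter


def best_stump(log_a, log_b):
    """Find the single rule 'count(activity) > threshold' that best
    separates traces of log_a (label "L1") from log_b (label "L2").

    Returns (activity, threshold, accuracy, label_if_above).
    """
    samples = [(Counter(t), "L1") for t in log_a]
    samples += [(Counter(t), "L2") for t in log_b]
    activities = sorted({a for feats, _ in samples for a in feats})
    best = None
    for act in activities:
        values = sorted({feats[act] for feats, _ in samples})
        # Candidate thresholds: midpoints between consecutive observed counts.
        for low, high in zip(values, values[1:]):
            thr = (low + high) / 2
            for above in ("L1", "L2"):
                below = "L2" if above == "L1" else "L1"
                correct = sum(
                    (above if feats[act] > thr else below) == label
                    for feats, label in samples
                )
                acc = correct / len(samples)
                if best is None or acc > best[2]:
                    best = (act, thr, acc, above)
    return best


# Toy logs with made-up traces.
log_1 = [["Triage", "Notes", "Notes", "Notes"],
         ["Triage", "Notes", "Notes", "Notes", "Notes"]]
log_2 = [["Triage", "Notes", "Assess"],
         ["Triage", "Assess", "Assess"]]

act, thr, acc, label = best_stump(log_1, log_2)
print(f'IF |"{act}"| > {thr} THEN {label}  (accuracy {acc:.2f})')
```

The learned rule doubles as the difference diagnostic: it names the activity whose frequency distinguishes the two logs.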
Log Delta Analysis
[Pipeline diagram, repeated over several slides. Labels: Event log, Input model, PESM (unfold), PESL (merge), Partially Synchronized Product (PSP) (compare), extract differences, Difference statements]
N.R. van Beest, L. Garcia-Banuelos, M. Dumas, M. La Rosa, Log Delta Analysis: Interpretable Differencing of Business Process Event Logs. BPM 2015: 386-405
Sequence classification vs. log delta analysis
L1 - Short stay: 448 cases, 7329 events
L2 - Long stay: 363 cases, 7496 events
Sequence classification: 106-130 statements
IF |“Nursing Progress Notes”| > 7.5 THEN L1
IF |“Nursing Progress Notes”| ≤ 7.5 AND |“Nursing Assessment”| > 1.5 THEN L2
…
Log delta analysis: 48 statements
In L1, “Nursing Primary Assessment” is repeated after “Medical Assign” and “Triage Request”, while in L2 it is not
…
N.R. van Beest, L. Garcia-Banuelos, M. Dumas, M. La Rosa, Log Delta Analysis: Interpretable Differencing of Business Process Event Logs. BPM 2015: 386-405
Apromore Process Analytics Platform (apromore.org)
Open-source, highly scalable, SaaS BPM analytics platform
M. La Rosa, H. Reijers, W. van der Aalst, R. Dijkman, J. Mendling, M. Dumas, L. Garcia-Banuelos, APROMORE: An Advanced Process Model Repository. Expert Systems with Applications, 2011
Beyond Deviance Mining: Predictive Process Monitoring
How likely is it that a running process will become “deviant”?
• Will it end up in a negative outcome?
• Will it fail to meet its SLAs in the next 24 hours?
• Will it generate abnormal effort, costs or rework?
Predictive Monitoring Example: Debt Recovery Process
Case ending in payment: Debt repayment due, Call the debtor, Send a reminder, Payment received
Case ending in external collection: Debt repayment due, Call the debtor, Send a reminder, Send a warning, Call the debtor, Call the debtor, Send to external debt collection agency
[Figure: further running cases made up of repeated "Call the debtor", "Send a reminder" and "Send a warning" events]
Predictive Process Monitoring: General Approach
[Diagram: event log (traces × attributes) → classifier → outcome predictions]
[Diagram: event log (traces × attributes) → regressor / structured predictor → future “paths” prediction]
Predictive Monitoring: Runtime Nearest-Neighbors Approach
[Architecture diagram. Trace Processor: kNN extraction (string-edit distance) matches the current trace [Event+] against the event log to find similar execution traces; feature extraction turns them into labeled samples. Predictor: decision tree learning builds a decision tree from the labeled samples; class estimation applies it to the current trace [Data+] to produce a prediction]
F.M. Maggi, C. Di Francescomarino, M. Dumas, C. Ghidini. Predictive Monitoring of Business Processes. CAiSE'2014
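The kNN extraction step can be sketched with a plain Levenshtein distance over sequences of event names. Comparing the running trace against prefixes of the same length in the historical traces is an assumption made here for illustration. The function names and the example traces are hypothetical, and the cited paper's implementation may differ.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of event names."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x by y, or keep it
            ))
        prev = curr
    return prev[-1]


def k_nearest(current, historical, k):
    """Return the k historical traces whose prefix of the same length
    as the current (running) trace is closest in edit distance."""
    n = len(current)
    ranked = sorted(historical, key=lambda t: edit_distance(current, t[:n]))
    return ranked[:k]


historical = [
    ["due", "call", "reminder", "paid"],
    ["due", "call", "reminder", "warning", "call", "agency"],
    ["due", "reminder", "paid"],
]
running = ["due", "call", "reminder"]
similar = k_nearest(running, historical, k=2)
print(similar)
```

Every prediction scans the whole log, which is consistent with a high per-prediction runtime cost for this kind of approach.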
Evaluation Setup
• BPI Challenge 2011 dataset
• Healthcare process at Dutch hospital
• 1141 cases, avg length 14 events/case
• Split normal-deviant via 5 predicates: φ1–φ5
• Prediction made at:
  • Start event (initial event)
  • Early event (ca. ¼ of the trace)
  • Middle
Evaluation Results
• Reasonably accurate at mid-point (AUC 0.78-0.88)
• High runtime overhead: 5-10 secs / prediction
Predictive Process Monitoring: Cluster & Classify
[Architecture diagram.
Pre-processing: historical execution traces → prefix extraction → trace prefixes. Control flow: control flow encoding → encoded control flow → clustering → clusters. Data: data encoding + labeling function → encoded data → supervised learning → classifiers.
Runtime: running trace → control flow encoding → cluster(s) identification; data encoding → classification → prediction for the given prediction problem]
AUC of 0.6 to 0.85 with a lot of variation
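The first two pre-processing steps, prefix extraction and control flow encoding, can be sketched as follows. The frequency-based encoding shown here is one possible choice and is an assumption, as are the function names. The clustering and classification stages are left out.

```python
from collections import Counter


def extract_prefixes(trace, max_len):
    """Return all prefixes of a trace up to max_len events."""
    return [trace[:n] for n in range(1, min(len(trace), max_len) + 1)]


def frequency_encode(prefix, alphabet):
    """Encode a prefix as a vector of activity counts, one per activity."""
    counts = Counter(prefix)
    return [counts[a] for a in alphabet]


alphabet = ["A", "B", "C"]
for prefix in extract_prefixes(["A", "B", "A", "C"], max_len=3):
    print(prefix, frequency_encode(prefix, alphabet))
```

Vectors of this kind can be fed to a clustering algorithm, with one classifier then trained per cluster on the encoded data attributes.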
Predictive Process Monitoring: Cluster & Classify with Hyperparameter Optimization
Each technique has its own hyperparameters
Other parameters:
• Trace prefix size
• Voting mechanism
• Interval choice in case of interval time predictions
Experimental Settings
• Four outcome labellings of a large real-life patient treatment dataset
Dataset preparation:
• Training set (70%)
• Validation set (20%)
• Testing set (10%)
Identification of the most suitable configurations (among 160)
Evaluation of the identified configurations (with the testing set)
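The protocol above, a 70/20/10 split followed by configuration selection on the validation set, can be sketched as below. Shuffling the cases at random is an assumption (the slides do not say how the split was made), and the function names, configuration dictionaries and scores are hypothetical.

```python
import random


def split_70_20_10(cases, seed=0):
    """Shuffle the cases and split them into training, validation and test sets."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10
    n_val = len(shuffled) * 2 // 10
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


def pick_best(configs, score_on_validation):
    """Return the configuration with the highest validation score."""
    return max(configs, key=score_on_validation)


train, validation, test = split_70_20_10(range(100))
print(len(train), len(validation), len(test))

# Hypothetical configurations and made-up validation scores.
configs = [{"prefix_size": 5}, {"prefix_size": 10}]
made_up_scores = {5: 0.70, 10: 0.80}
best = pick_best(configs, lambda c: made_up_scores[c["prefix_size"]])
print(best)
```

Only the selected configurations are then scored on the test set, so the test set plays no part in tuning.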
Evaluation Results
• No unique best configuration.
• Accuracy is consistently high, and accuracy on the testing set is consistent with the tuning.
Chiara Di Francescomarino, Marlon Dumas, Fabrizio Maria Maggi, Irene Teinemaa. Clustering-Based Predictive Process Monitoring. IEEE Transactions on Services Computing, 2017.
Index-Based Multi-Classifier
• Idea: One classifier per index
  • Classifier for prefixes of length 1
  • Classifier for prefixes of length 2
  • Etc.
• Traces of length m encoded using an index-based scheme
• At runtime, classify a trace of length m using the corresponding classifier
Anna Leontjeva, Raffaele Conforti, Chiara Di Francescomarino, Marlon Dumas, Fabrizio Maria Maggi: Complex Symbolic Sequence Encodings for Predictive Monitoring of Business Processes. Proc. Of BPM 2015, pp. 297-313.
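The multi-classifier idea can be sketched in two steps: encode each prefix position by position, then keep one training set per prefix length. The sketch below covers control flow only and one-hot encodes the event at each index, both of which are simplifying assumptions. The function names are hypothetical, and the classifiers themselves are not fitted.

```python
from collections import defaultdict


def index_encode(prefix, alphabet):
    """One-hot encode the event at each position of the prefix.

    The vector has len(prefix) * len(alphabet) entries.
    """
    vector = []
    for event in prefix:
        vector.extend(1 if event == a else 0 for a in alphabet)
    return vector


def training_sets_by_length(labeled_traces, alphabet, max_len):
    """Build one training set per prefix length m.

    Returns {m: [(feature_vector, label), ...]}; a separate classifier
    would be fitted on each entry.
    """
    buckets = defaultdict(list)
    for trace, label in labeled_traces:
        for m in range(1, min(len(trace), max_len) + 1):
            buckets[m].append((index_encode(trace[:m], alphabet), label))
    return dict(buckets)


alphabet = ["A", "B", "C"]
labeled = [(["A", "B"], "regular"), (["A", "C", "B"], "deviant")]
buckets = training_sets_by_length(labeled, alphabet, max_len=3)
for m, rows in sorted(buckets.items()):
    print(m, rows)
```

At runtime, a running trace with m events would be encoded the same way and passed to the classifier trained on the bucket for length m.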
Index-Based Multi-Classifier + HMM
• Same as before, but the feature vector of a prefix is extended with the log-likelihood ratio of being in the deviant or regular class according to a Hidden Markov Model
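One way to write the extra feature is given below. It assumes one HMM is trained per class, written here as $\lambda_{\text{deviant}}$ and $\lambda_{\text{regular}}$, and that the ratio is taken with the deviant class in the numerator. Both the symbols and the orientation of the ratio are assumptions, not taken from the slides.

```latex
\[
\mathrm{LLR}(\sigma) \;=\; \log \frac{P(\sigma \mid \lambda_{\text{deviant}})}{P(\sigma \mid \lambda_{\text{regular}})}
\]
```

Here $\sigma$ is the prefix being classified. Under this orientation, a positive value means the deviant-class model explains the prefix better than the regular-class model.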
Text-Extended Index-Based Encoding
• Bag-of-N-grams
• Weighted bag-of-N-grams
• Latent Dirichlet Allocation (LDA)
• Paragraph Vector (PV)
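The simplest of these text models, the bag-of-N-grams, can be sketched in a few lines. The sketch assumes lowercase whitespace tokenization and no lemmatization, so it is a simplification of any real pipeline. The function name and the example text are hypothetical.

```python
from collections import Counter


def bag_of_ngrams(text, n_max=2):
    """Count all word n-grams of length 1..n_max in a text."""
    tokens = text.lower().split()
    bag = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            bag[" ".join(tokens[i:i + n])] += 1
    return bag


bag = bag_of_ngrams("Call the debtor call the debtor")
for ngram, count in sorted(bag.items()):
    print(ngram, count)
```

The counts for a fixed vocabulary of n-grams form the text part of the feature vector. The weighted variant would rescale them, for example with TF-IDF weights.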
Evaluation Setup
                       Debt Recovery   Lead-to-contract
# normal cases         13608           385
# deviant cases        417             390
Avg # words per doc    11              8
# lemmas               11822           2588
• Data split: 80% train, 20% test (randomly)
• Handling imbalance: oversampling
• Classifiers: random forest and logistic regression
• Evaluation metrics: F-Score and earliness
• Parameter-tuning: grid search with 5-fold cross validation on training set
Ongoing work: LSTM-Based Predictive Process Monitoring
Niek Tax, Ilya Verenich, Marcello La Rosa, Marlon Dumas: Predictive Business Process Monitoring with LSTM Neural Networks. CoRR abs/1612.02130 (2016).
Online predictive process monitoring
• Accurate, robust techniques to predict case outcome, covering control-flow, structured and textual data
• LSTM-based architecture to predict
  • Next task + timestamp + resource or other attributes
  • Remaining execution path and time
• All code available:
  • Clustering-based method: http://goo.gl/ykozBf
  • Index-based method: https://goo.gl/BQFk7k
  • Index-based method with textual features: https://goo.gl/a2DoWT
  • LSTM-based method: https://goo.gl/mkQDyy
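The "next task + timestamp" prediction target can be made concrete by showing how training samples might be derived from a single case. This covers data preparation only and contains no LSTM. The sample layout and the function name are assumptions, not taken from the linked code.

```python
from datetime import datetime


def next_event_samples(trace):
    """Turn one case into supervised samples.

    trace: list of (task, ISO timestamp) pairs in execution order.
    Returns a list of (prefix tasks, next task, seconds until next event).
    """
    samples = []
    for i in range(1, len(trace)):
        prefix = [task for task, _ in trace[:i]]
        next_task, next_ts = trace[i]
        prev_ts = trace[i - 1][1]
        delta = datetime.fromisoformat(next_ts) - datetime.fromisoformat(prev_ts)
        samples.append((prefix, next_task, delta.total_seconds()))
    return samples


case = [
    ("Enter Loan Application", "2007-11-09T11:20:10"),
    ("Retrieve Applicant Data", "2007-11-09T11:22:15"),
    ("Compute Installments", "2007-11-09T11:22:45"),
]
samples = next_event_samples(case)
for prefix, next_task, seconds in samples:
    print(prefix, "->", next_task, f"after {seconds:.0f}s")
```

A sequence model would be trained to map each prefix to the next task and the time until it occurs. Applying such a model repeatedly to its own output is one way to obtain a remaining path and remaining time.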