www.infer.eu
Adaptive pre-processing for streaming data
Indrė Žliobaitė, Bournemouth University
● 2011 September 22
INFER project 2010-2014
Computational INtelligence platform For Evolving and Robust predictive systems
● EC project within the Marie Curie Industry-Academia Partnerships & Pathways (IAPP) programme, EUR 1.55 million
● Three partners from the UK, Germany and Poland
● Extended secondments for 23 researchers, who move between sectors and countries to share knowledge between industry and academia
Objectives
● Area 1: computational intelligence
  ● advanced mechanisms for adaptation
  ● multi-component multi-level evolving predictive systems
  ● robustness and complexity management
● Area 2: software engineering
  ● software platform for building robust predictive systems
● Area 3: process industry applications
  ● adaptive and self-monitoring soft sensors for process industry
Soft sensor: a computational model used in the process industry. Its outputs are computed using sensor readings as inputs.
Peculiarities of the problem setting
Typical data streams setting
● mostly classification tasks
● not identically distributed over time
● assumes that data arrives clean and pre-processed
● typically assumes immediate feedback
● optimizes accuracy and speed
Industrial process setting
● mostly regression* tasks
● not identically and not independently distributed
● emphasis on data preparation and pre-processing, denoising, handling missing values
● feedback is lagging, costly or not available at all
● + emphasis on robustness (reliability, confidence)
In both settings
● data not iid
Adaptive learning systems
Example: data stream
Chemical production plant
● given sensor readings, predict the quality of the output
● 24/7 plant operation
● the process changes, but the model does not change
source: Evonik Industries
Adaptive online learning
● Data arrives online and never ends
● Data distribution changes over time
● Limited access to historical data
  ● in large data streams: no access
● Predictive models need to have adaptation mechanisms
  ● update, or retrain and replace, models to match recent data
  ● otherwise accuracy will degrade over time
Adaptive online learning strategies
● REGULARLY EVOLVING, e.g. training windows
● WITH TRIGGERS, e.g. change detectors
● + single model or ensemble of models
[Diagram: a sequence of models along the data stream; in the triggered strategy a new model is built after a detected change]
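The two strategies can be contrasted with a minimal sketch. It is not taken from the slides: the class names, the mean predictor, and the error-run detector (`threshold`, `patience`) are assumptions chosen only to keep the example short.

```python
from collections import deque
from statistics import mean


class WindowMeanModel:
    """Regularly evolving: predicts the mean of the last `size` targets."""

    def __init__(self, size):
        self.window = deque(maxlen=size)

    def predict(self):
        return mean(self.window) if self.window else 0.0

    def update(self, y):
        self.window.append(y)  # the oldest target drops out automatically


class TriggeredMeanModel:
    """With triggers: keeps all targets until a simple detector fires."""

    def __init__(self, threshold, patience):
        self.history = []
        self.threshold = threshold  # absolute error counted as "large"
        self.patience = patience    # consecutive large errors that signal a change
        self.large_errors = 0
        self.resets = 0

    def predict(self):
        return mean(self.history) if self.history else 0.0

    def update(self, y):
        if self.history and abs(y - self.predict()) > self.threshold:
            self.large_errors += 1
        else:
            self.large_errors = 0
        self.history.append(y)
        if self.large_errors >= self.patience:
            # Change detected: keep only the targets seen during the error run.
            self.history = self.history[-self.patience:]
            self.large_errors = 0
            self.resets += 1


# Hypothetical stream with one abrupt change from level 1.0 to level 5.0.
stream = [1.0] * 20 + [5.0] * 20
windowed = WindowMeanModel(size=5)
triggered = TriggeredMeanModel(threshold=1.0, patience=3)
for y in stream:
    windowed.update(y)
    triggered.update(y)
print(windowed.predict(), triggered.predict(), triggered.resets)
```

The windowed model forgets continuously whether or not anything changed; the triggered model keeps all history while the stream is stable and discards it only when the detector fires.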
Adaptive learning mode
● INCREMENTAL: update model
  ● INSTANCE: update with every new instance
  ● BATCH: update in batches
● ENSEMBLE: add/remove models
● RETRAINING: replace model (old model → new model)
  ● FULL: retrain with new data
  ● PARTIAL: replace part of model
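As a rough illustration of these modes, the sketch below uses a running mean as a stand-in for a model. The names `IncrementalMean` and `retrain_mean` are hypothetical and not from the slides.

```python
class IncrementalMean:
    """Incremental mode: the existing model is updated, never rebuilt."""

    def __init__(self):
        self.n = 0
        self.value = 0.0

    def update_instance(self, y):
        # INSTANCE: fold in one new observation.
        self.n += 1
        self.value += (y - self.value) / self.n

    def update_batch(self, batch):
        # BATCH: fold in several observations at once.
        total = self.value * self.n + sum(batch)
        self.n += len(batch)
        self.value = total / self.n


def retrain_mean(recent):
    """Retraining mode: a new model is built from recent data only."""
    return sum(recent) / len(recent)


stream = [2.0, 4.0, 6.0, 8.0]

by_instance = IncrementalMean()
for y in stream:
    by_instance.update_instance(y)

by_batch = IncrementalMean()
by_batch.update_batch(stream[:2])
by_batch.update_batch(stream[2:])

print(by_instance.value, by_batch.value)  # both summarize the whole stream
print(retrain_mean(stream[-2:]))          # built from the last two values only
```

The incremental variants carry the whole history in their state, while the retrained model reflects only the data it was rebuilt from.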
Current situation
● Many adaptive learning approaches are available
● The majority of the existing approaches assume that
  ● data comes already pre-processed (yet data analysts say that data preparation takes 80-90% of modelling time), or
  ● pre-processing is trained at the beginning and remains fixed (limited adaptivity of the system), or
  ● pre-processing is tied: it adapts whenever the predictor adapts
Fixed pre-processing
[Diagram: raw data stream → pre-processed → predictions. Pre-processing and predictor are trained once at the start; afterwards only the predictor is adapted.]
Tied pre-processing
[Diagram: raw data stream → pre-processed → predictions. Pre-processing and predictor are trained together; each time the predictor is adapted, pre-processing is re-trained as well.]
Current situation
● It may be beneficial to decouple adaptation of pre-processing from adaptation of the predictor
Decoupled adaptivity
[Diagram: raw data stream → pre-processed → predictions. Pre-processing and predictor are trained together at the start; afterwards pre-processing is re-trained and the predictor is adapted at independent points in time.]
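A minimal sketch of decoupled adaptivity follows, assuming a one-dimensional input, a standardizer as the pre-processing element and a linear predictor trained by stochastic gradient descent. All class and method names are hypothetical; the slides do not prescribe an implementation.

```python
from statistics import mean, pstdev


class Standardizer:
    """Pre-processing element: re-trained from a batch of raw values."""

    def __init__(self):
        self.mu, self.sigma = 0.0, 1.0

    def retrain(self, raw_batch):
        self.mu = mean(raw_batch)
        self.sigma = pstdev(raw_batch) or 1.0  # avoid division by zero

    def transform(self, x):
        return (x - self.mu) / self.sigma


class OnlineLinearPredictor:
    """Predictor element: y ~ w * z + b, updated one instance at a time."""

    def __init__(self, learning_rate=0.1):
        self.w, self.b, self.lr = 0.0, 0.0, learning_rate

    def predict(self, z):
        return self.w * z + self.b

    def update(self, z, y):
        error = self.predict(z) - y
        self.w -= self.lr * error * z
        self.b -= self.lr * error


class DecoupledSystem:
    """Each element has its own adaptation entry point and its own schedule."""

    def __init__(self):
        self.pre = Standardizer()
        self.model = OnlineLinearPredictor()
        self.pre_retrains = 0
        self.model_updates = 0

    def predict(self, x):
        return self.model.predict(self.pre.transform(x))

    def adapt_predictor(self, x, y):
        self.model.update(self.pre.transform(x), y)
        self.model_updates += 1

    def adapt_preprocessing(self, raw_batch):
        self.pre.retrain(raw_batch)
        self.pre_retrains += 1


system = DecoupledSystem()
system.adapt_preprocessing([0.0, 10.0])   # batch re-training
for x, y in [(0.0, -1.0), (10.0, 1.0), (5.0, 0.0)]:
    system.adapt_predictor(x, y)          # incremental updates
print(system.pre_retrains, system.model_updates)
```

In a tied design, `adapt_predictor` would also call `retrain`; here the caller decides when each element adapts.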
Adaptive pre-processing?
Online predictive model (MODEL)
1. receive current data
2. output prediction
3. receive feedback
4. update model
5. receive new data
...
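The five steps of this loop can be sketched as follows. The sketch assumes that feedback arrives immediately after each prediction, which is the typical data streams assumption rather than the industrial one; `LastValueModel` and `run_online` are hypothetical names.

```python
class LastValueModel:
    """Trivial stand-in model: predicts the most recent target it has seen."""

    def __init__(self):
        self.last = 0.0

    def predict(self, x):
        return self.last

    def update(self, x, y):
        self.last = y


def run_online(model, stream):
    """Run the predict / feedback / update loop and return absolute errors."""
    abs_errors = []
    for x, y in stream:                    # 1. receive current data
        y_hat = model.predict(x)           # 2. output prediction
        abs_errors.append(abs(y - y_hat))  # 3. receive feedback (true target y)
        model.update(x, y)                 # 4. update model
    return abs_errors                      # 5. the next iteration receives new data


errors = run_online(LastValueModel(), [(0, 1.0), (1, 1.0), (2, 3.0)])
print(errors)
```

Because each instance is used for testing before it is used for training, the collected errors estimate how the model performs on unseen data.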
Online prediction system (PRE-PROCESSING → PREDICTOR)
1. receive current data
2. output prediction
3. receive feedback
4. update model
5. receive new data
...
??
Decoupling adaptivity – why?
● Forced
  ● Different modes of adaptivity: the predictor updates incrementally, pre-processing needs retraining (batches)
● May be beneficial
  ● one of the elements may still be good enough
    ● changes in data do not change the relation between concepts (classes) in data, e.g. change in noise
  ● different amounts of training data required
Different amounts of data for accurate training
● synthetic Gaussian data, binary classification problem
● assume known change point
[Figure with two panels: STATIC SITUATION and DATA STREAM]
Challenges
● Consistency of feature representation over time
● Consistency of feedback over time
[Diagram: raw data → Pre-processing element → Prediction element → prediction; labelled links: 1. transformed data, 2. feedback]
1. Example: feature representation
If we modify the pre-processing element, the input to the predictive element changes
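A small numeric sketch of this effect, assuming min-max scaling as the pre-processing element and a fixed linear predictor that was fitted on the old representation. The helper `fit_minmax` and all values are hypothetical.

```python
def fit_minmax(batch):
    """Train a min-max scaler on a batch (assumes max(batch) > min(batch))."""
    lo, hi = min(batch), max(batch)
    return lambda x: (x - lo) / (hi - lo)


def predictor(z):
    """Predictor fitted earlier, on inputs produced by the OLD scaler."""
    return 2.0 * z + 1.0


old_scale = fit_minmax([0.0, 10.0])   # pre-processing trained on old data
new_scale = fit_minmax([10.0, 30.0])  # pre-processing re-trained on recent data

raw = 10.0
before = predictor(old_scale(raw))    # input in the representation the predictor knows
after = predictor(new_scale(raw))     # same raw value, different transformed input
print(before, after)
```

The raw reading is identical in both calls, yet the prediction changes because only the pre-processing element was replaced.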
Challenges
Adaptive mode of
  Pre-processing el.   incremental   incremental     retrain       retrain
  Predictive el.       incremental   retrain         incremental   retrain
1. Transformation      evolving      evolving        shock         shock
2. Feedback            evolving      shock           evolving      shock
Overall                no problem    small problem   problem       problem if not synchronized
Research questions for adaptive pre-processing
● How to decide
  ● when to adapt pre-processing and when to adapt the predictor
● How to integrate
  ● adaptivity of the two elements when pre-processing completely transforms the input space (PCA)
● How to handle
  ● the "shock" of new pre-processing output in the incremental learning mode
● How to monitor and detect
  ● the need for adapting the pre-processing element in very high-dimensional spaces
Some experimental evidence
Case study
● 2.5 years of data, readings every 5 min
● 86 sensors (features), ~170 thousand instances
● classification problem
Strategies
● Strategies with fixed training windows
  ● old-old, old-new, new-old, new-new
● Online strategy selection (adaptive pre-processing)
Results
● Naive Bayes, SVM, CART tree
● e.g. NB: the online strategy selected
  ● old-old 58% of the time, old-new 15%, new-old 17% and new-new 10%
  ● this means decoupling is useful
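One way to picture online strategy selection is sketched below. The slides do not define the strategy names or the selection rule, so the following are assumptions: the first word of each pair refers to the pre-processing element and the second to the predictor, "old" and "new" denote elements trained on an older and a more recent window, and the pair with the lowest mean absolute error on recent labelled data is selected. The function `select_strategy` and the toy elements are hypothetical.

```python
from itertools import product


def select_strategy(preprocessors, predictors, recent):
    """Return ((pre_name, model_name), error) with the lowest error on `recent`."""
    best, best_error = None, float("inf")
    for (pre_name, pre), (model_name, model) in product(
        preprocessors.items(), predictors.items()
    ):
        error = sum(abs(model(pre(x)) - y) for x, y in recent) / len(recent)
        if error < best_error:  # strict comparison: the first of tied pairs wins
            best, best_error = (pre_name, model_name), error
    return best, best_error


# Toy elements standing in for trained pre-processing and predictor versions.
preprocessors = {"old": lambda x: x / 10.0, "new": lambda x: x / 20.0}
predictors = {"old": lambda z: z, "new": lambda z: 2.0 * z}

# Recent labelled instances (raw input, true target).
recent = [(20.0, 4.0), (10.0, 2.0)]

print(select_strategy(preprocessors, predictors, recent))
```

In this toy case a mixed pair fits the recent data best; the tied designs correspond to considering only old-old and new-new.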
Conclusion
● If we want to automate online learning, we need to automate pre-processing as well
● Decoupling of adaptivities may be necessary if different modes of adaptivity are used
● Decoupling may be beneficial to accuracy due to the different amounts of training data required
● Experiments with synthetic and real data show that there is room for adaptive (decoupled) pre-processing
Acknowledgements
Part of the research leading to these results has received funding from the EC within the Marie Curie Industry and Academia Partnerships and Pathways (IAPP) programme under grant agreement no. 251617.