+ All Categories
Home > Documents > Classification for Lifelog Management

Classification for Lifelog Management

Date post: 06-Jan-2016
Category:
Upload: zea
View: 49 times
Download: 0 times
Share this document with a friend
Description:
Classification for Lifelog Management. September 30, 2008 Sung-Bae Cho. Agenda. Introduction to Classification Discriminant Analysis Decision Tree Density Estimation Artificial Neural Networks Comparison of Classification Methods An Example of Classification for Life Log Analysis. - PowerPoint PPT Presentation
Popular Tags:
47
Classification for Lifelog Management September 30, 2008 Sung-Bae Cho
Transcript
Page 1: Classification for  Lifelog  Management

Classification for Lifelog Management

September 30, 2008

Sung-Bae Cho

Page 2: Classification for  Lifelog  Management

• Introduction to Classification

– Discriminant Analysis

– Decision Tree

– Density Estimation

– Artificial Neural Networks

• Comparison of Classification Methods

• An Example of Classification for Life Log Analysis

Agenda

Page 3: Classification for  Lifelog  Management

Classification

• Definition

– A procedure in which individual items are placed into groups based on quantitative information on one or more characteristics inherent in the items (referred to as traits, variables, characters, etc) and based on a training set of previously labeled items.

• Formally, the problem can be stated as follows:

– Given training data produce a classifier which maps an object to its classification label .

• For example, if the problem is filtering spam, then Xi is some representation of an email and y is either "Spam" or "Non-Spam".

• Classification algorithms are typically used in pattern recognition systems.

3

Page 4: Classification for  Lifelog  Management

Other Types of Classification Problem

• Can consider classification as an estimation problem, where the goal is to estimate a function of the form

– where the feature vector input is , and the function f is typically parameterized by some parameters .

• In the Bayesian approach to this problem, instead of choosing a single parameter vector , the result is integrated over all possible thetas

– with the thetas weighted by how likely they are given the training data D:

4

Page 5: Classification for  Lifelog  Management

Features of Classification

• Structure of a classification task

– Prior probability: relative frequency with which the classes occur in the population

– Criterion for separating the classes: underlying relation that uses observed attributes to distinguish an individual from each class

– Misclassification cost: cost associated with making a wrong classification

• Benefits

– Mechanical classification is faster

– Human operators may have biases

– Cheaper, diagnosis made on external symptoms, avoiding surgery

– Understanding structural relationships between the response and the measured variables

• Issues

– Accuracy, Speed, Comprehensibility

– Time to learn

5

Page 6: Classification for  Lifelog  Management

Examples of Classification Algorithms

• Discriminant analysis

– Linear / Quadratic discriminant

– Logistic regression

– Support vector machine

• Decision tree

• Density estimation

– k-nearest neighbor

• Artificial neural networks

– Perceptron

– Multi layer perceptron

– Naive Bayes classifier

• Probabilistic classification

– Bayesian networks*

– Temporal classification

• Hidden Markov models*

• Dynamic Bayesian networks* * at the next lectures

6

Page 7: Classification for  Lifelog  Management

• Introduction to Classification

– Discriminant Analysis

– Decision Tree

– Density Estimation

– Artificial Neural Networks

• Comparison of Classification Methods

• An Example of Classification for Life Log Analysis

Agenda

Page 8: Classification for  Lifelog  Management

Discriminant Analysis

• Definition

– Divide sample space by a series of lines in two dimensions, planes in 3-D and, more generally hyper planes in many dimensions

– The line dividing two classes is drawn to bisect the line joining the centers of those classes

– The direction of the line is determined by the shape of the clusters of points

8

Page 9: Classification for  Lifelog  Management

Discriminant Analysis: Example

9

Page 10: Classification for  Lifelog  Management

Discriminant Analysis: Issues and Properties

• Requires numerical attributes, no missing values

• Dummy variables for categories

• Maximizes separation between the classes in a least square sense

• Attributes must be linearly independent

• No attribute can be constant within each class (add random noise)

• Canonical variates for multiclass

• Specially useful when the class means are ordered, or lie along a simple curve in attribute-space

• Logistic maximizes conditional P(class|x)

• Assumes that the form of the underlying density functions (or ratio) is known

10

Page 11: Classification for  Lifelog  Management

Support Vector Machines

• Definition:

– A set of related supervised learning methods used for classification

– Viewing input data as two sets of vectors in an n-dimensional space

– Constructing a separating hyperplane in the space

• which maximizes the margin between the two data sets

11

Page 12: Classification for  Lifelog  Management

Linear Support Vector Machines

SupportVectors

12

Page 13: Classification for  Lifelog  Management

• Introduction to Classification

– Discriminant Analysis

– Decision Tree

– Density Estimation

– Artificial Neural Networks

• Comparison of Classification Methods

• An Example of Classification for Life Log Analysis

Agenda

Page 14: Classification for  Lifelog  Management

Decision Tree

• Definition

– Space is divided into boxes, and at each stage in the procedure, each box is examined to see if it may be split into more boxes

– The split usually being parallel to the coordinate axes

• Structure

14

Page 15: Classification for  Lifelog  Management

Decision Tree: Example

15

Page 16: Classification for  Lifelog  Management

Classification Process of a Tree

• Generate_tests:

– Attr = val, Attr < val, Attr In Subset, …

• Test:

– Gini, Entropy, KS, Twoing, …

• Stop_criterion:

– Num cases, significance (e.g., c2-test), overgrow & prune

• Info:

– Most frequent class ("mode"), …

• Prune:

– Reduced-error, cost complexity, …

Training DataTraining Data

Classification RulesClassification Rules

Learning AlgorithmLearning Algorithm

Testing DataTesting Data

16

Page 17: Classification for  Lifelog  Management

• Introduction to Classification

– Discriminant Analysis

– Decision Tree

– Density Estimation

– Artificial Neural Networks

• Comparison of Classification Methods

• An Example of Classification for Life Log Analysis

Agenda

Page 18: Classification for  Lifelog  Management

Density Estimation

• Definition

– Classification rule that assigns x to class Ad if: p(Ad|x) = maxi p(Ai|x), maximizes the a posteriori probability

– For two classes Ai and Aj

– Kernel methods

– Naive Bayes assumes variable independency

– Projection methods

• Example: Nearest Neighbors

– 1-nearest neighbor rule, in which the training set is searched for the ‘nearest’ (in a defined sense) previous example, whose class is then assumed for the new case

– K-nearest neighbor, the natural extension taking the most frequent class in the neighbors

– Parameter free model (distance!)

)(),()(),( xfjicxfijc iijj

18

Page 19: Classification for  Lifelog  Management

Nearest Neighbours: Example

19

Page 20: Classification for  Lifelog  Management

Density Estimation: Issues

• Presence of irrelevant variables affect model

• Require numerical variables, or special distance function for categorical variables

• Variable’s scale affect the model

• All of the sample must be stored

20

Page 21: Classification for  Lifelog  Management

• Introduction to Classification

– Discriminant Analysis

– Decision Tree

– Density Estimation

– Artificial Neural Networks

• Comparison of Classification Methods

• An Example of Classification for Life Log Analysis

Agenda

Page 22: Classification for  Lifelog  Management

Artificial Neural Networks

• Black Box Modeling

• An artificial neural network (ANN) is a mathematical model or computational model based on biological neural networks

• It consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation

• In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase

22

Page 23: Classification for  Lifelog  Management

w4

w3

w2

w5

w1

I1

sum O f(sum)

I2

I3

I4

I5

Inputs Weights

Output

sum = wn Inf: activation function

The Neuron Model

Oi = fi(wn In)

23

Page 24: Classification for  Lifelog  Management

Multi Layer Perceptron Model

I1

I4

I5

I2

I3O1

feedforward

feedback

e = d1 - o1w31

w33

w34

w32

e31

e33

e34

e32

e: error

d: desired output

o: network output

w: weights

PATTERN

24

Page 25: Classification for  Lifelog  Management

• Universal approximator

• Weights are modified to minimize classification error

• The outputs represent the P ( Class | example )

Artificial Neural Networks: Principles

j

jijii

YyE 2)(2

1

))()(

()(21

dw

wdE

dw

wdEwE

)(wEww

25

Page 26: Classification for  Lifelog  Management

Back-propagation Learning

• The error E is a function of the weights and the training data

• Minimize error E using the steepest descent

E

minimum

Wnew = Wold -E/ Wold

W

: learning rate

Downhill = -E / W

26

Page 27: Classification for  Lifelog  Management

Artificial Neural Networks: Issues

• To make a generalizable network one has to determine a suitable number of hidden nodes

– Too few network does not learn enough…

– Too many over-learning

27

Page 28: Classification for  Lifelog  Management

• Introduction to Classification

– Discriminant Analysis

– Decision Tree

– Density Estimation

– Artificial Neural Networks

• Comparison of Classification Methods

• An Example of Classification for Life Log Analysis

Agenda

Page 29: Classification for  Lifelog  Management

Comparison of Classification Methods

• MV: Missing Values

• Cost matrix: learning(L),testing(T) or not(N)

• Interpretability: easily understood classifier

• Method: comprehensibility of the method

• Parameters: good guidelines or automatic selection of parameters

• User-fr: user friendliness

• Data: numerical, categorical

Algo. MV Cost Interpr. Method Param’s User-fr Data

Density N T 1 5 2 N N

Discriminant N T 3 4 4 Y N

Tree Y LT 5 4 5 Y NC

Neural

NetsN T 1 3 3 N N

29

Page 30: Classification for  Lifelog  Management

• Introduction to Classification

– Discriminant Analysis

– Decision Tree

– Density Estimation

– Artificial Neural Networks

• Comparison of Classification Methods

• An Example of Classification for Life Log Analysis

Agenda

Page 31: Classification for  Lifelog  Management

VTT Technical Research Center at Finland

• Representing complex entities: Any single methodology alone is often insufficient

– For example, knowledge representation needs a hierarchy utilizing heterogeneous methodologies (Minsky)

• Laerhoeven and Cakmakci (2000)

– 4 layers (statistical cues, Kohonen clustering, kNN classification, and a Markov chain supervisory layer)

• Advantage of using high-level features

– Control and ‘accurate touch’ to the data: Maintained up to the classification layer

– If the feature itself is discriminative enough: classification layer is not even necessary

Sensing

Service

Modeling & Classifying

Representation

Clustering

Fuzzy Logic

Naïve Bayes

Fuzzy Control

31

Page 32: Classification for  Lifelog  Management

Overview

• Disadvantage of using high-level features

– The development of proper signal processing algorithms is laborous (compared to the user of very low-level features)

32

Page 33: Classification for  Lifelog  Management

Context Representation

• A central problem in utilizing context information in application development

– Difficulty to add meaning of concepts from real world to extracted features

– “When environment loudness is SILENT, set ringing volume LOW” > “When environment loudness is -20dB, set ringing tone volume to 2.5”

• Context extraction: Abstract raw sensor signals and compress information

– How well features describe some parts of the real world context

– Extracted features: Context atoms (the smallest amount of context information)

33

Page 34: Classification for  Lifelog  Management

Context Information Processing (1)

34

Page 35: Classification for  Lifelog  Management

Context Information Processing (2)

• Context atom vector (k=maximum number of context atoms, the values of context atoms are between 0 and 1)

• Advantages

– Composing the ontology for sensor-based context information

• Direct control of various types of applications

– Multidimensional context atom vector C: Utilization of explorative data analysis methods

– Extendable (Straightforward to add new contexts atoms into a vector C)

35

Page 36: Classification for  Lifelog  Management

Some of the Quantized Features

36

Page 37: Classification for  Lifelog  Management

Naive Bayes Classifier

Structure

Joint Probability Distribution

CPT (Conditional probability table)learning

Conditional probability distributioncounting

37

Page 38: Classification for  Lifelog  Management

Experiment

• Two kinds of data

– Measured in controlled conditions (1)

– Real world scenario (2)

• Experiment (1)

– Nine individual context class data sets were recorded (each five times separately in controlled conditions)

– Length of each nine-class data set: 30 seconds (30 sets of 1 seconds atoms)

– Nine sets, each to the sum of 150 vectors

– To create variability in the data

• Five samples of different songs from two albums

• Speech and acceleration-based contexts by four different persons

– Training data set = test data set

38

Page 39: Classification for  Lifelog  Management

Experiment (2)

• Experiment (2)– Home environment– Multiple simultaneous contexts– Only those contexts that can be defined in each phase are counted– Five testees (performed the scenario twice) ten test data set with a

length 6-8 minutes– The last test data set ignored (box wires had been broken)– Final amount of scenario data sets 9 (60 minutes)– Different songs, three different cars– Manual on-the-fly segmentation is not accurate, a lot of disturbances– 1) training data set = test data set 2) cross validation

39

Page 40: Classification for  Lifelog  Management

Recognition Results as an Intensity Graph

Resolution= 1 second (5 second moving average)

40

Page 41: Classification for  Lifelog  Management

Classification Accuracy (9 contexts, 2 BN)

41

Page 42: Classification for  Lifelog  Management

Recognition Results

• Recognition Results for 13 Individual Contexts (9 scenarios)

BNDirect

42

Page 43: Classification for  Lifelog  Management

Context Data

• Data gathering Laptop

• 1 seconds resolution (4 seconds sliding window)

• 3 minutes

• Experiments Offline

43

Page 44: Classification for  Lifelog  Management

Experiments with User Scenario

• Fluctuation of control signals

– Oscillation of various information representation levels

– Moving averaging methods or smoothing in changing font sizes

• Extension (adding new types of contexts)

– Complicate the development of a control system

44

Page 45: Classification for  Lifelog  Management

User Reactions

• Ten cell phone users were interviewed– 5 of them normal users (use basic communication applications)– 5 of them active cell phone users (use actively most of the cell phone

applications and features)

• Interview process– Explaining scenario presented in the table to a user– Explaining to a user how adaptive applications operate in general and

particularly during the user scenario• Control signals and screenshots of adapted service contents

– Users were interviewed

• Application adaptation is an acceptable feature, but mistakes in adaptation are not accepted

• Users want to have control over the device’s operations

45

Page 46: Classification for  Lifelog  Management

Summary

• Classification for context awareness

• Classification method application

– Applying naïve Bayesian classifier in learning and classifying the contexts of a mobile device user

– Mobile device-oriented sensors are embedded (instead of being distributed into the environment)

– To recognize static short-term events (descriptive and generic at the intermediate levels of the context representation layer structure), sequences of these events could be utilized at the upper level

– Naïve Bayesian learning and inference: Straightforward and computationally very efficient

46

Page 47: Classification for  Lifelog  Management

Bibliography

• Book: Machine Learning, Neural and Statistical Classification. Editors: D. Michie, D.J. Spiegelhalter, C.C. Taylor, Statlog Project

• Book: Weiss, S. M. and Kulikowski, C. A. (1991). Computer systems that learn: classification and prediction methods from statistics, neural networks, machine learning and expert systems. Morgan Kaufmann, San Mateo, CA.

• Book: Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition. Academic Press, 2nd edition

• Paper: C.M. van der Walt and E. Barnard, “Data characteristics that determine classifier performance”, in Proceedings of the Sixteenth Annual Symposium of the Pattern Recognition Association of South Africa, pp.160-165, 2006

• Classifier showdown: A practical comparison of classification algorithms - Link

• Statistical Pattern Recognition Toolbox for Matlab - Link

47


Recommended