Using Discretization and Bayesian Inference Network Learning for Automatic Filtering Profile Generation

Authors: Wai Lam and Kon Fan Low
Announcer: Kyu-Baek Hwang
Transcript
Page 1:

Using Discretization and Bayesian Inference Network Learning for Automatic Filtering Profile Generation

Authors: Wai Lam and Kon Fan Low

Announcer: Kyu-Baek Hwang

Page 2:

Contents

Introduction Overview of the approach Automatic document pre-processing Feature selection Feature discretization Learning Bayesian networks Experiments and results Conclusions and future work

Page 3:

Contents

Introduction Overview of the approach Automatic document pre-processing Feature selection Feature discretization Learning Bayesian networks Experiments and results Conclusions and future work

Page 4:

Information Filtering

Page 5:

The Filtering Profile

An information filtering system deals with users who have a relatively stable and long-term information need.

An information need is usually represented by a filtering profile.

Page 6:

Construction of the Filtering Profile

Collect training data through interactions with users. Ex) gathering user feedback about relevance judgments for a certain information need or topic.

Analyze this training data and construct the filtering profile using machine learning techniques.

Use this filtering profile to determine the relevance of a new document.

Page 7:

The Uncertainty Issue

It is difficult to specify absolutely whether a document is relevant to a topic, as it may only partially match the topic. Ex) “the economic policy of government”

The probabilistic approach is appropriate for this kind of task.

Page 8:

Contents

Introduction Overview of the approach Automatic document pre-processing Feature selection Feature discretization Learning Bayesian networks Experiments and results Conclusions and future work

Page 9:

An Overview of the Approach

Transformation of each document into an internal form

Feature selection

Discretization of the feature value

Gathering training data by interactions with users

Bayesian network learning

- For each topic

Page 10:

Contents

Introduction Overview of the approach Automatic document pre-processing Feature selection Feature discretization Learning Bayesian networks Experiments and results Conclusions and future work

Page 11:

Document Representation

All stop words are eliminated. Ex) “the”, “are”, “and”, etc.

Stemming of the remaining words. Ex) “looks” → “look”, “looking” → “look”, etc.

A document is represented in vector form. Each element in the vector is either the word frequency or the word weight. The word weight is calculated as follows:

$$w_i = f_i \log \frac{N}{n_i}$$

where $f_i$ is the frequency of term i, N is the total number of documents, and $n_i$ is the number of documents that contain term i.
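As an illustration, here is a minimal Python sketch of this weighting scheme; the corpus layout and function name are assumptions for the example, since the slides do not show the authors' implementation.

```python
import math
from collections import Counter

def word_weights(documents):
    """Compute w_i = f_i * log(N / n_i) for each term of each document.

    documents: list of token lists, already stop-word-filtered and stemmed.
    Returns one {term: weight} dict per document.
    """
    N = len(documents)                       # total number of documents
    doc_freq = Counter()                     # n_i: number of documents containing term i
    for doc in documents:
        doc_freq.update(set(doc))

    weighted = []
    for doc in documents:
        freqs = Counter(doc)                 # f_i: frequency of term i in this document
        weighted.append({t: f * math.log(N / doc_freq[t]) for t, f in freqs.items()})
    return weighted

# Toy corpus of two stemmed documents.
print(word_weights([["gover", "student", "gover"], ["student", "take"]]))
```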

Page 12:

Word Frequency Representation of a Document

Term id   Term      Frequency
21        gover     3
17        annouc    1
98        take      3
34        student   4
…         …         …

Page 13:

Feature Selection

The expected mutual information measure is given as

$$I(W_i, C_j) = \sum_{b \in \{0,1\}} \left[ P(W_i = b, C_j)\,\log\frac{P(W_i = b, C_j)}{P(W_i = b)\,P(C_j)} + P(W_i = b, \tilde{C}_j)\,\log\frac{P(W_i = b, \tilde{C}_j)}{P(W_i = b)\,P(\tilde{C}_j)} \right]$$

where $W_i$ is a feature, $C_j$ denotes the fact that the document is relevant to topic j, and $\tilde{C}_j$ denotes non-relevance.

Mutual information measures the information contained in the term $W_i$ about topic j.

A document is represented as follows:

$$T_j = (T_{j1}, \ldots, T_{jp}).$$
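A small Python sketch of this selection step might look as follows, assuming binary word occurrence (b = 1 if the term appears in the document) and boolean relevance labels; the helper names are illustrative, not from the paper.

```python
import math

def expected_mutual_information(docs, labels, term):
    """I(W_i, C_j): sum over b in {0, 1} and over relevance/non-relevance.

    docs: list of term sets; labels: list of booleans (relevant to topic j).
    """
    n = len(docs)
    score = 0.0
    for b in (False, True):                  # term absent / present
        for c in (False, True):              # non-relevant / relevant
            joint = sum((term in d) == b and l == c
                        for d, l in zip(docs, labels)) / n
            p_b = sum((term in d) == b for d in docs) / n
            p_c = sum(l == c for l in labels) / n
            if joint > 0:
                score += joint * math.log(joint / (p_b * p_c))
    return score

def select_features(docs, labels, vocabulary, p=10):
    """Keep the p terms with the highest expected mutual information."""
    return sorted(vocabulary,
                  key=lambda t: expected_mutual_information(docs, labels, t),
                  reverse=True)[:p]
```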

Page 14:

Contents

Introduction Overview of the approach Automatic document pre-processing Feature selection Feature discretization Learning Bayesian networks Experiments and results Conclusions and future work

Page 15:

Discretization Scheme

The goal of discretization is to find a mapping m such that the feature value is represented by a discrete value.

The mapping is characterized by a series of threshold levels $(0, w_1, \ldots, w_k)$ where $0 < w_1 < w_2 < \cdots < w_k$.

The mapping m has the following property:

$$m(q) = \begin{cases} 0 & \text{if } q = 0, \\ i & \text{if } w_i \le q < w_{i+1}, \\ k & \text{if } w_k \le q, \end{cases}$$

where q is the feature value.
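In code, the mapping reduces to an interval lookup over the sorted thresholds. A minimal sketch, assuming levels are indexed from 0:

```python
import bisect

def discretize(q, thresholds):
    """Map a feature value q to a discrete level given sorted (w_1, ..., w_k).

    Returns 0 when q < w_1 (including q = 0), i when w_i <= q < w_{i+1},
    and k when q >= w_k.
    """
    return bisect.bisect_right(thresholds, q)

# Example matching the predefined-level scheme on the next slide:
# integers 0..15 with thresholds 5.5 and 10.5 give three levels.
assert [discretize(q, [5.5, 10.5]) for q in (0, 5, 6, 11)] == [0, 0, 1, 2]
```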

Page 16:

Predefined Level Discretization

One predetermines the discretization level k and the threshold values. Ex) Integers between 0 and 15 are discretized into three levels by the threshold values 5.5 and 10.5.

Page 17:

Lloyd’s Algorithm

Consider the distribution of feature values.

Step 1: Determine the discretization level k.
Step 2: Select the initial threshold levels $(y_1, y_2, \ldots, y_{k-1})$.
Step 3: Repeat the following for all i: calculate the mean feature value $\mu_i$ of the ith region, generate all possible threshold levels between $\mu_i$ and $\mu_{i+1}$, and select the threshold level that minimizes the following distortion measure:

$$d_i = \sum_j (q_j - \mu_i)^2.$$

Step 4: If the distortion measure of this new set of threshold levels is less than that of the old set, go to Step 3.
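A compact Python sketch of this quantizer follows. One simplification is assumed: instead of enumerating all candidate thresholds between adjacent means, each new threshold is set to the midpoint $(\mu_i + \mu_{i+1})/2$, which minimizes the squared-error distortion between two fixed region means.

```python
import bisect

def lloyd_thresholds(values, k, max_iters=50):
    """Lloyd-style scalar quantizer: alternate between computing region means
    and moving each threshold to the midpoint of adjacent means."""
    values = sorted(values)
    lo, hi = values[0], values[-1]
    # Step 2: initial thresholds split the value range evenly.
    thresholds = [lo + (hi - lo) * i / k for i in range(1, k)]
    for _ in range(max_iters):
        # Partition the values into k regions by the current thresholds.
        regions = [[] for _ in range(k)]
        for v in values:
            regions[bisect.bisect_right(thresholds, v)].append(v)
        # Step 3: region means; a degenerate (empty) region ends the search.
        if any(not r for r in regions):
            break
        means = [sum(r) / len(r) for r in regions]
        new = [(a + b) / 2 for a, b in zip(means, means[1:])]
        if new == thresholds:
            break                            # Step 4: no further improvement
        thresholds = new
    return thresholds

print(lloyd_thresholds([0, 1, 2, 8, 9, 10, 20, 21, 22], k=3))  # -> [5.0, 15.0]
```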

Page 18:

Relevance Dependence Discretization (1/3)

Consider the dependency between the feature and the relevance of the topic.

The relevance information entropy is given as

$$\mathrm{Ent}(S) = -P(C_j, S)\log P(C_j, S) - P(\tilde{C}_j, S)\log P(\tilde{C}_j, S)$$

where S is the group of feature values.

Page 19:

Relevance Dependence Discretization (2/3)

The partition entropy of the regions induced by w is defined as

$$E(w; S) = \frac{|S_1|}{|S|}\,\mathrm{Ent}(S_1) + \frac{|S_2|}{|S|}\,\mathrm{Ent}(S_2)$$

where $S_1$ is the subset of S with feature values smaller than w and $S_2$ is $S - S_1$.

The more homogeneous the regions, the smaller the partition entropy.

The partition entropy controls the recursive partition algorithm.

Page 20:

Relevance Dependence Discretization (3/3)

A criterion for the recursive partition algorithm is as follows:

$$\mathrm{Gain}(m; S) > \frac{\log_2(|S| - 1)}{|S|} + \frac{\Delta(m; S)}{|S|}$$

where $\Delta(m; S)$ is defined as

$$\Delta(m; S) = \log_2(3^k - 2) - \left[k\,\mathrm{Ent}(S) - k_1\,\mathrm{Ent}(S_1) - k_2\,\mathrm{Ent}(S_2)\right]$$

where k is the number of relevance classes in the partition S, $k_1$ is the number of relevance classes in the partition $S_1$, and $k_2$ is the number of relevance classes in the partition $S_2$.
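Putting the three slides together, here is a Python sketch of this stopping rule over boolean relevance labels. One assumption is labeled explicitly: the slide does not define Gain(m; S), so it is taken here as Ent(S) − E(w; S), the usual choice in Fayyad and Irani's MDL-based scheme, which this criterion closely follows.

```python
import math

def ent(labels):
    """Relevance information entropy Ent(S) over boolean relevance labels."""
    n = len(labels)
    h = 0.0
    for c in (False, True):
        p = sum(l == c for l in labels) / n
        if p > 0:
            h -= p * math.log2(p)
    return h

def accept_cut(values, labels, w):
    """MDL criterion: accept threshold w only if the gain exceeds
    log2(|S| - 1)/|S| + delta(m; S)/|S|.  Assumes w yields two
    non-empty regions S1 (values < w) and S2 (values >= w)."""
    s1 = [l for v, l in zip(values, labels) if v < w]
    s2 = [l for v, l in zip(values, labels) if v >= w]
    n = len(labels)
    # Partition entropy E(w; S) and the induced gain (assumed definition).
    e_w = len(s1) / n * ent(s1) + len(s2) / n * ent(s2)
    gain = ent(labels) - e_w
    # k, k1, k2: number of relevance classes in S, S1, S2.
    k, k1, k2 = len(set(labels)), len(set(s1)), len(set(s2))
    delta = math.log2(3 ** k - 2) - (k * ent(labels) - k1 * ent(s1) - k2 * ent(s2))
    return gain > math.log2(n - 1) / n + delta / n
```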

Page 21:

Contents

Introduction Overview of the approach Automatic document pre-processing Feature selection Feature discretization Learning Bayesian networks Experiments and results Conclusions and future work

Page 22:

Bayesian Inference for Document Classification

By Bayes’ theorem, the probability of $C_j$ given the document is as follows:

$$P(C_j \mid T_{j1}, \ldots, T_{jp}) = \frac{P(T_{j1}, \ldots, T_{jp} \mid C_j)\,P(C_j)}{P(T_{j1}, \ldots, T_{jp})}.$$
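For intuition, here is a toy posterior computation for this formula. Purely for illustration it assumes the features are conditionally independent given the class (a naïve Bayes factorization); the learned Bayesian network on the following slides replaces this assumption with the dependencies found by the structure search.

```python
def posterior(features, p_c, p_given_c, p_given_not_c):
    """P(C_j | T_j1, ..., T_jp) by Bayes' theorem, with the likelihood
    factored feature-by-feature (naive-Bayes illustration only)."""
    like_c, like_nc = p_c, 1.0 - p_c
    for name, level in features.items():
        like_c *= p_given_c[name][level]
        like_nc *= p_given_not_c[name][level]
    # The denominator P(T_j1, ..., T_jp) is the sum of both branches.
    return like_c / (like_c + like_nc)

# Two discretized features, each taking levels {0, 1, 2} (made-up numbers).
print(posterior({"econom": 2, "polic": 1},
                p_c=0.3,
                p_given_c={"econom": [0.1, 0.3, 0.6], "polic": [0.2, 0.5, 0.3]},
                p_given_not_c={"econom": [0.7, 0.2, 0.1], "polic": [0.6, 0.3, 0.1]}))
```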

Page 23:

Background of Bayesian Networks

The process of inference is to use the evidence from the nodes that have observations to find the probability of other nodes in the network.

[Figure: an example Bayesian network over the class node C and feature nodes T1–T5, encoding the joint probability $P(C, T_1, T_2, T_3, T_4, T_5)$.]

Page 24:

Learning Bayesian Networks

Parametric learning: the conditional probability for each node is estimated from the training data.

Structural learning: best-first search guided by the MDL score.

A classification-based network simplifies the structural learning process.

Page 25:

MDL Score for Bayesian Networks

The MDL (Minimum Description Length) score for a Bayesian network B is defined as

$$L(B) = \sum_X L_{\mathrm{total}}(X)$$

where X is a node in the network.

The score for each node is calculated as

$$L_{\mathrm{total}}(T_{ji}, \Pi_{T_{ji}}) = L_{\mathrm{network}}(T_{ji}, \Pi_{T_{ji}}) + L_{\mathrm{data}}(T_{ji}, \Pi_{T_{ji}})$$

where $\Pi_{T_{ji}}$ denotes the parent set of node $T_{ji}$.

Page 26:

Complexity of the Network Structure

$L_{\mathrm{network}}$ is the network description length; it corresponds to the topological complexity of the network and is computed as follows:

$$L_{\mathrm{network}}(T_{ji}, \Pi_{T_{ji}}) = \frac{\log_2 N}{2}\,(s_{ji} - 1)\prod_{T \in \Pi_{T_{ji}}} s_T$$

where N is the number of training documents and $s_{ji}$ is the number of possible states the variable $T_{ji}$ can take.

Page 27:

Accuracy of the Network Structure

The data description length is given by the following formula:

$$L_{\mathrm{data}}(T_{ji}, \Pi_{T_{ji}}) = -\sum_{T_{ji},\,\Pi_{T_{ji}}} M(T_{ji}, \Pi_{T_{ji}})\,\log_2\frac{M(T_{ji}, \Pi_{T_{ji}})}{M(\Pi_{T_{ji}})}$$

where $M(\cdot)$ is the number of cases that match a particular instantiation in the training data.

The more accurate the network, the shorter this length.
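Combining the two length terms, a sketch of the per-node score over a table of discretized training cases might look as follows; the data layout (one dict per case) and function name are assumptions for the example.

```python
import math
from collections import Counter

def mdl_node_score(data, node, parents, states):
    """L_total(node, parents) = L_network + L_data.

    data: list of dicts mapping variable name -> discrete state.
    states: dict mapping variable name -> number of possible states.
    """
    N = len(data)
    # L_network: (log2 N / 2) * (s_node - 1) * product of parent state counts.
    params = states[node] - 1
    for p in parents:
        params *= states[p]
    l_network = math.log2(N) / 2 * params

    # L_data: -sum over instantiations of M(node, parents) * log2(M(node, parents) / M(parents)).
    joint = Counter(tuple(case[v] for v in [node] + parents) for case in data)
    marginal = Counter(tuple(case[v] for v in parents) for case in data)
    l_data = -sum(m * math.log2(m / marginal[key[1:]]) for key, m in joint.items())
    return l_network + l_data
```

A best-first structure search would then sum this score over all nodes and prefer the candidate network with the smaller total.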

Page 28:

Contents

Introduction Overview of the approach Automatic document pre-processing Feature selection Feature discretization Learning Bayesian networks Experiments and results Conclusions and future work

Page 29:

The Process of Information Filtering based on Bayesian Network Learning

Gather the training documents. For all training documents, determine the relevance to each topic. Perform feature selection for each topic.

5 and 10 features were used in the experiments.

Discretize the feature values. Learn a Bayesian network for each topic.

Set the probability threshold value for the relevance decision.

Each Bayesian network corresponds to the filtering profile.

Page 30:

Document Collections

Reuters-21578: 29 topics. In chronological order, the first 7,000 documents were chosen as the training set and the other 14,578 documents were used as the test set.

FBIS (Foreign Broadcast Information Service): 38 topics used in TREC (Text REtrieval Conference). In chronological order, 60,000 documents were chosen as the training set and the other 70,471 documents were used as the test set.

Page 31:

Evaluation Metrics for Information Retrieval

                         True
                Relevant      Non-relevant
Algorithm  Relevant       n1          n2
           Non-relevant   n3          n4

recall: $R = n_1 / (n_1 + n_3)$

precision: $S = n_1 / (n_1 + n_2)$

$$F_\beta = \frac{(\beta^2 + 1)\,S\,R}{\beta^2 S + R}$$

$$\mathrm{Utility} = A\,n_1 + B\,n_2 + C\,n_3 + D\,n_4, \qquad \text{e.g. } \mathrm{Utility} = 3\,n_1 - 2\,n_2.$$
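These quantities follow directly from the contingency counts; a short sketch with β = 1 by default and the 3/−2 utility weighting shown above (the example counts are made up):

```python
def metrics(n1, n2, n3, n4, beta=1.0):
    """Recall, precision, F_beta, and linear utility from the table
    (n1: relevant retrieved, n2: non-relevant retrieved,
    n3: relevant missed, n4: non-relevant rejected)."""
    recall = n1 / (n1 + n3)
    precision = n1 / (n1 + n2)
    f_beta = (beta**2 + 1) * precision * recall / (beta**2 * precision + recall)
    utility = 3 * n1 - 2 * n2            # A = 3, B = -2, C = D = 0
    return recall, precision, f_beta, utility

print(metrics(n1=30, n2=10, n3=20, n4=940))  # -> (0.6, 0.75, ~0.667, 70)
```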

Page 32:

Filtering Performance of the Bayesian Network on the Reuters Collection

Page 33:

Comparison of the Bayesian Network Approach and the Naïve Bayesian Approach

Page 34:

Comparison of the Bayesian Network Approach and the Naïve Bayesian Approach

Page 35:

Filtering Performance of the Bayesian Network on the FBIS Collection

Page 36:

Comparison of the Bayesian Network Approach and the Naïve Bayesian Approach

Page 37:

Comparison of the Bayesian Network Approach and the Naïve Bayesian Approach

Page 38:

Conclusions and Future Work

Discretization methods. Structural learning.

Large data

Better performance over the naïve Bayesian approach.

