Page 1: Predictive Analysis of Text

Predictive Analysis of Text: Concepts, Features, and Instances

Jaime Arguello, [email protected]

August 31, 2016


Page 2: Predictive Analysis of Text

2

• Objective: developing and evaluating computer programs that automatically detect a particular concept in natural language text

Predictive Analysis of Text


Page 3: Predictive Analysis of Text

3

Predictive Analysis: basic ingredients

1. Training data: a set of positive and negative examples of the concept we want to automatically recognize

2. Representation: a set of features that we believe are useful in recognizing the desired concept

3. Learning algorithm: a computer program that uses the training data to learn a predictive model of the concept


Page 4: Predictive Analysis of Text

4

Predictive Analysis: basic ingredients

4. Model: a function that describes a predictive relationship between feature values and the presence/absence of the concept

5. Test data: a set of previously unseen examples used to estimate the model’s effectiveness

6. Performance metrics: a set of statistics used to measure the predictive effectiveness of the model


Page 5: Predictive Analysis of Text

5

Predictive Analysis: training and testing

[Diagram: during training, labeled examples are fed to a machine learning algorithm, which produces a model; during testing, the model is applied to new, unlabeled examples to produce predictions.]

Page 6: Predictive Analysis of Text

6

Predictive Analysis: concept, instances, and features

  color   size   # sides   equal sides   ...   label
  red     big    3         no            ...   yes
  green   big    3         yes           ...   yes
  blue    small  inf       yes           ...   no
  blue    small  4         yes           ...   no
  ...     ...    ...       ...           ...   ...
  red     big    3         yes           ...   yes

(rows = instances, columns = features, label = the concept to be recognized)

Page 7: Predictive Analysis of Text

7

Predictive Analysis: training and testing

[Diagram: the same training/testing pipeline shown with the shapes data. A labeled table (color, size, # sides, equal sides, ..., label) is used by the machine learning algorithm to train a model; the model is then applied to a test table with the same features but unknown labels (???) to produce predictions.]

Page 8: Predictive Analysis of Text

8

• Is a particular concept appropriate for predictive analysis?

• What should the unit of analysis be?

• How should I divide the data into training and test sets?

• What is a good feature representation for this task?

• What type of learning algorithm should I use?

• How should I evaluate my model’s performance?

Predictive Analysis: questions

Page 9: Predictive Analysis of Text

9

• Learning algorithms can recognize some concepts better than others

• What are some properties of concepts that are easier to recognize?

Predictive Analysis: concepts

Page 10: Predictive Analysis of Text

10

• Option 1: can a human recognize the concept?

Predictive Analysis: concepts

Page 11: Predictive Analysis of Text

11

• Option 1: can a human recognize the concept?

• Option 2: can two or more humans recognize the concept independently and do they agree?

Predictive Analysis: concepts

Page 12: Predictive Analysis of Text

12

• Option 1: can a human recognize the concept?

• Option 2: can two or more humans recognize the concept independently and do they agree?

• Option 2 is better.

• In fact, models are sometimes evaluated as if they were an independent assessor

• How does the model’s performance compare to the performance of one assessor with respect to another?

‣ One assessor produces the “ground truth” and the other produces the “predictions”

Predictive Analysis: concepts

Page 13: Predictive Analysis of Text

13

Predictive Analysis: measures of agreement (percent agreement)

• Percent agreement: percentage of instances for which both assessors agree that the concept occurs or does not occur

            yes    no
    yes      A      B
    no       C      D

  % agreement = (? + ?) / (? + ? + ? + ?)

Page 14: Predictive Analysis of Text

14

Predictive Analysis: measures of agreement (percent agreement)

• Percent agreement: percentage of instances for which both assessors agree that the concept occurs or does not occur

            yes    no
    yes      A      B
    no       C      D

  % agreement = (A + D) / (A + B + C + D)

Page 15: Predictive Analysis of Text

15

Predictive Analysis: measures of agreement (percent agreement)

• Percent agreement: percentage of instances for which both assessors agree that the concept occurs or does not occur

            yes    no
    yes       5     5   |  10
    no       15    75   |  90
             ---   ---
             20    80

  % agreement = ???

Page 16: Predictive Analysis of Text

16

Predictive Analysis: measures of agreement (percent agreement)

• Percent agreement: percentage of instances for which both assessors agree that the concept occurs or does not occur

            yes    no
    yes       5     5   |  10
    no       15    75   |  90
             ---   ---
             20    80

  % agreement = (5 + 75) / 100 = 80%
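Computed directly from the four cells of the table, percent agreement is a one-liner. A minimal Python sketch (the example values mirror the slide; the code itself is my own illustration, not from the lecture):

  def percent_agreement(a, b, c, d):
      """Fraction of instances on which both assessors agree (cells A and D)."""
      return (a + d) / (a + b + c + d)

  # worked example from the slide: A = 5, B = 5, C = 15, D = 75
  print(percent_agreement(5, 5, 15, 75))  # 0.8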

Page 17: Predictive Analysis of Text

17

• Problem: percent agreement does not account for agreement due to random chance.

• How can we compute the expected agreement due to random chance?

• Option 1: assume unbiased assessors

• Option 2: assume biased assessors

Predictive Analysis: measures of agreement (percent agreement)

Page 18: Predictive Analysis of Text

18

• Option 1: unbiased assessors

Predictive Analysis: kappa agreement (chance-corrected % agreement)

            yes    no
    yes      ??    ??   |  50
    no       ??    ??   |  50
             ---   ---
             50    50

Page 19: Predictive Analysis of Text

19

• Option 1: unbiased assessors

Predictive Analysis: kappa agreement (chance-corrected % agreement)

            yes    no
    yes      25    25   |  50
    no       25    25   |  50
             ---   ---
             50    50

Page 20: Predictive Analysis of Text

20

• Option 1: unbiased assessors

Predictive Analysis: kappa agreement (chance-corrected % agreement)

            yes    no
    yes      25    25   |  50
    no       25    25   |  50
             ---   ---
             50    50

  random chance % agreement = ???

Page 21: Predictive Analysis of Text

21

• Option 1: unbiased assessors

Predictive Analysis: kappa agreement (chance-corrected % agreement)

            yes    no
    yes      25    25   |  50
    no       25    25   |  50
             ---   ---
             50    50

  random chance % agreement = (25 + 25) / 100 = 50%

Page 22: Predictive Analysis of Text

22

Predictive Analysis: kappa agreement (chance-corrected % agreement)

• Kappa agreement: percent agreement after correcting for the expected agreement due to random chance

  K = (P(a) - P(e)) / (1 - P(e))

• P(a) = percent of observed agreement

• P(e) = percent of agreement due to random chance

Page 23: Predictive Analysis of Text

23

Predictive Analysis: kappa agreement (chance-corrected % agreement)

• Kappa agreement: percent agreement after correcting for the expected agreement due to unbiased chance

  observed:                      expected (unbiased assessors):
        yes    no                      yes    no
  yes     5     5   |  10        yes    25    25   |  50
  no     15    75   |  90        no     25    25   |  50
         20    80                       50    50

  P(a) = (5 + 75) / 100 = 0.80
  P(e) = (25 + 25) / 100 = 0.50
  K = (P(a) - P(e)) / (1 - P(e)) = (0.80 - 0.50) / (1 - 0.50) = 0.60

Page 24: Predictive Analysis of Text

24

Predictive Analysis: kappa agreement (chance-corrected % agreement)

• Option 2: biased assessors

        yes    no
  yes     5     5   |  10
  no     15    75   |  90
         20    80

  biased chance % agreement = ???

Page 25: Predictive Analysis of Text

25

Predictive Analysis: kappa agreement (chance-corrected % agreement)

• Kappa agreement: percent agreement after correcting for the expected agreement due to biased chance

        yes    no
  yes     5     5   |  10
  no     15    75   |  90
         20    80

  P(a) = (5 + 75) / 100 = 0.80
  P(e) = (10/100 × 20/100) + (90/100 × 80/100) = 0.74
  K = (P(a) - P(e)) / (1 - P(e)) = (0.80 - 0.74) / (1 - 0.74) = 0.23
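Both chance corrections can be reproduced in a few lines of Python. This is my own sketch, not code from the lecture: the unbiased case fixes P(e) at 0.5, while the biased case estimates P(e) from each assessor's own yes/no rates, as on the slide above.

  def kappa(a, b, c, d, unbiased=False):
      """Chance-corrected agreement for a 2x2 table [[a, b], [c, d]]."""
      n = a + b + c + d
      p_a = (a + d) / n  # observed agreement
      if unbiased:
          p_e = 0.5  # each assessor assumed to say yes/no with probability 0.5
      else:
          # expected agreement based on each assessor's marginal yes/no rates
          p_e = ((a + b) / n) * ((a + c) / n) + ((c + d) / n) * ((b + d) / n)
      return (p_a - p_e) / (1 - p_e)

  print(round(kappa(5, 5, 15, 75, unbiased=True), 2))  # 0.6
  print(round(kappa(5, 5, 15, 75), 2))                 # 0.23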

Page 26: Predictive Analysis of Text

26

Predictive Analysis: data annotation process

• INPUT: unlabeled data, annotators, coding manual

• OUTPUT: labeled data

1. using the latest coding manual, have all annotators label some previously unseen portion of the data (~10%)

2. measure inter-annotator agreement (Kappa)

3. IF agreement < X, THEN:

‣ refine coding manual using disagreements to resolve inconsistencies and clarify definitions

‣ return to 1

• ELSE

‣ have annotators label the remainder of the data independently and EXIT
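The control flow of this loop is easy to see as code. The sketch below is my own; the step functions (label_sample, kappa_agreement, refine_manual, label_remainder) are hypothetical placeholders passed in as arguments, and threshold plays the role of X.

  def annotate(data, annotators, coding_manual, threshold,
               label_sample, kappa_agreement, refine_manual, label_remainder):
      """Iterative annotation: refine the coding manual until agreement >= threshold (X)."""
      while True:
          # 1. all annotators label a previously unseen ~10% portion of the data
          sample = label_sample(data, annotators, coding_manual, 0.10)
          # 2. measure inter-annotator agreement (Kappa)
          if kappa_agreement(sample) < threshold:
              # 3. refine the coding manual using the disagreements, then repeat
              coding_manual = refine_manual(coding_manual, sample)
          else:
              # ELSE: annotators label the remainder of the data independently
              return label_remainder(data, annotators, coding_manual)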

Page 27: Predictive Analysis of Text

27

• What is good (Kappa) agreement?

• It depends on who you ask

• According to Landis and Koch, 1977:

‣ 0.81 - 1.00: almost perfect

‣ 0.61 - 0.80: substantial

‣ 0.41 - 0.60: moderate

‣ 0.21 - 0.40: fair

‣ 0.00 - 0.20: slight

‣ < 0.00: no agreement

Predictive Analysis: data annotation process

Page 28: Predictive Analysis of Text

28

• Question: requests information about the course content

• Answer: contributes information in response to a question

• Issue: expresses a problem with the course management

• Issue Resolution: attempts to resolve a previously raised issue

• Positive Ack: positive sentiment about a previous post

• Negative Ack: negative sentiment about a previous post

• Other: serves a different purpose

Predictive Analysis: data annotation process

Page 29: Predictive Analysis of Text

29

except that we included a few additional tips. For the question category, we indicated that questions can be in the form of a statement (e.g., “I need help with HW Question 3.”). Furthermore, to help distinguish questions from issues, we explained that asking questions is part of a student’s learning process and is not necessarily bad from an instructor’s perspective. For the answer category, we indicated that answers may not completely resolve a question, but should provide information that is useful in some way. We also indicated that mere feedback about a previous question (e.g., “I have the same question!”) should be labeled as positive or negative acknowledgment. For the issue category, we added that issues may require corrective action by the course staff and are likely to be considered bad from an instructor’s perspective. Issues may refer to glitches in the course materials or logistics. For the issue resolution category, we added that issue resolutions may simply indicate that the course staff is aware of the problem and working on a solution. An issue resolution may not completely fix the problem. For the positive acknowledgment category, we added that positive sentiments may include agreement, encouragement, and support. Finally, for the negative acknowledgment category, we added that negative sentiments may include disagreement, confusion, and frustration.

Snow et al. (2008) evaluated the quality of crowdsourced labels across several computational linguistics tasks. Results found that combining as few as four redundant crowdsourced labels using a majority vote can produce labels comparable to an expert’s. In a similar fashion, we collected five redundant annotations per post and combined them into gold-standard labels using a majority vote. While posts could be associated with multiple speech act categories, we decided to treat each speech act category independently. In this respect, a post was considered a gold-standard positive example for a particular speech act if at least 3/5 MTurk workers selected that speech act and was considered a negative example otherwise. In total, we collected 14,815 annotations (2,963 posts × 5 redundant HITs per post), and workers were compensated with $0.10 USD per HIT.
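To make the 3/5 majority-vote rule concrete, here is a minimal sketch (my own illustration, not code from the paper) of turning five redundant binary annotations for one post into a gold-standard label for one speech act:

  def gold_label(votes, threshold=3):
      """votes: five binary annotations (1 = speech act present) for one post.
      The post is a positive example if at least `threshold` of 5 workers agree."""
      return 1 if sum(votes) >= threshold else 0

  print(gold_label([1, 1, 0, 1, 0]))  # 1 (3/5 workers selected the speech act)
  print(gold_label([1, 0, 0, 1, 0]))  # 0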

Our HITs were implemented as external HITs, meaning that everything besides recruitment and compensation was managed by our own server. Using an external HIT design allowed us to control the assignment of posts to workers, preventing workers from seeing the same post more than once, and to detect and filter careless workers dynamically. MTurk annotation tasks require quality control, and we addressed this in four ways. First, we restricted our HITs to workers with a 95% acceptance rate or greater. Second, to help ensure English language proficiency, we restricted our HITs to workers in the U.S. Third, workers were exposed to several HITs for which an expert assessor (one of the authors) thought that the correct speech act was fairly obvious. Workers who disagreed with the expert on three of these HITs were automatically prevented from completing more HITs. Finally, in order to avoid having a few workers do most of our HITs, workers were not allowed to complete more than 165 HITs (about 1% of the total). Ultimately, we collected annotations from 360 unique workers.

In our first research question (RQ1), we investigate whether crowdsourced workers can reliably label our speech acts in MOOC forum posts. To answer this question, we measured the level of inter-annotator agreement between the MTurk majority vote and an expert assessor. To this end, an expert assessor (one of the authors) labeled a random sample of 1,000 posts (about a third of the full dataset) with respect to each speech act category. Then, for each speech act, we measured the Cohen’s Kappa agreement between the MTurk majority vote and the expert. Cohen’s Kappa (c) measures the chance-corrected agreement between two annotators on the same set of data. Furthermore, in order to make a full comparison, we also measured the Fleiss’ Kappa agreement between MTurk workers across all posts. Fleiss’ Kappa (f) measures the chance-corrected agreement between any pair of assessors and is therefore appropriate for measuring agreement between MTurk workers who were free to annotate any number of posts (up to a max of 165).

Agreement numbers are provided in Table 2. Two trends are worth noting. First, across all speech acts, the level of agreement between MTurk workers was lower than the level of agreement between the MTurk majority vote and the expert. This result is consistent with previous work (Snow et al. 2008) and suggests that combining redundant crowdsourced labels improves label quality. Second, agreement between the MTurk majority vote and the expert varied across speech acts. Agreement was “almost perfect” for questions (c > 0.80), close to “almost perfect” for answers (c ≈ 0.80), and “substantial” for the other speech acts (0.80 ≥ c > 0.60) (Landis and Koch 1977). Overall, we view these results as encouraging, but with room for improvement.

The speech acts with the lowest agreement were issue resolution, negative acknowledgment, and other. As described in more detail below, issue resolutions and negative acknowledgments were fairly infrequent. Assessors may need further instructions and examples to reliably recognize these speech acts. The other category occurred more frequently, but was still associated with lower agreement. After examining the data, we found posts where MTurk workers were divided between other and positive acknowledgment. In many of these posts, the author’s overall sentiment was positive (e.g., “Hi, I’m **** from ****. Nice to meet you all!”), but the post did not directly reference a previous post. Future work may need to provide further instructions to help distinguish between positive acknowledgment and other.

                      MTurk Workers (f)    MV and Expert (c)
  Question                 0.569                0.893
  Answer                   0.414                0.790
  Issue                    0.421                0.669
  Issue Resolution         0.286                0.635
  Positive Ack.            0.423                0.768
  Negative Ack.            0.232                0.633
  Other                    0.337                0.625

Table 2: Agreement between MTurk workers (f) and between the MTurk majority vote (MV) and the expert (c).

Predictive Analysis: data annotation process

Page 30: Predictive Analysis of Text

30

• Is a particular concept appropriate for predictive analysis?

• What should the unit of analysis be?

• What is a good feature representation for this task?

• How should I divide the data into training and test sets?

• What type of learning algorithm should I use?

• How should I evaluate my model’s performance?

Predictive Analysis: questions

Page 31: Predictive Analysis of Text

31

• For many text-mining applications, turning the data into instances for training and testing is fairly straightforward

• Easy case: instances are self-contained, independent units of analysis

‣ text classification: instances = documents

‣ opinion mining: instances = product reviews

‣ bias detection: instances = political blog posts

‣ emotion detection: instances = support group posts

Predictive Analysis: turning data into (training and test) instances

Page 32: Predictive Analysis of Text

32

Text Classification: predicting health-related documents

  w_1   w_2   w_3   ...   w_n   label
   1     1     0    ...    0    health
   0     0     0    ...    0    other
   0     0     0    ...    0    other
   0     1     0    ...    1    other
  ...   ...   ...   ...   ...   ...
   1     0     0    ...    1    health

(rows = instances, columns = word features, label = the concept)
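A small sketch of how such a binary word-presence matrix might be built; the toy documents and labels below are my own illustration, not data from the lecture:

  docs = ["flu symptoms and treatment options", "the assignment deadline moved"]
  labels = ["health", "other"]
  vocab = sorted({w for d in docs for w in d.split()})

  # each row is an instance; feature j is 1 if word j appears in the document
  X = [[1 if w in d.split() else 0 for w in vocab] for d in docs]
  for row, label in zip(X, labels):
      print(row, label)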

Page 33: Predictive Analysis of Text

33

Opinion Mining: predicting positive/negative movie reviews

  w_1   w_2   w_3   ...   w_n   label
   1     1     0    ...    0    positive
   0     0     0    ...    0    negative
   0     0     0    ...    0    negative
   0     1     0    ...    1    negative
  ...   ...   ...   ...   ...   ...
   1     0     0    ...    1    positive

(rows = instances, columns = word features, label = the concept)

Page 34: Predictive Analysis of Text

34

Bias Detection: predicting liberal/conservative blog posts

  w_1   w_2   w_3   ...   w_n   label
   1     1     0    ...    0    liberal
   0     0     0    ...    0    conservative
   0     0     0    ...    0    conservative
   0     1     0    ...    1    conservative
  ...   ...   ...   ...   ...   ...
   1     0     0    ...    1    liberal

(rows = instances, columns = word features, label = the concept)

Page 35: Predictive Analysis of Text

35

• A not-so-easy case: relational data

• The concept to be learned is a relation between pairs of objects

Predictive Analysis: turning data into (training and test) instances

Page 36: Predictive Analysis of Text

36

Predictive Analysis: example of relational data, Brother(X,Y)

(example borrowed and modified from the Witten et al. textbook)

Page 37: Predictive Analysis of Text

37

Predictive Analysis: example of relational data, Brother(X,Y)

  name_1   gender_1   mother_1   father_1   name_2   gender_2   mother_2   father_2   brother
  steven   male       peggy      peter      graham   male       peggy      peter      yes
  ian      male       grace      ray        brian    male       grace      ray        yes
  anna     female     pam        ian        nikki    female     pam        ian        no
  pippa    female     grace      ray        brian    male       grace      ray        no
  steven   male       peggy      peter      brian    male       grace      ray        no
  ...      ...        ...        ...        ...      ...        ...        ...        ...
  anna     female     pam        ian        brian    male       grace      ray        no

(rows = instances of person pairs, columns = features, brother = the concept)

Page 38: Predictive Analysis of Text

38

• A not-so-easy case: relational data

• Each instance should correspond to an object pair (which may or may not share the relation of interest)

• May require features that characterize properties of the pair

Predictive Analysis: turning data into (training and test) instances

Page 39: Predictive Analysis of Text

39

Predictive Analysis: example of relational data, Brother(X,Y)

(can we think of a better feature representation?)

  name_1   gender_1   mother_1   father_1   name_2   gender_2   mother_2   father_2   brother
  steven   male       peggy      peter      graham   male       peggy      peter      yes
  ian      male       grace      ray        brian    male       grace      ray        yes
  anna     female     pam        ian        nikki    female     pam        ian        no
  pippa    female     grace      ray        brian    male       grace      ray        no
  steven   male       peggy      peter      brian    male       grace      ray        no
  ...      ...        ...        ...        ...      ...        ...        ...        ...
  anna     female     pam        ian        brian    male       grace      ray        no

Page 40: Predictive Analysis of Text

40

Predictive Analysis: example of relational data, Brother(X,Y)

  gender_1   gender_2   same parents   brother
  male       male       yes            yes
  male       male       yes            yes
  female     female     no             no
  female     male       yes            no
  male       male       no             no
  ...        ...        ...            ...
  female     male       no             no

(rows = instances of person pairs, columns = features, brother = the concept)
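One way to derive this pairwise representation from per-person records is sketched below; the record format and helper function are my own illustration, not from the lecture:

  def pair_features(p1, p2):
      """Turn two person records into features that describe the pair."""
      return {
          "gender_1": p1["gender"],
          "gender_2": p2["gender"],
          # a property of the pair itself, not of either person alone
          "same_parents": p1["mother"] == p2["mother"] and p1["father"] == p2["father"],
      }

  steven = {"name": "steven", "gender": "male", "mother": "peggy", "father": "peter"}
  graham = {"name": "graham", "gender": "male", "mother": "peggy", "father": "peter"}
  print(pair_features(steven, graham))
  # {'gender_1': 'male', 'gender_2': 'male', 'same_parents': True}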

Page 41: Predictive Analysis of Text

41

• A not-so-easy case: relational data

• There is still an issue that we’re not capturing! Any ideas?

• Hint: In this case, should the predicted labels really be independent?

Predictive Analysis: turning data into (training and test) instances

Page 42: Predictive Analysis of Text

42

Brother(A,B)  =  yes

Brother(B,C)  =  yes

Brother(A,C)  =  no

Predictive Analysis: turning data into (training and test) instances

Page 43: Predictive Analysis of Text

43

• In this case, what we would really want is:

‣ a method that does joint prediction on the test set

‣ a method whose joint predictions satisfy a set of known properties about the data as a whole (e.g., transitivity)

Predictive Analysis: turning data into (training and test) instances

Page 44: Predictive Analysis of Text

44

• There are learning algorithms that incorporate relational constraints between predictions

• However, they are beyond the scope of this class

• We’ll be covering algorithms that make independent predictions on instances

• That said, many algorithms output prediction confidence values

• Heuristics can be used to disfavor inconsistencies

Predictive Analysis: turning data into (training and test) instances

Page 45: Predictive Analysis of Text

45

• Examples of relational data in text-mining:

‣ information extraction: predicting that a word-sequence belongs to a particular class (e.g., person, location)

‣ topic segmentation: segmenting discourse into topically coherent chunks

Predictive Analysis: turning data into (training and test) instances

Page 46: Predictive Analysis of Text

46

Predictive Analysis: topic segmentation example

[Figure: a sequence of sentences, the first seven about topic A followed by six about topic B.]

Page 47: Predictive Analysis of Text

47

Predictive Analysis: topic segmentation example (instances)

[Figure: the same sequence of topic-A and topic-B sentences, now marked up to show the units that become instances.]

Page 48: Predictive Analysis of Text

48

Predictive Analysis: topic segmentation example (independent instances?)

[Figure: the same sequence with candidate "split" points marked between adjacent sentences.]

Page 49: Predictive Analysis of Text

49

Predictive Analysis: topic segmentation example (independent instances?)

[Figure: the same sequence with four "split" points marked; the slide asks whether these split decisions are really independent of one another.]

Page 50: Predictive Analysis of Text

50

• Is a particular concept appropriate for predictive analysis?

• What should the unit of analysis be?

• How should I divide the data into training and test sets?

• What is a good feature representation for this task?

• What type of learning algorithm should I use?

• How should I evaluate my model’s performance?

Predictive Analysis: questions

Page 51: Predictive Analysis of Text

51

• We want our model to “learn” to recognize a concept

• So, what does it mean to learn?

Predictive Analysis: training and test data

Page 52: Predictive Analysis of Text

52

• The machine learning definition of learning:

A machine learns with respect to a particular task T, performance metric P, and experience E, if the system improves its performance P at task T following experience E. -- Tom Mitchell

Predictive Analysis: training and test data

Page 53: Predictive Analysis of Text

53

• We want our model to improve its generalization performance!

• That is, its performance on previously unseen data!

• Generalize: to derive or induce a general conception or principle from particulars. -- Merriam-Webster

• In order to test generalization performance, the training and test data cannot be the same.

• Why?

Predictive Analysis: training and test data

Page 54: Predictive Analysis of Text

54

Training data + Representation: what could possibly go wrong?

Page 55: Predictive Analysis of Text

55

• While we don’t want to test on training data, models usually perform the best when the training and test set are derived from the same “probability distribution”.

• What does that mean?

Predictive Analysis: training and test data

Page 56: Predictive Analysis of Text

56

Predictive Analysis: training and test data

[Diagram: a dataset of positive and negative instances, to be divided somehow into a training set and a test set.]

Page 57: Predictive Analysis of Text

57

Predictive Analysis: training and test data

[Diagram: one particular split of the positive and negative instances into training and test sets.]

• Is this a good partitioning? Why or why not?

Page 58: Predictive Analysis of Text

58

Predictive Analysis: training and test data

[Diagram: the training set and the test set are each drawn as a random sample of the data.]

Page 59: Predictive Analysis of Text

59

Predictive Analysis: training and test data

[Diagram: training and test sets drawn as random samples from the same data.]

• On average, random sampling should produce comparable data for training and testing

Page 60: Predictive Analysis of Text

60

• Models usually perform the best when the training and test set have:

‣ a similar proportion of positive and negative examples

‣ a similar co-occurrence of feature-values and each target class value

Predictive Analysis: training and test data
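A stratified random split is a common way to keep the proportion of positive and negative examples similar across the two sets. A minimal sketch in plain Python (my own illustration; no particular library is assumed):

  import random

  def stratified_split(instances, labels, test_fraction=0.2, seed=0):
      """Randomly split the data so each class keeps roughly the same proportion."""
      rng = random.Random(seed)
      by_class = {}
      for x, y in zip(instances, labels):
          by_class.setdefault(y, []).append(x)
      train, test = [], []
      for y, xs in by_class.items():
          rng.shuffle(xs)
          cut = int(len(xs) * test_fraction)
          test += [(x, y) for x in xs[:cut]]
          train += [(x, y) for x in xs[cut:]]
      return train, test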

Page 61: Predictive Analysis of Text

61

Predictive Analysis: training and test data

• Caution: in some situations, partitioning the data randomly might inflate performance in an unrealistic way!

• How the data is split into training and test sets determines what we can claim about generalization performance

• The appropriate split between training and test sets is usually determined on a case-by-case basis


Page 62: Predictive Analysis of Text

62

• Spam detection: should the training and test sets contain email messages from the same sender, same recipient, and/or same timeframe?

• Topic segmentation: should the training and test sets contain potential boundaries from the same discourse?

• Opinion mining for movie reviews: should the training and test sets contain reviews for the same movie?

• Sentiment analysis: should the training and test sets contain blog posts from the same discussion thread?

Predictive Analysis: discussion

Page 63: Predictive Analysis of Text

63

• Is a particular concept appropriate for predictive analysis?

• What should the unit of analysis be?

• How should I divide the data into training and test sets?

• What type of learning algorithm should I use?

• What is a good feature representation for this task?

• How should I evaluate my model’s performance?

Predictive Analysis: questions

Page 64: Predictive Analysis of Text

64

• Linear classifiers

• Decision tree classifiers

• Instance-based classifiers

Predictive Analysis: three types of classifiers

Page 65: Predictive Analysis of Text

65

• All types of classifiers learn to make predictions based on the input feature values

• However, different types of classifiers combine the input feature values in different ways

• Chapter 3 in the book refers to a trained model as a knowledge representation

Predictive Analysis: three types of classifiers

Page 66: Predictive Analysis of Text

66

Predictive Analysis: linear classifiers (perceptron algorithm)

  y = 1   if  w_0 + Σ_{j=1..n} w_j x_j > 0
  y = 0   otherwise

Page 67: Predictive Analysis of Text

67

Predictive Analysis: linear classifiers (perceptron algorithm)

  y = 1   if  w_0 + Σ_{j=1..n} w_j x_j > 0
  y = 0   otherwise

  y: the predicted value (e.g., 1 = positive, 0 = negative)
  w_0 ... w_n: parameters learned by the model

Page 68: Predictive Analysis of Text

68

Predictive Analysis: linear classifiers (perceptron algorithm)

  test instance:   f_1 = 0.5,  f_2 = 1.0,  f_3 = 0.2
  model weights:   w_0 = 2.0,  w_1 = -5.0,  w_2 = 2.0,  w_3 = 1.0

  output = 2.0 + (0.5 × -5.0) + (1.0 × 2.0) + (0.2 × 1.0)
  output = 1.7
  output prediction = positive
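The same decision rule in a few lines of Python (my own sketch; the weights and features are the ones from the slide):

  def perceptron_predict(weights, features):
      """weights = [w_0, w_1, ..., w_n]; features = [f_1, ..., f_n].
      Predict 1 (positive) if w_0 + sum(w_j * f_j) > 0, else 0 (negative)."""
      score = weights[0] + sum(w * f for w, f in zip(weights[1:], features))
      return 1 if score > 0 else 0

  # worked example from the slide: the score is 1.7 > 0, so the prediction is positive
  print(perceptron_predict([2.0, -5.0, 2.0, 1.0], [0.5, 1.0, 0.2]))  # 1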

Page 69: Predictive Analysis of Text

69

Predictive Analysis: linear classifiers (perceptron algorithm)

(two-feature example borrowed from the Witten et al. textbook)

Page 70: Predictive Analysis of Text

70

Predictive Analysis: linear classifiers (perceptron algorithm)

(source: http://en.wikipedia.org/wiki/File:Svm_separating_hyperplanes.png)

Page 71: Predictive Analysis of Text

71

Predictive Analysis: linear classifiers (perceptron algorithm)

• Would a linear classifier do well on positive (black) and negative (white) data that looks like this?

[Figure: scatter plot of the data in the (x1, x2) plane, with axis ticks at 0.5 and 1.0.]

Page 72: Predictive Analysis of Text

72

• Linear classifiers

• Decision tree classifiers

• Instance-based classifiers

Predictive Analysis: three types of classifiers

Page 73: Predictive Analysis of Text

73

Predictive Analysis: example of a decision tree classifier, Brother(X,Y)

[Decision tree: the root node tests "same parents"; the nodes below it test gender_1 and gender_2 (male/female); each leaf predicts brother = yes or no.]
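Because the tree figure did not survive extraction, here is one tree, written as nested conditionals, that is consistent with the pairwise training data shown earlier; it is a reconstruction, not necessarily the exact tree from the lecture:

  def predict_brother(gender_1, gender_2, same_parents):
      """A decision tree for Brother(X, Y) over the pairwise features."""
      if not same_parents:
          return "no"
      if gender_1 != "male":
          return "no"
      # same parents and person 1 is male: check person 2's gender
      return "yes" if gender_2 == "male" else "no"

  print(predict_brother("male", "male", True))    # yes
  print(predict_brother("female", "male", True))  # no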

Page 74: Predictive Analysis of Text

74

Predictive Analysis: decision tree classifiers

• Draw a decision tree that would perform perfectly on this training data!

[Figure: the same scatter plot of positive and negative points in the (x1, x2) plane, with axis ticks at 0.5 and 1.0.]

Page 75: Predictive Analysis of Text

75

• Linear classifiers

• Decision tree classifiers

• Instance-based classifiers

Predictive Analysis: three types of classifiers

Page 76: Predictive Analysis of Text

76

Predictive Analysis: instance-based classifiers

• predict the class associated with the most similar training examples

[Figure: the scatter plot of labeled points in the (x1, x2) plane, with a new test point marked "?".]

Page 77: Predictive Analysis of Text

77

Predictive Analysis: instance-based classifiers

• predict the class associated with the most similar training examples

[Figure: the same plot; the test point "?" takes the label of its nearest training neighbors.]

Page 78: Predictive Analysis of Text

78

• Assumption: instances with similar feature values should have a similar label

• Given a test instance, predict the label associated with its nearest neighbors

• There are many different similarity metrics for computing distance between training/test instances

• There are many ways of combining labels from multiple training instances

Predictive Analysis: instance-based classifiers
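A minimal nearest-neighbor sketch under two common default choices, Euclidean distance and a majority vote over the k nearest neighbors; both choices and the toy data are my own, since, as the slide notes, many alternatives exist:

  import math
  from collections import Counter

  def knn_predict(train, test_point, k=3):
      """train: list of (feature_vector, label) pairs; predict by majority vote
      over the k training instances closest to test_point (Euclidean distance)."""
      def dist(a, b):
          return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
      neighbors = sorted(train, key=lambda xy: dist(xy[0], test_point))[:k]
      return Counter(label for _, label in neighbors).most_common(1)[0][0]

  train = [([0.2, 0.3], "neg"), ([0.9, 0.8], "pos"), ([0.8, 0.9], "pos"), ([0.1, 0.4], "neg")]
  print(knn_predict(train, [0.85, 0.85]))  # pos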

Page 79: Predictive Analysis of Text

79

• Is a particular concept appropriate for predictive analysis?

• What should the unit of analysis be?

• How should I divide the data into training and test sets?

• What is a good feature representation for this task?

• What type of learning algorithm should I use?

• How should I evaluate my model’s performance?

Predictive Analysis: questions

