Data Mining
CS 341, Spring 2007
Lecture 4: Data Mining Techniques (I)
© Prentice Hall
Review:
Information Retrieval
– Similarity measures
– Evaluation metrics: precision and recall
Question Answering
Web Search Engines
– An application of IR
– Related to web mining
Data Mining Techniques Outline
Goal: Provide an overview of basic data mining techniques
Statistical
– Point Estimation
– Models Based on Summarization
– Bayes Theorem
– Hypothesis Testing
– Regression and Correlation
Similarity Measures
Point Estimation
Point estimate: an estimate of a population parameter.
May be made by calculating the parameter for a sample.
May be used to predict a value for missing data.
Ex:
– R contains 100 employees
– 99 have salary information
– Mean salary of these is $50,000
– Use $50,000 as the value of the remaining employee's salary.
Is this a good idea?
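The salary example above can be sketched in Python. Only the $50,000 mean comes from the slide; the individual salaries below are made up for illustration:

```python
# Point estimation for missing data: impute a missing value with the
# sample mean of the known values (a hedged sketch; the individual
# salaries are hypothetical, chosen so their mean is $50,000).
def impute_with_mean(values):
    """Replace None entries with the mean of the known entries."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

salaries = [40_000, 60_000, 45_000, 55_000, None]  # one employee missing
print(impute_with_mean(salaries)[-1])  # 50000.0
```

Whether this is a good idea depends on how representative the sample mean is of the missing individual.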
Estimation Error
Bias: difference between the expected value and the actual value:
Bias = E(θ̂) – θ
Mean Squared Error (MSE): expected value of the squared difference between the estimate and the actual value:
MSE(θ̂) = E[(θ̂ – θ)²]
Root Mean Squared Error (RMSE): the square root of the MSE.
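These three error measures can be sketched directly from their definitions, estimating the expectations by averaging over a set of repeated estimates (the toy numbers are made up):

```python
# Hedged sketch: bias, MSE, and RMSE of an estimator, approximated by
# averaging over repeated estimates of a parameter whose true value is known.
import math

def bias(estimates, actual):
    return sum(estimates) / len(estimates) - actual

def mse(estimates, actual):
    return sum((e - actual) ** 2 for e in estimates) / len(estimates)

def rmse(estimates, actual):
    return math.sqrt(mse(estimates, actual))

# Toy data: four estimates of a parameter whose true value is 10.
ests = [9.0, 10.5, 11.0, 9.5]
print(bias(ests, 10))  # 0.0 (the estimates average to the true value)
print(mse(ests, 10))   # 0.625
print(rmse(ests, 10))
```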
Jackknife Estimate
Jackknife estimate: an estimate of a parameter obtained by omitting one value from the set of observed values.
Named to describe a "handy and useful tool."
Used to reduce bias.
Property: the jackknife estimator lowers the bias from the order of 1/n to 1/n².
Jackknife Estimate
Definition:
– Divide the sample of size n into g groups of size m each, so n = mg (often m = 1 and g = n).
– Estimate θ̂j by ignoring the jth group.
– θ̄ is the average of the θ̂j.
– The jackknife estimator is θQ = g·θ̂ – (g–1)·θ̄,
where θ̂ is an estimator for the parameter θ computed from all n observations.
Jackknife Estimator: Example 1
Estimate of the mean for X = {x1, x2, x3}, n = 3, g = 3, m = 1; θ̂ = x̄ = (x1 + x2 + x3)/3
θ̂1 = (x2 + x3)/2,  θ̂2 = (x1 + x3)/2,  θ̂3 = (x1 + x2)/2
θ̄ = (θ̂1 + θ̂2 + θ̂3)/3 = (x1 + x2 + x3)/3
θQ = g·θ̂ – (g–1)·θ̄ = 3·θ̂ – 2·θ̄ = (x1 + x2 + x3)/3
In this case, the jackknife estimator is the same as the usual estimator.
Jackknife Estimator: Example 2
Estimate of the variance for X = {1, 4, 4}, n = 3, g = 3, m = 1; x̄ = 3
θ̂ = σ̂² = ((1–3)² + (4–3)² + (4–3)²)/3 = 2
θ̂1 = ((4–4)² + (4–4)²)/2 = 0,  θ̂2 = ((1–2.5)² + (4–2.5)²)/2 = 2.25,  θ̂3 = 2.25
θ̄ = (θ̂1 + θ̂2 + θ̂3)/3 = 1.5
θQ = g·θ̂ – (g–1)·θ̄ = 3(2) – 2(1.5) = 3
In this case, the jackknife estimator is different from the usual estimator.
Jackknife Estimator: Example 2 (cont'd)
In general, applying the jackknife technique to the biased estimator σ̂²,
σ̂² = Σ (xi – x̄)² / n,
yields the jackknife estimator s²,
s² = Σ (xi – x̄)² / (n – 1),
which is known to be unbiased for σ².
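The leave-one-out procedure above (m = 1, g = n) can be sketched as a small function that works for any estimator, reproducing Example 2:

```python
# Jackknife estimator sketch (m = 1, g = n), reproducing Example 2:
# the biased variance estimate 2 for X = {1, 4, 4} is corrected to 3.
def jackknife(data, estimator):
    n = len(data)
    theta_hat = estimator(data)  # estimate from all n observations
    # Leave-one-out estimates, omitting the j-th value each time.
    theta_j = [estimator(data[:j] + data[j + 1:]) for j in range(n)]
    theta_bar = sum(theta_j) / n
    # Q = g*theta_hat - (g - 1)*theta_bar, with g = n here.
    return n * theta_hat - (n - 1) * theta_bar

def biased_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

print(jackknife([1, 4, 4], biased_var))  # 3.0, the unbiased s**2
```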
Maximum Likelihood Estimate (MLE)
Obtain parameter estimates that maximize the probability that the sample data occurs for the specific model.
Likelihood function: the joint probability of observing the sample data, obtained by multiplying the individual probabilities:
L(Θ | x1, …, xn) = Π f(xi; Θ)
Maximize L.
MLE Example
Coin tossed five times: {H, H, H, H, T}
Assuming a fair coin with H and T equally likely, the likelihood of this sequence is:
L = (0.5)^4 (0.5) = 0.03125
However, if the probability of an H is 0.8, then:
L = (0.8)^4 (0.2) = 0.08192
MLE Example (cont'd)
General likelihood formula for k heads in n tosses:
L(p) = p^k (1 – p)^(n–k)
Maximizing L gives p̂ = k/n; the estimate for p is then 4/5 = 0.8.
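The coin-toss likelihoods can be checked numerically; a coarse grid search over candidate values of p recovers the MLE k/n = 0.8:

```python
# MLE sketch for the coin-toss example: L(p) = p^k (1-p)^(n-k)
# for k = 4 heads in n = 5 tosses.
def likelihood(p, heads, n):
    return p ** heads * (1 - p) ** (n - heads)

print(likelihood(0.5, 4, 5))  # fair coin: 0.03125
print(likelihood(0.8, 4, 5))  # p = 0.8: ~0.08192

# A grid search over candidate p values recovers the MLE p = k/n = 0.8.
grid = [i / 100 for i in range(101)]
p_hat = max(grid, key=lambda p: likelihood(p, 4, 5))
print(p_hat)  # 0.8
```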
Expectation-Maximization (EM)
Solves estimation problems with incomplete data.
Obtain initial estimates for the parameters.
Iteratively use the estimates for the missing data and continue until convergence.
EM Example
EM Algorithm
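A minimal sketch of the EM idea in the classic missing-data setting: estimating a mean when some values are unobserved. The E-step fills in the missing values with the current estimate; the M-step re-estimates the mean from the completed data. The data values here are made up:

```python
# EM-style estimation of a mean with missing observations (a sketch).
def em_mean(observed, n_missing, mu0=0.0, tol=1e-6, max_iter=100):
    mu = mu0
    n = len(observed) + n_missing
    for _ in range(max_iter):
        # E-step: fill in each missing value with the current estimate mu.
        total = sum(observed) + n_missing * mu
        # M-step: re-estimate the mean from the completed data.
        mu_new = total / n
        if abs(mu_new - mu) < tol:  # converged
            return mu_new
        mu = mu_new
    return mu

# Four observed values, two missing; the iteration converges to 5.0.
print(em_mean([1, 5, 10, 4], 2))
```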
Models Based on Summarization
Basic concepts provide an abstraction and summarization of the data as a whole.
– Statistical concepts: mean, variance, median, mode, etc.
Visualization: display the structure of the data graphically.
– Line graphs, pie charts, histograms, scatter plots, hierarchical graphs
Scatter Diagram
Bayes Theorem
Posterior probability: P(h1 | xi)
Prior probability: P(h1)
Bayes Theorem:
P(h1 | xi) = P(xi | h1) P(h1) / P(xi)
Assign probabilities of hypotheses given a data value.
Bayes Theorem Example
Credit authorizations (hypotheses):
h1 = authorize purchase, h2 = authorize after further identification,
h3 = do not authorize, h4 = do not authorize but contact police
Assign twelve data values for all combinations of credit and income:

Credit \ Income    1     2     3     4
Excellent         x1    x2    x3    x4
Good              x5    x6    x7    x8
Bad               x9   x10   x11   x12

From training data: P(h1) = 60%; P(h2) = 20%; P(h3) = 10%; P(h4) = 10%.
Bayes Example (cont'd)
Training data:

ID  Income  Credit     Class  xi
 1    4     Excellent   h1    x4
 2    3     Good        h1    x7
 3    2     Excellent   h1    x2
 4    3     Good        h1    x7
 5    4     Good        h1    x8
 6    2     Excellent   h1    x2
 7    3     Bad         h2    x11
 8    2     Bad         h2    x10
 9    3     Bad         h3    x11
10    1     Bad         h4    x9
Bayes Example (cont'd)
Calculate P(xi | hj) and P(xi).
Ex: P(x7 | h1) = 2/6; P(x4 | h1) = 1/6; P(x2 | h1) = 2/6; P(x8 | h1) = 1/6; P(xi | h1) = 0 for all other xi.
Predict the class for x4:
– Calculate P(hj | x4) for all hj.
– Place x4 in the class with the largest value.
– Ex:
» P(h1 | x4) = P(x4 | h1) P(h1) / P(x4) = (1/6)(0.6)/0.1 = 1.
» x4 is placed in class h1.
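The classification step above can be sketched directly from the priors and the counts in the training table:

```python
# Bayes-theorem classification sketch: compute P(hj | x4) from the
# training-data counts and the priors given on the slides.
priors = {"h1": 0.6, "h2": 0.2, "h3": 0.1, "h4": 0.1}
# P(x4 | hj): x4 appears once among the six h1 tuples and never
# under the other hypotheses.
cond = {"h1": 1 / 6, "h2": 0.0, "h3": 0.0, "h4": 0.0}
p_x4 = 0.1  # x4 occurs once in the 10 training tuples

posterior = {h: cond[h] * priors[h] / p_x4 for h in priors}
best = max(posterior, key=posterior.get)
print(posterior["h1"])  # (1/6)(0.6)/0.1 ~ 1.0
print(best)             # h1
```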
Hypothesis Testing
Find a model to explain behavior by creating and then testing a hypothesis about the data.
Exact opposite of the usual data mining approach.
H0 – null hypothesis; the hypothesis to be tested.
H1 – alternative hypothesis.
Chi-Square Test
One technique for performing hypothesis testing.
Used to test the association between two observed variable values and determine if a set of observed values is statistically different.
The chi-squared statistic is defined as:
χ² = Σ (O – E)² / E
O – observed value
E – expected value based on the hypothesis.
Chi-Square Test
Given the average scores of five schools, determine whether the difference is statistically significant.
Ex:
– O = {50, 93, 67, 78, 87}
– E = 75
– χ² = 15.55, and therefore significant.
Examine a chi-squared significance table:
– With 4 degrees of freedom and a significance level of 95%, the critical value is 9.488. Thus the variation between the schools' scores and the expected value cannot be attributed to pure chance.
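The statistic for the schools example follows directly from the formula:

```python
# Chi-square statistic for the schools example: O = observed scores,
# E = 75 expected for every school.
def chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

O = [50, 93, 67, 78, 87]
E = [75] * 5
stat = chi_square(O, E)
print(round(stat, 2))  # 15.55, above the 9.488 critical value (df = 4, 95%)
```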
Regression
Predict future values based on past values.
Fit a set of points to a curve.
Linear regression assumes a linear relationship exists:
y = c0 + c1 x1 + … + cn xn
– n input variables (called regressors or predictors)
– One output variable, called the response
– n + 1 constants, chosen during the modeling process to match the input examples
Linear Regression – with one input value
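With one input variable, the least-squares constants have a closed form; a minimal sketch (the data points are made up, chosen to lie exactly on y = 1 + 2x):

```python
# Least-squares fit of y = c0 + c1*x for a single regressor.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    c1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
         sum((x - mx) ** 2 for x in xs)
    c0 = my - c1 * mx  # intercept: line passes through the means
    return c0, c1

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # exactly y = 1 + 2x
c0, c1 = fit_line(xs, ys)
print(c0, c1)  # 1.0 2.0
```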
Correlation
Examine the degree to which the values of two variables behave similarly.
Correlation coefficient r:
• 1 = perfect correlation
• –1 = perfect but opposite correlation
• 0 = no correlation
Correlation
r = Σ (xi – X̄)(yi – Ȳ) / √( Σ (xi – X̄)² · Σ (yi – Ȳ)² )
where X̄ and Ȳ are the means of X and Y respectively.
Suppose X = (1, 3, 5, 7, 9) and Y = (9, 7, 5, 3, 1). r = ?
Suppose X = (1, 3, 5, 7, 9) and Y = (2, 4, 6, 8, 10). r = ?
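The formula answers both questions; since each Y is an exact linear function of X, the two cases give perfect opposite and perfect positive correlation:

```python
# Pearson correlation coefficient r for the two example pairs.
import math

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den

print(correlation([1, 3, 5, 7, 9], [9, 7, 5, 3, 1]))   # -1.0
print(correlation([1, 3, 5, 7, 9], [2, 4, 6, 8, 10]))  # 1.0
```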
Similarity Measures
Determine the similarity between two objects.
Similarity characteristics:
Alternatively, distance measures measure how unlike or dissimilar objects are.
Similarity Measures
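As a sketch, three commonly used set-based similarity measures (the choice of these particular measures is an assumption, not confirmed by the slide): Jaccard, Dice, and overlap, applied to two small word sets:

```python
# Set-based similarity measures (assumed examples: Jaccard, Dice, overlap).
def jaccard(a, b):
    return len(a & b) / len(a | b)

def dice(a, b):
    return 2 * len(a & b) / (len(a) + len(b))

def overlap(a, b):
    return len(a & b) / min(len(a), len(b))

t1 = {"data", "mining", "techniques"}
t2 = {"data", "mining", "methods", "overview"}
print(jaccard(t1, t2))  # 2/5 = 0.4
print(dice(t1, t2))     # 4/7
print(overlap(t1, t2))  # 2/3
```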
Distance Measures
Measure the dissimilarity between objects.
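Two standard distance measures for numeric feature vectors, as a sketch (assuming the usual Euclidean and Manhattan definitions; the slide's own formulas are not shown here):

```python
# Common distance measures between numeric feature vectors.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```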
Next Lecture:
Data Mining Techniques (II)
– Decision trees, neural networks, and genetic algorithms
Reading assignment: Chapter 3