Date post: 19-Dec-2015

Basic Ideas in Probability and Statistics for Experimenters: Part I: Qualitative Discussion

Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute

[email protected] http://www.ecse.rpi.edu/Homepages/shivkuma

Based in part upon slides of Prof. Raj Jain (OSU)

He uses statistics as a drunken man uses lamp-posts – for support rather than for illumination … A. Lang


Overview

Why Probability and Statistics: The Empirical Design Method…

Qualitative understanding of essential probability and statistics

Especially the notion of inference and statistical significance

Key distributions & why we care about them…

Reference: Chap 12, 13 (Jain), Chap 2-3 (Box, Hunter, Hunter), and http://mathworld.wolfram.com/topics/ProbabilityandStatistics.html


Why Care About Prob. & Statistics?

How to make this empirical design process EFFICIENT??

How to avoid pitfalls in inference!


Probability

Think of probability as modeling an experiment

The set of all possible outcomes is the sample space: S

Classic “Experiment”: Tossing a die: S = {1,2,3,4,5,6}

Any subset A of S is an event: A = {the outcome is even} = {2,4,6}
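The die experiment can be written down directly as sets. The sketch below is only an illustration: it assumes a fair die (equally likely outcomes), which the slide does not state, and the helper name `prob` is made up for this example.

```python
from fractions import Fraction

# Sample space for one toss of a die.
S = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a subset of S), ASSUMING all outcomes are equally likely."""
    return Fraction(len(event & S), len(S))

# Event A: the outcome is even.
A = {s for s in S if s % 2 == 0}

print(sorted(A))  # [2, 4, 6]
print(prob(A))    # 1/2
print(prob(S))    # 1
```

Using exact fractions rather than floats keeps the probabilities readable and avoids rounding noise when events are combined.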


Probability of Events: Axioms

P is the Probability Mass function if it maps each event A into a real number P(A), and:

i.) $P(A) \ge 0$ for every event $A \subseteq S$

ii.) $P(S) = 1$

iii.) If A and B are mutually exclusive events, then $P(A \cup B) = P(A) + P(B)$


Probability of Events

…In fact, for any sequence of pair-wise mutually exclusive events $A_1, A_2, A_3, \ldots$ (i.e. $A_i A_j = \emptyset$ for any $i \ne j$), we have

$$P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n)$$


Other Properties

$P(\bar{A}) = 1 - P(A)$

$P(A) \le 1$

$P(A \cup B) = P(A) + P(B) - P(AB)$

$A \subseteq B \Rightarrow P(A) \le P(B)$

Derived by breaking up the above sets into mutually exclusive pieces and comparing to the fundamental axioms!!


Conditional Probability

$P(A \mid B)$ = (conditional) probability that the outcome is in A given that we know the outcome is in B:

$$P(A \mid B) = \frac{P(AB)}{P(B)}, \qquad P(B) \ne 0$$

• Example: Toss one die: $P(i = 3 \mid i \text{ is odd}) = \frac{P(i = 3)}{P(i \text{ is odd})} = \frac{1/6}{1/2} = \frac{1}{3}$

• Note that: $P(AB) = P(B)\,P(A \mid B) = P(A)\,P(B \mid A)$
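The conditional-probability definition and the product rule can be checked on the one-die example. This is a minimal sketch assuming a fair die; `prob` and `cond_prob` are illustrative helper names, not part of the slides.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # one toss of a die, assumed fair

def prob(event):
    """P(event) under equally likely outcomes."""
    return Fraction(len(event & S), len(S))

def cond_prob(a, b):
    """P(A | B) = P(AB) / P(B), defined only when P(B) != 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be non-zero")
    return prob(a & b) / prob(b)

three = {3}
odd = {1, 3, 5}

print(cond_prob(three, odd))  # 1/3

# Product rule: P(AB) = P(B) P(A|B) = P(A) P(B|A)
lhs = prob(three & odd)
print(lhs == prob(odd) * cond_prob(three, odd))    # True
print(lhs == prob(three) * cond_prob(odd, three))  # True
```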


Independence

Events A and B are independent if P(AB) = P(A)P(B). Also: $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$.

Example: A card is selected at random from an ordinary deck of cards. A = event that the card is an ace. B = event that the card is a diamond.

$P(AB) = 1/52$, $P(A) = 4/52$, $P(B) = 13/52$

$P(A)\,P(B) = (4/52)(13/52) = 1/52 = P(AB)$, so A and B are independent.
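The card example can be verified by enumerating a standard 52-card deck. The sketch assumes every card is equally likely to be drawn; the rank and suit labels are just convenient strings chosen for this illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))  # 52 (rank, suit) pairs

def prob(event):
    """P(event) when each card is equally likely to be selected."""
    return Fraction(len(event), len(deck))

ace = {card for card in deck if card[0] == "A"}
diamond = {card for card in deck if card[1] == "diamonds"}

print(prob(ace), prob(diamond), prob(ace & diamond))     # 1/13 1/4 1/52
print(prob(ace & diamond) == prob(ace) * prob(diamond))  # True, so independent
```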


Random Variable as a Measurement

We cannot give an exact description of a sample space in these cases, but we can still describe specific measurements on them:

The temperature change produced.

The number of photons emitted in one millisecond.

The time of arrival of the packet.


Thus a random variable can be thought of as a measurement on an experiment


Probability Mass Function for a Random Variable

The probability mass function (PMF) for a (discrete valued) random variable X is:

$$P_X(x) = P(X = x) = P(\{s \in S \mid X(s) = x\})$$

Note that $P_X(x) \ge 0$ for all x.

Also for a (discrete valued) random variable X

$$\sum_{x} P_X(x) = 1$$


Probability Distribution Function (pdf)

a.k.a. frequency histogram, p.m.f (for discrete r.v.)


Cumulative Distribution Function

The cumulative distribution function (CDF) for a random variable X is

$$F_X(x) = P(X \le x) = P(\{s \in S \mid X(s) \le x\})$$

Note that $F_X(x)$ is non-decreasing in x, i.e.

$$x_1 \le x_2 \Rightarrow F_X(x_1) \le F_X(x_2)$$

Also $\lim_{x \to \infty} F_X(x) = 1$ and $\lim_{x \to -\infty} F_X(x) = 0$


PMF and CDF: Example


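Expectation of a Random Variable

The expectation (average) of a (discrete-valued) random variable X is

$$\bar{X} = E(X) = \sum_{x} x\,P(X = x) = \sum_{x} x\,P_X(x)$$

Three coins example:

$$E(X) = \sum_{x=0}^{3} x\,P_X(x) = 0 \cdot \frac{1}{8} + 1 \cdot \frac{3}{8} + 2 \cdot \frac{3}{8} + 3 \cdot \frac{1}{8} = 1.5$$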
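The PMF, CDF and expectation for the three-coins example can be built by enumeration. The sketch assumes X is the number of heads in three tosses of a fair coin, which is consistent with the probabilities 1/8, 3/8, 3/8, 1/8 in the example but is not spelled out on the slide.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

# All 8 equally likely outcomes of three fair coin tosses.
outcomes = list(product("HT", repeat=3))

# X(s) = number of heads in outcome s (the random variable is a measurement on s).
counts = Counter(s.count("H") for s in outcomes)

# PMF: P_X(x) = P({s in S | X(s) = x})
pmf = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

# CDF: F_X(x) = P(X <= x), a running sum of the PMF.
cdf = dict(zip(pmf, accumulate(pmf.values())))

# Expectation: E(X) = sum over x of x * P_X(x)
mean = sum(x * p for x, p in pmf.items())

print(pmf)          # probabilities 1/8, 3/8, 3/8, 1/8 for x = 0, 1, 2, 3
print(cdf[3])       # 1
print(float(mean))  # 1.5
```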


Expectation (Mean) = Center of Gravity


Median, Mode

Median = $F^{-1}(0.5)$, where F = CDF
A.k.a. the 50th percentile element
I.e. order the values and pick the middle element
Used when the distribution is skewed

Mode: Most frequent or highest probability value
Multiple modes are possible
Need not be the “central” element
Mode may not exist (e.g. uniform distribution)
Used with categorical variables


Measures of Spread/Dispersion: Why Care?

You can drown in a river of average depth 6 inches!


Standard Deviation, Coeff. Of Variation, SIQR

Variance: second moment around the mean: $\sigma^2 = E((X-\mu)^2)$

Standard deviation = $\sigma$

Coefficient of Variation (C.o.V.) = $\sigma/\mu$

SIQR = Semi-Inter-Quartile Range (used with median = 50th percentile) = (75th percentile – 25th percentile)/2
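To make the "river of average depth 6 inches" point concrete, the sketch below computes these spread measures for two made-up sets of depth readings that share the same mean. The data are hypothetical, and the quartiles use the "inclusive" interpolation rule of Python's `statistics.quantiles`, which is one of several common percentile conventions.

```python
import statistics

def spread_summary(data):
    """Mean, standard deviation, C.o.V. and SIQR of a list of numbers."""
    mean = statistics.fmean(data)
    std = statistics.pstdev(data)  # population standard deviation (sigma)
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": mean,
        "std": std,
        "cov": std / mean,       # coefficient of variation = sigma / mu
        "siqr": (q3 - q1) / 2,   # semi-inter-quartile range
    }

# Hypothetical depth readings in inches; both rivers average 6 inches.
flat_river = [6, 6, 6, 6, 6, 6, 6, 6, 6]
uneven_river = [1, 2, 3, 4, 5, 6, 7, 8, 18]

for name, depths in [("flat", flat_river), ("uneven", uneven_river)]:
    print(name, spread_summary(depths))
```

The two means agree, but only the spread measures reveal the 18-inch hole in the second river.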


Covariance and Correlation: Measures of Dependence

Covariance: $\sigma_{ij}^2 = E[(X_i - \mu_i)(X_j - \mu_j)]$

For i = j, covariance = variance! Independence => covariance = 0 (not vice-versa!)

Correlation (coefficient) is a normalized (or scaleless) form of covariance: $\rho_{ij} = \sigma_{ij}^2 / (\sigma_i \sigma_j)$

Between –1 and +1. Zero => no correlation (uncorrelated). Note: uncorrelated DOES NOT mean independent!
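A small exact example shows why zero covariance does not imply independence. The construction below (X uniform on {-1, 0, 1} and Y = X squared) is a standard textbook counterexample chosen for this illustration; it does not come from the slides.

```python
from fractions import Fraction

# Joint PMF of (X, Y) with X uniform on {-1, 0, 1} and Y = X**2.
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def expect(g):
    """E[g(X, Y)] under the joint PMF."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
covariance = expect(lambda x, y: (x - mu_x) * (y - mu_y))
print(covariance)  # 0, so X and Y are uncorrelated

# Independence would require P(X=0, Y=0) = P(X=0) * P(Y=0).
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)
p_both = joint.get((0, 0), Fraction(0))
print(p_both, p_x0 * p_y0)  # 1/3 1/9, so X and Y are NOT independent
```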


Continuous-valued Random Variables

So far we have focused on discrete(-valued) random variables, e.g. X(s) must be an integer

Examples of discrete random variables: number of arrivals in one second, number of attempts until success

A continuous-valued random variable takes on a range of real values, e.g. X(s) ranges from 0 to $\infty$ as s varies.

Examples of continuous(-valued) random variables: time when a particular arrival occurs, time between consecutive arrivals.


Continuous-valued Random Variables

Thus, for a continuous random variable X, we can define its probability density function (pdf)

$$f_X(x) = F_X'(x) = \frac{dF_X(x)}{dx}$$

Note that since $F_X(x)$ is non-decreasing in x we have $f_X(x) \ge 0$ for all x.


Properties of Continuous Random Variables

From the Fundamental Theorem of Calculus, we have

$$F_X(x) = \int_{-\infty}^{x} f_X(u)\,du$$

In particular,

$$\int_{-\infty}^{\infty} f_X(x)\,dx = F_X(\infty) = 1$$

More generally,

$$\int_{a}^{b} f_X(x)\,dx = F_X(b) - F_X(a) = P(a \le X \le b)$$


Expectation of a Continuous Random Variable

The expectation (average) of a continuous random variable X is given by

$$E(X) = \int_{-\infty}^{\infty} x\,f_X(x)\,dx$$

Note that this is just the continuous equivalent of the discrete expectation

$$E(X) = \sum_{x} x\,P_X(x)$$


Important (Discrete) Random Variable: Bernoulli

The simplest possible measurement on an experiment: Success (X = 1) or failure (X = 0).

Usual notation:

$$P_X(1) = P(X = 1) = p, \qquad P_X(0) = P(X = 0) = 1 - p$$

E(X) = $1 \cdot p + 0 \cdot (1 - p) = p$


Important (discrete) Random Variables: Binomial

Let X = the number of successes in n independent Bernoulli experiments (or trials).

P(X=0) = $(1-p)^n$

P(X=1) = $n\,p\,(1-p)^{n-1}$

P(X=2) = $\binom{n}{2} p^2 (1-p)^{n-2}$

• In general, P(X = x) = $\binom{n}{x} p^x (1-p)^{n-x}$

Binomial variables are useful for proportions (of successes, failures) for a small number of repeated experiments. For a larger number (n), under certain conditions (p is small), the Poisson distribution is used.
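The binomial PMF is easy to evaluate directly. The sketch below uses the standard formula with illustrative values n = 10 and p = 0.3 (chosen arbitrarily) and checks that the probabilities sum to 1 and that the mean is np.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for the number of successes in n independent Bernoulli(p) trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.3  # illustrative values only
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)
mean = sum(x * q for x, q in enumerate(pmf))

print(round(total, 12))  # 1.0
print(round(mean, 12))   # 3.0, i.e. n * p
```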


Binomial can be skewed or normal

Depends upon p and n!


Important Random Variable: Poisson

A Poisson random variable X is defined by its PMF:

$$P(X = x) = \frac{\lambda^x}{x!} e^{-\lambda}, \qquad x = 0, 1, 2, \ldots$$

where $\lambda > 0$ is a constant.

Exercise: Show that $\sum_{x=0}^{\infty} P_X(x) = 1$ and E(X) = $\lambda$

Poisson random variables are good for counting frequency of occurrence: like the number of customers that arrive at a bank in one hour, or the number of packets that arrive at a router in one second.
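The exercise can be sanity-checked numerically (this is not a proof), and the same code shows the Poisson PMF approximating a binomial with large n and small p. The values of lambda, n and p below are arbitrary illustrative choices with np = lambda.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = lam**x / x! * exp(-lam)."""
    return lam**x / factorial(x) * exp(-lam)

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

lam = 2.0
# Truncate the infinite sums at 100 terms; the remaining tail is negligible for lam = 2.
total = sum(poisson_pmf(x, lam) for x in range(100))
mean = sum(x * poisson_pmf(x, lam) for x in range(100))
print(round(total, 9), round(mean, 9))  # 1.0 2.0

# Binomial with large n and small p, chosen so that n * p = lam.
n, p = 1000, 0.002
max_gap = max(abs(poisson_pmf(x, lam) - binomial_pmf(x, n, p)) for x in range(10))
print(max_gap < 1e-3)  # True: the two PMFs nearly coincide
```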


Important Continuous Random Variable: Exponential

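Used to represent time, e.g. until the next arrival

Has PDF

$$f_X(x) = \begin{cases} \lambda e^{-\lambda x} & \text{for } x \ge 0 \\ 0 & \text{for } x < 0 \end{cases}$$

for some $\lambda > 0$

Properties:

$$\int_{0}^{\infty} f_X(x)\,dx = 1 \quad \text{and} \quad E(X) = \frac{1}{\lambda}$$

Need to use integration by parts!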
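Both properties can be checked numerically instead of by integration by parts. The sketch uses a simple midpoint rule and truncates the integral at a large upper limit; the rate 0.5, the limit 100 and the helper names are assumptions made for this illustration.

```python
from math import exp

def exp_pdf(x, lam):
    """Exponential density: lam * exp(-lam * x) for x >= 0, and 0 for x < 0."""
    return lam * exp(-lam * x) if x >= 0 else 0.0

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

lam = 0.5
upper = 100.0  # stand-in for infinity; the tail beyond it is about exp(-50)

area = integrate(lambda x: exp_pdf(x, lam), 0.0, upper)
mean = integrate(lambda x: x * exp_pdf(x, lam), 0.0, upper)

print(round(area, 6))  # 1.0
print(round(mean, 6))  # 2.0, i.e. 1 / lam
```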


Memoryless Property of the Exponential

An exponential random variable X has the property that “the future is independent of the past”, i.e. the fact that it hasn’t happened yet tells us nothing about how much longer it will take.

In math terms:

$$P(X > s + t \mid X > t) = P(X > s) = e^{-\lambda s} \quad \text{for } s, t \ge 0$$
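The memoryless property follows from the exponential tail probability P(X > x) = exp(-lambda x), obtained by integrating the PDF. The sketch below checks it for one arbitrary choice of lambda, s and t.

```python
from math import exp, isclose

def survival(x, lam):
    """P(X > x) = exp(-lam * x) for an exponential random variable, x >= 0."""
    return exp(-lam * x)

lam, s, t = 0.5, 3.0, 4.0  # arbitrary illustrative values

# P(X > s + t | X > t) = P(X > s + t and X > t) / P(X > t) = P(X > s + t) / P(X > t)
conditional = survival(s + t, lam) / survival(t, lam)

print(isclose(conditional, survival(s, lam)))  # True: waiting t already changes nothing
```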


Important Random Variables: Normal


Normal Distribution: PDF & CDF

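PDF: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}$

With the transformation: $z = \frac{x - \mu}{\sigma}$ (a.k.a. unit normal deviate)

z-normal-PDF: $f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$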
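The z-transformation can be checked against Python's standard-library `statistics.NormalDist`. The values of mu, sigma and x are hypothetical; the hand-written density is the usual normal PDF formula.

```python
from math import exp, isclose, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def std_normal_pdf(z):
    """Density of the unit normal deviate z = (x - mu) / sigma."""
    return exp(-(z**2) / 2) / sqrt(2 * pi)

mu, sigma, x = 10.0, 2.0, 13.0  # hypothetical values
z = (x - mu) / sigma            # 1.5 standard deviations above the mean

# Densities are related by the scale factor 1/sigma; CDF values are equal.
print(isclose(normal_pdf(x, mu, sigma), std_normal_pdf(z) / sigma))  # True
print(isclose(NormalDist(mu, sigma).cdf(x), NormalDist().cdf(z)))    # True
```

Working in z units is what lets a single standard normal table (or function) serve every normal distribution.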


Why is Gaussian Important?

Uniform distribution looks nothing like bell shaped (gaussian)! Large spread ($\sigma$)!

Sample mean of uniform distribution (a.k.a. sampling distribution), after very few samples, looks remarkably gaussian, with decreasing $\sigma$!

CENTRAL LIMIT TENDENCY!
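A quick simulation illustrates the shrinking spread of the sample mean. The sketch draws from Uniform(0, 1) with a fixed seed and compares the observed standard deviation of the sample mean with sigma divided by the square root of n; it checks only the spread, not the bell shape, and the sample sizes and trial count are arbitrary choices.

```python
import random
import statistics
from math import sqrt

rng = random.Random(0)   # fixed seed so the run is repeatable
SIGMA = sqrt(1 / 12)     # standard deviation of Uniform(0, 1)

def sample_mean_spread(n, trials=20_000):
    """Standard deviation of the mean of n Uniform(0, 1) draws, estimated by simulation."""
    means = [statistics.fmean([rng.random() for _ in range(n)]) for _ in range(trials)]
    return statistics.pstdev(means)

spread = {n: sample_mean_spread(n) for n in (1, 4, 16)}

for n, observed in spread.items():
    # observed spread of the sample mean vs. the predicted sigma / sqrt(n)
    print(n, round(observed, 4), round(SIGMA / sqrt(n), 4))
```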


Other interesting facts about Gaussian

Uncorrelated r.v.s + (jointly) gaussian => INDEPENDENT! Important in random processes (i.e. sequences of random variables)

Random variables that are independent and have exactly the same distribution are called IID (independent & identically distributed)

IID and normal with zero mean and variance $\sigma^2$ => IIDN(0, $\sigma^2$)


Height & Spread of Gaussian Can Vary!


Rapidly Dropping Tail Probability!

Sample mean is a gaussian r.v., with $\bar{x} = \mu$ & $s = \sigma/\sqrt{n}$

With a larger number of samples, the avg of sample means is an excellent estimate of the true mean.

If $\sigma$ (original) is known, invalid mean estimates can be rejected with HIGH confidence!


Confidence Interval

Probability that a measurement will fall within a closed interval [a,b]: (mathworld definition…)

$$P(a \le x \le b) = 1 - \alpha$$

Jain: the interval [a,b] = “confidence interval”; the probability level, $100(1-\alpha)$ = “confidence level”; $\alpha$ = “significance level”

Sampling distribution for means leads to high confidence levels, i.e. small confidence intervals
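As a rough sketch of how such an interval is computed for a mean, the code below assumes the population standard deviation is known and that the sample mean is approximately gaussian, so a z-based interval applies. The measurements and the value of sigma are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean

def mean_confidence_interval(samples, sigma, confidence=0.95):
    """z-based interval for the mean, ASSUMING the population std. dev. sigma is known."""
    alpha = 1 - confidence                       # significance level
    z = NormalDist().inv_cdf(1 - alpha / 2)      # two-sided critical value
    half_width = z * sigma / sqrt(len(samples))  # z * (std. dev. of the sample mean)
    centre = fmean(samples)
    return centre - half_width, centre + half_width

# Hypothetical measurements with an assumed known sigma.
samples = [9.8, 10.4, 10.1, 9.6, 10.3, 10.0, 9.9, 10.2]
sigma = 0.3

low95, high95 = mean_confidence_interval(samples, sigma, 0.95)
low90, high90 = mean_confidence_interval(samples, sigma, 0.90)

print(round(low95, 3), round(high95, 3))
print(round(low90, 3), round(high90, 3))  # a lower confidence level gives a narrower interval
```

When sigma has to be estimated from the same samples, the t-distribution replaces the normal critical value.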


Meaning of Confidence Interval


Statistical Inference: Is $\mu_A = \mu_B$?

• Note: sample mean $\bar{y}_A$ is not $\mu_A$, but its estimate!
• Is this difference statistically significant?
• Is the null hypothesis $\mu_A = \mu_B$ false?


Step 1: Plot the samples


Compare to (external) reference distribution (if available)

Since 1.30 is at the tail of the reference distribution, the difference between means is NOT statistically significant!


Random Sampling Assumption!

Under the random sampling assumption and the null hypothesis $\mu_A = \mu_B$, we can view the 20 samples as drawn from a common population & construct a reference distribution from the samples themselves!
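One way to build a reference distribution from the samples themselves is a randomization (permutation) test: pool the samples, reshuffle the A/B labels many times, and see how often a difference at least as large as the observed one arises. This is a sketch of that general idea with made-up data (10 samples per group); it is not necessarily the exact procedure used on the slides.

```python
import random
from statistics import fmean

def permutation_p_value(a, b, trials=10_000, seed=0):
    """One-sided p-value for mean(b) - mean(a), obtained by reshuffling group labels."""
    rng = random.Random(seed)
    observed = fmean(b) - fmean(a)
    pooled = list(a) + list(b)  # under the null hypothesis: one common population
    at_least_as_large = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        diff = fmean(pooled[len(a):]) - fmean(pooled[:len(a)])
        if diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / trials

# Hypothetical measurements for methods A and B.
method_a = [84.2, 83.1, 85.0, 84.6, 83.9, 84.8, 83.5, 84.1, 84.4, 83.7]
method_b = [85.1, 84.9, 85.6, 84.3, 86.0, 85.2, 84.7, 85.8, 85.0, 85.4]

observed, p_value = permutation_p_value(method_a, method_b)
print(round(observed, 2), p_value)
```

A small p-value means the observed difference sits far out in the tail of the reshuffled differences, i.e. it would be rare if both groups really came from one population.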


t-distribution: Create a Reference Distribution from the Samples Themselves!


t-distribution


Statistical Significance with Various Inference Techniques

Normal population assumption not required

Random sampling assumption required

Std. dev. estimated from the samples themselves!

t-distribution: an approx. for gaussian!


Normal, $\chi^2$ & t-distributions: Useful for Statistical Inference


Relationship between Confidence Intervals and Comparisons of Means

