
Chapter 4. Elements of Statistics # brief introduction to some concepts of statistics # descriptive...

Date post: 26-Mar-2015
Upload: antonio-holmes

Chapter 4. Elements of Statistics

# brief introduction to some concepts of statistics

# descriptive statistics and inductive statistics (statistical inference)

# Classification of the field of statistics
i) Sampling theory
ii) Estimation theory
iii) Hypothesis testing
iv) Curve fitting or Regression
v) Analysis of variance


4.2 Sampling Theory – The Sample Mean

How many samples are required for a given degree of confidence in the result?

# Terminology

- population

N (size of population): very large or ∞

- (random) sample

n (size of sample)

# One of the most important quantities is the sample mean.

How close might the sample mean be to the average value of the population?


Let the sample have the numerical values x1, x2, …, xn.

Then, the sample mean is given by

$$\hat{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Note that we are interested in the statistical properties of arbitrary random samples rather than any particular sample.

That is, the sample mean becomes a random variable.

Therefore, it is appropriate to denote the sample mean as

$$\hat{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$


We want the mean value of the sample mean to be close to the true mean value of the population:

$$E[\hat{X}] = E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}\,n\bar{X} = \bar{X}$$

That is, the mean value of the sample mean equals the true mean value of the population, so the sample mean is an unbiased estimate of the true mean.

But this alone is not sufficient to indicate whether the sample mean is a good estimator of the true population mean.


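What is the variance of the sample mean?

Assume N ≫ n (the characteristics of the population do not change during sampling).

Variance = mean of the square − square of the mean:

$$\mathrm{Var}(\hat{X}) = E\left[\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)^2\right] - \bar{X}^2 = E\left[\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} X_i X_j\right] - \bar{X}^2$$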


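Assumption: the X_i are statistically independent, so

$$E[X_i X_j] = \begin{cases} \overline{X^2}, & i = j \\ \bar{X}^2, & i \neq j \end{cases}$$

Therefore (!)

$$\mathrm{Var}(\hat{X}) = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} E[X_i X_j] - \bar{X}^2 = \frac{1}{n^2}\left[n\overline{X^2} + (n^2 - n)\bar{X}^2\right] - \bar{X}^2 = \frac{\overline{X^2} - \bar{X}^2}{n} = \frac{\sigma^2}{n}$$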


where σ² is the true variance of the population. As n → ∞, Var(X̂) → 0, which means that large sample sizes lead to a better estimate.
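The two results E[X̂] = X̄ and Var(X̂) = σ²/n can be checked exactly on a small case. The sketch below assumes a tiny, hypothetical population whose values are equally likely, enumerates every sample drawn with replacement (so the draws are independent, as the derivation assumes), and averages with exact fractions. The population values and function names are illustrative only.

```python
from fractions import Fraction
from itertools import product


def population_moments(population):
    """True mean and variance (divisor N) of a finite, equally weighted population."""
    N = len(population)
    mu = Fraction(sum(population), N)
    sigma2 = sum((x - mu) ** 2 for x in population) / N
    return mu, sigma2


def sample_mean_moments(population, n):
    """Exact E[X_hat] and Var(X_hat) over all N**n samples drawn with replacement."""
    samples = list(product(population, repeat=n))  # independent, equally likely draws
    means = [Fraction(sum(s), n) for s in samples]
    e_mean = sum(means) / len(means)
    var_mean = sum((m - e_mean) ** 2 for m in means) / len(means)
    return e_mean, var_mean


pop = [1, 2, 4, 7]  # hypothetical toy population
mu, sigma2 = population_moments(pop)
for n in (1, 2, 3):
    e_mean, var_mean = sample_mean_moments(pop, n)
    print(n, e_mean == mu, var_mean == sigma2 / n)
```

Exact fractions are used so the comparison is an equality check rather than a floating-point tolerance.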

* Note: 1) When N is not large, the same effect as a large N can be obtained by “sampling with replacement”.


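2) When N is small and replacement is not possible:

$$\mathrm{Var}(\hat{X}) = \frac{\sigma^2}{n}\left(\frac{N - n}{N - 1}\right)$$

As N → ∞ this converges to the previous expression; when N = n it is 0 (naturally!).

Two examples: see textbook pp. 163–165.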
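To see the factor (N − n)/(N − 1) at work, the sketch below enumerates every size-n subset drawn without replacement from a small, hypothetical population and compares the exact variance of the sample mean with σ²/n · (N − n)/(N − 1), where σ² is the true population variance (divisor N). The function names and population values are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations


def var_sample_mean_no_replacement(population, n):
    """Exact Var(X_hat) over all equally likely size-n subsets (no replacement)."""
    subsets = list(combinations(population, n))
    means = [Fraction(sum(s), n) for s in subsets]
    avg = sum(means) / len(means)
    return sum((m - avg) ** 2 for m in means) / len(means)


def finite_population_formula(population, n):
    """sigma^2 / n * (N - n) / (N - 1), with sigma^2 the true population variance."""
    N = len(population)
    mu = Fraction(sum(population), N)
    sigma2 = sum((x - mu) ** 2 for x in population) / N
    return sigma2 / n * Fraction(N - n, N - 1)


pop = [1, 2, 4, 7, 11]  # hypothetical small population, N = 5
for n in range(1, len(pop) + 1):
    exact = var_sample_mean_no_replacement(pop, n)
    print(n, exact, exact == finite_population_formula(pop, n))
```

The last row (n = N) prints a variance of 0, matching the remark that sampling the whole population leaves no uncertainty.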


4.3 Sampling Theory – The Sample Variance

The population variance σ² is needed for determining the sample size required to achieve a desired variance of the sample mean (see eq. 4-4).

Definition (Sample Variance):

$$S^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \hat{X}\right)^2$$

The expected value of the sample variance

$$E[S^2] = \frac{n-1}{n}\,\sigma^2$$

can be derived easily using $\hat{X} = \frac{1}{n}\sum_{j=1}^{n} X_j$. It is not the true variance σ²; that is, S² is a biased estimate rather than an unbiased one.
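The bias of S² can be checked the same way as for the sample mean: enumerate all samples drawn with replacement from a hypothetical toy population, average S² exactly, and compare with (n − 1)σ²/n. The last column rescales by n/(n − 1), which removes the bias. Population values and function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def biased_sample_variance(sample):
    """S^2 = (1/n) * sum((x_i - x_hat)^2), i.e. dividing by n."""
    n = len(sample)
    x_hat = Fraction(sum(sample), n)
    return sum((x - x_hat) ** 2 for x in sample) / n


def expected_biased_variance(population, n):
    """Exact E[S^2] over all N**n equally likely samples drawn with replacement."""
    samples = list(product(population, repeat=n))
    return sum(biased_sample_variance(s) for s in samples) / len(samples)


pop = [1, 2, 4, 7]  # hypothetical toy population
N = len(pop)
mu = Fraction(sum(pop), N)
sigma2 = sum((x - mu) ** 2 for x in pop) / N  # true variance
for n in (2, 3, 4):
    e_s2 = expected_biased_variance(pop, n)
    # columns: n, E[S^2], (n-1)/n * sigma^2, n/(n-1) * E[S^2]
    print(n, e_s2, Fraction(n - 1, n) * sigma2, Fraction(n, n - 1) * e_s2)
```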


Now, we redefine the sample variance to obtain an unbiased estimate of the population variance:

$$\tilde{S}^2 = \frac{n}{n-1}\,S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \hat{X}\right)^2$$

Note that these results hold for very large N, that is, N = ∞.

How about when the population size is not large?


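# When N is not large, the expected value of S² is given by

$$E[S^2] = \frac{N}{N-1}\cdot\frac{n-1}{n}\,\sigma^2$$

For an unbiased estimate, we redefine

$$\tilde{S}^2 = \frac{N-1}{N}\cdot\frac{n}{n-1}\,S^2$$

# The variance of the estimates of the variance:

the variance of S²:

$$\mathrm{Var}(S^2) = \frac{\mu_4 - \sigma^4}{n}$$

the variance of S̃²:

$$\mathrm{Var}(\tilde{S}^2) = \frac{n(\mu_4 - \sigma^4)}{(n-1)^2}$$

where $\mu_4 = E\left[(X - \bar{X})^4\right]$ is the 4th central moment of the population.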
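The finite-population expression for E[S²] can be verified by exhaustive enumeration as well. This sketch assumes a hypothetical population of N = 5 values and samples without replacement (every size-n subset equally likely); it checks E[S²] against N/(N − 1) · (n − 1)/n · σ² and confirms that the redefined S̃² averages to σ². Names and values are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def biased_sample_variance(sample):
    """S^2 with divisor n."""
    n = len(sample)
    x_hat = Fraction(sum(sample), n)
    return sum((x - x_hat) ** 2 for x in sample) / n


def expected_biased_variance_finite(population, n):
    """Exact E[S^2] over all equally likely size-n subsets (sampling without replacement)."""
    subsets = list(combinations(population, n))
    return sum(biased_sample_variance(s) for s in subsets) / len(subsets)


pop = [1, 2, 4, 7, 11]  # hypothetical small population (N = 5)
N = len(pop)
mu = Fraction(sum(pop), N)
sigma2 = sum((x - mu) ** 2 for x in pop) / N  # true variance
for n in (2, 3, 4):
    e_s2 = expected_biased_variance_finite(pop, n)
    predicted = Fraction(N, N - 1) * Fraction(n - 1, n) * sigma2
    corrected = Fraction(N - 1, N) * Fraction(n, n - 1) * e_s2  # E[S~^2]
    print(n, e_s2 == predicted, corrected == sigma2)
```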


4.4 Sampling Distributions & Confidence Intervals

What is the probability that the estimates are within specified bounds?

We need to know the pdf; two cases are considered, and only for the sample mean!

Normalized sample mean: when the X_i are Gaussian and independent,

$$Z = \frac{\hat{X} - \bar{X}}{\sigma/\sqrt{n}}$$

⇒ Gaussian (0, 1)


Even if the X_i are not Gaussian, Z is asymptotically Gaussian as n → ∞ by the central limit theorem (as a rule of thumb, n ≥ 30 is usually needed).
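A small Monte Carlo sketch of the central limit theorem claim: Uniform(0, 1) data are clearly not Gaussian, yet the normalized sample mean Z for n = 30 behaves very nearly like Gaussian (0, 1). The choice of distribution, n = 30, the number of trials, and the seed are all illustrative assumptions.

```python
import math
import random
import statistics


def normalized_sample_means(n, trials, seed):
    """Z = (x_hat - mu) / (sigma / sqrt(n)) for `trials` samples of size n from Uniform(0, 1)."""
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1 / 12)  # mean and standard deviation of Uniform(0, 1)
    zs = []
    for _ in range(trials):
        x_hat = sum(rng.random() for _ in range(n)) / n
        zs.append((x_hat - mu) / (sigma / math.sqrt(n)))
    return zs


zs = normalized_sample_means(n=30, trials=20000, seed=1)
inside = sum(abs(z) <= 1.96 for z in zs) / len(zs)  # fraction within +/-1.96
print(statistics.fmean(zs), statistics.pvariance(zs), inside)
```

The three printed numbers should be roughly 0, 1, and 0.95, the mean, variance, and central 95% coverage of a standard Gaussian.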

H.W.) Solve problems 4-2.1, 4-2.5, 4-3.1, 4-4.1, 4-5.1, and 4-6.1 in Chapter 4.


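When σ is unknown, replace σ by S̃. However,

$$T = \frac{\hat{X} - \bar{X}}{\tilde{S}/\sqrt{n}} = \frac{\hat{X} - \bar{X}}{S/\sqrt{n-1}}$$

is no longer Gaussian ⇒ “Student’s t distribution” with n − 1 degrees of freedom.

See Fig. 4-2, p. 170.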
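The two forms of T agree because S̃/√n = S/√(n − 1). The sketch below computes T both ways on a made-up data set with a hypothetical mean under test of 10.0; Python's `statistics.stdev` (which divides by n − 1) reproduces S̃. The helper name is illustrative.

```python
import math
import statistics


def t_statistic_two_ways(sample, mu0):
    """Return T computed with S~ (divide by n-1), T computed with S (divide by n), and S~."""
    n = len(sample)
    x_hat = sum(sample) / n
    ss = sum((x - x_hat) ** 2 for x in sample)
    s_tilde = math.sqrt(ss / (n - 1))  # unbiased S~
    s = math.sqrt(ss / n)              # biased S
    t_from_s_tilde = (x_hat - mu0) / (s_tilde / math.sqrt(n))
    t_from_s = (x_hat - mu0) / (s / math.sqrt(n - 1))
    return t_from_s_tilde, t_from_s, s_tilde


data = [9.8, 10.4, 10.1, 9.6, 10.9, 10.2, 9.9, 10.5, 10.3]  # hypothetical measurements
t1, t2, s_tilde = t_statistic_two_ways(data, mu0=10.0)
print(t1, t2)
print(s_tilde, statistics.stdev(data))  # same value: stdev divides by n - 1
```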


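• pdf of Student’s t distribution (n − 1 degrees of freedom):

$$f_T(t) = \frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{\pi(n-1)}\,\Gamma\left(\frac{n-1}{2}\right)}\left(1 + \frac{t^2}{n-1}\right)^{-n/2}$$

where Γ(·) is the gamma function: Γ(k) = (k − 1)Γ(k − 1), and Γ(k) = (k − 1)! for any positive integer k.

Compared with the Gaussian, the t distribution has heavier tails; for n ≥ 30 it is close to Gaussian.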
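The pdf above can be evaluated directly, and integrating it numerically gives the distribution function and the critical values quoted later in these slides for ν = n − 1 = 8 (a two-sided 95% test and a one-sided 99% test). This is a sketch that uses Simpson's rule and bisection instead of a statistics library; `math.lgamma` (the log of Γ) stands in for Γ so that large ν does not overflow. The function names are illustrative.

```python
import math


def t_pdf(t, nu):
    """Student's t pdf with nu = n - 1 degrees of freedom (log-gamma avoids overflow)."""
    log_coef = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    coef = math.exp(log_coef) / math.sqrt(math.pi * nu)
    return coef * (1 + t * t / nu) ** (-(nu + 1) / 2)


def t_cdf(t, nu, steps=2000):
    """F_T(t) = 1/2 +/- integral of the pdf over [0, |t|] (Simpson's rule, even steps)."""
    a = abs(t)
    h = a / steps
    area = t_pdf(0.0, nu) + t_pdf(a, nu)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area *= h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def t_critical(p, nu):
    """t_c with F_T(t_c) = p, for 0.5 < p < 1, found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(0.975, 8), 3))   # two-sided 95%, nu = 8
print(round(t_critical(0.99, 8), 3))    # one-sided 99%, nu = 8
print(round(t_critical(0.975, 29), 3))  # n = 30: already close to the Gaussian 1.96
```

The last line illustrates the heavier-tails remark: by n = 30 the t critical value is only slightly above the Gaussian one.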


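What is a confidence interval?

It is an interval estimate: it asks, with a given probability, whether the quantity lies within an interval. A q-percent confidence interval (holding with probability q/100; q is the confidence level) is

$$\bar{X} - \frac{k\sigma}{\sqrt{n}} \le \hat{X} \le \bar{X} + \frac{k\sigma}{\sqrt{n}}$$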


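• Here k is a constant that depends on q and on the pdf of X̂:

$$\frac{q}{100} = \int_{\bar{X} - k\sigma/\sqrt{n}}^{\bar{X} + k\sigma/\sqrt{n}} f_{\hat{X}}(x)\,dx$$

• For specific values of k, see Table 4-1, p. 172.

• (The larger q is, the larger k becomes.)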


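• Example: q = 95% (k = 1.96) →

$$9.804 \le \hat{x} \le 10.196$$

• The probability that x̂ lies in this interval is 0.95.
• The smaller the interval, the smaller the probability.
• (For q = 99%, the interval for x̂ becomes wider, but it is less useful as information for the estimate!)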


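• Note: q can also be obtained from the probability distribution function:

$$\frac{q}{100} = F_{\hat{X}}\left(\bar{X} + \frac{k\sigma}{\sqrt{n}}\right) - F_{\hat{X}}\left(\bar{X} - \frac{k\sigma}{\sqrt{n}}\right)$$

• Here F is the probability distribution function (for Student’s t, see Appendix F or Table 4-2, p. 172, for ν = 8).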
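For the Gaussian case, k follows from q through the inverse distribution function, and q comes back as F(X̄ + kσ/√n) − F(X̄ − kσ/√n). The sketch uses Python's `statistics.NormalDist`; the inputs X̄ = 10, σ = 1, n = 100 are hypothetical values chosen so that kσ/√n ≈ 0.196 reproduces the 95% example interval. The helper names are illustrative.

```python
import math
from statistics import NormalDist


def sample_mean_interval(true_mean, sigma, n, q):
    """Gaussian case: return k and the q-percent interval X_bar +/- k*sigma/sqrt(n)."""
    k = NormalDist().inv_cdf(0.5 + q / 200)  # central area q/100 -> upper point at 0.5 + q/200
    half = k * sigma / math.sqrt(n)
    return k, true_mean - half, true_mean + half


def coverage(true_mean, sigma, n, lo, hi):
    """q/100 = F(hi) - F(lo), where X_hat ~ Gaussian(X_bar, sigma^2 / n)."""
    f = NormalDist(mu=true_mean, sigma=sigma / math.sqrt(n))
    return f.cdf(hi) - f.cdf(lo)


# Hypothetical inputs: X_bar = 10, sigma = 1, n = 100 (so sigma/sqrt(n) = 0.1)
k95, lo95, hi95 = sample_mean_interval(10.0, 1.0, 100, 95)
k99, lo99, hi99 = sample_mean_interval(10.0, 1.0, 100, 99)
print(f"q=95%: k={k95:.3f}, interval=({lo95:.3f}, {hi95:.3f})")
print(f"q=99%: k={k99:.3f}, interval=({lo99:.3f}, {hi99:.3f})")
print(f"coverage check: {coverage(10.0, 1.0, 100, lo95, hi95):.4f}")
```

The q = 99% line shows the trade-off noted above: a larger k and a wider interval.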


4.5 Hypothesis Testing

• The question arises: how does one decide to accept or reject a given hypothesis when the sample size and the confidence level are specified?


• Two steps: i) make some hypothesis about the population;

• ii) determine whether the observed sample confirms or rejects this hypothesis.


• Two tests: one-sided or two-sided.

One-sided, e.g.: the average lifetime of the light bulb ≥ 1000 hours

Two-sided, e.g.: 100-ohm resistors too high or too low


One-sided test example: A capacitor manufacturer claims that the mean value of the breakdown voltage is ≥ 300 V.

• A sample of 100 capacitors → $(\hat{x}, \tilde{s}^2) = (290\ \mathrm{V},\ 40^2\ \mathrm{V}^2)$

• A 99% confidence level is used. • Q) Is the manufacturer’s claim valid? • A) We would reject the hypothesis!


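Normalized r.v. Z (the hypothesized mean is X̄ = 300 V, and σ is replaced by s̃ = 40 V since n = 100 is large):

$$z = \frac{\hat{x} - \bar{X}}{\sigma/\sqrt{n}} = \frac{290 - 300}{40/\sqrt{100}} = -2.5$$

However, the 99% confidence level requires

$$1 - F_Z(z_c) = \int_{z_c}^{\infty} f_Z(z)\,dz = 0.99 \;\Rightarrow\; z_c = -2.33$$

Since z = −2.5 < z_c = −2.33, z falls in the rejection region.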
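The one-sided test above, and the 99.5% variant on the next slide, can be sketched with `statistics.NormalDist`, whose `inv_cdf` gives z_c directly from 1 − F_Z(z_c) = confidence level. The inputs are the capacitor numbers from the slide, with s̃ used for σ as there; the function name is illustrative.

```python
import math
from statistics import NormalDist


def one_sided_lower_z_test(x_hat, mu0, sigma, n, confidence):
    """Test H: true mean >= mu0 (one-sided).

    z_c satisfies 1 - F_Z(z_c) = confidence, i.e. z_c = F_Z^{-1}(1 - confidence).
    The hypothesis is accepted when z >= z_c.
    """
    z = (x_hat - mu0) / (sigma / math.sqrt(n))
    z_c = NormalDist().inv_cdf(1 - confidence)
    return z, z_c, z >= z_c


# Capacitor example: x_hat = 290 V, hypothesized mean 300 V, s~ = 40 V used for sigma, n = 100
for conf in (0.99, 0.995):
    z, z_c, accepted = one_sided_lower_z_test(290, 300, 40, 100, conf)
    print(f"{conf:.1%}: z = {z:.3f}, z_c = {z_c:.3f}, accept = {accepted}")
```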


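• If the confidence level were 99.5%: z_c = −2.575 < z = −2.5, so accept the hypothesis.

• The lower the confidence level, the narrower the acceptance region, and the less likely the hypothesis is to be accepted.

• That is, a lower confidence level imposes a more severe requirement, which feels semantically contradictory.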


• Let us now restate this in terms of the level of significance,

• i.e., (100% − confidence level). • The larger the level of significance, the more severe the test!


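• Example (continued): sample size = 9 → no longer Gaussian → Student’s t distribution

• ν = n − 1 = 8 dof • Confidence level 99%, with $(\hat{x}, \tilde{s}^2) = (290\ \mathrm{V},\ 40^2\ \mathrm{V}^2)$:

$$t = \frac{\hat{x} - \bar{X}}{\tilde{s}/\sqrt{n}} = \frac{290 - 300}{40/\sqrt{9}} = -0.75, \qquad t_c = -2.896 < t = -0.75$$

– accept the hypothesis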


• A small sample size increases t (here from −2.5 to −0.75),

• and the heavier-tailed t distribution decreases t_c (here from −2.33 to −2.896),

so t is more likely to exceed the critical value: small-size tests are less reliable (less severe) than large-size tests.
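The comparison in these bullets can be reproduced from the slides' summary numbers alone. The sketch computes the normalized statistic for n = 100 and n = 9 with the same x̂ = 290 V, s̃ = 40 V, and hypothesized mean 300 V, and compares each with the 99% one-sided critical value quoted on the slides (−2.33 for the Gaussian, −2.896 for t with ν = 8). The helper and constant names are hypothetical.

```python
import math

Z_C_99 = -2.33         # Gaussian, 99% one-sided (value quoted on the slides)
T_C_99_NU8 = -2.896    # Student's t, nu = 8, 99% one-sided (value quoted on the slides)


def normalized_statistic(x_hat, mu0, s_tilde, n):
    """(x_hat - mu0) / (s~ / sqrt(n)); treated as z for large n and as t for small n."""
    return (x_hat - mu0) / (s_tilde / math.sqrt(n))


z = normalized_statistic(290, 300, 40, 100)
t = normalized_statistic(290, 300, 40, 9)
print(f"n=100: z = {z:.2f} vs z_c = {Z_C_99}: accept = {z >= Z_C_99}")
print(f"n=9:   t = {t:.2f} vs t_c = {T_C_99_NU8}: accept = {t >= T_C_99_NU8}")
```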


Two-sided test example: A manufacturer of Zener diodes claims that the true mean breakdown voltage is 10 V.

• Q) Hypothesis: the true mean X̄ = 10 V. Accept or reject?

• 100 samples → $(\hat{x}, \tilde{s}^2) = (10.3\ \mathrm{V},\ 1.2^2\ \mathrm{V}^2)$ • 95% confidence level


• A) Rejected!

• z is outside the interval −1.96 ≤ z ≤ 1.96:

$$z = \frac{\hat{x} - \bar{X}}{\tilde{s}/\sqrt{n}} = \frac{10.3 - 10}{1.2/\sqrt{100}} = 2.5$$


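• Q) (continued) 9 samples, $(\hat{x}, \tilde{s}^2) = (10.3\ \mathrm{V},\ 1.2^2\ \mathrm{V}^2)$:

$$t = \frac{\hat{x} - \bar{X}}{\tilde{s}/\sqrt{n}} = \frac{10.3 - 10}{1.2/\sqrt{9}} = 0.75$$

t is inside the interval −2.306 ≤ t ≤ 2.306,

• so the hypothesis is accepted! – Less severe than a large sample test.

(Figure: t distribution with ν = 8; central area 95%, 2.5% in each tail, t_c = 2.306.)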
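The two-sided Zener-diode example follows the same pattern: accept when the statistic lies inside ±(critical value). This sketch uses the critical values quoted on the slides, 1.96 (Gaussian, 95%) and 2.306 (t with ν = 8, 95%); the function name is hypothetical.

```python
import math


def two_sided_test(x_hat, mu0, s_tilde, n, critical):
    """Accept H: mean == mu0 when |statistic| <= critical (critical value from a table)."""
    stat = (x_hat - mu0) / (s_tilde / math.sqrt(n))
    return stat, abs(stat) <= critical


# Zener-diode example: x_hat = 10.3 V, s~ = 1.2 V, hypothesized mean 10 V, 95% level
z, z_accept = two_sided_test(10.3, 10.0, 1.2, 100, critical=1.96)   # Gaussian
t, t_accept = two_sided_test(10.3, 10.0, 1.2, 9, critical=2.306)    # t, nu = 8
print(f"n=100: z = {z:.2f}, accept = {z_accept}")
print(f"n=9:   t = {t:.2f}, accept = {t_accept}")
```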


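4.6 Curve Fitting and Linear Regression

• An analysis method that statistically finds the functional relationship between variables (independent and dependent) using the data as the medium; that is, it seeks the relationship between x and y by finding an appropriate regression equation.

• Usually first-order (linear) or second-order. • In contrast, correlation analysis (next section) seeks the relationship between x and y by computing a correlation coefficient.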


• Terminology – Scatter diagram: a plot of the data

- n samples

$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$$


- Curve fitting: finding a mathematical relationship. Regression curve (equation): the resulting curve.


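- What is the “best” fit? The best fit in a least-squares sense.

– Let ε_i be the errors between the regression curve and the scatter diagram.

– The problem is to determine the unknown coefficients that minimize $\varepsilon_1^2 + \varepsilon_2^2 + \cdots + \varepsilon_n^2 = \sum_{i=1}^{n}\varepsilon_i^2$.

– First choose the type of equation to be fitted to the data, e.g. $y = a + bx + cx^2$; keeping the number of unknown coefficients much smaller than n gives a smoothing effect.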


• Linear regression: $y = a + bx$

• What a, b minimize

$$J = \sum_{i=1}^{n}\left(y_i - a - b x_i\right)^2 \;?$$


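• Solution:

$$\frac{\partial J}{\partial a} = 0 \;\Rightarrow\; \sum_{i=1}^{n} y_i = na + b\sum_{i=1}^{n} x_i$$

$$\frac{\partial J}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2$$

• Solving these simultaneous equations gives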


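$$b = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}, \qquad a = \frac{\sum_{i=1}^{n} y_i - b\sum_{i=1}^{n} x_i}{n}$$

In MATLAB, the built-in function p = polyfit(x, y, n) fits a polynomial of degree n (here n is the degree, not the sample size); the coefficients in p are returned in descending powers.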
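The closed-form a and b translate directly into code. This is a minimal sketch in Python rather than MATLAB, run on made-up data. For a least-squares line the residuals sum to zero and are orthogonal to x, which is exactly what ∂J/∂a = 0 and ∂J/∂b = 0 state, so the sketch prints both sums as a check.

```python
def linear_regression(xs, ys):
    """Least-squares fit of y = a + b*x using the closed-form solution of the normal equations."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]  # hypothetical measurements
a, b = linear_regression(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(f"a = {a:.4f}, b = {b:.4f}")
print(sum(residuals), sum(x * r for x, r in zip(xs, residuals)))  # both ~0
```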


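• A second-order regression (textbook p. 180, Table 4-3, Fig. 4-6): $v_B$ is fitted as a quadratic in T,

$$v_B = c_2 T^2 + c_1 T + c_0$$

with the numerical coefficients as given in the textbook.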
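The least-squares idea extends to the second-order model: setting the partial derivatives with respect to c0, c1, c2 to zero yields three linear normal equations. The sketch below solves them with Gaussian elimination. Since the textbook's Table 4-3 data are not reproduced here, it is exercised on made-up points generated from a known quadratic; the names are illustrative.

```python
def quadratic_fit(xs, ys):
    """Least-squares c0, c1, c2 for y = c0 + c1*x + c2*x**2 via the 3x3 normal equations."""
    # Power sums S_k = sum(x^k) and moment sums T_k = sum(x^k * y)
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix for sum_j S_{i+j} c_j = T_i, i = 0, 1, 2
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    # Back substitution
    coef = [0.0] * 3
    for i in range(2, -1, -1):
        coef[i] = (m[i][3] - sum(m[i][j] * coef[j] for j in range(i + 1, 3))) / m[i][i]
    return coef  # [c0, c1, c2]


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0 - 2.0 * x + 0.5 * x * x for x in xs]  # made-up data from a known quadratic
print(quadratic_fit(xs, ys))  # recovers c0 = 3, c1 = -2, c2 = 0.5 up to rounding
```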


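4.7 Correlation between Two Sets of Data

• Two data sets: correlated or not?

$$x_1, x_2, \ldots, x_n, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

$$y_1, y_2, \ldots, y_n, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$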


• Linear correlation coefficient, “Pearson’s r”:

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2\,\sum_{i=1}^{n}(y_i - \bar{y})^2}}, \qquad |r| \le 1$$

r is also random; for large n (n ≥ 500), r is approximately Gaussian.

Usage: useful in determining the sources of errors. E.g., a point-to-point digital communication link: the quality of this link is judged by its BER (Bit Error Rate), and the BER may fluctuate randomly due to wind.

Q) Is wind the error source? A correlation test between 20 measurements of wind speed and the resulting BER gives r = 0.891, which is large enough, so yes!
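A direct transcription of Pearson's r as defined above, in a minimal sketch with a hypothetical function name and made-up data (the wind-speed and BER measurements are not reproduced here). Perfectly linear data give r = ±1, the extremes allowed by |r| ≤ 1.

```python
import math


def pearson_r(xs, ys):
    """Linear correlation coefficient r (Pearson's r)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))           # perfectly increasing
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))           # perfectly decreasing
print(pearson_r([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.5]))  # hypothetical noisy data
```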

