Patrick Breheny

Date post: 11-Feb-2018

Transcript
  • 7/23/2019 Patrick Breheny

    1/18

    Introduction; The empirical distribution function

    Patrick Breheny

    August 26

    Patrick Breheny STA 621: Nonparametric Statistics

    Nonparametric vs. parametric statistics

    The main idea of nonparametric statistics is to make inferences about unknown quantities without resorting to simple parametric reductions of the problem

    For example, suppose $x \sim F$, and we wish to estimate, say, $E\{X\}$ or $P\{X > 1\}$

    The approach taken by parametric statistics is to assume that F belongs to a family of distribution functions that can be described by a small number of parameters, e.g., the normal distribution:

    $$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$$

    These parameters are then estimated, and we make inference about the quantities we were originally interested in ($E\{X\}$ or $P\{X > 1\}$) based on assuming $X \sim N(\mu, \sigma^2)$


    Parametric statistics (cont'd)

    Or suppose we wish to know how $E\{y\}$ changes with x

    Again, the parametric approach is to assume that

    $$E\{y \mid x\} = \alpha + \beta x$$

    We estimate $\alpha$ and $\beta$, then base all future inference on those estimates


    Shortcomings of the parametric approach

    Both of the aforementioned parametric approaches rely on a tremendous reduction of the original problem

    They assume that all uncertainty regarding F(x), or $E\{y \mid x\}$, can be reduced to just two unknown numbers

    If these assumptions are true, then of course, there is nothing wrong with making them

    If they are false, however:

    The resulting statistical inference will be questionable

    We might miss interesting patterns in the data


    The nonparametric approach

    In contrast, nonparametric statistics tries to make as few assumptions as possible about the data

    Instead of assuming that F(x) is normal, we will allow F(x) to be any function (provided, of course, that it satisfies the definition of a cdf)

    Instead of assuming that $E\{y\}$ is linear in x, we will allow it to be any continuous function

    Obviously, this requires the development of a whole new set of tools, as instead of estimating parameters, we will be estimating functions (which are much more complex)


    The four main topics

    We will go over four main areas of nonparametric statistics in this course:

    Estimating aspects of the distribution of a random variable

    Testing aspects of the distribution of a random variable

    Estimating the density of a random variable

    Estimating the regression function $E\{y \mid x\} = f(x)$


    The empirical distribution function

    We will begin with the problem of estimating a CDF (cumulative distribution function)

    Suppose $X \sim F$, where $F(x) = P(X \le x)$ is a distribution function

    The empirical distribution function, $\hat{F}$, is the CDF that puts mass $1/n$ at each data point $x_i$:

    $$\hat{F}(x) = \frac{1}{n} \sum_{i=1}^{n} I(x_i \le x)$$

    where I is the indicator function
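
The definition above can be sketched directly in code. The following is a minimal Python illustration, not the course's own implementation; the helper name `make_ecdf` and the sample values are made up for the example.

```python
from bisect import bisect_right


def make_ecdf(data):
    """Return a function x -> F_hat(x) = (1/n) * #{i : x_i <= x}."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")

    def ecdf(x):
        # bisect_right returns the number of sorted observations that are <= x
        return bisect_right(xs, x) / n

    return ecdf


sample = [0.2, 0.5, 0.5, 0.9, 1.4]  # made-up data, for illustration only
F_hat = make_ecdf(sample)
print(F_hat(0.1), F_hat(0.5), F_hat(2.0))  # 0.0 0.6 1.0
```

Sorting once and using a binary search keeps each evaluation at O(log n); a tied value such as 0.5 simply receives mass 2/n.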


    The empirical distribution function in R

    R provides the very useful function ecdf for working with the empirical distribution function

    > Data <- …
    > Fhat <- ecdf(…)
    > Fhat(0.6)
    [1] 0.933667
    > plot(Fhat)


    Properties of $\hat{F}$

    At any fixed value of x,

    $$E\{\hat{F}(x)\} = F(x)$$

    $$V\{\hat{F}(x)\} = \frac{1}{n} F(x)\bigl(1 - F(x)\bigr)$$

    Note that these two facts imply that

    $$\hat{F}(x) \xrightarrow{P} F(x)$$

    An even stronger form of convergence is given by the Glivenko-Cantelli Theorem:

    $$\sup_x |\hat{F}(x) - F(x)| \xrightarrow{a.s.} 0$$
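
The mean and variance formulas can be checked numerically. The sketch below is a Python simulation under an assumption chosen purely for convenience: data are drawn from Uniform(0, 1), so that F(x) = x. The function names, sample size, and replication count are illustrative choices, not part of the slides.

```python
import random
from statistics import fmean, pvariance


def ecdf_at(data, x):
    """Evaluate the empirical distribution function of `data` at the point x."""
    return sum(1 for xi in data if xi <= x) / len(data)


def simulate_ecdf_at(x, n, reps, seed=1):
    """Draw `reps` Uniform(0, 1) samples of size n; return F_hat(x) for each."""
    rng = random.Random(seed)
    return [ecdf_at([rng.random() for _ in range(n)], x) for _ in range(reps)]


x, n, reps = 0.3, 50, 20000
values = simulate_ecdf_at(x, n, reps)

sim_mean = fmean(values)
sim_var = pvariance(values)
theory_var = x * (1 - x) / n  # F(x)(1 - F(x)) / n with F(x) = x

print("mean:    ", sim_mean, "vs", x)
print("variance:", sim_var, "vs", theory_var)
```

The simulated mean and variance should land close to F(x) and F(x)(1 - F(x))/n, with the agreement tightening as the number of replications grows.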


    $\hat{F}$ as a nonparametric MLE

    The empirical distribution function can be thought of as a nonparametric maximum likelihood estimator

    Homework: Show that, out of all possible CDFs, $\hat{F}$ maximizes

    $$L(F \mid x) = \prod_{i=1}^{n} P_F(x_i)$$
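
A numerical sanity check (not a proof) can build intuition for this claim. The Python sketch below assumes n distinct data points and considers only distributions that place all their mass on those points, so the likelihood is the product of the masses; it compares the equal-mass choice 1/n against randomly generated alternatives. All names and settings are illustrative.

```python
import math
import random


def log_likelihood(masses):
    """Log of prod_i p_i, where p_i is the mass placed on data point x_i."""
    return sum(math.log(p) for p in masses)


n = 5
uniform = [1 / n] * n  # the masses used by the empirical distribution function

rng = random.Random(0)
best_alternative = -math.inf
for _ in range(10000):
    # Random positive masses, normalized to sum to 1
    raw = [rng.random() + 1e-12 for _ in range(n)]
    total = sum(raw)
    masses = [r / total for r in raw]
    best_alternative = max(best_alternative, log_likelihood(masses))

print("equal masses:    ", log_likelihood(uniform))
print("best alternative:", best_alternative)
```

No random alternative should beat the equal-mass assignment; the homework asks for the argument that explains why.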


    Confidence intervals vs. confidence bands

    Before moving into the issue of calculating confidence intervals for F, we need to discuss the notion of a confidence interval for a function

    One approach is to fix x and calculate a confidence interval for F(x), i.e., find a region C(x) such that, for any CDF F,

    $$P\{F(x) \in C(x)\} \ge 1 - \alpha$$

    These intervals are referred to as pointwise confidence intervals


    Confidence intervals vs. confidence bands (cont'd)

    Clearly, however, if there is an $\alpha$ probability that F(x) will not lie in C(x) at each point x, there is greater than an $\alpha$ probability that there exists an x such that F(x) will lie outside C(x)

    Thus, a different approach to inference is to find a confidence region C(x) such that, for any CDF F,

    $$P\{F(x) \in C(x) \;\; \forall x\} \ge 1 - \alpha$$

    These regions are referred to as confidence bands or confidence envelopes


    Inference regarding F

    We can use the fact that, for each value of x, $n\hat{F}(x)$ follows a binomial distribution with n trials and success probability F(x), so that $\hat{F}(x)$ has mean F(x), to construct pointwise intervals for F

    To construct confidence bands, we need a result called the Dvoretzky-Kiefer-Wolfowitz inequality, or DKW inequality:

    $$P\left\{\sup_x |F(x) - \hat{F}(x)| > \epsilon\right\} \le 2\exp(-2n\epsilon^2)$$

    Note that this is a finite-sample, not an asymptotic, result
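
For the pointwise intervals mentioned at the start of this slide, one simple option is sketched below in Python. It uses the normal approximation to the binomial (a Wald interval), which is an assumption made here for brevity; the slides do not say which binomial interval is intended, and exact or score-based intervals are alternatives. The function name and data are made up.

```python
import math
from statistics import NormalDist


def pointwise_interval(data, x, alpha=0.05):
    """Approximate 1 - alpha interval for F(x) at a single fixed x (Wald form)."""
    n = len(data)
    p_hat = sum(1 for xi in data if xi <= x) / n  # F_hat(x)
    z = NormalDist().inv_cdf(1 - alpha / 2)       # standard normal quantile
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Clip to [0, 1], since F(x) is a probability
    return max(p_hat - half_width, 0.0), min(p_hat + half_width, 1.0)


sample = [k / 10 for k in range(1, 21)]  # made-up data: 0.1, 0.2, ..., 2.0
lo, hi = pointwise_interval(sample, 1.0)
print(lo, hi)
```

This interval only controls coverage at the single point x; it says nothing about all x simultaneously, which is what the DKW inequality addresses.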


    A confidence band for F

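    Thus, setting the right side of the DKW inequality equal to $\alpha$,

    $$\begin{aligned}
    \alpha &= 2\exp(-2n\epsilon^2) \\
    \log\left(\frac{\alpha}{2}\right) &= -2n\epsilon^2 \\
    \log\left(\frac{2}{\alpha}\right) &= 2n\epsilon^2 \\
    \epsilon &= \sqrt{\frac{1}{2n}\log\left(\frac{2}{\alpha}\right)}
    \end{aligned}$$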
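
As a quick arithmetic check of the derivation, the Python sketch below computes the half-width for n = 100 and a 95 percent band, then plugs it back into the right side of the DKW inequality. The function name `dkw_epsilon` is a made-up label for this example.

```python
import math


def dkw_epsilon(n, alpha=0.05):
    """Half-width solving alpha = 2 * exp(-2 * n * eps**2) for eps."""
    if n <= 0 or not 0 < alpha < 1:
        raise ValueError("need n > 0 and 0 < alpha < 1")
    return math.sqrt(math.log(2 / alpha) / (2 * n))


eps = dkw_epsilon(100, 0.05)
print(round(eps, 4))  # 0.1358

# Plugging eps back in should recover alpha (up to floating-point rounding)
recovered_alpha = 2 * math.exp(-2 * 100 * eps ** 2)
print(recovered_alpha)
```

Because the half-width shrinks like 1/sqrt(n), halving the width of the band requires roughly four times as many observations.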


    A confidence band for F (cont'd)

    Thus, the following functions define an upper and lower $1 - \alpha$ confidence band for any F and n:

    $$L(x) = \max\{\hat{F}(x) - \epsilon,\; 0\}$$

    $$U(x) = \min\{\hat{F}(x) + \epsilon,\; 1\}$$
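
The band can be sketched in a few lines of Python. This is an illustrative implementation under the definitions above; the function name `dkw_band` and the sample are assumptions made for the example.

```python
import math
from bisect import bisect_right


def dkw_band(data, alpha=0.05):
    """Return functions (L, U) giving the 1 - alpha DKW confidence band."""
    xs = sorted(data)
    n = len(xs)
    eps = math.sqrt(math.log(2 / alpha) / (2 * n))

    def lower(x):
        # F_hat(x) - eps, truncated below at 0
        return max(bisect_right(xs, x) / n - eps, 0.0)

    def upper(x):
        # F_hat(x) + eps, truncated above at 1
        return min(bisect_right(xs, x) / n + eps, 1.0)

    return lower, upper


sample = [k / 10 for k in range(1, 21)]  # made-up data: 0.1, 0.2, ..., 2.0
L, U = dkw_band(sample)
print(L(1.0), U(1.0))
```

With only 20 observations the band is wide (about 0.30 on either side of the estimate), which reflects how much harder simultaneous coverage is than pointwise coverage.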


    Pointwise vs. confidence for nerve pulse data

    [Figure: $\hat{F}(x)$ (vertical axis, 0.0 to 1.0) plotted against Time (horizontal axis, 0.0 to 1.5)]


    Homework

    I claimed that the confidence band based on the DKW inequality worked for any distribution function . . . does it?

    Homework: Generate 100 observations from a N(0, 1) distribution. Compute a 95 percent confidence band for the CDF F. Repeat this 1000 times to see how often the confidence band contains the true distribution function. Repeat using data generated from a Cauchy distribution.
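
One possible way to set up this simulation is sketched below in Python; it is a starting point under stated assumptions, not a model answer. It relies on the fact that the band contains the true F everywhere exactly when the largest gap between the empirical and true CDFs is at most the half-width. The function names, the seed, and the inverse-CDF method for Cauchy draws are choices made for this example.

```python
import math
import random
from statistics import NormalDist


def sup_distance(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and the true `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # F_hat jumps from (i-1)/n to i/n at x, so check both sides of the jump
        worst = max(worst, i / n - f, f - (i - 1) / n)
    return worst


def coverage(draw, cdf, n=100, reps=1000, alpha=0.05, seed=621):
    """Fraction of simulated samples whose DKW band contains the true CDF."""
    rng = random.Random(seed)
    eps = math.sqrt(math.log(2 / alpha) / (2 * n))
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        if sup_distance(sample, cdf) <= eps:
            hits += 1
    return hits / reps


# Standard normal data
normal_cov = coverage(lambda rng: rng.gauss(0.0, 1.0), NormalDist().cdf)

# Standard Cauchy data, generated by inverting the CDF F(x) = 1/2 + atan(x)/pi
cauchy_cov = coverage(
    lambda rng: math.tan(math.pi * (rng.random() - 0.5)),
    lambda x: 0.5 + math.atan(x) / math.pi,
)

print("normal:", normal_cov, "cauchy:", cauchy_cov)
```

Comparing the two printed proportions against the nominal 0.95 is the point of the exercise; the heavy tails of the Cauchy distribution make it a natural stress test.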
