Probability Theory Presentation 12

Date post: 10-Apr-2018
Upload: xing-qiu

Transcript
  • 8/8/2019 Probability Theory Presentation 12

    BST 401 Probability Theory

    Xing Qiu, Ha Youn Lee

    Department of Biostatistics and Computational Biology, University of Rochester

    October 14, 2010

    Qiu, Lee BST 401

    Outline

    1 Radon-Nikodym Theorem

    2 Introduction of Conditional Expectation

    Motivation (I)

    A little refresher on your undergraduate probability theory:

    There are two types of probability distributions: continuous ones and discrete ones.

    Continuous and discrete distributions have different definitions of the density function (p.d.f.).

    You can have a mixture of the two. Example: a survey question, "How much tax did you pay for the year 2008?" A small but non-trivial proportion of U.S. residents didn't have to pay. So you could describe the answer as a discrete random variable: 0 (did not pay) and 1 (paid). But that's a bad survey design.

    A better way: for those who did pay, record how much they paid, which can be modeled as a continuous random variable.
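
    The tax example can be simulated as a mixed distribution. The sketch below is only an illustration: the 10% share of non-payers and the lognormal model for positive amounts are made-up assumptions, not figures from the slides.

```python
import random

def sample_tax(p_zero, rng):
    """Draw one tax amount from a mixed distribution.

    With probability p_zero the amount is exactly 0 (the discrete part);
    otherwise it is a positive lognormal draw (the continuous part).
    The lognormal parameters are hypothetical illustration values.
    """
    if rng.random() < p_zero:
        return 0.0
    return rng.lognormvariate(8.0, 1.0)

rng = random.Random(0)
draws = [sample_tax(0.1, rng) for _ in range(100_000)]
frac_zero = sum(d == 0.0 for d in draws) / len(draws)
print(f"fraction of exact zeros: {frac_zero:.3f}")  # should be close to p_zero
```

    The point mass at 0 shows up as a positive fraction of exact ties, which no continuous density alone can produce.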

    Motivation (II)

    The more challenging problem: are these two the only types of probability measures? I.e., for every probability measure (or L-S measure), can we always decompose it into a continuous part and a discrete part?

    The Radon-Nikodym theorem and the Lebesgue decomposition theorem are all about the structure of L-S measures (probabilities).

    Together they claim that every L-S measure can be decomposed (w.r.t. the Lebesgue measure) into an absolutely continuous part and a singular part.

    The singular part is much like, but not exactly restricted to, the discrete measures.

    The absolutely continuous part can be expressed by integrating a density function w.r.t. the Lebesgue measure.

    Motivation (III)

    In this sense, the R-N theorem a) defines an abstract derivative between two measures $\mu$ and $\nu$, denoted as $\frac{d\nu}{d\mu}$; b) provides a criterion by which we can check whether $\frac{d\nu}{d\mu}$ exists.

    Just as the Lebesgue-Stieltjes integral is an extension of the usual Riemann integral, the Radon-Nikodym derivative is an extension of the usual derivative:

    \[
    \underbrace{\int_{\mathbb{R}} f(x)\,dx}_{\text{Riemann integral}}
    = \underbrace{\int_{\mathbb{R}} f(x)\,d\lambda(x)}_{\text{L-S integral}},
    \qquad
    \underbrace{\frac{dF(x)}{dx}}_{\text{Calculus derivative}}
    = \underbrace{\frac{d\mu}{d\lambda} = f(x)}_{\text{R-N derivative}},
    \]

    where $\lambda$ is the Lebesgue measure and $F$ ($f$) is the distribution (density) function of $\mu$.
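
    The calculus side of this analogy can be checked numerically. The sketch below assumes an Exponential(1) law purely as an example (it is not taken from the slides) and compares a finite-difference derivative of its distribution function with its density.

```python
import math

def cdf(x, rate=1.0):
    """Distribution function F of an Exponential(rate) law."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def pdf(x, rate=1.0):
    """Density f of the same law w.r.t. the Lebesgue measure."""
    return rate * math.exp(-rate * x) if x > 0 else 0.0

def numeric_derivative(func, x, h=1e-5):
    """Central finite difference, a numerical stand-in for dF/dx."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

for x in (0.5, 1.0, 2.0):
    # The two printed columns should agree to several decimal places.
    print(x, numeric_derivative(cdf, x), pdf(x))
```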

    Hahn decomposition theorem

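    Every signed measure can be decomposed into a positive and a negative part.

    Recall the +/- parts of a function ($f = f^+ - f^-$).

    Let $\mu$ be a signed measure (i.e., almost a measure, except that $\mu(A)$ can be negative; analogy: electric charge).

    Partition the whole space $\Omega$ into a positive set $\Omega^+$ and a negative set $\Omega^-$: $\Omega = \Omega^+ \cup \Omega^-$, $\Omega^+ \cap \Omega^- = \emptyset$.

    For each $A \in \mathcal{F}$, $\mu(A \cap \Omega^+) \ge 0$ and $\mu(A \cap \Omega^-) \le 0$.

    This decomposition is unique up to a null set. If $\mu(\Omega_0) = 0$ (more precisely, if $\Omega_0$ is a null set, so that all of its measurable subsets have measure 0), then $\Omega^+ \cup \Omega_0$ and $\Omega^- \setminus \Omega_0$ form an equivalent Hahn decomposition.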
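
    On a finite space the Hahn decomposition can be read off point by point. The following is a minimal sketch under that assumption; the point masses in `charge` are hypothetical.

```python
def hahn_decomposition(weights):
    """Split a finite space into a positive and a negative set.

    `weights` maps each point to its signed mass, so the signed measure of
    a set is the sum of the weights of its points. Points of mass 0 are put
    in the positive set; moving them to the negative set would give an
    equivalent decomposition.
    """
    positive = {s for s, w in weights.items() if w >= 0}
    negative = set(weights) - positive
    return positive, negative

def measure(weights, subset):
    """Signed measure of `subset` under the point masses in `weights`."""
    return sum(weights[s] for s in subset)

charge = {"s1": 2.0, "s2": -1.5, "s3": 0.0, "s4": -0.5}  # hypothetical masses
pos, neg = hahn_decomposition(charge)
print(sorted(pos), sorted(neg))  # ['s1', 's3'] ['s2', 's4']
```

    Here the zero-mass point `s3` plays the role of the null set: it can sit on either side without changing the decomposition in any essential way.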

    Singularity

    $\mu_1$, $\mu_2$ are measures on the same $\mathcal{F}$.

    They are said to be mutually singular, written as $\mu_1 \perp \mu_2$, if they concentrate on disjoint sets, i.e., $\exists B \in \mathcal{F}$ such that $\mu_1(B) = 0$ and $\mu_2(B^c) = 0$.

    Examples: sets with two parts; discrete measures w.r.t. the L-measure.

    Not all measures that are singular w.r.t. the L-measure are discrete measures: a) in $\mathbb{R}^2$, the uniform measure on a circle/line; b) in $\mathbb{R}^1$, the uniform measure on the Cantor set. See Wikipedia.
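
    The Cantor example can be made concrete. The n-th stage $C_n$ of the Cantor construction has Lebesgue measure $(2/3)^n$, which tends to 0, while the uniform measure on the Cantor set gives every $C_n$ mass 1. The sketch below illustrates this with exact fractions; the truncated sampler (20 random ternary digits from {0, 2}) is an approximation chosen here for illustration.

```python
from fractions import Fraction
import random

def cantor_stage(n):
    """Closed intervals making up the n-th stage C_n of the Cantor construction."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        nxt = []
        for a, b in intervals:
            third = (b - a) / 3
            nxt.append((a, a + third))   # keep the left third
            nxt.append((b - third, b))   # keep the right third
        intervals = nxt
    return intervals

def sample_cantor(digits, rng):
    """Point whose first `digits` ternary digits are random picks from {0, 2}."""
    return sum(Fraction(rng.choice((0, 2)), 3 ** k) for k in range(1, digits + 1))

n = 8
stage = cantor_stage(n)
lebesgue_mass = sum(b - a for a, b in stage)   # total length of C_n
rng = random.Random(1)
points = [sample_cantor(20, rng) for _ in range(1000)]
inside = all(any(a <= x <= b for a, b in stage) for x in points)
print(float(lebesgue_mass), inside)
```

    The total length shrinks geometrically with n, yet every sampled point stays inside $C_n$: the two measures concentrate on disjoint sets although neither has atoms.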

    Jordan-Hahn decomposition

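    A natural consequence of the Hahn decomposition theorem.

    $\mu = \mu^+ - \mu^-$, where $\mu^+$, $\mu^-$ are two measures (meaning: with nonnegative values) that are mutually singular.

    $\mu^+(A) = \mu(A \cap \Omega^+)$, for all $A \in \mathcal{F}$.

    $\mu^-(A) = -\mu(A \cap \Omega^-)$, for all $A \in \mathcal{F}$.

    At least one of $\mu^+$, $\mu^-$ must be finite; otherwise $\mu(\Omega) = \infty - \infty$ is not well defined.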

    Total variation

    Sometimes $\mu^+$, $\mu^-$ are called the upper/lower variations of $\mu$.

    $|\mu| = \mu^+ + \mu^-$ is called the total variation of $\mu$. It is sort of the absolute value of a measure.
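
    For a signed measure given by point masses on a finite space, the upper variation, lower variation, and total variation can be computed directly. This is a sketch under that finite-space assumption, with hypothetical masses.

```python
def jordan_parts(weights):
    """Upper and lower variations of a signed measure given by point masses."""
    upper = {s: max(w, 0.0) for s, w in weights.items()}
    lower = {s: max(-w, 0.0) for s, w in weights.items()}
    return upper, lower

def measure(weights, subset):
    """Measure of `subset` under the point masses in `weights`."""
    return sum(weights[s] for s in subset)

charge = {"s1": 2.0, "s2": -1.5, "s3": 0.0, "s4": -0.5}  # hypothetical masses
upper, lower = jordan_parts(charge)
total_variation = {s: upper[s] + lower[s] for s in charge}

A = {"s1", "s2"}
print(measure(upper, A), measure(lower, A), measure(total_variation, A))  # 2.0 1.5 3.5
```

    On the set `A` the signed measure is 2.0 - 1.5 = 0.5, while the total variation adds the two parts to 3.5, much as |x| ignores the sign of x.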

    Absolute continuity of measures

    Exercise (5.8), page 465: $\nu(A) = \int_A g\,d\mu$ defines a signed measure on $\mathcal{F}$.

    One interesting observation: $\mu(A) = 0 \implies \nu(A) = 0$.

    Definition: $\nu$ is said to be absolutely continuous w.r.t. $\mu$ iff $\mu(A) = 0 \implies \nu(A) = 0$ for all $A \in \mathcal{F}$. Notation: $\nu \ll \mu$.

    Calculus analogy: if $F = \int f\,dx$, then $F$ must be continuous (w.r.t. the Lebesgue measure).

    The name "absolute continuity" implies that it is a stronger type of continuity. Just as in calculus, only certain continuous functions can be anti-derivatives. Wikipedia has an excellent entry on this topic.
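
    On a finite space the definition reduces to a check on single points. The sketch below assumes that $\mu$ is an ordinary (nonnegative) measure given by point masses; the example measures are hypothetical.

```python
def is_absolutely_continuous(nu, mu):
    """Check nu << mu for measures given by point masses on a finite space.

    Because mu is nonnegative, a set has mu-measure zero exactly when all
    of its points have mu-mass zero, so it suffices to inspect those points.
    """
    return all(nu[s] == 0 for s in mu if mu[s] == 0)

mu = {"a": 0.5, "b": 0.5, "c": 0.0}
nu_ok = {"a": 0.2, "b": 0.8, "c": 0.0}    # puts no mass where mu has none
nu_bad = {"a": 0.2, "b": 0.3, "c": 0.5}   # charges the mu-null point "c"
print(is_absolutely_continuous(nu_ok, mu), is_absolutely_continuous(nu_bad, mu))  # True False
```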

    Radon-Nikodym Theorem

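    If $\nu \ll \mu$ (with $\mu$ and $\nu$ $\sigma$-finite, as L-S measures are), then there must exist a measurable function $g$ such that $\nu(A) = \int_A g\,d\mu$.

    This $g$ is almost everywhere unique: if $h$ is another such function, then $g = h$ a.e.

    This $g$ is called the density function or the Radon-Nikodym derivative of $\nu$ w.r.t. $\mu$, denoted as $\frac{d\nu}{d\mu}$. Of course, $\mu$ and $\nu$ are defined on the same $\sigma$-algebra.

    $\mu$: Lebesgue measure $\implies$ $g$: the usual density function.

    If both $\mu$ and $\nu$ are absolutely continuous w.r.t. the Lebesgue measure (meaning both usual densities exist), then the R-N derivative is the ratio of the two densities (here $g$ and $f$ denote the Lebesgue densities of $\nu$ and $\mu$, and the ratio is taken where $f > 0$):

    \[
    \frac{d\nu}{d\mu} = \frac{g}{f}, \qquad g = \frac{d\nu}{dx}, \qquad f = \frac{d\mu}{dx}.
    \]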
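
    The ratio-of-densities formula can be checked by simulation. The sketch below assumes $\nu = N(1, 1)$ and $\mu = N(0, 1)$, choices made here for illustration only. It estimates $\nu([a, b]) = \int_{[a,b]} \frac{d\nu}{d\mu}\,d\mu$ by averaging the density ratio over draws from $\mu$ and compares the result with the exact value.

```python
import math
import random

def normal_pdf(x, mean):
    """Density of N(mean, 1) w.r.t. the Lebesgue measure."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def normal_cdf(x, mean):
    """Distribution function of N(mean, 1)."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0)))

def rn_derivative(x):
    """d(nu)/d(mu) = g/f for nu = N(1, 1) and mu = N(0, 1)."""
    return normal_pdf(x, 1.0) / normal_pdf(x, 0.0)

rng = random.Random(0)
a, b = 0.0, 2.0
draws = [rng.gauss(0.0, 1.0) for _ in range(200_000)]   # samples from mu
estimate = sum(rn_derivative(x) for x in draws if a <= x <= b) / len(draws)
exact = normal_cdf(b, 1.0) - normal_cdf(a, 1.0)         # nu([a, b])
print(estimate, exact)  # the two numbers should be close
```

    Reweighting samples from one measure by a density ratio in this way is the idea behind importance sampling.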

    Lebesgue Decomposition Theorem

    The Radon-Nikodym theorem takes care of the absolutely continuous measures. Now let us deal with the general case.

    $\mu$: a reference measure. $\nu$: a signed measure defined on the same $\sigma$-field $\mathcal{F}$.

    $\nu = \nu_1 + \nu_2$, where $\nu_1$ and $\nu_2$ are signed measures such that $\nu_1 \ll \mu$ (the absolutely continuous part) and $\nu_2 \perp \mu$ (the singular part).

    This decomposition is unique.
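
    On a finite space the decomposition is explicit: the absolutely continuous part keeps the mass of $\nu$ on points that $\mu$ charges, and the singular part keeps the rest. A minimal sketch under that assumption, with hypothetical point masses and a nonnegative reference measure:

```python
def lebesgue_decomposition(nu, mu):
    """Split nu into nu1 << mu and nu2 singular to mu on a finite space."""
    nu1 = {s: (w if mu[s] > 0 else 0.0) for s, w in nu.items()}
    nu2 = {s: (w if mu[s] == 0 else 0.0) for s, w in nu.items()}
    return nu1, nu2

mu = {"a": 0.5, "b": 0.5, "c": 0.0}   # reference measure
nu = {"a": 0.2, "b": 0.3, "c": 0.5}   # measure to decompose
nu1, nu2 = lebesgue_decomposition(nu, mu)
print(nu1)  # {'a': 0.2, 'b': 0.3, 'c': 0.0}
print(nu2)  # {'a': 0.0, 'b': 0.0, 'c': 0.5}
```

    Here `nu2` lives entirely on the point "c", which has $\mu$-mass 0, so the set B = {"c"} witnesses the singularity.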

    Density function revisited

    Continuous random variables. Def. Reference measure: the L-measure.

    Discrete r.v.s. Def. Reference measure: the counting measure on the state space.
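
    Written out, the two cases are instances of one formula, a density integrated against a reference measure. The symbols $f$ (Lebesgue density), $p$ (probability mass function), $S$ (state space) and $\#$ (counting measure) are notation assumed here, not taken from the slides.

```latex
\[
P(X \in A) = \int_A f(x)\,d\lambda(x) \quad \text{(continuous case)},
\qquad
P(X \in A) = \int_A p(x)\,d\#(x) = \sum_{x \in A \cap S} p(x) \quad \text{(discrete case)}.
\]
```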

    It's all about averaging

    A discrete example. $\Omega = \{s_1, s_2, \dots, s_N\}$. $X : \Omega \to \mathbb{R}$ is a r.v.

    The mathematical expectation of $X$ is the theoretical average of $X$ over the whole space $\Omega$.

    We can also do a partial average. Say for some reason we want to restrict the possible outcomes of $X$ to only $A = \{s_1, s_2\}$. What's the theoretical average of $X$ conditional on $A$?

    Answer: $\frac{\int_A X\,dP}{P(A)}$, denoted as $E(X|A)$.
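
    In the discrete setting the integral is a finite sum, so $E(X|A)$ is easy to compute. The sketch below uses a three-point space with hypothetical probabilities and values.

```python
def conditional_expectation(X, P, A):
    """E(X | A) = (sum of X(s) P(s) over s in A) / P(A) on a finite space."""
    p_A = sum(P[s] for s in A)
    if p_A == 0:
        raise ValueError("E(X | A) is undefined when P(A) = 0")
    return sum(X[s] * P[s] for s in A) / p_A

P = {"s1": 0.25, "s2": 0.25, "s3": 0.5}   # hypothetical probabilities
X = {"s1": 4.0, "s2": 8.0, "s3": 1.0}     # hypothetical values of X
print(conditional_expectation(X, P, {"s1", "s2"}))  # 6.0
```

    Since `s1` and `s2` are equally likely, the partial average over A is just the plain average of 4 and 8.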

    Conditional expectation and the total expectation


    In the same way we can compute $E(X \mid A^c)$. The total expectation is the weighted average of the two conditional expectations: $EX = P(A) E(X \mid A) + P(A^c) E(X \mid A^c)$.

    If $\Omega$ is a disjoint union of $A_1, A_2, \dots, A_K$, we may compute the total expectation by first computing the conditional expectations on $A_k$, then taking the weighted average of these conditional expectations (Equation 1.1f, page 224):

    $$EX = \sum_{k=1}^{K} P(A_k) E(X \mid A_k).$$

    In fact, it is just as easy to compute $E(X \mid B)$ from $E(X \mid A_k)$ if $B$ is a member of $\mathcal{G} = \sigma(\{A_1, A_2, \dots, A_K\})$ with $P(B) > 0$:

    $$E(X \mid B) = \frac{1}{P(B)} \sum_{A_k \subseteq B} P(A_k) E(X \mid A_k).$$
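    Both formulas can be verified numerically. The following sketch assumes a hypothetical fair six-sided die with X(s) = s and a made-up three-cell partition; it only illustrates the weighted-average identities and is not taken from the course material.

```python
from fractions import Fraction

# Hypothetical example: six equally likely outcomes, X(s) = s.
omega = [1, 2, 3, 4, 5, 6]
P = {s: Fraction(1, 6) for s in omega}
X = {s: s for s in omega}

# A disjoint partition A_1, A_2, A_3 of the sample space.
partition = [{1, 2}, {3, 4, 5}, {6}]


def prob(event):
    """P(A) as a finite sum over the outcomes in A."""
    return sum(P[s] for s in event)


def cond_exp(event):
    """E(X|A) = (1/P(A)) * sum of X(s) P(s) over s in A, assuming P(A) > 0."""
    return sum(X[s] * P[s] for s in event) / prob(event)


# Total expectation as the weighted average of the conditional expectations.
total = sum(prob(a) * cond_exp(a) for a in partition)

# B is a union of partition cells, so it belongs to sigma({A_1, A_2, A_3}).
B = {1, 2, 6}
cells_in_b = [a for a in partition if a <= B]  # cells contained in B
e_x_given_b = sum(prob(a) * cond_exp(a) for a in cells_in_b) / prob(B)
```

    Here `total` equals 7/2, the usual mean of a fair die, and `e_x_given_b` equals 3, the same value obtained by averaging X directly over B.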


    Cond. Exp. and the σ-algebra


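    The last slide shows that you can view conditional expectation as a r.v. on $(\Omega, \mathcal{G})$.

    Define this random variable in this way:

    $$Y(\omega) = E(X \mid A_k), \quad \omega \in A_k.$$

    This r.v. satisfies the following properties:

    1 $Y$ is $\mathcal{G}$-measurable, denoted as $Y \in \mathcal{G}$. It means $Y^{-1}(B) \in \mathcal{G}$ for all $B \in \mathcal{B}(\mathbb{R})$. In general $X \notin \mathcal{G}$, because $X \in \mathcal{F}$ but $\mathcal{F}$ is usually finer than $\mathcal{G}$.

    2 For all $B \in \mathcal{G}$,

    $$\int_B Y \, dP = \int_B X \, dP.$$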
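    The two properties can be checked by brute force on a finite example. The sketch below assumes a hypothetical six-point space, a made-up r.v. X(s) = s², and a three-cell partition; for a finitely-valued function, measurability is tested through its level sets, which is enough because the σ-algebra is closed under unions.

```python
from fractions import Fraction
from itertools import combinations

omega = [1, 2, 3, 4, 5, 6]
P = {s: Fraction(1, 6) for s in omega}
X = {s: s * s for s in omega}  # hypothetical r.v.
partition = [frozenset({1, 2}), frozenset({3, 4, 5}), frozenset({6})]


def integral(f, event):
    """Integral of f over the event with respect to P (a finite sum here)."""
    return sum(f[s] * P[s] for s in event)


def prob(event):
    return sum(P[s] for s in event)


# Y(w) = E(X|A_k) for w in A_k: constant on every cell of the partition.
Y = {}
for cell in partition:
    value = integral(X, cell) / prob(cell)
    for s in cell:
        Y[s] = value

# G = sigma(partition): all unions of cells, including the empty union.
G = [frozenset().union(*cells)
     for r in range(len(partition) + 1)
     for cells in combinations(partition, r)]


def is_g_measurable(f):
    """A finitely-valued f is G-measurable iff every level set {f = v} is in G."""
    return all(frozenset(s for s in omega if f[s] == v) in G
               for v in set(f.values()))


# Property 1: Y is G-measurable; X is not, since {X = 1} = {1} splits the cell {1, 2}.
y_is_g_measurable = is_g_measurable(Y)
x_is_g_measurable = is_g_measurable(X)

# Property 2: the integrals of Y and X agree on every B in G.
integrals_agree = all(integral(Y, B) == integral(X, B) for B in G)
```

    On this example `y_is_g_measurable` and `integrals_agree` are true while `x_is_g_measurable` is false, matching the remark that X is measurable only with respect to the finer σ-algebra.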


    Conditional expectation and R-N derivative


    $Y$ is denoted as $E(X \mid \mathcal{G})$ and is called the conditional expectation of $X$ given $\mathcal{G}$.

    It turns out that $Y = \frac{d\nu}{d\tilde{P}}$, where $\nu$ is a signed measure defined on $\mathcal{G}$ and $\tilde{P}$ is $P$ restricted to $\mathcal{G}$:

    $$\nu(A) = \int_A X \, dP, \qquad \tilde{P} : \mathcal{G} \to \mathbb{R}, \quad \tilde{P}(B) = P(B).$$

    This construction shows that the conditional expectation is just a special Radon-Nikodym derivative between two measures.

    Ha Youn will revisit this subject after the midterm exam.
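    For the finite-partition case of the earlier slides, the derivative can be worked out by hand. The derivation below is a sketch under the assumption that $\mathcal{G}$ is generated by a partition $A_1, \dots, A_K$ with every $P(A_k) > 0$; it writes $\nu(A) = \int_A X \, dP$ for the signed measure and $\tilde{P}$ for $P$ restricted to $\mathcal{G}$ (these symbol names are a notational choice, not necessarily the ones on the original slides).

```latex
% Assumption: \mathcal{G} = \sigma(\{A_1, \dots, A_K\}) with P(A_k) > 0 for every k.
% A \mathcal{G}-measurable Y is constant on each cell, say Y = y_k on A_k.
% The defining property of the Radon-Nikodym derivative, applied to A_k, gives y_k.
\begin{align*}
\nu(A_k) &= \int_{A_k} Y \, d\tilde{P} = y_k \, \tilde{P}(A_k) = y_k \, P(A_k), \\
y_k &= \frac{\nu(A_k)}{P(A_k)} = \frac{1}{P(A_k)} \int_{A_k} X \, dP = E(X \mid A_k).
\end{align*}
```

    So on each cell the derivative takes exactly the value $E(X \mid A_k)$, which is the random variable $Y$ defined on the previous slide.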


    Lena Söderberg, an illustration of conditional expectation

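    $\Omega$ is the canvas ($512 \times 512$ pixels), $\mathcal{F}$ is the σ-algebra generated by these pixels, $P$ is the discrete uniform distribution. An image is a random vector $X : \Omega \to \mathbb{R}^3$, $X = (R, G, B)$.

    $\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \dots \subset \mathcal{F}_9 = \mathcal{F}$.

    These figures represent the conditional expectations $E(X \mid \mathcal{F}_i)$. Basically $E(X \mid \mathcal{F}_0) = E(X)$ and $E(X \mid \mathcal{F}) = X$.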
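    One way to reproduce such figures is block averaging. The sketch below assumes that each level of the filtration is generated by square blocks whose side halves at every step (so 512 = 2^9 gives levels 0 to 9); the slides do not state how the intermediate σ-algebras are built, so this dyadic choice, the function name `block_average`, and the tiny 4 × 4 single-channel image are all hypothetical. For an RGB image the same function would be applied to each channel.

```python
def block_average(image, level, depth):
    """E(X | F_level) for one channel, assuming F_level is generated by
    square blocks of side 2**(depth - level) and pixels are equally likely."""
    size = len(image)
    side = 2 ** (depth - level)
    out = [[0.0] * size for _ in range(size)]
    for top in range(0, size, side):
        for left in range(0, size, side):
            block = [image[r][c]
                     for r in range(top, top + side)
                     for c in range(left, left + side)]
            mean = sum(block) / len(block)  # conditional expectation on this block
            for r in range(top, top + side):
                for c in range(left, left + side):
                    out[r][c] = mean
    return out


# Hypothetical 4x4 single-channel image; depth 2 plays the role of 9 for 512x512.
image = [
    [0, 2, 4, 6],
    [2, 4, 6, 8],
    [8, 8, 0, 0],
    [8, 8, 0, 4],
]
depth = 2

coarsest = block_average(image, 0, depth)  # one block: the overall mean E(X)
middle = block_average(image, 1, depth)    # four 2x2 blocks
finest = block_average(image, 2, depth)    # one pixel per block: X itself
```

    Averaging the `middle` image again at level 0 returns `coarsest`, a small instance of the tower property: conditioning on a coarser σ-algebra after a finer one gives the same result as conditioning on the coarser one directly.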
