
Stats 241.3

Probability Theory

Summary

Probability

Axioms of Probability

A probability measure P is defined on the sample space S by assigning to each event E a value P[E] with the following properties:

1. P[E] ≥ 0, for each E.

2. P[S] = 1.

3. If $E_i \cap E_j = \emptyset$ for all $i \neq j$, then

$$P\left[\bigcup_i E_i\right] = \sum_i P[E_i]$$

In particular, $P[E_1 \cup E_2] = P[E_1] + P[E_2]$ for disjoint $E_1$ and $E_2$.

Finite uniform probability space

Many examples fall into this category

1. Finite number of outcomes

2. All outcomes are equally likely

3. $P[E] = \dfrac{n(E)}{n(S)} = \dfrac{n(E)}{N} = \dfrac{\text{no. of outcomes in } E}{\text{total no. of outcomes}}$

Note: $n(A)$ = no. of elements of $A$.

To handle problems of this kind we have to be able to count: count n(E) and n(S).

Techniques for counting

Basic Rule of Counting

Suppose we carry out k operations in sequence.

Let n1 = the number of ways the first operation

can be performed

ni = the number of ways the ith operation can be performed once the first (i − 1) operations have been completed, for i = 2, 3, … , k.

Then N = n1n2 … nk = the number of ways the k operations can be performed in sequence.
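A quick sketch of the basic counting rule in Python; the license-plate-style example (three letters followed by two digits) is a made-up illustration, not from the notes:

```python
# Basic rule of counting: count sequences built by k operations performed in order.
# Operation 1-3: choose a letter (26 ways each); operations 4-5: choose a digit (10 ways each).
n_ways = [26, 26, 26, 10, 10]

N = 1
for n_i in n_ways:
    N *= n_i          # N = n1 * n2 * ... * nk

print(N)              # 1757600 possible sequences
```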

Basic Counting Formulae

1. Permutations: How many ways can you order n objects?

   n!

2. Permutations of size k (< n): How many ways can you choose k objects from n objects in a specific order?

   $_nP_k = \dfrac{n!}{(n-k)!} = n(n-1)\cdots(n-k+1)$

3. Combinations of size k ( ≤ n): A combination of size k chosen from n objects is a subset of size k where the order of selection is irrelevant. How many ways can you choose a combination of size k objects from n objects (order of selection is irrelevant)

$_nC_k = \binom{n}{k} = \dfrac{n!}{k!\,(n-k)!} = \dfrac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots(2)(1)}$

Important Notes

1. In combinations ordering is irrelevant. Different orderings result in the same combination.

2. In permutations order is relevant. Different orderings result in different permutations.
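The two counting formulae above are available directly in the Python standard library (math.perm and math.comb, Python 3.8+); a minimal sketch with arbitrary values n = 10, k = 3:

```python
import math

n, k = 10, 3

# Permutations of size k from n objects (order matters): n! / (n - k)!
print(math.perm(n, k))                               # 720
print(math.factorial(n) // math.factorial(n - k))    # 720, same value

# Combinations of size k from n objects (order irrelevant): n! / (k! (n - k)!)
print(math.comb(n, k))                               # 120
print(math.perm(n, k) // math.factorial(k))          # 120 = 720 / 3!
```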

Rules of Probability

The additive rule

P[A ∪ B] = P[A] + P[B] – P[A ∩ B]

and

if A ∩ B = ∅, then P[A ∪ B] = P[A] + P[B]

The additive rule for more than two events

$$P\left[\bigcup_{i=1}^{n} A_i\right] = \sum_{i=1}^{n} P[A_i] - \sum_{i<j} P[A_i \cap A_j] + \sum_{i<j<k} P[A_i \cap A_j \cap A_k] - \cdots + (-1)^{n+1} P[A_1 \cap A_2 \cap \cdots \cap A_n]$$

and if $A_i \cap A_j = \emptyset$ for all $i \neq j$, then

$$P\left[\bigcup_{i=1}^{n} A_i\right] = \sum_{i=1}^{n} P[A_i]$$

The Rule for complements

for any event E

$$P[\bar{E}] = 1 - P[E]$$

Conditional Probability, Independence and the Multiplicative Rule

The conditional probability of A given B is defined to be:

$$P[A \mid B] = \frac{P[A \cap B]}{P[B]} \quad \text{if } P[B] \neq 0$$

The multiplicative rule of probability

$$P[A \cap B] = \begin{cases} P[A]\,P[B \mid A] & \text{if } P[A] \neq 0 \\ P[B]\,P[A \mid B] & \text{if } P[B] \neq 0 \end{cases}$$

and

$$P[A \cap B] = P[A]\,P[B]$$

if A and B are independent.

This is the definition of independence

The multiplicative rule for more than two events

$$P[A_1 \cap A_2 \cap \cdots \cap A_n] = P[A_1]\,P[A_2 \mid A_1]\,P[A_3 \mid A_1 \cap A_2] \cdots P[A_n \mid A_1 \cap \cdots \cap A_{n-1}]$$

Independence for more than 2 events

Definition:

The set of k events A1, A2, … , Ak are called mutually independent if:

$$P[A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_m}] = P[A_{i_1}]\,P[A_{i_2}] \cdots P[A_{i_m}]$$

for every subset {i1, i2, … , im} of {1, 2, …, k}.

i.e. for k = 3, A1, A2, A3 are mutually independent if:

P[A1 ∩ A2] = P[A1] P[A2], P[A1 ∩ A3] = P[A1] P[A3],

P[A2 ∩ A3] = P[A2] P[A3],

P[A1 ∩ A2 ∩ A3] = P[A1] P[A2] P[A3]

Definition:

The set of k events A1, A2, … , Ak are called pairwise independent if:

P[Ai ∩ Aj] = P[Ai] P[Aj] for all i and j.

i.e. for k = 3, A1, A2, A3 are pairwise independent if:

P[A1 ∩ A2] = P[A1] P[A2], P[A1 ∩ A3] = P[A1] P[A3],

P[A2 ∩ A3] = P[A2] P[A3],

It is not necessarily true that P[A1 ∩ A2 ∩ A3] = P[A1] P[A2] P[A3].
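A small sketch of that last point, using a standard textbook-style example that is not from these slides: toss two fair coins and let A1 = first coin is H, A2 = second coin is H, A3 = exactly one coin is H. The three events are pairwise independent but not mutually independent.

```python
from itertools import product
from fractions import Fraction

# Sample space of two fair coin tosses; each of the 4 outcomes has probability 1/4.
S = list(product("HT", repeat=2))

def prob(event):
    return Fraction(sum(1 for w in S if event(w)), len(S))

A1 = lambda w: w[0] == "H"                        # first toss is heads
A2 = lambda w: w[1] == "H"                        # second toss is heads
A3 = lambda w: (w[0] == "H") != (w[1] == "H")     # exactly one head

both = lambda E, F: (lambda w: E(w) and F(w))
all3 = lambda w: A1(w) and A2(w) and A3(w)

# Pairwise independence holds: each pairwise intersection has probability 1/4 = 1/2 * 1/2.
print(prob(both(A1, A2)), prob(A1) * prob(A2))    # 1/4  1/4
print(prob(both(A1, A3)), prob(A1) * prob(A3))    # 1/4  1/4
print(prob(both(A2, A3)), prob(A2) * prob(A3))    # 1/4  1/4

# Mutual independence fails: P[A1 ∩ A2 ∩ A3] = 0, not 1/8.
print(prob(all3), prob(A1) * prob(A2) * prob(A3)) # 0  1/8
```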

Bayes Rule for probability

$$P[A \mid B] = \frac{P[A]\,P[B \mid A]}{P[A]\,P[B \mid A] + P[\bar{A}]\,P[B \mid \bar{A}]}$$

A generalization of Bayes Rule

Let A1, A2, … , Ak denote a set of events such that

$$S = A_1 \cup A_2 \cup \cdots \cup A_k \quad \text{and} \quad A_i \cap A_j = \emptyset$$

for all i ≠ j. Then

$$P[A_i \mid B] = \frac{P[A_i]\,P[B \mid A_i]}{P[A_1]\,P[B \mid A_1] + P[A_2]\,P[B \mid A_2] + \cdots + P[A_k]\,P[B \mid A_k]}$$
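A minimal numeric sketch of the generalized Bayes rule; the prior probabilities and conditional probabilities below are made-up illustration values, not from the notes:

```python
# Partition S into A1, A2, A3 with prior probabilities P[Ai], and suppose we
# know P[B | Ai] for an observed event B (illustrative numbers only).
prior = [0.5, 0.3, 0.2]          # P[A1], P[A2], P[A3]  (they sum to 1)
likelihood = [0.02, 0.10, 0.40]  # P[B | A1], P[B | A2], P[B | A3]

# Denominator of Bayes rule: P[B] = sum_j P[Aj] P[B | Aj]
p_b = sum(p * l for p, l in zip(prior, likelihood))

# Posterior probabilities P[Ai | B] = P[Ai] P[B | Ai] / P[B]
posterior = [p * l / p_b for p, l in zip(prior, likelihood)]
print([round(q, 4) for q in posterior])   # e.g. [0.0833, 0.25, 0.6667]
print(sum(posterior))                     # 1.0 (up to rounding)
```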

Random Variables

an important concept in probability

A random variable, X, is a numerical quantity whose value is determined by a random experiment.

Definition – The probability function, p(x), of a random variable, X.

For any random variable, X, and any real number, x, we define

$$p(x) = P[X = x] = P[\{X = x\}]$$

where {X = x} = the set of all outcomes (event) with X = x.

For continuous random variables p(x) = 0 for all values of x.

Definition – The cumulative distribution function, F(x), of a random variable, X.

For any random variable, X, and any real number, x, we define

$$F(x) = P[X \le x] = P[\{X \le x\}]$$

where {X ≤ x} = the set of all outcomes (event) with X ≤ x.

Discrete Random Variables

For a discrete random variable X the probability distribution is described by the probability function p(x), which has the following properties

1. $0 \le p(x) \le 1$

2. $\sum_{x} p(x) = 1$

3. $P[a \le X \le b] = \sum_{a \le x \le b} p(x)$

Graph: Discrete Random Variable — the probability function p(x); P[a ≤ X ≤ b] is the sum of the heights p(x) over a ≤ x ≤ b.

Continuous random variables

For a continuous random variable X the probability distribution is described by the probability density function f(x), which has the following properties :

1. f(x) ≥ 0

2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$

3. $P[a \le X \le b] = \int_a^b f(x)\,dx$

Graph: Continuous Random Variable — the probability density function f(x); the total area under f(x) is 1, and P[a ≤ X ≤ b] is the area under f(x) between a and b.

The distribution function F(x)

This is defined for any random variable, X.

F(x) = P[X ≤ x]

Properties

1. F(-∞) = 0 and F(∞) = 1.

2. F(x) is non-decreasing (i. e. if x1 < x2 then F(x1) ≤ F(x2) )

3. F(b) – F(a) = P[a < X ≤ b].

4. p(x) = P[X = x] = F(x) – F(x⁻)

5. If p(x) = 0 for all x (i.e. X is continuous) then F(x) is continuous.

Here $F(x^-) = \lim_{u \to x^-} F(u)$.

6. For Discrete Random Variables

F(x) is a non-decreasing step function with

$$F(x) = P[X \le x] = \sum_{u \le x} p(u)$$

with a jump of p(x) = F(x) – F(x⁻) at each x, and F(–∞) = 0 and F(∞) = 1.

Graph: the distribution function F(x) (a step function) and the probability function p(x) for a discrete random variable.

7. For Continuous Random Variables

F(x) is a non-decreasing continuous function with

$$F(x) = P[X \le x] = \int_{-\infty}^{x} f(u)\,du \quad \text{and} \quad f(x) = F'(x)$$

F(–∞) = 0 and F(∞) = 1.

Graph: F(x) for a continuous random variable — f(x) is the slope of F(x).

To find the probability density function, f(x), one first finds F(x) and then f(x) = F′(x).
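A small symbolic sketch of this F(x) → f(x) step using sympy, assuming an exponential distribution function as the example (λ is treated as a positive symbol; these choices are illustrative, not from the notes):

```python
import sympy as sp

x, lam = sp.symbols("x lambda", positive=True)

# Distribution function of an exponential random variable (for x >= 0)
F = 1 - sp.exp(-lam * x)

# The density is the derivative of F
f = sp.diff(F, x)
print(f)                                 # lambda*exp(-lambda*x)

# Check that the density integrates to 1 over [0, oo)
print(sp.integrate(f, (x, 0, sp.oo)))    # 1
```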

Some Important Discrete distributions

The Bernoulli distribution

Suppose that we have an experiment that has two outcomes:

1. Success (S)
2. Failure (F)

These terms are used in reliability testing. Suppose that p is the probability of success (S) and q = 1 – p is the probability of failure (F). This experiment is sometimes called a Bernoulli trial.

Let X = 0 if the outcome is F and X = 1 if the outcome is S. Then

$$p(x) = P[X = x] = \begin{cases} q & x = 0 \\ p & x = 1 \end{cases}$$

The probability distribution with probability function

$$p(x) = P[X = x] = \begin{cases} q & x = 0 \\ p & x = 1 \end{cases}$$

is called the Bernoulli distribution.

Graph: the Bernoulli probability function — bars of height q = 1 – p at x = 0 and p at x = 1.

The Binomial distribution

We observe a Bernoulli trial (S, F) n times, where

1. p = the probability of success (S), and
2. q = 1 – p = the probability of failure (F).

Let X denote the number of successes in the n trials. Then X has a binomial distribution, i.e.

$$p(x) = P[X = x] = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n$$
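A short sketch evaluating the binomial probability function directly from the counting formula (math.comb); the values n = 10, p = 0.3 are arbitrary illustration choices:

```python
from math import comb

def binomial_pmf(x, n, p):
    """p(x) = C(n, x) p^x q^(n-x) for x = 0, 1, ..., n."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

n, p = 10, 0.3
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

print(round(pmf[3], 4))      # P[X = 3] ≈ 0.2668
print(round(sum(pmf), 6))    # the probabilities sum to 1
print(sum(x * px for x, px in zip(range(n + 1), pmf)))  # mean ≈ n*p = 3.0
```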

The Poisson distribution

• Suppose events are occurring randomly and uniformly in time.

• Let X be the number of events occurring in a fixed period of time. Then X will have a Poisson distribution with parameter λ.

$$p(x) = \frac{\lambda^x}{x!} e^{-\lambda}, \quad x = 0, 1, 2, 3, 4, \ldots$$

The Geometric distribution

Suppose a Bernoulli trial (S,F) is repeated until a success occurs.

X = the trial on which the first success (S) occurs.

The probability function of X is:

p(x) = P[X = x] = (1 – p)^{x–1} p = p q^{x–1}, x = 1, 2, 3, …

The Negative Binomial distribution

Suppose a Bernoulli trial (S,F) is repeated until k successes occur.

Let X = the trial on which the kth success (S) occurs.

The probability function of X is:

$$p(x) = P[X = x] = \binom{x-1}{k-1} p^k q^{x-k}, \quad x = k, k+1, k+2, \ldots$$

The Hypergeometric distribution

Suppose we have a population containing N objects.

Suppose the elements of the population are partitioned into two groups. Let a = the number of elements in group A and let b = the number of elements in the other group (group B). Note N = a + b.

Now suppose that n elements are selected from the population at random. Let X denote the number of the selected elements that come from group A.

The probability distribution of X is

$$p(x) = P[X = x] = \frac{\binom{a}{x}\binom{b}{n-x}}{\binom{N}{n}}$$
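The hypergeometric probabilities can likewise be computed straight from the combination counts; a sketch with made-up numbers (N = 20 objects, a = 8 in group A, n = 5 selected):

```python
from math import comb

def hypergeom_pmf(x, a, b, n):
    """p(x) = C(a, x) C(b, n-x) / C(a+b, n): x of the n selected come from group A."""
    return comb(a, x) * comb(b, n - x) / comb(a + b, n)

a, b, n = 8, 12, 5              # N = a + b = 20
pmf = [hypergeom_pmf(x, a, b, n) for x in range(min(a, n) + 1)]

print(round(pmf[2], 4))         # P[X = 2]
print(round(sum(pmf), 6))       # 1.0: the probabilities sum to one
```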

Continuous Distributions

Continuous random variables

For a continuous random variable X the probability distribution is described by the probability density function f(x), which has the following properties :

1. f(x) ≥ 0

2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$

3. $P[a \le X \le b] = \int_a^b f(x)\,dx$

Graph: Continuous Random Variable — the probability density function f(x); in the example plotted, f(x) has constant height 1/(b − a) between a and b, the total area under f(x) is 1, and P[a ≤ X ≤ b] is the area under f(x) between a and b.

Continuous Distributions

The Uniform distribution from a to b

$$f(x) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$

The Normal distribution (mean μ, standard deviation σ)

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$


The Exponential distribution

$$f(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$$

The Weibull distribution

A model for the lifetime of objects that do age.

The Weibull distribution with parameters α and β:

$$f(x) = \begin{cases} \alpha\beta\, x^{\beta-1} e^{-\alpha x^{\beta}} & x \ge 0 \\ 0 & x < 0 \end{cases}$$

Graph: the Weibull density f(x) for parameter values (0.5, 2), (0.7, 2) and (0.9, 2).

The Gamma distribution

An important family of distributions

The Gamma distribution

Let the continuous random variable X have density function:

$$f(x) = \begin{cases} \dfrac{\lambda^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$$

Then X is said to have a Gamma distribution with parameters α and λ.

Graph: the Gamma density for parameter values (2, 0.9), (2, 0.6) and (3, 0.6).

Comments

1. The set of gamma distributions is a family of distributions (parameterized by α and λ).

2. Contained within this family are other distributions:

a. The Exponential distribution – in the case α = 1, the gamma distribution becomes the exponential distribution with parameter λ. The exponential distribution arises if we are measuring the lifetime, X, of an object that does not age. It is also used as a distribution for waiting times between events occurring uniformly in time.

b. The Chi-square distribution – in the case α = ν/2 and λ = ½, the gamma distribution becomes the chi-square (χ²) distribution with ν degrees of freedom. Later we will see that sums of squares of independent standard normal variates have a chi-square distribution, with degrees of freedom = the number of independent terms in the sum of squares.
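A small numerical sketch of comment 2 above: with α = 1 the gamma density coincides with the exponential density, and with α = ν/2, λ = ½ it coincides with the chi-square (ν d.f.) density. Plain Python, evaluated at a few arbitrary x values:

```python
from math import exp, gamma as Gamma

def gamma_pdf(x, alpha, lam):
    """Gamma density: lam^alpha / Gamma(alpha) * x^(alpha-1) * exp(-lam*x), x >= 0."""
    return lam**alpha / Gamma(alpha) * x**(alpha - 1) * exp(-lam * x)

def exponential_pdf(x, lam):
    return lam * exp(-lam * x)

def chisq_pdf(x, nu):
    """Chi-square density with nu d.f. = gamma density with alpha = nu/2, lam = 1/2."""
    return 0.5**(nu / 2) / Gamma(nu / 2) * x**(nu / 2 - 1) * exp(-x / 2)

for x in (0.5, 1.0, 2.5):
    # alpha = 1 reduces the gamma density to the exponential density (lam = 0.7 here)
    print(gamma_pdf(x, 1, 0.7), exponential_pdf(x, 0.7))
    # alpha = nu/2, lam = 1/2 reduces it to the chi-square density (nu = 4 here)
    print(gamma_pdf(x, 2, 0.5), chisq_pdf(x, 4))
```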

Expectation

Let X denote a discrete random variable with probability function p(x) (probability density function f(x) if X is continuous) then the expected value of X, E(X) is defined to be:

$$E[X] = \sum_{i} x_i\, p(x_i) = \sum_{x} x\, p(x)$$

and if X is continuous with probability density function f(x)

$$E[X] = \int_{-\infty}^{\infty} x\, f(x)\,dx$$

Expectation of functions

Let X denote a discrete random variable with probability function p(x); then the expected value of g(X), E[g(X)], is defined to be:

$$E[g(X)] = \sum_{x} g(x)\, p(x)$$

and if X is continuous with probability density function f(x)

$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f(x)\,dx$$

Moments of a Random Variable

the kth moment of X :

$$\mu_k = E[X^k] = \begin{cases} \sum_{x} x^k\, p(x) & \text{if } X \text{ is discrete} \\ \int_{-\infty}^{\infty} x^k f(x)\,dx & \text{if } X \text{ is continuous} \end{cases}$$

• The first moment of X, μ = μ₁ = E(X), is the center of gravity of the distribution of X.

• The higher moments give different information regarding the distribution of X.

the kth central moment of X

$$\mu_k^0 = E[(X - \mu)^k] = \begin{cases} \sum_{x} (x - \mu)^k\, p(x) & \text{if } X \text{ is discrete} \\ \int_{-\infty}^{\infty} (x - \mu)^k f(x)\,dx & \text{if } X \text{ is continuous} \end{cases}$$

Moment generating functions

Definition

Let X denote a random variable. Then the moment generating function of X, mX(t), is defined by:

$$m_X(t) = E[e^{tX}] = \begin{cases} \sum_{x} e^{tx}\, p(x) & \text{if } X \text{ is discrete} \\ \int_{-\infty}^{\infty} e^{tx} f(x)\,dx & \text{if } X \text{ is continuous} \end{cases}$$

Properties

1. mX(0) = 1

2. $m_X^{(k)}(0)$ = kth derivative of $m_X(t)$ at $t = 0$ $= \mu_k = E[X^k]$, where

$$\mu_k = E[X^k] = \begin{cases} \int_{-\infty}^{\infty} x^k f(x)\,dx & X \text{ continuous} \\ \sum_{x} x^k\, p(x) & X \text{ discrete} \end{cases}$$

3. $$m_X(t) = 1 + \mu_1 t + \mu_2 \frac{t^2}{2!} + \mu_3 \frac{t^3}{3!} + \cdots + \mu_k \frac{t^k}{k!} + \cdots$$

4. Let X be a random variable with moment generating function mX(t). Let Y = bX + a

Then $m_Y(t) = m_{bX+a}(t) = E\!\left(e^{(bX+a)t}\right) = e^{at}\,E\!\left(e^{X(bt)}\right) = e^{at}\, m_X(bt)$

5. Let X and Y be two independent random variables with moment generating function mX(t) and mY(t) .

Then $m_{X+Y}(t) = E\!\left(e^{(X+Y)t}\right) = E\!\left(e^{Xt} e^{Yt}\right) = E\!\left(e^{Xt}\right) E\!\left(e^{Yt}\right) = m_X(t)\, m_Y(t)$

6. Let X and Y be two random variables with moment generating function mX(t) and mY(t) and two distribution functions FX(x) and FY(y) respectively.

If mX(t) = mY(t), then FX(x) = FY(x).

This ensures that the distribution of a random variable can be identified by its moment generating function.
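A symbolic sketch of property 2 (moments are derivatives of the MGF at t = 0), using sympy and the exponential MGF λ/(λ − t) listed in the tables below; the distribution choice is just an example:

```python
import sympy as sp

t, lam = sp.symbols("t lambda", positive=True)

# MGF of the exponential distribution: m(t) = lambda / (lambda - t), for t < lambda
m = lam / (lam - t)

# k-th moment = k-th derivative of m(t) evaluated at t = 0
mu1 = sp.diff(m, t, 1).subs(t, 0)     # E[X]   = 1/lambda
mu2 = sp.diff(m, t, 2).subs(t, 0)     # E[X^2] = 2/lambda^2

print(sp.simplify(mu1))               # 1/lambda
print(sp.simplify(mu2 - mu1**2))      # variance = 1/lambda^2
```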

M. G. F.’s - Continuous distributions

Name — moment generating function mX(t)

• Continuous Uniform: (e^{bt} − e^{at}) / ((b − a)t)
• Exponential: λ/(λ − t), for t < λ
• Gamma: (λ/(λ − t))^α, for t < λ
• χ² with ν d.f.: (1 − 2t)^{−ν/2}, for t < ½
• Normal: e^{μt + (1/2)σ²t²}

M. G. F.’s - Discrete distributions

Name — moment generating function mX(t)

• Discrete Uniform: e^t(e^{Nt} − 1) / (N(e^t − 1))
• Bernoulli: q + pe^t
• Binomial: (q + pe^t)^N
• Geometric: pe^t / (1 − qe^t)
• Negative Binomial: [pe^t / (1 − qe^t)]^k
• Poisson: e^{λ(e^t − 1)}

Note:

The distribution of a random variable X can be described by:

1. probability function p(x) if X is discrete; probability density function f(x) if X is continuous

2. Distribution function:

$$F(x) = \begin{cases} \sum_{u \le x} p(u) & \text{if } X \text{ is discrete} \\ \int_{-\infty}^{x} f(u)\,du & \text{if } X \text{ is continuous} \end{cases}$$

3. Moment generating function:

$$m_X(t) = E[e^{tX}] = \begin{cases} \sum_{x} e^{tx}\, p(x) & \text{if } X \text{ is discrete} \\ \int_{-\infty}^{\infty} e^{tx} f(x)\,dx & \text{if } X \text{ is continuous} \end{cases}$$

Summary of Discrete Distributions

Name: probability function p(x); Mean; Variance; moment generating function mX(t)

• Discrete Uniform: p(x) = 1/N, x = 1, 2, …, N; mean = (N + 1)/2; variance = (N² − 1)/12; mX(t) = e^t(e^{Nt} − 1) / (N(e^t − 1))

• Bernoulli: p(x) = p for x = 1, q for x = 0; mean = p; variance = pq; mX(t) = q + pe^t

• Binomial: p(x) = C(N, x) p^x q^{N−x}, x = 0, 1, …, N; mean = Np; variance = Npq; mX(t) = (q + pe^t)^N

• Geometric: p(x) = p q^{x−1}, x = 1, 2, …; mean = 1/p; variance = q/p²; mX(t) = pe^t / (1 − qe^t)

• Negative Binomial: p(x) = C(x − 1, k − 1) p^k q^{x−k}, x = k, k + 1, …; mean = k/p; variance = kq/p²; mX(t) = [pe^t / (1 − qe^t)]^k

• Poisson: p(x) = λ^x e^{−λ} / x!, x = 0, 1, 2, …; mean = λ; variance = λ; mX(t) = e^{λ(e^t − 1)}

• Hypergeometric: p(x) = C(A, x) C(N − A, n − x) / C(N, n); mean = n(A/N); variance = n(A/N)(1 − A/N)(N − n)/(N − 1); mX(t) not useful

Summary of Continuous Distributions

Name: probability density function f(x); Mean; Variance; moment generating function mX(t)

• Continuous Uniform: f(x) = 1/(b − a) for a ≤ x ≤ b, 0 otherwise; mean = (a + b)/2; variance = (b − a)²/12; mX(t) = (e^{bt} − e^{at}) / ((b − a)t)

• Exponential: f(x) = λe^{−λx} for x ≥ 0, 0 for x < 0; mean = 1/λ; variance = 1/λ²; mX(t) = λ/(λ − t), for t < λ

• Gamma: f(x) = (λ^α / Γ(α)) x^{α−1} e^{−λx} for x ≥ 0, 0 for x < 0; mean = α/λ; variance = α/λ²; mX(t) = (λ/(λ − t))^α, for t < λ

• χ² with ν d.f.: f(x) = ((1/2)^{ν/2} / Γ(ν/2)) x^{ν/2−1} e^{−x/2} for x ≥ 0, 0 for x < 0; mean = ν; variance = 2ν; mX(t) = (1 − 2t)^{−ν/2}, for t < ½

• Normal: f(x) = (1/(√(2π) σ)) e^{−(x−μ)²/(2σ²)}; mean = μ; variance = σ²; mX(t) = e^{μt + (1/2)σ²t²}

• Weibull: f(x) = αβ x^{β−1} e^{−αx^β} for x ≥ 0, 0 for x < 0; mean = Γ(1/β + 1)/α^{1/β}; variance = [Γ(2/β + 1) − Γ(1/β + 1)²]/α^{2/β}; mX(t) not available

Jointly distributed Random variables

Multivariate distributions

Discrete Random Variables

The joint probability function;

p(x,y) = P[X = x, Y = y]

1. $0 \le p(x, y) \le 1$

2. $\sum_{x}\sum_{y} p(x, y) = 1$

3. $P[(X, Y) \in A] = \sum_{(x, y) \in A} p(x, y)$

Continuous Random Variables

Definition: Two random variables are said to have joint probability density function f(x, y) if

1. $0 \le f(x, y)$

2. $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$

3. $P[(X, Y) \in A] = \iint_{A} f(x, y)\,dx\,dy$

Marginal and conditional distributions

Marginal Distributions (Discrete case):

Let X and Y denote two random variables with joint probability function p(x,y) then

the marginal density of X is

$$p_X(x) = \sum_{y} p(x, y)$$

the marginal density of Y is

$$p_Y(y) = \sum_{x} p(x, y)$$

Marginal Distributions (Continuous case):

Let X and Y denote two random variables with joint probability density function f(x,y) then

the marginal density of X is

$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$$

the marginal density of Y is

$$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$$

Conditional Distributions (Discrete Case):

Let X and Y denote two random variables with joint probability function p(x,y) and marginal probability functions pX(x), pY(y) then

the conditional density of Y given X = x

$$p_{Y|X}(y \mid x) = \frac{p(x, y)}{p_X(x)}$$

conditional density of X given Y = y

$$p_{X|Y}(x \mid y) = \frac{p(x, y)}{p_Y(y)}$$

Conditional Distributions (Continuous Case):

Let X and Y denote two random variables with joint probability density function f(x,y) and marginal densities fX(x), fY(y) then

the conditional density of Y given X = x

$$f_{Y|X}(y \mid x) = \frac{f(x, y)}{f_X(x)}$$

conditional density of X given Y = y

$$f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)}$$

The bivariate Normal distribution

$$f(x_1, x_2) = \frac{1}{2\pi\,\sigma_1\sigma_2\sqrt{1-\rho^2}}\, e^{-\frac{1}{2}Q(x_1, x_2)}$$

where

$$Q(x_1, x_2) = \frac{1}{1-\rho^2}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right) + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right]$$

This distribution is called the bivariate Normal distribution.

The parameters are μ₁, μ₂, σ₁, σ₂ and ρ.

Surface Plots of the bivariate Normal distribution

Marginal distributions

1. The marginal distribution of x1 is Normal with mean μ₁ and standard deviation σ₁.

2. The marginal distribution of x2 is Normal with mean μ₂ and standard deviation σ₂.

Conditional distributions

1. The conditional distribution of x1 given x2 is Normal with:

mean $\mu_{1|2} = \mu_1 + \rho\dfrac{\sigma_1}{\sigma_2}(x_2 - \mu_2)$ and standard deviation $\sigma_{1|2} = \sigma_1\sqrt{1-\rho^2}$

2. The conditional distribution of x2 given x1 is Normal with:

mean $\mu_{2|1} = \mu_2 + \rho\dfrac{\sigma_2}{\sigma_1}(x_1 - \mu_1)$ and standard deviation $\sigma_{2|1} = \sigma_2\sqrt{1-\rho^2}$

Independence

Two random variables X and Y are defined to be independent if

Definition:

$p(x, y) = p_X(x)\, p_Y(y)$ if X and Y are discrete

$f(x, y) = f_X(x)\, f_Y(y)$ if X and Y are continuous

multivariate distributions

k ≥ 2

Definition

Let X1, X2, …, Xn denote n discrete random variables, then

p(x1, x2, …, xn )

is the joint probability function of X1, X2, …, Xn if

1. $0 \le p(x_1, \ldots, x_n) \le 1$

2. $\sum_{x_1} \cdots \sum_{x_n} p(x_1, \ldots, x_n) = 1$

3. $P[(X_1, \ldots, X_n) \in A] = \sum_{(x_1, \ldots, x_n) \in A} p(x_1, \ldots, x_n)$

Definition

Let X1, X2, …, Xk denote k continuous random variables, then

f(x1, x2, …, xk )

is the joint density function of X1, X2, …, Xk if

1. $f(x_1, \ldots, x_k) \ge 0$

2. $\int \cdots \int f(x_1, \ldots, x_k)\,dx_1 \cdots dx_k = 1$

3. $P[(X_1, \ldots, X_k) \in A] = \int \cdots \int_{A} f(x_1, \ldots, x_k)\,dx_1 \cdots dx_k$

The Multinomial distribution

Suppose that we observe an experiment that has k possible outcomes {O1, O2, …, Ok } independently n times.

Let p1, p2, …, pk denote probabilities of O1, O2, …, Ok respectively.

Let Xi denote the number of times that outcome Oi occurs in the n repetitions of the experiment.

The joint probability function of X1, X2, …, Xk is

$$p(x_1, x_2, \ldots, x_k) = \frac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$

This distribution is called the Multinomial distribution.

The Multivariate Normal distribution

Recall the univariate normal distribution

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

and the bivariate normal distribution

$$f(x, y) = \frac{1}{2\pi\,\sigma_x\sigma_y\sqrt{1-\rho^2}}\, e^{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_x}{\sigma_x}\right)^2 - 2\rho\left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right) + \left(\frac{y-\mu_y}{\sigma_y}\right)^2\right]}$$

The k-variate Normal distribution

$$f(x_1, \ldots, x_k) = f(\mathbf{x}) = \frac{1}{(2\pi)^{k/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})}$$

where

$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{pmatrix}, \quad \boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_k \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1k} \\ \sigma_{12} & \sigma_{22} & \cdots & \sigma_{2k} \\ \vdots & \vdots & & \vdots \\ \sigma_{1k} & \sigma_{2k} & \cdots & \sigma_{kk} \end{pmatrix}$$

Marginal distributions

Definition

Let X1, X2, …, Xq, Xq+1 …, Xk denote k discrete random variables with joint probability function

p(x1, x2, …, xq, xq+1 …, xk )

then the marginal joint probability function of X1, X2, …, Xq is

$$p_{12 \cdots q}(x_1, \ldots, x_q) = \sum_{x_{q+1}} \cdots \sum_{x_k} p(x_1, \ldots, x_k)$$

Definition

Let X1, X2, …, Xq, Xq+1 …, Xk denote k continuous random variables with joint probability density function

f(x1, x2, …, xq, xq+1 …, xk )

then the marginal joint density function of X1, X2, …, Xq is

$$f_{12 \cdots q}(x_1, \ldots, x_q) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_k)\,dx_{q+1} \cdots dx_k$$

Conditional distributions

Definition

Let X1, X2, …, Xq, Xq+1 …, Xk denote k discrete random variables with joint probability function

p(x1, x2, …, xq, xq+1 …, xk )

then the conditional joint probability function of X1, X2, …, Xq given Xq+1 = xq+1, …, Xk = xk is

$$p_{1 \cdots q \mid q+1 \cdots k}(x_1, \ldots, x_q \mid x_{q+1}, \ldots, x_k) = \frac{p(x_1, \ldots, x_k)}{p_{q+1 \cdots k}(x_{q+1}, \ldots, x_k)}$$

Definition

Let X1, X2, …, Xq, Xq+1 …, Xk denote k continuous random variables with joint probability density function

f(x1, x2, …, xq, xq+1 …, xk )

then the conditional joint probability function

of X1, X2, …, Xq given Xq+1 = xq+1 , …, Xk = xk is

$$f_{1 \cdots q \mid q+1 \cdots k}(x_1, \ldots, x_q \mid x_{q+1}, \ldots, x_k) = \frac{f(x_1, \ldots, x_k)}{f_{q+1 \cdots k}(x_{q+1}, \ldots, x_k)}$$

Definition

Let X1, X2, …, Xq, Xq+1 …, Xk denote k continuous random variables with joint probability density function

f(x1, x2, …, xq, xq+1 …, xk )

then the variables X1, X2, …, Xq are independent of Xq+1, …, Xk if

Definition – Independence of sets of vectors

$$f(x_1, \ldots, x_k) = f_{1 \cdots q}(x_1, \ldots, x_q)\; f_{q+1 \cdots k}(x_{q+1}, \ldots, x_k)$$

A similar definition for discrete random variables.

Definition

Let X1, X2, …, Xk denote k continuous random variables with joint probability density function

f(x1, x2, …, xk )

then the variables X1, X2, …, Xk are called mutually independent if

Definition – Mutual Independence

$$f(x_1, \ldots, x_k) = f_1(x_1)\, f_2(x_2) \cdots f_k(x_k)$$

A similar definition for discrete random variables.

Expectation

for multivariate distributions

Definition

Let X1, X2, …, Xn denote n jointly distributed random variables with joint density function

f(x1, x2, …, xn )

then

$$E[g(X_1, \ldots, X_n)] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, \ldots, x_n)\, f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n$$

Some Rules for Expectation

1. $E[X_i] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_i\, f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n = \int_{-\infty}^{\infty} x_i\, f_i(x_i)\,dx_i$

Thus you can calculate E[Xi] either from the joint distribution of

X1, … , Xn or the marginal distribution of Xi.

2. $E[a_1 X_1 + \cdots + a_n X_n + b] = a_1 E[X_1] + \cdots + a_n E[X_n] + b$

The Linearity property

3. (The Multiplicative property) Suppose X1, … , Xq are independent of Xq+1, … , Xk; then

$$E[g(X_1, \ldots, X_q)\, h(X_{q+1}, \ldots, X_k)] = E[g(X_1, \ldots, X_q)]\; E[h(X_{q+1}, \ldots, X_k)]$$

In the simple case when k = 2,

$$E[XY] = E[X]\,E[Y]$$

if X and Y are independent.

Some Rules for Variance

$$\text{Var}(X) = \sigma_X^2 = E\big[(X - \mu_X)^2\big]$$

Tchebychev's inequality

$$P[|X - \mu| < k\sigma] \ge 1 - \frac{1}{k^2}$$

Ex: $P[|X - \mu| < 2\sigma] \ge \frac{3}{4}$, $P[|X - \mu| < 3\sigma] \ge \frac{8}{9}$, $P[|X - \mu| < 4\sigma] \ge \frac{15}{16}$
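A quick Monte Carlo sketch (numpy, with an arbitrary exponential sample) checking that the observed proportions respect Tchebychev's bounds:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)   # any distribution works; here mean = sd = 2

mu, sigma = x.mean(), x.std()
for k in (2, 3, 4):
    observed = np.mean(np.abs(x - mu) < k * sigma)   # observed P[|X - mu| < k*sigma]
    bound = 1 - 1 / k**2                              # Tchebychev lower bound
    print(k, round(observed, 4), ">=", bound)
```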

1. Var(X ± Y) = Var(X) + Var(Y) ± 2 Cov(X, Y)

where Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)]

Note: If X and Y are independent, then Cov(X, Y) = 0 and Var(X ± Y) = Var(X) + Var(Y).

The correlation coefficient ρ_XY

$$\rho_{XY} = \frac{\text{Cov}(X, Y)}{\sqrt{\text{Var}(X)\,\text{Var}(Y)}} = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

Properties:

1. If X and Y are independent, then ρ_XY = 0.

2. −1 ≤ ρ_XY ≤ 1, and ρ_XY = ±1 if there exist a and b such that P[Y = bX + a] = 1, where ρ_XY = +1 if b > 0 and ρ_XY = −1 if b < 0.

Some other properties of variance

2. Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y)

3. $$\text{Var}(a_1 X_1 + \cdots + a_n X_n) = \sum_{i=1}^{n} a_i^2\,\text{Var}(X_i) + 2\sum_{i < j} a_i a_j\,\text{Cov}(X_i, X_j)$$

4. Variance: Multiplicative Rule for independent random variables

Suppose that X and Y are independent random variables, then:

$$\text{Var}(XY) = \text{Var}(X)\,\text{Var}(Y) + \mu_X^2\,\text{Var}(Y) + \mu_Y^2\,\text{Var}(X)$$

Mean and Variance of averages

Let X1, … , Xn be n mutually independent random variables each having mean μ and standard deviation σ (variance σ²), and let

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Then

$$\mu_{\bar{X}} = E[\bar{X}] = \mu \quad \text{and} \quad \sigma_{\bar{X}}^2 = \text{Var}(\bar{X}) = \frac{\sigma^2}{n}$$

The Law of Large Numbers

Let X1, … , Xn be n mutually independent random variables each having mean μ, and let

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Then for any ε > 0 (no matter how small)

$$P[|\bar{X} - \mu| < \varepsilon] = P[\mu - \varepsilon < \bar{X} < \mu + \varepsilon] \to 1 \text{ as } n \to \infty$$
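A short simulation sketch of the Law of Large Numbers (numpy; a fair-die population with mean μ = 3.5, chosen only as an illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = 3.5                                    # population mean of a fair die

for n in (10, 100, 10_000, 1_000_000):
    x = rng.integers(1, 7, size=n)          # n independent die rolls
    xbar = x.mean()
    # the sample mean gets (with high probability) ever closer to mu as n grows
    print(n, round(xbar, 4), round(abs(xbar - mu), 4))
```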

Conditional Expectation:

Let X1, X2, …, Xq, Xq+1 …, Xk denote k continuous random variables with joint probability density function

f(x1, x2, …, xq, xq+1 …, xk )

then the conditional joint probability function

of X1, X2, …, Xq given Xq+1 = xq+1 , …, Xk = xk is

Definition

$$f_{1 \cdots q \mid q+1 \cdots k}(x_1, \ldots, x_q \mid x_{q+1}, \ldots, x_k) = \frac{f(x_1, \ldots, x_k)}{f_{q+1 \cdots k}(x_{q+1}, \ldots, x_k)}$$

Let U = h( X1, X2, …, Xq, Xq+1 …, Xk )

then the Conditional Expectation of U

given Xq+1 = xq+1 , …, Xk = xk is

Definition

$$E[U \mid x_{q+1}, \ldots, x_k] = \int \cdots \int h(x_1, \ldots, x_k)\, f_{1 \cdots q \mid q+1 \cdots k}(x_1, \ldots, x_q \mid x_{q+1}, \ldots, x_k)\,dx_1 \cdots dx_q$$

Note this will be a function of xq+1 , …, xk.

A very useful rule

Let (x1, x2, … , xq, y1, y2, … , ym) = (x, y) denote q + m random variables and let U = g(x1, … , xq, y1, … , ym) = g(x, y). Then

$$E[U] = E_{\mathbf{y}}\big[E[U \mid \mathbf{y}]\big]$$

$$\text{Var}(U) = E_{\mathbf{y}}\big[\text{Var}(U \mid \mathbf{y})\big] + \text{Var}_{\mathbf{y}}\big(E[U \mid \mathbf{y}]\big)$$

Functions of Random Variables

Methods for determining the distribution of functions of Random Variables

1. Distribution function method

2. Moment generating function method

3. Transformation method

Distribution function method

Let X, Y, Z, … have joint density f(x, y, z, …) and let W = h(X, Y, Z, …).

First step: Find the distribution function of W: G(w) = P[W ≤ w] = P[h(X, Y, Z, …) ≤ w]

Second step: Find the density function of W: g(w) = G′(w).

Use of moment generating functions

1. Using the moment generating functions of X, Y, Z, …determine the moment generating function of W = h(X, Y, Z, …).

2. Identify the distribution of W from its moment generating function

This procedure works well for sums, linear combinations, averages etc.

Let x1, x2, … denote a sequence of independent random variables

Sums: Let S = x1 + x2 + … + xn; then

$$m_S(t) = m_{x_1}(t)\, m_{x_2}(t) \cdots m_{x_n}(t)$$

Linear Combinations: Let L = a1x1 + a2x2 + … + anxn; then

$$m_L(t) = m_{x_1}(a_1 t)\, m_{x_2}(a_2 t) \cdots m_{x_n}(a_n t)$$

Arithmetic Means: Let x1, x2, … denote a sequence of independent random variables coming from a distribution with moment generating function m(t), and let $\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$; then

$$m_{\bar{x}}(t) = m_{x_1}\!\left(\tfrac{t}{n}\right) m_{x_2}\!\left(\tfrac{t}{n}\right) \cdots m_{x_n}\!\left(\tfrac{t}{n}\right) = \left[m\!\left(\tfrac{t}{n}\right)\right]^n$$

The Transformation Method

Theorem

Let X denote a random variable with probability density function f(x) and U = h(X).

Assume that h(x) is either strictly increasing (or decreasing) then the probability density of U is:

$$g(u) = f\big(h^{-1}(u)\big)\left|\frac{dh^{-1}(u)}{du}\right| = f(x)\left|\frac{dx}{du}\right|$$
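A symbolic sketch of the one-variable transformation theorem using sympy, assuming X ~ Uniform(0, 1) and U = −ln(X)/λ (an illustrative choice); the resulting density should be the exponential density λe^{−λu}:

```python
import sympy as sp

u, lam = sp.symbols("u lambda", positive=True)

# X ~ Uniform(0, 1): f(x) = 1 on (0, 1).  U = h(X) = -ln(X)/lambda is strictly decreasing,
# with inverse x = h^{-1}(u) = exp(-lambda*u).
x_of_u = sp.exp(-lam * u)

f_x = 1                                     # uniform density on (0, 1)
g_u = f_x * sp.Abs(sp.diff(x_of_u, u))      # g(u) = f(h^{-1}(u)) |dx/du|

print(sp.simplify(g_u))                     # lambda*exp(-lambda*u), i.e. Exponential(lambda)
```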

The Transformation Method (many variables)

Theorem

Let x1, x2,…, xn denote random variables with joint probability density function

f(x1, x2,…, xn )

Let u1 = h1(x1, x2,…, xn),
u2 = h2(x1, x2,…, xn),
…

un = hn(x1, x2,…, xn).

define an invertible transformation from the x’s to the u’s

Then the joint probability density function of u1, u2,…, un is given by:

$$g(u_1, \ldots, u_n) = f(x_1, \ldots, x_n)\left|\frac{d(x_1, \ldots, x_n)}{d(u_1, \ldots, u_n)}\right| = f(x_1, \ldots, x_n)\,|J|$$

where

$$J = \frac{d(x_1, \ldots, x_n)}{d(u_1, \ldots, u_n)} = \det\begin{pmatrix} \dfrac{dx_1}{du_1} & \cdots & \dfrac{dx_1}{du_n} \\ \vdots & & \vdots \\ \dfrac{dx_n}{du_1} & \cdots & \dfrac{dx_n}{du_n} \end{pmatrix} = \text{the Jacobian of the transformation}$$

Some important results

Distribution of functions of random variables

The method used to derive these results will be indicated by:

1. DF - Distribution Function Method.

2. MGF - Moment generating function method

3. TF - Transformation method

Student’s t distribution

Let Z and U be two independent random variables with:

1. Z having a Standard Normal distribution

and

2. U having a χ² distribution with ν degrees of freedom,

then the distribution of

$$t = \frac{Z}{\sqrt{U/\nu}}$$

is:

$$g(t) = K\left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}} \quad \text{where} \quad K = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)}$$

DF

The Chi-square distribution

Let Z1, Z2, … , Zv be v independent random variables having a Standard Normal distribution, then

then

$$U = Z_1^2 + Z_2^2 + \cdots + Z_\nu^2 = \sum_{i=1}^{\nu} Z_i^2$$

has a χ² distribution with ν degrees of freedom.

(DF for ν = 1, MGF for ν > 1)

Distribution of the sample mean

Let x1, x2, …, xn denote a sample from the normal distribution with mean μ and variance σ². Then

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

has a Normal distribution with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.

MGF

The Central Limit theorem

If x1, x2, …, xn is a sample from a distribution with mean μ and standard deviation σ, then if n is large the sample mean x̄ has approximately a normal distribution with mean μ, variance $\sigma_{\bar{x}}^2 = \dfrac{\sigma^2}{n}$ and standard deviation $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.

MGF
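A brief simulation sketch of the Central Limit theorem (numpy): sample means of n skewed exponential observations are compared against the normal approximation with mean μ and standard deviation σ/√n. The sample sizes and distribution are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, reps = 1.0, 1.0, 50, 100_000   # Exponential(1): mu = sigma = 1

# reps independent sample means, each based on n observations
xbar = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)

print(round(xbar.mean(), 4))                        # close to mu = 1
print(round(xbar.std(), 4), sigma / np.sqrt(n))     # close to sigma/sqrt(n) ≈ 0.1414

# P[xbar <= mu + sigma/sqrt(n)] should be near the standard normal value Phi(1) ≈ 0.84
print(round(np.mean(xbar <= mu + sigma / np.sqrt(n)), 4))
```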

Distribution of sums of Gamma R. V.’s

Let X1, X2, … , Xn denote n independent random variables each having a gamma distribution with parameters

(λ, αi), i = 1, 2, …, n.

Then W = X1 + X2 + … + Xn has a gamma distribution with

parameters (λ, α1 + α2 + … + αn).

Distribution of a multiple of a Gamma R. V.

Suppose that X is a random variable having a gamma distribution with parameters (λ, α).

Then W = aX has a gamma distribution with parameters (λ/a, α). MGF

MGF

Distribution of sums of Binomial R. V.’s

Let X1, X2, … , Xk denote k independent random variables each having a binomial distribution with parameters

(p,ni), i = 1, 2, …, k.

Then W = X1 + X2 + … + Xk has a binomial distribution with

parameters (p, n1 + n2 +… + nk).

Distribution of sums of Negative Binomial R. V.’s

Let X1, X2, … , Xn denote n independent random variables each having a negative binomial distribution with parameters

(p,ki), i = 1, 2, …, n.

Then W = X1 + X2 + … + Xn has a negative binomial distribution

with parameters (p, k1 + k2 +… + kn). MGF

MGF

Beyond Stats 241

Courses that can be taken

after Stats 241

Statistics

What is Statistics?

It is the major mathematical tool of scientific inference – methods for drawing conclusions from data.

Data that is to some extent corrupted by some component of random variation (random noise)

In both Statistics and Probability theory we are concerned with studying random phenomena

In probability theory

The model is known and we are interested in predicting the outcomes and observations of the phenomena.

model → outcomes and observations

In statistics

The model is unknown

the outcomes and observations of the phenomena have been observed.

We are interested in determining the model from the observations

outcomes and observations → model

Example - Probability

A coin is tossed n = 100 times

We are interested in the observation, X, the number of times the coin is a head.

Assuming the coin is balanced (i.e. p = the probability of a head = ½.)

$$p(x) = P[X = x] = \binom{100}{x}\left(\tfrac{1}{2}\right)^x \left(\tfrac{1}{2}\right)^{100-x} = \binom{100}{x}\left(\tfrac{1}{2}\right)^{100} \quad \text{for } x = 0, 1, \ldots, 100$$

Example - Statistics

We are interested in the success rate, p, of a new surgical procedure.

The procedure is performed n = 100 times.

X, the number of successful times the procedure is performed is 82.

The success rate p is unknown.

If the success rate p was known.

Then

$$p(x) = P[X = x] = \binom{100}{x} p^x (1-p)^{100-x}$$

This equation allows us to predict the value of the observation, X.

In the case when the success rate p was unknown.

Then the following equation is still true for the success rate p:

$$p(x) = P[X = x] = \binom{100}{x} p^x (1-p)^{100-x}$$

We will want to use the value of the observation, X = 82 to make a decision regarding the value of p.

Introductory Statistics Courses

Non-calculus based: Stats 244.3, Stats 245.3

Calculus based: Stats 242.3

Stats 244.3 Statistical concepts and techniques including graphing of distributions, measures of location and variability, measures of association, regression, probability, confidence intervals, hypothesis testing. Students should consult with their department before enrolling in this course to determine the status of this course in their program. Prerequisite(s): A course in a social science or Mathematics A30.

Stats 245.3 An introduction to basic statistical methods including frequency distributions, elementary probability, confidence intervals and tests of significance, analysis of variance, regression and correlation, contingency tables, goodness of fit. Prerequisite(s):

MATH 100, 101, 102, 110 or STAT 103. 

Stats 242.3 Sampling theory, estimation, confidence intervals, testing hypotheses, goodness of fit, analysis of variance, regression and correlation.

Prerequisite(s):MATH 110, 116 and STAT 241. 

Stats 244 and 245

• do not require a calculus prerequisite

• are Recipe courses

Stats 242

• does require calculus and probability (Stats 241) as a prerequisite

• More theoretical class – You learn techniques for developing statistical procedures and thoroughly investigating the properties of these procedures

Statistics Courses beyond Stats 242.3

STAT 341.3

Probability and Stochastic Processes 1/2(3L-1P) Prerequisite(s): STAT 241.

Random variables and their distributions; independence; moments and moment generating functions; conditional probability; Markov chains; stationary time-series.

STAT 342.3 Mathematical Statistics 1(3L-1P) Prerequisite(s): MATH 225 or 276; STAT 241 and 242.

Probability spaces; conditional probability and independence; discrete and continuous random variables; standard probability models; expectations; moment generating functions; sums and functions of random variables; sampling distributions; asymptotic distributions. Deals with basic probability concepts at a moderately rigorous level. Note: Students with credit for STAT 340 may not take this course for credit.

STAT 344.3 Applied Regression Analysis 1/2(3L-1P) Prerequisite(s): STAT 242 or 245 or 246 or a comparable course in statistics.

Applied regression analysis involving the extensive use of computer software. Includes: linear regression; multiple regression; stepwise methods; residual analysis; robustness considerations; multicollinearity; biased procedures; non-linear regression. Note: Students with credit for ECON 404 may not take this course for credit. Students with credit for STAT 344 will receive only half credit for ECON 404.

STAT 345.3 Design and Analysis of Experiments 1/2(3L-1P) Prerequisite(s): STAT 242 or 245 or 246 or a comparable course in statistics.

An introduction to the principles of experimental design and analysis of variance. Includes: randomization, blocking, factorial experiments, confounding, random effects, analysis of covariance. Emphasis will be on fundamental principles and data analysis techniques rather than on mathematical theory.

STAT 346.3 Multivariate Analysis 1/2(3L-1P) Prerequisite(s): MATH 266, STAT 241, and 344 or 345.

The multivariate normal distribution, multivariate analysis of variance, discriminant analysis, classification procedures, multiple covariance analysis, factor analysis, computer applications.

STAT 347.3 Non Parametric Methods 1/2(3L-1P) Prerequisite(s): STAT 242 or 245 or 246 or a comparable course in statistics.

An introduction to the ideas and techniques of non-parametric analysis. Includes: one, two and K samples problems, goodness of fit tests, randomness tests, and correlation and regression.

STAT 348.3 Sampling Techniques 1/2(3L-1P) Prerequisite(s): STAT 242 or 245 or 246 or a comparable course in statistics.

Theory and applications of sampling from finite populations. Includes: simple random sampling, stratified random sampling, cluster sampling, systematic sampling, probability proportionate to size sampling, and the difference, ratio and regression methods of estimation.

STAT 349.3 Time Series Analysis 1/2(3L-1P) Prerequisite(s): STAT 241, and 344 or 345.

An introduction to statistical time series analysis. Includes: trend analysis, seasonal variation, stationary and non-stationary time series models, serial correlation, forecasting and regression analysis of time series data.

STAT 442.3 Statistical Inference 2(3L-1P) Prerequisite(s): STAT 342.

Parametric estimation, maximum likelihood estimators, unbiased estimators, UMVUE, confidence intervals and regions, tests of hypotheses, Neyman Pearson Lemma, generalized likelihood ratio tests, chi-square tests, Bayes estimators.

STAT 443.3 Linear Statistical Models 2(3L-1P) Prerequisite(s): MATH 266, STAT 342, and 344 or 345.

A rigorous examination of the general linear model using vector space theory. Includes: generalized inverses; orthogonal projections; quadratic forms; Gauss-Markov theorem and its generalizations; BLUE estimators; Non-full rank models; estimability considerations.