Exercises of Random Variables - docencia.ac.upc.edu

Post on 14-Feb-2022


1

Exercises of Random Variables

2

Exercise

• Show that the necessary and sufficient condition for a random variable on ℕ to have a geometric distribution is that it should have the property:

$P(X > n+m \mid X > m) = P(X > n)$

– For all natural numbers n and m.

3

Geometric distribution

• Random variable that models the number of trials up to and including the first occurrence of a given outcome (a success or a failure).

• Requirements:
– number of trials is potentially infinite
– two outcomes per trial: success and failure
– outcomes statistically independent
– trials have the same probability of success

$P(X = i) = (1-p)\,p^{\,i-1}$ for $i = 1, 2, 3, \ldots$

(here the sequence ends on the outcome that has probability $1-p$)

4

Exercise

• Meaning of:

$P(X > n+m \mid X > m) = P(X > n)$

• The probability of waiting n more minutes, given that you have already waited m, is independent of m.
– Applications:
• Queue at the bus stop (relate to the Poisson rv)
• Queue at a hub or a relay (is the model correct?)
• Expected survival time
– Illness, or protocol design.

Like its continuous analogue (the exponential distribution), the geometric distribution is memoryless.

5

Exercise

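• Property to be shown:

$P(X > n+m \mid X > m) = P(X > n)$

• Definition: Geometric Random Variable:

$P(X = i) = (1-p)\,p^{\,i-1}$ for $i = 1, 2, 3, \ldots$

• The distribution function is

$P(X > n) = \sum_{i=n+1}^{\infty} (1-p)\,p^{\,i-1} = (1-p)\sum_{k=0}^{\infty} p^{\,n+k} = (1-p)\,p^{n}\sum_{k=0}^{\infty} p^{k} = (1-p)\,p^{n}\,\frac{1}{1-p} = p^{n}$

(change of variable $k = i-n-1$, then the geometric series)

$P(X > n) = p^{n}$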

6

Exercise

• If A then B:

$P(X > n+m \mid X > m) = \frac{P(X > n+m)}{P(X > m)} = \frac{p^{\,n+m}}{p^{\,m}} = p^{n} = P(X > n)$

using $P(X > n) = p^{n}$.

[Figure: two plots over the values 1 to 61 (vertical axis from 0 to 0.1), with the points m and n+m marked, illustrating $P(X > n) = p^{n}$ and $P(X > n+m \mid X > m) = P(X > n)$.]
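The memoryless property can also be checked numerically. The following is a minimal sketch, assuming the parametrization $P(X = i) = (1-p)\,p^{\,i-1}$ used in these slides; the function names and the values of p, n and m are illustrative choices, not part of the original exercise.

```python
def geometric_pmf(i, p):
    """P(X = i) = (1 - p) * p**(i - 1) for i = 1, 2, 3, ..."""
    return (1 - p) * p ** (i - 1)


def tail(n, p, terms=5000):
    """P(X > n), approximated by summing the pmf over a long (truncated) range."""
    return sum(geometric_pmf(i, p) for i in range(n + 1, n + 1 + terms))


p = 0.9        # arbitrary illustrative value
n, m = 4, 7    # arbitrary illustrative values

# P(X > n+m | X > m) = P(X > n+m) / P(X > m)
conditional = tail(n + m, p) / tail(m, p)

# The three printed numbers should agree up to rounding error.
print(conditional, tail(n, p), p ** n)
```

Changing m leaves the conditional probability unchanged, which is exactly what "independent of m" means.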

7

Exercise

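• On the other hand, if B then A:

Suppose that $a_n = P(X > n)$ has the property

$P(X > n+m \mid X > m) = \frac{P(X > n+m)}{P(X > m)} = P(X > n)$

then $a_{n+m} = a_n\,a_m$, and then

$a_m = a_1\,a_{m-1} = a_1^{2}\,a_{m-2} = \cdots = a_1^{m}$

then

$P(X = m) = P(X > m-1) - P(X > m) = a_1^{\,m-1} - a_1^{\,m} = a_1^{\,m-1}(1 - a_1)$

which is the geometric distribution with $p = a_1$.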

8

Example of a rv with memory

• A Pareto distribution, when used to model a queue, has memory:

$P(X > n+m \mid X > m) > P(X > n)$

– For each natural number n and m.
– Meaning:
• The probability of waiting n more minutes, given that you have already waited m, is greater than at arrival.
• The rich get richer: the "80-20 rule", which says that 20% of the population owns 80% of the wealth.
• The more you wait, the more you are expected to wait.

9

Example of a rv with memory

• Examples of uses of the Pareto Distribution:
– Frequencies of words in longer texts (a few words are used often, lots of words are used infrequently)
– The sizes of human settlements (few cities, many hamlets/villages)
– File size distribution of Internet traffic which uses the TCP protocol (many smaller files, few larger ones)
– Clusters of Bose-Einstein condensate near absolute zero
– The values of oil reserves in oil fields (a few large fields, many small fields)
– The length distribution in jobs assigned to supercomputers (a few large ones, many small ones)
– The standardized price returns on individual stocks
– Sizes of sand particles
– Sizes of meteorites
– Numbers of species per genus (there is subjectivity involved: the tendency to divide a genus into two or more increases with the number of species in it)
– Areas burnt in forest fires

10

Cities and firms

• Zipf distribution of U.S. firm sizes

Axtell, R. L. (2001), "Zipf distribution of U.S. firm sizes", Science

11

Web sites visits

• Distribution of AOL users' visits to various sites on a December day in 1997

Zipf, Power-laws, and Pareto - a ranking tutorial Lada A. Adamic

Comments from B.A. Huberman

12

Word frequencies in a text

13

Speculative Prices

• Mandelbrot’s paper on long tail densities

14

Speculative Prices

• Mandelbrot’s paper on long tail densities
– An interesting result

http://classes.yale.edu/fractals/Panorama/ManuFractals/Internet/Internet4.html

15

Burstiness property

• Burstiness in cities & internet traffic

The image below (composed of several satellite pictures) gives an idea of the degree of economic agglomeration in the world economy.

An introduction to geographical economics

Steven Brakman, Harry Garretsen, and Charles van Marrewijk

16

Analysis of the Pareto distribution

• We will compute the value:

$P(X > n+m \mid X > m)$

• Remember the definition:

$P(X > m) = \left(\frac{m_0}{m}\right)^{\alpha}$ with $\alpha > 0$

• The conditional probability is:

$P(X > n+m \mid X > m) = \frac{P(X > n+m)}{P(X > m)}$

17

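Analysis of the Pareto distribution

• We will compute the value:

$P(X > n+m \mid X > m) = \frac{P(X > n+m)}{P(X > m)}$

• The conditional probability is:

$P(X > n+m \mid X > m) = \frac{\left(m_0/(n+m)\right)^{\alpha}}{\left(m_0/m\right)^{\alpha}} = \left(\frac{m}{n+m}\right)^{\alpha}$

which, for $n, m \ge m_0$, is greater than $P(X > n) = \left(\frac{m_0}{n}\right)^{\alpha}$ whenever $m\,n > m_0\,(n+m)$.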

18

Analysis of the Pareto distribution

• Simulation:
– Message: the longer you wait, the more you will wait
– Parameters: $m_0 = 10$ and $n_0 = 1$

[Figure: $P(X > n+10 \mid X > 10)$ and $P(X > n)$ plotted against the value of n (1 to 12); vertical axis: probability, from 0 to 1.2.]
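The two curves of the simulation can be tabulated with a few lines of code. This is only a sketch under stated assumptions: the scale of the Pareto distribution is taken to be 1 and the time already waited to be 10 (as the plot legend suggests), and the exponent α is not recoverable from the slide, so `alpha=1.0` is an arbitrary choice. The function names are hypothetical.

```python
def pareto_tail(x, scale=1.0, alpha=1.0):
    """P(X > x) for a Pareto variable: 1 up to the scale, (scale/x)**alpha beyond it."""
    return 1.0 if x <= scale else (scale / x) ** alpha


def conditional_tail(n, m, scale=1.0, alpha=1.0):
    """P(X > n+m | X > m) = P(X > n+m) / P(X > m)."""
    return pareto_tail(n + m, scale, alpha) / pareto_tail(m, scale, alpha)


m = 10  # time already waited (assumed from the plot legend)
for n in range(1, 13):
    # columns: n, P(X > n+10 | X > 10), P(X > n)
    print(n, round(conditional_tail(n, m), 4), round(pareto_tail(n), 4))
```

With these assumed values the conditional probability exceeds the unconditional one from n = 2 onwards; at n = 1 the unconditional probability is still 1, so the inequality does not hold there.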

19

Negative Binomial distribution

• Generalization of a Geometric distribution.
• Def. Probability that the r-th success occurs at trial N in a sequence of Bernoulli trials. Trials independent and identically distributed.

r = 1: T T ⋯ T H (N−1 tails, then the head at trial N)

$\binom{N-1}{0}(1-p)^{N-1}\,p = (1-p)^{N-1}\,p$

r = 2: one H among the first N−1 trials, then the second head at trial N

$\binom{N-1}{1}\,p\,(1-p)^{N-2}\,p = \binom{N-1}{1}\,p^{2}(1-p)^{N-2}$

General case: r−1 heads among the first N−1 trials, then the r-th head at trial N

$\binom{N-1}{r-1}\,p^{r}(1-p)^{N-r}$

20

Negative Binomial distribution

• General expression:
– Probability that the r-th success occurs at trial N in a sequence of Bernoulli trials. Trials independent and identically distributed.

$P(X = r) = \binom{N-1}{r-1}\,p^{r}(1-p)^{N-r}$

• Examples:
– Disk redundancies
– Coding theory. Error correction
– Banach Matches.
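As a quick sanity check of the general expression, the sketch below evaluates it directly. The function name `neg_binomial` and the numeric values are illustrative assumptions; here p denotes the success probability, as in the general expression.

```python
from math import comb


def neg_binomial(N, r, p):
    """Probability that the r-th success occurs exactly at trial N."""
    if N < r:
        return 0.0  # r successes need at least r trials
    return comb(N - 1, r - 1) * p ** r * (1 - p) ** (N - r)


# r = 1 reduces to (1 - p)**(N - 1) * p: N - 1 failures, then the first success.
print(neg_binomial(4, 1, 0.3), 0.7 ** 3 * 0.3)

# Summing over all possible trials N for a fixed r should give (almost exactly) 1.
print(sum(neg_binomial(N, 3, 0.3) for N in range(3, 400)))
```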

21

Banach’s Matches

• Example: A pipe-smoking mathematician carries, at all times, 2 matchboxes, 1 in his left-hand pocket and 1 in his right-hand pocket. Each time he needs a match he is equally likely to take it from either pocket. Consider the moment when the mathematician first discovers that one of his matchboxes is empty. If it is assumed that both matchboxes initially contained N matches, what is the probability that there are exactly k matches in the other box, k = 0, 1, ..., N?

See Feller

22

Banach’s Matches

• Note that it is a negative binomial: the box found empty must have been chosen N+1 times (N times to empty it, plus the time the mathematician discovers that it is empty).

• The success number (N+1) occurs at trial (N+1)+(N−k) = 2N−k+1.

$\mathrm{Prob}(k) = 2\,P(X = N+1) = 2\binom{2N-k}{N}\,p^{N+1}(1-p)^{N-k}$, with $p = 1/2$
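The expression can be evaluated directly and checked to be a probability distribution over k = 0, 1, ..., N. A minimal sketch, assuming p = 1/2 (either pocket equally likely); the function name `banach` and the value N = 50 are illustrative.

```python
from math import comb


def banach(k, N):
    """Probability that the other box holds exactly k matches (p = 1/2)."""
    p = 0.5
    return 2 * comb(2 * N - k, N) * p ** (N + 1) * (1 - p) ** (N - k)


N = 50  # arbitrary illustrative number of matches per box
probs = [banach(k, N) for k in range(N + 1)]

# The probabilities over k = 0..N should add up to 1.
print(sum(probs))
```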

23

Banach’s Matches

• Applications:
– Allocations of files in a disk system.
– Heap management.

$\mathrm{Prob}(k) = 2\,P(X = N+1) = 2\binom{2N-k}{N}\,p^{N+1}(1-p)^{N-k}$

24

Hypergeometric Random Variable

• Models the number of successes k in a sequence of n draws from a finite population without replacement.
– Size of the population: m
– Observed successes: k
– Favorable objects: r
– Number of draws: n

[Figure: a population of objects labelled Wht (white) and Blck (black).]

25

Hypergeometric Random Variable

• Random Variable Y = k
– Size of the population: m
– Observed successes: k
– Favorable objects: r
– Number of draws: n

[Figure: a population of objects labelled Wht (white) and Blck (black).]

$P(Y = k) = \frac{\binom{r}{k}\binom{m-r}{n-k}}{\binom{m}{n}}$

26

Application: capture-recapture problem

• Lake containing m fish, where m is unknown. We capture r of the fish, tag them, and return them to the lake.

• Next we capture n of the fish and observe Y, the number of tagged fish in the sample.

$\frac{Y}{n} = \frac{r}{m}$

– Size of the population: m
– Observed successes: k
– Favorable objects: r
– Number of draws: n
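Solving $Y/n = r/m$ for m gives a simple estimate of the population size. The sketch below illustrates it with made-up numbers (r = 100 tagged, n = 60 recaptured, Y = 12 tagged in the sample); these values and the function names are hypothetical. It also evaluates the hypergeometric probability of the observation for a few candidate values of m, to show that the estimate sits near the most probable value.

```python
from math import comb


def estimate_population(r, n, y):
    """Solve Y/n = r/m for m (requires at least one tagged fish in the sample)."""
    if y == 0:
        raise ValueError("no tagged fish observed: the estimate is undefined")
    return r * n / y


def likelihood(m, r, n, y):
    """Hypergeometric probability of seeing y tagged fish if the lake holds m fish."""
    return comb(r, y) * comb(m - r, n - y) / comb(m, n)


r, n, y = 100, 60, 12          # hypothetical data
m_hat = estimate_population(r, n, y)
print(m_hat)

# Probability of the observation under a few candidate population sizes.
for m in (300, 500, 800):
    print(m, likelihood(m, r, n, y))
```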

27

Application: capture-recapture problem

• Caveat:
– Diffusion problem
– $\frac{Y}{n} = \frac{r}{m}$ takes for granted that the observed value is the mean

$P(Y = k) = \frac{\binom{r}{k}\binom{m-r}{n-k}}{\binom{m}{n}}$

28

Application: capture-recapture problem

• Caveat:
– Variability around the most probable value
– $\frac{Y}{n} = \frac{r}{m}$ takes for granted that the observed value is the mean

• Observation (urn: 3 white, 7 black):
– $p^{*} = P(\text{white observation} \mid \text{composition of the urn})$
– $P(\text{composition of the urn} \mid \text{white observation})$

[Figure: plot of $f_{\hat p}(\hat p)$ with $p^{*}$ marked.]

29

Example

• A computer cluster of 24 machines has, at a given moment, 3 machines running high-load processes. What is the probability of getting k loaded machines if 5 are selected at random?

$P(Y = k) = \frac{\binom{3}{k}\binom{21}{5-k}}{\binom{24}{5}}$

$P(Y = 0) = \frac{\binom{3}{0}\binom{21}{5}}{\binom{24}{5}} = \frac{19 \cdot 18 \cdot 17}{24 \cdot 23 \cdot 22} = 0.4787$
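The whole distribution of the example can be tabulated in a few lines. A minimal sketch: the function name `hypergeom_pmf` is an illustrative choice, and the arguments follow the notation of the slides (m = population size, r = favorable objects, n = draws).

```python
from math import comb


def hypergeom_pmf(k, m, r, n):
    """P(Y = k): k favorable objects in n draws without replacement."""
    return comb(r, k) * comb(m - r, n - k) / comb(m, n)


# Cluster of 24 machines, 3 of them loaded, 5 selected at random.
for k in range(4):
    print(k, round(hypergeom_pmf(k, 24, 3, 5), 4))
```

The value for k = 0 matches the 0.4787 computed by hand, and the four probabilities add up to 1.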

30

Combinatorial Methods. Lotto 6/49

• Lotto 6/49: 6 numbers + 1 complementary are selected from 49. A multiple bet means selecting r from the 49 numbers.
– Probability of guessing k from the winning combination.
– Probability of guessing k AND the complementary
– Probability of guessing k AND NOT the complementary

$\{b_1, b_2, b_3, \ldots, b_{48}, b_{49}\} \rightarrow \{i_1, i_2, \ldots, i_7\} \rightarrow \{b_{i_1}, b_{i_2}, \ldots, b_{i_7}\}$

[Figure: the seven drawn numbers $b_{i_1}, \ldots, b_{i_7}$.]

Example taken from VÉLEZ, HERNÁNDEZ, Cálculo de Probabilidades

31

Combinatorial Methods. Lotto 6/49

• Number of ways of guessing k results:

$\binom{r}{k}\binom{49-r}{6-k}$

– $\binom{r}{k}$: different sets with the winning numbers, $\{i_1, i_2, \ldots, i_k\}$.
– $\binom{49-r}{6-k}$: different sets with the non-selected winning numbers.

[Figure: the six winning numbers $b_1, \ldots, b_6$.]

• Probability of guessing k:

$\Pr(k) = \frac{\binom{r}{k}\binom{49-r}{6-k}}{\binom{49}{6}}$

32

Combinatorial Methods. Lotto 6/49

• Probability of guessing k AND the complementary

[Figure: the seven drawn numbers $b_{i_1}, \ldots, b_{i_7}$.]

$\Pr(k \text{ and complementary}) = \frac{\binom{r}{k}\binom{49-r}{6-k}\binom{r-k}{1}}{43\binom{49}{6}} = \frac{\binom{r}{k}\binom{49-r}{6-k}}{43\binom{49}{6}}\,(r-k)$

– $\binom{49-r}{6-k}$: different sets with the non-selected winning numbers.
– The complementary can be any of the remaining r−k marked numbers.

33

Combinatorial Methods. Lotto 6/49

• Probability of guessing k AND NOT the complementary

[Figure: the seven drawn numbers $b_{i_1}, \ldots, b_{i_7}$.]

$\Pr(k \text{ and not complementary}) = \frac{\binom{r}{k}\binom{49-r}{6-k}\binom{49-r-6+k}{1}}{43\binom{49}{6}} = \frac{\binom{r}{k}\binom{49-r}{6-k}}{43\binom{49}{6}}\,\bigl(43-(r-k)\bigr)$

The complementary cannot be
• in the marked r,
• nor in the (6−k) non-marked but winning numbers.
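The three Lotto probabilities can be computed and cross-checked numerically. This is a sketch under the counting argument of these slides (6 winning numbers out of 49, then the complementary drawn from the remaining 43); the function names and the choice r = 8 for the multiple bet are illustrative.

```python
from math import comb

TOTAL = comb(49, 6)  # number of possible winning combinations


def p_match(k, r):
    """Probability that exactly k of the r marked numbers are winning numbers."""
    return comb(r, k) * comb(49 - r, 6 - k) / TOTAL


def p_match_with_comp(k, r):
    """Exactly k winning numbers AND the complementary among the r marked."""
    return p_match(k, r) * (r - k) / 43


def p_match_without_comp(k, r):
    """Exactly k winning numbers AND the complementary NOT among the r marked."""
    return p_match(k, r) * (43 - (r - k)) / 43


r = 8  # hypothetical multiple bet
for k in range(7):
    print(k, p_match(k, r), p_match_with_comp(k, r), p_match_without_comp(k, r))
```

For every k the last two columns add up to the first, since the complementary either is or is not among the marked numbers.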

34

Binomial Random Variables

• Most important discrete probability distribution.
• Model:
– Two possible outcomes: Success/Failure
– Probabilities: Success = p / Failure = 1−p
– We compound n independent Bernoulli trials.
– Define the random variable:

X = Total number of successes in n indep. Bernoulli trials

35

Binomial Random Variables

• Model:
– Two possible outcomes: Success/Failure
– Probabilities: Success = p / Failure = 1−p
– We compound n independent Bernoulli trials.

X = Total number of successes in n indep. Bernoulli trials

• Distribution.

$P(X = k) = \binom{n}{k}\,p^{k}(1-p)^{n-k}$, $k = 0, 1, 2, 3, \ldots, n$

[Figure: sequences of heads (H) and tails (T) with k successes in n trials.]

36

Binomial Random Variables. Example

• Overbooking:
– An aircraft has a capacity of 150 seats. The airline management sells 160 tickets in order to protect itself against no-show passengers.
– Experience shows that the probability of a passenger being a no-show is 0.1. The booked passengers act independently of each other.
– Given this overbooking strategy, what is the probability that some passengers will be left out?

Taken from H. Tijms, Understanding Probability

37

Binomial Random Variables. Example

• Overbooking:
– The problem can be seen as 160 independent trials of a Bernoulli experiment with a success rate of 9/10, where a passenger who shows up for the flight is counted as a success.
– We define X = number of passengers that show up.
– X is binomially distributed with parameters n = 160 and p = 9/10.
– The probability is P(X > 150):

$P(X > 150) = \sum_{k=151}^{n} \binom{n}{k}\,p^{k}(1-p)^{n-k} = 0.0359$

[Figure: a sequence of H and T outcomes with more than 150 successes in 160 trials.]

Taken from H. Tijms, Understanding Probability
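The overbooking probability is a finite binomial sum, so it can be reproduced exactly. A minimal sketch using only the standard library; the function name `binomial_pmf` and the variable names are illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for k successes in n independent Bernoulli trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p, seats = 160, 0.9, 150

# P(X > 150): more passengers show up than there are seats.
prob_left_out = sum(binomial_pmf(k, n, p) for k in range(seats + 1, n + 1))
print(round(prob_left_out, 4))
```

Changing `n` (tickets sold) shows how quickly the risk grows with the amount of overbooking.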