Elements of Probability and Statistics

Probability Theory provides the mathematical models of phenomena governed by chance. Examples of such phenomena include weather, lifetime of batteries, traffic congestion, stock exchange indices, laboratory measurements, etc. Statistical Theory provides the mathematical methods to gauge the accuracy of the probability models based on observations or data. The remaining Lectures are about this topic.

“Essentially, all models are wrong, but some are useful.” — George E. P. Box.

Contents

1 Sets, Experiments and Probability
  1.1 Rudiments of Set Theory
  1.2 Experiments
  1.3 Probability
  1.4 Conditional Probability
2 Random Variables
  2.1 Discrete Random Variables and their Distributions
    2.1.1 Discrete uniform random variables with finitely many possibilities
    2.1.2 Discrete non-uniform random variables with finitely many possibilities
    2.1.3 Discrete non-uniform random variables with infinitely many possibilities
  2.2 Continuous Random Variables and Distributions
3 Expectations
4 Tutorial for Week 1
  4.1 Preparation Problems (Homework)
  4.2 In Tutorial Problems
5 Tutorial for Week 2
  5.1 Preparation Problems (Homework)
  5.2 In Tutorial Problems

List of Tables

1 f(x) and F(x) for the sum of two independent tosses of a fair die RV X
2 DF Table for the Standard Normal Distribution
3 Quantile Table for the Standard Normal Distribution


List of Figures

1 f(x) = P(x) = 1/6 and F(x) of the fair die toss RV X of Example 2.4
2 f(x) and F(x) of an astragali toss RV X of Example 2.6
3 f(x) and F(x) of RV X for the sum of two independent tosses of a fair die
4 Probability density function of the volume of rain in cubic inches over the lecture theatre tomorrow
5 PDF and DF of Normal(µ, σ²) RV for different values of µ and σ²


1 Sets, Experiments and Probability

1.1 Rudiments of Set Theory

1. A set is a collection of distinct objects or elements, and we enclose the elements by curly braces. For example, the collection of the two letters H and T is a set and we denote it by {H, T}. But the collection {H, T, T} is not a set (do you see why? think distinct!). Also, recognise that there is no order to the elements in a set, i.e. {H, T} is the same as {T, H}.

2. We give convenient names to sets. For example, we can call the set {H, T} by A and write A = {H, T} to mean it.

3. If a is an element of A, we write a ∈ A. For example, if A = {1, 2, 3}, then 1 ∈ A.

4. If a is not an element of A, we write a ∉ A. For example, if A = {1, 2, 3}, then 1/3 ∉ A.

5. We say that a set A is a subset of a set B if every element of A is also an element of B, and write A ⊆ B. For example, {1, 2} ⊆ {1, 2, 3, 4}.

6. We say that a set A is not a subset of a set B if at least one element of A is not an element of B, and write A ⊄ B. For example, {1, 2} is not a subset of {1, 3, 4} since 2 ∈ {1, 2} but 2 ∉ {1, 3, 4}, and we write {1, 2} ⊄ {1, 3, 4} to mean this.

7. We say a set A is equal to a set B, and write A = B, if and only if A ⊆ B and B ⊆ A.

8. The union A ∪ B of A and B consists of elements that are in A or in B or in both A and B. For example, if A = {1, 2} and B = {3, 2} then A ∪ B = {1, 2, 3}.

9. The intersection A ∩ B of A and B consists of elements that are in both A and B. For example, if A = {1, 2} and B = {3, 2} then A ∩ B = {2}.

10. The empty set contains no elements and is the collection of nothing. It is denoted by ∅ = {}.

11. Given some universal set, say Ω, the Greek letter Omega, the complement of a set A, denoted by A^c, is the set of all elements in Ω that are not in A. For example, if Ω = {H, T} and A = {H} then A^c = {T}. Note that for any set A ⊆ Ω:

A^c ∩ A = ∅, A ∪ A^c = Ω, Ω^c = ∅, ∅^c = Ω .

12. When we have more than two sets, we can define unions and intersections similarly. The union of m sets

⋃_{j=1}^{m} A_j = A_1 ∪ A_2 ∪ · · · ∪ A_m

consists of elements that are in at least one of the m sets A_1, A_2, . . . , A_m, and the union of infinitely many sets

⋃_{j=1}^{∞} A_j = A_1 ∪ A_2 ∪ · · ·

consists of elements that are in at least one of the sets A_1, A_2, . . . . Similarly, the intersection

⋂_{j=1}^{m} A_j = A_1 ∩ A_2 ∩ · · · ∩ A_m

of m sets consists of elements that are in each of the m sets, and the intersection of infinitely many sets

⋂_{j=1}^{∞} A_j = A_1 ∩ A_2 ∩ · · ·

consists of elements that are in each of the infinitely many sets.

Exercise 1.1 Let Ω = {1, 2, 3, 4, 5, 6}, A = {1, 3, 5} and B = {2, 4, 6}. By using the definitions of sets and set operations find the following sets:

A^c = , B^c = , Ω^c = , ∅^c = , {1}^c = , A ∪ B = , A ∩ B = , A ∪ Ω = , A ∩ Ω = , B ∩ Ω = , B ∪ Ω = , A ∪ A^c = , B ∪ B^c = , etc.
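If you want to check your answers mechanically, Python's built-in set type implements these operations directly. The following is a minimal sketch (not part of the original exercise) using the sets of Exercise 1.1:

    Omega = {1, 2, 3, 4, 5, 6}
    A = {1, 3, 5}
    B = {2, 4, 6}

    print(Omega - A)        # complement of A relative to Omega: {2, 4, 6}
    print(A | B)            # union A ∪ B: {1, 2, 3, 4, 5, 6}
    print(A & B)            # intersection A ∩ B: set(), the empty set
    print(A | (Omega - A))  # A ∪ A^c = Omega
    print(B | Omega)        # B ∪ Omega = Omega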

Venn diagrams are visual aids for set operations.

Example 1.1 For three sets A, B and C, the Venn diagrams for A ∪ B, A ∩ B and A ∩ B ∩ C are:

[Venn diagrams in Ω: (a) A ∪ B, (b) A ∩ B, (c) A ∩ B ∩ C]


Exercise 1.2 Let A = {1, 3, 5, 7, 9, 11}, B = {1, 2, 3, 5, 8, 13} and C = {1, 2, 4, 8, 16, 32} denote three sets. Let us use a Venn diagram to visualise these three sets and their intersections. Can you mark which sets correspond to A, B and C in the figure below?

1.2 Experiments

Definition 1.1 An experiment is an activity or procedure that produces distinct, well-defined possibilities called outcomes. The set of all outcomes is called the sample space and is denoted by Ω, the upper-case Greek letter Omega. We denote a typical outcome in Ω by ω, the lower-case Greek letter omega, and a typical sequence of possibly distinct outcomes by ω1, ω2, ω3, . . . .

Example 1.2 Ω = {Defective, Non-defective} if our experiment is to inspect a light bulb.

Example 1.3 Ω = {Heads, Tails} if our experiment is to note the outcome of a coin toss.

In Examples 1.2 and 1.3, Ω only has two outcomes and we can refer to the sample space of such two-outcome experiments generically as Ω = {ω1, ω2}. For instance, the two outcomes of Example 1.2 are ω1 = Defective and ω2 = Non-defective, while those of Example 1.3 are ω1 = Heads and ω2 = Tails.

Example 1.4 If our experiment is to roll a die whose faces are marked with the six numerical symbols or numbers 1, 2, 3, 4, 5, 6, then there are six outcomes corresponding to the number that shows on the top. Thus, the sample space Ω for this experiment is {1, 2, 3, 4, 5, 6}.


Exercise 1.3 Suppose our experiment is to observe whether it will rain or shine tomorrow. What is the sample space for this experiment? Answer: Ω = .

The subsets of Ω are called events. The outcomes ω1, ω2, . . ., when seen as subsets of Ω, such as {ω1}, {ω2}, . . ., are simple events.

Example 1.5 In our roll a die experiment of Example 1.4 with Ω = {1, 2, 3, 4, 5, 6}, the set of odd numbered outcomes A = {1, 3, 5} or the set of even numbered outcomes B = {2, 4, 6} are examples of events. The simple events are {1}, {2}, {3}, {4}, {5}, and {6}.

Example 1.6 Consider a generic die-tossing experiment by a human experimenter. Clearly, Ω = and A = , B = and C = {ω3} are examples of events. This experiment could correspond to rolling a die whose faces are:

1. sprayed with six different scents (nose!), or

2. studded with six distinctly flavoured candy (tongue!), or

3. contoured with six distinct bumps and pits (touch!), or

4. acoustically discernible at six different frequencies (ears!), or

5. painted with six different colours (eyes!), or

6. marked with six different numbers 1, 2, 3, 4, 5, 6 (eyes!), or . . .

This example is meant to concretely convince you that an experiment's sample space is merely a collection of distinct elements called outcomes, and these outcomes have to be discernible in some well-specified sense to the experimenter!

Definition 1.2 A trial is a single performance of an experiment and it results in an outcome.

Example 1.7 We call a single roll of a die a trial.

Example 1.8 We call a single toss of a coin a trial.

An experimenter often performs more than one trial. Repeated trials of an experiment form the basis of science and engineering, as the experimenter learns about the phenomenon by repeatedly performing the same mother experiment with possibly different outcomes. This repetition of trials in fact provides the very motivation for the definition of probability in § 1.3.

Definition 1.3 An n-product experiment is obtained by repeatedly performing n trials of a mother experiment.


Example 1.9 Suppose we toss a coin twice by performing two trials of the coin toss experiment of Example 1.3 and use the short-hand H and T to denote the outcome of Heads and Tails, respectively. Then our sample space is Ω = {HH, HT, TH, TT}. Note that this is the 2-product experiment of the coin toss mother experiment.

Exercise 1.4 What is the event that at least one Heads occurs in the 2-product experiment of Example 1.9, i.e., tossing a fair coin twice?

Exercise 1.5 What is the sample space of the 3-product experiment of the coin toss experiment, i.e., tossing a fair coin thrice?

Definition 1.4 An ∞-product experiment is defined as

lim_{n→∞} n-product experiment of some mother experiment .

Remark 1.5 Loosely speaking, a set that can be enumerated or tagged uniquely by natural numbers N = {1, 2, 3, . . .} is said to be countably infinite or to contain countably many elements. Some examples of such sets include any finite set, the set of natural numbers N = {1, 2, 3, . . .}, the set of non-negative integers {0, 1, 2, 3, . . .}, the set of all integers Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . .}, and the set of all rational numbers Q = {p/q : p, q ∈ Z, q ≠ 0}, but the set of real numbers R = (−∞, ∞) is uncountably infinite.

Example 1.10 The sample space Ω of the ∞-product experiment of tossing a coin infinitely many times has uncountably infinitely many elements and is in bijection with all binary numbers in the unit interval [0, 1] — just replace H with 1 and T with 0. We cannot enumerate all outcomes in Ω but can show some outcomes:

Ω = {HHHH · · · , HTHH · · · , THHH · · · , TTHH · · · , . . . , TTTT · · · , HTTT · · · , THTT · · · , HHTT · · · , . . .} .

1.3 Probability

Definition 1.6 Probability is a function P that assigns real numbers to events and satisfies the following four Axioms:

Axiom (1): for any event A, 0 ≤ P(A) ≤ 1

Axiom (2): if Ω is the sample space then P(Ω) = 1

Axiom (3): if A and B are disjoint, i.e., A ∩ B = ∅, then

P(A ∪ B) = P(A) + P(B)


Axiom (4): if A1, A2, . . . is an infinite sequence of pairwise disjoint or mutually exclusive events, i.e., A_i ∩ A_j = ∅ whenever i ≠ j, then

P(⋃_{i=1}^{∞} A_i) = ∑_{i=1}^{∞} P(A_i)

These axioms are merely assumptions that are justified and motivated by the frequency interpretation of probability in n-product experiments as n tends to infinity, which states that if we repeat an experiment a large number of times then the fraction of times the event A occurs will be close to P(A). To be precise, if we let N(A, n) be the number of times A occurs in the first n trials, then

P(A) = lim_{n→∞} N(A, n)/n

Given P(A) = lim_{n→∞} N(A, n)/n, Axiom (1) simply affirms that the fraction of times a given event A occurs must be between 0 and 1. If Ω has been defined properly to be the set of ALL possible outcomes, then Axiom (2) simply affirms that the fraction of times something in Ω happens is 1. To explain Axiom (3), note that if A and B are disjoint then

N(A ∪ B, n) = N(A, n) + N(B, n)

since A ∪ B occurs if either A or B occurs but it is impossible for both to occur. Dividing both sides of the previous equality by n and letting n → ∞, we arrive at Axiom (3).

Axiom (3) implies that Axiom (4) holds for a finite number of sets. In many cases the sample space is finite, so Axiom (4) is not relevant or necessary. Axiom (4) is a new assumption for infinitely many sets, as it no longer simply follows from Axiom (3). Axiom (4) is more difficult to motivate, but without it the theory of probability becomes more difficult and less useful, so we will impose this assumption on utilitarian grounds.
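The frequency interpretation is easy to see empirically. Here is a minimal simulation sketch for the event A = "an even number shows" in a fair die toss, whose exact probability is 1/2 (the seed is an arbitrary choice, fixed only for reproducibility):

    import random

    random.seed(1)
    for n in [100, 10_000, 1_000_000]:
        # N(A, n): number of even faces in n independent fair die tosses
        count = sum(1 for _ in range(n) if random.randint(1, 6) % 2 == 0)
        print(n, count / n)  # the fraction N(A, n)/n approaches 1/2

As n grows, the printed fractions settle near 0.5, which is what the limit P(A) = lim_{n→∞} N(A, n)/n asserts.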

The following three Theorems are merely properties of probability.

Theorem 1.7 Complementation Rule. The probabilities of an event A and its complement A^c in a sample space Ω satisfy

P(A^c) = 1 − P(A) . (1)

Proof: By the definition of complement, we have Ω = A ∪ A^c and A ∩ A^c = ∅. Hence by Axioms 2 and 3,

1 = P(Ω) = P(A) + P(A^c), thus P(A^c) = 1 − P(A).


Example 1.11 Recall the coin toss experiment of Example 1.3 with Ω = {Heads, Tails}. Suppose that our coin happens to be fair with P(Heads) = 1/2. Since {Tails}^c = {Heads}, we can apply the complementation rule to find the probability of observing a Tails from P(Heads) as follows:

P(Tails) = 1 − P(Heads) = 1/2 .

Theorem 1.8 Addition Rule for Mutually Exclusive Events. For mutually exclusive or pairwise disjoint events A1, . . . , Am in a sample space Ω,

P(A1 ∪ A2 ∪ A3 ∪ · · · ∪ Am) = P(A1) + P(A2) + P(A3) + · · · + P(Am) . (2)

Proof: This is a consequence of applying Axiom (3) repeatedly:

P(A1 ∪ A2 ∪ A3 ∪ · · · ∪ Am) = P(A1 ∪ (A2 ∪ · · · ∪ Am))
= P(A1) + P(A2 ∪ (A3 ∪ · · · ∪ Am))
= P(A1) + P(A2) + P(A3 ∪ · · · ∪ Am) = · · ·
= P(A1) + P(A2) + P(A3) + · · · + P(Am) .

Example 1.12 Let us observe the number on the first ball that pops out in a New Zealand Lotto trial. There are forty balls labelled 1 through 40 for this experiment, and so the sample space is Ω = {1, 2, 3, . . . , 39, 40}. Because the balls are vigorously whirled around inside the Lotto machine before the first one pops out, we can model each ball to pop out first with the same probability. So, we assign each outcome ω ∈ Ω the same probability of 1/40, i.e., our probability model for this experiment is:

P({ω}) = 1/40, for each ω ∈ Ω = {1, 2, 3, . . . , 39, 40} .

NOTE: we sometimes abuse notation and write P(ω) instead of the more accurate but cumbersome P({ω}) when writing down probabilities of simple events.

Now, let's check if Axiom (1) is satisfied for simple events in our model for this Lotto experiment:

0 ≤ P(1) = P(2) = · · · = P(40) = 1/40 ≤ 1

Is Axiom (3) satisfied? For example, for the disjoint simple events {1} and {2},

P({1, 2}) = P({1} ∪ {2}) = P({1}) + P({2}) = 1/40 + 1/40 = 2/40

Is Axiom (2) satisfied? Yes, by Equation (2) of the addition rule for mutually exclusive events (Theorem 1.8):

P(Ω) = P({1, 2, . . . , 40}) = P(⋃_{i=1}^{40} {i}) = ∑_{i=1}^{40} P({i}) = 1/40 + 1/40 + · · · + 1/40 = 1


[Figure: (a) 1114 NZ Lotto draw frequencies from 1987 to 2008; (b) 1114 NZ Lotto draw relative frequencies from 1987 to 2008.]

Recommended Activity 1.1 Explore the following web sites to learn more about NZ and British Lotto. The second link has animations of the British equivalent of NZ Lotto.
http://lotto.nzpages.co.nz/

http://understandinguncertainty.org/node/39

Theorem 1.9 Addition Rule for Two Arbitrary Events. For events A and B in a sample space,

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) . (3)

Proof:

P(A ∪ B) = P(A ∪ (B ∩ A^c))
= P(A) + P(B ∩ A^c) by Axiom (3) and disjointness
= P(A) + P(B) − P(A ∩ B)

The last equality P(B ∩ A^c) = P(B) − P(A ∩ B) is due to Axiom (3) and the disjoint union B = (B ∩ A^c) ∪ (A ∩ B), giving P(B) = P(B ∩ A^c) + P(A ∩ B). It is easy to see this with a Venn diagram.

Exercise 1.6 In English language text, the twenty six letters in the alphabet occur with the following frequencies:

E 13%    R 7.7%   A 7.3%   H 3.5%   F 2.8%   M 2.5%   W 1.6%   X 0.5%   J 0.2%
T 9.3%   O 7.4%   S 6.3%   L 3.5%   P 2.7%   Y 1.9%   V 1.3%   K 0.3%   Z 0.1%
N 7.8%   I 7.4%   D 4.4%   C 3%     U 2.7%   G 1.6%   B 0.9%   Q 0.3%

Suppose you pick one letter at random from a randomly chosen English book from our central library with Ω = {A, B, C, . . . , Z} – ignoring upper/lower cases – then what is the probability of the following events?

(a) P({Z}) =

(b) What is the most likely outcome?

(c) P('picking any letter') = P(Ω) =

(d) P({E, Z}) = — by Axiom (3)

(e) P('picking a vowel') = — by Equation (2) of the addition rule for mutually exclusive events (Theorem 1.8)

(f) P('picking any letter in the word WAZZZUP') = — by Equation (2) of the addition rule for mutually exclusive events (Theorem 1.8)

(g) P('picking any letter in the word WAZZZUP or a vowel') = = 42.2% — by Equation (3) of the addition rule for two arbitrary events (Theorem 1.9)
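Once you have filled in the blanks by hand, a few lines of Python can check the arithmetic. This sketch hard-codes only the table entries it needs and verifies part (g):

    # letter -> probability, taken from the frequency table above
    freq = {'A': 0.073, 'E': 0.13, 'I': 0.074, 'O': 0.074, 'U': 0.027,
            'W': 0.016, 'Z': 0.001, 'P': 0.027}

    vowels = {'A', 'E', 'I', 'O', 'U'}
    wazzzup = set('WAZZZUP')                         # {'W','A','Z','U','P'}

    p_vowel = sum(freq[c] for c in vowels)           # 0.378
    p_wazzzup = sum(freq[c] for c in wazzzup)        # 0.144
    p_both = sum(freq[c] for c in vowels & wazzzup)  # A and U: 0.100

    # addition rule for two arbitrary events (Theorem 1.9):
    print(p_wazzzup + p_vowel - p_both)              # ≈ 0.422, i.e. 42.2%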

1.4 Conditional Probability

Conditional probability allows us to make decisions from partial information about an experiment.

Definition 1.10 The probability of an event B under the condition that an event A occurs is called the conditional probability of B given A and is denoted by P(B|A). In this case A serves as a new (reduced) sample space, and that probability is the fraction of P(A) which corresponds to A ∩ B. Thus,

P(B|A) = P(A ∩ B)/P(A), if P(A) ≠ 0 . (4)

Similarly, the conditional probability of A given B is

P(A|B) = P(A ∩ B)/P(B), if P(B) ≠ 0 . (5)

Conditional Probability is a probability, and therefore all four Axioms of probability also hold for the conditional probability of events, given the conditioning event A has P(A) > 0.

Axiom (1): For any event B, 0 ≤ P (B|A) ≤ 1.

Axiom (2): P (Ω|A) = 1.

Axiom (3): For any two disjoint events B1 and B2, P (B1∪B2|A) = P (B1|A)+P (B2|A).

Axiom (4): For mutually exclusive or pairwise-disjoint events, B1, B2, . . .,

P (B1 ∪B2 ∪ · · · |A) = P (B1|A) + P (B2|A) + · · · .

Note that the complementation and addition rules also follow for conditional probability.

1. complementation rule for conditional probability:

P(B|A) = 1 − P(B^c|A) . (6)


2. addition rule for two arbitrary events B1 and B2:

P(B1 ∪ B2|A) = P(B1|A) + P(B2|A) − P(B1 ∩ B2|A) . (7)

Theorem 1.11 Multiplication Rule. If A and B are events and P(A) ≠ 0, P(B) ≠ 0, then

P(A ∩ B) = P(A)P(B|A) = P(B)P(A|B) . (8)

Proof: Solving for P(A ∩ B) in the Definitions (4) and (5) of conditional probability, we obtain Equation (8) of the above theorem.

Example 1.13 Suppose the NZ All Blacks team is playing in a four team Rugby match. In the first round they have a tough opponent that they will beat 40% of the time, but if they win that game they will play against an easy opponent where their probability of success is 0.8. What is the probability that they will win the tournament?

If A and B are the events of victory in the first and second games, respectively, then P(A) = 0.4 and P(B|A) = 0.8, so by the multiplication rule, the probability that they will win the tournament is:

P(A ∩ B) = P(A)P(B|A) = 0.4 × 0.8 = 0.32 .

Exercise 1.7 In Example 1.13, what is the probability that the All Blacks will win the first game but lose the second?

Definition 1.12 Independent events. If events A and B are such that

P (A ∩B) = P (A)P (B),

they are called independent events. Assuming P(A) ≠ 0, P(B) ≠ 0, we have P(A|B) = P(A) and P(B|A) = P(B). This means that the probability of A does not depend on the occurrence or nonoccurrence of B, and conversely. This justifies the term "independent".


Example 1.14 Suppose you toss a fair coin twice such that the first toss is independent of the second. Then,

P(HT) = P(Heads on the first toss ∩ Tails on the second toss) = P(H)P(T) = 1/2 × 1/2 = 1/4 .

Similarly, P(HH) = P(TH) = P(TT) = 1/2 × 1/2 = 1/4. Thus, P(ω) = 1/4 for every ω in the sample space Ω = {HT, HH, TH, TT}.

Accordingly, three events A, B, C are independent if and only if

P (A ∩B) = P (A)P (B),

P (B ∩ C) = P (B)P (C),

P (C ∩ A) = P (C)P (A),

P (A ∩B ∩ C) = P (A)P (B)P (C).

Example 1.15 Suppose you independently toss a fair die thrice. What is the probability of getting an even outcome in all three trials?

Let E_i be the event that the outcome is an even number on the i-th trial. Then, the probability of getting an even number in all three trials is:

P(E1 ∩ E2 ∩ E3) = P(E1)P(E2)P(E3) = (P({2, 4, 6}))³ = (P({2} ∪ {4} ∪ {6}))³
= (P({2}) + P({4}) + P({6}))³ = (1/6 + 1/6 + 1/6)³ = (3/6)³ = (1/2)³ = 1/8 .

Definition 1.13 Independence of n Events. Similarly, n events A1, . . . , An are called independent if

P(A1 ∩ · · · ∩ An) = P(A1)P(A2) · · · P(An) .

Example 1.16 Suppose you toss a fair coin independently m times. Then each of the 2^m possible outcomes in the sample space Ω has equal probability of 1/2^m due to independence.

Theorem 1.14 Total probability theorem. Suppose B1, B2, . . . , Bn is a sequence of events with positive probability that partition the sample space, i.e., B1 ∪ B2 ∪ · · · ∪ Bn = Ω and B_i ∩ B_j = ∅ for i ≠ j; then

P(A) = ∑_{i=1}^{n} P(A ∩ B_i) = ∑_{i=1}^{n} P(A|B_i)P(B_i) . (9)

Proof: The first equality is due to the addition rule for the mutually exclusive events A ∩ B1, A ∩ B2, . . . , A ∩ Bn, and the second equality is due to the multiplication rule.


Exercise 1.8 A well-mixed urn contains five red and ten black balls. We draw two balls from the urn without replacement. What is the probability that the second ball drawn is black?

Theorem 1.15 Bayes theorem.

P(A|B) = P(A)P(B|A)/P(B) . (10)

Proof: The proof is a consequence of the definition of conditional probability and the multiplication rule:

P(A|B) = P(A ∩ B)/P(B) = P(B ∩ A)/P(B) = P(B|A)P(A)/P(B) = P(A)P(B|A)/P(B) .

Exercise 1.9 Approximately 1% of women aged 40–50 have breast cancer. A woman with breast cancer has a 90% chance of a positive test from a mammogram, while a woman without breast cancer has a 10% chance of a false positive result from the test. What is the probability that a woman indeed has breast cancer given that she just had a positive test?
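A sketch of the computation in Python, with A = "has breast cancer" and B = "positive mammogram" (try the exercise by hand first; this only checks the arithmetic):

    p_a = 0.01              # P(A): prevalence
    p_b_given_a = 0.90      # P(B|A): true positive rate
    p_b_given_not_a = 0.10  # P(B|A^c): false positive rate

    # total probability theorem (Equation 9) gives the denominator P(B):
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

    # Bayes theorem (Equation 10):
    print(p_a * p_b_given_a / p_b)  # P(A|B) ≈ 0.083, i.e. about 8.3%

The answer is surprisingly small because the 10% false positive rate applied to the 99% of healthy women swamps the true positives.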


2 Random Variables

We are used to traditional variables such as x as an “unknown” in the equation:

x + 3 = 7 ,

where we can solve for x = 7 − 3 = 4. Another example is to use traditional variables to represent geometric objects such as a line:

y = 3x − 2 ,

where the variable y for the y-axis is determined by the value taken by the variable x, as x varies over the real line R = (−∞, ∞). The variables we have used to represent sequences, such as:

{a_n}_{n=1}^{∞} = a1, a2, a3, . . . ,

are also traditional. When we wrote functions of a variable, such as x, in:

f(x) = x/(x + 1), for x ≥ 0 ,

the argument x is also a traditional variable. In fact, all of the calculus you have been taught is by means of such traditional variables.

Question: What is common to all these variables above, such as x, y, a1, a2, a3, . . . , f(x)?
Answer: They are instances of deterministic variables; that is, these traditional variables take a fixed or deterministic value when we can solve for them.

We need a new kind of variable to deal with real-world situations where the same variable may take different values in a non-deterministic manner. Random variables do this job for us. Random variables, unlike traditional deterministic variables, can take a bunch of different values!

In fact, random variables are actually functions! They take you from the "world of random processes and phenomena" to the world of real numbers. In other words, a random variable is a numerical value determined by the outcome of the experiment.


Definition 2.1 A Random variable or RV is a function from the sample space Ω to the set of real numbers R:

X(ω) : Ω → R ,

such that, for every real number x, the corresponding set {ω ∈ Ω : X(ω) ≤ x}, i.e. the set of outcomes whose numerical value is less than or equal to x, is an event. The probability of such events is given by the function F(x) : R → [0, 1], called the distribution function or DF of the random variable X:

F(x) = P(X ≤ x) = P({ω : X(ω) ≤ x}) , for any x ∈ R . (11)

NOTE: Distribution function or DF is sometimes called cumulative distribution function or CDFin pre-calculus treatments of the subject. We will avoid the CDF nomenclature in our treatment.

Example 2.1 Recall the rain or shine experiment of Exercise 1.3 with sample space Ω = {rain, shine}. We can associate a random variable X with this experiment as follows:

X(ω) = 1 if ω = rain, and 0 if ω = shine.

Thus, X is 1 if it will rain tomorrow and 0 otherwise. Note that another equally valid discrete random variable, say Y, for this experiment is:

Y(ω) = π if ω = rain, and √2 if ω = shine.

A random variable can be chosen to assign each outcome ω ∈ Ω to any real number as the experimenter desires.

Recall the experiments of Example 1.6 that involved smelling, tasting, touching, hearing, or seeing to discern between outcomes. It becomes very difficult to communicate, process and make decisions based on outcomes of experiments that are discerned in this manner, and even more difficult to record them unambiguously. This is where real numbers can give us a helping hand.

Data are typically random variables that act as numerical placeholders for outcomes of an experiment about some real-world random process or phenomenon. We said that the random variable can take one of many values, but we cannot be certain of which value it will take. However, we can make probabilistic statements about the value x the random variable X will take. This can be done with probabilities.

Theorem 2.2 The probability that the RV X takes a value x in the half-open interval (a, b], i.e., a < x ≤ b, is:

P(a < X ≤ b) = F(b) − F(a) . (12)


Proof: Since the events (X ≤ a) = {ω : X(ω) ≤ a} and (a < X ≤ b) = {ω : a < X(ω) ≤ b} are mutually exclusive or disjoint events whose union is the event (X ≤ b) = {ω : X(ω) ≤ b}, by Axiom (3) of Definition 1.6 of probability and Equation (11) in Definition 2.1 of the DF,

F(b) = P(X ≤ b) = P(X ≤ a) + P(a < X ≤ b) = F(a) + P(a < X ≤ b) .

Subtraction of F(a) from both sides of the above equation yields Equation (12).

Example 2.2 Recall the fair coin toss experiment of Example 1.11 with Ω = {H, T} and P(H) = P(T) = 1/2. We can associate a random variable X with this experiment as follows:

X(ω) = 1 if ω = H, and 0 if ω = T.

Note that this choice of values for X equates to counting the number of H in one trial of the fair coin toss experiment. The DF for X is:

F(x) = P(X ≤ x) = P({ω : X(ω) ≤ x}) =
  P(∅) = 0              if −∞ < x < 0 ,
  P({T}) = 1/2          if 0 ≤ x < 1 ,
  P({H, T}) = P(Ω) = 1  if 1 ≤ x < ∞ .

And the probability that X takes on a specific value x is:

P(X = x) = P({ω : X(ω) = x}) =
  P(∅) = 0      if x ∉ {0, 1} ,
  P({T}) = 1/2  if x = 0 ,
  P({H}) = 1/2  if x = 1 .

All the detail above, spelling out the underlying definitions, just amounts to:

P(X = x) = 1/2 if x = 0, 1/2 if x = 1, and 0 otherwise.
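As a quick sanity check, the PMF and DF of this RV are short enough to write as plain functions. A minimal sketch:

    # PMF and DF of the fair coin toss RV X of Example 2.2
    def f(x):
        # PMF: P(X = x)
        return 0.5 if x in (0, 1) else 0.0

    def F(x):
        # DF: P(X <= x), a step function
        return sum(f(k) for k in (0, 1) if k <= x)

    print(F(-0.5), F(0), F(0.3), F(1))  # 0.0 0.5 0.5 1.0

Note how F jumps by f(0) = 1/2 at x = 0 and by f(1) = 1/2 at x = 1, and is constant in between.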

Example 2.3 Now let us define a discrete random variable that can take one of six possible values from {1, 2, 3, 4, 5, 6} in the toss a fair die experiment. This X gives the number that shows up on the top face as we roll a fair six-faced die whose faces are labelled by the numerical symbols 1, 2, 3, 4, 5, 6. Note that here Ω is the set of numerical symbols that label each face, while each of these symbols is associated with the real number x ∈ {1, 2, 3, 4, 5, 6}. Thus,

X(ω) =

1, if ω is the outcome that the die lands with the face labelled by 1 on top

2, if ω is the outcome that the die lands with the face labelled by 2 on top

3, if ω is the outcome that the die lands with the face labelled by 3 on top

4, if ω is the outcome that the die lands with the face labelled by 4 on top

5, if ω is the outcome that the die lands with the face labelled by 5 on top

6, if ω is the outcome that the die lands with the face labelled by 6 on top


Example 2.4 Consider the random variable X of the toss a fair die experiment of Example 2.3 with P(X = x) = P({ω : X(ω) = x}) = 1/6 for each x ∈ {1, 2, 3, 4, 5, 6} and 0 otherwise. The probability that X ≤ 3 can be obtained by

F(3) = P(X ≤ 3) = P({ω : X(ω) ≤ 3}) = P({1, 2, 3}) = P({1}) + P({2}) + P({3}) = 3/6

Exercise 2.1 Similarly, can you complete the following probability statement about the value x the random variable X of Example 2.4 will take?

P(X = 1) = P(X = 2) = P(X = 3) = P(X = 4) = P(X = 5) = P(X = 6) = .

2.1 Discrete Random Variables and their Distributions

Definition 2.3 A random variable X and its distribution are discrete if X assumes only finitely many or at most countably many values x1, x2, x3, . . ., called the possible values of X, with positive probabilities p1 = P(X = x1), p2 = P(X = x2), p3 = P(X = x3), . . . .

A discrete RV X takes on at most countably many values in R. The rain or shine random variables of Example 2.1 and the fair coin toss RV of Example 2.2 can only take two possible values, while the toss a fair die RV of Example 2.4 can only take six possible values. Thus, they are examples of discrete random variables. We can study discrete random variables in a general setting.

Definition 2.4 The probability mass function or PMF f of a discrete RV X is:

f(x) = P(X = x) = P({ω : X(ω) = x}) = p_i if x = x_i, where i = 1, 2, . . ., and 0 otherwise. (13)

From this we get the values of the distribution function F(x) by simply taking sums,

F(x) = ∑_{x_i ≤ x} f(x_i) = ∑_{x_i ≤ x} p_i , (14)

where for any given x, we sum all the probabilities p_i for which x_i is smaller than or equal to x. Thus, the DF F(x) of a discrete random variable is a step function with upward jumps of size p_i at the possible values x_i of X, and is constant in between.

Out of this class of discrete random variables we will define specific kinds as they arise often in applications. We classify discrete random variables into three types for convenience as follows:

• Discrete uniform random variables with finitely many possibilities

• Discrete non-uniform random variables with finitely many possibilities

• Discrete non-uniform random variables with (countably) infinitely many possibilities


2.1.1 Discrete uniform random variables with finitely many possibilities

Definition 2.5 Discrete Uniform Random Variable. We say that a discrete RV X is uniformly distributed over k possible values x1, x2, . . . , xk if its PMF is:

f(x) = p_i = 1/k if x ∈ {x1, x2, . . . , xk}, and 0 otherwise. (15)

The DF for the discrete uniform RV X (written here for the possible values x_i = i, as for a fair k-faced die) is:

F(x) = ∑_{x_i ≤ x} f(x_i) = ∑_{x_i ≤ x} p_i =
  0          if −∞ < x < 1 ,
  1/k        if 1 ≤ x < 2 ,
  2/k        if 2 ≤ x < 3 ,
  ...
  (k−1)/k    if k − 1 ≤ x < k ,
  1          if k ≤ x < ∞ . (16)

Example 2.5 The fair die toss RV X of Example 2.4 is a discrete uniform RV with possible values {x1, x2, x3, x4, x5, x6} = {1, 2, 3, 4, 5, 6}. Its PMF and DF are given by substituting k = 6 in Equations 15 and 16, respectively. These functions are depicted in Figure 1. Pay attention to the ◦ and • in the plot to relate them to Equations 15 and 16. The ◦, • and the dotted lines are used to depict how the values of f(x) and F(x) jump as x varies.

[Figure 1: f(x) = P(x) = 1/6 and F(x) of the fair die toss RV X of Example 2.4]

Exercise 2.2 Plot the PMF and DF in detail, along with the ◦, • and dotted lines, for the fair coin toss RV X of Example 2.2, and convince yourself that it is also a discrete uniform RV.


Exercise 2.3 Recall the first ball that pops out in a New Zealand Lotto trial of Example 1.12. First associate a RV X with this experiment that turns the integer-symbolised ball labels into real numbers in the set of possible values {1, 2, 3, . . . , 39, 40}. Then, give the PMF and DF for X and know how the plot should look.

Two useful formulae for discrete distributions are readily obtained as follows. For the probability corresponding to intervals we have

P(a < X ≤ b) = F(b) − F(a) = ∑_{a < x_i ≤ b} p_i . (17)

This is the sum of all probabilities p_i for which x_i satisfies a < x_i ≤ b. From this and P(Ω) = 1 we obtain the following formula stating that the sum of all the probabilities is 1:

∑_i p_i = 1 . (18)

2.1.2 Discrete non-uniform random variables with finitely many possibilities

Example 2.6 Astragali. Board games involving chance were known in Egypt, 3000 years before Christ. The element of chance needed for these games was at first provided by tossing astragali, the ankle bones of sheep. These bones could come to rest on only four sides, the other two sides being rounded. The upper side of the bone, broad and slightly convex, counted four; the opposite side, broad and slightly concave, counted three; the lateral side, flat and narrow, one; and the opposite narrow lateral side, which is slightly hollow, six. You may examine an astragali of a kiwi sheep (ask at Maths & Stats Reception to access it). A surmised PMF with f(4) = 4/10, f(3) = 3/10, f(1) = 2/10 and f(6) = 1/10, and the corresponding DF, are shown in Figure 2.

Exercise 2.4 Suppose we toss a possibly biased or unfair coin with a given probability 0 ≤ p ≤ 1 of H, i.e., P(H) = p and P(T) = 1 − p = q. Associate the RV X with this experiment to report the number of Heads in one trial. Compute P(X = 1), P(X = 2), P(X = 3), P(X = 0). Sketch the PMF and DF of X when p takes each of the following values: 0, 1/2, 1/3, 2/3, 1. Note that p is really a parameter of this RV X. We will see more parametrised random variables in the sequel.

Example 2.7 Let the random variable X denote the sum of two independent tosses of a fair die. This discrete RV has possible values in {2, 3, 4, . . . , 12}. There are a total of 6 × 6 = 36 equally likely outcomes (ω1, ω2) ∈ Ω = {(1, 1), (1, 2), . . . , (6, 6)}, where ω1 is the outcome of the first toss and ω2 is that of the second independent toss. Each such outcome (ω1, ω2) has probability 1/36. Now X = 2 occurs in the case of the outcome (1, 1); X = 3 in the case of the two outcomes (1, 2) and (2, 1); X = 4 in the case of the three outcomes (1, 3), (2, 2), (3, 1); and so on, as shown by the mapping in Figure 3. Hence, f(x) = P(X = x) and F(x) = P(X ≤ x) have the values shown in Table 1. Figure 3 shows the plots of f(x) and F(x).


[Figure 2: f(x) and F(x) of an astragali toss RV X of Example 2.6]

[Figure 3: f(x) and F(x) of RV X for the sum of two independent tosses of a fair die. (a) X : Ω → {2, 3, 4, . . . , 11, 12}, P(ω) = 1/36 for any ω ∈ Ω; (b) PMF f(x) and DF F(x)]

x      2     3     4     5      6      7      8      9      10     11     12
f(x)   1/36  2/36  3/36  4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36
F(x)   1/36  3/36  6/36  10/36  15/36  21/36  26/36  30/36  33/36  35/36  36/36

Table 1: f(x) and F(x) for the sum of two independent tosses of a fair die RV X.
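Table 1 can be reproduced by brute-force enumeration of the 36 equally likely outcomes. A minimal sketch using exact fractions:

    from fractions import Fraction
    from itertools import product

    # PMF of X = w1 + w2 over the 36 outcomes (w1, w2), each of probability 1/36
    pmf = {}
    for w1, w2 in product(range(1, 7), repeat=2):
        x = w1 + w2
        pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, 36)

    # DF as the running sum of the PMF
    cum = Fraction(0)
    for x in sorted(pmf):
        cum += pmf[x]
        print(x, pmf[x], cum)  # e.g. x = 7 gives 1/6 (= 6/36) and 7/12 (= 21/36)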


Example 2.8 Compute the probability of a sum of at least 4 and at most 8 from Table 1 in Example 2.7. From Equation 17 we get:

P(3 < X ≤ 8) = F(8) − F(3) = 26/36 − 3/36 = 23/36

Recommended Activity 2.1 You can get a nice treatment of the sum of two independent tosses of a fair die in ten minutes and seven seconds by watching the following YouTube video:
http://www.youtube.com/v/2XToWi9j0Tk&hl=en_US&fs=1&amp;rel=0&amp;border=1

Next we see one of the most basic parametric models of discrete random variables.

Definition 2.6 Bernoulli(θ) Random Variable. Given a parameter θ ∈ (0, 1), the probability mass function (PMF) for the Bernoulli(θ) RV X is:

f(x; θ) = θ if x = 1, 1 − θ if x = 0, and 0 otherwise,

and its DF is:

F(x; θ) = 1 if 1 ≤ x, 1 − θ if 0 ≤ x < 1, and 0 otherwise.

We emphasise the dependence of the probabilities on the parameter θ by specifying it following the semicolon in the argument for f and F.

Example 2.9 Let the random variable X return 1 if we observe a H and 0 otherwise when we toss a possibly biased coin with parameter θ ∈ (0, 1). Then, P(H) = P(X = 1) = θ and X is a Bernoulli(θ) RV.

2.1.3 Discrete non-uniform random variables with infinitely many possibilities

Let us now look at a discrete RV that has countably infinitely many possibilities in the set of natural numbers N = {1, 2, 3, . . .}.

Example 2.10 Waiting For the First Heads. Suppose our experiment is to toss a fair coin independently and identically (that is, the same coin is tossed in essentially the same manner, independently of the other tosses, in each trial) as often as necessary until we have a heads, denoted by H. Let the RV X denote the number of trials until the first H appears. Then, clearly, the set of possible values X can take is {1, 2, 3, . . .}. Let us compute the PMF of X


by independence of events:

f(1) = P(X = 1) = P(H) = 1/2 ,
f(2) = P(X = 2) = P(TH) = 1/2 · 1/2 = (1/2)² ,
f(3) = P(X = 3) = P(TTH) = 1/2 · 1/2 · 1/2 = (1/2)³ , etc.

and in general

f(x) = P(X = x) = (1/2)^x , x = 1, 2, . . . .

Example 2.11 Recall the experiment in Exercise 2.4 of tossing a possibly biased coin with a fixed parameter θ = P(H), where 0 < θ < 1. Now suppose you use such a coin in the waiting for the first Heads experiment with RV X in Example 2.10. Confirm that the probabilities indeed sum to 1 by the fact that the x-th partial sums S_x = a(1 − r^x)/(1 − r) of the geometric series ∑_{x=0}^{∞} a r^x = a + ar + ar² + ar³ + · · · converge to S = a/(1 − r) if −1 < r < 1. Let us compute the θ-specific PMF of X by independence of events:

f(1; θ) = P(X = 1) = P(H) = (1 − θ)⁰ θ = θ ,
f(2; θ) = P(X = 2) = P(TH) = (1 − θ)¹ θ ,
f(3; θ) = P(X = 3) = P(TTH) = (1 − θ)² θ , etc.

and in general

f(x; θ) = P(X = x) = (1 − θ)^{x−1} θ , x = 1, 2, . . . .

And, we already saw that this series converges if 0 < 1 − θ < 1:

lim_{x→∞} F(x; θ) = f(1; θ) + f(2; θ) + f(3; θ) + · · · = θ / (1 − (1 − θ)) = θ/θ = 1 .

We have just derived the PMF of a θ-parametric family of discrete random variables that can take countably infinitely many values in {1, 2, 3, . . .}. We also showed that the PMF sums to 1, as it should. This is called the geometric distribution with "success probability" parameter θ, for obvious reasons.
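The geometric PMF can also be checked by simulation. A sketch with the arbitrary choice θ = 0.25 (the seed is fixed only for reproducibility):

    from random import random, seed

    seed(2)
    theta, n = 0.25, 100_000

    def trial():
        # toss until the first H; each toss is H with probability theta
        x = 1
        while random() >= theta:
            x += 1
        return x

    counts = {}
    for _ in range(n):
        x = trial()
        counts[x] = counts.get(x, 0) + 1

    for x in (1, 2, 3):  # observed frequency vs (1 - theta)^(x-1) * theta
        print(x, counts[x] / n, (1 - theta) ** (x - 1) * theta)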

Definition 2.7 Binomial(n, θ) Random Variable. Let the RV X = ∑_{i=1}^{n} X_i be the sum of n independent and identically distributed Bernoulli(θ) RVs, i.e.:

X = ∑_{i=1}^{n} X_i , X1, X2, . . . , Xn IID∼ Bernoulli(θ) .

Given two parameters n and θ, the PMF of the Binomial(n, θ) RV X is:

f(x; n, θ) = (n choose x) θ^x (1 − θ)^{n−x} if x ∈ {0, 1, 2, 3, . . . , n}, and 0 otherwise,


where (n choose x) is:

(n choose x) = n(n − 1)(n − 2) · · · (n − x + 1) / (x(x − 1)(x − 2) · · · (2)(1)) = n! / (x!(n − x)!) .

(n choose x) is read as "n choose x" — the number of ways of choosing x objects from n of them.

Example 2.12 Find the probability that seven of ten persons will recover from a tropical disease if we can assume independence and the probability is identically 0.80 that any one of them will recover from the disease.

Substituting x = 7, n = 10, and θ = 0.8 into the formula for the binomial distribution, we get:

f(7; 10, 0.8) = (10 choose 7) × (0.8)⁷ × (1 − 0.8)^{10−7} = (10! / ((10 − 7)! 7!)) × (0.8)⁷ × (0.2)³ ,

and find that the result is 120 × (0.8)⁷ × (0.2)³ ≈ 0.20.
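In Python the same number can be computed with math.comb, which evaluates "n choose x". A minimal sketch of Example 2.12:

    from math import comb

    n, theta, x = 10, 0.8, 7
    # Binomial(n, theta) PMF at x
    print(comb(n, x) * theta**x * (1 - theta)**(n - x))  # ≈ 0.2013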

Exercise 2.5 Compute the probability of obtaining at least two 6's in rolling a fair die independently and identically four times.

P(at least two 6's) = = = .

Definition 2.8 Poisson(λ) Random Variable. Given a parameter λ > 0, the PMF of the Poisson(λ) RV X is:

f(x; λ) = (λ^x / x!) exp(−λ) , where x = 0, 1, . . . (19)

For some values of λ, it can be proved that this distribution is obtained as a limiting case of the Binomial(n, θ) RV, if we let θ → 0 and n → ∞ so that the product nθ approaches a finite value. For instance, λ = nθ may be kept constant. Thus, the Poisson(λ) RV is really a limit of the Binomial(n, θ) RV as n → ∞, θ → 0 and nθ = λ.

Example 2.13 If the probability of producing a defective screw is 0.01, what is the probability that a lot of 100 screws will contain more than 2 defectives?

Let the complementary event, that there are no more than two defectives, be A^c. For its probability we use the Binomial(n = 100, θ = 0.01) RV with nθ = 100 × 0.01 = 1. Then,

P(A^c) = (100 choose 0) × 0.99¹⁰⁰ + (100 choose 1) × 0.01 × 0.99⁹⁹ + (100 choose 2) × 0.01² × 0.99⁹⁸ = 92.06% .


Since θ is very small, we can approximate this by the much more convenient Poisson(λ) RV with λ = nθ = 100 × 0.01 = 1, obtaining

P(A^c) ≈ e^{−1} (1⁰/0! + 1¹/1! + 1²/2!) = e^{−1} (1 + 1 + 1/2) = 91.97% .

Thus P(A) = 1 − P(A^c) = 1 − 91.97% = 8.03% under the Poisson(λ = 1) approximation. Since the binomial distribution gives P(A) = 1 − P(A^c) = 1 − 92.06% = 7.94%, the Poisson approximation is quite good.
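Both numbers are easy to reproduce with the standard library. A sketch comparing the exact Binomial(100, 0.01) probability with its Poisson(1) approximation:

    from math import comb, exp, factorial

    n, theta, lam = 100, 0.01, 1.0

    # P(A^c) = P(at most 2 defectives), exactly and approximately
    binom = sum(comb(n, x) * theta**x * (1 - theta)**(n - x) for x in range(3))
    poisson = sum(exp(-lam) * lam**x / factorial(x) for x in range(3))

    print(binom, poisson)          # ≈ 0.9206 and ≈ 0.9197
    print(1 - binom, 1 - poisson)  # P(A): ≈ 0.0794 vs ≈ 0.0803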

Example 2.14 If, on average, 2 cars enter a certain parking lot per minute, what is the probability that during any given minute 4 or more cars will enter the lot?

To understand that the Poisson distribution is a model of the situation, we imagine the minute to be divided into very many short time intervals, let θ be the (constant) probability that a car will enter the lot during any such short interval, and assume independence of the events that happen during those intervals. Then we are dealing with a binomial distribution with very large n and very small θ, which we can approximate by the Poisson distribution with

λ = nθ = 2 ,

because 2 cars enter on the average. The complementary event of the event 'four cars or more during a given minute' is 'three cars or fewer enter the lot' and has the probability

f(0; 2) + f(1; 2) + f(2; 2) + f(3; 2) = e^{−2} (2⁰/0! + 2¹/1! + 2²/2! + 2³/3!) ≈ 0.857 .

Therefore, the probability of interest is 1 − 0.857 = 14.3%.

2.2 Continuous Random Variables and Distributions

Discrete random variables appear in experiments in which we count (defectives in a production, days of sunshine in Christchurch, customers standing in a line, number of buses that will arrive at the Orbiter bus-stop in the next hour, etc.). Continuous random variables appear in experiments in which we measure (lengths of screws, voltage in a power line, cubic inches of rain on this lecture theatre, etc.).

Example 2.15 The random variable X is the exact amount of rain in inches over the roof of this lecture theatre tomorrow:

X(ω) = x : Ω → [0, ∞) .

This is an example of a continuous random variable that takes one of (uncountably) infinitely many values in the half-line [0, ∞). So, when it rains tomorrow, X(ω) will take a value x, and this x could be 1.1” of rain, or this x could be 1.10000001” of rain, or this x could be 87.8798787123456”, etc. Do you see why this random variable X can't take values in (−∞, 0)?


[Figure 4: Probability density function of the volume of rain in cubic inches over the lecture theatre tomorrow.]

Exercise 2.6 For the continuous random variable X of Example 2.15 it is more interesting to make probability statements such as P(X = x) about the actual amount of rain x that will fall on the lecture theatre tomorrow. Can you see why the following statements are true?

P(X = 1.1) = P(X = 1.10000001) = P(X = 87.8798787123456) = 0

In fact, for this continuous random variable P(X = x) = 0 for any real number x. Stop to understand this! After you have understood that P(X = x) = 0 for any x ∈ R, it should not be surprising that P(1.1 ≤ X ≤ 1.10000001) can now possibly be more than 0.

Recommended Activity 2.2 You can get a nice informal treatment of the contents of § 2.1 and § 2.2 in ten minutes and two seconds by watching the following YouTube video:
http://www.youtube.com/v/Fvi9A_tEmXQ&amp;hl=en_US&amp;fs=1&amp;rel=0&amp;border=1

Definition 2.9 Continuous random variables take values on a continuous scale. A random variable X is continuous if its distribution function F(x) can be given by an integral

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(v) dv (20)

(we write v because x is needed as the upper limit of the integral) whose integrand f(x), called the probability density function (PDF) of the distribution, is non-negative and is continuous, perhaps except for finitely many x-values. Differentiation gives the relation of f to F as

f(x) = F′(x) (21)

for every x at which f(x) is continuous.

Then we obtain the very important formula for the probability corresponding to an interval:

P(a < X ≤ b) = F(b) − F(a) = ∫_{a}^{b} f(v) dv . (22)

This is the analog of Equation (17).


From Equation (20) and P(Ω) = 1 we also have the analog of Equation (18):

∫_{−∞}^{∞} f(v) dv = 1 .

Continuous random variables are simpler than discrete ones with respect to intervals. Indeed, in the continuous case the four probabilities corresponding to a < X ≤ b, a < X < b, a ≤ X < b, and a ≤ X ≤ b with any fixed a and b (> a) are all the same.

Definition 2.10 Given a distribution function F(x) = P(X ≤ x) = u, the inverse distribution function is defined as

F^{[−1]}(u) = inf{x ∈ R : u < F(x)} ,

where inf is called the infimum, which is effectively the smallest value x that the random variable X can take in order to satisfy the probability condition. F^{[−1]}(u) is also known as the quantile or percentile.

The next example illustrates notations and typical applications of our present formulae.

Example 2.16 Let X have the density function f(x) = e^{−x} if x ≥ 0, and zero otherwise. Find the distribution function. Find the probabilities P(1/4 ≤ X ≤ 2) and P(−1/2 ≤ X ≤ 1/2). Find x such that P(X ≤ x) = 0.95.

1. F(x) = ∫_{0}^{x} e^{−v} dv = [−e^{−v}]_{0}^{x} = −e^{−x} + 1 = 1 − e^{−x} if x ≥ 0. Therefore,

F(x) = 1 − e^{−x} if x ≥ 0, and 0 otherwise.

2. P(1/4 ≤ X ≤ 2) = ∫_{1/4}^{2} e^{−v} dv = F(2) − F(1/4) ≈ 64.35%

3. P(−1/2 ≤ X ≤ 1/2) = ∫_{−1/2}^{0} f(v) dv + ∫_{0}^{1/2} e^{−v} dv = 0 + F(1/2) ≈ 39.35%

4. P(X ≤ x) = F(x) = 1 − e^{−x} = 0.95. Therefore,

x = −log(1 − 0.95) ≈ 2.9957 .
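A quick numerical check of these three answers (a sketch; math.exp and math.log are all we need):

    from math import exp, log

    # DF of Example 2.16: F(x) = 1 - e^(-x) for x >= 0, else 0
    F = lambda x: 1 - exp(-x) if x >= 0 else 0.0

    print(F(2) - F(0.25))    # P(1/4 <= X <= 2)    ≈ 0.6435
    print(F(0.5) - F(-0.5))  # P(-1/2 <= X <= 1/2) ≈ 0.3935
    print(-log(1 - 0.95))    # x with F(x) = 0.95  ≈ 2.9957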


The previous example is a special case of the following parametric family of random variables.

Definition 2.11 Exponential(λ) Random Variable. Given a rate parameter λ > 0, the Exponential(λ) random variable X has probability density function given by:

f(x; λ) = λ exp(−λx) if x > 0, and 0 otherwise,

and distribution function given by:

F(x; λ) = 1 − exp(−λx) .

The Exponential(λ) RV gives the waiting times between successive events of a Poisson(λ) RV that is counting the number of events in unit time.

Example 2.17 At a certain location on a highway, the number of cars exceeding the speed limit by more than 10 kilometres per hour in half an hour is a Poisson(λ = 8.4) random variable. What is the probability of a waiting time of less than 5 minutes between cars exceeding the speed limit by more than 10 kilometres per hour?

Using half an hour as the unit of time, we have Poisson(λ = 8.4) giving the number of arrivals in unit time. Therefore, the waiting time is a random variable having an exponential distribution with λ = 8.4, and since 5 minutes is 1/6 of the unit of time, we find that the desired probability is

∫_{0}^{1/6} 8.4 e^{−8.4x} dx = [−e^{−8.4x}]_{0}^{1/6} = −e^{−1.4} + 1 ,

which is approximately 0.75.

Exercise 2.7 The number of planes arriving per day at a small private airport is a random variable having a Poisson distribution with parameter equal to 28.8. What is the probability that the time between two such arrivals is at least 1 hour?

Definition 2.12 Uniform(a, b) Random Variable. The distribution with the density

f(x) = 1/(b − a) if a < x < b,

and f = 0 otherwise, is called the uniform distribution on the interval a < x < b. The cumulative distribution function of the uniform RV is

F(x) = 0 if x < a, (x − a)/(b − a) if a ≤ x < b, and 1 if x ≥ b.


Exercise 2.8 Find a probability density function for the random variable whose distribution function is given by

F(x) = 0 if x < 0, x if 0 ≤ x < 1, and 1 if x ≥ 1.

f(x) =

f(x) = .

Example 2.18 A machine pumps cleanser into a process at a rate which has a uniform distribution on the interval 8.00 to 10.00 litres per minute. What is the pump rate which the machine can be expected to exceed 61% of the time?

The density function of this distribution is

f(x) = 0.5 if 8 ≤ x ≤ 10, and 0 otherwise.

We want the rate c with P(X > c) = 0.61, i.e., F(c) = 0.39. Therefore,

c = 8 + 0.39/0.5 = 8 + 0.78 = 8.78 .

The pump rate 8.78 litres per minute is expected to be exceeded 61% of the time.

Exercise 2.9 The actual amount of coffee (in grams) in a 230-gram jar filled by a certain machine is a random variable whose probability density is given by

f(x) = 0 if x ≤ 227.5, 1/5 if 227.5 < x < 232.5, and 0 if x ≥ 232.5.

Find the probabilities that a 230-gram jar filled by this machine will contain

(a) at most 228.65 grams of coffee;

(b) anywhere from 229.34 to 231.66 grams of coffee;

(c) at least 229.85 grams of coffee.


Next we discuss the normal distribution. This is the most important continuous distribution, because in applications many random variables are normal random variables (that is, they have a normal distribution), or they are approximately normal, or can be transformed into normal random variables in a relatively simple fashion. Furthermore, the normal distribution is a useful approximation of more complicated distributions, and it also occurs in the proofs of various statistical tests.

Definition 2.13 Given a location parameter µ ∈ (−∞, +∞) and a scale parameter σ² > 0, the Normal(µ, σ²) or Gauss(µ, σ²) random variable has probability density function:

f(x; µ, σ²) = (1/(σ√(2π))) exp[−(1/2)((x − µ)/σ)²] (σ > 0) (23)

This is simpler than it may at first look. f(x; µ, σ²) has these features:

1. µ is the expected value or mean parameter and σ² is the variance parameter.

2. 1/(σ√(2π)) is a constant factor that makes the area under the curve of f(x) from −∞ to ∞ equal to 1, as it must be.

3. The curve of f(x) is symmetric with respect to x = µ because the exponent is quadratic. Hence for µ = 0 it is symmetric with respect to the y-axis x = 0.

4. The exponential function decays to zero very fast — the smaller the standard deviation σ, the faster the decay.

The normal distribution has the distribution function

F(x; µ, σ²) = (1/(σ√(2π))) ∫_{−∞}^{x} exp[−(1/2)((v − µ)/σ)²] dv . (24)

Here we needed x as the upper limit of integration and wrote v in the integrand.

[Figure 5: PDF and DF of Normal(µ, σ²) RV for different values of µ and σ²]

For the corresponding standardised normal distribution with mean 0 and variance 1, we denote F(z; 0, 1) by Φ(z). Then we simply have from Equation (24):

Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−u²/2} du .


This integral cannot be evaluated in closed form by the methods of calculus. But this is no serious problem, because its values can be obtained numerically and tabulated. These values are needed when working with the normal distribution. The curve of Φ(z) is S-shaped: it increases monotonically from 0 to 1 and intersects the vertical axis at 1/2.

Theorem 2.14 The distribution function F(x; µ, σ²) of the Normal(µ, σ²) RV with any µ and σ² is related to the standardised Normal(0, 1) RV with DF Φ(z) by:

F(x) = Φ((x − µ)/σ) .

Example 2.19 Let X be normal with mean 5 and standard deviation 0.2. Find c or k corresponding to the given probability:

P(X ≤ c) = 95%: Φ((c − 5)/0.2) = 95%, so (c − 5)/0.2 = 1.645 and c = 5.329 .

P(5 − k ≤ X ≤ 5 + k) = 90%: k/0.2 = 1.645, so k = 0.329 and 5 + k = 5.329 .

P(X ≥ c) = 1%, thus P(X ≤ c) = 99%: (c − 5)/0.2 = 2.326, so c = 5.465 .

Example 2.20 Suppose that the amount of cosmic radiation to which a person is exposed when flying by jet across the United States is a random variable having a normal distribution with a mean of 4.35 mrem and a standard deviation of 0.59 mrem. What is the probability that a person will be exposed to more than 5.20 mrem of cosmic radiation on such a flight?

Looking up the table entry corresponding to z = (5.20 − 4.35)/0.59 = 1.44 and subtracting it from 1, we get 1 − 0.9251 = 0.0749.
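Instead of the tables, Φ and its inverse are available in the Python standard library (statistics.NormalDist, Python 3.8+). A sketch reproducing Examples 2.19 and 2.20:

    from statistics import NormalDist

    X = NormalDist(mu=5, sigma=0.2)
    print(X.inv_cdf(0.95))  # c with P(X <= c) = 0.95: ≈ 5.329
    print(X.inv_cdf(0.99))  # c with P(X >= c) = 0.01: ≈ 5.465

    R = NormalDist(mu=4.35, sigma=0.59)
    print(1 - R.cdf(5.20))  # P(R > 5.20) ≈ 0.0749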

Exercise 2.10 Find the probabilities that a random variable having the standard normal distribution will take on a value

(a) less than 1.72;

(b) less than -0.88;

(c) between 1.30 and 1.75;


(d) between -0.25 and 0.45.


3 Expectations

We now learn about expectations of random variables as a way to summarise them.

Definition 3.1 The expectation of a function g : X → R of a random variable X is:

E(g(X)) = ∑_x g(x) f(x) if X is a discrete RV, and ∫_{−∞}^{∞} g(x) f(x) dx if X is a continuous RV.

Definition 3.2 The Population Mean characterises the central location of the RV X. It is the expectation of the function g(x) = x:

E(X) = ∑_x x f(x) if X is a discrete RV, and ∫_{−∞}^{∞} x f(x) dx if X is a continuous RV.

Often, the population mean is denoted by µ.

Definition 3.3 The Population Variance characterises the spread or variability of the RV X. It is the expectation of the function g(x) = (x − E(X))²:

V(X) = E((X − E(X))²) = ∑_x (x − E(X))² f(x) if X is a discrete RV, and ∫_{−∞}^{∞} (x − E(X))² f(x) dx if X is a continuous RV.

Often, the population variance is denoted by σ². Note that σ² ≥ 0, and σ² = 0 for a point mass random variable, that is, a discrete random variable which can only take one possible value.

The following formula for the variance is very useful: V(X) = E(X²) − (E(X))². We can prove the above formula by expanding the square as follows:

V(X) = E((X − E(X))²)
= E(X² − 2X E(X) + (E(X))²)
= E(X²) − E(2X E(X)) + E((E(X))²)
= E(X²) − 2E(X)E(X) + (E(X))²
= E(X²) − 2(E(X))² + (E(X))²
= E(X²) − (E(X))² .

Definition 3.4 The Population Standard Deviation is the square root of the variance, and it is often denoted by σ.

Example 3.1 Mean and Variance of a fair coin toss. The random variable X = Number of heads in a single toss of a fair coin has the possible values X = 0 and X = 1 with probabilities P(X = 0) = 1/2 and P(X = 1) = 1/2. From the definition we thus obtain the mean

µ = E(X) = 0 · 1/2 + 1 · 1/2 = 1/2 ,

and the variance as

σ² = E((X − E(X))²) = (0 − 1/2)² · 1/2 + (1 − 1/2)² · 1/2 = 1/4 .
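The same recipe works for any PMF stored as a dictionary. A minimal sketch of Example 3.1 (reusing it with P(X = 1) = 9/10 answers the next exercise):

    pmf = {0: 0.5, 1: 0.5}  # fair coin toss RV X: value -> probability

    mean = sum(x * p for x, p in pmf.items())               # E(X)
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())  # V(X)

    print(mean, var)  # 0.5 0.25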


Exercise 3.1 What are the population mean and variance of a biased coin with P(X = 1) = 9/10?

Example 3.2 Recall that the random variable X with density

f(x) = 1/(b − a) if a < x < b, and 0 otherwise,

is uniformly distributed on the interval [a, b]. From the definition of the expectation of a continuous random variable, we find that

is uniformly distributed on the interval [a, b]. From the definition of the expectation of acontinuous random variable, we find that

E(X) =

∫ ∞−∞

xf(x)dx =

∫ b

a

xf(x)dx =

∫ b

a

x1

b− adx =

1

b− a

∫ b

a

xdx

=1

b− a

(1

2x2

]x=b

x=a

dx =1

b− a(b2 − a2) =

1

b− a(b+ a)(b− a) = (a+ b)/2 ,

since

E(X2) =

∫ b

a

x2f(x) dx =

∫ b

a

x2 1

b− adx =

1

b− a

∫ b

a

x2 dx =1

b− a

(1

3x3

]x=b

x=a

=1

b− a1

3(b3 − a3)

=1

3

1

b− a(b− a)(b2 + ab+ a2) =

b2 + ab+ a2

3.

Therefore variance is

V (X) = E(X)− (E(X))2 =b2 + ab+ a2

3−(

(a+ b)

2

)2

=b2 + ab+ a2

3− a2 + 2ab+ b2

4

=(b− a)2

12.
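These integrals are easy to verify by numerical quadrature. A minimal sketch, assuming SciPy is available and using the arbitrary endpoints a = 2, b = 7:

```python
# Check E(X) = (a+b)/2 and V(X) = (b-a)^2/12 for the Uniform(a, b) density.
from scipy.integrate import quad

a, b = 2.0, 7.0                  # arbitrary endpoints with a < b
f = lambda x: 1.0 / (b - a)      # uniform density on (a, b)

EX, _ = quad(lambda x: x * f(x), a, b)
EX2, _ = quad(lambda x: x ** 2 * f(x), a, b)

print(EX, (a + b) / 2)                   # both 4.5
print(EX2 - EX ** 2, (b - a) ** 2 / 12)  # both 25/12, about 2.0833
```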

Example 3.3 Mean and variance of the discrete uniform random variable with outcomes 1, 2, . . . , k, say for the fair k-faced die, based on Faulhaber's formula for ∑_{i=1}^{k} i^m with m ∈ {1, 2}, are

E(X) = (1/k)(1 + 2 + · · · + k) = (1/k) · k(k + 1)/2 = (k + 1)/2,

E(X²) = (1/k)(1² + 2² + · · · + k²) = (1/k) · k(k + 1)(2k + 1)/6 = (2k² + 3k + 1)/6,

V(X) = E(X²) − (E(X))² = (2k² + 3k + 1)/6 − ((k + 1)/2)²
     = (2k² + 3k + 1)/6 − (k² + 2k + 1)/4
     = (8k² + 12k + 4 − 6k² − 12k − 6)/24 = (2k² − 2)/24 = (k² − 1)/12.
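As a concrete check, a brute-force computation for the fair six-faced die (k = 6) should agree with the closed forms above; a short sketch:

```python
# Fair k-faced die: compare brute-force E(X) and V(X) with the closed forms.
k = 6
outcomes = range(1, k + 1)

EX = sum(outcomes) / k
EX2 = sum(x ** 2 for x in outcomes) / k
V = EX2 - EX ** 2

print(EX, (k + 1) / 2)       # both 3.5
print(V, (k ** 2 - 1) / 12)  # both 35/12, about 2.9167
```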


Exercise 3.2 Find the mean and variance of the discrete uniform random variable X with 40 equi-probable outcomes 1, 2, . . . , 40. Think of X as the probability model of the first ball label in one NZ Lotto trial.

Example 3.4 Mean and Variance of Poisson random variable X with parameter λ. Recall the Taylor series of e^λ:

e^λ = 1 + λ + λ²/2! + λ³/3! + λ⁴/4! + . . . = ∑_{x=0}^{∞} λ^x/x!.

By using this fact, the population mean is

E(X) = ∑_{x=0}^{∞} x f(x; λ) = ∑_{x=0}^{∞} x e^{−λ} λ^x/x! = e^{−λ} ∑_{x=1}^{∞} λ · λ^{x−1}/(x − 1)! = e^{−λ} λ e^λ = λ.

The population variance is

V(X) = E(X²) − (E(X))² = λ + λ² − λ² = λ,

since

E(X²) = ∑_{x=0}^{∞} x² e^{−λ} λ^x/x! = λ e^{−λ} ∑_{x=1}^{∞} x λ^{x−1}/(x − 1)!
      = λ e^{−λ} (1 + 2λ + 3λ²/2! + 4λ³/3! + . . .)
      = λ e^{−λ} ([1 + λ + λ²/2! + λ³/3! + . . .] + [λ + 2λ²/2! + 3λ³/3! + . . .])
      = λ e^{−λ} (e^λ + λ[1 + λ + λ²/2! + . . .])
      = λ e^{−λ} (e^λ + λ e^λ)
      = λ(1 + λ) = λ + λ².

Note that the Poisson RV has the same mean and variance, λ.
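This identity is easy to confirm numerically by truncating the sums; a sketch with the arbitrary choice λ = 2.5 (the tail beyond x = 60 is negligible at this λ):

```python
# Verify the Poisson mean and variance by truncated summation.
from math import exp, factorial

lam = 2.5
pmf = lambda x: exp(-lam) * lam ** x / factorial(x)

xs = range(60)  # truncation point; the remaining tail is negligible
EX = sum(x * pmf(x) for x in xs)
EX2 = sum(x ** 2 * pmf(x) for x in xs)

print(EX, EX2 - EX ** 2)  # both approximately lam = 2.5
```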

Exercise 3.3 Show that the expectation and variance of the exponentially distributed non-negative random variable X with rate parameter λ and density f(x) = λ exp(−λx) are:

E(X) = 1/λ, V(X) = 1/λ².


Exercise 3.4 What is the mean life of a light bulb whose life X [hours] has the density f(x) = 0.001e^{−0.001x} (x ≥ 0)?

The expectation and variance of the Normal random variable X with mean parameter µ and variance parameter σ², with density

f(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)),

are:

E(X) = µ, V(X) = σ².

Theorem 3.5 Chebychev’s Inequality


Suppose the random variable X has finite E(X²). Then for any constant c > 0 we have

P(|X| ≥ c) ≤ E(X²)/c².

Proof: We will carry out the proof for a countably valued X and leave the analogous proof for the density case as an exercise. The idea of the proof is the same for a general random variable. Suppose that X takes the values x_i with probabilities p_i. Then we have

E(X²) = ∑_i p_i x_i².

If we consider only those values x_i satisfying the inequality |x_i| ≥ c and denote by A the corresponding set of indices i, namely A = {i : |x_i| ≥ c}, then of course x_i² ≥ c² for i ∈ A, whereas

P(|X| ≥ c) = ∑_{i∈A} p_i.

Then if we sum the index i only over the partial set A, we have

E(X²) ≥ ∑_{i∈A} p_i x_i² ≥ ∑_{i∈A} p_i c² = c² ∑_{i∈A} p_i = c² P(|X| ≥ c).

Example 3.5 Let X be any continuous random variable with E(X) = µ and V(X) = σ². Then, applying the inequality to X − µ with c = kσ, that is, k standard deviations for some integer k, we get

P(|X − µ| ≥ kσ) ≤ σ²/(k²σ²) = 1/k²,

just as in the discrete case.
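The bound is easy to see in a simulation; the sketch below uses normally distributed data as an arbitrary example and compares the empirical tail probability with 1/k².

```python
# Compare the empirical tail P(|X - mu| >= k*sigma) with Chebychev's bound 1/k^2.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 10.0, 3.0
x = rng.normal(mu, sigma, size=1_000_000)

for k in (2, 3, 4):
    empirical = np.mean(np.abs(x - mu) >= k * sigma)
    print(k, empirical, 1 / k ** 2)  # the empirical tail sits well below the bound
```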


4 Tutorial for Week 1

4.1 Preparation Problems (Homework)

Exercise 4.1 [§ 1.1] Venn Diagrams.

1. Using Venn diagrams, graph and check the rules

A ∪ (B ∩ C) = (A ∪B) ∩ (A ∪ C)

A ∩ (B ∪ C) = (A ∩B) ∪ (A ∩ C)

2. Using a Venn diagram, show that A ⊆ B if and only if A ∪ B = B.

3. Show that, by the definition of complement, for any subset A of a sample space Ω, (A^c)^c = A, Ω^c = ∅, ∅^c = Ω, A ∪ A^c = Ω, A ∩ A^c = ∅.

Exercise 4.2 [§ 1.2] Find the sample space for the experiment:

1. Tossing 2 coins whose faces are sprayed with black paint denoted by B and white paint denoted by W.

2. Drawing 4 screws from a lot of left-handed and right-handed screws denoted by L and R, respectively.

Exercise 4.3 [§ 1.3] Suppose we pick a letter at random from the word WAIMAKARIRI. What is the sample space Ω and what probabilities should be assigned to the outcomes?

Exercise 4.4 [§ 1.3] In the toss of an unfair die experiment with Ω = {1, 2, 3, 4, 5, 6}, the probability of the event A = {1, 3, 5} is 1/3. What is the probability of the event B = {2, 4, 6}?

Exercise 4.5 [§ 2.1]

1. What gives the greater probability of hitting some target at least once: (a) hitting in a shot with probability 1/2 and firing 1 shot, or (b) hitting in a shot with probability 1/4 and firing 2 shots? First guess. Then calculate.

2. In rolling two fair dice, what is the probability of obtaining a sum greater than 4 but not exceeding 7?

38

Page 39: Elements of Probability and Statistics

3. A local country club has a membership of 600 and operates facilities that include an 18-hole championship golf course and 12 tennis courts. Before deciding whether to accept new members, the club president would like to know how many members regularly use each facility. A survey of the membership indicates that 70% regularly use the golf course, 50% regularly use the tennis courts, and 5% use neither of these facilities regularly. Given that a randomly selected member uses the tennis courts regularly, find the probability that they also use the golf course regularly.

4. Let X be the number of years before a particular type of machine will need replacement. Assume that X has the probability function f(1) = 0.1, f(2) = 0.2, f(3) = 0.2, f(4) = 0.2, f(5) = 0.3. Graph f and F. Find the probability that the machine needs no replacement during the first 3 years.

5. A box contains 4 right-handed and 6 left-handed screws. Two screws are drawn at random without replacement. Let X be the number of left-handed screws drawn. Find the probabilities P(X = 0), P(X = 1), P(X = 2), P(1 < X < 2), P(X ≤ 1), P(X ≥ 1), P(X > 1), and P(0.5 < X < 10).

6. One number in the following table for the probability function of a random variable X is incorrect. Which is it, and what should the correct value be?

x        1    2    3    4    5
P(X = x) 0.07 0.10 1.10 0.32 0.40

4.2 In Tutorial Problems

Exercise 4.6 [§ 1.1] Using Venn diagrams, graph and check De Morgan's Laws:

1. (A ∪ B)^c = A^c ∩ B^c

2. (A ∩ B)^c = A^c ∪ B^c

Exercise 4.7 [§ 1.2] Graph a sample space for the experiment:

1. Rolling 2 dice each of whose faces are marked by numbers 1, 2, 3, 4, 5 and 6

2. Tossing a coin until the first H appears

3. Rolling a die until the first 6 appears

Exercise 4.8 [§ 1.3] There are seventy five balls in total inside the Bingo Machine. Each ball is labelled by one of the following five letters: B, I, N, G, and O. There are fifteen balls labelled by each letter. The letter on the first ball that comes out of a BINGO machine after it has been well-mixed is the outcome of our experiment. Formalise this experiment and the associated probability model step-by-step.

39

Page 40: Elements of Probability and Statistics

1. First, the sample space is:

2. The probabilities of simple events are:

3. Check if Axiom (1) is satisfied:

4. Is Axiom (3) satisfied for the simple events B and I?

5. Using the addition rule for mutually exclusive events check that Axiom (2) is satisfiedfor the simple events.

6. Consider the following events: C = {B, I, G} and D = {G, I, N}. Using the addition rule for two arbitrary events compute P(C ∪ D).

Exercise 4.9 Associate a RV X with the BINGO experiment of Exercise 4.8. Note that you can choose any RV for this job. Find the PMF and DF of X. Now define another RV Y for this same experiment that counts the number of balls labelled by a vowel in the outcome of one BINGO Trial. Is Y a discrete uniform RV? What is P(Y = 1)?

Exercise 4.10 Durrett (The Monty Hall problem). The problem is named for the host of the television show Let's Make A Deal in which contestants were often placed in situations like the following: Three curtains are numbered 1, 2, and 3. Behind one curtain is a car; behind the other two curtains are donkeys. You pick a curtain, say #1. To build some suspense the host opens up one of the two remaining curtains, say #3, to reveal a donkey. What is the probability you will win given that there is a donkey behind #3? Should you switch curtains and pick #2 if you are given the chance?
http://www.math.canterbury.ac.nz/SOCR/SOCR Experiments.html

Choose Monty Hall experiment.

http://www.math.canterbury.ac.nz/SOCR/SOCR Games.html

Choose Monty Hall game.
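If the applets are unavailable, the game is also easy to simulate. The sketch below estimates the winning probabilities for staying and switching; it complements, but does not replace, the conditional-probability argument asked for above.

```python
# Monte Carlo estimate of the Monty Hall winning probabilities.
import random

def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)    # curtain hiding the car
        pick = random.randrange(3)   # contestant's initial pick
        # Host opens a curtain that is neither the pick nor the car.
        opened = next(c for c in range(3) if c != pick and c != car)
        if switch:
            pick = next(c for c in range(3) if c != pick and c != opened)
        wins += (pick == car)
    return wins / trials

print(play(switch=False))  # about 1/3
print(play(switch=True))   # about 2/3
```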

Exercise 4.11 Based on past experience, 70% of students in a certain course pass the midterm exam. The final exam is passed by 80% of those who passed the midterm, but only by 40% of those who fail the midterm. What fraction of students pass the final?

Exercise 4.12 A small brewery has two bottling machines. Machine 1 produces 75% of the bottles and machine 2 produces 25%. One out of every 20 bottles filled by machine 1 is rejected for some reason, while one out of every 30 bottles filled by machine 2 is rejected. What is the probability that a randomly selected bottle comes from machine 1 given that it is accepted?

40

Page 41: Elements of Probability and Statistics

Exercise 4.13 A process producing microchips produces 5% defectives, at random. Each microchip is tested: the test will correctly detect a defective one 4/5 of the time, and if a good microchip is tested the test will declare it is defective with probability 1/10.

(a) If a microchip is chosen at random, and tested to be good, what was the probability that it was defective anyway?

(b) If a microchip is chosen at random, and tested to be defective, what was the probability that it was good anyway?

(c) If 2 microchips are tested and determined to be good, what is the probability that at least one is in fact defective?

Exercise 4.14 A gale is of force 1, force 2, or force 3, with probabilities 2/3, 1/4, 1/12 respectively.
Force 1 gales cause damage with probability 1/4;
force 2 gales cause damage with probability 2/3;
force 3 gales cause damage with probability 5/6.

(a) A gale is reported; what is the probability of it causing damage?

(b) If the gale DID cause damage, what are the probabilities that it was force 1; force 2; force 3?

(c) If the gale DID NOT cause damage, what are the probabilities that it was force 1; force 2; force 3?

Exercise 4.15 Of 200 adults, 176 own one TV set, 22 own two TV sets, and 2 own three TV sets. A person is chosen at random. What is the probability function of X, the number of TV sets owned by that person?

Exercise 4.16 Suppose a discrete random variable X has probability function given by

x        3   4   5   6   7   8   9   10  11  12  13
P(X = x) .07 .01 .09 .01 .16 .25 .20 .03 .02 .11 .05

(a) Construct a row of cumulative probabilities for this table. Using both the probabilities of individual values and cumulative probabilities, compute the probability that

41

Page 42: Elements of Probability and Statistics

(b) X ≤ 5 ,

(c) X > 9 ,

(d) X ≥ 9 ,

(e) X < 12 ,

(f) 5 ≤ X ≤ 9 ,

(g) 4 < X < 11 ,

(h) P (X = 14) ,

(i) P (X < 3) .

42

Page 43: Elements of Probability and Statistics

5 Tutorial for Week 2

5.1 Preparation Problems (Homework)

Exercise 5.1 Four fair coins are tossed simultaneously. Find the probability function of the random variable X = Number of heads and compute the probabilities of obtaining no heads, precisely 1 head, at least 1 head, not more than 3 heads.

Exercise 5.2 If the probability of hitting a target in a single shot is 10% and 10 shots are fired independently, what is the probability that the target will be hit at least once?

Exercise 5.3 If X has the probability function f(x) = k/2^x (x = 0, 1, 2, . . . ), what are k and P(X ≥ 4)?

Exercise 5.4 Let p = 1% be the probability that a certain type of light bulb will fail in a 240 hr test. Find the probability that a sign consisting of 10 such bulbs will burn 24 hours with no bulb failures.

Exercise 5.5 Given a density f(x) = k if −4 ≤ x ≤ 4 and 0 elsewhere, what is the value of k? Graph f and F.

Exercise 5.6 If the diameter X of axles has the density f(x) = k if 119.9 ≤ x ≤ 120.1 and 0 otherwise, how many defectives will a lot of 500 axles approximately contain if defectives are axles slimmer than 119.92 or thicker than 120.08?
Since the density must integrate to 1, k = 1/0.2 = 5. Hence the probability of one axle being defective is P(defective) = P(X < 119.92) + P(X > 120.08) = 5 × 0.02 + 5 × 0.02 = 0.2, and we expect 0.2 × 500 = 100 defective axles in a lot of 500.

5.2 In Tutorial Problems

Exercise 5.7 (Rutherford-Geiger experiments) In 1910, E. Rutherford and H. Geiger showed experimentally that the number of alpha particles emitted per second in a radioactive process is a random variable X having a Poisson distribution. If X has mean 0.5, what is the probability of observing two or more particles during any given second?

Exercise 5.8 Let p = 1% be the probability that a certain type of light bulb will fail in a 240 hr test. Find the probability that a sign consisting of 10 such bulbs will burn 24 hours with no bulb failures.

43

Page 44: Elements of Probability and Statistics

Exercise 5.9 Suppose that a certain type of magnetic tape contains, on the average, 2 defects per 100 meters. What is the probability that a roll of tape 300 meters long will contain (a) x defects, (b) no defects?

Exercise 5.10 Find the probability that none of the three bulbs in a traffic signal must be replaced during the first 1200 hours of operation if the probability that a bulb must be replaced is a random variable X with density f(x) = 6[0.25 − (x − 1.5)²] when 1 ≤ x ≤ 2 and f(x) = 0 otherwise, where x is time measured in multiples of 1000 hours.

Exercise 5.11 Suppose that certain bolts have length L = 200 + X mm, where X is a random variable with density f(x) = (3/4)(1 − x²) if −1 ≤ x ≤ 1 and 0 otherwise. Determine c so that with a probability of 95% a bolt will have any length between 200 − c and 200 + c.

Exercise 5.12 Let the random variable X with density f(x) = ke^{−x} if 0 ≤ x ≤ 2 and 0 otherwise be the time after which certain ball bearings are worn out. Find k and the probability that a bearing will last at least 1 year.
Since the density must integrate to 1 over [0, 2], k = 1/(1 − e^{−2}). Then

P(X ≥ 1) = 1 − P(X < 1)
         = 1 − ∫₀¹ ke^{−x} dx
         = 1 + k(e^{−1} − 1)
         = 1 + (e^{−1} − 1)/(1 − e^{−2})
         = 0.2689414
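A quick numerical check of both k and the probability, using nothing more than the exponential function:

```python
# Check k and P(X >= 1) for the density f(x) = k*exp(-x) on [0, 2].
from math import exp

k = 1 / (1 - exp(-2))      # from the normalisation: integral of f over [0, 2] = 1
p = 1 + k * (exp(-1) - 1)  # P(X >= 1) = 1 - P(X < 1)

print(k, p)  # k is about 1.1565, p is about 0.2689414
```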

Exercise 5.13 Assume that a new light bulb will burn out after t hours, where t is chosen from [0, ∞) with an exponential density

f(t) = λe^{−λt}.

In this context, λ is often called the failure rate of the bulb.

(a) Assume that λ = 0.01, and find the probability that the bulb will not burn out before T hours. This probability is often called the reliability of the bulb.

(b) For what T is the reliability of the bulb = 1/2?

Exercise 5.14 Choose a number B at random from the interval [0, 1] with uniform density.Find the probability that

44

Page 45: Elements of Probability and Statistics

(a) 1/3 < B < 2/3

(b) |B − 1/2| ≤ 1/4

(c) B < 1/4 or 1−B < 1/4

(d) 3B² < B

Exercise 5.15 IQ scores for school children are standardised so that they are approximately Normally distributed with a mean of 100 and a standard deviation of 15. What is approximately the probability that a randomly selected child has an IQ

(a) less than 80?

(b) between 85 and 110?

(c) greater than 120?

Exercise 5.16 We return to IQ scores that are approximately Normally distributed with a mean of 100 and a standard deviation of 15.

(a) What is the 80% percentile of IQ scores?

(b) What IQ score is exceeded by only the top 1% of children?

(c) Below what score do only the bottom 30% of children fall?

Exercise 5.17 What is the expected daily profit if a store sells X air conditioners per day with probability f(10) = 0.1, f(11) = 0.3, f(12) = 0.4, f(13) = 0.2 and the profit per conditioner is $55?

Exercise 5.18 If the mileage (in multiples of 1000 mi) after which a tire must be replaced is given by the random variable X with density f(x) = λe^{−λx} (x > 0), what mileage can you expect to get on one of these tires? Let λ = 0.04 and find the probability that a tire will last at least 40000 mi.

Exercise 5.19 A small filling station is supplied with gasoline every Saturday afternoon. Assume that its volume X of sales in ten thousands of gallons has the probability density f(x) = 6x(1 − x) if 0 ≤ x ≤ 1 and 0 otherwise. Determine the mean and the variance.

45

Page 46: Elements of Probability and Statistics

Exercise 5.20 Let X be normal with mean 80 and variance 9. Find P(X > 83), P(X < 81), P(X < 80), and P(78 < X < 82).

Exercise 5.21 If the lifetime X of a certain kind of automobile battery is Normally distributed with a mean of 4 yr and a standard deviation of 1 yr, and the manufacturer wishes to guarantee that battery for 3 yr, what percentage of the batteries will he or she have to replace under the guarantee?

Exercise 5.22 If the mathematics scores of the SAT college entrance exams for undergraduate admission in the U.S. are Normally distributed with mean 480 and standard deviation 100 and if some college sets 500 as the minimum score for new students, what percent of students will not reach that score?

Exercise 5.23 A manufacturer produces airmail envelopes whose weight is Normally distributed with mean µ = 1.95 grams and standard deviation σ = 0.025 grams. The envelopes are sold in lots of 1000. How many envelopes in a lot will be heavier than 2 grams?

Exercise 5.24 Find the mean and the variance of the random variable X

1. X = Number a fair die turns up

2. Uniform distribution on [0, 8]

3. f(x) = 2e^{−2x} (x ≥ 0)

46

Page 47: Elements of Probability and Statistics

For any given value z, its cumulative probability Φ(z) was generated by Excel formula NORMSDIST, as NORMSDIST(z).

z    Φ(z)   z    Φ(z)   z    Φ(z)   z    Φ(z)   z    Φ(z)   z    Φ(z)
0.01 0.5040 0.51 0.6950 1.01 0.8438 1.51 0.9345 2.01 0.9778 2.51 0.9940
0.02 0.5080 0.52 0.6985 1.02 0.8461 1.52 0.9357 2.02 0.9783 2.52 0.9941
0.03 0.5120 0.53 0.7019 1.03 0.8485 1.53 0.9370 2.03 0.9788 2.53 0.9943
0.04 0.5160 0.54 0.7054 1.04 0.8508 1.54 0.9382 2.04 0.9793 2.54 0.9945
0.05 0.5199 0.55 0.7088 1.05 0.8531 1.55 0.9394 2.05 0.9798 2.55 0.9946

0.06 0.5239 0.56 0.7123 1.06 0.8554 1.56 0.9406 2.06 0.9803 2.56 0.9948
0.07 0.5279 0.57 0.7157 1.07 0.8577 1.57 0.9418 2.07 0.9808 2.57 0.9949
0.08 0.5319 0.58 0.7190 1.08 0.8599 1.58 0.9429 2.08 0.9812 2.58 0.9951
0.09 0.5359 0.59 0.7224 1.09 0.8621 1.59 0.9441 2.09 0.9817 2.59 0.9952
0.10 0.5398 0.60 0.7257 1.10 0.8643 1.60 0.9452 2.10 0.9821 2.60 0.9953

0.11 0.5438 0.61 0.7291 1.11 0.8665 1.61 0.9463 2.11 0.9826 2.61 0.9955
0.12 0.5478 0.62 0.7324 1.12 0.8686 1.62 0.9474 2.12 0.9830 2.62 0.9956
0.13 0.5517 0.63 0.7357 1.13 0.8708 1.63 0.9484 2.13 0.9834 2.63 0.9957
0.14 0.5557 0.64 0.7389 1.14 0.8729 1.64 0.9495 2.14 0.9838 2.64 0.9959
0.15 0.5596 0.65 0.7422 1.15 0.8749 1.65 0.9505 2.15 0.9842 2.65 0.9960

0.16 0.5636 0.66 0.7454 1.16 0.8770 1.66 0.9515 2.16 0.9846 2.66 0.9961
0.17 0.5675 0.67 0.7486 1.17 0.8790 1.67 0.9525 2.17 0.9850 2.67 0.9962
0.18 0.5714 0.68 0.7517 1.18 0.8810 1.68 0.9535 2.18 0.9854 2.68 0.9963
0.19 0.5753 0.69 0.7549 1.19 0.8830 1.69 0.9545 2.19 0.9857 2.69 0.9964
0.20 0.5793 0.70 0.7580 1.20 0.8849 1.70 0.9554 2.20 0.9861 2.70 0.9965

0.21 0.5832 0.71 0.7611 1.21 0.8869 1.71 0.9564 2.21 0.9864 2.71 0.9966
0.22 0.5871 0.72 0.7642 1.22 0.8888 1.72 0.9573 2.22 0.9868 2.72 0.9967
0.23 0.5910 0.73 0.7673 1.23 0.8907 1.73 0.9582 2.23 0.9871 2.73 0.9968
0.24 0.5948 0.74 0.7704 1.24 0.8925 1.74 0.9591 2.24 0.9875 2.74 0.9969
0.25 0.5987 0.75 0.7734 1.25 0.8944 1.75 0.9599 2.25 0.9878 2.75 0.9970

0.26 0.6026 0.76 0.7764 1.26 0.8962 1.76 0.9608 2.26 0.9881 2.76 0.9971
0.27 0.6064 0.77 0.7794 1.27 0.8980 1.77 0.9616 2.27 0.9884 2.77 0.9972
0.28 0.6103 0.78 0.7823 1.28 0.8997 1.78 0.9625 2.28 0.9887 2.78 0.9973
0.29 0.6141 0.79 0.7852 1.29 0.9015 1.79 0.9633 2.29 0.9890 2.79 0.9974
0.30 0.6179 0.80 0.7881 1.30 0.9032 1.80 0.9641 2.30 0.9893 2.80 0.9974

0.31 0.6217 0.81 0.7910 1.31 0.9049 1.81 0.9649 2.31 0.9896 2.81 0.9975
0.32 0.6255 0.82 0.7939 1.32 0.9066 1.82 0.9656 2.32 0.9898 2.82 0.9976
0.33 0.6293 0.83 0.7967 1.33 0.9082 1.83 0.9664 2.33 0.9901 2.83 0.9977
0.34 0.6331 0.84 0.7995 1.34 0.9099 1.84 0.9671 2.34 0.9904 2.84 0.9977
0.35 0.6368 0.85 0.8023 1.35 0.9115 1.85 0.9678 2.35 0.9906 2.85 0.9978

0.36 0.6406 0.86 0.8051 1.36 0.9131 1.86 0.9686 2.36 0.9909 2.86 0.9979
0.37 0.6443 0.87 0.8078 1.37 0.9147 1.87 0.9693 2.37 0.9911 2.87 0.9979
0.38 0.6480 0.88 0.8106 1.38 0.9162 1.88 0.9699 2.38 0.9913 2.88 0.9980
0.39 0.6517 0.89 0.8133 1.39 0.9177 1.89 0.9706 2.39 0.9916 2.89 0.9981
0.40 0.6554 0.90 0.8159 1.40 0.9192 1.90 0.9713 2.40 0.9918 2.90 0.9981

0.41 0.6591 0.91 0.8186 1.41 0.9207 1.91 0.9719 2.41 0.9920 2.91 0.9982
0.42 0.6628 0.92 0.8212 1.42 0.9222 1.92 0.9726 2.42 0.9922 2.92 0.9982
0.43 0.6664 0.93 0.8238 1.43 0.9236 1.93 0.9732 2.43 0.9925 2.93 0.9983
0.44 0.6700 0.94 0.8264 1.44 0.9251 1.94 0.9738 2.44 0.9927 2.94 0.9984
0.45 0.6736 0.95 0.8289 1.45 0.9265 1.95 0.9744 2.45 0.9929 2.95 0.9984

0.46 0.6772 0.96 0.8315 1.46 0.9279 1.96 0.9750 2.46 0.9931 2.96 0.9985
0.47 0.6808 0.97 0.8340 1.47 0.9292 1.97 0.9756 2.47 0.9932 2.97 0.9985
0.48 0.6844 0.98 0.8365 1.48 0.9306 1.98 0.9761 2.48 0.9934 2.98 0.9986
0.49 0.6879 0.99 0.8389 1.49 0.9319 1.99 0.9767 2.49 0.9936 2.99 0.9986
0.50 0.6915 1.00 0.8413 1.50 0.9332 2.00 0.9772 2.50 0.9938 3.00 0.9987

Table 2: DF Table for the Standard Normal Distribution.
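The table entries can be regenerated in other environments as well; for instance, a sketch using SciPy's standard normal CDF in place of NORMSDIST:

```python
# Reproduce entries of Table 2: Phi(z), the standard normal CDF at z.
from scipy.stats import norm

for z in (0.01, 1.00, 1.96, 3.00):
    print(z, round(norm.cdf(z), 4))  # 0.504, 0.8413, 0.975, 0.9987
```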


For any given probability Φ, its standard normal quantile z(Φ) was generated by Excel Formula NORMSINV, as NORMSINV(Φ).

Φ%  z(Φ%)   Φ%  z(Φ%)   Φ%    z(Φ%)
1   −2.3263 41  −0.2275 81    0.8779
2   −2.0537 42  −0.2019 82    0.9154
3   −1.8808 43  −0.1764 83    0.9542
4   −1.7507 44  −0.1510 84    0.9945
5   −1.6449 45  −0.1257 85    1.0364

6   −1.5548 46  −0.1004 86    1.0803
7   −1.4758 47  −0.0753 87    1.1264
8   −1.4051 48  −0.0502 88    1.1750
9   −1.3408 49  −0.0251 89    1.2265
10  −1.2816 50   0.0000 90    1.2816

11  −1.2265 51   0.0251 91    1.3408
12  −1.1750 52   0.0502 92    1.4051
13  −1.1264 53   0.0753 93    1.4758
14  −1.0803 54   0.1004 94    1.5548
15  −1.0364 55   0.1257 95    1.6449

16  −0.9945 56   0.1510 96    1.7507
17  −0.9542 57   0.1764 97    1.8808
18  −0.9154 58   0.2019 98    2.0537
19  −0.8779 59   0.2275 99    2.3263
20  −0.8416 60   0.2533

21  −0.8064 61   0.2793 99.1  2.3656
22  −0.7722 62   0.3055 99.2  2.4089
23  −0.7388 63   0.3319 99.3  2.4573
24  −0.7063 64   0.3585 99.4  2.5121
25  −0.6745 65   0.3853 99.5  2.5758

26  −0.6433 66   0.4125 99.6  2.6521
27  −0.6128 67   0.4399 99.7  2.7478
28  −0.5828 68   0.4677 99.8  2.8782
29  −0.5534 69   0.4959 99.9  3.0902
30  −0.5244 70   0.5244

31  −0.4959 71   0.5534 99.91 3.1214
32  −0.4677 72   0.5828 99.92 3.1559
33  −0.4399 73   0.6128 99.93 3.1947
34  −0.4125 74   0.6433 99.94 3.2389
35  −0.3853 75   0.6745 99.95 3.2905

36  −0.3585 76   0.7063 99.96 3.3528
37  −0.3319 77   0.7388 99.97 3.4316
38  −0.3055 78   0.7722 99.98 3.5401
39  −0.2793 79   0.8064 99.99 3.7190
40  −0.2533 80   0.8416

Table 3: Quantile Table for the Standard Normal Distribution.
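Likewise, the quantiles can be reproduced with an inverse-CDF function; a sketch assuming SciPy:

```python
# Reproduce entries of Table 3: z(Phi), the standard normal quantile at Phi.
from scipy.stats import norm

for phi in (0.01, 0.50, 0.95, 0.9999):
    print(phi, round(norm.ppf(phi), 4))  # -2.3263, 0.0, 1.6449, 3.719
```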
