Date posted: 19-Dec-2015
Chapter 7(7b): Statistical Applications in Traffic Engineering
Chapter objectives: By the end of this chapter the student will be able to do the following. (We spend 3 lecture periods on this chapter. We skip simple descriptive statistics because they were covered in CE361.)
Lecture number and lecture objectives (after these lectures you will be able to):
Lecture 3 (Chap 7a file)
• Apply the basic principles of statistics contained in section 7.1 to traffic data analyses
• Explain the characteristics of the normal distribution, read the normal distribution table correctly (section 7.2), and get the necessary values from Excel
• Explain the meaning of confidence bounds and determine the confidence interval of the mean (section 7.3)
• Determine sample sizes for traffic data collection (section 7.4)
• Explain how random variables are added (section 7.5)
• Explain the implication of the central limit theorem (section 7.5.1)
• Explain the characteristics of various probabilistic distributions useful for traffic engineering studies and choose a correct distribution for the study (section 7.6)
Lecture 4a (Chap 7b file)
• Explain the special characteristics of the Poisson distribution and its usefulness to traffic engineering studies (section 7.7)
• Conduct a hypothesis test correctly (two-sided, one-sided, paired test, F-test) (section 7.8)
Lecture 4b (Chap 7 file)
• Conduct a Chi-square test to test hypotheses on an underlying distribution f(x) (section 7.8)
7.7 The Poisson distribution (“counting distribution” or “random arrival” discrete probability function)
$$P(X = x) = \frac{m^x e^{-m}}{x!}$$
with mean $\mu = m$ and variance $\sigma^2 = m$.
If this characteristic (mean equal to variance) is not met, the Poisson distribution theoretically does not apply.
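As a quick check of the mean-equals-variance property, here is a minimal sketch that evaluates the Poisson probabilities directly from the formula. The mean count m = 3.0 is a hypothetical value chosen only for illustration, and the sum is cut off at x = 59, where the remaining probability is negligible.

```python
import math

def poisson_pmf(x: int, m: float) -> float:
    """P(X = x) for a Poisson distribution with mean m."""
    return m ** x * math.exp(-m) / math.factorial(x)

m = 3.0  # hypothetical mean count per interval (not from the notes)

# Probabilities for x = 0..59; the tail beyond that is negligible for m = 3.
probs = [poisson_pmf(x, m) for x in range(60)]

mean = sum(x * p for x, p in enumerate(probs))
variance = sum((x - mean) ** 2 * p for x, p in enumerate(probs))

print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```

With field data, the same comparison is made between the sample mean and the sample variance of the observed counts.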
When n is large and p is small, the binomial distribution tends to approach the Poisson distribution with parameter m = np. Also, the binomial distribution approaches the normal distribution when np/(1-p) ≥ 9.
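The Poisson approximation to the binomial can be seen numerically. The sketch below compares the two probability functions for a hypothetical case (n = 1000, p = 0.003, so m = np = 3); these values are assumptions for illustration and are not from the textbook.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x: int, m: float) -> float:
    """P(X = x) for a Poisson distribution with mean m."""
    return m ** x * math.exp(-m) / math.factorial(x)

n, p = 1000, 0.003  # hypothetical: large n, small p
m = n * p           # Poisson parameter m = np

# Largest difference between the two probability functions over x = 0..20.
max_gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, m)) for x in range(21))
print(f"m = {m:.1f}, largest difference in P(X = x): {max_gap:.6f}")
```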
When time headways are exponentially distributed with mean = 1/λ, the number of arrivals in an interval T is Poisson distributed with mean = m = λT. Note that the unit of λ is veh/unit time (arrival rate).
(Read the sample problem on page 144, Table 7.5.)
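The link between exponential headways and Poisson counts can be illustrated with one number: the probability of no arrival in an interval T is the same as the probability that a headway exceeds T. This is a minimal sketch; the flow of 360 veh/h and the 30-second interval are hypothetical values.

```python
import math

flow_veh_per_hour = 360.0        # hypothetical arrival rate
lam = flow_veh_per_hour / 3600   # λ in veh/s
T = 30.0                         # interval length in seconds (hypothetical)

m = lam * T                      # Poisson mean for the interval, m = λT

# Poisson: probability of zero arrivals in T.
p_zero_arrivals = m ** 0 * math.exp(-m) / math.factorial(0)

# Exponential headways with mean 1/λ: probability that a headway exceeds T.
p_headway_exceeds_T = math.exp(-lam * T)

print(f"m = {m:.2f} veh per interval")
print(f"P(0 arrivals in T) = {p_zero_arrivals:.4f}")
print(f"P(headway > T)     = {p_headway_exceeds_T:.4f}")
```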
7.8 Hypothesis testing
Two distinct choices:
Null hypothesis, H0
Alternative hypothesis: H1
E.g. Inspect 100,000 vehicles, of which 10,000 vehicles are “unsafe.” This is the fact given to us.
H0: The vehicle being tested is “safe.”
H1: The vehicle being tested is “unsafe.”
In this inspection,
15% of the unsafe vehicles are determined to be safe → Type II error (bad error),
and 5% of the safe vehicles are determined to be unsafe → Type I error (economically bad, but safety-wise it is better than a Type II error).
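The vehicle counts behind these percentages follow from simple arithmetic, sketched here using only the numbers given in the example.

```python
total_vehicles = 100_000
unsafe_vehicles = 10_000
safe_vehicles = total_vehicles - unsafe_vehicles

type_ii_rate = 0.15  # unsafe vehicles judged safe
type_i_rate = 0.05   # safe vehicles judged unsafe

type_ii_count = round(unsafe_vehicles * type_ii_rate)
type_i_count = round(safe_vehicles * type_i_rate)

print(f"Type II errors (unsafe passed as safe): {type_ii_count}")
print(f"Type I errors (safe failed as unsafe):  {type_i_count}")
```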
Types of errors
Reality        Decision: Reject H0    Decision: Accept H0
H0 is true     Type I error           Correct
H1 is true     Correct                Type II error
Type I error: Reject a correct null hypothesis
Type II error: Fail to reject a false null hypothesis
We especially want to minimize Type II error.
Steps of hypothesis testing
State the hypothesis
Select the significance level
Compute sample statistics and estimate parameters
Compute the test statistic
Determine the acceptance and critical regions of the test statistic
Reject or do not reject H0
P(Type I error) = α (level of significance)
P(Type II error) = β
(See the binary case on pp. 145-146 to get a feel for Type II error.)
Dependence between α, β, and sample size n
There is a distinct relationship between the two probability values α and β and the sample size n for any hypothesis. The value of any one is found by using the test statistic and set values of the other two.
Given α and n, determine β. Usually the α and n values are the most crucial, so they are established and the β value is not controlled.
Given α and β, determine n. Set up the test statistic for α and β with an H0 value and an H1 value of the parameter and two different n values.
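To get a feel for the first case (given α and n, determine β), here is a minimal sketch for a one-sided upper z-test with a known standard deviation. The function name and all numeric values (H0 mean 50, H1 mean 53, σ = 8, n = 30 and 60) are hypothetical; they are not taken from the textbook.

```python
import math
from statistics import NormalDist

def type_ii_probability(mu0: float, mu1: float, sigma: float, n: int, alpha: float) -> float:
    """β for a one-sided upper z-test of H0: mean = mu0 against H1: mean = mu1 (mu1 > mu0)."""
    std_err = sigma / math.sqrt(n)
    # Decision point: reject H0 when the sample mean exceeds this value.
    decision_point = mu0 + NormalDist().inv_cdf(1 - alpha) * std_err
    # β = probability the sample mean falls below the decision point when H1 is true.
    return NormalDist(mu1, std_err).cdf(decision_point)

beta_n30 = type_ii_probability(mu0=50, mu1=53, sigma=8, n=30, alpha=0.05)
beta_n60 = type_ii_probability(mu0=50, mu1=53, sigma=8, n=60, alpha=0.05)

print(f"n = 30: beta = {beta_n30:.3f}")
print(f"n = 60: beta = {beta_n60:.3f}")
```

With α held fixed, the larger sample gives the smaller β, which is the dependence described above.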
The t (or z) statistic is:
$$t \text{ or } z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
7.8.1 Before-and-after tests with two distinct choices
Here we are comparing means; hence divide σ by sqrt(n).
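A minimal sketch of the statistic for a sample mean follows. The function name and the numbers (sample mean 57, hypothesized mean 55, σ = 8, n = 64) are hypothetical and serve only to show the division of σ by sqrt(n).

```python
import math

def z_statistic(sample_mean: float, mu: float, sigma: float, n: int) -> float:
    """z = (sample mean - hypothesized mean) / (sigma / sqrt(n))."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))

# Hypothetical values: the standard error is 8 / sqrt(64) = 1.
z = z_statistic(sample_mean=57.0, mu=55.0, sigma=8.0, n=64)
print(f"z = {z:.2f}")
```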
7.8.2 Before-and-after tests with generalized alternative hypothesis
The significance of the hypothesis test is indicated by α, the Type I error probability. α = 0.05 is most common: there is a 5% level of significance, which means that on average a Type I error (rejecting a true H0) will occur 5 in 100 times that H0 and H1 are tested. In addition, there is a 95% confidence level that the result is correct.
If H1 involves a not-equal relation, no direction is given, so the significance area is equally divided between the two tails of the testing distribution.
If it is known that the parameter can go in only one direction, a one-sided test is performed, so the significance area is in one tail of the distribution.
[Figure: rejection regions at α = 0.05. Two-sided: 0.025 in each tail. One-sided upper: 0.05 in the upper tail.]
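The z values that bound these tail areas can be obtained from the standard normal distribution instead of a table. This sketch uses Python's standard-library `statistics.NormalDist`; note that it returns about 1.645 for the one-sided case, which tables commonly round to 1.65.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# Two-sided: alpha is split equally between the two tails.
z_two_sided = std_normal.inv_cdf(1 - alpha / 2)

# One-sided upper: all of alpha is in the upper tail.
z_one_sided = std_normal.inv_cdf(1 - alpha)

print(f"two-sided z = {z_two_sided:.3f}")
print(f"one-sided z = {z_one_sided:.3f}")
```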
Two-sided or one-sided test
These tests are done to compare the effectiveness of an improvement to a highway or street by using mean speeds.
If you want to prove that a difference exists between the two data samples, you conduct a two-sided test. (H0: there is no change.)
If you are sure that the change can go in only one direction (for example, you are sure there was no decrease), you conduct a one-sided test.
Null hypothesis H0: µ1 = µ2 (there is no increase)
Alternative H1: µ2 > µ1 (there is an increase, taking µ1 as the before mean and µ2 as the after mean; one-sided)
Null hypothesis H0: µ1 = µ2 (there is no change)
Alternative H1: µ1 ≠ µ2 (two-sided)
Example
                      Existing    After improvement
Sample size           55          55
Mean                  60 min      55 min
Standard deviation    8 min       8 min
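$$\sigma_Y = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{8^2}{55} + \frac{8^2}{55}} = 1.53$$
At significance level α = 0.05 (see Table 7-3): $z_{\alpha/2} = 1.96$ (two-sided) and $z_{\alpha} = 1.65$ (one-sided).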
The decision point (typically denoted zc):
For two-sided:
1.96*1.53 = 2.999
For one-sided:
1.65*1.53 = 2.525
|x̄1 - x̄2| = |60 - 55| = 5 > zc
By either test, H0 is rejected.
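The example above can be reproduced in a few lines. This sketch uses the sample sizes, means, and standard deviations from the example and the table values 1.96 and 1.65; the variable names are illustrative.

```python
import math

n_before, n_after = 55, 55
mean_before, mean_after = 60.0, 55.0   # minutes
sd_before, sd_after = 8.0, 8.0         # minutes

# Standard deviation of the difference of the two sample means.
sigma_y = math.sqrt(sd_before ** 2 / n_before + sd_after ** 2 / n_after)

z_two_sided, z_one_sided = 1.96, 1.65  # table values at alpha = 0.05

decision_two_sided = z_two_sided * sigma_y
decision_one_sided = z_one_sided * sigma_y
observed_difference = abs(mean_before - mean_after)

reject_two_sided = observed_difference > decision_two_sided
reject_one_sided = observed_difference > decision_one_sided

print(f"sigma_Y = {sigma_y:.2f}")
print(f"two-sided decision point = {decision_two_sided:.2f}, reject H0: {reject_two_sided}")
print(f"one-sided decision point = {decision_one_sided:.2f}, reject H0: {reject_one_sided}")
```

The decision points differ slightly from the hand calculation because the code carries the unrounded value of the standard deviation of the difference.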
7.8.3 Other useful statistical tests
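The t-test (for small samples, n ≤ 30) – Table 7.6:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
$$s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$$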
The F-test (for small samples) – Table 7.7:
In using the t-test we assume that the standard deviations of the two samples are the same. To test this hypothesis we can use the F-test.
$$F = \frac{s_1^2}{s_2^2}$$
(By definition the larger s is always on top.)
(See the samples on pages 149 and 151.)
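The pooled t statistic can be sketched as a small function. The function name and the sample values below (means 52 and 48, both standard deviations 4, both sample sizes 10) are hypothetical; the computed t is then compared against the value from Table 7.6 for n1 + n2 - 2 degrees of freedom.

```python
import math

def pooled_t(mean1: float, s1: float, n1: int, mean2: float, s2: float, n2: int):
    """Return (t, pooled standard deviation s_p, degrees of freedom) for two small samples."""
    s_p = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    t = (mean1 - mean2) / (s_p * math.sqrt(1 / n1 + 1 / n2))
    return t, s_p, n1 + n2 - 2

# Hypothetical small samples (not from the textbook).
t_value, s_pooled, dof = pooled_t(mean1=52.0, s1=4.0, n1=10, mean2=48.0, s2=4.0, n2=10)
print(f"s_p = {s_pooled:.2f}, t = {t_value:.3f}, degrees of freedom = {dof}")
```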
The F-test to test if s1 = s2
When the t-test and other similar means tests are conducted, there is an implicit assumption made that s1 = s2. The F-test can test this hypothesis.
$$F = \frac{s_1^2}{s_2^2}$$
The numerator variance must be the larger of the two variances when you compute an F-value.
If Fcomputed ≥ Ftable (n1-1, n2-1, α), then s1 ≠ s2 at the α significance level.
If Fcomputed < Ftable (n1-1, n2-1, α), then s1 = s2 at the α significance level.
Discuss the problem on p. 151.
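The F-test decision rule can be sketched as follows. The function name and the sample values (s = 5.0 with n = 12, s = 7.5 with n = 10) are hypothetical, and the table value 2.90 is only illustrative; in practice, read Ftable from Table 7.7 for the computed degrees of freedom and the chosen α.

```python
def f_ratio(s_a: float, n_a: int, s_b: float, n_b: int):
    """Return (F, numerator df, denominator df) with the larger variance on top."""
    if s_a >= s_b:
        return s_a ** 2 / s_b ** 2, n_a - 1, n_b - 1
    return s_b ** 2 / s_a ** 2, n_b - 1, n_a - 1

# Hypothetical samples (not from the textbook).
f_computed, df_num, df_den = f_ratio(s_a=5.0, n_a=12, s_b=7.5, n_b=10)

f_table = 2.90  # illustrative value; look up Table 7.7 for (df_num, df_den) at alpha = 0.05

variances_differ = f_computed >= f_table
print(f"F = {f_computed:.2f} with ({df_num}, {df_den}) degrees of freedom")
print(f"Conclude s1 != s2: {variances_differ}")
```

Note that the degrees of freedom follow the sample whose variance is in the numerator, so they swap when the second sample has the larger standard deviation.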
Paired difference test
You perform a paired difference test only when you have control over the sequence of data collection.
e.g. Simulation: you control the parameters. You have two different signal timing schemes. Only the timing parameters are changed. Use the same random number seeds. Then you can pair. If you cannot control random number seeds in simulation, you are not able to do a paired test.
Table 7-8 shows an example of the benefits of paired testing. The only thing changed is the method used to collect speed data. The same vehicle’s speed was measured by the two methods.
Paired or not-paired example (Table 7.8)
                  Method 1    Method 2    Difference
Estimated mean    56.9        61.2        4.3
Estimated SD      7.74        7.26        1.5
H0: No increase in test scores (this means a one-sided, or one-tailed, test)
Not paired:
$$s_Y = \sqrt{\frac{7.74^2}{15} + \frac{7.26^2}{15}} = 2.74$$
|56.9 – 61.2| = 4.3 < 4.52 (= 1.65*2.74)
Hence, H0 is NOT rejected.
Paired:
$$s_E = \frac{1.50}{\sqrt{15}} = 0.387$$
4.3 increase > 0.639 (= 1.65*0.387)
Hence, H0 is clearly rejected.
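The contrast between the two approaches can be reproduced from the Table 7.8 summary values. This is a minimal sketch using n = 15 and the one-sided table value 1.65; the variable names are illustrative.

```python
import math

n = 15
mean_1, mean_2 = 56.9, 61.2
sd_1, sd_2 = 7.74, 7.26
sd_difference = 1.50    # standard deviation of the paired differences
z_one_sided = 1.65      # one-sided table value at alpha = 0.05

observed_increase = mean_2 - mean_1

# Not paired: treat the two columns as independent samples.
se_unpaired = math.sqrt(sd_1 ** 2 / n + sd_2 ** 2 / n)
reject_unpaired = observed_increase > z_one_sided * se_unpaired

# Paired: work with the per-vehicle differences.
se_paired = sd_difference / math.sqrt(n)
reject_paired = observed_increase > z_one_sided * se_paired

print(f"not paired: decision point = {z_one_sided * se_unpaired:.2f}, reject H0: {reject_unpaired}")
print(f"paired:     decision point = {z_one_sided * se_paired:.3f}, reject H0: {reject_paired}")
```

Pairing removes the vehicle-to-vehicle variation from the comparison, which is why the paired standard error is so much smaller.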