Probability & Statistics for Engineers & Scientists
Transcript
1. Probability & Statistics for Engineers & Scientists
2. Probability & Statistics for Engineers & Scientists
EIGHTH EDITION Ronald E. Walpole, Roanoke College; Raymond H. Myers, Virginia Polytechnic Institute and State University; Sharon L. Myers, Radford University; Keying Ye, University of Texas at San Antonio. Pearson Education International, Pearson Prentice Hall
3. If you purchased this book within the United States or
Canada you should be aware that it has been wrongfully imported
without the approval of the Publisher or Author. Editor in Chief:
Sally Yagan Production Editor: Lynn Savino Wendel Senior Managing Editor: Linda Mihatov Behrens Assistant Managing Editor: Bayani
Mendoza de Leon Executive Managing Editor: Kathleen Schiaparelli
Manufacturing Buyer: Maura Zaldivar Manufacturing Manager: Alexis
Heydt-Long Marketing Manager: Halee Dinsey Marketing Assistant:
Jennifer de Leeuwcrk Director of Marketing: Patrice Jones Editorial
Assistant/Print Supplements Editor: Jennifer Urban Art Editor:
Thomas Benfatti Art Director: Heather Scott Creative Director: Juan
R. Lopez Director of Creative Services: Paul Belfanti Cover Photo:
Corbis Royalty Free Art Studio: Laser-words PEARSON Prentice Hall
2007, 2002, 1998, 1993, 1989, 1985, 1978, 1972 Pearson Education, Inc. Pearson Prentice Hall Pearson Education, Inc. Upper Saddle River, NJ 07458 All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher. Pearson Prentice Hall is a trademark of Pearson Education, Inc. 10 9 8 7 6 5 4 3 ISBN 0-13-204767-5 Pearson Education Ltd., London Pearson Education Australia
PTY. Limited, Sydney Pearson Education Singapore, Pte. Ltd. Pearson
Education North Asia Ltd., Hong Kong Pearson Education Canada,
Ltd., Toronto Pearson Education de Mexico, S.A. de C.V. Pearson
Education-Japan, Tokyo Pearson Education Malaysia, Pte. Ltd.
Pearson Education, Upper Saddle River, New Jersey
4. This book is dedicated to Billy and Julie R.H.M. and S.L.M.
Limin K.Y.
5. Contents Preface xv 1 Introduction to Statistics and Data
Analysis 1 1.1 Overview: Statistical Inference, Samples,
Populations, and Experimental Design 1 1.2 The Role of Probability 4 1.3 Sampling Procedures; Collection of Data 7 1.4 Measures of Location: The Sample Mean and Median 11 Exercises 13 1.5 Measures of Variability 14 Exercises 17 1.6 Discrete and Continuous Data 17 1.7 Statistical Modeling, Scientific Inspection, and Graphical Diagnostics 19 1.8 Graphical Methods and Data Description 20
1.9 General Types of Statistical Studies: Designed Experiment,
Observational Study, and Retrospective Study 25 Exercises 28 2
Probability 31 2.1 Sample Space 31 2.2 Events 34 Exercises 38 2.3
Counting Sample Points 40 Exercises 47 2.4 Probability of an Event
48 2.5 Additive Rules 52 Exercises 55 2.6 Conditional Probability
58 2.7 Multiplicative Rules 61 Exercises 65
6. 2.8 Bayes' Rule 68 Exercises 72 Review
Exercises 73 3 Random Variables and Probability Distributions 77
3.1 Concept of a Random Variable 77 3.2 Discrete Probability
Distributions 80 3.3 Continuous Probability Distributions 84
Exercises 88 3.4 Joint Probability Distributions 91 Exercises 101
Review Exercises 103 3.5 Potential Misconceptions and Hazards;
Relationship to Material in Other Chapters 106 4 Mathematical
Expectation 107 4.1 Mean of a Random Variable 107 Exercises 113 4.2
Variance and Covariance of Random Variables 115 Exercises 122 4.3
Means and Variances of Linear Combinations of Random Variables 123
4.4 Chebyshev's Theorem 131 Exercises 134 Review Exercises 136 4.5
Potential Misconceptions and Hazards; Relationship to Material in
Other Chapters 138 5 Some Discrete Probability Distributions 141
5.1 Introduction and Motivation 141 5.2 Discrete Uniform
Distribution 141 5.3 Binomial and Multinomial Distributions 143
Exercises 150 5.4 Hypergeometric Distribution 152 Exercises 157 5.5
Negative Binomial and Geometric Distributions 158 5.6 Poisson
Distribution and the Poisson Process 161 Exercises 165 Review
Exercises 167 5.7 Potential Misconceptions and Hazards:
Relationship to Material in Other Chapters 169
7. 6 Some Continuous Probability Distributions 171
6.1 Continuous Uniform Distribution 171 6.2 Normal Distribution 172
6.3 Areas under the Normal Curve 176 6.4 Applications of the Normal
Distribution 182 Exercises 185 6.5 Normal Approximation to the
Binomial 187 Exercises 193 6.6 Gamma and Exponential Distributions
194 6.7 Applications of the Exponential and Gamma Distributions 197
6.8 Chi-Squared Distribution 200 6.9 Lognormal Distribution 201
6.10 Weibull Distribution (Optional) 202 Exercises 205 Review Exercises 206 6.11 Potential Misconceptions and Hazards:
Relationship to Material in Other Chapters 209 7 Functions of
Random Variables (Optional).. 211 7.1 Introduction 211 7.2
Transformations of Variables 211 7.3 Moments and Moment-Generating
Functions 219 Exercises 226 8 Fundamental Sampling Distributions
and Data Descriptions 229 8.1 Random Sampling 229 8.2 Some
Important Statistics 231 Exercises 234 8.3 Data Displays and
Graphical Methods 236 8.4 Sampling Distributions 243 8.5 Sampling
Distribution of Means 244 Exercises 251 8.6 Sampling Distribution
of S2 251 8.7 t-Distribution 257 8.8 F-Distribution 261 Exercises
265 Review Exercises 266 8.9 Potential Misconceptions and Hazards;
Relationship to Material in Other Chapters 268
8. 9 One- and Two-Sample Estimation Problems 269 9.1
Introduction 269 9.2 Statistical Inference 269 9.3 Classical
Methods of Estimation 270 9.4 Single Sample: Estimating the Mean
274 9.5 Standard Error of a Point Estimate 280 9.6 Prediction
Intervals 281 9.7 Tolerance Limits 283 Exercises 285 9.8 Two
Samples: Estimating the Difference between Two Means ... 288 9.9
Paired Observations 294 Exercises 297 9.10 Single Sample:
Estimating a Proportion 299 9.11 Two Samples: Estimating the
Difference between Two Proportions 302 Exercises 304 9.12 Single
Sample: Estimating the Variance 306 9.13 Two Samples: Estimating
the Ratio of Two Variances 308 Exercises 310 9.14 Maximum
Likelihood Estimation (Optional) 310 Exercises 315 Review Exercises
315 9.15 Potential Misconceptions and Hazards; Relationship to
Material in Other Chapters 319 10 One- and Two-Sample Tests of
Hypotheses 321 10.1 Statistical Hypotheses: General Concepts 321
10.2 Testing a Statistical Hypothesis 323 10.3 One- and Two-Tailed
Tests 332 10.4 The Use of P-Values for Decision Making in Testing
Hypotheses. 334 Exercises 336 10.5 Single Sample: Tests Concerning
a Single Mean (Variance Known) 338 10.6 Relationship to Confidence
Interval Estimation 341 10.7 Single Sample: Tests on a Single Mean
(Variance Unknown) 342 10.8 Two Samples: Tests on Two Means 345
10.9 Choice of Sample Size for Testing Means 350 10.10 Graphical
Methods for Comparing Means 355 Exercises 357 10.11 One Sample:
Test on a Single Proportion 361 10.12 Two Samples: Tests on Two
Proportions 364 Exercises 366 10.13 One- and Two-Sample Tests
Concerning Variances 367
9. Exercises 370 10.14 Goodness-of-Fit Test 371
10.15 Test for Independence (Categorical Data) 374 10.16 Test for
Homogeneity 377 10.17 Testing for Several Proportions 378 10.18
Two-Sample Case Study 380 Exercises 383 Review Exercises 385 10.19
Potential Misconceptions and Hazards; Relationship to Material in
Other Chapters 387 11 Simple Linear Regression and Correlation
389 11.1 Introduction to Linear Regression 389 11.2 The Simple
Linear Regression Model 390 11.3 Least Squares and the Fitted Model
394 Exercises 397 11.4 Properties of the Least Squares Estimators
400 11.5 Inferences Concerning the Regression Coefficients 402 11.6
Prediction 409 Exercises 412 11.7 Choice of a Regression Model 414
11.8 Analysis-of-Variance Approach 415 11.9 Test for Linearity of
Regression: Data with Repeated Observations 417 Exercises 423 11.10
Data Plots and Transformations 425 11.11 Simple Linear Regression
Case Study 430 11.12 Correlation 432 Exercises 438 Review Exercises
438 11.13 Potential Misconceptions and Hazards; Relationship to
Material in Other Chapters 443 12 Multiple Linear Regression and
Certain Nonlinear Regression Models 445 12.1 Introduction 445 12.2
Estimating the Coefficients 446 12.3 Linear Regression Model Using
Matrices (Optional) 449 Exercises 452 12.4 Properties of the Least
Squares Estimators 456 12.5 Inferences in Multiple Linear
Regression 458 Exercises 464
10. 12.6 Choice of a Fitted Model through
Hypothesis Testing 465 12.7 Special Case of Orthogonality
(Optional) 469 Exercises 473 12.8 Categorical or Indicator
Variables 474 Exercises 478 12.9 Sequential Methods for Model
Selection 479 12.10 Study of Residuals and Violation of Assumptions
485 12.11 Cross Validation, Cp, and Other Criteria for Model
Selection 490 Exercises 496 12.12 Special Nonlinear Models for
Nonideal Conditions 499 Review Exercises 503 12.13 Potential
Misconceptions and Hazards; Relationship to Material in Other
Chapters 508 13 One-Factor Experiments: General 511 13.1
Analysis-of-Variance Technique 511 13.2 The Strategy of
Experimental Design 512 13.3 One-Way Analysis of Variance:
Completely Randomized Design (One-Way ANOVA) 513 13.4 Tests for the
Equality of Several Variances 518 Exercises 521 13.5
Single-Degree-of-Freedom Comparisons 523 13.6 Multiple Comparisons
527 13.7 Comparing Treatments with a Control 531 Exercises 533 13.8
Comparing a Set of Treatments in Blocks 535 13.9 Randomized
Complete Block Designs 537 13.10 Graphical Methods and Model
Checking 544 13.11 Data Transformations in Analysis of Variance
547 13.12 Latin Squares (Optional) 549 Exercises 551 13.13 Random
Effects Models 555 13.14 Power of Analysis-of-Variance Tests 559
13.15 Case Study 563 Exercises 565 Review Exercises 567 13.16
Potential Misconceptions and Hazards; Relationship to Material in
Other Chapters 571
11. 14 Factorial Experiments (Two or More
Factors) 573 14.1 Introduction 573 14.2 Interaction in the
Two-Factor Experiment 574 14.3 Two-Factor Analysis of Variance 577
Exercises 587 14.4 Three-Factor Experiments 590 Exercises 597 14.5
Model II and III Factorial Experiments 600 14.6 Choice of Sample
Size 603 Exercises 605 Review Exercises 607 14.7 Potential
Misconceptions and Hazards; Relationship to Material in Other
Chapters 609 15 2k Factorial Experiments and Fractions 611 15.1
Introduction 611 15.2 The 2k Factorial: Calculation of Effects and
Analysis of Variance 612 15.3 Nonreplicated 2k Factorial Experiment
618 15.4 Injection Molding Case Study 619 Exercises 622 15.5
Factorial Experiments in a Regression Setting 625 15.6 The
Orthogonal Design 631 15.7 Factorial Experiments in Incomplete
Blocks 639 Exercises 645 15.8 Fractional Factorial Experiments 647
15.9 Analysis of Fractional Factorial Experiments 653 Exercises 656
15.10 Higher Fractions and Screening Designs 657 15.11 Construction
of Resolution III and IV Designs 658 15.12 Other Two-Level
Resolution III Designs; The Plackett-Burman Designs 660 15.13
Robust Parameter Design 661 Exercises 666 Review Exercises 667
15.14 Potential Misconceptions and Hazards: Relationship to
Material in Other Chapters 669 16 Nonparametric Statistics 671 16.1
Nonparametric Tests 671 16.2 Signed-Rank Test 676
12. Exercises 679 16.3 Wilcoxon Rank-Sum Test 681
16.4 Kruskal-Wallis Test 684 Exercises 686 16.5 Runs Test 687 16.6
Tolerance Limits 690 16.7 Rank Correlation Coefficient 690
Exercises 693 Review Exercises 695 17 Statistical Quality Control
697 17.1 Introduction 697 17.2 Nature of the Control Limits 699
17.3 Purposes of the Control Chart 699 17.4 Control Charts for
Variables 700 17.5 Control Charts for Attributes 713 17.6 Cusum
Control Charts 721 Review Exercises 722 18 Bayesian Statistics
(Optional) 725 18.1 Bayesian Concepts 725 18.2 Bayesian Inferences
726 18.3 Bayes Estimates Using Decision Theory Framework 732
Exercises 734 Bibliography 737 A Statistical Tables and Proofs 741
B Answers to Odd-Numbered Non-Review Exercises 795 Index 811
13. Preface General Approach and Mathematical Level The general
goals for the eighth edition remain the same as those in recent
editions. We feel it is important to retain a balance between theory and applications. Engineers, physical scientists, and computer scientists are trained in calculus, and thus mathematical support is given when we feel the pedagogy is enhanced by it. This approach keeps the material from becoming a collection of tools with no mathematical roots. Certainly students with a mathematical background of calculus and, in a few cases, linear algebra have the capability to understand the concepts more thoroughly and use the resulting tools more intelligently. Otherwise there is a clear danger that the student will only be able to apply the material within very narrow bounds.
new edition contains a substantially larger number of exercises.
These exercises challenge the student to be able to use concepts
from the text to solve problems dealing with many real-life
scientific and engineering situations. The data sets involved in the exercises are available for download from the website at http://www.prenhall.com. The increase in the quantity of exercises results in a much broader spectrum of areas of application, including biomedical, bioengineering, business problems, computer issues, and many others. Even the chapters that deal with
introductory probability theory contain examples and exercises that
carry a broad range of applications that students of science and
engineering will easily recognize as important. As in past
editions, the use of calculus is confined to elementary probability theory and probability distributions. These topics are discussed in Chapters 2, 3, 4, 6, and 7. Chapter 7 is an optional chapter that includes transformations of variables and moment-generating functions. Matrix algebra is used only a modest amount in the linear regression material in Chapters 11 and 12. For those who desire more substantial support with matrices, an optional section in Chapter 12 is available. The instructor who wishes to minimize the use of matrices may bypass this section with no loss of continuity. Students using this text should have completed the equivalent of one semester of differential and integral calculus. An exposure to matrix algebra would be helpful but not necessary if the course context excludes the aforementioned optional section given in Chapter 12.
14. Content and Course Planning This text is
designed for either a one- or two-semester course. A reasonable curriculum for a one-semester course might include Chapters 1 through 10. Many instructors want students to be exposed to some degree to simple linear regression in a one-semester course. Thus one may choose to include a portion of Chapter 11. On the other hand, some instructors wish to teach a portion of analysis of variance, in which case Chapters 11 and 12 may be excluded in favor of some portion of Chapter 13, which features one-factor analysis of variance. In order to provide sufficient time for one or perhaps even both of these topics, the instructor may wish to eliminate Chapter 7 and/or certain specialized topics in Chapters 5 and 6 (for example, the treatment of the gamma, lognormal, and Weibull distributions, or the material on the negative binomial and geometric distributions). Indeed, some instructors find that in a one-semester course in which regression analysis and analysis of variance are of primary interest, there are topics in Chapter 9 on estimation that may be removed (e.g., maximum likelihood, prediction intervals, and/or tolerance limits). We feel the text offers the flexibility to support a one-semester course under any priorities set down by the instructor. Chapter 1 is an elementary
overview of statistical inference designed for the beginner. It
contains material on sampling and data analysis and contains many
examples and exercises for motivation. Indeed, some very
rudimentary aspects of experimental design are included along with
an appreciation of graphic techniques and certain vital
characteristics of data collection. Chapters 2, 3, and 4 deal with
basic probability as well as discrete and continuous random
variables. Chapters 5 and 6 cover specific discrete and continuous
distributions with illustrations of their use and relationships
among them. In addition, a substantial number of examples and
exercises are given that illustrate their use. Chapter 7 is an
optional chapter that treats transformation of random variables. An
instructor may wish to cover this material only if he or she is
teaching a more theoretical course. This chapter is clearly the
most mathematical chapter in the text. Chapter 8 contains additional material on graphical methods as well as a very important introduction to the notion of a sampling distribution. Probability plotting is discussed. The material on sampling distributions is reinforced by a thorough discussion of the central limit theorem as
well as the distribution of a sample variance under normal i.i.d.
(independently and identically distributed) sampling. The t and F
distributions are introduced along with motivation regarding their
use in chapters that follow. Chapters 9 and 10 contain material on
one- and two-sample point and interval estimation and hypothesis testing. Material on confidence intervals, prediction intervals, tolerance intervals, and maximum likelihood estimation in Chapter 9 offers the instructor considerable flexibility regarding what might be excluded in a one-semester course. A section on Bayes
estimation that was available in the seventh edition in Chapter 9
has been removed. More attention will be given to this topic in the
"New to This Edition" section that follows. Chapters 11 through 17
contain ample material for a second semester. Simple and multiple
linear regression are contained in Chapters 11 and 12, respectively.
Chapter 12 also contains material on logistic regression, which
finds applications in many areas of engineering and the biological
sciences. The material covered in multiple linear regression is
quite extensive and thus provides flexibility for the
15. instructor. Among the "special topics" to
which an instructor has access are the special case of orthogonal
rcgressors, categorical or indicator variables, sequential methods
for model selection, study of residuals and violation of
assumptions, cross validation and the use of PRESS and C, and, of
course, logistic regression. Chap- ters 13 through 17 contain
topics in analysis of variance, design of experiments,
nonparametric statistics, and quality control. Chapter 15 treats
two-level factorials (with and without blocking) and fractional
factorials, and again flexibility is abundant because of the many "special topics" offered in this chapter. Topics beyond the standard 2k and fractional 2k designs include blocking and partial confounding, special higher fractions and screening designs,
Plackett-Burman designs, and robust parameter design. All chapters
contain a large number of exercises, considerably more than what
was offered in the seventh edition. More information on exercises
will be given in the "New To This Edition" section. Case Studies
and Computer Software The topical material in two-sample hypothesis
testing, multiple linear regression, analysis of variance, and the
use of two-level factorial experiments is supplemented by case
studies that feature computer printout and graphical material. Both
SAS and MINITAB are featured. The use of the computer printout
underscores our feeling that the students should have the
experience of reading and interpreting computer printout and
graphics, even if that which is featured in the text is not what is
used by the instructor. Exposure to more than one type of software
can broaden the experience base for the student. There is no reason
to believe that the software in the course will be that which he or
she will be called upon to use in practice following graduation.
Many examples and case studies in the text are supplemented, where
appropriate, by various types of residual plots, quantile plots,
normal probability plots, and others. This is particularly
prevalent in the material used in Chapters 11 through 15. New to
This Edition General 1. There are 15-20% new problem sets incorporated, with many new applications demonstrated in engineering as well as biological, physical, and computer science. 2. There is new end-of-chapter review material where appropriate. This material emphasizes key ideas as well as risks and hazards that the user of the material covered in the chapter must be aware of. This material also demonstrates how the chapter is influenced by material in other chapters. 3. A new mini (and optional) chapter on Bayesian statistics has been incorporated. The chapter will be a practical offering with applications emphasized in many fields. 4. There are extensive additional changes made throughout, based on needs perceived by the authors and reviewers. The following outlines some specifics.
16. Chapter 1: Introduction to Statistics and
Data Analysis Chapter 1 contains a substantial amount of new
material. There is new exposition on the difference between discrete and continuous measurements. Many illustrations are given with particular real-life applications of discrete measurements (e.g., numbers of radioactive particles, the number of personnel responsible for a particular port facility, and the number of oil tankers arriving each day at a port city). Special
attention is given to situations associated with binary data.
Examples are given in the biomedical field as well as quality
control. New concepts (for this text) are discussed in Chapter 1 that deal with properties of a distribution or sample other than those that characterize central tendency and variability. Quartiles and, more generally, quantiles are defined and discussed. The discussion of experimental design and the advantages that it offers is expanded beyond that of the seventh edition. In this
development important notions that are treated include
randomization, reduction of process variability, and interaction
among factors. The readers are exposed in the first chapter to
different types of statistical studies: the designed experiment,
the observational study, and the retrospective study. Examples are
given of each type and advantages and disadvantages are discussed.
The chapter continues to emphasize graphical procedures and where
they apply. Nineteen new exercises were added to Chapter 1. Some
make use of data from studies conducted at the Virginia Tech
consulting center and some are taken from engineering journals and
others involve historical data. This chapter now contains 30
exercises. Chapter 2: Probability There are new examples and new
exposition to better demonstrate the notion of conditional
probability. Chapter 2 offers 136 total exercises. All new
exercises involve direct applications in science and engineering.
Chapter 3: Random Variables and Probability Distributions There is
new exposition on the notion of "dummy" variables that play an
important role in the Bernoulli and binomial distributions. There
are many more exercises with new applications. The new review at
the end of the chapter introduces the connection between material
in Chapter 3 with the concept of distribution parameters and specific probability distributions discussed in future chapters.
Topics for new exercises include particle size distribution for
missile fuel, measurement errors in scientific systems, studies
of time to failure for manufactured washing machines, the
production of electron tubes on an assembly line, arrival time
problems at certain big city intersections, shelf life of a
product, passenger congestion problems in airports, problems with
impurities in batches of chemical product, failure in systems of
electronic components working in parallel, and many others. There
are now 82 exercises in this chapter.
17. Chapter 4: Mathematical Expectation Several
more exercises were added to Chapter 4. Rules for expectations and
variances of linear functions were expanded to cover
approximations for nonlinear functions. Examples are given to
illustrate the use of these rules. The review at the end of Chapter
4 reveals possible difficulties and hazards with practical
applications of the material since most examples and exercises
assume parameters (mean and variance) are known and in true
applications these parameters would be estimated. Reference is made
to Chapter 9, where estimation is discussed. There are now 103
exercises in this chapter. Chapter 5: Some Discrete Probability
Distributions New exercises representing various applications of
the Poisson distribution have been added. Additional exposition has
been added that deals with the Poisson probability function. New
exercises include real-life applications of the Poisson, binomial, and hypergeometric distributions. Topics for new exercises include flaws in manufactured copper wire, highway potholes in need of repair, patient traffic in an urban hospital, airport luggage screening, homeland security detection of incoming missiles, and
many others. In addition, plots are given that provide the reader
with a clear indication about the nature of both the Poisson and
the binomial distribution as parameters change. There are now 105
exercises in this chapter. Chapter 6: Some Continuous Probability
Distributions Many more examples and exercises dealing in both the
exponential and the gamma distribution have been added. The "lack
of memory" property of the exponential distribution is now
discussed at length and related to the bond between the exponential and Poisson distributions. The section on the Weibull distribution is greatly improved and expanded. The extensions presented focus on the measurement and interpretation of the failure rate, or "hazard rate," and how knowledge of the parameters of the Weibull distribution allows the user to learn how machines wear or even get
stronger over time. More exercises are given that involve the
Weibull and lognormal distributions. Caution is expressed in the
review much like that in Chapter 5. In practical situations,
guesses or estimates of process parameters of the gamma distribution in, say, failure rate problems, or parameters of either a gamma or Weibull distribution, may be unstable, thereby introducing
errors in calculations. There are now 84 exercises in this chapter.
Chapter 7: Functions of Random Variables (optional) No major
changes are included in this optional chapter. Chapter 8:
Fundamental Distributions and Data Description There is additional
exposition on the central limit theorem as well as the general
concept of sampling distributions. There are many new exercises.
The summary
18. provides important information on the t, chi-squared, and F distributions, including how they are used and what assumptions are involved. More
attention is given in Chapter 8 to normal probability plotting. In
addition, the central limit theorem is discussed in more detail in
order that the reader can gain more insight about what size n must
be before normality can be invoked. Plots are given to illustrate
this. Additional exposition is given regarding the normal
approximation to the binomial distribution and how it works in practical situations. The presentation gives an intuitive argument that connects the normal approximation of the binomial
to the central limit theorem. The number of exercises in this
chapter is now 75. Chapter 9: One- and Two-Sample Estimation
Problems Many new applications are revealed in new exercises in
this chapter. The summary gives rationale and hazards associated
with the so-called large sample confidence interval. The importance
of the assumption of normality and the conditions under which it is
assumed are discussed. Early in the chapter the development of
confidence intervals offers a pragmatic discussion about why one
must begin with the "known σ" case. It is suggested that these kinds of situations do not actually occur in practice, but consideration of the known σ case initially provides a structure that allows the more useful "unknown σ" case to be understood more
easily by students. One-sided bounds of all types are now presented
and discussion is given as to when they are used as opposed to the
two-sided counterparts. New examples are given which require the
use of the one-sided intervals. These include confidence intervals,
prediction intervals, and tolerance intervals. The concept of a
mean squared error of an estimator is discussed. Thus the notion of
bias and variance can be brought together in the general comparison
of estimators. Twenty-seven new exercises are included in Chapter
9. There are now 111 exercises in this chapter. Chapter 10: One- and Two-Sample Tests of Hypotheses We have an entirely restructured
exposition on the introduction to hypothesis testing. It is
designed to help the student have a clear picture of what is being
accomplished and not being accomplished in hypothesis testing. The notion that we rarely, if ever, "accept the null hypothesis" is discussed with illustrations. There is also a thorough discussion, with examples, of how one should structure or set up the null and alternative hypotheses. The notion that rejection implies "sample evidence refutes H0" and that H0 is actually the logical complement of H1 is discussed precisely with several examples. Much is said about the concept of "fail to reject H0" and what it means in practical situations. The summary presents "misconceptions and hazards" that reveal problems in drawing the wrong conclusions when the analyst "fails to reject" the null hypothesis. In
addition, "robustness" is discussed, which deals with the nature of
the sensitivity of various tests of hypotheses to the assumption of
normality. There are now 115 exercises in this chapter.
19. Chapter 11: Simple Linear Regression Many new
exercises are added in simple linear regression. Special exposition
is given to the pitfalls in the use of R2, the coefficient of
determination. Much additional emphasis is given to graphics and
diagnostics dealing in regression. The summary deals with hazards
that one may encounter if diagnostics are not used. It is
emphasized that diagnostics provide "checks" on the validity of
assumptions. These diagnostics include data plots, plots of
studentized residuals, and normal probability plots of residuals.
An important presentation is made early in the chapter about the
nature of linear models in science and engineering. It is pointed
out that these are often empirical models that are simplifications
of more complicated and unknown structures. More emphasis is given in this chapter to data plotting. "Regression through the origin" is discussed in an exercise. More discussion is given on what it means when H0: β = 0 is rejected or not rejected. Plots
are used for illustration. There are now 68 exercises in this
chapter. Chapter 12: Multiple Linear Regression Additional
treatment is given in this chapter on the pitfalls of R2. The
discussion centers around the need to compromise between the
attempt to achieve a "good fit" to the data and the inevitable loss
in error degrees of freedom that is experienced when one
"overfits." In that regard the "adjusted R2 "' is defined and
discussed with examples. In addition, the CV (coefficient of
variation) is discussed and interpreted as a measure that can be
used to compare competing models. Several new exercises are present
to provide the reader experience in comparing competing models
using real data sets. Additional treatment is given to the topic of
"categorical regressors" with graphical tools used to support the
underlying concepts. Additional exercises are given to illustrate
practical uses of logistic regression in both industrial and
biomedical research areas. There are now 72 exercises in this
chapter. Chapter 13: One-Factor Experiments: General The discussion
of Tukey's test on multiple comparisons is expanded considerably.
More is presented on the notion of error rate and α-values in the context of simultaneous confidence intervals. A new
important section is given on "Data Transformation in Analysis of
Variance." A contrast is made with the discussion in Chapters 11
and 12 dealing with transformation to produce a good fit in
regression. A brief presentation is given regarding the robustness
of analysis of variance to the assumption of homogeneous
variance. This discussion is connected to previous sections on
diagnostic plots to detect violations in assumptions. Additional
mention is made of the root causes of violation of the homogeneous
variance assumption and how it is often a natural occurrence when
the variance is a function of the mean. Transformations are
discussed that can be used to accommodate the problem. Examples
and exercises are used for illustration. Several new exercises were
added. The total number of exercises in this chapter is 67.
20. Chapter 14: Factorial Experiments (Two or More Factors) Considerable attention is given to the concept of interaction and interaction plots quite early in the chapter.
Examples are given in which scientific interpretations of
interaction are given using graphics. New exercises highlight the
use of graphics including diagnostic plots of residuals. Several
new exercises appear in this chapter. All include experimental data
from chemical and biological sciences and all include emphasis on
graphical analysis. There are 43 exercises in this chapter. Chapter
15: 2k Factorial Experiments and Fractions Early in this chapter
new material has been added to highlight and illustrate the role of
two-level designs as screening experiments. In this regard they are
often part of a sequential plan in which the scientist or engineer
is attempting to learn about the process, assess the role of the
candidate factors, and give insight that will aid in determining
the most fruitful region of experimentation. The notion of
fractional factorial designs is motivated early. The motivation of
the notion of "effects" and the graphical procedures that are used
in determining "active effects" are discussed in more detail with
examples. The chapter uses considerably more graphical
illustrations and geometric displays to motivate the concepts for
both full and fractional factorials. In addition, graph- ical
depictions are used to illustrate the available lack-of-fit
information when one augments the two-level design with center
runs. In the development and discussion of fractional factorial
designs, the procedure for constructing the fraction is greatly
simplified and made much more intuitively appealing. "Added
columns" that are selected according to the desired alias struc-
ture are used with several examples. We feel as if the reader can
now gain a better understanding of what is gained (and lost) by
using fractions. This represents a major simplification from the
previous edition. For the first time a substantial table is given
that allows the reader to construct two-level designs of resolution
III and IV. Eighteen new exercises were added to this chapter. The
total number of exercises in this chapter is now 50. Chapter 16:
Nonparametric Statistics No major changes are included. The total
number of exercises is 41. Chapter 17: Statistical Quality Control
No major changes are included. The total number of exercises is 10.
Chapter 18: Bayesian Statistics (optional) This chapter is
completely new in the eighth edition. The material on Bayesian
statistics in the seventh edition (in Chapter 9) was removed in
favor of featuring this subject in a new self-contained chapter.
This chapter treats the pragmatic and highly useful elements of
Bayesian statis- tics of which students in science and engineering
should be aware. The chapter
21. presents the important concept of subjective
probability in conjunction with the notion that in many
applications population parameters are truly not constant but
should be treated as random variables. Point and interval
estimation is treated from a Bayesian point of view and practical
examples are displayed. This chapter is relatively short (ten
pages) and contains 9 examples and 11 exercises. Acknowledgements
We are indebted to those colleagues who reviewed the previous
editions of this book and provided many helpful suggestions for
this edition. They are: Andre Adler, Illinois Institute of Technology; Georgiana Baker, University of South Carolina; Barbara Bonnie, University of Minnesota; Nirmal Devi, Embry-Riddle; Ruxu Du, University of Miami; Stephanie Edwards, Bemidji State University; Charles McAllister, Louisiana State University; Judith Miller, Georgetown University; Timothy Raymond, Bucknell
University; Dennis Webster, Louisiana State University; Blake
Whitten, University of Iowa; Michael Zabarankin, Stevens Institute
of Technology. We would like to thank the editorial and production
services provided by nu- merous people from Prentice Hall,
especially the editor in chief Sally Yagan, production editor Lynn Savino Wendel, and copy editor Patricia Daly. Many useful comments, suggestions, and proofreadings by Richard Charnigo, Jr., Michael Anderson, Joleen Beltrami, and George Lobell are greatly
appreciated. We thank the Virginia Tech Statistical Consulting
Center, which was the source of many real-life data sets. In addition, we thank Linda Douglas, who worked hard in helping with the preparation of the manuscript. R.H.M. S.L.M. K.Y.
22. Chapter 1 Introduction to Statistics and Data Analysis 1.1
Overview: Statistical Inference, Samples, Populations, and
Experimental Design Beginning in the 1980s and continuing into the
twenty-first century, an inordinate amount of attention has been focused on improvement of quality in American industry. Much has been said and written about the Japanese "industrial miracle," which began in the middle of the twentieth century. The Japanese were able to succeed where we and other countries had failed, namely, to create an atmosphere that allows the production
of high-quality products. Much of the success of the Japanese has
been attributed to the use of statistical methods and statistical
thinking among management personnel. Use of Scientific Data The use
of statistical methods in manufacturing, development of food products, computer software, pharmaceuticals, and many other areas
involves the gathering of information or scientific data. Of
course, the gathering of data is nothing new. It has been done for
well over a thousand years. Data have been collected, summarized,
reported, and stored for perusal. However, there is a profound
distinction between collection of scientific information and
inferential statistics. It is the latter that has received rightful
attention in recent decades. The offspring of inferential
statistics has been a large "toolbox" of statistical methods
employed by statistical practitioners. These statistical methods
are designed to contribute to the process of making scientific
judgments in the face of uncertainty and variation. The product
density of a particular material from a manufacturing process will
not always be the same. Indeed, if the process involved is a
batch process rather than continuous, there will be variation in
material density between not only the batches (batch-to-batch
variation) that come off the line, but also within-batch variation.
Statistical methods are used to analyze data from a process such as
this one in order to gain more sense of where in the process changes may be made to improve the quality of the process. In this, quality
23. may
well be defined in relation to closeness to a target density value
in harmony with what portion of the time this closeness criterion
is met. An engineer may be concerned with a specific instrument
that is used to measure sulfur monoxide in the air during pollution
studies. If the engineer has doubts about the effectiveness of the
instrument, there are two sources of variation that must be dealt
with. The first is the variation in sulfur monoxide values that are
found at the same locale on the same day. The second is the
variation between values observed and the true sulfur monoxide that
is in the air at the time. If either of these two sources of
variation is exceedingly large (according to some standard set by
the engineer), the instrument may need to be replaced. In a
biomedical study of a new drug that reduces hypertension, 85% of
patients experienced relief while it is generally recognized that
the current or "old drug" brings relief to 80% of patients that
have chronic hypertension. However, the new drug is more expensive
to make and may result in certain side effects. Should the new drug
be adopted? This is a problem that is encountered (often with much
more complexity) frequently by pharmaceutical firms in harmony with
the FDA (Food and Drug Administration). Again, the consideration of
variation needs to be taken into account. The "85%" value is based
on a certain number of patients chosen for the study. Perhaps if
the study were repeated with new patients the observed number of
"successes" would be 75%! It is the natural variation from study to
study that must be taken into account in the decision process.
Clearly this variation is important since variation from patient to
patient is endemic to the problem. Variability in Scientific Data
In the problems discussed above the statistical methods used
involve dealing with variability and in each case the variability
to be studied is that encountered in scientific data. If the
observed product density in the process is always the same and is
always on target, there would be no need for statistical methods.
If the device for measuring sulfur monoxide always gives the same
value and the value is accurate (i.e., it is correct), no
statistical analysis is needed. If there were no patient-to-patient
variability inherent in the response to the drug (i.e., it either
always brings relief or not), life would be simple for scientists
in the pharmaceutical firms and FDA and no statistician would be
needed in the decision process. Inferential statistics has produced
an enormous number of analytical methods that allow for analysis of
data from systems like those described above. This reflects the
true nature of the science that we call inferential statistics,
namely that of using techniques that allow us to go beyond merely
reporting data but, rather, allow the drawing of conclusions (or
inferences) about the scientific system. Statisticians make use of
fundamental laws of probability and statistical inference to draw
con- clusions about scientific systems. Information is gathered in
the form of samples, or collections of observations. The process of
sampling is introduced in Chapter 2 and the discussion continues
throughout the entire book. Samples are collected from populations
that are collections of all individuals or individual items of a
particular type. At times a population signifies a scientific
system. For example, a manufacturer of computer boards may wish to
eliminate defects. A sampling process may involve collecting
information on 50 computer boards sampled randomly from the
process. Here, the population is all computer
24. boards manufactured by the firm over a
specific period of time. In a drug experiment, a sample of
patients is taken and each is given a specific drug to reduce blood
pressure. The interest is focused on drawing conclusions about the
population of those who suffer from hypertension. If an improvement
is made in the computer board process and a second sample of boards
is collected, any conclusions drawn regarding the effectiveness of
the change in process should extend to the entire population of
computer boards produced under the "improved process." Often, it is
very important to collect scientific data in a systematic way, with
planning being high on the agenda. At times the planning is, by
necessity, quite limited. We often focus only on certain properties
or characteristics of the items or objects in the population. This
characteristic has particular engineering or, say, biological
importance to the "customer," the scientist or engineer who seeks
to learn about the population. For example, in one of the
illustrations above the quality of the process had to do with the
product density of the output of a process. An engineer may need to
study the effect of process conditions, temperature, humidity,
amount of a particular ingredient, and so on. He or she can
systematically move these factors to whatever levels are suggested
according to whatever prescription or experimental design is
desired. However, a forest scientist who is interested in a study
of factors that influence wood density in a certain kind of tree
cannot necessarily design an experiment. In this case it may
require an observational study in which data are collected in the
field but factor levels could not be preselected. Both of these
types of studies lend themselves to methods of statistical
inference. In the former, the quality of the inferences will depend
on proper planning of the experiment. In the latter, the
scientist is at the mercy of what can be gathered. For example, it
is sad if an agronomist is interested in studying the effect of
rainfall on plant yield and the data are gathered during a drought.
One should gain an insight into the importance of statistical
thinking by managers and the use of statistical inference by
scientific personnel. Research scientists gain much from scientific
data. Data provide understanding of scientific phenomena. Product
and process engineers learn more in their off-line efforts to
improve the process. They also gain valuable insight by gathering
production data (on-line monitoring) on a regular basis. This
allows for determination of necessary modifications in order to
keep the process at a desired level of quality. There are times
when a scientific practitioner wishes only to gain some sort of
summary of a set of data represented in the sample. In other words,
no inferential statistics are used. Rather, a set of single-number
statistics or descriptive statistics is helpful. These numbers give
a sense of center of location of the data, variability in the data,
and the general nature of the distribution of observations in the
sample. Though no specific statistical methods leading to
statistical inference are incorporated, much can be learned. At
times, descriptive statistics are accompanied by graphics. Modern
statistical software packages allow for computation of means,
medians, standard deviations, and other single-number statistics as
well as produce graphs that show a "footprint" of the nature of the
sample. Definitions and illustrations of the single-number
statistics, as well as descriptions of graphical methods including
histograms, stem-and-leaf plots, dot plots, and box plots, will be
given in sections that follow.
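To make these single-number summaries concrete, here is a minimal sketch (ours, not the text's; the sample values are hypothetical) that computes a mean, median, and standard deviation with Python's standard library:

```python
# Descriptive statistics for a small hypothetical sample of measurements.
from statistics import mean, median, stdev

sample = [2.71, 2.62, 2.85, 2.70, 2.68, 2.77, 2.59, 2.73]

print("mean   =", round(mean(sample), 3))    # center of location
print("median =", round(median(sample), 3))  # robust measure of center
print("st dev =", round(stdev(sample), 3))   # variability in the data
```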
25. 1.2 The Role of Probability In this book, Chapters 2 to 6 deal with
fundamental notions of probability. A thorough grounding in these
concepts allows the reader to have a better understanding of
statistical inference. Without some formalism in probability, the
student cannot appreciate the true interpretation of data
analysis through modern statistical methods. It is quite natural to
study probability prior to studying statistical inference.
Elements of probability allow us to quantify the strength or
"confidence" in our conclusions. In this sense, concepts in
probability form a major component that supplements statistical
methods and help gauge the strength of the statistical inference.
The discipline of probability, then, provides the transition
between descriptive statistics and inferential methods. Elements of
probability allow the conclusion to be put into the language that
the science or engineering practitioners require. An example
follows that enables the reader to understand the notion of a
P-value, which often provides the "bottom line" in the
interpretation of results from the use of statistical methods.
Example 1.1: Suppose that an engineer encounters data from a
manufacturing process in which 100 items are sampled and 10 are
found to be defective. It is expected and anticipated that
occasionally there will be defective items. Obviously these 100
items represent the sample. However, it has been determined that in
the long run, the company can only tolerate 5% defective in the process. Now, the elements of probability allow the engineer to
determine how conclusive the sample information is regarding the
nature of the process. In this case the population conceptually
represents all possible items from the process. Suppose we learn
that if the process is acceptable, that is, if it does produce
items no more than 5% of which are defective, there is a
probability of 0.0282 of obtaining 10 or more defective items in a
random sample of 100 items from the process. This small probability
suggests that the process does, indeed, have a long-run percent
defective that exceeds 5%. In other words, under the condition of
an acceptable process, the sample information obtained would rarely
occur. However, it did occur! Clearly, though, it would occur with
a much higher probability if the process defective rate exceeded 5%
by a significant amount.
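As a quick check of the figure quoted above (our sketch, not part of the text; it assumes Python 3.8+ for math.comb), the tail probability P(X >= 10) for X ~ Binomial(n = 100, p = 0.05) can be computed exactly:

```python
# Verifying the 0.0282 probability from Example 1.1:
# P(X >= 10) where X ~ Binomial(n = 100, p = 0.05).
from math import comb

n, p = 100, 0.05
# Complement of the lower tail: 1 - P(X <= 9)
prob = 1 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(10))
print(round(prob, 4))  # prints 0.0282
```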
From this example it becomes clear that the elements of probability aid in the translation of sample
information into something conclusive or inconclusive about the
scientific system. In fact, what was learned is likely alarming
information to the engineer or manager. Statistical methods (which
we will actually detail in Chapter 10) produced a P-value of
0.0282. The result suggests that the process very likely is not
acceptable. The concept of a P-value is dealt with at length in
succeeding chapters. The example that follows provides a second
illustration. Example 1.2: Often the nature of the scientific
study will dictate the role that probability and deductive
reasoning play in statistical inference. Exercise 9.40 on page 297
provides data associated with a study conducted at the Virginia
Polytechnic Institute and State University on the development of a
relationship between the roots of trees and the action of a fungus.
Minerals are transferred from the fungus to the trees and sugars
from the trees to the fungus. Two samples of 10 northern red oak
seedlings
26. are planted in a greenhouse,
one containing seedlings treated with nitrogen and one containing
no nitrogen. All other environmental conditions are held constant.
All seedlings contain the fungus Pisolithus tinctorus. More details
are supplied in Chapter 9. The stem weights in grams were recorded
after the end of 140 days. The data are given in Table 1.1.

Table 1.1: Data Set for Example 1.2
No Nitrogen: 0.32 0.53 0.28 0.37 0.47 0.43 0.36 0.42 0.38 0.43
Nitrogen: 0.26 0.43 0.47 0.49 0.52 0.75 0.79 0.86 0.62 0.46

In this example there are two samples from two
separate populations. The purpose of the experiment is to determine
if the use of nitrogen has an influence on the growth of the roots.
The study is a comparative study (i.e., we seek to compare the two
populations with regard to a certain important characteristic). It
is instructive to plot the data as shown in Figure 1.1. The o
values represent the "with nitrogen" data and the x values
represent the "without nitrogen" data. Now, the purpose of this
experiment is to determine whether the use of nitrogen has an
influence on the growth of the roots. Notice that the general
appearance of the data might suggest to the reader that, on
average, the use of nitrogen increases the stem weight. Four
nitrogen observations are considerably larger than any of the
no-nitrogen observations. Most of the no-nitrogen observations
appear to be below the center of the data. The appearance of the
data set would seem to indicate that nitrogen is effective. But how
can this be quantified? How can all of the apparent visual evidence
be summarized in some sense? As in the preceding example, the
fundamentals of probability can be used. The conclusions may be
summarized in a probability statement or P-value. We will not show
here the statistical inference that produces the summary
probability. As in Example 1.1, these methods will be discussed in
Chapter 10. The issue revolves around the "probability that data
like these could be observed" given that nitrogen has no effect, in
other words, given that both samples were generated from the same
population. Suppose that this probability is small, say 0.03. That
would certainly be strong evidence that the use of nitrogen does
indeed influence (apparently increases) average stem weight of the
red oak seedlings.
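Before moving on, a minimal sketch (ours, not the text's) computes simple descriptive summaries of the two samples in Table 1.1 with Python's standard library; the formal probability statement itself is deferred to Chapter 10, as noted above:

```python
# Numerical comparison of the two samples of Example 1.2 (Table 1.1).
from statistics import mean, stdev

no_nitrogen = [0.32, 0.53, 0.28, 0.37, 0.47, 0.43, 0.36, 0.42, 0.38, 0.43]
nitrogen    = [0.26, 0.43, 0.47, 0.49, 0.52, 0.75, 0.79, 0.86, 0.62, 0.46]

for label, data in (("No nitrogen", no_nitrogen), ("Nitrogen", nitrogen)):
    # Sample mean and sample standard deviation for each group
    print(f"{label:12s} mean = {mean(data):.3f}  sd = {stdev(data):.3f}")
```

The computed means (roughly 0.40 gram without nitrogen versus 0.57 gram with nitrogen) agree with the visual impression described above.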
How Do Probability and Statistical Inference Work Together?

It is important for the reader to gain the clear distinction between the discipline of probability, a science in its own right, and the discipline of inferential statistics.
27. [Figure 1.1: Stem weight data.] As we have already indicated, the
use or application of concepts in probability allows real-life
interpretation of the results of statistical inference. As a
result, it can be said that statistical inference makes use of
concepts in probability. One can glean from the two examples above
that the sample information is available to the analyst and, with the aid of statistical methods and elements of probability, conclusions are made about some feature of the population (the
process does not appear to be acceptable in Example 1.1 and
nitrogen does influence average stem weights in Example 1.2). Thus
for a statistical problem the sample along with inferential
statistics allows us to draw conclusions about the population, with
inferential statistics making clear use of elements of probability.
This reasoning is inductive in nature. Now as we move into Chapter
2 and beyond, the reader will note that unlike our two examples
here, we will not focus on solving statistical problems. Many
examples will be given in which no sample is involved. There will
be a population clearly described with all features of the
population known. Then questions of importance will focus on the
nature of data that might hypothetically be drawn from the
population. Thus one can say that problems in probability allow us
to draw conclusions about characteristics of hypothetical data
taken from the population based on known features of the
population. This type of reasoning is deductive in nature. Figure
1.2 shows the fundamental relationship between probability and
inferential statistics.

[Figure 1.2: Fundamental relationship between probability and inferential statistics.]

Now, in the grand scheme of things,
which is more important, the field of probability or the field of
statistics? They are both very important and clearly are
complementary. The only certainty concerning the pedagogy of the
two disciplines lies in the fact that if statistics is to be
taught at more than merely a "cookbook" level, then the discipline
of probability must be taught first. This rule stems from the fact
that nothing can be learned about a population from a sample until
the analyst learns the rudiments of uncertainty in that sample. For
example, consider Example 1.1. The question centers around whether
or not the population, defined
28. by the process,
is no more than 5% defective. In other words the conjecture is that
on the average 5 out of 100 items are defective. Now, the sample
contains 100 items and 10 are defective. Does this support the
conjecture or refute it? On the surface it would appear to be a
refutation of the conjecture because 10 out of 100 seem to be "a bit much." But without elements of probability, how do we know? Only through the study of material in future chapters will we learn
that under the condition that the process is acceptable (5%
defective), the probability of obtaining 10 or more defective items
in a sample of 100 is 0.0282. We have given two examples where the
elements of probability provide a summary that the scientist or
engineer can use as evidence on which to build a decision. The
bridge between the data and the conclusion is, of course, based on
foundations of statistical inference, distribution theory, and
sampling distributions discussed in future chapters.
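The 0.0282 quoted above is a binomial tail probability of the kind developed in later chapters. As a quick check, a minimal Python sketch (using only the standard library; the variable names are ours) computes $P(X \geq 10)$ for a sample of 100 items when 5% are defective:

```python
from math import comb

n, p = 100, 0.05  # sample size and conjectured proportion of defectives

# P(X >= 10) = 1 - P(X <= 9) for a binomial random variable X
prob = 1 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(10))
print(round(prob, 4))  # 0.0282
```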
1.3 Sampling Procedures; Collection of Data

In Section 1.1 we discussed very
briefly the notion of sampling and the sampling process. While
sampling appears to be a simple concept, the complexity of the
questions that must be answered about the population or populations
necessitates that the sampling process be very complex at times.
While the notion of sampling is discussed in a technical way in
Chapter 8, we shall endeavor here to give some common-sense
notions of sampling. This is a natural transition to a discussion
of the concept of variability.

Simple Random Sampling

The
importance of proper sampling revolves around the degree of
confidence with which the analyst is able to answer the questions
being asked. Let us assume that only a single population exists in
the problem. Recall that in Example 1.2 two populations were
involved. Simple random sampling implies that any particular
sample of a specified sample size has the same chance of being
selected as any other sample of the same size. The term sample size
simply means the number of elements in the sample. Obviously, a
table of random numbers can be utilized in sample selection in many
instances. The virtue of simple random sampling is that it aids in
the elimination of the problem of having the sample reflect a
different (possibly more confined) population than the one about
which inferences need to be made. For example, a sample is to be
chosen to answer certain questions regarding political preferences
in a certain state in the United States. The sample involves the
choice of, say, 1000 families and a survey is to be conducted. Now,
suppose it turns out that random sampling is not used. Rather, all
or nearly all of the 1000 families chosen live in an urban setting.
It is believed that political preferences in rural areas differ
from those in urban areas. In other words, the sample drawn
actually confined the population and thus the inferences need to be
confined to the "limited population," and in this case confining
may be undesirable. If, indeed, the inferences need to be made
about the state as a whole, the sample of size 1000 described here
is often referred to as a biased sample.
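In practice the random-number table mentioned above is usually replaced by software. A small sketch, assuming a hypothetical frame in which families are simply numbered 1 through 1000, draws a simple random sample so that every subset of the chosen size is equally likely:

```python
import random

random.seed(1)  # fixed seed only to make the illustration reproducible

frame = list(range(1, 1001))         # hypothetical frame: families numbered 1-1000
sample = random.sample(frame, k=10)  # every 10-family subset is equally likely
print(sorted(sample))
```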
As
we hinted earlier, simple random sampling is not always
appropriate. Which alternative approach is used depends on the
complexity of the problem. Often, for example, the sampling units
are not homogeneous and naturally divide themselves into
nonoverlapping groups that are homogeneous. These groups are called
strata, and a procedure called stratified random sampling involves
random selection of a sample within each stratum. The purpose is to
be sure that each of the strata is neither over- nor
underrepresented. For example, suppose a sample survey is conducted
in order to gather preliminary opinions regarding a bond referendum
that is being considered in a certain city. The city is subdivided
into several ethnic groups which represent natural strata. In order
not to disregard or overrepresent any group, separate random
samples of families could be chosen from each group.
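A sketch of stratified random sampling, under assumed stratum sizes (the three groups, their sizes, and the overall sample size of 100 are all hypothetical), selects a separate simple random sample within each stratum using proportional allocation:

```python
import random

random.seed(1)
# hypothetical strata: three groups of families of different sizes
strata = {"group_1": list(range(500)),
          "group_2": list(range(300)),
          "group_3": list(range(200))}

total = sum(len(units) for units in strata.values())
# proportional allocation of an overall sample of 100 families
samples = {name: random.sample(units, k=round(100 * len(units) / total))
           for name, units in strata.items()}
print({name: len(s) for name, s in samples.items()})  # sizes 50, 30, 20
```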
Experimental Design

The concept of randomness or random assignment plays a huge
role in the area of experimental design, which was introduced very
briefly in Section 1.1 and is an important staple in almost any
area of engineering or experimental science. This will be discussed
at length in Chapters 13 through 15. However, it is instructive to
give a brief presentation here in the context of random sampling. A
set of so-called treatments or treatment combinations becomes the
populations to be studied or compared in some sense. An example is
the "nitrogen" versus "no-nitrogen" treatments in Example 1.2.
Another simple example would be "placebo" versus "active drug," or
in a corrosion fatigue study we might have treatment combinations
that involve specimens that are coated or uncoated as well as
conditions of low or high humidity to which the specimens are
exposed. In fact, there are four treatment or factor combinations
(i.e., 4 populations), and many scientific questions may be asked
and answered through statistical and inferential methods. Consider
first the situation in Example 1.2. There are 20 diseased seedlings
involved in the experiment. It is easy to see from the data
themselves that the seedlings are different from each other. Within
the nitrogen group (or the no-nitrogen group) there is considerable
variability in the stem weights. This variability is due to what is
generally called the experimental unit. This is a very important
concept in inferential statistics, in fact one whose description
will not end in this chapter. The nature of the variability is very
important. If it is too large, stemming from a condition of
excessive non-homogeneity in experimental units, the variability
will "wash out" any detectable difference between the two
populations. Recall that in this case that did not occur. The dot
plot in Figure 1.1 and P-value indicated a clear distinction
between these two conditions. What role do those experimental units
play in the data-taking process itself? The common-sense and,
indeed, quite standard approach is to assign the 20 seedlings or
experimental units randomly to the two treatments or conditions.
In the drug study we may decide to use a total of 200 available
patients, patients that clearly will be different in some sense.
They are the experimental units. However, they all may have the
same chronic condition for which the drug is a potential treatment.
Then in a so-called completely randomized design 100 patients are
assigned randomly to placebo and 100 to the active drug. Again, it
is these experimental units within a group or treatment that
produce the variability in data results (i.e., variability in the
measured result), say blood pressure, or whatever drug efficacy
value is important. In the corrosion fatigue study the experimental
units are the specimens that are the subjects of the corrosion.
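The completely randomized design described above is easy to sketch in code; assuming 200 hypothetical patient IDs, a random permutation splits the experimental units evenly between the two treatments:

```python
import random

random.seed(7)
patients = list(range(1, 201))  # 200 hypothetical patient IDs
random.shuffle(patients)        # random permutation of the experimental units

# completely randomized design: half to placebo, half to the active drug
placebo, active = patients[:100], patients[100:]
print(len(placebo), len(active))  # 100 100
```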
Why Assign Experimental Units Randomly?

What is the possible negative impact of not
randomly assigning experimental units to the treatments or
treatment combinations? This is seen most clearly in the case of
the drug study. Among the characteristics of the patients that
produce variability in the results are age, gender, weight, and
others. Suppose merely by chance the placebo group contains a
sample of people that are predominantly heavier than those in the
treatment group. Perhaps heavier individuals have a tendency to
have a higher blood pressure. This clearly biases the result and,
indeed, any result obtained through the application of statistical
inference may have little to do with the drug but more to do with
differences in weight between the two samples of patients. We should
emphasize the attachment of importance to the term variability.
Excessive variability among experimental units "camouflages"
scientific findings. In future sections we attempt to characterize
and quantify measures of variability. In sections that follow we
introduce and discuss specific quantities that can be computed in
samples; the quantities give a sense of the nature of the sample
with respect to center of location of the data and variability in
the data. A discussion of several of these single-number measures
serves to provide a preview of what statistical information will
be important components of the statistical methods that are used in
Chapters 8 through 15. These measures that help characterize the
nature of the data set fall into the category of descriptive
statistics. This material is a prelude to a brief presentation of
pictorial and graphical methods that go even further in
characterization of the data set. The reader should understand that
the statistical methods illustrated here will be used throughout
the text. In order to offer the reader a clearer picture of what is
involved in experimental design studies, we offer Example 1.3.
Example 1.3: A corrosion study was made in order to determine
whether coating an aluminum metal with a corrosion retardation
substance reduced the amount of corrosion. The coating
is a protectant that is advertised to minimize fatigue damage in
this type of material. Also of interest is the influence of
humidity on the amount of corrosion. A corrosion measurement can be
expressed in thousands of cycles to failure. Two levels of coating,
no coating and chemical corrosion coating, were used. In addition,
the two relative humidity levels are 20% relative humidity and 80%
relative humidity. The experiment involves four treatment
combinations that are listed in the table that follows. There are
eight experimental units to be used, and they are aluminum
specimens prepared, with two assigned randomly to each of the four
treatment combinations. The data are presented in Table 1.2. The
corrosion data are averages of 2 specimens. A plot of the averages
is pictured in Figure 1.3. A relatively large value of cycles to
failure represents a small amount of corrosion. As one might
expect, an increase in humidity appears to increase the amount of
corrosion (at 80% relative humidity the cycles to failure drop for
both coatings).

Table 1.2: Data for Example 1.3

Coating               Humidity   Average Corrosion in Thousands
                                 of Cycles to Failure
Uncoated              20%         975
Uncoated              80%         350
Chemical Corrosion    20%        1750
Chemical Corrosion    80%        1550

[Figure 1.3: Plot of the average corrosion data.]

This example illustrates:

(1) random assignment of treatment combinations (coating/humidity)
to experimental units (specimens);

(2) the use of sample averages (average corrosion values) in
summarizing sample information;

(3) the need for consideration of measures of variability in the
analysis of any sample or sets of samples.

This example suggests the
need for what follows in Sections 1.4 and 1.5, namely, descriptive
statistics that indicate measures of center of location in a set of
data, and those that measure variability.

1.4 Measures of Location: The Sample Mean and Median

Location measures in a data set are designed to provide the analyst
some quantitative measure of where the data center is in a sample.
In Example 1.2 it appears as
if the center of the nitrogen sample clearly exceeds that of the
no-nitrogen sample. One obvious and very useful measure is the
sample mean. The mean is simply a numerical average.

Definition 1.1: Suppose that the observations in a sample are denoted by $x_1, x_2, \ldots, x_n$. The sample mean, denoted by $\bar{x}$, is
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}.$$

There are other measures of central tendency that are
discussed in detail in future chapters. One important measure is
the sample median. The purpose of the sample median is to reflect
the central tendency of the sample in such a way that it is
uninfluenced by extreme values or outliers. Given that the
observations in a sample are $x_1, x_2, \ldots, x_n$, arranged in
increasing order of magnitude, the sample median is
$$\tilde{x} = \begin{cases} x_{(n+1)/2}, & \text{if } n \text{ is odd},\\ \frac{1}{2}\left(x_{n/2} + x_{n/2+1}\right), & \text{if } n \text{ is even}.\end{cases}$$
For
example, suppose the data set is the following: 1.7, 2.2, 3.9,
3.11, and 14.7. The sample mean and median are, respectively,
$$\bar{x} = 5.12, \qquad \tilde{x} = 3.9.$$
Clearly, the mean is influenced considerably by the
presence of the extreme observation, 14.7, whereas the median
places emphasis on the true "center" of the data set.
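These two values can be verified with Python's standard statistics module; a minimal check on the five observations above:

```python
from statistics import mean, median

data = [1.7, 2.2, 3.9, 3.11, 14.7]
print(round(mean(data), 2))  # 5.12
print(median(data))          # 3.9 (the data are sorted internally)
```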
In the case of the two-sample data set of Example 1.2, the two
measures of central tendency for the individual samples are
$$\bar{x}(\text{no nitrogen}) = 0.399 \text{ gram}, \qquad \tilde{x}(\text{no nitrogen}) = \frac{0.38 + 0.42}{2} = 0.400 \text{ gram},$$
$$\bar{x}(\text{nitrogen}) = 0.565 \text{ gram}, \qquad \tilde{x}(\text{nitrogen}) = \frac{0.49 + 0.52}{2} = 0.505 \text{ gram}.$$
Clearly there is a difference in concept between the mean and
median. It may be of interest to the reader with an engineering
background that the sample mean is the centroid of the data in a
sample. In a sense it is the point at which a fulcrum can be placed
to balance a system of "weights" which are the locations of the
individual data. This is shown in Figure 1.4 with regard to the
"with nitrogen" sample. x = 0.565 i - + - H H I 1 +- 0.25 0.30 0.35
0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 Figure 1.4:
Sample mean as a centroid of the "with nitrogen" stem weight. In
future chapters, the basis for the computation of $\bar{x}$ is that of an
estimate of the population mean. As we indicated earlier, the
purpose of statistical inference is to draw conclusions about
population characteristics or parameters and estimation is a very
important feature of statistical inference. The median and mean can
be quite different from each other. Note, however, that in the case
of the stem weight data the sample mean value for no-nitrogen is
quite similar to the median value.

Other Measures of Location

There are several other methods of quantifying the center of
location of the data in the sample. We will not deal with them at
this point. For the most part, alternatives to the sample mean are
designed to produce values that represent compromises between the
mean and the median. Rarely do we make use of these other measures.
However, it is instructive to discuss one class of estimators,
namely the class of trimmed means. A trimmed mean is computed by
"trimming away" a certain percent of both the largest and smallest
set of values. For example, the 10% trimmed mean is found by
eliminating the largest 10% and smallest 10% and computing the
average of the remaining values. For example, in the case of the
stem weight data we would eliminate the largest and smallest since
the sample size is 10 for each sample. So for the without-nitrogen
group the 10% trimmed mean is given by
$$\bar{x}_{\text{tr}(10)} = \frac{0.32 + 0.37 + 0.47 + 0.43 + 0.36 + 0.42 + 0.38 + 0.43}{8} = 0.39750,$$
and for the 10% trimmed mean for the with-nitrogen group we have
$$\bar{x}_{\text{tr}(10)} = \frac{0.43 + 0.47 + 0.49 + 0.52 + 0.75 + 0.79 + 0.62 + 0.46}{8} = 0.56625.$$
Note that in this case, as
expected, the trimmed means are close to both the mean and median
for the individual samples. The trimmed mean approach is, of
course, more insensitive to outliers than the sample mean but not
as insensitive as the median. On the other hand the trimmed mean
approach makes use of more information. Note that the sample median
is, indeed, a special case of the trimmed mean in which all of the
sample data are eliminated apart from the middle one or two
observations.
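Computing a trimmed mean directly from the definition takes only a few lines; in the sketch below, the helper name trimmed_mean is our own, and the data are the five observations used earlier in this section:

```python
def trimmed_mean(values, percent):
    # drop the smallest and largest `percent` of the sorted values; average the rest
    ordered = sorted(values)
    k = int(len(ordered) * percent / 100)  # number trimmed from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

data = [1.7, 2.2, 3.9, 3.11, 14.7]
print(round(trimmed_mean(data, 20), 2))  # trims one value from each end: 3.07
```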
Exercises

1.1 The following measurements were recorded for the drying time, in hours, of a certain brand of latex paint:

3.4 2.5 4.8 2.9 3.6 2.8 3.3 5.6 3.7 2.8 4.4 4.0 5.2 3.0 4.8

Assume that the measurements are a simple random sample.
(a) What is the sample size for the above sample?
(b) Calculate the sample mean for these data.
(c) Calculate the sample median.
(d) Plot the data by way of a dot plot.
(e) Compute the 20% trimmed mean for the above data set.

1.2 According to the journal Chemical Engineering, an
important property of a fiber is its water absorbency. A random
sample of 20 pieces of cotton fiber is taken and the absorbency on
each piece was measured. The following are the absorbency values:

18.71 21.41 20.72 21.81 19.29 22.43 20.17 23.71 19.44 20.50
18.92 20.33 23.00 22.85 19.25 21.77 22.11 19.77 18.04 21.12

(a) Calculate the sample mean and median for the above sample values.
(b) Compute the 10% trimmed mean.
(c) Do a dot plot of the absorbency data.

1.3 A certain polymer is used for evacuation
systems for aircraft. It is important that the polymer be resistant
to the aging process. Twenty specimens of the polymer were used in
an experiment. Ten were assigned randomly to be exposed to the
accelerated batch aging process that involved exposure to high
temperatures for 10 days. Measurements of tensile strength of the
specimens were made, and the following data were recorded on
tensile strength in psi:

No aging: 227 222 218 217 225 218 216 229 228 221
Aging:    219 214 215 211 209 218 203 204 201 205

(a) Do a dot plot of the data.
(b) From your plot, does it appear as if the aging process has had an effect on the tensile strength of this polymer? Explain.
(c) Calculate the sample mean tensile strength of the two samples.
(d) Calculate the median for both. Discuss the similarity or lack of similarity between the mean and median of each group.

1.4 In a study conducted by the Department of
Mechanical Engineering at Virginia Tech, the steel rods supplied by
two different companies were compared. Ten sample springs were made
out of the steel rods supplied by each company, and a measure of
flexibility was recorded for each. The data are as follows:

Company A: 9.3 8.8 6.8 8.7 8.5 6.7 8.0 6.5 9.2 7.0
Company B: 11.0 9.8 9.9 10.2 10.1 9.7 11.0 11.1 10.2 9.6

(a) Calculate the sample mean and median for the data for the two companies.
(b) Plot the data for the two companies on the same line and give your impression.

1.5
Twenty adult males between the ages of 30 and 40 were involved in a
study to evaluate the effect of a specific health regimen involving
diet and exercise on the blood cholesterol. Ten were randomly
selected to be a control group and ten others were assigned to take
part in the regimen as the treatment group for a period of 6 months.
The following data show the reduction in cholesterol experienced for
the time period for the 20 subjects:

Control group:    7   3  -4  14   2   5  22  -7   9   5
Treatment group: -6   5   9   4   4  12  37   5   3   3

(a) Do a dot plot of the data for both groups on the same graph.
(b) Compute the mean, median, and 10% trimmed means for both groups.
(c) Explain why the difference in the means suggests one conclusion about the effect of the regimen, while the difference in medians or trimmed means suggests a different conclusion.

1.6 The tensile strength of silicone rubber is thought to be a
function of curing temperature. A study was carried out in which
samples of 12 specimens of the rubber were prepared using curing
temperatures of 20°C and 45°C. The data below show the tensile
strength values in megapascals:

20°C: 2.07 2.14 2.22 2.03 2.21 2.03 2.05 2.18 2.09 2.14 2.11 2.02
45°C: 2.52 2.15 2.49 2.03 2.37 2.05 1.99 2.42 2.08 2.42 2.29 2.01

(a) Show a dot plot of the data with both low and high temperature tensile strength values.
(b) Compute sample mean tensile strength for both samples.
(c) Does it appear as if curing temperature has an influence on tensile strength, based on the plot? Comment further.
(d) Does anything else appear to be influenced by an increase in cure temperature? Explain.

1.5 Measures of Variability
Sample variability plays an important role in data
analysis. Process and product variability is a fact of life in
engineering and scientific systems: the control or reduction of
process variability is often a source of major difficulty. More and
more process engineers and managers are learning that product
quality, and as a result, profits that are derived from
manufactured products are very much a function of process
variability. As a result, much of Chapters 9 through 15 deal with
data analysis and modeling procedures in which sample variability
plays a major role. Even in small data analysis problems, the
success of a particular statistical method may depend on the
magnitude of the variability among the observations in the sample.
Measures of location in a sample do not provide a proper summary of
the nature of a data set. For instance, in Example 1.2 we cannot
conclude that the use of nitrogen enhances growth without taking
sample variability into account. While the details of the analysis
of this type of data set are deferred to Chapter 9, it should be
clear from Figure 1.1 that variability among the "no-nitrogen"
observations and variability among the "with-nitrogen" observations
are certainly of some consequence. In fact, it appears that the
variability within the nitrogen sample is larger than that of the
no-nitrogen sample. Perhaps there is something about the inclusion
of nitrogen that not only increases the stem weight ($\bar{x}$ of 0.565
gram compared to an $\bar{x}$ of 0.399 gram for the no-nitrogen sample) but
also increases the variability in stem weight (i.e., renders the
stem weight more inconsistent). As another example, contrast the
two data sets below. Each contains two samples and the difference
in the means is roughly the same for the two samples:

[Dot plots of data set A and data set B.]

Data set B
seems to provide a much sharper contrast between the two
populations from which the samples were taken. If the purpose of
such an experiment is to detect distinction between the two
populations, the task is accomplished in the case of data set B.
However, in data set A the large variability within the two samples
creates difficulty. In fact, it is not clear that there is a
distinction between the two populations.

Sample Range and Sample Standard Deviation

Just as there are many
measures of central tendency or location there are many measures of
spread or variability. Perhaps the simplest one is the sample range
$X_{\max} - X_{\min}$. The range can be very useful and is discussed
at length in Chapter 17 on statistical quality control. The sample
measure of spread that is used most often is the sample standard
deviation. We again let $x_1, x_2, \ldots, x_n$ denote sample values.

Definition 1.2: The sample variance, denoted by $s^2$, is given by
$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}.$$
The sample standard deviation, denoted by $s$,
is the positive square root of $s^2$; that is, $s = \sqrt{s^2}$. It should be
clear to the reader that the sample standard deviation is, in fact,
a measure of variability. Large variability in a data set produces
relatively large values of $(x_i - \bar{x})^2$ and thus a large sample variance.
The quantity $n - 1$ is often called the degrees of freedom associated
with the variance estimate. In this simple example the degrees of
freedom depict the number of independent pieces of information
available for computing variability. For example, suppose that we
wish to compute the sample variance and standard deviation of the
data set (5, 17, 6, 4). The sample average is $\bar{x} = 8$. The
computation of the variance involves
$$(5 - 8)^2 + (17 - 8)^2 + (6 - 8)^2 + (4 - 8)^2 = (-3)^2 + 9^2 + (-2)^2 + (-4)^2.$$
The quantities inside parentheses sum to zero. In general,
$\sum_{i=1}^{n}(x_i - \bar{x}) = 0$ (see Exercise 1.16 on page 28).
Then the computation of a sample variance does not involve $n$
independent squared deviations from the mean $\bar{x}$. In fact, since
the last value of $x_i - \bar{x}$ is determined by the initial $n - 1$
of them, we say that these are $n - 1$ "pieces of information" that
produce $s^2$. Thus there are $n - 1$ degrees of freedom rather than
$n$ degrees of freedom for computing a sample variance.
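A direct Python translation of this computation makes the degrees-of-freedom point concrete; a minimal sketch on the same data set:

```python
data = [5, 17, 6, 4]
n = len(data)
xbar = sum(data) / n                     # 8.0
devs = [x - xbar for x in data]          # [-3.0, 9.0, -2.0, -4.0]
print(sum(devs))                         # 0.0: the deviations always sum to zero,
                                         # so only n - 1 of them are free to vary
s2 = sum(d * d for d in devs) / (n - 1)  # divide by n - 1 degrees of freedom
print(s2)                                # 36.666...
```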
Example 1.4: In an example discussed extensively in Chapter 10, an
engineer is interested in testing the "bias" in a pH meter. Data
are collected on the meter by measuring the pH of a neutral
substance (pH 7.0). A sample of size 10 is taken, with results given by

7.07 7.00 7.10 6.97 7.00 7.03 7.01 7.01 6.98 7.08.

The sample mean $\bar{x}$ is given by
$$\bar{x} = \frac{7.07 + 7.00 + 7.10 + \cdots + 7.08}{10} = 7.025.$$
The sample variance $s^2$ is given by
$$s^2 = \frac{1}{9}\left[(7.07 - 7.025)^2 + (7.00 - 7.025)^2 + (7.10 - 7.025)^2 + \cdots + (7.08 - 7.025)^2\right] = 0.001939.$$
As a result, the sample standard deviation is given by
$$s = \sqrt{0.001939} = 0.044.$$
So the sample standard deviation is 0.0440 with $n - 1 = 9$ degrees of freedom.
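The standard library's statistics module, which also divides by $n - 1$ as in Definition 1.2, reproduces these values:

```python
from statistics import mean, stdev, variance

ph = [7.07, 7.00, 7.10, 6.97, 7.00, 7.03, 7.01, 7.01, 6.98, 7.08]
print(mean(ph))                # 7.025
print(round(variance(ph), 6))  # 0.001939
print(round(stdev(ph), 3))     # 0.044
```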
Units for Standard Deviation and Variance

It should be apparent from Definition 1.2 that the variance is a
measure of the average squared deviation from the mean $\bar{x}$. We use
the term average squared deviation even though the definition makes
use of a division by degrees of freedom $n - 1$ rather than $n$. Of
course, if n is large the difference in the denominator is
inconsequential. As a result, the sample variance possesses units
that are the square of the units in the observed data whereas the
sample standard deviation is found in linear units. As an example,
consider the data of Example 1.2. The stem weights are measured in
grams. As a result, the sample standard deviations are in grams and
the variances are measured in grams$^2$. In fact, the individual
standard deviations are 0.0728 grams for the no-nitrogen case and
0.1867 grams for the nitrogen group. Note that the variability as
characterized by the standard deviation does indicate considerably
larger variability in the nitrogen sample. This condition was noted
from Figure 1.1.

Which Variability Measure Is More Important?

As we
indicated earlier the sample range has applications in the area of
statistical quality control. It may appear to the reader that the
use of both the sample variance and the sample standard deviation
is redundant. Both measures reflect the same concept in measuring
variability but the sample standard deviation measures variability
in linear units whereas the sample variance is measured in squared
units. Both play huge roles in the use of statistical methods. Much
of what is
accomplished in the context of statistical
inference involves drawing conclusions about characteristics of
populations. Among these characteristics are constants which are
called population parameters. Two important parameters are the
population mean and the population variance. The sample variance
plays an explicit role in the statistical methods used to draw
inferences about the population variance. The sample standard
deviation has an important role along with the sample mean in
inferences that are made about the population mean. In general, the
variance is considered more in inferential theory, while the
standard deviation is used more in applications.

Exercises

1.7 Consider the drying time data for Exercise 1.1 on page 13. Compute the sample variance and sample standard deviation.

1.8 Compute the sample variance and standard deviation for the water absorbency data of Exercise 1.2 on page 13.

1.9 Exercise 1.3 on page 13 showed samples of tensile strength data, one for specimens that were exposed to an aging process and one in which there was no aging of the specimens. Calculate the sample variance as well as standard deviation in tensile strength for both samples.

1.10 For the data of Exercise 1.4 on page 13, compute both the mean and variance in "flexibility" for both company A and company B.

1.11 Consider the data in Exercise 1.5 on page 13. Compute the sample variance and the sample standard deviation for both the control and treatment groups.

1.12 For Exercise 1.6 on page 14, compute the sample standard deviation in tensile strength for the samples separately for the two temperatures. Does it appear as if an increase in temperature influences the variability in tensile strength? Explain.

1.6 Discrete and Continuous
Data

Statistical inference through the analysis of observational
studies or designed experiments is used in many scientific areas.
The data gathered may be discrete or
continuous, depending on the area of application. For example, a
be interested in conducting an experiment that will lead to condi-
tions where yield is maximized. Here, of course, the yield may be
in percent, or grams/pound, measured on a continuum. On the other
hand, a toxicologist con- ducting a combination drug experiment may
encounter data that are binary in nature (i.e., the patient either
responds or not). Great distinctions arc made between discrete and
continuous data in the prob- ability theory that allow us to draw
statistical inferences. Often applications of statistical inference
are found when the data are count, data. For example, an en- gineer
may be interested in studying the number of radioactive particles
passing through a counter in, say, 1 millisecond. Personnel
responsible for the1 efficiency of a port facility may be
interested in the characteristics of the number of oil tankers
arriving each day at a certain port: city, In Chapter 5, several
distinct scenarios, leading to different ways of handling data arc
discussed for situations with count data. Special attention even at
this early stage of the textbook should be paid to some details
associated with binary data. Applications requiring statistical
analysis of
39. 18 Chapter 1 Introduction to Statistics and Data Analysis
binary data are voluminous. Often the measure that is used in the
analysis is the sample proportion. Obviously the binary situation
involves two categories. If there are $n$ units involved in the data
and $x$ is defined as the number that fall into category 1, then $n - x$
fall into category 2. Thus, $x/n$ is the sample proportion in
category 1 and $1 - x/n$ is the sample proportion in category 2. In the
biomedical application, 50 patients may represent the sample units,
and if 20 out of 50 experienced an improvement in a stomach ailment
(common to all 50) after all were given the drug, then $\frac{20}{50} = 0.4$ is
the sample proportion for which the drug was a success and $1 - 0.4 = 0.6$
is the sample proportion for which the drug was not successful.
Actually the basic numerical measurement for binary data is
generally denoted by either 0 or 1. For example, in our medical
example, a successful result is denoted by a 1 and a nonsuccess a
0. As a result, the sample proportion is actually a sample mean of
the ones and zeros. For the successful category,
$$\frac{x_1 + x_2 + \cdots + x_{50}}{50} = \frac{1 + 1 + 0 + \cdots + 0 + 1}{50} = \frac{20}{50} = 0.4.$$
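Treating the outcomes as zeros and ones makes this identity immediate; a one-line check (the 0/1 vector below mirrors the 20-success, 30-nonsuccess illustration):

```python
# 0/1 outcomes for the 50 patients: 20 successes, 30 nonsuccesses
outcomes = [1] * 20 + [0] * 30

p_hat = sum(outcomes) / len(outcomes)  # the sample mean of the ones and zeros
print(p_hat)  # 0.4
```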
What Kinds of Problems Are Solved in Binary Data Situations?

The kinds of
problems facing scientists and engineers dealing in binary data are
not a great deal unlike those seen where continuous measurements
are of interest. However, different techniques are used since the
statistical properties of sample proportions are quite different
from sample means that result from averages taken from continuous
populations. Consider the example data in Exercise 1.6 on page 14.
The statistical problem underlying this illustration focuses on
whether an intervention, say, an increase in cure temperature,
will alter the population mean tensile strength associated with the
silicone rubber process. On the other hand, in a quality control
area suppose an automobile tire manufacturer reports that a
shipment of 5000 tires selected randomly from the process results
in 100 of them showing blemishes. Here the sample proportion is
$\frac{100}{5000} = 0.02$. Following a change in the process designed to reduce
blemishes, a second sample of 5000 is taken and 90 tires are
blemished. The sample proportion has been reduced to $\frac{90}{5000} = 0.018$.
The question arises, "Is the decrease in the sample proportion from
0.02 to 0.018 substantial enough to suggest a real improvement in
the population proportion?" Both of these illustrations require the
use of the statistical properties of sample averages, one from
samples from a continuous population, and the other from samples
from a discrete (binary) population. In both cases the sample mean
is an estimate of a population parameter, a population mean in the
first illustration (i.e., mean tensile strength), and a population
proportion (i.e., proportion of blemished tires in the
population), in the second case. So here we have sample estimates
used to draw scientific conclusions regarding population
parameters. As we indicated in Section 1.4, this is the general
theme in many practical problems using statistical inference.
1.7 Statistical Modeling, Scientific Inspection, and Graphical Diagnostics

Often the end result of a
statistical analysis is the estimation of parameters of a
postulated model. This is natural for scientists and engineers
since they often deal in modeling. A statistical model is not
deterministic but, rather, must entail some probabilistic aspects.
A model form is often the foundation of assumptions that are made
by the analyst. For example, in our Example 1.2 the scientist may
wish to draw some level of distinction between the "nitrogen" and
"no-nitrogen" populations through the sample information. The
analysis may require a certain model for the data, for example,
that the two samples come from normal or Gaussian distr