Life after linear regression
A survey of Penn State applied statistics graduate courses
The courses
• Stat 500: Applied Statistics
• Stat 501: Regression Methods
• Stat 502: Analysis of Variance and Design of Experiments
• Stat 503: Design of Experiments
• Stat 504: Analysis of Discrete Data
• Stat 505: Applied Multivariate Statistical Analysis
• Stat 506: Sampling Theory and Methods
• Stat 509: Biostatistical Methods
• Stat 510: Applied Time Series Analysis
Stat 500: Applied Statistics
• Topics covered:
– Descriptive statistics
– Hypothesis testing and power
– Estimation and confidence intervals
– Regression
– One- and two-way ANOVA
– Chi-square tests
• Prerequisites:
– 2 credits of algebra
Stat 501: Regression Methods
• Topics covered:
– Analysis of research data through simple and multiple regression and correlation
– Polynomial models
– Indicator variables
– Stepwise and piecewise regression
– Logistic regression
• Prerequisites:
– 6 credits of statistics or Stat 500; matrix algebra
Stat 502: Analysis of Variance and Design of Experiments
• Analysis of data when:
– the response y is continuous
– the predictors (called factors or treatments) are all qualitative
– the error assumptions are the same as for regression
• Do the means differ among the groups defined by the factor combinations?
• Topics covered:
– Analysis of variance and design concepts
– Factorial, nested and unbalanced data
– Analysis of covariance
– Blocked designs
– Latin-square, split-plot, repeated measures designs
– Multiple comparisons
• Prerequisites:
– Stat 501 (or undergraduate version Stat 462)
A Stat 502 Example: Intertidal Seaweed Grazers
• To study the influence of ocean grazers on the regeneration rates of seaweed in the intertidal zone, a researcher scraped square rock plots free of seaweed and observed the seaweed regeneration when certain types of seaweed-grazing animals were denied access.
• Research questions:
– Which grazer consumes the most seaweed?
– Do different grazers influence each other's impact?
– Are grazing effects similar in all microhabitats?
• The grazers were limpets (L), small fishes (f), and large fishes (F):
– LfF: all three grazers were allowed access
– fF: limpets were excluded using caustic paint
– Lf: large fish were excluded using coarse net
– f: limpets and large fish were excluded
– L: small and large fish were excluded using fine net
– C: the control group, all excluded
• Intertidal zone is a highly variable environment. Researcher applied treatments in 8 blocks of 12 plots each:
– #1: Just below high tide, exposed to heavy surf
– #2: Just below high tide, protected from surf
– #3: Midtide, exposed
– #4: Midtide, protected
– #5: Just above low tide level, exposed
– #6: Just above low tide level, protected
– #7: On near-vertical rock wall, midtide, protected
– #8: On near-vertical rock wall, above low tide, protected
A Stat 502 Example: Percent of regenerated seaweed on intertidal plots with some grazers excluded
Block  Control  L       f       Lf      fF      LfF
1      14, 23   4, 4    11, 24  3, 5    10, 13  1, 2
2      22, 35   7, 8    14, 31  3, 6    10, 15  3, 5
3      67, 82   28, 58  52, 59  9, 31   44, 50  6, 9
4      94, 95   27, 35  83, 89  21, 57  57, 73  7, 22
5      34, 53   11, 33  33, 34  5, 9    26, 42  5, 6
6      58, 75   16, 31  39, 52  26, 43  38, 42  10, 17
7      19, 47   6, 8    43, 53  4, 12   29, 36  5, 14
8      53, 61   15, 17  30, 37  12, 18  11, 40  5, 7
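The three research questions map onto treatment means, and onto treatment and block-by-treatment sums of squares. The sketch below computes those from the table above in plain Python. It assumes an analysis on the raw percentage scale with the two plots per cell treated as replicates; the course may well use a transformation or a different error term, and all names (`DATA`, `trt_means`, `f_trt`, and so on) are illustrative only.

```python
# Percent regenerated seaweed: DATA[block][treatment] = two replicate plots.
TREATMENTS = ["Control", "L", "f", "Lf", "fF", "LfF"]
DATA = [
    [(14, 23), (4, 4), (11, 24), (3, 5), (10, 13), (1, 2)],
    [(22, 35), (7, 8), (14, 31), (3, 6), (10, 15), (3, 5)],
    [(67, 82), (28, 58), (52, 59), (9, 31), (44, 50), (6, 9)],
    [(94, 95), (27, 35), (83, 89), (21, 57), (57, 73), (7, 22)],
    [(34, 53), (11, 33), (33, 34), (5, 9), (26, 42), (5, 6)],
    [(58, 75), (16, 31), (39, 52), (26, 43), (38, 42), (10, 17)],
    [(19, 47), (6, 8), (43, 53), (4, 12), (29, 36), (5, 14)],
    [(53, 61), (15, 17), (30, 37), (12, 18), (11, 40), (5, 7)],
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


n_blocks, n_trt, n_rep = len(DATA), len(TREATMENTS), 2

# Means at each level of the layout.
grand = mean(y for row in DATA for cell in row for y in cell)
block_means = [mean(y for cell in row for y in cell) for row in DATA]
trt_means = [mean(y for row in DATA for y in row[j]) for j in range(n_trt)]
cell_means = [[mean(cell) for cell in row] for row in DATA]

# Balanced two-way decomposition: blocks, treatments, interaction, error.
ss_block = n_trt * n_rep * sum((m - grand) ** 2 for m in block_means)
ss_trt = n_blocks * n_rep * sum((m - grand) ** 2 for m in trt_means)
ss_inter = n_rep * sum(
    (cell_means[i][j] - block_means[i] - trt_means[j] + grand) ** 2
    for i in range(n_blocks) for j in range(n_trt)
)
ss_error = sum(
    (y - cell_means[i][j]) ** 2
    for i in range(n_blocks) for j in range(n_trt) for y in DATA[i][j]
)
ss_total = sum((y - grand) ** 2 for row in DATA for cell in row for y in cell)

# F ratios against the within-cell (replicate) mean square.
df_trt = n_trt - 1
df_inter = (n_blocks - 1) * (n_trt - 1)
df_error = n_blocks * n_trt * (n_rep - 1)
ms_error = ss_error / df_error
f_trt = (ss_trt / df_trt) / ms_error
f_inter = (ss_inter / df_inter) / ms_error

for name, m in zip(TREATMENTS, trt_means):
    print(f"{name:8s} mean regeneration = {m:.2f}%")
print(f"F(treatments)         = {f_trt:.2f} on ({df_trt}, {df_error}) df")
print(f"F(block x treatment)  = {f_inter:.2f} on ({df_inter}, {df_error}) df")
```

The treatment means speak to which grazers matter most, and the block-by-treatment ratio speaks to whether grazing effects look similar across microhabitats.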
Stat 503: Design of Experiments
• The key word is “experiments”
• When you can control the values of your predictors (factors), you should ensure you can answer your research question by:
– Collecting the appropriate measurements
– Setting the values of your factors appropriately
– Reducing extraneous variation by “blocking”
– Having an appropriate sample size
• Topics covered:
– Design principles
– Optimality
– Confounding in split-plot designs
– Repeated measures designs, fractional factorial designs, response surface designs
– Balanced/partially balanced incomplete block designs
• Prerequisites:
– Stat 501 (or undergraduate Stat 462)
– Stat 502
A Stat 503 Example: The BARGE Study
• Current standard treatment for patients with mild to moderate asthma is scheduled daily use of inhaled albuterol.
• Now hypothesized that such regular use has a negative effect on lung function in patients with B16Arg/Arg genotype, but not in those with B16Gly/Gly genotype.
• The BARGE Study compares the regular use of inhaled albuterol (A) to placebo (P) in patients with the B16Arg/Arg genotype (R) and in patients with the B16Gly/Gly genotype (G).
• The primary hypothesis concerns inference about whether $(\mu_{RA} - \mu_{RP}) - (\mu_{GA} - \mu_{GP})$ is 0.
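Read as a two-by-two layout (genotype by treatment), the quantity is a difference of differences. A sketch of the formal statement, using the slide's own subscripts; the two-sided alternative is an assumption, since the slide does not state it:

```latex
H_0:\ (\mu_{RA} - \mu_{RP}) - (\mu_{GA} - \mu_{GP}) = 0
\quad\text{versus}\quad
H_1:\ (\mu_{RA} - \mu_{RP}) - (\mu_{GA} - \mu_{GP}) \neq 0
```

Under $H_0$ the albuterol-minus-placebo difference is the same in both genotypes, that is, there is no genotype-by-treatment interaction.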
A Stat 503 Example: BARGE Study’s Paired Crossover
Genotype  Order   Period 1  Washout  Period 2
R         1 (AP)  Y1jRA     ---      Y1jRP
R         2 (PA)  Y2jRP     ---      Y2jRA
G         1 (AP)  Y1jGA     ---      Y1jGP
G         2 (PA)  Y2jGP     ---      Y2jGA
Stat 504: Analysis of Discrete Data
• Analysis of data when:
– the response y is binary or discrete
– the predictors are qualitative or quantitative
• Summarized data are frequency counts
• How do the predictors affect the response?
• Topics covered:
– Models for frequency arrays
– Goodness-of-fit tests
– Two-, three- and higher-way tables
– Latent models
– Logistic and Poisson regression models
• Prerequisites:
– Stat 502 (or undergraduate Stat 460 or major Stat 512)
– Matrix algebra
A Stat 504 Example: Survival in the Donner Party
• In 1846, the Donner and Reed families traveled from Illinois to California by covered wagon.
• The group became stranded in the eastern Sierra Nevada mountains when hit by heavy snow.
• 40 of the 87 members died from famine and exposure. The analysis here uses the 45 adults (over age 15).
• Are females better able to withstand harsh conditions than are males?
[Figure: Probability of survival versus Age, plotted separately for Female and Male]
Link Function: Logit
Response Information
Variable  Value     Count
STATUS    SURVIVED  20     (Event)
          DIED      25
          Total     45

Logistic Regression Table
                                              Odds   95% CI
Predictor  Coef      SE Coef  Z      P      Ratio  Lower  Upper
Constant   1.633     1.110    1.47   0.141
AGE        -0.07820  0.03729  -2.10  0.036  0.92   0.86   0.99
Gender     1.5973    0.7555   2.11   0.034  4.94   1.12   21.72
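The odds ratio and confidence interval columns follow from the coefficient and standard error columns: the odds ratio is exp(Coef), and the interval is exp(Coef ± 1.96 × SE Coef). The sketch below reproduces that arithmetic from the reported values. The function names are illustrative, and the 0/1 coding of Gender is not shown on the slide, so `survival_probability` simply takes the indicator as given.

```python
import math

# Coefficients and standard errors copied from the logistic regression table.
COEF = {"Constant": 1.633, "AGE": -0.07820, "Gender": 1.5973}
SE = {"Constant": 1.110, "AGE": 0.03729, "Gender": 0.7555}
Z_975 = 1.96  # approximate 97.5th percentile of the standard normal


def odds_ratio_with_ci(name):
    """Return (odds ratio, lower, upper) for one predictor."""
    b, se = COEF[name], SE[name]
    return math.exp(b), math.exp(b - Z_975 * se), math.exp(b + Z_975 * se)


def survival_probability(age, gender):
    """Fitted P(survived) under the logit link; gender is the 0/1 indicator."""
    eta = COEF["Constant"] + COEF["AGE"] * age + COEF["Gender"] * gender
    return 1.0 / (1.0 + math.exp(-eta))


for name in ("AGE", "Gender"):
    ratio, lower, upper = odds_ratio_with_ci(name)
    print(f"{name}: odds ratio {ratio:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

Rounded to two decimals, the printed values agree with the table: each additional year of age multiplies the odds of survival by about 0.92, and the group coded Gender = 1 has roughly 4.94 times the odds of survival at a given age.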
Stat 505: Applied Multivariate Statistical Analysis
• Analysis of data when you have several correlated, continuous responses is called multivariate data analysis.
• A repeated measure is a special kind of multivariate response obtained by measuring the same variable on each subject several times, possibly under different conditions.
• Topics covered:
– Multivariate data: matrix review, graphical displays, probability theory, multivariate normal distribution, partial correlations
– Inferences about multivariate means: Hotelling’s T² tests, multivariate analysis of variance, repeated measures experiments and growth curves, discriminant analysis
– Data reduction: Principal components, factor analysis, canonical correlation analysis, cluster analysis
– Structural equation modeling
• Prerequisites:
– 6 credits in statistics
– Matrix algebra
A Stat 505 Example: Pottery Data
• Pottery samples were collected from four sites in the British Isles: Llanedyrn, Caldicot, Isle Thornes, and Ashley Rails.
• Each piece analyzed for its aluminum, iron, magnesium, calcium, and sodium content.
• Do the pottery samples from the four sites differ with respect to their composition?
Stat 506: Sampling Theory and Methods
• Topics covered:
– Basic methods: simple random sampling, selecting sample sizes, unequal probability sampling, ratio and regression estimation, stratified sampling, cluster and systematic sampling, multistage designs, double sampling
– Special topics: sampling hidden human populations, environmental sampling, sampling to study cause-and-effect relationships, resampling of data, measurement errors and nonresponse in surveys, adaptive sampling, network and snowball sampling
• Prerequisites:
– Calculus
– 3 credits in statistics
A Stat 506 Example: A Water Pollution Survey
• Study region of interest has 320 lakes.
• Take a random sample of the lakes by:
– Drawing a rectangle of length l and width w around the study region.
– Generating pairs of uniform (0,1) random numbers. Multiply the first number by l and the second by w to get random location coordinates within the rectangle.
– If location is a lake, then lake is selected.
– Continue until required number of lakes selected.
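The procedure above can be sketched in a few lines of Python. Everything about the geometry here is hypothetical: `TOY_LAKES` is a made-up region with three circular lakes standing in for the 320 real ones, and `toy_lake_at` is an illustrative lookup, not part of the survey. The sketch also assumes that a lake hit more than once is recorded each time, which the slide does not specify.

```python
import random


def sample_lakes(length, width, lake_at, n_required, rng):
    """Draw random points in the bounding rectangle until enough lakes are hit.

    lake_at(x, y) returns a lake identifier, or None if the point is on land.
    """
    selected = []
    while len(selected) < n_required:
        # Scale two uniform (0,1) draws to coordinates inside the rectangle.
        x = rng.random() * length
        y = rng.random() * width
        lake = lake_at(x, y)
        if lake is not None:
            selected.append(lake)
    return selected


# Hypothetical toy region: (name, centre x, centre y, radius) for each lake.
TOY_LAKES = [("A", 2.0, 2.0, 1.0), ("B", 6.0, 3.0, 2.0), ("C", 8.0, 8.0, 0.5)]


def toy_lake_at(x, y):
    """Return the name of the toy lake containing (x, y), or None."""
    for name, cx, cy, radius in TOY_LAKES:
        if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
            return name
    return None


sample = sample_lakes(10.0, 10.0, toy_lake_at, 5, random.Random(1))
print(sample)
```

Because a point is more likely to land in a large lake than a small one, each draw selects a lake with probability proportional to its area, so this is an unequal probability design rather than a simple random sample of lakes.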
Stat 509: Biostatistics
• Topics covered:
– An introduction to the design and statistical analysis of randomized and observational studies in biomedical research
• Prerequisites:
– Stat 500
Stat 510: Applied Time Series Analysis
• Topics covered:
– Identification of models for empirical data collected over time
– Use of models in forecasting
• Prerequisites:
– Stat 501 (or undergraduate Stat 462 or major Stat 511)
A Stat 510 Example: Measuring Global Warming
• Temperature (in degrees Celsius) averaged for the northern hemisphere over a full year.
• Temperature series collected from 1880 to 1987.
• All measurements expressed as differences from their 108-year mean.
• Research questions:
– Is the mean temperature increasing over the 108 years?
– What is the rate of increase in global temperature over the past century?
[Figure: Scatterplot of TEMP vs YEAR]
[Figure: Residuals Versus the Order of the Data (response is TEMP); Residual plotted against Observation Order]