Date post: 21-Dec-2015
Regression II

[Figure: panels labeled OK, Non-normal, Non-linear, and Unequal variance]

Non-linear regression

• There are nearly unlimited options here

• Keep it simple! Only use a particular non-linear fit if the data strongly suggest it

• I’ll discuss three types:
– Quadratic regression
– Smoothing
– Logistic regression

Non-linear regression

[Figure: two fitted curves. One is complex and goes through all the data points; the other is simpler and still provides a good fit to the data.]

Non-linear regression

• Three types of non-linear regression:
– Quadratic regression
– Smoothing
– Logistic regression

Quadratic regression

• $Y = a + bX + cX^2$

• Fits a parabolic curve to predict Y from X

• Often fitted using least squares, which minimizes $MS_{residual}$

Quadratic regression

[Figure: fitted quadratic curves for c > 0 and c < 0]

Quadratic regression

• $Y = a + bX + cX^2$

• Three parameters to estimate from the data: a, b, and c

• More complex model

• Requires more data to get a good fit
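As a rough illustration of estimating a, b, and c by least squares, here is a minimal sketch in plain Python. The function name `fit_quadratic` and the example data are hypothetical and not part of the slides. The sketch solves the normal equations directly, which is only one of several ways to do this.

```python
def fit_quadratic(xs, ys):
    """Least-squares estimates (a, b, c) for Y = a + bX + cX^2."""
    # Sums of X^0..X^4 and of Y*X^0..Y*X^2 define the normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x4 matrix: row i is [s[i], s[i+1], s[i+2] | t[i]].
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


# Hypothetical data lying exactly on Y = 1 + 2X + 3X^2.
xs = [-2, -1, 0, 1, 2]
ys = [9, 2, 1, 6, 17]
a, b, c = fit_quadratic(xs, ys)
print(a, b, c)
```

The fit needs at least three distinct X values, and with real (noisy) data considerably more, which is the sense in which the more complex model requires more data.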

Smoothing

• Runs a line (without any formula) through the data

• Can curve, or be straight – depends on data

• Several types: kernel, spline, lowess

• Each has a smoothing parameter to determine how much the line bends
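To make the role of the smoothing parameter concrete, here is a minimal sketch of one kind of kernel smoother (a Gaussian-weighted local average). The function name `kernel_smooth`, the `bandwidth` argument, and the data are assumptions made for illustration; the slides do not specify a particular kernel or implementation.

```python
import math


def kernel_smooth(xs, ys, x0, bandwidth):
    """Gaussian-kernel weighted average of ys, evaluated at x0.

    bandwidth is the smoothing parameter: small values let the curve
    bend to follow nearby points, large values flatten it.
    """
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


# Hypothetical data.
xs = [0, 1, 2, 3, 4]
ys = [0, 1, 4, 9, 16]

# Evaluate the smooth at x0 = 2 with a wiggly and a stiff setting.
print(kernel_smooth(xs, ys, 2, bandwidth=0.1))   # close to the local value, 4
print(kernel_smooth(xs, ys, 2, bandwidth=1000))  # close to the overall mean, 6
```

With a very small bandwidth and an x0 far from every data point, all weights can underflow to zero, so a real implementation would need to guard against that.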

Logistic Regression

• Used when Y is discrete – either 0 or 1

• Example: survival

• Predicts the odds of success for Y against X

[Figure: logistic regression plots, with the LD50 marked]
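The following sketch shows one way a logistic curve could be fitted and an LD50 read off it. It assumes the model P(Y = 1) = 1 / (1 + exp(-(a + bX))), that X is a dose and Y = 1 means death, and that the LD50 is the dose at which the predicted probability is 0.5, which gives LD50 = -a/b. The function name `fit_logistic`, the Newton-Raphson fitting method, and the data are all illustrative assumptions rather than anything stated in the slides.

```python
import math


def fit_logistic(xs, ys, iterations=25):
    """Newton-Raphson maximum-likelihood fit of a two-parameter logistic model."""
    a = b = 0.0
    for _ in range(iterations):
        ps = [1.0 / (1.0 + math.exp(-(a + b * x))) for x in xs]
        # Gradient of the log-likelihood with respect to a and b.
        ga = sum(y - p for y, p in zip(ys, ps))
        gb = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Entries of the 2x2 information matrix.
        ws = [p * (1.0 - p) for p in ps]
        haa = sum(ws)
        hab = sum(w * x for w, x in zip(ws, xs))
        hbb = sum(w * x * x for w, x in zip(ws, xs))
        det = haa * hbb - hab * hab
        # Newton step: add (information matrix)^-1 times the gradient.
        a, b = (a + (hbb * ga - hab * gb) / det,
                b + (haa * gb - hab * ga) / det)
    return a, b


# Hypothetical dose (X) and outcome (Y: 1 = died, 0 = survived) data.
doses = [1, 2, 3, 4, 5, 6, 7, 8]
died = [0, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_logistic(doses, died)
ld50 = -a / b  # dose where a + b*X = 0, i.e. predicted probability 0.5
print(a, b, ld50)
```

This simple loop assumes the outcomes overlap across doses; if the 0s and 1s are perfectly separated by X, the estimates do not converge.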

Quick Reference Summary: Confidence Interval for Regression Slope

• What is it for? Estimating the slope of the linear equation $Y = \alpha + \beta X$ between an explanatory variable X and a response variable Y

• What does it assume? Relationship between X and Y is linear; each Y at a given X is a random sample from a normal distribution with equal variance

• Parameter: $\beta$

• Estimate: $b$

• Degrees of freedom: n-2

• Formulae:

$$b - t_{\alpha(2),df} \, SE_b < \beta < b + t_{\alpha(2),df} \, SE_b$$

$$SE_b = \sqrt{\frac{MS_{residual}}{\sum (X_i - \bar{X})^2}}$$

$$MS_{residual} = \frac{\sum (Y_i - \bar{Y})^2 - b \sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 2}$$
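The formulae above can be sketched in a few lines of Python. The function name `slope_confidence_interval` and the data are hypothetical. The critical value `t_crit` is passed in by the caller, on the assumption that it is looked up in a t table for n-2 degrees of freedom.

```python
import math


def slope_confidence_interval(xs, ys, t_crit):
    """Confidence interval for the regression slope, given a critical t value."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sp_xy / ss_x                           # slope estimate
    ms_residual = (ss_y - b * sp_xy) / (n - 2)
    se_b = math.sqrt(ms_residual / ss_x)       # standard error of the slope
    return b - t_crit * se_b, b + t_crit * se_b


# Hypothetical data with n = 5, so df = 3; the two-tailed 5% critical
# value of t with 3 df is 3.182.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
lower, upper = slope_confidence_interval(xs, ys, t_crit=3.182)
print(round(lower, 3), round(upper, 3))  # -0.3 1.5
```

Here b = 0.6 and SE_b is about 0.283, so the 95% interval includes zero.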

Quick Reference Summary: t-test for Regression Slope

• What is it for? To test the null hypothesis that the population parameter $\beta$ equals a null hypothesized value, usually 0

• What does it assume? Same as regression slope C.I.

• Test statistic: t

• Null distribution: t with n-2 d.f.

• Formula:

$$t = \frac{b}{SE_b}$$

T-test for Regression Slope

[Figure: flowchart. The sample gives the test statistic, $t = b / SE_b$; the null hypothesis ($\beta = 0$) gives the null distribution, t with n-2 df. Compare the two: how unusual is this test statistic? If P < 0.05, reject Ho; if P > 0.05, fail to reject Ho.]

Class Activity

• Are taller people smarter, or dumber, than short people in this class?

• Trivia quiz, followed by group calculation

Trivia quiz

• Get out a blank piece of paper

• Number from 1-10

• Answer each multiple choice question

Question 1

• Which of the following has the longest recorded life span?

A. Termite

B. Indian elephant

C. Freshwater oyster

D. Chimpanzee

Question 2

• What was the first genetically engineered organism?

A. Corn

B. Mouse

C. Sheep

D. Tobacco

Question 3

• What animal has the highest blood pressure?

A. Giraffe

B. Blue whale

C. Elephant

D. Flea

Question 4

• What happens to the critical value of a Chi-squared distribution (with constant $\alpha$) as you increase the degrees of freedom?

A. Increases

B. Decreases

C. Stays the same

D. None of the above

Question 5

• In the TV show The Simpsons, what is the name of Springfield Elementary's lunch lady?

A. Lurleen

B. Mary

C. Ashley

D. Doris

Question 6

• Which of the following means: “the quality by which a person claims to know something intuitively, instinctively, or from the gut without regard to evidence, logic, intellectual examination, or actual facts”

A. Factuality

B. Statistics

C. Truthiness

D. Hypothesis

Question 7

• Who invented the ANOVA?

A. Dr. Harmon

B. Karl Pearson

C. R. A. Fisher

D. Kareem Abdul-Jabbar

Question 8

• An experiment that investigates all treatment combinations of two or more variables is called a(n):

A. Randomized block design

B. Kruskal-Wallis design

C. Factorial design

D. Interaction

Question 9

• After class one day, Shelly comes home and decides to make chocolate chip cookies. The bag she uses contains 200 chocolate chips, and she ends up making 20 cookies, which gives an average of 10 chips per cookie. She wants that first one she (randomly) chooses to be the perfect cookie--what is the likelihood that that first cookie will have at least 13 chocolate chips?

A. About 5%

B. About 30%

C. About 10%

D. About 20%

Question 10

• Which of the following is NOT an assumption of linear regression?

• A. Relationship between X and Y is linear

• B. Each Y at a given X is a random sample

• C. Equal variance of Y at each X

• D. X is drawn from a normal distribution

Now, use your data

• Test the following null hypothesis:

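• Ho: The slope of the relationship between height (X) and score on the trivia quiz (Y) is zero ($\beta = 0$)

$$t = \frac{b}{SE_b}$$

$$SE_b = \sqrt{\frac{MS_{residual}}{\sum (X_i - \bar{X})^2}}$$

$$MS_{residual} = \frac{\sum (Y_i - \bar{Y})^2 - b \sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 2}$$

$$b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$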
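For the group calculation, the steps might look like the sketch below. The heights and quiz scores are made up purely for illustration and are not class data, and the function name `slope_t_test` is hypothetical. The resulting t is compared by hand against a t table with n-2 degrees of freedom.

```python
import math


def slope_t_test(xs, ys):
    """Return (b, SE_b, t, df) for the test of Ho: slope = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sp_xy / ss_x
    ms_residual = (ss_y - b * sp_xy) / (n - 2)
    se_b = math.sqrt(ms_residual / ss_x)
    return b, se_b, b / se_b, n - 2


# Hypothetical heights in cm (X) and trivia scores out of 10 (Y).
heights = [160, 165, 170, 175, 180]
scores = [6, 5, 7, 6, 8]

b, se_b, t, df = slope_t_test(heights, scores)
print(round(b, 3), round(se_b, 3), round(t, 3), df)  # 0.1 0.06 1.667 3
```

With these made-up numbers, t is about 1.67 on 3 df, which is below the two-tailed 5% critical value of 3.182, so the null hypothesis of zero slope would not be rejected.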

