Synthesis

Date post: 25-Feb-2016
STAT 101 Dr. Kari Lock Morgan 12/6/12. Synthesis. Big Picture Essential Synthesis Bayesian Inference (continued) Review Synthesis Activities. Data Collection. The way the data are/were collected determines the scope of inference - PowerPoint PPT Presentation
Page 1: Synthesis

Statistics: Unlocking the Power of Data Lock5

STAT 101

Dr. Kari Lock Morgan

12/6/12

Synthesis

• Big Picture Essential Synthesis

• Bayesian Inference (continued)

• Review

• Synthesis Activities

Page 2: Synthesis

Data Collection

• The way the data are/were collected determines the scope of inference

• For generalizing to the population: was it a random sample? Was there sampling bias?

• For assessing causality: was it a randomized experiment?

• Collecting good data is crucial to making good inferences based on the data

Page 3: Synthesis

Exploratory Data Analysis

• Before doing inference, always explore your data with descriptive statistics

• Always visualize your data! Visualize your variables and relationships between variables

• Calculate summary statistics for variables and relationships between variables – these will be key for later inference

• The type of visualization and summary statistics depends on whether the variable(s) are categorical or quantitative

Page 4: Synthesis

Estimation

• For good estimation, provide not just a point estimate, but an interval estimate which takes into account the uncertainty of the statistic

• Confidence intervals are designed to capture the true parameter for a specified proportion of all samples

• A P% confidence interval can be created by

  • bootstrapping (sampling with replacement from the sample) and using the middle P% of bootstrap statistics

  • $\text{statistic} \pm z^* \cdot SE$
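Both constructions can be sketched in a few lines of Python. This is only an illustration: the sample is hypothetical (made-up ratings, not course data), the statistic is the sample mean, the function names are invented here, and the standard error is assumed to be estimated by the standard deviation of the bootstrap statistics.

```python
import random
from statistics import NormalDist, mean, stdev


def bootstrap_means(sample, n_boot=5000, seed=0):
    """Resample with replacement from the sample and record each mean."""
    rng = random.Random(seed)
    n = len(sample)
    return [mean(rng.choices(sample, k=n)) for _ in range(n_boot)]


def percentile_interval(boot_stats, level=0.95):
    """Middle `level` share of the bootstrap statistics."""
    ordered = sorted(boot_stats)
    tail = (1 - level) / 2
    lo = ordered[int(tail * len(ordered))]
    hi = ordered[int((1 - tail) * len(ordered)) - 1]
    return lo, hi


def standard_error_interval(statistic, boot_stats, level=0.95):
    """statistic +/- z* x SE, with SE taken from the bootstrap distribution."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = stdev(boot_stats)
    return statistic - z_star * se, statistic + z_star * se


# Hypothetical sample of 1-10 ratings (an assumption for illustration only).
ratings = [6, 7, 5, 8, 9, 4, 7, 6, 8, 5, 7, 6]
boot = bootstrap_means(ratings)
print(percentile_interval(boot))
print(standard_error_interval(mean(ratings), boot))
```

When the bootstrap distribution is roughly symmetric and bell-shaped, the two intervals should be close to each other.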

Page 5: Synthesis

Hypothesis Testing

• A p-value is the probability of getting a statistic at least as extreme as the one observed, if H0 is true

• The p-value measures the strength of the evidence the data provide against H0

• “If the p-value is low, the H0 must go”

• If the p-value is not low, then you cannot reject H0 and the test is inconclusive

Page 6: Synthesis

p-value

• A p-value can be calculated by

  • A randomization test: simulate statistics assuming H0 is true, and see what proportion of the simulated statistics are at least as extreme as the one observed

  • Calculating a test statistic and comparing it to a theoretical reference distribution (normal, t, χ², F)
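A randomization test can be sketched as follows. This is a minimal example under stated assumptions: the two groups are hypothetical ratings, the statistic is the difference in means, H0 is "no difference between groups" (simulated by shuffling the group labels), and the test is two-tailed. The function name is illustrative.

```python
import random
from statistics import mean


def randomization_p_value(group_a, group_b, n_sim=10000, seed=0):
    """Share of label-shuffled differences at least as extreme as observed."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_sim):
        # Under H0 the group labels are arbitrary, so reshuffle them.
        rng.shuffle(pooled)
        simulated = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Two-tailed comparison, with a tiny tolerance for ties.
        if abs(simulated) >= abs(observed) - 1e-12:
            extreme += 1
    return extreme / n_sim


# Hypothetical 1-10 ratings for two groups (illustration only).
group_a = [7, 8, 6, 9, 7, 8]
group_b = [5, 6, 4, 6, 5, 7]
p_value = randomization_p_value(group_a, group_b)
print(p_value)
```

The second route (test statistic plus reference distribution) replaces the simulation loop with a formula for the standard error and a lookup in the normal, t, χ², or F distribution.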

Page 7: Synthesis

Hypothesis Tests

Variables | Appropriate Test
One Quantitative | Single mean (t)
One Categorical | Single proportion (normal); Chi-square Goodness of Fit
Two Categorical | Difference in proportions (normal); Chi-square Test for Association
One Quantitative, One Categorical | Difference in means (t); Matched pairs (t); ANOVA (F)
Two Quantitative | Correlation (t); Slope in Simple Linear Regression (t)
More than two | Multiple Regression (t, F)

Page 8: Synthesis

Regression

• Regression is a way to predict one response variable with multiple explanatory variables

• Regression fits the coefficients of the model

• The model can be used to

  • Analyze relationships between the explanatory variables and the response

  • Predict Y based on the explanatory variables

  • Adjust for confounding variables

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$
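As a rough illustration of what "fitting the coefficients" means, the sketch below estimates β0, …, βk by least squares, solving the normal equations in plain Python. The data are hypothetical and built without noise (like = 1 + 0.5·attractive + 0.3·fun) so the fitted coefficients can be checked by eye; the function name is illustrative, and statistical software would also report standard errors and p-values.

```python
def fit_linear_model(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    # Design matrix with a leading column of 1s for the intercept.
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Augmented system (X^T X | X^T y).
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(p)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical (attractive, fun) ratings and a noise-free response.
rows = [(6, 7), (8, 5), (4, 6), (7, 8), (5, 3), (9, 9)]
y = [1 + 0.5 * a + 0.3 * f for a, f in rows]

coefficients = fit_linear_model(rows, y)
print([round(c, 3) for c in coefficients])  # [1.0, 0.5, 0.3]
```

With real data the response includes the error term ε, so the fitted coefficients are estimates rather than exact values.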

Page 9: Synthesis

Probability

$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$

$P(A) = P(A \text{ and } B) + P(A \text{ and not } B)$

$P(A \text{ if } B) = \dfrac{P(A \text{ and } B)}{P(B)}$

$P(\text{not } A) = 1 - P(A)$

$P(A \text{ if } B) = \dfrac{P(A \text{ and } B)}{P(B)} = \dfrac{P(B \text{ if } A)\,P(A)}{P(B)}$
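The rules above chain together into Bayes' rule. The sketch below walks through that chain step by step; the input probabilities are hypothetical numbers chosen only for illustration, and the function name is not from the course.

```python
def bayes_rule(p_a, p_b_if_a, p_b_if_not_a):
    """Return P(A if B) from P(A), P(B if A), and P(B if not A)."""
    p_not_a = 1 - p_a                        # P(not A) = 1 - P(A)
    p_a_and_b = p_b_if_a * p_a               # P(A and B) = P(B if A) P(A)
    p_not_a_and_b = p_b_if_not_a * p_not_a   # P(not A and B)
    p_b = p_a_and_b + p_not_a_and_b          # P(B) = P(B and A) + P(B and not A)
    return p_a_and_b / p_b                   # P(A if B) = P(A and B) / P(B)


# Hypothetical inputs: P(A) = 0.01, P(B if A) = 0.90, P(B if not A) = 0.05.
posterior = bayes_rule(p_a=0.01, p_b_if_a=0.90, p_b_if_not_a=0.05)
print(round(posterior, 3))  # 0.154
```

Even with P(B if A) = 0.90, the result is only about 0.15 because A itself is rare in this made-up example.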

Page 10: Synthesis

Romance

• What variables help to predict romantic interest?

• Do these variables differ for males and females?

• All we need to figure this out is DATA!

(For all of you, being almost done with STAT 101, this is the case for many interesting questions!)

Page 11: Synthesis

Speed Dating

• We will use data from speed dating conducted at Columbia University, 2002-2004

• 276 males and 276 females from Columbia’s various graduate and professional schools

• Each person met with 10-20 people of the opposite sex for 4 minutes each

• After each encounter each person said either “yes” (they would like to be put in touch with that partner) or “no”

Page 12: Synthesis

Speed Dating Data

What are the cases?

a) Students participating in speed dating
b) Speed dates
c) Ratings of each student

Page 13: Synthesis

Speed Dating

What is the population?

Ideal population?

More realistic population?

Page 14: Synthesis

Speed Dating

It is randomly determined who the students will be paired with for the speed dates.

We find that people are significantly more likely to say “yes” to people they think are more intelligent.

Can we infer causality between perceived intelligence and wanting a second date?

a) Yes
b) No

Page 15: Synthesis

Successful Speed Date?

What is the probability that a speed date is successful (results in both people wanting a second date)?

To best answer this question, we should use

a) Descriptive statistics
b) Confidence Interval
c) Hypothesis Test
d) Regression
e) Bayes Rule

Page 16: Synthesis

Successful Speed Date?

63 of the 276 speed dates were deemed successful (both male and female said yes).

A 95% confidence interval for the true proportion of successful speed dates is

a) (0.2, 0.3)
b) (0.18, 0.28)
c) (0.21, 0.25)
d) (0.13, 0.33)

$\hat{p} = 63/276 = 0.23$

$0.23 \pm 0.05 = (0.18, 0.28)$
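The margin of error of 0.05 can be checked with a short calculation. The sketch assumes the usual standard error for a proportion, sqrt(p̂(1 − p̂)/n), and a multiplier of 2 for 95% confidence (the slides use 2 elsewhere; 1.96 would give nearly the same interval). The function name is illustrative.

```python
from math import sqrt


def proportion_interval(successes, n, multiplier=2.0):
    """Point estimate, margin of error, and interval for a proportion."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)  # standard error of p-hat
    margin = multiplier * se
    return p_hat, margin, (p_hat - margin, p_hat + margin)


# 63 successful speed dates out of 276.
p_hat, margin, (lo, hi) = proportion_interval(63, 276)
print(f"{p_hat:.2f} ± {margin:.2f} -> ({lo:.2f}, {hi:.2f})")
# 0.23 ± 0.05 -> (0.18, 0.28)
```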

Page 17: Synthesis

Pickiness and Gender

Are males or females more picky when it comes to saying yes?

Guesses?

a) Males
b) Females

Page 18: Synthesis

Pickiness and Gender

Are males or females more picky when it comes to saying yes? How could you answer this?

a) Test for a single proportion
b) Test for a difference in proportions
c) Chi-square test for association
d) ANOVA
e) Either (b) or (c)

        | Yes | No
Males   | 146 | 130
Females | 127 | 149

Page 19: Synthesis

Pickiness and Gender

Do males and females differ in their pickiness? Using α = 0.05, how would you answer this?

a) Yes b) No c) Not enough information
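One way to work this out is a two-sided test for a difference in proportions using the counts from the table (146 of 276 males and 127 of 276 females said yes). The sketch below assumes the pooled-proportion standard error and a normal reference distribution; the function name is illustrative, and the slides do not say which exact method was used in class.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(yes1, n1, yes2, n2):
    """z statistic and two-sided p-value for H0: p1 = p2."""
    p1, p2 = yes1 / n1, yes2 / n2
    pooled = (yes1 + yes2) / (n1 + n2)  # common proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = two_proportion_z_test(146, 276, 127, 276)
print(f"z = {z:.2f}, p-value = {p_value:.2f}")
```

Under these assumptions z is about 1.62 and the p-value is about 0.11, which is above α = 0.05, so the data do not show a significant difference in pickiness.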

Page 20: Synthesis

Reciprocity

Are people more likely to say yes to someone who says yes back? How would you best answer this?

a) Descriptive statistics
b) Confidence Interval
c) Hypothesis Test
d) Regression
e) Bayes Rule

                | Male says Yes | Male says No
Female says Yes | 63            | 64
Female says No  | 83            | 66

Page 21: Synthesis

Reciprocity

Are people more likely to say yes to someone who says yes back? How could you answer this?

a) Test for a single proportion
b) Test for a difference in proportions
c) Chi-square test for association
d) ANOVA
e) Either (b) or (c)

                | Male says Yes | Male says No
Female says Yes | 63            | 64
Female says No  | 83            | 66

p-value = 0.3731
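A chi-square test for association on this 2×2 table can be sketched as below. The slides do not state how the p-value was computed; the sketch assumes a chi-square test with a continuity (Yates) correction, which gives a value close to the 0.3731 shown. The function name is illustrative, and the p-value formula used here applies only to 1 degree of freedom.

```python
from math import erfc, sqrt


def chi_square_2x2(table, continuity_correction=True):
    """Chi-square statistic and p-value (df = 1) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if continuity_correction:
                diff = max(diff - 0.5, 0.0)  # Yates correction (assumed)
            statistic += diff ** 2 / expected
    # With df = 1, chi-square is a squared standard normal,
    # so the upper tail probability equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Rows: female says yes / no; columns: male says yes / no.
reciprocity = [[63, 64], [83, 66]]
statistic, p_value = chi_square_2x2(reciprocity)
print(f"chi-square = {statistic:.3f}, p-value = {p_value:.4f}")
```

Without the correction (`continuity_correction=False`) the p-value is somewhat smaller, but the conclusion is the same: no significant association.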

Page 22: Synthesis

Reciprocity

Are people more likely to say yes to someone who says yes back?

p-value = 0.3731

Based on these data, we cannot determine whether people are more likely to say yes to someone who says yes back.

Page 23: Synthesis

Race and Response: Females

Does the chance of females saying yes to males differ by race?

How could you answer this question?

a) Test for a single proportion
b) Test for a difference in proportions
c) Chi-square goodness of fit
d) Chi-square test for association
e) ANOVA

Asian | Black | Caucasian | Latino | Other
0.50  | 0.57  | 0.42      | 0.48   | 0.53

p-value = 0.69

Page 24: Synthesis

Race and Response: Males

Each person rated their date on a scale of 1-10 based on how much they liked them overall.

Does how much males like females differ by race?

How would you test this?

a) Chi-square test
b) t-test for a difference in means
c) Matched pairs test
d) ANOVA
e) Either (b) or (d)

p-value = 0.892

Page 25: Synthesis

Physical Attractiveness

Each person also rated their date from 1-10 on the physical attractiveness. Do males rate females higher, or do females rate males higher?

Which tool would you use to answer this question?

a) Two-sample difference in means
b) Matched pair difference in means
c) Chi-Square
d) ANOVA
e) Correlation

Page 26: Synthesis

Physical Attractiveness

95% CI: (0.10, 0.71)

p-value

Page 27: Synthesis

Other Ratings

Each person also rated their date from 1-10 on the following attributes:

• Attractiveness
• Sincerity
• Intelligence
• How fun the person seems
• Ambition
• Shared interests

Which of these best predict how much someone will like their date?

Page 28: Synthesis

Multiple Regression

MALES RATING FEMALES:

FEMALES RATING MALES:

Page 29: Synthesis

Ambition and Liking

How does the perceived ambition of a date relate to how much the date is liked?

How would you answer this question?

a) Inference for difference in means
b) ANOVA
c) Inference for correlation
d) Inference for simple linear regression
e) Either (b), (c) or (d)

Page 30: Synthesis

Ambition and Liking

r = 0.44, SE = 0.05

Find a 95% CI for ρ.

$\hat{\beta}_1$ = 0.28, SE = 0.06

Test whether β1 differs from 0.

0.44 ± 2(0.05) = (0.34, 0.54)

t = 0.28/0.06 = 4.67 ⇒ significant

Page 31: Synthesis

Data!

If you have a question that needs answering…

ALL YOU NEED IS DATA!!!!

Page 32: Synthesis

Final

Tuesday, December 11th, 2 – 5pm

No make-ups, no excuses

25% of your course grade

Cumulative from the entire course

Closed book, except for a calculator and 3 double-sided pages of notes prepared only by you

StatKey will be available if needed for theoretical distributions, but a calculator will be sufficient

Page 33: Synthesis

Office Hours Before Final

Sunday, 4 – 7pm, Tracy, Old Chem 211A

Monday, 12 – 3pm, Prof Morgan, Old Chem 216

Monday, 4 – 6pm, Heather, Old Chem 211A

Monday, 6 – 9pm, Sam, Old Chem 211A

Tuesday, 12 – 1pm, Prof Morgan, Old Chem 216

Page 34: Synthesis

To Do

Project 2 individual grades on Sakai (due Monday, 12/10)

Do Homework 9 (all practice problems)

Study for final!

Do Big Picture Essential Synthesis problems (solutions)

Do Practice Final (solutions)

If you want more problems to do…

• any odd essential synthesis or review problems (solutions under documents on course website)

• any problem in the book (solutions in my office – can check during office hours on Monday)

Page 35: Synthesis

Thank You!!!

