QUANTITATIVE DATA ANALYSIS - WordPress.com · 07/12/16 1 QUANTITATIVE DATA ANALYSIS Sociological...

Post on 02-Apr-2018


07/12/16


QUANTITATIVE DATA ANALYSIS Sociological Research Methods

Why analyze data? • Describe population of interest

• Explain the relationship between variables

• Figure out the answer to your research question

Survey about religious beliefs • Q1: What is your gender?

•  Male •  Female

• Q2: What is your age? •  ____ (0 – 100+)

• Q3: Indicate level of agreement with the following statement.

Statement: I believe that heaven and hell exist

Response options: Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree


Survey about religious beliefs
§ How often do you attend religious services?
  § Never
  § About once or twice a year
  § Several times a year
  § About once a month
  § 2-3 times a month
  § Nearly every week
  § Every week
  § Several times a week
  § Don’t know, No answer

Getting Started
§ Everything must be quantified
  §  Transform all responses into a numeric value
§ Respondent’s answers must be quantified
  §  Assign a numerical value to the respondent’s answer to each survey question
  §  Questions measured at the nominal and ordinal levels need a numeric code assigned
  §  Questions measured at the ratio and interval levels are already numeric

Survey about religious beliefs • Q1: What is your gender?

•  Male (1) •  Female (0)

• Q2: What is your age? •  ____ (0 – 100+)

(already a numeric value)

• Q3: Indicate level of agreement with the following statement.

Statement: I believe that heaven and hell exist

Response options: Strongly Agree (1), Agree (2), Neutral (3), Disagree (4), Strongly Disagree (5)


Quantifying the responses
• How often do you attend religious services?
  •  Never (0)
  •  About once or twice a year (1)
  •  Several times a year (2)
  •  About once a month (3)
  •  2-3 times a month (4)
  •  Nearly every week (5)
  •  Every week (6)
  •  Several times a week (7)
  •  Don’t know, No Answer (8)

§ Could have chosen any numeric value to represent the respondent’s answer § Remain consistent

§  Choose the same numeric scheme for all Likert questions
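The coding step above can be sketched in Python. This is a minimal illustration that assumes the numeric codes shown on the slides; the names `LIKERT_CODES`, `GENDER_CODES`, and `encode` are hypothetical helpers, not part of any survey package.

```python
# Numeric codes copied from the slides; dictionary and function names are illustrative.
LIKERT_CODES = {
    "Strongly Agree": 1,
    "Agree": 2,
    "Neutral": 3,
    "Disagree": 4,
    "Strongly Disagree": 5,
}
GENDER_CODES = {"Male": 1, "Female": 0}


def encode(answer, codes):
    """Return the numeric code for a text answer; raises KeyError for an unknown answer."""
    return codes[answer]


# Reusing one dictionary for every Likert question keeps the scheme consistent.
coded = [encode(a, LIKERT_CODES) for a in ["Neutral", "Strongly Disagree", "Strongly Agree"]]
print(coded)  # [3, 5, 1]
```

Raising an error on an unknown answer, rather than guessing a code, is one way to catch data-entry mistakes early.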

Organizing the data § Use the numeric values to organize the survey data

§  Spreadsheet format §  Rows: one respondent §  Columns: one variable

Survey of religious beliefs

Respondent Gender Age Belief Attendance

1 0 25 3 0

2 0 32 5 1

3 0 68 1 5

4 0 75 1 6

5 0 29 5 2

6 0 25 4 3

7 1 32 2 5

8 1 25 2 5

9 1 54 1 8

10 1 36 5 9


Analyzing the data • Once data is organized, analysis can begin • Univariate analysis

• Analyze one variable •  Frequency •  Averages •  Standard deviation

• Bivariate analysis • Analyze two variables

•  Cross-tabulations •  Correlations •  Mean comparisons

Frequencies (the number of times an attribute was observed)

Frequencies


Frequencies from Sample Survey of Religious Beliefs

• Gender
  •  How many women took the survey?
    •  Female (0): 60% (6/10)
  •  How many men took the survey?
    •  Male (1): 40% (4/10)
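As a rough sketch, the gender frequencies can be counted with Python's `collections.Counter`. The list below is the Gender column from the sample spreadsheet; the variable names are illustrative.

```python
from collections import Counter

# Gender column from the sample spreadsheet (0 = Female, 1 = Male).
gender = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
labels = {0: "Female", 1: "Male"}

counts = Counter(gender)  # how many times each code was observed
n = len(gender)

for code, count in sorted(counts.items()):
    print(f"{labels[code]} ({code}): {count}/{n} = {count / n:.0%}")
# Female (0): 6/10 = 60%
# Male (1): 4/10 = 40%
```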

Frequencies from Sample Survey of Religious Beliefs

• Belief?
  •  How many strongly agreed (1):
  •  How many agreed (2):
  •  How many were neutral (3):
  •  How many disagreed (4):
  •  How many strongly disagreed (5):

Measures of Central Tendency • Central tendency focuses on determining the average

value •  Goal average (soccer) .300 •  Grade point average 2.5

• More than one way to think about average •  Mode •  Median •  Mean


Measures of Central Tendency • Mode

•  Response that occurs most frequently •  If all responses occur with the same frequency, there is no mode

•  Can have more than one mode •  Some responses may tie for most frequently occurring

Mode of Age in Survey of Religious Beliefs

Respondent Age   Frequency
25               3
29               1
32               2
36               1
54               1
68               1
75               1

What if…

Respondent Age   Frequency
25               2
29               2
32               2
36               1
54               1
68               1
75               1


What if…

Respondent Age   Frequency
25               1
29               1
32               1
36               1
54               1
68               1
75               1
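The three mode scenarios (one mode, tied modes, no mode) can be sketched as follows. The function `modes` is a hypothetical helper written for this illustration; it follows the rule stated on the slides that there is no mode when all responses occur equally often.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] when every value occurs equally often."""
    counts = Counter(values)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []  # every response ties, so there is no mode
    return sorted(v for v, c in counts.items() if c == top)


survey_ages = [25, 32, 68, 75, 29, 25, 32, 25, 54, 36]   # ages from the sample survey
tied_ages = [25, 25, 29, 29, 32, 32, 36, 54, 68, 75]     # first "What if" table
unique_ages = [25, 29, 32, 36, 54, 68, 75]               # second "What if" table

print(modes(survey_ages))  # [25]
print(modes(tied_ages))    # [25, 29, 32]
print(modes(unique_ages))  # []
```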

Measures of Central Tendency § Median

§  Middle response in an ordered list

§  If number of responses is odd, median is exactly in the middle

§  If number of responses is even, median is the mean of the two in the middle

Median age in survey of religious beliefs

§ Finding the middle response
  § Same number of responses above and below
§ Calculating the median?

Respondent ID   Respondent Age
1               25
2               25
3               25
4               29
5               32
6               32
7               36
8               54
9               68
10              75


What if…
• Finding the middle response
  • Same number of responses above and below
• The median?

Respondent ID   Respondent Age
1               25
2               25
3               25
4               29
5               32
6               34
7               36
8               54
9               68
10              75
11              89
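The odd and even cases can be worked through with a short sketch. The function name `median` is illustrative; Python's standard `statistics.median` applies the same rule.

```python
def median(values):
    """Middle value of an ordered list; mean of the two middle values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: exactly in the middle
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: mean of the two middle values


ten_ages = [25, 25, 25, 29, 32, 32, 36, 54, 68, 75]         # 10 responses (even)
eleven_ages = [25, 25, 25, 29, 32, 34, 36, 54, 68, 75, 89]  # 11 responses (odd)

print(median(ten_ages))     # 32.0, the mean of the 5th and 6th responses
print(median(eleven_ages))  # 34, the 6th response
```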

Measures of Central Tendency § Mean

§  Most common measure of central tendency

§  Sum of the responses divided by the number of responses

Mean of Age in Survey of Religious Beliefs

§ Sum of the responses: §  25+25+25+29+32+32+36+54+68+75 = 401

§ Number of responses: §  10

§ Mean: 401/10 = 40.10
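The same arithmetic, written as a small Python check (variable names are illustrative):

```python
# Ages of the 10 respondents in the sample survey.
ages = [25, 25, 25, 29, 32, 32, 36, 54, 68, 75]

total = sum(ages)          # sum of the responses
n = len(ages)              # number of responses
mean_age = total / n       # mean = sum / count

print(total, n, mean_age)  # 401 10 40.1
```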


Which Measure of Central Tendency is Appropriate?

•  Interval and ratio measures •  Mean most appropriate

•  Average age, weight, hours studied, hours spent watching television

•  Sometimes median more appropriate (when you have extreme values (outliers)) •  Median house price •  Median wage

• Ordinal •  Mode appropriate at times

•  Most people strongly agree •  Median appropriate at times

•  How does the middle person feel

Which measure of central tendency is appropriate?

• Nominal measures; where you can’t logically order the responses (blue eyes not inherently better than green eyes) •  Mode most appropriate

•  Most respondents were women

Codebook

Variable: Gender
Question: What is your gender?

Attributes   Numeric Code   Frequency   Central Tendency
Female       0.00           6           0.00 (Mode)
Male         1.00           4

• Codebook: document that describes the contents, structure, and layout of a data collection.
  •  Should contain information intended to be complete and self-explanatory for each variable in a data file.
  •  Serves as a guideline for the researcher who collected the dataset, and those who use the dataset once it is collected.


Mishaps in Quantitative Analysis • Focused on perfect scenario

•  Each respondent provides complete information

• What about missing data?
  •  What if the respondents do not report all the information we ask them for?
  •  How does each item we discussed earlier change if data are missing?

Missing Data • Why would some data be missing?

•  Contingency questions •  If respondent answers no, data for questions 14 – 25 are missing

•  This is foreseeable “missing data”

Missing Data • Why would some data be missing?

•  Respondent does not provide answer •  On purpose: refuse to give information •  By mistake: missed a question

•  Don’t want to disregard entire survey •  Use what you can, treat the rest as missing


Spreadsheet with missing data

Respondent   Gender   Age   Belief   Attendance
1            0        25    3        0
2            0        -     5        1
3            0        68    -        5
4            -        75    -        6
5            0        29    5        -
6            0        -     4        3
7            1        32    2        5
8            1        25    2        5
9            1        54    1        8
10           1        36    -        9

Need Numerical Code for Missing Data

Variable: Gender
Question: What is your gender?

Attributes   Numeric Code   Frequency   Central Tendency
Male         0.00           5           0.00 (Mode)
Female       1.00           4
Missing      99.00          1

•  99 commonly used numeric code
  •  Can be confusing if ‘99’ is a possible value
    •  Age
    •  Income
    •  Weight
    •  Grade
• Choose a value
  •  Remain consistent
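One way to check whether a candidate missing-data code is safe is to test it against the variable's valid values. This is a sketch only; `is_safe_missing_code` is a hypothetical helper, and the age range 0 to 100 is taken from the survey question above.

```python
def is_safe_missing_code(code, valid_values):
    """A missing-data code is safe only if it can never be a real answer."""
    return code not in valid_values


likert_values = range(1, 6)   # belief codes 1 through 5
age_values = range(0, 101)    # ages 0 through 100

print(is_safe_missing_code(99, likert_values))  # True: 99 is not a Likert code
print(is_safe_missing_code(99, age_values))     # False: a respondent could be 99 years old
print(is_safe_missing_code(-1, age_values))     # True: a negative age is impossible
```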

Variable: Belief Question: Please indicate your level of agreement with the following statement: “I believe that heaven and hell exist”

Attributes Numeric Code Frequency

Strongly Agree 1 1

Agree 2 2

Neutral 3 1

Disagree 4 1

Strongly Disagree 5 2

Missing 99 3


Things to consider • Calculations of univariate statistics when missing data

•  Don’t want to include missing data in calculations

• Mode •  Don’t report missing data value as the mode

•  Focus on most frequently occurring response of non-missing data

Reporting the Mode in Datasets with Missing Data

Variable: Belief Question: Please indicate your level of agreement with the following statement: “I believe that heaven and hell exist”

Attributes Numeric Code Frequency

Strongly Agree 1 1

Agree 2 2

Neutral 3 1

Disagree 4 1

Strongly Disagree 5 2

Missing 99 3
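A minimal sketch of reporting the mode while ignoring the missing-data code, using the Belief column from the spreadsheet with missing data (missing answers recorded as 99). The names `MISSING` and `belief` are illustrative.

```python
from collections import Counter

MISSING = 99  # numeric code assumed for missing answers, as in the codebook

# Belief column from the spreadsheet with missing data, with "-" coded as 99.
belief = [3, 5, MISSING, MISSING, 5, 4, 2, 2, 1, MISSING]

# Counting everything would wrongly make the missing code the most frequent "response".
most_common_code, _ = Counter(belief).most_common(1)[0]
print(most_common_code)  # 99

# Drop the missing code first, then look for the most frequent real response(s).
valid_counts = Counter(b for b in belief if b != MISSING)
top = max(valid_counts.values())
valid_modes = sorted(code for code, count in valid_counts.items() if count == top)
print(valid_modes)  # [2, 5]
```

Here two responses tie, so the non-missing data have two modes: Agree (2) and Strongly Disagree (5).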

Calculating the Median in Datasets with Missing Data

• Median •  Middle response

•  If data are missing •  No longer have 10 responses

•  Middle response is no longer the average of 5th and 6th responses
•  With 8 responses, it is the average of the 4th and 5th responses
•  Median=(32+36)/2=34

Respondent ID   Respondent Age
1               25
2               25
3               29
4               32
5               36
6               54
7               68
8               75
9               -
10              -


Calculating the Mean in Datasets with Missing Data

• Mean •  Sum of responses divided by number of responses

•  If data are missing
  •  The number of responses should reflect this (we no longer have 10 responses; 2 are missing, so we have 8)

•  Mean age=(25+68+75+29+32+25+54+36)/8=344/8=43 (was 40.10 when we had 10 responses)
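The median and mean calculations with missing data can be sketched with Python's standard `statistics` module. Missing ages are represented as `None` here, which is an assumption of this sketch; a numeric code such as 99 would need to be filtered out the same way.

```python
from statistics import mean, median

# Age column from the spreadsheet with missing data; None marks a missing answer.
ages = [25, None, 68, 75, 29, None, 32, 25, 54, 36]

# Keep only the ages that respondents actually reported.
reported = [a for a in ages if a is not None]

n_reported = len(reported)      # 8 responses remain, not 10
mean_age = mean(reported)       # 344 / 8
median_age = median(reported)   # mean of the 4th and 5th ordered responses

print(n_reported, mean_age, median_age)
```

With the two missing ages excluded, the mean equals 43 and the median equals 34, matching the hand calculations.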

Collapsing Responses • Sometimes you want to combine attributes together

•  Condense presentation of data •  Make tables easier to understand

•  Some attributes chosen relatively few times •  Combine them with another attribute


Belief

The Original Data

Belief Frequency

Strongly Agree 2

Agree 2

Neutral 1

Disagree 1

Strongly Disagree 2

Missing 2

Collapsed Data

Belief               Frequency
Agree, at least      4
Neutral              1
Disagree, at least   3
Missing              2
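Collapsing can be expressed as a lookup from each original attribute to its combined category. The sketch below uses the frequencies from the original-data table; the dictionary names are illustrative.

```python
# Frequencies from the original-data table.
original = {
    "Strongly Agree": 2,
    "Agree": 2,
    "Neutral": 1,
    "Disagree": 1,
    "Strongly Disagree": 2,
    "Missing": 2,
}

# Which collapsed category each original attribute belongs to.
COLLAPSE = {
    "Strongly Agree": "Agree, at least",
    "Agree": "Agree, at least",
    "Neutral": "Neutral",
    "Disagree": "Disagree, at least",
    "Strongly Disagree": "Disagree, at least",
    "Missing": "Missing",
}

collapsed = {}
for attribute, frequency in original.items():
    group = COLLAPSE[attribute]
    collapsed[group] = collapsed.get(group, 0) + frequency  # add into the combined category

print(collapsed)
# {'Agree, at least': 4, 'Neutral': 1, 'Disagree, at least': 3, 'Missing': 2}
```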

Bivariate Analysis • Analyzing two variables at once

•  Trying to determine if an empirical relationship exists between the two •  Independent & dependent variable

• Cross-tabulation (also known as contingency tables) •  Represent relationships among variables as percentages

What conclusions could we make?


Steps Involved in Creating Crosstabs

• Choose two variables •  Variables related to hypothesis and research question

•  One independent •  One dependent

•  Analyze those instances when respondent provided information for both

•  Collapse attribute categories •  Tables best understood when variables are nominal or ordinal

Survey of religious beliefs & practices • Hypotheses

•  Women are more religious than men. •  As age increases, belief increases. •  As age increases, attendance increases

• Can use crosstabs to investigate each


Focus on the Dimension of Belief

• Gender (IV), Belief (DV)
• Both variables measured at the nominal and ordinal level

Belief              Men   Women
Strongly Agree      0     1
Agree               0     2
Neutral             1     0
Disagree            1     0
Strongly Disagree   2     0
Total               4     3

Using Counts to Construct Percentages

Belief              Men   Women
Strongly Agree      0     1
Agree               0     2
Neutral             1     0
Disagree            1     0
Strongly Disagree   2     0
Total               4     3

Belief              Men    Women
Strongly Agree      0%     33.3%
Agree               0%     66.7%
Neutral             25%    0%
Disagree            25%    0%
Strongly Disagree   50%    0%
Total               100%   100%
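The step from counts to column percentages can be sketched as below: each cell is divided by its column total (the number of men or women who answered). The counts come from the table above; `column_percentages` is a hypothetical helper.

```python
# Counts from the crosstab: belief by gender.
counts = {
    "Men": {"Strongly Agree": 0, "Agree": 0, "Neutral": 1, "Disagree": 1, "Strongly Disagree": 2},
    "Women": {"Strongly Agree": 1, "Agree": 2, "Neutral": 0, "Disagree": 0, "Strongly Disagree": 0},
}


def column_percentages(column):
    """Convert one column of counts to percentages of that column's total."""
    total = sum(column.values())
    return {belief: round(100 * n / total, 1) for belief, n in column.items()}


percentages = {group: column_percentages(col) for group, col in counts.items()}

print(percentages["Men"]["Strongly Disagree"])   # 50.0
print(percentages["Women"]["Strongly Agree"])    # 33.3
print(percentages["Women"]["Agree"])             # 66.7
```

Percentaging within each category of the independent variable (gender) is what lets the two columns be compared even though they contain different numbers of respondents.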

Collapsing the Attributes

Belief               Men    Women
Agree, at least      0%     100%
Neutral              25%    0%
Disagree, at least   75%    0%
Total                100%   100%

• What conclusions would you make?


What about interval and ratio variables? • Crosstabs easiest to understand if data presented in

nominal or ordinal level •  Best to collapse interval and ratio data into nominal/ordinal

• Ex: As age increases belief increases •  Age measured at ratio level

Collapsing age • How can you make an interval/ratio measure into a

nominal/ordinal measure? •  Assign each response to a category

•  Nominal: categories need to be exhaustive & mutually exclusive •  Ordinal: categories need to be rank-order, exhaustive, & mutually

exclusive

•  Creating a new variable •  Collapsing an interval/ratio measure

Recall

Age Belief Attendance

25 3 0

- 5 1

68 - 5

75 1 6

29 5 -

- 4 3

32 2 5

25 2 5

54 1 8

36 - 9

• The mean age was: 43
  •  We can categorize each person according to
    •  Younger than 43
    •  43 and older

• The median age was: 34
  •  We can categorize each person according to
    •  Younger than 34
    •  34 and older
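Collapsing age into two categories can be sketched as a small function that assigns each response to exactly one group, so the categories stay exhaustive and mutually exclusive. The function `age_group` and the use of `None` for missing ages are assumptions of this sketch.

```python
from collections import Counter


def age_group(age, cutoff=43):
    """Assign a ratio-level age to one ordinal category; missing ages stay missing."""
    if age is None:
        return "Missing"
    return f"{cutoff} and older" if age >= cutoff else f"Younger than {cutoff}"


# Age column from the table above; None marks a missing answer.
ages = [25, None, 68, 75, 29, None, 32, 25, 54, 36]

by_mean = Counter(age_group(a, cutoff=43) for a in ages)     # split at the mean
by_median = Counter(age_group(a, cutoff=34) for a in ages)   # split at the median

print(by_mean["43 and older"], by_mean["Younger than 43"])      # 3 5
print(by_median["34 and older"], by_median["Younger than 34"])  # 4 4
```

Splitting at the median gives two equally sized groups here, while splitting at the mean does not.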


Determine the number of people in each category

Belief                  43 and older   Younger than 43
Strongly Agree (1)      2              0
Agree (2)               0              2
Neutral (3)             0              1
Disagree (4)            0              0
Strongly Disagree (5)   0              1
Total                   2              4
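The counts in this table can be reproduced from the raw (age, belief) pairs, keeping only respondents who answered both questions. This is a sketch; the pairs are taken from the Age and Belief columns recalled earlier, with `None` standing in for a missing answer.

```python
from collections import Counter

# (age, belief) for each respondent; None marks a missing answer.
records = [
    (25, 3), (None, 5), (68, None), (75, 1), (29, 5),
    (None, 4), (32, 2), (25, 2), (54, 1), (36, None),
]

# Analyze only the cases with information on both variables.
complete = [(age, belief) for age, belief in records if age is not None and belief is not None]

# Count each combination of age category (IV) and belief code (DV).
table = Counter(
    ("43 and older" if age >= 43 else "Younger than 43", belief)
    for age, belief in complete
)

older_total = sum(n for (group, _), n in table.items() if group == "43 and older")
younger_total = sum(n for (group, _), n in table.items() if group == "Younger than 43")

print(table[("43 and older", 1)], table[("Younger than 43", 2)])  # 2 2
print(older_total, younger_total)                                 # 2 4
```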

Calculate % of people in each category

Belief                  43 and older   Younger than 43
Strongly Agree (1)      100%           0%
Agree (2)               0%             50%
Neutral (3)             0%             25%
Disagree (4)            0%             0%
Strongly Disagree (5)   0%             25%
Total                   100%           100%

Collapse belief to make table easier to understand

Belief 43 and older Younger than 43

Agree, at least 100% 50%

Neutral 0 25

Disagree, at least 0 25

Total 100% 100%


As age increases attendance increases

Attendance ≥43 <43

Never (0) 0 1

Less than once a year (1) 0 0

Once or twice/year (2) 0 0

Several times/year (3) 0 0

Once a month (4) 0 0

2-3 times a month (5) 1 2

Nearly every week (6) 1 0

Every week (7) 0 0

Several times/week (8) 1 0

Don’t know (9) 0 1

Total 3 4

Attendance ≥43 <43

Never (0) 0% 25%

Less than once a year (1) 0 0

Once or twice/year (2) 0 0

Several times/year (3) 0 0

Once a month (4) 0 0

2-3 times a month (5) 33.3% 50%

Nearly every week (6) 33.3% 0

Every week (7) 0 0

Several times/week (8) 33.3% 0

Don’t know (9) 0 25

Total 100% 100%

Collapse attendance to make table easier to understand

Attendance               43 and older   Younger than 43
Once a month or less     0%             25%
More than once a month   100%           50%
Don’t know               0%             25%
Total                    100%           100%

Correlation
• Another way to think about how two variables are related to each other
• How much of one variable (religious attendance) can the other variable (age) explain?
• Correlation coefficient


Types of Correlation • Direct (positive) correlation

•  One variable increases as the other one increases •  One variable decreases as the other one decreases

•  HINT: Both variables move in the same direction

•  Indirect (negative) correlation •  One variable decreases as the other one increases

•  HINT: Variables move in opposite directions

• No correlation •  Behavior of one variable is not affected by the behavior of the other

variable
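The three types of correlation can be illustrated with the Pearson correlation coefficient, one common choice that the slides do not name explicitly. The data below are made-up toy values chosen only to show each pattern, not survey results, and `pearson_r` is a hypothetical helper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient: ranges from -1 (negative) through 0 (none) to +1 (positive)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy values invented for illustration only.
x = [1, 2, 3, 4, 5]
same_direction = [2, 4, 6, 8, 10]      # rises as x rises
opposite_direction = [10, 8, 6, 4, 2]  # falls as x rises
unrelated = [2, 1, 3, 1, 2]            # no linear pattern with x

r_positive = pearson_r(x, same_direction)
r_negative = pearson_r(x, opposite_direction)
r_none = pearson_r(x, unrelated)

print(r_positive, r_negative, round(r_none, 6))
```

The first pair gives a coefficient of +1 (direct), the second gives -1 (indirect), and the third gives a value of essentially zero (no correlation).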

https://www.youtube.com/watch?v=VFjaBh12C6s