Overview of Main Survey Data Analysis and Scaling

National Research Coordinators Meeting, Madrid, February 2010

Page 2: Content of presentation

• Scaling and analysis of test items

• Scaling and analysis of questionnaire items

• Data analysis for the reporting of ICCS data

Page 3: Steps in analysis

• Preliminary analysis of first data sets received
  – Review at JMC data analysis meeting in Hamburg in July 2009

• Analysis of cleaned and uncleaned data sets from almost all participating countries
  – Review at PAC meeting in Tallinn (Oct 2009) and JMC data analysis meeting in Hamburg in early December 2009

• Final scaling and analysis with clean data from all 38 countries

Page 4: Test item analysis

• Review of missing data

• Analysis of item dimensionality

• Review of item statistics (international)

• Analysis of differential item functioning by gender

• Analysis of item-by-country interaction
  – Measurement equivalence

• Item adjudication

Page 5: Scaling model

• Rasch one-parameter model

• $P_i(\theta_n)$ is the probability for person n to score 1 on item i; $\theta_n$ is the estimated ability of person n and $\delta_i$ is the estimated difficulty of item i:

$$P_i(\theta_n) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}$$
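For illustration only (the slide specifies the model but no code), a minimal sketch of the Rasch response probability; the function name and values are hypothetical:

```python
import math

def rasch_probability(theta: float, delta: float) -> float:
    """Probability of scoring 1 under the Rasch model, given person
    ability theta and item difficulty delta (both on the logit scale)."""
    return math.exp(theta - delta) / (1.0 + math.exp(theta - delta))

# A person whose ability equals the item difficulty has a 50% chance;
# two logits above it, roughly an 88% chance.
print(rasch_probability(0.5, 0.5))   # 0.5
print(rasch_probability(1.0, -1.0))  # ~0.881
```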

Page 6: Probability curves

[Figure: probability curves — probability of scoring 1 (0 to 1) plotted against ability from −4 to +4 logits]

Page 7: Partial credit model

• For open-ended items (and questionnaire items) with more than two categories the Partial Credit model was used:

• Here, $\tau_{ik}$ denotes an additional step parameter:

$$P_{x_i}(\theta_n) = \frac{\exp\left(\sum_{k=0}^{x_i}(\theta_n - \delta_i + \tau_{ik})\right)}{\sum_{h=0}^{m_i}\exp\left(\sum_{k=0}^{h}(\theta_n - \delta_i + \tau_{ik})\right)}, \qquad x_i = 0, 1, \ldots, m_i$$

(with the convention that the sum for $x_i = 0$ is zero)
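As a companion sketch (hypothetical helper code, not from the study), the Partial Credit Model category probabilities can be computed from cumulative sums of $(\theta_n - \delta_i + \tau_{ik})$:

```python
import math

def pcm_probabilities(theta: float, delta: float, taus: list[float]) -> list[float]:
    """Category probabilities under the Partial Credit Model.

    taus holds the step parameters tau_i1..tau_im, so an item scored
    0..m has m step parameters; the score-0 numerator is exp(0) = 1.
    """
    numerators = [0.0]          # cumulative sums for scores 0..m
    running = 0.0
    for tau in taus:
        running += theta - delta + tau
        numerators.append(running)
    exps = [math.exp(v) for v in numerators]
    total = sum(exps)
    return [e / total for e in exps]

# Four-category agreement item (scores 0..3) for a person at theta = 0.5:
print(pcm_probabilities(0.5, 0.0, [1.0, 0.0, -1.0]))
```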

Page 8: Threshold curves

[Figure: threshold curves for a four-category item (Strongly agree, Agree, Disagree, Strongly disagree) — probability (0.0 to 1.0) plotted against THETA (−4.00 to +4.00), with thresholds marked 1, 2 and 3]

Page 9: Response probabilities

[Figure: response probability curves for the same four-category item (Strongly agree, Agree, Disagree, Strongly disagree) — probability (0.0 to 1.0) plotted against THETA (−4.00 to +4.00), with category curves labelled 1, 2 and 3]

Page 10: Missing data issues

• Different categories of missing data

• Omitted responses
  – Somewhat higher percentages for open response items

• Invalid responses
  – Generally very low percentages

• Not reached responses
  – Omitted items at the end of test booklets
  – Generally low; in a few countries more considerable
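To make the omitted/not-reached distinction concrete, here is a small sketch (the function and coding are illustrative assumptions, not the ICCS cleaning rules), which treats a trailing run of missing responses as not reached and any other missing response as omitted:

```python
OMITTED, NOT_REACHED = "omitted", "not_reached"

def classify_missing(responses: list) -> dict:
    """Classify missing entries (None) in one student's booklet.

    A contiguous run of missing responses at the end of the booklet is
    treated as 'not reached'; any other missing response is 'omitted'.
    (Operational rules in large-scale studies are more nuanced; this is
    a simplification to illustrate the categories on the slide.)
    """
    # Find the last item the student actually answered.
    last_answered = -1
    for pos, resp in enumerate(responses):
        if resp is not None:
            last_answered = pos
    labels = {}
    for pos, resp in enumerate(responses):
        if resp is None:
            labels[pos] = OMITTED if pos < last_answered else NOT_REACHED
    return labels

# Item 2 is omitted (items after it were answered); item 5 is not reached.
print(classify_missing([1, 0, None, 1, 0, None]))
```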

Page 11: Not reached % by region

Page 12: Test characteristics

• Test items were generally a little easier than the average student abilities (pooled across countries)

• Test reliability was 0.84 (similar to CIVED assessment)

• Very high latent correlations between possible sub-dimensions
  – Decision not to pursue sub-scales

Page 13: Mapping of test items to abilities

[Figure: item-person map — the distribution of student abilities (rows of X) plotted against the difficulty locations of test items 1–79 on a common logit scale running from about +2 down to −2]

Page 14: Review of item scaling properties

• Most items had excellent scaling properties
  – Weighted mean square item fit
  – Item-total correlation
  – Item characteristic curves

• Only one test item (CI2HRM2) was omitted from scaling

Page 15: Item statistics

Item 37 (CI2HRM2)
Cases for this item: 7574    Item-Rest Cor.: 0.09
Weighted MNSQ: 1.23          Item Threshold(s): -0.37    Item Delta(s): -0.37
------------------------------------------------------------------------------
 Label   Score   Count   % of tot   Pt Bis      t (p)        PV1Avg:1   PV1 SD:1
------------------------------------------------------------------------------
   1      0.00     222      2.93     -0.15   -13.31 (.000)     -0.78      0.85
   2      1.00    4401     58.11      0.09     7.44 (.000)      0.12      0.90
   3      0.00     449      5.93     -0.12   -10.71 (.000)     -0.43      0.83
   4      0.00    2392     31.58      0.05     4.23 (.000)      0.00      0.86
   7      0.00      45      0.59     -0.07    -5.92 (.000)     -0.97      1.08
   9      0.00      65      0.86     -0.06    -4.80 (.000)     -0.52      0.73
==============================================================================
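The Item-Rest correlation reported above (0.09 for CI2HRM2, conspicuously low) is simply the correlation between the scored item and the total score with that item removed; a minimal sketch, assuming a scored students-by-items matrix:

```python
import numpy as np

def item_rest_correlation(scores: np.ndarray, item: int) -> float:
    """Correlation between one scored item and the rest score
    (total test score with that item excluded).

    scores: students x items matrix of scored responses (0/1 or partial credit).
    """
    item_scores = scores[:, item]
    rest_scores = scores.sum(axis=1) - item_scores
    return float(np.corrcoef(item_scores, rest_scores)[0, 1])

rng = np.random.default_rng(0)
data = (rng.random((200, 10)) > 0.4).astype(int)  # toy 0/1 response matrix
print(round(item_rest_correlation(data, item=3), 2))
```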

Page 16: Item characteristic curves

Page 17: Scoring reliabilities - 1

• Open-ended items were scored according to international scoring guidelines

• Double-scoring of sub-samples

• On average, percentages of scorer agreement ranged between 84% and 92% across participating countries

Page 18: Scoring reliabilities - 2

• Items were accepted only where scorer agreement was 70% or more

• Data for items where this criterion was not met were not included in scaling

• In two countries open-ended items were consistently easier than other items
  – Omitted from scaling and database

Page 19: Gender DIF

• DIF estimates reflect the differences between item difficulties for males and females of equal ability
  – This may cause bias in favour of one group

• Generally, only a few items with gender DIF were found
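The slides do not name the DIF statistic used; as one common, self-contained illustration (an assumption, not necessarily the ICCS procedure), the Mantel-Haenszel odds ratio compares boys' and girls' odds of success after matching on total score:

```python
from collections import defaultdict

def mantel_haenszel_dif(item, group, total):
    """Mantel-Haenszel common odds ratio for one dichotomous item.

    item:  0/1 item scores; group: 'boy' (reference) or 'girl' (focal);
    total: total test scores used as the matching variable.
    Values far from 1 suggest DIF against one of the groups.
    """
    # Per matched score level: [ref_right, ref_wrong, focal_right, focal_wrong]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for x, g, t in zip(item, group, total):
        idx = (0 if g == "boy" else 2) + (0 if x == 1 else 1)
        strata[t][idx] += 1
    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den if den > 0 else float("nan")
```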

Page 20: Cross-national measurement equivalence

• Occurrence of item-by-country interaction
  – Items relatively much harder in some countries but much easier in others

• In ICCS, national item calibrations were compared with those for the international calibration sample

• Standard errors were adjusted for sample design effects and multiple comparisons

Page 21: Example for CI2HRM2

Page 22: Item-by-country interaction

• Generally, items tended to behave in a similar way

• A number of items showed parameter variance
  – Sometimes due to translation errors
  – Often due to other factors (national context, curricula)

• Some parameter variation across countries occurred
  – Similar results as in other cross-national studies

Page 23: Item adjudication

• Based on results from scaling analysis (item statistics, item curves, item-by-country interaction etc.)

• International item adjudication
  – Omission of CI2HRM2 from scaling

• National item adjudication
  – Re-verification for items with larger discrepancies in item difficulty
  – Omission of items with translation or scoring issues from national scaling

Page 24: Calibration of items

• Based on international calibration sample with 500 randomly selected students from each of the 36 participating countries that met sampling requirements

• ACER ConQuest was used for estimation

• Booklet effects adjusted by including booklet as a facet in the scaling model

Page 25: Scaling methodology

• Plausible values were generated as student ability estimates
  – More information at the workshop!

• Dummy indicators for classroom and all student-level variables (international and regional) were included in the conditioning model

• Scale scores were set to an international metric with a mean of 500 and an SD of 100 for equally weighted countries (see the sketch below)
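A sketch of the kind of linear transformation this implies; the function below is hypothetical and glosses over details (weighting, which samples enter the standardization) that the ICCS technical report specifies:

```python
import numpy as np

def to_international_metric(pv, country, mean_target=500.0, sd_target=100.0):
    """Linearly transform plausible values so that the pooled distribution,
    with every country weighted equally, has mean 500 and SD 100.

    pv:      array of plausible values on the original (logit) scale
    country: array of country codes, same length as pv
    """
    pv, country = np.asarray(pv, float), np.asarray(country)
    # Equal country weights: each country's cases sum to the same total weight.
    weights = np.empty_like(pv)
    for c in np.unique(country):
        mask = country == c
        weights[mask] = 1.0 / mask.sum()
    mean = np.average(pv, weights=weights)
    sd = np.sqrt(np.average((pv - mean) ** 2, weights=weights))
    return mean_target + sd_target * (pv - mean) / sd
```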

Page 26: Estimation of changes in cognitive knowledge - 1

• 17 test items from CIVED were included as an intact cluster

• 17 countries with comparable data
  – Three countries with grade 9 in CIVED and additional grade 9 samples in ICCS

• A small number of items in some countries had to be discarded due to translation errors or differences between ICCS and CIVED

Page 27: Estimation of changes in cognitive knowledge - 2

• Comparison of item parameters showed high similarity (correlation of 0.95)

• Slight positioning effect due to different test designs
  – CIVED: one booklet
  – ICCS: CIVED link cluster in each of the three positions
    • CIVED items at the beginning slightly easier, at the end slightly harder than in ICCS

Page 28: Estimation of changes in cognitive knowledge - 3

Page 29: Estimation of changes in cognitive knowledge - 4

• Framework broadened since CIVED
  – Re-scaling CIVED data to equate with ICCS not appropriate

• Selection of CIVED items not representative of the overall CIVED test
  – Equating link items with the CIVED scale (or sub-scale) also not appropriate

• Solution: establish a new comparison scale based only on the 17 link items

Page 30: Estimation of changes in cognitive knowledge - 5

• Concurrent calibration of item parameters based on 34 calibration samples (CIVED and ICCS) from 17 countries

• Establishing a metric with a mean of 500 and SD of 100 for the 17 equally weighted CIVED countries

• For results in tables, weighted likelihood estimates were used
  – Usually unbiased for country averages

Page 31: Questionnaire item analysis

• Missing data issues

• Item dimensionality and scaling review

• Item/scale adjudication

• Scaling procedures

Page 32: Missing data - 1

• On average, about 3 percent of students have missing scale scores
  – Only two countries have higher percentages (18 and 12 percent)

• In the teacher survey, relatively low percentages of missing data were found (about 2 percent)

• Very low percentages of missing data in the school questionnaire

Page 33: Missing data - 2

• Concerns about missing data for socio-economic indicators
  – Highest parental occupation: 5%
  – Highest parental education: 3%
  – Books at home: 1%

• However, in a few countries higher percentages of missing data were found (up to 15% for parental education)

Page 34: Analysis of item dimensionality

• Exploratory and confirmatory factor analyses showed generally very similar results to those from the field trial

• These analyses will be described in detail in the ICCS technical report

Page 35: Scaling analysis

• Scale reliabilities (Cronbach's alpha)
  – Values over 0.7 indicate satisfactory internal consistency

• Item-total correlations
  – Useful for detecting translation errors

• Scaling with the IRT Partial Credit Model
  – Item fit
  – Category characteristic curves
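Cronbach's alpha itself is easy to compute; a minimal sketch with simulated data (all names and values illustrative):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a students x items matrix of item scores."""
    items = np.asarray(items, float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_variances / total_variance)

rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 1))
responses = latent + rng.normal(scale=1.0, size=(300, 6))  # 6 correlated items
print(round(cronbach_alpha(responses), 2))  # comfortably above 0.7
```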

Page 36: Item and scale adjudication

• Only three scales with median scale reliabilities below 0.7
  – Democratic value beliefs, civic participation in the community and at school

• Adjudication for the student, teacher, school and each regional questionnaire

• Some items were removed from scales

• In some cases, single-item reporting

Page 37: Scaling procedures - 1

• IRT scaling with Partial Credit Model

• Weighted likelihood estimates (WLEs) as scale scores

• International metric with a mean of 50 and a standard deviation of 10

• For each case n, the WLE solves the estimating equation

$$\frac{J_n}{2 I_n} + r_n - \sum_{i}\sum_{j=1}^{m_i} \frac{j\,\exp\left(\sum_{k=0}^{j}(\theta_n - \delta_i + \tau_{ik})\right)}{\sum_{h=0}^{m_i}\exp\left(\sum_{k=0}^{h}(\theta_n - \delta_i + \tau_{ik})\right)} = 0$$

where $r_n = \sum_i x_{in}$ is the raw score, $I_n$ is the test information and $J_n$ its derivative with respect to $\theta_n$
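For the dichotomous Rasch case the estimating equation simplifies enough to solve in a few lines; the sketch below (hypothetical code, a simple Newton-style iteration) also illustrates why WLEs stay finite even for perfect scores:

```python
import math

def wle_dichotomous(responses, deltas, tol=1e-6):
    """Warm's weighted likelihood estimate of ability for dichotomous
    Rasch items: solves r + J/(2I) - sum_i P_i(theta) = 0.

    responses: list of 0/1 scores; deltas: matching item difficulties.
    """
    r = sum(responses)
    theta = 0.0
    for _ in range(100):
        probs = [math.exp(theta - d) / (1 + math.exp(theta - d)) for d in deltas]
        info = sum(p * (1 - p) for p in probs)             # I(theta)
        j = sum(p * (1 - p) * (1 - 2 * p) for p in probs)  # J = dI/dtheta
        f = r + j / (2 * info) - sum(probs)
        step = f / info  # Newton step using the dominant derivative term
        theta += step
        if abs(step) < tol:
            break
    return theta

print(round(wle_dichotomous([1, 1, 0, 1, 0], [-1.0, -0.5, 0.0, 0.5, 1.0]), 3))
```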

Page 38: Scaling procedures - 2

• Item parameter calibration with ACER ConQuest

• Calibration samples:
  – 500 students per country
  – 250 teachers per country
  – All school data, with equal weights for each country

• Only data from countries that met sampling requirements (categories 1 or 2) were included in calibration

Page 39: Questionnaire scales

• Advantages of IRT scales
  – Inclusion of students with at least two item responses per scale
  – Possibility to describe scale scores in terms of item responses

• From the IRT Partial Credit Model it is possible to map scale scores to expected item responses (see the sketch below)

• Item maps will be provided in an appendix to the international report
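Continuing the earlier hypothetical PCM sketch, the expected item response at a given score is just the probability-weighted average of the category scores (assuming a 0 to 3 coding from strongly disagree to strongly agree):

```python
# Reuses the pcm_probabilities() helper sketched on the Partial Credit
# Model slide; all parameter values here are illustrative.
def expected_item_score(theta: float, delta: float, taus: list[float]) -> float:
    """Model-implied expected score on one item at ability theta."""
    probs = pcm_probabilities(theta, delta, taus)
    return sum(k * p for k, p in enumerate(probs))

# A respondent one logit above the item location is expected to answer
# between 'agree' (2) and 'strongly agree' (3):
print(round(expected_item_score(1.0, 0.0, [1.0, 0.0, -1.0]), 2))  # ~2.22
```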

Page 40: Example of item map

[Figure: item-by-score map for Item #1, Item #2 and Item #3 — scale scores from 20 to 80 on the horizontal axis, each item shown as a band running from Strongly disagree through Disagree and Agree to Strongly agree]

Examples of how to interpret the item-by-score map:

• A respondent with score 60 has more than 50% probability to strongly agree with item 1 and at least agree with items 2 and 3

• A respondent with score 40 has more than 50% probability to strongly agree with items 1, 2 and 3

• A respondent with score 30 has more than 50% probability to strongly agree with all three items

• A respondent with score 40 has more than 50% probability to at least disagree with items 1 and 2 but to disagree with item 3

• A respondent with score 50 has more than 50% probability to at least agree with item 1 and at least disagree with items 2 and 3

Page 41: Data analysis for reporting

• Estimation of sampling variance

• Estimation of measurement variance

• Reporting of differences

Page 42: Estimation of sampling variance

• Data from cluster samples are not simple random samples
  – The standard formula for estimating sampling error is not appropriate

• The jackknife repeated replication (JRR) technique was used for ICCS

• The IDB Analyser, WESVAR or SPSS/SAS macros may be used to apply this methodology
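The core of JRR is small once replicate weights exist; a sketch using the variance formula common in IEA studies (summing squared deviations of zone-replicate estimates from the full-sample estimate), with made-up numbers:

```python
import math

def jrr_standard_error(full_estimate: float, replicate_estimates: list) -> float:
    """Jackknife repeated replication standard error.

    replicate_estimates: the statistic re-computed once per jackknife zone,
    each time with that zone's replicate weights applied.  Uses
        Var = sum_h (theta_h - theta)^2
    as in IEA studies.
    """
    var = sum((rep - full_estimate) ** 2 for rep in replicate_estimates)
    return math.sqrt(var)

# Toy usage: a few replicate means around a full-sample mean of 512.3.
reps = [512.9, 511.8, 512.5, 511.9, 512.6, 512.0]
print(round(jrr_standard_error(512.3, reps), 2))
```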

Page 43: Estimation of measurement variance

• Using plausible values allows estimating the measurement error
  – The variation between the five PVs can be used for estimation

• The IDB Analyser, WESVAR or SPSS macros (ACER replicates module) include features to do this

• More information will be provided at the training workshop on Wednesday
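A sketch of the standard combination rule for plausible values (mean sampling variance plus between-PV measurement variance); the numbers are invented:

```python
import math

def pv_standard_error(pv_estimates, pv_sampling_variances):
    """Total standard error for a statistic computed from plausible values.

    pv_estimates: the statistic computed once per plausible value (5 in ICCS)
    pv_sampling_variances: the JRR sampling variance of each such estimate
    Combines sampling and measurement variance in the usual way:
        Var = mean(sampling variances) + (1 + 1/M) * var between PV estimates
    """
    m = len(pv_estimates)
    mean_est = sum(pv_estimates) / m
    sampling = sum(pv_sampling_variances) / m
    measurement = sum((e - mean_est) ** 2 for e in pv_estimates) / (m - 1)
    return math.sqrt(sampling + (1 + 1 / m) * measurement)

print(round(pv_standard_error([503.1, 502.4, 503.8, 502.9, 503.3],
                              [4.4, 4.6, 4.3, 4.5, 4.4]), 2))
```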


Page 45: Reporting of differences - 1

• The following types of significance tests will be reported:
  – For differences in population estimates between countries
  – For differences between a country and the international average
  – For differences in population estimates between subgroups within countries
  – For differences between population estimates in ICCS and in CIVED (trend estimation)

Page 46: Reporting of differences - 2

• Adjustment for multiple comparisons with the Dunn-Bonferroni method
  – Increasing the critical value (at the .05 level) from 1.96 to 3.189

• SE for differences between samples:

$$SE_{dif_{ij}} = \sqrt{SE_i^2 + SE_j^2}$$

• Estimation of SE for sub-group differences with JRR
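Putting the two bullets together, a country-pair comparison under the adjusted critical value looks like this (hypothetical helper and numbers):

```python
import math

def significant_difference(est_i, se_i, est_j, se_j, critical=3.189):
    """Compare two estimates using the Dunn-Bonferroni-adjusted critical
    value from the slide (3.189 instead of the usual 1.96)."""
    diff = est_i - est_j
    se_diff = math.sqrt(se_i ** 2 + se_j ** 2)
    return diff, se_diff, abs(diff / se_diff) > critical

# 14-point difference, SE of difference ~3.75, t ~3.7 > 3.189 -> significant:
print(significant_difference(512.0, 2.5, 498.0, 2.8))
```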

Page 47: Reporting of differences - 3

• For the SE of trend differences it is important to take the equating error into account

• The SE for differences between CIVED and ICCS can be computed as

$$SE_{ICCS-CIVED\,dif_{ij}} = \sqrt{SE_i^2 + SE_j^2 + EqErr^2}$$

• The equating error in the international metric is 3.31
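The same comparison for trend estimates, with the slide's equating error folded in (function and example values hypothetical):

```python
import math

EQUATING_ERROR = 3.31  # equating error on the international metric (from the slide)

def trend_se(se_iccs: float, se_cived: float) -> float:
    """SE of an ICCS-CIVED difference, adding the equating error component."""
    return math.sqrt(se_iccs ** 2 + se_cived ** 2 + EQUATING_ERROR ** 2)

# A 10-point change with SEs of 3.0 and 3.5 is only ~1.8 SEs of the trend:
print(round(10.0 / trend_se(3.0, 3.5), 2))
```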

Page 48: Multivariate analysis

• Multiple regression models were used for the tables in draft Chapter 7
  – Bivariate regression
  – Multiple regression

• Multi-level models were used for the analysis in draft Chapter 8
  – Students nested within classrooms
  – Classrooms mostly equivalent to schools

Page 49: Questions or comments?

