Comparing CTT and IRT Using the Aptitude Test

Comparing CTT and IRT using the Aptitude Test for High School

Adelaida de Perio
De La Salle University - Manila

Background

Admission to tertiary education requires applicants to pass the screening process set by schools. One of the assessment tools schools use to select applicants is the aptitude test. Aptitude tests measure an individual’s fundamental intellectual abilities.

The abstract reasoning subtest is a non-verbal test that measures one’s ability to identify patterns in a series. Numerical ability, on the other hand, measures one’s ability to solve mathematical problems. Verbal reasoning measures one’s ability to understand analogies and covers areas of the English language. Spatial ability measures one’s ability to manipulate shapes.

Mechanical reasoning measures one’s knowledge of physical and mechanical principles. Lastly, spelling measures one’s ability to detect errors in grammar, punctuation, and capitalization (Magno & Quano, 2010).

Review of Related Literature

Because many studies link aptitude with academic performance, schools use aptitude tests to predict students’ future performance.

Long-standing key predictors of academic success are students’ abilities as measured by the SAT or ACT, or their high school GPA (Covington, 1992; Lavin, 1965; Willingham, Lewis, Morgan, & Ramist, 1990).

Garavila, Gredler & Margaret (1997) examined the extent to which college students’ learning strategies, prior achievement, and aptitude predicted course achievement. Each predictor was significantly correlated with achievement, and together these variables accounted for 45% of the variance in course achievement.

Garcia (1997) found similar results in a study examining the relations of motivation, attitude, and aptitude to second language achievement: aptitude (β = .43), motivation (β = .41), and ethnic membership (β = .14) together explained more than 50% of the variance in language achievement.

In secondary education, little has been done to screen students before they enter high school. This is why some students lack the necessary skills and come unprepared for the demands and expectations of high school education. An aptitude test would therefore serve not only as a screening tool but also as a source of information for teachers about the areas in which students need to improve.

Objective

The present study therefore aims to compare Classical Test Theory (CTT) and Item Response Theory (IRT) results in evaluating the Aptitude Test for High School in terms of item difficulty and item discrimination.

Review of CTT and IRT

The CTT model, also called “True Score Theory,” holds that an examinee’s observed score is the sum of a true score, which reflects the examinee’s ability, and random measurement error.

In CTT, item difficulty is indicated by the proportion of examinees answering an item correctly; item discrimination is indicated by the item-total correlation; and the frequency of responses to each option is used to examine distracters (Impara & Plake, 1997).
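
The CTT indices just described can be computed directly from a scored response matrix. The following is a minimal Python sketch, not the study’s SPSS/Excel procedure; it assumes responses are already scored 1 (correct) or 0 (incorrect) in a persons × items list of lists, and it uses the corrected item-total correlation (each item against the total of the remaining items), which is one common variant of the item-total correlation. The function names and the five-examinee data set are hypothetical.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either variable is constant
    return sxy / sqrt(sxx * syy)


def item_difficulty(scored):
    """CTT difficulty (p-value): proportion of examinees answering each item correctly."""
    n_persons = len(scored)
    return [sum(row[j] for row in scored) / n_persons for j in range(len(scored[0]))]


def item_discrimination(scored):
    """Corrected item-total correlation: each item vs. the total of the other items."""
    totals = [sum(row) for row in scored]
    result = []
    for j in range(len(scored[0])):
        item = [row[j] for row in scored]
        rest = [t - i for t, i in zip(totals, item)]  # remove the item from its own total
        result.append(pearson(item, rest))
    return result


def distracter_counts(choices, item_index):
    """How often each response option (e.g. 'A' to 'D') was chosen on one item."""
    return Counter(row[item_index] for row in choices)


# Hypothetical data: 5 examinees x 3 items, 1 = correct, 0 = incorrect.
scored = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
]
print(item_difficulty(scored))  # [0.8, 0.6, 0.4]
print([round(r, 2) for r in item_discrimination(scored)])  # [0.56, 0.76, 0.61]

# Hypothetical raw option choices for a single item keyed 'B'.
print(distracter_counts([["B"], ["C"], ["B"], ["D"], ["B"]], 0))
```

A higher p-value means an easier item, and a low or negative item-total correlation flags an item that does not separate stronger from weaker examinees.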

Traditionally, CTT has been used to evaluate tests, although it has several limitations.

First, the person statistic (the observed score) is item dependent. Second, the item statistics (item difficulty and item discrimination) are examinee dependent. Item Response Theory addresses these major limitations of CTT.
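
The examinee dependence of CTT item statistics can be illustrated numerically. Assuming, for illustration only, that responses follow the Rasch model (introduced in the next part of this review), the sketch below computes the expected CTT p-value of one item with a fixed difficulty of 0 logits in a hypothetical low-ability group and a hypothetical high-ability group. The ability values and function names are made up for the example.

```python
from math import exp


def p_correct(theta, b):
    """Rasch probability of a correct response for ability theta and item difficulty b (logits)."""
    return 1.0 / (1.0 + exp(-(theta - b)))


def expected_p_value(abilities, b):
    """Expected CTT difficulty (proportion correct) of one item in a given group."""
    return sum(p_correct(t, b) for t in abilities) / len(abilities)


item_b = 0.0                    # the same item, with the same difficulty, for both groups
low_group = [-1.5, -1.0, -0.5]  # hypothetical low-ability examinees (logits)
high_group = [0.5, 1.0, 1.5]    # hypothetical high-ability examinees (logits)

print(round(expected_p_value(low_group, item_b), 2))   # 0.28
print(round(expected_p_value(high_group, item_b), 2))  # 0.72
```

The item’s Rasch difficulty is 0 in both groups, yet its CTT p-value moves from about .28 to about .72; this is the sense in which CTT item statistics depend on who took the test.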

The Rasch model, the one-parameter IRT model used in this study, estimates the probability of a correct response to an item as a function of the person’s ability and the item’s difficulty.
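
For reference, the Rasch model is usually written as follows, where θ_n is the ability of person n and b_i is the difficulty of item i, both in logits (this is the standard notation, not taken from the slides). Equivalently, the log-odds of a correct response is simply ability minus difficulty:

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)},
\qquad
\ln \frac{P(X_{ni} = 1 \mid \theta_n, b_i)}{1 - P(X_{ni} = 1 \mid \theta_n, b_i)} = \theta_n - b_i
```

When θ_n = b_i the probability of success is exactly .5, and each logit by which ability exceeds difficulty multiplies the odds of success by e.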

  In IRT, each item in a test has its own item characteristic curve, which describes the probability of answering the item correctly as a function of the test taker’s ability (Kaplan & Saccuzzo, 1997).

IRT asserts that the easier the question, the more likely a student is to answer it correctly, and that the more able the student, the more likely he or she is to answer a question correctly compared with a less able student. The Rasch model assumes that guessing and differences in item discrimination are negligible (Anastasi & Urbina, 2002).
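
The two monotonic properties stated above can be checked on Rasch item characteristic curves. This Python sketch uses two hypothetical items (difficulties of -1 and +1 logits) and a few hypothetical ability levels; it prints P(correct) for each item and asserts both properties. Because the Rasch model gives every item the same discrimination, the two curves never cross.

```python
from math import exp


def icc(theta, b):
    """Rasch item characteristic curve: P(correct) for ability theta on an item of difficulty b."""
    return 1.0 / (1.0 + exp(-(theta - b)))


abilities = [-2, -1, 0, 1, 2]     # hypothetical ability levels (logits)
easy_item, hard_item = -1.0, 1.0  # hypothetical item difficulties (logits)

for theta in abilities:
    print(theta, round(icc(theta, easy_item), 2), round(icc(theta, hard_item), 2))

# The more able the student, the higher P(correct) on the same item.
assert all(icc(a, hard_item) < icc(b, hard_item) for a, b in zip(abilities, abilities[1:]))
# At any fixed ability, the easier item has the higher P(correct).
assert all(icc(t, easy_item) > icc(t, hard_item) for t in abilities)
# When ability equals difficulty, P(correct) is exactly 0.5.
assert icc(1.0, hard_item) == 0.5
```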

Method

Participants

A total of 63 incoming first-year high school students, both male and female, participated in the study. The participants were grade 6 students from different elementary schools in Manila who had completed grade 6 and were applying to a Science High School. Their ages ranged from 11 to 13 years.

Instrument

The Aptitude Test for High School (AHP) was developed to measure fundamental intellectual abilities in abstract reasoning, verbal reasoning, and numerical (quantitative) reasoning. The instrument consists of 100 multiple-choice items: 30 for abstract reasoning, 30 for numerical reasoning, and 40 for verbal reasoning.

The reliability coefficients obtained for the subtests are .70 for abstract reasoning, .77 for numerical reasoning, and .78 for verbal reasoning.
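
The slides do not state which CTT reliability coefficient was computed. Because the items are scored right or wrong, one common choice is the Kuder-Richardson Formula 20 (KR-20), sketched below in Python as an assumption, not as the study’s procedure. The function name and the data are hypothetical.

```python
def kr20(scored):
    """Kuder-Richardson Formula 20 for a persons x items matrix of 0/1 scores."""
    n_persons = len(scored)
    k = len(scored[0])
    # Item variances for 0/1 items are p * (1 - p), where p is the proportion correct.
    p = [sum(row[j] for row in scored) / n_persons for j in range(k)]
    sum_item_var = sum(pj * (1 - pj) for pj in p)
    # Total-score variance with the same divisor (N) as the item variances.
    totals = [sum(row) for row in scored]
    mean_total = sum(totals) / n_persons
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_persons
    return (k / (k - 1)) * (1 - sum_item_var / var_total)


# Hypothetical data: 5 examinees x 3 items, 1 = correct, 0 = incorrect.
scored = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
]
print(round(kr20(scored), 2))  # 0.79
```

Using the same divisor for the item variances and the total-score variance keeps the result identical to Cronbach’s alpha computed on the same 0/1 data.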

Procedure

The test was administered to incoming first-year high school students at a Science High School in Manila. The AHP was given as one of the assessment tools used to select the applicants who would be accepted into the Science High School. A trained examiner administered the one-hour test.

Data Analysis

The data were analyzed in terms of reliability coefficients, item difficulty, and item discrimination using both CTT and IRT.

To examine item difficulty and item discrimination under the Rasch model, two samples were tested and compared.

The following software was used: SPSS version 16, Microsoft Excel 2007, and Winsteps for the IRT analysis.
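
The slides do not say how the two samples’ Rasch difficulties were compared. One common check, shown here as an assumption, is a standardized difference z = (d1 - d2) / sqrt(SE1² + SE2²), with |z| > 1.96 as a conventional cut-off for a noticeable shift. The values below are copied from Table 4 (abstract reasoning); the function name is hypothetical.

```python
from math import sqrt


def standardized_difference(d1, se1, d2, se2):
    """Difference between two difficulty estimates divided by its joint standard error."""
    return (d1 - d2) / sqrt(se1 ** 2 + se2 ** 2)


# (measure, SE) in Sample 1 and Sample 2, copied from Table 4 (abstract reasoning).
items = {
    "Item 1": ((-0.27, 0.47), (-0.02, 0.44)),
    "Item 24": ((0.32, 0.42), (2.22, 0.41)),
}

for name, ((d1, se1), (d2, se2)) in items.items():
    z = standardized_difference(d1, se1, d2, se2)
    verdict = "shift" if abs(z) > 1.96 else "within error"
    print(name, round(z, 2), verdict)
```

Item 1 differs by only a fraction of its standard error (z of about -0.39), whereas Item 24 moves by almost two logits (z of about -3.24); the same check could be applied to every row of Tables 4 to 6 to see which items, if any, depart from invariance.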

Results

Reliability Indices

Using Classical Test Theory, the reliability coefficients for abstract reasoning, numerical reasoning, and verbal reasoning were .70, .77, and .78, respectively.

Table 1 Summary of Person and Item Measures for Abstract Reasoning

Persons
        Score  Count  Measure  Error  Infit MNSQ  Infit ZSTD  Outfit MNSQ  Outfit ZSTD
Mean    21.8   30     1.33     0.5    1           0.1         0.94         0.1
SD      4      0      0.88     0.11   0.15        0.6         0.31         0.6
Real RMSE 0.51   True SD 0.72   Separation 1.39   Person reliability 0.66

Items
        Score  Count  Measure  Error  Infit MNSQ  Infit ZSTD  Outfit MNSQ  Outfit ZSTD
Mean    45.9   63     0        0.36   1           0.1         0.94         0
SD      10.2   0      1.09     0.14   0.11        0.8         0.22         0.9
Real RMSE 0.39   True SD 1.02   Separation 2.65   Item reliability 0.88

Table 2 Summary of Person and Item Measures for Numerical Reasoning

Persons
        Score  Count  Measure  Error  Infit MNSQ  Infit ZSTD  Outfit MNSQ  Outfit ZSTD
Mean    19.5   30     0.82     0.47   1           0.1         0.97         0
SD      4.9    0      0.95     0.97   0.16        0.9         0.27         0.9
Real RMSE 0.47   True SD 0.82   Separation 1.74   Person reliability 0.75

Items
        Score  Count  Measure  Error  Infit MNSQ  Infit ZSTD  Outfit MNSQ  Outfit ZSTD
Mean    40.9   63     0        0.32   1           0.1         0.97         0
SD      10.5   0      0.93     0.04   0.11        0.9         0.2          1
Real RMSE 0.32   True SD 0.87   Separation 2.74   Item reliability 0.88

Table 3 Summary of Person and Item Measures for Verbal Reasoning

Persons
        Score  Count  Measure  Error  Infit MNSQ  Infit ZSTD  Outfit MNSQ  Outfit ZSTD
Mean    21.8   40     0.33     0.4    1.01        0           0.99         0
SD      5.1    0      0.75     0.04   0.19        1           0.43         0.9
Real RMSE 0.4   True SD 0.63   Separation 1.59   Person reliability 0.72

Items
        Score  Count  Measure  Error  Infit MNSQ  Infit ZSTD  Outfit MNSQ  Outfit ZSTD
Mean    33.8   62     0        0.34   0.99        0.1         0.99         0.1
SD      14.8   0      1.41     0.12   0.09        0.8         0.29         1
Real RMSE 0.36   True SD 0.87   Separation 3.79   Item reliability 0.94

Table 4 Summary of Item Difficulty for Abstract Reasoning using Two Samples

        Sample 1          Sample 2
Item    Measure   SE      Measure   SE
1       -0.27     0.47    -0.02     0.44
2       0.49      0.41    0.67      0.4
3       0.96      0.39    0.17      0.43
4       -2.36     1.03    -0.98     0.56
5       0.32      0.42    -0.22     0.46
6       -0.51     0.51    -0.44     0.48
7       -0.27     0.47    -0.44     0.48
8       2.01      0.4     1.9       0.4
9       1.7       0.39    1.44      0.39
10      -0.27     0.47    -0.02     0.44
11      0.8       0.39    0.17      0.43
12      1.25      0.38    0.98      0.39
13      -0.27     0.47    0.51      0.41
14      -0.51     0.51    -2.55     0.03
15      -0.79     0.55    -0.44     0.48
16      0.96      0.39    -0.22     0.46
17      -0.27     0.47    -1.8      0.75
18      -1.14     0.62    -0.98     0.56
19      -1.6      0.75    -1.8      0.75
20      0.32      0.42    0.34      0.42
21      -0.79     0.55    -0.44     0.48
22      -3.56     1.81    -2.55     1.03
23      0.14      0.43    0.34      0.42
24      0.32      0.42    2.22      0.41
25      -1.14     0.62    0.83      0.39
26      1.7       0.39    2.78      0.46
27      -0.27     0.47    -0.22     0.46
28      0.32      0.42    1.29      0.39
29      -0.51     0.51    -0.69     0.51
30      -0.27     0.47    0.17      0.43

Table 5 Summary of Item Difficulty for Numerical Reasoning using Two Samples

        Sample 1          Sample 2
Item    Measure   SE      Measure   SE
1       -0.91     0.51    -0.61     0.44
2       -2.76     1.03    -1.02     0.48
3       2.45      0.46    1.47      0.41
4       0.76      0.4     -0.8      0.45
5       0.92      0.39    0.23      0.39
6       -0.45     0.46    -1.26     0.51
7       -0.45     0.46    -0.61     0.44
8       1.23      0.4     0.99      0.39
9       0.28      0.41    -0.08     0.4
10      0.45      0.4     0.39      0.39
11      -0.67     0.48    -1.02     0.48
12      -1.2      0.56    -0.25     0.41
13      -1.55     0.63    -1.26     0.51
14      0.28      0.41    0.54      0.39
15      0.61      0.4     0.23      0.39
16      1.38      0.4     1.64      0.42
17      -2.76     1.03    -1.26     0.51
18      0.61      0.4     -0.61     0.44
19      0.28      0.41    0.39      0.39
20      -0.45     0.46    -0.42     0.42
21      -0.45     0.46    0.84      0.39
22      -0.91     0.51    -1.02     0.48
23      -0.67     0.48    -0.25     0.41
24      -0.06     0.43    -0.25     0.41
25      -1.2      0.56    0.23      0.39
26      2.06      0.43    1.82      0.43
27      0.92      0.39    0.23      0.39
28      -0.45     0.46    -0.42     0.42
29      1.23      0.46    0.69      0.39
30      1.07      0.39    1.47      0.41

Table 6 Summary of Item Difficulty for Verbal Reasoning using Two Samples

        Sample 1          Sample 2
Item    Measure   SE      Measure   SE
1       -0.43     0.4     -0.66     0.41
2       0.17      0.38    0.69      0.38
3       0.77      0.39    1.33      0.42
4       -2.67     0.74    -2.53     0.74
5       -3.41     1.02    -4.5      1.83
6       1.55      0.44    1.7       0.45
7       -2.21     0.62    -2.53     0.74
8       -0.59     0.41    0.55      0.38
9       1.64      0.45    0.55      0.38
10      -0.94     0.43    -0.66     0.41
11      -1.35     0.47    -2.08     0.62
12      -1.13     0.45    -3.28     1.02
13      0.77      0.39    0.69      0.38
14      0.77      0.39    0.69      0.38
15      -0.94     0.43    -2.53     0.74
16      -1.58     0.51    -3.28     1.02
17      -1.58     0.51    -2.08     0.62
18      -1.35     0.47    -1.02     0.45
19      1.64      0.45    1.33      0.42
20      0.03      0.38    1         0.4
21      1.86      0.48    1.16      0.41
22      0.93      0.4     0.4       0.38
23      -0.12     0.39    -1.02     0.45
24      0.93      0.4     1.16      0.41
25      1.86      0.48    1.5       0.43
26      0.62      0.39    1.7       0.45
27      1.64      0.45    1.91      0.48
28      0.77      0.39    1.16      0.41
29      -0.59     0.41    0.26      0.38
30      -0.27     0.39    0.69      0.38
31      -0.27     0.39    -0.49     0.4
32      1.64      0.45    0.84      0.39
33      -0.27     0.39    -0.18     0.39
34      0.03      0.38    -0.18     0.39
35      -0.43     0.4     0.26      0.38
36      2.74      0.63    2.79      0.63
37      -0.43     0.43    -0.49     0.4
38      0.93      0.4     1.16      0.41
39      -0.94     0.43    -1.02     0.45
40      0.17      0.38    0.55      0.38

Discussion

The reliability estimates obtained with both CTT and IRT are moderately high. This suggests that persons estimated to have higher measures are likely to truly have higher ability than persons estimated to have lower measures.

The person and item separation indices also show that the sample can be separated into distinct ability groups and the items into distinct difficulty levels.
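
The separation indices in Tables 1 to 3 can be related to the reported reliabilities, and to the number of distinguishable groups, through the standard Rasch relations reliability = G²/(1 + G²) and strata = (4G + 1)/3, where G is the separation index. The sketch below is only a consistency check on the published numbers (copied from Tables 1 to 3), not the study’s procedure; the function names are hypothetical.

```python
def reliability_from_separation(g):
    """Rasch reliability implied by a separation index G: G^2 / (1 + G^2)."""
    return g ** 2 / (1 + g ** 2)


def strata(g):
    """Number of statistically distinct levels implied by separation G: (4G + 1) / 3."""
    return (4 * g + 1) / 3


# Separation indices as reported in Tables 1 to 3.
reported = {
    "Abstract reasoning, persons": 1.39,
    "Abstract reasoning, items": 2.65,
    "Numerical reasoning, persons": 1.74,
    "Numerical reasoning, items": 2.74,
    "Verbal reasoning, persons": 1.59,
    "Verbal reasoning, items": 3.79,
}

for label, g in reported.items():
    print(f"{label}: reliability {reliability_from_separation(g):.2f}, strata {strata(g):.1f}")
```

The implied reliabilities match the tables to within rounding (for example, G = 1.39 gives about .66 and G = 2.65 about .88). Person separations of 1.39 to 1.74 correspond to roughly two to three statistically distinct ability levels, and the item separations to roughly four to five difficulty levels.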

In terms of item discrimination, CTT and IRT identified the same items as having poor discrimination in numerical reasoning and verbal reasoning; these items should therefore be revised.

For abstract reasoning, however, only 2 of the 5 items considered poor under CTT were also considered poor under IRT. In terms of item difficulty, both models identified similar items as difficult.

However, the two models differ in the number of items they classify as difficult. These findings suggest a relative degree of stability across CTT and IRT in terms of item discrimination.

Overall, the results appear to be consistent across CTT and IRT.

In this study, however, one advantage of IRT over CTT was evident: the sample-free nature of its results. That is, item parameters remain invariant when they are estimated from groups of different abilities.


Date post:	10-Apr-2015
Upload:	pemea2008