Post on 14-Apr-2018
7/27/2019 Development of Mathematics Diagnostic Tests
Development of Mathematics Diagnostic Test for DORSHS
Second Year High School Students Using Item Response Theory
Jeremias C. Ceniza
Donnell C. Cereno
Abstract
The study developed a diagnostic test designed to measure learning in Intermediate
Algebra among second year high school students of Davao Oriental Regional Science High
School (DORSHS). The validity, discrimination indices, difficulty indices and reliability of the
test were established. Test construction followed the Quantitative Research Methods in
Educational Planning module (Izard, 2005). Content validity was assessed by three experts in the
content area of secondary mathematics. Primary data for item analysis were gathered through
two test tryouts: one with 59 third year students for grammatical checking and compatibility
benchmarking, and another with 78 second year students for discrimination and difficulty
indices and reliability. Analyses were carried out using ConQuest (an IRT modeling software), a
spreadsheet, and the statistical software SPSS. The study used the IRT two-parameter logistic
model. The test was found to be valid and highly reliable. The experts affirmed that the
instrument covers all content in the learning competencies of Mathematics II and that nothing
was left untargeted. The reliability coefficient of the test is 0.84. The item difficulties ranged
from -1.37 to 1.35 and the discrimination indices from 0.23 to 0.51. This standardized test is a
tool that identifies the zone of proximal development and the mastered and least learned content
areas in Mathematics II among DORSHS second year students.
The Davao Oriental Regional Science High School (DORSHS) of Region XI offers
additional subjects in Math, Science and English to its students as early as first year. For the past
five years of its existence, the DORSHS National Achievement Test (NAT) results in
Mathematics have been far below the 75% Mean Percentage Score (MPS) target. The researcher
became interested in creating a valid and reliable diagnostic test for DORSHS second year
students since the school does not have one yet. This instrument would be used to detect the
strengths and weaknesses of second year students before they undergo a review program for the
NAT. In general, the instrument is intended to help DORSHS improve its results in Mathematics
through the NAT review.
The study aimed to construct a standardized diagnostic test in Mathematics for second
year students in Davao Oriental Regional Science High School (DORSHS). Specifically, it
sought to answer the following questions:
1. Is the diagnostic test valid?
2. To what extent do the test items illustrate difficulty?
3. To what degree does the test exhibit discrimination level?
4. How reliable is the entire test according to the IRT model?
Conceptual Framework
[Figure 1 box labels: Content Areas; Learning Areas in Mathematics II; Test Objectives; Test Construction; Test Development; Content Validity; IRT Analysis (Item Analysis: Difficulty Indices, Discrimination Indices; Test Reliability: KR-20); Diagnostic Test; Diagnostic Test Results; Students' Zone of Proximal Development; Least Learned and Mastered Competencies; Settings of Item Thinking Level]
Figure 1. The Conceptual Paradigm of the Study
Item Response Theory (IRT)
After the classical measurement models for constructing tests and interpreting test scores
had served their purpose for a long time, a new test theory that is conceptually more powerful
than classical test theory was developed over the past forty years. Based upon items rather than
test scores, the new approach is known as item response theory (Baker, 2001).
In contrast to the limitations of classical test models, item response theory has many
desirable features. These include (a) item characteristics that are not group-dependent,
(b) scores describing examinee proficiency that are not test-dependent, (c) a model
that is expressed at the item level rather than at the test level, (d) a model that does not require
strictly parallel tests for assessing reliability, and (e) a model that provides a measure of
precision for each ability score (Hambleton, Swaminathan, & Rogers, 1991). With this
advancement in educational and psychological measurement, it is now safe to administer test
tryouts to different groups of various abilities before giving the test to the target examinees. Kim,
Cohen, & Park (1995) illustrated further that IRT allows researchers to conduct rigorous tests of
measurement equivalence across experimental groups. IRT methods can distinguish item bias
from true differences on the attribute measured, which classical test theory (CTT) cannot.
The point-biserial correlation coefficient (r_pb) is one common metric used to assess item
quality. The "pt bis," as it is sometimes called, is the correlation between an item score (1/0)
and the total score on a test. Positive values are desirable and indicate that the item is
good at differentiating between high ability and low ability examinees (Bontempo, 2009).
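The definition above can be illustrated with a minimal sketch that computes the point biserial as an ordinary Pearson correlation between a 0/1 item score and the total score. The function name and the five-examinee data are hypothetical and are not taken from the study.

```python
import math

def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item score and the total test score."""
    n = len(item_scores)
    mean_x = sum(item_scores) / n
    mean_y = sum(total_scores) / n
    # Sum of cross-products and sums of squares around the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(item_scores, total_scores))
    var_x = sum((x - mean_x) ** 2 for x in item_scores)
    var_y = sum((y - mean_y) ** 2 for y in total_scores)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: the three highest scorers answered the item correctly
item = [1, 1, 1, 0, 0]
total = [50, 45, 40, 30, 20]
r_pb = point_biserial(item, total)

# Reversing the item scores simulates an item that low scorers get right
r_pb_reversed = point_biserial([0, 0, 0, 1, 1], total)
```

In this toy data the first coefficient is positive (the item favors high scorers) and the second is negative, which is the pattern the text describes as undesirable.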
Zone of Proximal Development, or ZPD, as established by the Russian psychologist Lev
Vygotsky, refers to the distance between what a child can do with assistance and what the child
can accomplish without assistance (Vygotsky, 1978). In this study, it is taken as the learning of
the students on a certain competency at a 50% mastery level. From the IRT perspective, test
items that the child has a 50% chance of answering correctly are the items within the child's
zone of proximal development. At this point, the child has a 50-50 chance of achieving the
learning task without the teacher's assistance.
Assessment
The main purpose of assessment is to improve the learning outcomes of students.
Assessing students' learning follows a systematic process: the diagnostic test, which is an
assessment before a learning program; the formative test, which is an assessment during a
learning program; and the summative test, which is an assessment after a learning program
(Davies, Arbuckle & Bonneau, 2005).
Conducting a Diagnostic Test
The main purpose of a diagnostic test is to pinpoint specific strengths and weaknesses of
the learner in accordance with his or her grade level requirements. Diagnostic tests are scored
using true test score criteria; this means that they are not averaged or normed (Educational
Diagnostic Prescriptive Services, 2009). Izard (2005) says that scores in a criterion-referenced
test are interpreted as the individual performance of each student in the group, that is, what he
or she can or cannot do, rather than by comparing the results with other groups of students.
Stages in Test Construction
Izard (2005), in his module on Quantitative Research Methods in Educational Planning,
laid out an overview of the test construction steps. His module describes the different stages of
developing a test that could be used by teachers in the classroom and even nationwide as an
achievement test. The methods reveal important details on how a test is constructed so that it
yields results that are valid, fair and reliable.
Validity, Reliability and Usability
Whatever the test is, it should have the qualities of a good measuring instrument:
validity, reliability, and usability (Calmorin, 2004).
Validity is the extent to which a test measures what it claims to measure. One type of
validity is content validity. It refers to the extent to which the test reflects the content
represented in curriculum statements and the skills implied by that content.
Reliability is another important characteristic of a good test. It refers to the consistency
of a measure. A test is considered reliable if it gives the same result repeatedly; that is, each
time the test is administered to a subject, the results should be approximately the same (Cherry,
2005). A reliability coefficient within the range of 0.81 to 1.00 indicates high reliability; 0.61 to
0.80 signifies moderate reliability; 0.41 to 0.60, fair reliability; 0.10 to 0.40, slight reliability;
and less than 0.10, virtually no reliability (Shrout, 1998).
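The reliability bands quoted from Shrout (1998) can be written as a small lookup, sketched below. The function name is hypothetical, and treating each band's lower bound as a threshold (so that a value such as 0.805 falls in the moderate band) is an assumption, since the quoted ranges leave small gaps between bands.

```python
def reliability_label(coefficient):
    """Map a reliability coefficient to the verbal bands quoted from Shrout (1998).

    Assumption: each band's lower bound is used as a cut-off, so values that
    fall in the gaps between quoted ranges go to the lower band.
    """
    if coefficient >= 0.81:
        return "high"
    if coefficient >= 0.61:
        return "moderate"
    if coefficient >= 0.41:
        return "fair"
    if coefficient >= 0.10:
        return "slight"
    return "virtually none"

# The coefficient reported for the developed test
label_for_test = reliability_label(0.84)
```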
Methods
Research Locale and Duration
Table 1. Timescale and Resource Requirement for the Test Development
Stages | Time | Resources
I. Planning the Test
a. Developing Test Item Objectives | 1 week | Basic Education Curriculum (BEC) list of competencies
b. Constructing Test Grid or Table of Specification | 1 week | DepEd Memo on the official number of school days
II. Preparing the Test
a. Content Analysis | 1 week | Learning competencies, textbooks
b. Item Writing and Preparation of Answers Key | 1 month | Guidelines on test construction, Mathematics textbooks
III. Review and Testing Process
a. Item Review I | 2 weeks | Test construction team, research adviser
b. Planning of Item Scoring | 2 days | Answers key, spreadsheet software
c. Production of Trial Tests I | 1 day | Word-processing
d. Trial Testing I | 1 day | DORSHS 3rd Year students
e. Item Review II | 3 days | Researcher
f. Production of Trial Tests II | 1 day | Word-processing
g. Trial Testing II | 1 day | DORSHS 2nd Year students
h. Item Analysis | 2 weeks | IRT software, encoder
IV. Assembly of Final Test
a. Revising Test Items | 5 days | IRT concepts and interpretations
b. Finalization of the Test | 2 days | Data findings and results
c. IRT Model Fitting | 1 day | IRT ConQuest Modeling Software
d. Identification of the Zone of Proximal Development | 1 day | IRT ConQuest Modeling Software, learning competencies
e. Test Difficulty Settings | 1 day | IRT results, Test Grid
Table 1 presents the timescale and resource requirements for test development, patterned
after Module 6 of the Quantitative Research Methods in Educational Planning (Izard, 2005). The
study was conducted at the Davao Oriental Regional Science High School (DORSHS) campus in
Mati City during February of School Year 2010-2011.
Statistical Treatment
This study used an IRT model in describing the data. Item analysis and test reliability (as
explained by Brannick, 2006; Baker, 2001; Hambleton, Swaminathan, & Rogers, 1991)
were computed through computer software and interpreted as discussed in the following:
Item Analysis. Item analysis under the IRT two-parameter model was the last part of the
item review; it verified how each item performed in the final test tryout. The researcher, together
with an IRT expert, did the following:
i. Processing test responses through the IRT model
To determine the difficulty and discrimination indices of the test items, computer
software known as ConQuest: Generalised Item Response Modelling Software was
used to construct the Item Characteristic Curve (ICC) of each test item. The
responses of all test takers for each item were recorded in a spreadsheet, converted
into text format through SPSS, and then run through the ConQuest software. The
software then generated the summary of statistical results for each item and the
corresponding ICC, as shown in Figure 2.
Figure 2. Sample Item Characteristic Curve (ICC)
The ICC, as shown in Figure 2, was the basis for estimating the two item
parameters: item difficulty and item discrimination. The horizontal axis (θ) is the
latent ability of the examinees, while the vertical axis P(θ) is the probability that an
examinee chooses the correct answer. The ability (θ) at which P(θ) is equal to 0.5 is
the value of the difficulty parameter. The estimated value describing the
steepness of the curve is the value of the item discrimination parameter.
ii. Difficulty and Item Discrimination
With the item difficulty and item discrimination parameter values given by the
ICC, each item of the test was interpreted as follows:
Labels for item discrimination parameter (a) values
Verbal label  Range of values
Negative  less than zero
Zero  0
Low  0.01 to 0.20
Moderate  0.21 to 0.60
High  above 0.60
The discrimination parameters are sometimes called slope parameters. A jumpy
curve means that the expected test score responds to true ability unevenly. A flat
curve means that the expected score is not very sensitive to differences in true
ability. A steeper S-curve (like in Figure 2) means that the expected score is more
sensitive to differences in ability. In other words, the test discriminates or
distinguishes better between persons of different ability, which explains the term
discrimination parameter. Thus, the discrimination parameter describes how well an
item can differentiate between examinees having abilities below the item location and
those having abilities above the item location.
Labels for item difficulty parameter (b) values
Verbal label  Range of values
Very easy  less than -2.00
Easy  -2.00 to -0.50
Average  -0.49 to 0.49
Difficult  0.50 to 2.00
Very difficult  greater than 2.00
The item difficulty parameter (b) value tells us how easy or how difficult an
item is. Under item response theory, an item's difficulty is the point on the ability scale
where the probability of a correct response is 0.5. One can find the value of b on the
common ability axis at the point for which the predicted probability equals 0.5.
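The relationships described in this part can be sketched in code. The study does not print the model equation, so the function below uses the standard two-parameter logistic form P(θ) = 1 / (1 + exp(-a(θ - b))) without a scaling constant, which is an assumption. The function names are hypothetical, and the label cut-offs follow the ranges listed above, with values that fall between two quoted ranges assigned to the nearer band.

```python
import math

def icc_2pl(theta, a, b):
    """Probability of a correct response under the standard 2PL form (assumed)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def discrimination_label(a):
    """Verbal label for the discrimination parameter (a), per the ranges above."""
    if a < 0:
        return "Negative"
    if a == 0:
        return "Zero"
    if a <= 0.20:
        return "Low"
    if a <= 0.60:
        return "Moderate"
    return "High"

def difficulty_label(b):
    """Verbal label for the difficulty parameter (b), per the ranges above."""
    if b < -2.00:
        return "Very easy"
    if b <= -0.50:
        return "Easy"
    if b < 0.50:
        return "Average"
    if b <= 2.00:
        return "Difficult"
    return "Very difficult"

# At theta == b the probability is 0.5, whatever the slope
p_at_b = icc_2pl(theta=0.66, a=0.46, b=0.66)

# A steeper slope separates two nearby abilities more strongly
gap_flat = icc_2pl(1.0, 0.2, 0.0) - icc_2pl(-1.0, 0.2, 0.0)
gap_steep = icc_2pl(1.0, 1.5, 0.0) - icc_2pl(-1.0, 1.5, 0.0)
```

The last two lines show why the slope is called a discrimination parameter: for the same pair of abilities, the steeper curve yields the larger difference in success probability.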
Test reliability. In classical test theory, reliability is assessed through a separate
test-level analysis. In IRT, there is local reliability, that is, an amount of information at each
point of the underlying continuum. Under an IRT model, each item of the test contributes
information. For the set of parameters associated with each term in a model, ConQuest computes
a separation reliability index. This reliability is an index of the equality of the parameters. In the
case of dichotomous data like those from the test conducted, the Coefficient Alpha given by
ConQuest is equal to KR-20 (Wu, Adams, et al., 2007).
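For dichotomous data, KR-20 can be computed directly from a 0/1 response matrix as KR-20 = (k / (k - 1)) × (1 - Σ p_i q_i / σ²), where k is the number of items, p_i is the proportion correct on item i, q_i = 1 - p_i, and σ² is the variance of the total scores. The sketch below is not ConQuest's implementation; the function name and the toy matrix are hypothetical, and the use of the population variance (dividing by N) is an assumption.

```python
def kr20(responses):
    """KR-20 for a list of examinee rows, each a list of 0/1 item scores.

    Assumption: the total-score variance is the population variance (divide by N).
    """
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    # Sum of item variances p * q over all items
    pq_sum = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / var_total)

# Hypothetical 4 examinees x 3 items response matrix
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
toy_kr20 = kr20(toy)
```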
Revising Test Items. The selection of the items suited for inclusion in the final output of
the test was determined through the verbal interpretation of each item. This is illustrated by
Table 2 below.
Table 2. Decision Table for Difficulty and Discrimination Indices
Difficulty Level  Discrimination Level  Decision
Easy  Low  Revise
Easy  Moderate  Retain
Easy  High  Retain
Average  Low  Revise
Average  Moderate  Retain
Average  High  Retain
Difficult  Low  Revise
Difficult  Moderate  Retain
Difficult  High  Retain
An item was rejected if any of the following was observed: (a) its discrimination
parameter value was negative or zero, and/or (b) its difficulty level was very easy or very
difficult.
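Table 2 and the rejection rule can be combined into one decision function, sketched below. The function name is hypothetical, and the inputs are the verbal labels used in this study (compared case-insensitively).

```python
def item_decision(difficulty_level, discrimination_level):
    """Return "Reject", "Revise" or "Retain" from the verbal labels of an item.

    Follows Table 2 plus the rejection rule: negative or zero discrimination,
    or a very easy / very difficult level, leads to rejection.
    """
    difficulty = difficulty_level.strip().lower()
    discrimination = discrimination_level.strip().lower()
    if discrimination in ("negative", "zero"):
        return "Reject"
    if difficulty in ("very easy", "very difficult"):
        return "Reject"
    if discrimination == "low":
        return "Revise"
    # Moderate or high discrimination at an easy, average or difficult level
    return "Retain"

decisions = [
    item_decision("Easy", "Negative"),
    item_decision("Difficult", "Low"),
    item_decision("Average", "Moderate"),
    item_decision("Very Difficult", "Moderate"),
]
```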
To preserve the validity of the entire test and to maintain the total number of items, all
rejected items were automatically replaced. The revised and replacement items were
subjected to IRT item analysis while still conforming to the Test Grid.
Revised and replacement items were re-administered to the target examinees, the second
year students, and their test responses were re-run through the IRT software for final
verification of whether these items would be retained or not. Retesting of selected items was
done as necessary in preparation for the finalization of the test.
Evaluators of the Constructed Test
A group of experts and experienced persons in the field of test construction was in
charge of the development of the test. This group served as consultants on the content
validity of the test. Another group, composed of students, took the test for item reviews and test
reliability. These two teams were called the test evaluators.
Test Construction Team. The first set of evaluators was the group of persons who took
charge of examining the content validity of the test. This group was made up of experts on test
development or individuals with experience in testing or related training. More importantly,
these persons were familiar with the content areas in secondary mathematics and held at least a
Master of Arts in Mathematics or a Master of Science Teaching in Mathematics.
Try Out Test Examinees. The second set of evaluators was the group of students in
Davao Oriental Regional Science High School (DORSHS). Specifically, the team involved all
third year (first group) and all second year (second group) students of the said school. The
administration of the test to the first group was vital in making the test structurally ready, while
the second group's responses were used to determine the difficulty and discrimination
indices of the test items and the reliability coefficient of the overall test.
Validation of the Diagnostic Test
Validation of the diagnostic test did not utilize a statistical analysis. It relied on matching
the test items to the objectives and presenting the whole test to the group of experts in the
content areas of secondary mathematics for item review. The team affirmed that the instrument
had strong content validity, in which each item represented at least one topic actually being
investigated among the students, rather than asking unrelated questions.
Test Administration
The test tryouts were administered after securing permission from the heads of the
offices concerned. The conduct of the test was officially approved by the OIC - Schools
Division Superintendent. Likewise, the principal of DORSHS posed no objection to proceeding
with the testing process. As an ethical consideration, examinees involved in this study
who were minors were given proper information about the research.
The first tryout was administered to 59 third year students of DORSHS. Its purpose was to
determine the structural readiness of the test and to ensure its compatibility with the
examinees' thinking level. The next tryout was administered to 78 second year
high school students of DORSHS for interpretation of test results. Specifically, the test was
conducted for item analysis purposes.
The researcher introduced to the test takers certain guidelines for taking the test.
Examinees were told to use the answer sheets provided. They were
instructed to shade on the answer sheet the letter that corresponds to the best answer for every
test question. They were also told to mark X on the previous answer if they decided to
change their answer, and that they must have only one answer for each item; otherwise, the item
would be marked wrong.
Results and Discussion
Test Validation
The test was validated through revisions of some items as suggested by the
test construction team in line with the following test attributes: grammatical structure, proper
usage of punctuation, principles of item construction, setting of item difficulty, typographical
precision and, more importantly, content criterion.
Difficulty and Discrimination Indices of Test Items
Table 3 shows the result of item analysis from the final tryout using IRT through a
computer program called ConQuest: Generalized Item Response Modeling Software.
Table 3. Difficulty and Discrimination Indices Result of the Final Tryout
Item Number  Difficulty Index  Difficulty Level  Discrimination Index  Discrimination Level  Action
1 -1.08 Easy -0.01 Negative Reject
2 -1.08 Easy 0.34 Moderate Retain
3 -1.41 Easy 0.20 Low Revise
4 0.58 Difficult 0.08 Low Revise
5 -0.12 Average 0.46 Moderate Retain
6 -0.18 Average 0.35 Moderate Retain
7 0.58 Difficult 0.30 Moderate Retain
8 -1.29 Easy 0.34 Moderate Retain
9 0.57 Difficult 0.23 Moderate Retain
10 -0.58 Easy 0.27 Moderate Retain
11 -0.95 Easy 0.36 Moderate Retain
12 0.52 Difficult 0.13 Low Revise
13 -2.53 Very Easy 0.14 Low Revise
14 -0.24 Average 0.46 Moderate Retain
15 0.66 Difficult 0.46 Moderate Retain
16 1.11 Difficult 0.24 Moderate Retain
17 0.83 Difficult 0.32 Moderate Retain
18 0.51 Difficult 0.33 Moderate Retain
19 -0.29 Average 0.33 Moderate Retain
20 -0.29 Average 0.34 Moderate Retain
21 0.27 Average 0.34 Moderate Retain
22 0.33 Average 0.50 Moderate Retain
23 -0.18 Average 0.27 Moderate Retain
24 0.45 Average 0.30 Moderate Retain
25 1.35 Difficult 0.33 Moderate Retain
26 -0.70 Easy 0.34 Moderate Retain
27 -0.76 Easy 0.49 Moderate Retain
28 0.16 Average 0.23 Moderate Retain
29 0.57 Difficult 0.34 Moderate Retain
30 -0.18 Average 0.30 Moderate Retain
31 -0.82 Easy 0.41 Moderate Retain
32 -1.37 Easy 0.37 Moderate Retain
33 -0.12 Average 0.47 Moderate Retain
34 0.33 Average 0.17 Low Revise
35 0.10 Average 0.39 Moderate Retain
36 -0.70 Easy 0.28 Moderate Retain
37 -0.18 Average 0.48 Moderate Retain
38 0.45 Average 0.17 Low Revise
39 -0.95 Easy 0.34 Moderate Retain
40 -2.12 Very Easy 0.14 Low Reject
41 -1.37 Easy 0.14 Low Revise
42 -0.01 Average 0.35 Moderate Retain
43 -1.22 Easy 0.35 Moderate Retain
44 0.44 Average 0.33 Moderate Retain
45 -0.46 Average 0.38 Moderate Retain
46 0.57 Difficult 0.44 Moderate Retain
47 1.27 Difficult 0.38 Moderate Retain
48 0.51 Difficult 0.13 Low Revise
49 -0.20 Average 0.31 Moderate Retain
50 -0.07 Average 0.41 Moderate Retain
51 1.19 Difficult 0.01 Low Revise
52 1.27 Difficult -0.02 Negative Reject
53 2.19 Very Difficult 0.30 Moderate Reject
54 0.45 Average 0.28 Moderate Retain
55 0.45 Average 0.51 Moderate Retain
56 -0.76 Easy 0.39 Moderate Retain
57 0.88 Difficult 0.46 Moderate Retain
58 -0.12 Average 0.23 Moderate Retain
59 2.90 Very Difficult 0.42 Moderate Reject
60 -0.18 Average 0.37 Moderate Retain
The IRT discrimination and difficulty parameters for each item suggested that item numbers 1
(negative discrimination), 13 (very easy), 40 (very easy), 52 (negative discrimination), 53 (very
difficult) and 59 (very difficult) be rejected. The items subject to revision according
to Table 3 were numbers 3, 4, 12, 34, 38, 41, 48 and 51.
Revisions and Replacements of Weak Items
Data generated by ConQuest facilitated the revision or replacement of some items.
Items with negative or low discrimination, like numbers 1, 3, 4, 12, 34, 38, 41, 48 and 52, were
treated with the aid of an item statistic called the point biserial. Table 4 shows the IRT
Generalized Item Analysis Result for item number 1. The other items subject to revision had
attributes similar to those in this table.
Table 4. IRT Generalized Item Analysis Result for Item No. 1
item:1 (1)
Cases for this item 78 Discrimination -0.01
Item Threshold(s): -1.08 Weighted MNSQ 1.14 Item Delta(s): -1.08
-----------------------------------------------------------------------
Label Score Count % of tot Pt Bis t (p) PV1Avg:1 PV1 SD:1
-----------------------------------------------------------------------
1 1.00 56 71.79 -0.01 -0.06(.952) -0.00 0.72
2 0.00 2 2.56 -0.02 -0.21(.833) -0.31 0.05
3 0.00 2 2.56 -0.00 -0.00(.000) 0.11 1.29
4 0.00 18 23.08 0.02 0.14(.885) 0.02 0.64
=======================================================================
As observed in Table 4, item number 1, like the other weak items, had a low or
negative discrimination. The point biserial of the correct answer was negative or very close to
zero, while that of some wrong options was positive where ideally it should be
negative. This means that the wrong options were attractive as the correct answer to
examinees with high ability. These options were reviewed and then replaced for improvement
and for the ease of the test takers. Other items were also improved by restructuring the manner
of questioning to lessen confusion in answering.
The IRT Generalized Item Analysis Result also revealed that the Delta values for item
numbers 40, 53 and 59 were -2.12, 2.19 and 2.90, respectively. Item 40 was therefore very easy,
while items 53 and 59 were very difficult. Thus, the questions in these items were rephrased to
fit the students' level.
Finalization of the Test
Table 5 shows the item difficulty and discrimination results after retesting of the revised
and replacement items.
Table 5. Retesting Results on Difficulty and Discrimination Indices
Item Number  Difficulty Index  Difficulty Level  Discrimination Index  Discrimination Level  Action
1 -0.76 Easy 0.39 Moderate Retain
3 1.27 Difficult 0.38 Moderate Retain
4 1.35 Difficult 0.33 Moderate Retain
12 0.51 Difficult 0.33 Moderate Retain
13 -1.37 Easy 0.37 Moderate Retain
34 -0.18 Average 0.27 Moderate Retain
38 1.27 Difficult 0.38 Moderate Retain
40 -0.70 Easy 0.34 Moderate Retain
41 0.45 Average 0.30 Moderate Retain
48 0.57 Difficult 0.23 Moderate Retain
51 0.83 Difficult 0.32 Moderate Retain
52 0.88 Difficult 0.46 Moderate Retain
53 0.57 Difficult 0.44 Moderate Retain
59 0.57 Difficult 0.23 Moderate Retain
Test Reliability
As observed in the summary results in Table 6 below, the Coefficient Alpha is 0.84. This
is the Kuder-Richardson Formula 20 (KR-20) reliability coefficient (Wu et al., 2007).
Table 6. Summary Statistics from Item Analysis Results
-----------------------------------------------------------------------
The following results are scaled to assume that a single response was provided for each item.
N 78
Mean 28.77
Standard Deviation 8.56
Variance 73.30
Skewness 0.93
Kurtosis 1.05
Coefficient Alpha 0.84
=======================================================================
Zone of Proximal Development
Table 7 translates the items into the corresponding learning areas in Mathematics II
in which the students have shown 50% mastery, which is taken here as the Zone
of Proximal Development.
Table 7. Students' Achievement on Different Learning Areas in Mathematics II
Students' Group  Mathematics Learning Areas within the Group's Zone of Proximal Development
Upper Top Group (UTG)
- use systems of linear equations to solve problems
- represent the solution set of a system of linear inequalities by graphing
- simplify complex rational algebraic expressions
- solve problems involving expressions with exponents
- describe an arithmetic sequence by giving the formula for the nth term
- solve problems involving geometric mean
Upper Middle Group (UMG)
- represent the solution set of a system of linear inequalities by graphing
- find the solution set of a quadratic equation
- identify rational algebraic expression
- perform operations on rational algebraic expressions
- solve rational equations and check for extraneous solutions
- solve problems involving rational algebraic expressions
- solve equations involving variations
- demonstrate understanding of expressions
- rewrite algebraic expressions with zero and negative exponents
- name two rational numbers between which the nth root of x lies
- simplify expression containing rational exponents using laws of exponents
- solve problems involving radical equations
- solve problems involving arithmetic means
- solve problems involving arithmetic sequences
- describe a geometric sequence given the first few terms
- derive the formula for the sum of the terms of a geometric sequence
- derive the formula for an infinite geometric series
- solve problems involving geometric sequence
- find the solution set of a quadratic equation
Lower Middle Group (LMG)
- perform operations on radical expressions
- define a system of linear equations in two variables
- translate certain situations in real life to linear inequalities
- draw the graph of a linear inequality in two variables
- solve rational equations which can be reduced to quadratic equations
- use quadratic equations to solve problems
- simplify rational algebraic expressions
- identify variation relationships of real life
- represent variation relationships as equations
- evaluate numerical expressions involving integral exponents
- identify expressions which are perfect squares or perfect cubes
- find the square root or cube root of expressions
- rewrite expressions with rational exponents as radical expressions
- simplify the radical expression
- solve radical equations
- list the next few terms of a sequence given several consecutive terms
- derive a mathematical rule for generating the sequence
- given few terms of an arithmetic sequence, find the common difference
- given two terms of an arithmetic sequence, find the specified nth term
- derive the formula for the sum of the n terms of an arithmetic sequence
- describe a geometric sequence given the first few terms
- find the sum of the terms of a geometric sequence
- define a system of linear inequalities
- describe an arithmetic sequence by giving the first few terms
Lower Bottom Group (LBG)
- solve systems of linear equations in two variables
- distinguish a quadratic equation from a linear equation
- translate verbal expressions into rational algebraic expressions
- rationalize a fraction whose denominator contains square roots
- define the sum of an arithmetic sequence
Table 7 maps the students' scores to their zone of proximal development (ZPD), that is,
the learning competencies on which they had a 50% mastery level. Thus, the competencies above
a group's ZPD were the least learned learning areas of the group, while those below its ZPD
were the competencies the group had mastered.
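The mapping logic can be sketched as follows, under stated assumptions. The study does not report how each group's ability estimate was obtained or how close to 50% an item had to be, so the group ability `theta`, the `tolerance` of 0.10 around 0.5, the function name, and the three item parameter pairs below are all hypothetical. The probability uses the standard two-parameter logistic form.

```python
import math

def classify_items(theta, items, tolerance=0.10):
    """Sort items into mastered, ZPD and least learned for one ability level.

    items maps an item number to its (a, b) parameters. An item is placed in
    the ZPD when its success probability is within `tolerance` of 0.5
    (hypothetical cut-off, not taken from the study).
    """
    result = {"mastered": [], "zpd": [], "least_learned": []}
    for number, (a, b) in items.items():
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        if abs(p - 0.5) <= tolerance:
            result["zpd"].append(number)
        elif p > 0.5:
            result["mastered"].append(number)
        else:
            result["least_learned"].append(number)
    return result

# Hypothetical items: one well below, one at, and one well above the group ability
example_items = {1: (1.0, -2.0), 2: (1.0, 0.0), 3: (1.0, 2.0)}
groups = classify_items(theta=0.0, items=example_items)
```

Items whose difficulty sits below the group's ability come out as mastered and those above it as least learned, which mirrors the reading of Table 7 given in the text.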
Settings on Thinking Hierarchy
Table 8 shows the thinking level measured by each item as set by the researcher's
intuition and as calculated through IRT. The table reveals that out of 60 items, only 23 had
the same difficulty level under the researcher's intuition and the test response results.
Table 8. Item Difficulty Level (Intuition versus IRT perspective)
Item Number  Level of Difficulty by Researcher's Intuition  Level of Difficulty by Test Responses  Item Number  Level of Difficulty by Researcher's Intuition  Level of Difficulty by Test Responses
1 1st level 1st level 31 3rd level 1st level
2 2nd level 1st level 32 2nd level 1st level
3 3rd level 1st level 33 2nd level 2nd level
4 2nd level 3rd level 34 3rd level 2nd level
5 1st level 2nd level 35 3rd level 2nd level
6 2nd level 2nd level 36 2nd level 1st level
7 3rd level 3rd level 37 3rd level 2nd level
8 1st level 1st level 38 1st level 2nd level
9 2nd level 3rd level 39 2nd level 1st level
10 2nd level 1st level 40 2nd level 1st level
11 3rd level 1st level 41 3rd level 1st level
12 1st level 3rd level 42 3rd level 2nd level
13 1st level 1st level 43 3rd level 1st level
14 2nd level 2nd level 44 3rd level 2nd level
15 2nd level 3rd level 45 2nd level 2nd level
16 3rd level 3rd level 46 2nd level 3rd level
17 3rd level 3rd level 47 3rd level 3rd level
18 3rd level 3rd level 48 3rd level 3rd level
19 1st level 2nd level 49 3rd level 2nd level
20 2nd level 2nd level 50 1st level 2nd level
21 3rd level 2nd level 51 3rd level 3rd level
22 1st level 2nd level 52 3rd level 3rd level
23 2nd level 2nd level 53 2nd level 3rd level
24 2nd level 2nd level 54 2nd level 2nd level
25 3rd level 3rd level 55 2nd level 2nd level
26 1st level 1st level 56 2nd level 1st level
27 2nd level 1st level 57 2nd level 3rd level
28 2nd level 2nd level 58 1st level 2nd level
29 2nd level 3rd level 59 2nd level 3rd level
30 2nd level 2nd level 60 2nd level 2nd level
Conclusions and Recommendations
Conclusion
In view of the findings of this study, the following conclusions were drawn:
1. The results of this study reflect that the diagnostic test developed is valid. It is an
instrument that can measure the desired trait of second year students in Mathematics II.
2. The final output of the test showed that the item difficulties were within the range
of -1.37 to 1.35. The diagnostic test therefore has an average level of difficulty.
3. The discrimination indices were within the range of 0.23 to 0.51. Thus, the test is a
moderately discriminating instrument.
4. With the KR-20 coefficient at 0.84, the developed diagnostic test indicates high reliability
under the scale of Shrout (1998).
5. Based on the previous items, the diagnostic test developed in this study is valid, highly
reliable and fair. Hence, this test is standardized and can now be used in assessing the
Mathematics learning of second year students in DORSHS.
6. The test identified the least learned competencies, the zone of proximal development and
the mastered competencies of DORSHS second year high school students in Mathematics II.
7. The study revealed a considerable difference between the intuitive item difficulty settings
and the actual item difficulty results from the students.
Recommendations
Based on the findings and conclusions of this study, the following are the general
recommendations:
1. The standardized diagnostic test can already be used in assessing the Mathematics learning
of second year students in DORSHS prior to any Achievement Test review program.
2. Other studies, such as determining the coherence of the test with the second year National
Achievement Test (NAT), can be made.
3. Further validation of the test using IRT with second year students of other schools or with
another batch of second year students in DORSHS is highly encouraged.
4. The test and its findings can also be utilized for any comparative study on item analysis
between the Classical Test Method and IRT.
5. The results of this study can be utilized for any possible research on students' assessment.
References
Baker, F. (2001). The Basics of Item Response Theory (2nd ed.). United States of America: ERIC
Clearinghouse on Assessment and Evaluation.
Bontempo, B. D. (2009). Measurement Art. The Point-Biserial Correlation Coefficient.
Retrieved June 30, 2009, from http://www.mountainmeasurement.com/blog/?p=148.
Brannick, M. (2006). Concepts from IRT that Move Beyond Classical Test Theory. Multiple
Regression and Research Methods. Retrieved November 22, 2009, from
http://luna.cas.usf.edu/~mbrannic/files/pmet/irt.htm.
Calmorin, L. P. (2004). Educational Research Measurement and Evaluation (3rd ed.). Manila,
Philippines: National Book Store, Inc.
Cherry, K. (2009). Reliability - What Is Reliability. Retrieved November 20, 2009, from
http://psychology.about.com/od/researchmethods/
Davies, A., Arbuckle, M., & Bonneau, D. (2005). Assessment For Learning: Planning for
Professional Development. Retrieved October 14, 2009, from
http://electronicportfolios.org/afl/Assessment4learning.pdf.
Hambleton, R., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of Item Response
Theory. United States of America: SAGE Publications, Inc.
Izard, J. (2005). Quantitative Research Methods in Educational Planning. Overview of Test
Construction. Paris, France: International Institute for Educational Planning/UNESCO.
Kim, S. H., Cohen, A. S., & Park, T. H. (1995). Detection of Differential Item Functioning in
Multiple Groups. Journal of Educational Measurement, 32: 261 - 276.
Educational Diagnostic Prescriptive Services (2009). Educational Diagnostic Prescriptive.
Retrieved October 20, 2009, from
http://homeschoolcreations.blogspot.com/2009/09/educational.
Shrout, P. E. (1998). Measurement Reliability and Agreement in Psychiatry. Statistical Methods
in Medical Research. United States of America: SAGE Publications, Inc.
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes.
Cambridge, MA: Harvard University Press.