Date post: | 25-Nov-2014 |
Upload: | ikhwan-syafiq-zulkifli |
1.0 INTRODUCTION
1.1 Purpose
To investigate the effectiveness of a set of 30 multiple-choice questions on the
English for Science and Technology (EST) subject in an upper secondary class at
Sekolah Menengah Kebangsaan Gajah Berang.
1.2 Objectives
1.2.1 To plan and develop a set of 30 multiple-choice questions on English for
Science and Technology based on the syllabus, content (topics and sub-topics),
instructional objectives, and Table of Specifications.
1.2.2 To assemble the 30 multiple-choice questions.
1.2.3 To administer the 30 multiple-choice questions.
1.2.4 To score and grade the performance of the students in the 30
multiple-choice questions.
1.2.5 To present the performance of the students using descriptive statistics
such as measures of central tendency (mean, median, mode), measures of
dispersion/variability (range, variance, standard deviation), and Z-scores and
T-scores.
1.2.6 To analyze the 30 multiple-choice questions using item analysis (item
difficulty and item discrimination) and distracter analysis.
1.2.7 To discuss the results of the descriptive statistics, item analysis and
distracter analysis.
1.2.8 To provide a conclusion for the project.
2.0 METHODOLOGY
2.1 Subjects
For this study we assessed 20 Form 5 students from Sek. Men. Keb. Gajah Berang
on English for Science and Technology (EST). The sample, provided by the
teacher, consists of representatives from 5 Science 1, 5 Science 2, 5 Science 3
and 5 Science 4. Although the classes may appear to be streamed by academic
ability, the teacher, Madam Fang, told us that the students form a homogeneous
group: as science-stream students they have similar abilities. The teacher
picked five students at random from each of the four classes so that the sample
would reflect overall student performance regardless of academic ability and
would be spread across the classes. Of the 20 students, four are female and
sixteen are male. According to the school's principal and the Head Teacher of
English Language, most of the students come from middle-class families and a
few from upper-middle-class families, so financial problems are not a major
concern for most of them. As for the time allocated to EST, Madam De Kwee Poh
(the EST teacher for 5 Science 1) said that each class receives 3 hours per
week.
2.2 Materials
In developing the questions, we surveyed bookshops and picked several exercise
books that follow the new Form 4 and Form 5 KBSM syllabuses. After discussing
which sample questions best fit the new KBSM syllabuses and Bloom's Taxonomy of
Educational Objectives, we chose a Form 4 EST exercise book published by Oxford
Fajar. The test consists of 30 multiple-choice questions that vary in syllabus
topic and in level of Bloom's Taxonomy, and it is a mid-year assessment for Form
students. According to our table of specification, six questions fall under
Knowledge, twenty-two under Comprehension, and one each under Analysis and
Application. According to the syllabus, eleven topics are included in the test:
Treasures of Nature, Energy Comes & Goes, It's All Chemistry, Force & Motion,
Tiny Beings Great Terrors, It's All In The Genes, Meddling With Nature, Food and
thoughts, The World at Your Fingertips (ICT), The Frontier of Space, and Reading
New Horizons. EST does not have a separate syllabus for Form 4 and Form 5:
according to the curriculum specification, both forms share the same syllabus
and curriculum specifications, and Form 5 students use the same textbook that
they used in Form 4. Therefore, although we used a Form 4 exercise book, our
sample is drawn from Form 5. The details of the 30 multiple-choice questions
are explained in the planning and assembling stages.
2.3 Procedures
2.3.1 Planning Stage
Before constructing the test questions, we met to discuss which subject to
measure for this project. After a few discussions among the group members, we
agreed on English for Science and Technology (EST). We then went to Sek. Men.
Keb. Gajah Berang and met the EST teacher to find out the syllabus of the
subject and how many topics had already been taught, because a test should
measure what has been taught. This information guided the construction of the
test. Once we had analyzed the information gathered, we developed our table of
specification, which served as our guideline to make sure that the test content
was closely related to the classroom curriculum and educational objectives. The
table of specification is crucial because it helps determine the major content
areas to be covered in the test. These content areas are derived by carefully
reviewing the educational objectives and selecting the major content to be
included, so that the test measures different levels of Bloom's Taxonomy. It is
therefore essential to refer to the table of specification to ensure as wide a
sampling of the potential content as possible. To get a general idea of what an
EST test should look like, we went through several EST past-year papers and
discussed closely with the teacher which questions were suitable for our test
paper, to support its reliability and validity. After these steps, we
constructed the questions and gave the test to the teacher to check its
accuracy and whether it measured what it was supposed to measure for the
subject.
2.3.2 Assembling Stage
The test paper consists of 30 multiple-choice questions to be completed in 1
hour. Question 1 comes from Man and Human Body, topic 9 in the syllabus, and
measures the comprehension level of Bloom's Taxonomy. Question 2 is from topic
5, Natural Resources and Industrial Process, and also measures the
comprehension level. Question 3 also comes under topic 5 but measures the
lowest level of Bloom's Taxonomy, knowledge. Question 4 is set on topic 15, The
Universe Astronomy Aerospace, and measures the comprehension level. Question 5
tests the comprehension level on topic 3, Natural Resources.
Question 6 comes under topic 6, Matter & Mass, and also measures the
comprehension level. Question 7 measures the comprehension level under topic 8,
The Human Body. Question 8 is set on topic 13, Technology and Communication,
and measures the knowledge level. Question 9 measures the knowledge level under
topic 9, Man & Human Body. Questions 10 and 11 also come under topic 9 and
measure the comprehension level. Likewise, questions 12 and 13 measure the
comprehension level under topic 10, Man & Living Organism.
Question 14 is set on topic 11, Nutrition & Food, and measures the
comprehension level. The knowledge level, the lowest level of Bloom's Taxonomy,
is measured in question 15, under topic 16, The Universe, Astronomy and
Aerospace. Question 16 comes under topic 7, Force & Motion, and measures the
comprehension level, whereas question 17, on topic 6, Matter & Mass, measures
the application level. Questions 18, 19 and 20 all come under topic 6, Matter &
Mass: the comprehension level is tested in questions 18 and 19, and the
knowledge level in question 20.
Question 21 is set on topic 10, Man & Living Organism, and measures the
knowledge level. Questions 22 and 23 cover topic 8, Human Body, and both
measure the comprehension level. Topic 10, Man & Living Organism, is also used
for questions 24 and 25: question 24 tests the analysis level, whereas question
25 measures the comprehension level. Questions 26 to 30 cover topic 11,
Nutrition & Food, and all measure the comprehension level.
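The mapping of questions to cognitive levels can be cross-checked against the table of specification (six Knowledge, twenty-two Comprehension, one Application, one Analysis). The sketch below is only an illustration: the names `LEVELS` and `level_of` are ours, and it takes question 9 as a knowledge item and questions 10 and 11 as comprehension items.

```python
from collections import Counter

# Questions whose level is NOT comprehension, transcribed from the
# assembling-stage description (assumption: Q9 is knowledge, while
# Q10 and Q11 are comprehension).
LEVELS = {
    "knowledge": [3, 8, 9, 15, 20, 21],
    "application": [17],
    "analysis": [24],
}

def level_of(question: int) -> str:
    """Return the Bloom's Taxonomy level assigned to a question number."""
    for level, questions in LEVELS.items():
        if question in questions:
            return level
    return "comprehension"  # every remaining item in the description

tally = Counter(level_of(q) for q in range(1, 31))
for level, count in tally.most_common():
    print(f"{level:<14}{count}")
```

A tally of this kind is a quick way to confirm that the assembled paper still matches the planned table of specification.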
2.3.3 Administering
When the test was ready, all 30 multiple-choice questions were given to
the students. First we had to make sure that the Form 5 Science students of
Sekolah Menengah Kebangsaan Gajah Berang were ready for the test. Several steps
help students prepare psychologically for a test.
First, we maintained a positive attitude. We went to the school a week
before we distributed the test. Letting the students know that there would be a
test the following week encourages a positive test-taking attitude. It helps to
keep the main purposes of classroom testing in mind: to evaluate achievement
and instructional procedures and to provide feedback to us and to the students.
Doing so avoids falling victim to testing traps and maintains a positive
test-taking atmosphere. Second, we maximized the achievement aspect of the
test: we encouraged the students to do their best and not to be immobilized by
fear. A test is something to be taken seriously, and this should be clear to
the class.
On the practical side, informing the school about the test a week in
advance avoids surprises and gives the students sufficient advance notice. This
is not to say that the teacher should avoid frequent quizzes. When students are
tested frequently, learning or study takes place at more regular intervals
rather than only the night before a test. Telling the students about the test
late would affect their performance, and the test would then not evaluate their
achievement properly.
In the classroom, before distributing the tests, we informed the students
about the time limit, the restroom policy, and our other special
considerations. It is important to state the rules first, because students are
often distracted once they receive their tests and may miss important
instructions. We distributed the tests from left to right so that no student
would be left to receive the last paper in the class.
After distributing the tests, we reminded the students to check their
copies: the page numbers, the questions, the answer key, and whether they had
received the correct paper. We then let the test begin and monitored the time
limit.
We monitored the students while they answered the test to make sure that
they were not copying each other's answers. During monitoring we also reminded
the students not to cheat and that there are punishments for cheating.
Preventing cheating gives precise results of the students' performance, which
we can then evaluate correctly.
2.3.4 Scoring and Grading
Scoring is an important part of evaluating students' performance. We
distributed the 30 multiple-choice questions to 20 students of SMK Gajah
Berang. The questions test the subject English for Science & Technology (EST),
and the student sample prepared by the school administrator is of mixed
abilities. The students come from four different science classes: 5 Science 1,
5 Science 2, 5 Science 3 and 5 Science 4. After collecting the test papers from
the students, we determined the scores. Several steps are required to score and
grade each student.
First, preparing the answer key is the most important step in determining
students' scores. Without an answer key, scoring is difficult and may be
unreliable. An answer key saves time during scoring and also helps to identify
whether any questions need to be eliminated. While constructing the answer key,
researchers can also judge whether the time allowed is enough for the students.
Our 30 multiple-choice questions for English for Science and Technology are
appropriate for the time limit of 1 hour.
While scoring the tests, we sat together and checked each other's answer
keys in order to identify possible alternative answers and potential problems.
Since we did not know the students, and their teacher provided little
information about their background and performance, our marking was not
affected by the halo effect. We did not return the tests to the students
because we needed to compile them in our final report.
After scoring the tests correctly and accurately, the next step is grading
the results. Grading, or analyzing, the test helps determine whether the test
is valid. No test that a teacher constructs for their students will be perfect;
it will include inappropriate or otherwise deficient items. Thus, at the
grading stage, a technique called item analysis is very important. Item
analysis is used to identify items that are deficient in some way, for example
through miskeying, guessing or ambiguity.
Based on the results of the test, most of the questions function well. The
questions are clear enough for the students to understand. The distracters
generally function well, and the questions are not difficult for the upper 10
students to answer. As the teacher informed us, the students had already
covered the entire syllabus in the textbook the previous year, so guessing is
less likely to occur. Unfortunately, several questions are miskeyed,
characterized by guessing, or ambiguous.
Miskeying is indicated when most students who did well on the test choose
a wrong answer (distracter) rather than the keyed answer. In question 2, most
of the upper-half students chose distracters (A, B) rather than the keyed
answer (C):
Question 2.
              A    B    C*   D
Upper half    4    5    1    1
Questions 16 and 19 are also miskeyed. Most of the upper-half students
chose distracters (B and A, respectively) rather than the keyed answers (A and
C):
Question 16.
              A*   B    C    D
Upper half    1    9    0    0
Question 19.
              A    B    C*   D
Upper half    6    0    3    1
In these cases the key does not discriminate positively, while the
distracters attract students in the upper half, that is, the distracters
discriminate positively. These items need revision or, if necessary,
elimination.
We characterized one question as affected by guessing. Guessing is most
likely to occur when the item is (a) not covered in class, (b) so difficult
that even the upper-half students have no idea what the correct answer is, or
(c) so trivial that students are unable to choose from among the options
provided. In question 22, the question is not clear enough and the distracters
do not function well:
Question 22.
              A*   B    C    D
Upper half    5    2    1    2
The question should be revised or eliminated. Option (A) is not clear
enough, and most of the distracters are chosen with a frequency close to that
of the keyed option.
Question 27 is ambiguous. Distracter (D) does not function well because it
attracts the same number of upper-half students as the correct answer. In this
item, the students who do well but miss the item are drawn almost entirely to
one distracter. Thus the item should be revised.
Question 27.
              A*   B    C    D
Upper half    4    1    0    4
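The miskeying and ambiguity checks described above can be sketched in code from the upper-half counts. This is a simplified illustration under our own assumptions, not the procedure used in the project: an item is flagged for possible miskeying when some distracter attracts more upper-half students than the key, and for possible ambiguity when a distracter attracts exactly as many. It does not attempt to detect guessing, which depends on judgement about the item's content. The function name `flag_item` is hypothetical.

```python
def flag_item(upper_counts: dict[str, int], key: str) -> str:
    """Flag an item from upper-half option counts (simplified rules)."""
    key_count = upper_counts[key]
    # Highest count among the wrong options (distracters).
    top_distracter = max(n for opt, n in upper_counts.items() if opt != key)
    if top_distracter > key_count:
        return "possible miskeying"
    if top_distracter == key_count:
        return "possible ambiguity"
    return "no flag"

# Upper-half counts and keyed answers transcribed from the tables above.
items = {
    2: ({"A": 4, "B": 5, "C": 1, "D": 1}, "C"),
    16: ({"A": 1, "B": 9, "C": 0, "D": 0}, "A"),
    19: ({"A": 6, "B": 0, "C": 3, "D": 1}, "C"),
    22: ({"A": 5, "B": 2, "C": 1, "D": 2}, "A"),
    27: ({"A": 4, "B": 1, "C": 0, "D": 4}, "A"),
}
for number, (counts, key) in items.items():
    print(number, flag_item(counts, key))
```

A flag from rules like these is only a prompt to re-read the item; the decision to revise or eliminate still rests on the wording of the question and its options.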
3.0 RESULTS
3.1 Frequency Table & Histogram
Class Intervals    Tally              Frequency
24 – 26            ///                3
21 – 23            ///// ///// ///    13
18 – 20            ///                3
15 – 17            /                  1
Table 3.1: Frequency Distribution of Students' Scores
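The grouping in Table 3.1 can be reproduced from the raw scores listed in Table 3.4. The sketch below assumes a class width of 3 starting at 15, as the intervals in the table suggest; the names `SCORES`, `WIDTH`, `LOWEST` and `interval` are ours.

```python
from collections import Counter

# Raw scores out of 30, transcribed from Table 3.4.
SCORES = [26, 25, 24, 23, 23, 23, 23, 23, 22, 22,
          22, 22, 21, 21, 21, 21, 20, 19, 18, 17]

WIDTH = 3    # class width used in Table 3.1
LOWEST = 15  # lower limit of the lowest occupied class interval

def interval(score: int) -> tuple[int, int]:
    """Return the (lower, upper) limits of the class interval for a score."""
    lower = LOWEST + ((score - LOWEST) // WIDTH) * WIDTH
    return lower, lower + WIDTH - 1

freq = Counter(interval(s) for s in SCORES)
for (low, high), n in sorted(freq.items(), reverse=True):
    print(f"{low} - {high}  {'/' * n:<14}{n}")
```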
Figure: Histogram showing the distribution of scores of Form 5 students of SMK Gajah Berang in the EST test of 30 MCQs (vertical axis: no. of students/frequency, 0 to 20; horizontal axis: scores in class intervals from 12 – 14 to 27 – 29).
3.2 Measures of Central Tendency
3.2.1 Mean
21.8
3.2.2 Median
22
3.2.3 Mode
23
3.3 Measures of Dispersion/Variability
3.3.1 Range
9
3.3.2 Variance
4.8
3.3.3 Standard Deviation
2.19
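The values in sections 3.2 and 3.3 can be recomputed from the raw scores in Table 3.4 with Python's standard `statistics` module. This is a minimal sketch, assuming the variance is the sample variance (dividing by n - 1), which is the choice that reproduces the reported 4.8 and 2.19.

```python
import statistics

# Raw scores out of 30, transcribed from Table 3.4.
SCORES = [26, 25, 24, 23, 23, 23, 23, 23, 22, 22,
          22, 22, 21, 21, 21, 21, 20, 19, 18, 17]

mean = statistics.mean(SCORES)
median = statistics.median(SCORES)
mode = statistics.mode(SCORES)
score_range = max(SCORES) - min(SCORES)
variance = statistics.variance(SCORES)  # sample variance (n - 1)
sd = statistics.stdev(SCORES)           # sample standard deviation

print(f"mean={mean:.1f} median={median} mode={mode}")
print(f"range={score_range} variance={variance:.2f} sd={sd:.2f}")
```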
3.4 Z-score and T-score
Z-score: $Z = \dfrac{X - \bar{X}}{SD}$
T-score: $T = 10Z + 50$
No.   X     X - X̄    Z-score   T-score
1     26     4.2      1.92      69.2
2     25     3.2      1.46      64.6
3     24     2.2      1.00      60.0
4     23     1.2      0.55      55.5
5     23     1.2      0.55      55.5
6     23     1.2      0.55      55.5
7     23     1.2      0.55      55.5
8     23     1.2      0.55      55.5
9     22     0.2      0.09      50.9
10    22     0.2      0.09      50.9
11    22     0.2      0.09      50.9
12    22     0.2      0.09      50.9
13    21    -0.8     -0.37      46.3
14    21    -0.8     -0.37      46.3
15    21    -0.8     -0.37      46.3
16    21    -0.8     -0.37      46.3
17    20    -1.8     -0.82      41.8
18    19    -2.8     -1.28      37.2
19    18    -3.8     -1.74      32.6
20    17    -4.8     -2.19      28.1
Table 3.4: Table showing Z-score and T-score of the subject EST for 20 students in SMK GAJAH BERANG
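The Z-score and T-score columns follow directly from the two formulas. The sketch below uses the reported mean (21.8) and standard deviation (2.19) and rounds Z to two decimals before computing T, which is our assumption about how the table was built; the function names are ours.

```python
MEAN = 21.8  # reported in section 3.2.1
SD = 2.19    # reported in section 3.3.3

def z_score(x: float) -> float:
    """Z = (X - mean) / SD, rounded to two decimals as in Table 3.4."""
    return round((x - MEAN) / SD, 2)

def t_score(x: float) -> float:
    """T = 10Z + 50, computed from the rounded Z-score."""
    return round(10 * z_score(x) + 50, 1)

for x in range(26, 16, -1):  # every raw score that occurs, 26 down to 17
    print(f"{x:>2}  {z_score(x):+.2f}  {t_score(x):.1f}")
```

Computing T from an unrounded Z can change the last digit for some scores, so the rounding order matters when comparing against a hand-built table.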
3.5 Item Analysis
3.5.1 Item Difficulty, P
$P = \dfrac{\text{No. of students choosing the correct answer}}{\text{No. of students}}$
No.   p       No.   p       No.   p
1     0.55    11    0.55    21    0.80
2     0.10    12    1.00    22    0.40
3     0.75    13    1.00    23    0.50
4     0.90    14    1.00    24    0.90
5     0.95    15    0.95    25    0.90
6     0.80    16    0.25    26    1.00
7     0.35    17    1.00    27    0.20
8     0.70    18    0.85    28    0.90
9     1.00    19    0.30    29    0.85
10    0.55    20    0.80    30    0.90
Table 3.5.1: Item analysis and distracters analysis of 30 multiple-choice questions of EST subject (item difficulty)
The difficulty of a test item that is scored right or wrong is indicated by the
fraction of students who get the item right. Twenty questions fall into the
easy category, with p of 0.70 or above: questions 3, 4, 5, 6, 8, 9, 12, 13, 14,
15, 17, 18, 20, 21, 24, 25, 26, 28, 29 and 30. Seven questions fall into the
moderate category, with p from 0.30 to 0.69. Only three questions fall into the
difficult category, with p below 0.30: questions 2, 16 and 27.
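The three difficulty bands can be applied mechanically to the p values in Table 3.5.1. In this sketch the cut-offs (easy at 0.70 or above, moderate from 0.30 to 0.69, difficult below 0.30) are taken from the text, while the names `P_VALUES` and `difficulty_category` are ours.

```python
# Item difficulty (p) for questions 1 to 30, transcribed from Table 3.5.1.
P_VALUES = [0.55, 0.10, 0.75, 0.90, 0.95, 0.80, 0.35, 0.70, 1.00, 0.55,
            0.55, 1.00, 1.00, 1.00, 0.95, 0.25, 1.00, 0.85, 0.30, 0.80,
            0.80, 0.40, 0.50, 0.90, 0.90, 1.00, 0.20, 0.90, 0.85, 0.90]

def difficulty_category(p: float) -> str:
    """Classify an item by the fraction of students answering correctly."""
    if p >= 0.70:
        return "easy"
    if p >= 0.30:
        return "moderate"
    return "difficult"

by_category: dict[str, list[int]] = {}
for number, p in enumerate(P_VALUES, start=1):
    by_category.setdefault(difficulty_category(p), []).append(number)

for category, questions in by_category.items():
    print(f"{category:<10}{len(questions):>3}  {questions}")
```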
3.5.2 Item Discrimination, D
$D = \dfrac{(\text{No. of students who chose the correct answer in the upper group}) - (\text{No. of students who chose the correct answer in the lower group})}{\text{No. of students in each group}}$
No.   d        No.   d        No.   d
1     0.30     11    0.10     21    0.40
2     0.00     12    0.00     22    0.20
3     0.30     13    0.00     23    0.20
4     0.00     14    0.00     24    0.20
5     0.10     15   -0.90     25    0.20
6     0.00     16   -0.30     26    0.00
7     0.10     17    0.00     27    0.30
8     0.40     18    0.30     28    0.20
9     0.00     19    0.00     29   -0.10
10    0.70     20    0.00     30    0.00
Table 3.5.2: Table showing item difficulty and discrimination of 30 EST questions distributed to
20 students of SMK GAJAH BERANG
Fifteen questions have no discrimination at all (values of 0.00 or negative): questions 2, 4, 6, 9, 12, 13, 14, 15, 16, 17, 19, 20, 26, 29 and 30. Nine questions fall into the moderate discrimination range of 0.2 to 0.39: questions 1, 3, 18, 22, 23, 24, 25, 27 and 28. Low discrimination, 0.1 to 0.19, was found in questions 5, 7 and 11. Lastly, only three questions have high discrimination, 0.4 or more: questions 8, 10 and 21.
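The discrimination index and its four bands can be sketched as follows. The band limits follow the text, with 0.40 itself counted as high so that questions 8 and 21 fall in the high band; the function names are ours, and the counts in the usage line are hypothetical.

```python
# Item discrimination (d) for questions 1 to 30, from Table 3.5.2.
D_VALUES = [0.30, 0.00, 0.30, 0.00, 0.10, 0.00, 0.10, 0.40, 0.00, 0.70,
            0.10, 0.00, 0.00, 0.00, -0.90, -0.30, 0.00, 0.30, 0.00, 0.00,
            0.40, 0.20, 0.20, 0.20, 0.20, 0.00, 0.30, 0.20, -0.10, 0.00]

def discrimination_index(upper_correct: int, lower_correct: int,
                         group_size: int) -> float:
    """D = (correct in upper group - correct in lower group) / group size."""
    return (upper_correct - lower_correct) / group_size

def discrimination_category(d: float) -> str:
    """Classify an item by its discrimination index."""
    if d >= 0.40:
        return "high"
    if d >= 0.20:
        return "moderate"
    if d >= 0.10:
        return "low"
    return "none"  # zero or negative discrimination

groups: dict[str, list[int]] = {}
for number, d in enumerate(D_VALUES, start=1):
    groups.setdefault(discrimination_category(d), []).append(number)

for category, questions in groups.items():
    print(f"{category:<9}{len(questions):>3}  {questions}")

# Hypothetical counts: 7 of 10 upper-group and 4 of 10 lower-group correct.
print(discrimination_index(7, 4, 10))
```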
4.0 DISCUSSION
4.1 Histogram
Figure: Negatively skewed distribution of scores (score axis from 15 to 30), with the mean (X̄ = 21.8), median (22) and mode (23) marked.
Based on the graph, the distribution is negatively skewed: the mean has the
lowest value (21.8), the median lies in the middle (22) and the mode has the
highest value (23). The negatively skewed distribution indicates that the class
did very well on the test, with the majority of students obtaining high scores
and only a few obtaining lower scores. Most of the students scored between 21
and 23, meaning that they can be classed as good students and as a homogeneous
group in which all have almost the same ability. There could be many reasons
for this. The test may have been too easy because the students were familiar
with the type of question: all of them had covered all the topics and had done
many past-year papers and exercises in class, so the test may have contained
questions similar to ones they had already done on their own. Moreover, the
students may also be exceptionally able, since they all come from the first
science class, in which placement is based on performance and achievement in
school.
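The ordering of the mean, median and mode used above to read the skew can be checked in a few lines. The sketch also computes Pearson's second skewness coefficient, 3(mean - median)/SD, which is not used in the report and is added here only as an assumed supplementary check; the function name is ours.

```python
# Values reported in sections 3.2 and 3.3.
MEAN, MEDIAN, MODE, SD = 21.8, 22, 23, 2.19

def skew_direction(mean: float, median: float, mode: float) -> str:
    """Read the skew from the ordering of mean, median and mode."""
    if mean < median < mode:
        return "negative"
    if mean > median > mode:
        return "positive"
    return "unclear"

# Pearson's second skewness coefficient (supplementary, not in the report).
pearson_skew = 3 * (MEAN - MEDIAN) / SD

print(skew_direction(MEAN, MEDIAN, MODE), round(pearson_skew, 2))
```

A coefficient this close to zero suggests that the skew, while negative, is mild.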
4.2 Measures of Central Tendency
The mean is the average score of the students on the test. The graph
shows that the average score is 21.8, which indicates that the class as a whole
did well on the test. The mean has several characteristics that make it the
most frequently used measure of central tendency. One of these characteristics
is stability: since each score in the distribution enters into the computation
of the mean, it is more stable over time than other measures of central
tendency, which consider only one or two scores. Another characteristic is that
the sum of each score's distance from the mean is equal to zero. A third
characteristic is that the mean is affected by extreme scores. This means that
a few very low scores, of 20 or below, in the negatively skewed distribution
pull the mean down toward them. Thus, the mean gives the impression that the
typical student scored about 21 and passed the test with an A grade, while the
students below the mean still passed the test but with a C grade.
The median is the score that splits a distribution in half: 50 percent of
the scores lie above the median and 50 percent lie below it. It can also be
described as the middle score, since it falls in the middle of the score
distribution. The score distribution shows that half of the students scored 22
or above on the test, which means that half of the students passed the test
with an A grade, while the other half fall between the A and B grades.
The mode is the score that occurs most frequently in the distribution.
Based on the graph, the distribution is unimodal, which means that only one
score, 23, occurs most frequently among the students' scores. The mode also
indicates that many students scored highly on the test, with an A grade. We can
therefore conclude that most of the students in this class are good students
and that most of them passed the test with an A grade.
4.3 Measures of Standard Deviation
The purpose of measures of variability is to show how the scores are spread around the mean. This is important because measures of variability help determine in which group the majority of the students are, the good or the weak.
Range is the simplest measure of variability, calculated by subtracting the lowest score from the highest score. The range provides a quick estimate of variability but is undependable because it is based on the positions of only the two extreme scores. The addition or removal of a single score can change the range significantly. For our research, we arranged the students' scores from the highest to the lowest, starting from 26 out of 30 and ending at 17 out of 30. The range of the data is 9.
Standard deviation (SD) is the most useful measure of variability. The calculation of the standard deviation does not make its meaning readily apparent, but essentially it is an average of the degree to which a set of scores deviates from the mean. The procedure for calculating a standard deviation involves squaring each deviation and taking a square root, so in practice a scientific calculator is helpful.
To calculate the standard deviation, first calculate the mean. Next, subtract the mean from each raw score, from the highest score to the lowest. Then square each of these differences and add the squares together. Divide this sum by the number of scores minus one to obtain the variance, and lastly take the square root of the variance. Here is the formula for the standard deviation:
$SD = \sqrt{\dfrac{\sum (X - \bar{X})^2}{N - 1}}$
The variance of the test scores is 4.80 and the standard deviation is 2.19. A correctly calculated standard deviation helps in evaluating the students' performance on the test, and it is also needed to calculate the Z-scores and T-scores.
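As a worked check, assuming the variance is computed with the sample formula (dividing by N - 1, which is the choice that reproduces the reported 4.80), the deviations listed in Table 3.4 give:

```latex
\begin{aligned}
\sum (X - \bar{X})^2 &= 4.2^2 + 3.2^2 + 2.2^2 + 5(1.2)^2 + 4(0.2)^2 \\
                     &\quad + 4(0.8)^2 + 1.8^2 + 2.8^2 + 3.8^2 + 4.8^2 = 91.2 \\
s^2 &= \frac{91.2}{20 - 1} = 4.8 \\
SD &= \sqrt{4.8} \approx 2.19
\end{aligned}
```

Dividing by N = 20 instead would give a variance of 4.56, so the reported figures are consistent only with the N - 1 form.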
4.4.1 Z-score and T-score
The Z-score is the simplest of the standard scores. It expresses test performance simply and directly as the number of standard deviation units a raw score lies above or below the mean.
A Z-score is always positive when the raw score is higher than the mean. In our test, 12 students have positive Z-scores: 1.92, 1.46, 1.00, five students with 0.55, and four with 0.09. Conversely, a Z-score is always negative when the raw score is lower than the mean. The lowest Z-score is -2.19, followed by -1.74, -1.28 and -0.82, and four students share the same result, -0.37. Forgetting the negative sign (-) can cause serious errors in test interpretation. For this reason, Z-scores are seldom used directly in test norms but are usually transformed into a standard score system that uses only positive numbers, such as the T-score.
The term T-score has come to refer to any set of normally distributed standard scores that has a mean of 50 and a standard deviation of 10. A T-score is obtained by multiplying the Z-score by 10 and adding 50.
One reason that T-scores are preferable to Z-scores for reporting test results is that only positive numbers are produced. The T-scores of the Sekolah Menengah Kebangsaan Gajah Berang students were calculated and listed from the highest to the lowest.
4.5 ITEM ANALYSIS AND DISTRACTER ANALYSIS
Question 1
1. Is the item difficulty level appropriate for the testing application?
The item difficulty of 0.55 is appropriate for the testing application, which is analysis. The question asks students to conclude what the short passage is about.
2. Does the item discriminate adequately?
With an item discrimination of 0.3, the item does a satisfactory job of discriminating between examinees who performed well on the test and those who performed poorly.
3. Are the distracters performing adequately?
Option A does not function as a distracter because it does not attract any of the students. Option B is a weak distracter because it attracts one student from the upper group and none from the lower group. Option C is a good distracter because it attracts more of the weak students than the good students.
4. Overall evaluation
This item needs to be checked and revised because of the weaknesses in the distracters mentioned above.
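The three distracter labels used throughout this analysis (non-functioning, weak, good) can be expressed as a small rule. This is a sketch under our own assumptions: a distracter chosen equally often by both groups is treated as weak, which the report does not state, and the counts for option C below are illustrative rather than taken from the report.

```python
def classify_distracter(upper: int, lower: int) -> str:
    """Label a distracter from the number of upper- and lower-group
    students who chose it."""
    if upper == 0 and lower == 0:
        return "non-functioning"  # attracts nobody
    if lower > upper:
        return "good"             # attracts more weak than good students
    return "weak"                 # attracts the upper group at least as much

# Question 1: A attracts nobody, B attracts one upper-group student,
# C attracts more lower-group students (illustrative counts).
question_1 = {"A": (0, 0), "B": (1, 0), "C": (2, 6)}
for option, (upper, lower) in question_1.items():
    print(option, classify_distracter(upper, lower))
```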
Question 2
1. Is the item difficulty level appropriate for the testing application?
The item difficulty of 0.1 shows that the item is difficult. However, it is appropriate for the synthesis application: the item asks students to come up with a motto from what they have read.
2. Does the item discriminate adequately?
There is no discrimination for this item because the same number of students in the upper and lower groups chose the correct answer.
3. Are the distracters performing adequately?
Option A is a good distracter because it attracts more of the lower-group students. Option B is a weak distracter because it attracts more of the upper-group students. Option D is a non-functioning distracter because it does not attract any students from either group.
4. Overall evaluation
This item is eliminated because it cannot discriminate between those who performed well and those who performed poorly on the test.
Question 3
1. Is the item difficulty level appropriate for the testing application?
A p of 0.75 shows that the item is easy, as it only tests the students' knowledge of the passage they have read.
2. Does the item discriminate adequately?
With an item discrimination of 0.3, the item does a satisfactory job of discriminating between examinees who performed well on the test and those who performed poorly.
3. Are the distracters performing adequately?
Option A is functioning well. Option C is a non-functioning distracter, while option D is a good distracter.
4. Overall evaluation
This item needs to be checked and revised because of the weaknesses in the distracters mentioned above and because its discriminating power is only satisfactory.
Question 4
1. Is the item difficulty level appropriate for the testing application?
The item difficulty is 0.9, which shows that the item is too easy. The optimal mean p value for a multiple-choice item with four choices is 0.74.
2. Does the item discriminate adequately?
This item has no discrimination.
3. Are the distracters performing adequately?
Options A and C are weak distracters. Distracter D is non-functioning.
4. Overall evaluation
This item needs to be eliminated because it has no discrimination. It is not effective in testing students' performance.
Question 5
1. Is the item difficulty level appropriate for the testing application?
The item difficulty of 0.95 suggests that this item is too easy.
2. Does the item discriminate adequately?
An item discrimination of 0.1 indicates that the item has low discrimination.
3. Are the distracters performing adequately?
None of the distracters performs adequately: option A is weak, and options B and C are non-functioning.
4. Overall evaluation
This item should be eliminated or rewritten with improved distracters.
Question 6
1. Is the item difficulty level appropriate for the testing application?
The item difficulty of 0.8 shows that this item is easy.
2. Does the item discriminate adequately?
The item does not discriminate adequately because its discrimination value is 0.
3. Are the distracters performing adequately?
None of the distracters performs adequately: options A and B are non-functioning, while option C is a weak distracter.
4. Overall evaluation
This item is eliminated.
Question 7
1. Is the item difficulty level appropriate for the testing application?
The item difficulty of 0.35 shows that this item has moderate difficulty.
2. Does the item discriminate adequately?
A D value of 0.1 suggests that this item has low discrimination.
3. Are the distracters performing adequately?
Distracters A and B are performing adequately, while distracter C is not.
4. Overall evaluation
This item should be checked and revised.
Question 8
1. Is the item difficulty level appropriate for the testing application?
The item difficulty of 0.7 shows that this item is easy.
2. Does the item discriminate adequately?
The D value of 0.4 suggests that this item has high discrimination.
3. Are the distracters performing adequately?
Some of the distracters are not performing adequately: options A and B are non-functioning, and only option C works as a good distracter.
4. Overall evaluation
This item is retained. However, the distracters need to be improved.
Question 9
1. Is the item difficulty level appropriate for the testing application?
The item difficulty of 1 shows that this item is too easy.
2. Does the item discriminate adequately?
The item does not discriminate adequately because its discrimination value is 0.
3. Are the distracters performing adequately?
None of the distracters performs adequately because all of the students chose the correct answer.
4. Overall evaluation
This item is eliminated.
Question 10
1. Is the item difficulty level appropriate for the testing application?
The item difficulty of 0.55 shows that this item has moderate difficulty.
2. Does the item discriminate adequately?
The item discriminates adequately because its discrimination value is 0.7.
3. Are the distracters performing adequately?
All of the distracters are performing adequately except distracter C.
4. Overall evaluation
This item is retained, but distracter C needs to be improved.
Question 11
1. Is the item difficulty level appropriate for the testing application?
The item difficulty of 0.55 suggests that this item has moderate difficulty.
2. Does the item discriminate adequately?
This item has a discrimination value of 0.1, which is low.
3. Are the distracters performing adequately?
Distracters A and C are not performing adequately.
4. Overall evaluation
This item is eliminated or rewritten.
Question 12
1. Is the item difficulty level appropriate for the testing application?
The item difficulty value of 1 suggests that this item is too easy.
2. Does the item discriminate adequately?
An item discrimination of 0 indicates that the item has no discrimination.
3. Are the distracters performing adequately?
None of the distracters is performing adequately because all of the students answered correctly.
4. Overall evaluation
This item is eliminated.
Question 13
1. Is the item difficulty level appropriate for the testing application?
The item difficulty value of 1 suggests that this item is too easy.
2. Does the item discriminate adequately?
An item discrimination of 0 indicates that the item has no discrimination.
3. Are the distracters performing adequately?
None of the distracters is performing adequately because all of the students answered correctly.
4. Overall evaluation
This item is eliminated.
Question 14
1. Is the item difficulty level appropriate for the testing application?
The item difficulty value of 1 suggests that this item is too easy.
2. Does the item discriminate adequately?
An item discrimination of 0 indicates that the item has no discrimination.
3. Are the distracters performing adequately?
None of the distracters is performing adequately because all of the students answered correctly.
4. Overall evaluation
This item is eliminated.
Question 15
1. Is the item difficulty level appropriate for the testing application?
The item difficulty value of 0.95 suggests that this item is too easy.
2. Does the item discriminate adequately?
The negative item discrimination of -0.1 indicates that the item does not discriminate adequately.
3. Are the distracters performing adequately?
None of the distracters is performing adequately because all of the students in the lower group answered correctly.
4. Overall evaluation
This item is eliminated.
Question 16
1. Is the item difficulty level appropriate for the testing application?
The item difficulty value of 0.25 suggests that this item is difficult.
2. Does the item discriminate adequately?
The negative item discrimination of -0.3 indicates that the item does not discriminate adequately.
3. Are the distracters performing adequately?
Only distracter D is performing adequately; the other distracters are not.
4. Overall evaluation
This item is eliminated.
Question 17
1. Is the item difficulty level appropriate for the testing application?
The item difficulty value of 1 suggests that this item is too easy.
2. Does the item discriminate adequately?
An item discrimination of 0 indicates that the item has no discrimination.
3. Are the distracters performing adequately?
None of the distracters is performing adequately because all of the students answered correctly.
4. Overall evaluation
This item is eliminated.
Question 18
1. Is the item difficulty level appropriate for the testing application?
The item difficulty of 0.85 shows that the item is easy.
2. Does the item discriminate adequately?
This item has moderate discrimination (0.3).
3. Are the distracters performing adequately?
Not all of the distracters are performing adequately. Option B is non-functioning and D is a weak distracter; only option C is a good distracter.
4. Overall evaluation
This item needs to be checked and revised.
Question 19
1. Is the item difficulty level appropriate for the testing application?
The item difficulty of 0.3 shows that the item has moderate difficulty.
2. Does the item discriminate adequately?
This item has no discrimination (0).
3. Are the distracters performing adequately?
The distracters are not performing adequately. A is a weak distracter and B is non-functioning; only D works as a good distracter.
4. Overall evaluation
This item is eliminated.
Question 20
1. Is the item difficulty level appropriate for the testing application?
The item difficulty of 0.8 shows that the item is easy.
2. Does the item discriminate adequately?
This item has a discrimination value of zero.
3. Are the distracters performing adequately?
The distracters are not performing adequately. A and D are weak distracters, while C is non-functioning.
4. Overall evaluation
This item is eliminated.
Question 21
1. Is the item difficulty level appropriate for the testing application?
The item difficulty of 0.85 shows that the item is easy.
2. Does the item discriminate adequately?
This item has moderate discrimination (0.3).
3. Are the distracters performing adequately?
Not all of the distracters are performing adequately. Option B is non-functioning and D is a weak distracter; only option C is a good distracter.
4. Overall evaluation
This item needs to be checked and revised.
Question 22
1. Is the item difficulty level appropriate for the testing application?
The item difficulty of 0.4 shows that the item has moderate difficulty.
2. Does the item discriminate adequately?
This item has moderate discrimination (0.2).
3. Are the distracters performing adequately?
Distracters B and D are not performing adequately because each attracts the same number of students from the upper and lower groups.
4. Overall evaluation
This item needs to be checked and revised.
Question 23
1. Is the item difficulty level appropriate for the testing application?
The item difficulty of 0.5 shows that the item has moderate difficulty.
2. Does the item discriminate adequately?
This item has moderate discrimination (0.2).
3. Are the distracters performing adequately?
Distracters C and D are not performing adequately. Distracter A is performing adequately because it attracts more students from the lower group.
4. Overall evaluation
This item needs to be checked and revised.
Question 24
1. Is the item difficulty level appropriate for the testing application?
The item difficulty of 0.9 shows that the item is easy.
2. Does the item discriminate adequately?
This item has moderate discrimination (0.2).
3. Are the distracters performing adequately?
Only option A performs adequately. C and D are non-functioning distracters.
4. Overall evaluation
This item needs to be checked and revised.
Question 25
1. Is the item difficulty level appropriate for the testing application?
The item difficulty of 0.9 shows that the item is too easy.
2. Does the item discriminate adequately?
This item has moderate discrimination (0.2).
3. Are the distracters performing adequately?
None of the distracters is performing adequately. Option B is non-functioning, while C and D are weak distracters.
4. Overall evaluation
This item needs to be checked and revised.
Question 26
1. Is the item difficulty level appropriate for the testing application?
The item difficulty value of 1 suggests that this item is too easy.
2. Does the item discriminate adequately?
An item discrimination of 0 indicates that the item has no discrimination.
3. Are the distracters performing adequately?
None of the distracters is performing adequately because all of the students answered correctly.
4. Overall evaluation
This item is eliminated.
Question 27
1. Is the item difficulty level appropriate for the testing application?
The item difficulty value of 1 suggests that this item is too easy.
2. Does the item discriminate adequately?
An item discrimination of 0 indicates that the item has no discrimination.
3. Are the distracters performing adequately?
None of the distracters is performing adequately because all of the students answered correctly.
4. Overall evaluation
This item is eliminated.
Question 28
1. Is the item difficulty level appropriate for the testing application?
The item difficulty value of 0.9 suggests that this item is easy.
2. Does the item discriminate adequately?
An item discrimination of 0.2 indicates that the item has moderate discrimination.
3. Are the distracters performing adequately?
None of the distracters is performing adequately because all of them are weak.
4. Overall evaluation
This item needs to be checked and revised. The distracters need to be changed or rewritten.
Question 29
1. Is the item difficulty level appropriate for the testing application?
The item difficulty value of 0.85 suggests that this item is easy.
2. Does the item discriminate adequately?
The negative item discrimination of -0.1 indicates that the item does not discriminate adequately.
3. Are the distracters performing adequately?
None of the distracters is performing adequately: option A is a weak distracter, while B and D are non-functioning.
4. Overall evaluation
This item needs to be checked and revised.
Question 30
1. Is the item difficulty level appropriate for the testing application?
The item difficulty value of 0.9 suggests that this item is too easy.
2. Does the item discriminate adequately?
An item discrimination of 0 indicates that the item has no discrimination.
3. Are the distracters performing adequately?
None of the distracters is performing adequately. A and B are weak distracters, while D is non-functioning.
4. Overall evaluation
This item is eliminated.
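The distracter judgements in the evaluations above appear to depend on how many upper-group and lower-group students chose each wrong option. The following is a minimal Python sketch of one possible rule, not the procedure actually used in this project. The function name `classify_distracter`, the cut-offs, and the sample counts are all assumptions made for illustration: an option nobody chooses is labelled non-functioning, an option that attracts more lower-group than upper-group students is labelled good, and anything else is labelled weak.

```python
def classify_distracter(upper: int, lower: int) -> str:
    """Label one wrong option from upper- and lower-group counts.

    Assumed rule (the report does not state it as a formula):
    - chosen by nobody                  -> "non-functioning"
    - chosen by more lower than upper   -> "good"
    - anything else                     -> "weak"
    """
    if upper < 0 or lower < 0:
        raise ValueError("counts must be non-negative")
    if upper + lower == 0:
        return "non-functioning"
    if lower > upper:
        return "good"
    return "weak"


# Hypothetical (upper, lower) counts for the wrong options of one item.
example_counts = {"A": (1, 4), "B": (0, 0), "C": (2, 2)}
labels = {
    option: classify_distracter(upper, lower)
    for option, (upper, lower) in example_counts.items()
}
print(labels)
```

A distracter that draws equal numbers from both groups falls under "weak" here, which matches the remark in Question 22 that such options are not performing adequately.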
4.6 LIMITATIONS
4.6.1 Inexperience
Our inexperience in writing good questions was one of the limitations
we faced. The teacher pointed this out when we returned to the school
for a follow-up. Puan Ee advised us not to extract the questions from
a single exercise book. If we want to extract questions, we should
draw them from several materials.
4.6.2 Lack of Reliable Materials
Much to our mortification, many of the books on the market are not reliable.
For example, of the questions that we extracted, only three items can be
retained. The others need to be improved or eliminated.
4.6.3 Students’ Lack of Preparation
This was evident when the students were quite shocked to be told that
they had to sit a test. Lack of preparation can affect the students’
performance. Some of the students also told us that they did not take
the test seriously and simply guessed or peeked at other students’
answers.
Apart from that, when we photocopied the questions, we did not notice
that questions 11, 12 and 13 already had an answer marked. This was due
to our carelessness at the planning stage: while we were going through
the questions to work out the answers, one of the members ticked them.
5.0 CONCLUSION
In conclusion, this research gave us a lot of meaningful information for our future
use. The most crucial and hardest part is planning the questions, because there are
many things to consider at this stage: the format of the questions to be constructed,
how closely the questions follow Bloom’s Taxonomy of Educational Objectives, and the
level of student proficiency to be tested. All of these are taken into consideration
when constructing questions. The scoring and grading stage comes next. At this stage,
one must carefully check the calculations, because any missing number will affect the
rest of the calculation. Questions chosen from an exercise book or revision book
should be examined thoroughly before they are put in the final draft of the test.
Miskeyed, skewed, or ambiguous questions are often found in exercise and revision
books, so the number of such questions should be minimized. It is recommended that
teachers use their own item banks, if any. As we noticed, our student sample is from a
homogenous group; the students possess similar abilities in terms of academic
achievement. Finally, we have learnt many things in this course, especially things
that we are going to apply in the future. We found that this course helps us to
prepare to become teachers or educators later on.
Appendices
1. Syllabus
2. Table of Specifications
3. Sample 30 multiple-choice questions
4. Answer to the 30 multiple-choice questions
5. Sample multiple-choice score sheet
6. Marked score sheets (MCQ) of the students
7. Item bank questions
8. Calculation of Z-score and T-score
9. Calculations of item difficulty and item discrimination
4. Answer to the 30 multiple-choice questions
Section A
1. A    2. C    3. B    4. C    5. C    6. D    7. D    8. C    9. C    10. A
11. D   12. B   13. C   14. B   15. B   16. A   17. A   18. A   19. C   20. B
21. B   22. A   23. A   24. B   25. A   26. C   27. A   28. A   29. C   30. C
8. Calculation of Z-score and T-score
no   X    Z-score                        T-score
1    26   (26 - 21.8) / 2.19 = 1.92      10(1.92) + 50 = 69.2
2    25   (25 - 21.8) / 2.19 = 1.46      10(1.46) + 50 = 64.6
3    24   (24 - 21.8) / 2.19 = 1         10(1) + 50 = 60
4    23   (23 - 21.8) / 2.19 = 0.55      10(0.55) + 50 = 55.5
5    23   (23 - 21.8) / 2.19 = 0.55      10(0.55) + 50 = 55.5
6    23   (23 - 21.8) / 2.19 = 0.55      10(0.55) + 50 = 55.5
7    23   (23 - 21.8) / 2.19 = 0.55      10(0.55) + 50 = 55.5
8    23   (23 - 21.8) / 2.19 = 0.55      10(0.55) + 50 = 55.5
9    22   (22 - 21.8) / 2.19 = 0.09      10(0.09) + 50 = 50.9
10   22   (22 - 21.8) / 2.19 = 0.09      10(0.09) + 50 = 50.9
11   22   (22 - 21.8) / 2.19 = 0.09      10(0.09) + 50 = 50.9
12   22   (22 - 21.8) / 2.19 = 0.09      10(0.09) + 50 = 50.9
13   21   (21 - 21.8) / 2.19 = -0.37     10(-0.37) + 50 = 46.3
14   21   (21 - 21.8) / 2.19 = -0.37     10(-0.37) + 50 = 46.3
15   21   (21 - 21.8) / 2.19 = -0.37     10(-0.37) + 50 = 46.3
16   21   (21 - 21.8) / 2.19 = -0.37     10(-0.37) + 50 = 46.3
17   20   (20 - 21.8) / 2.19 = -0.82     10(-0.82) + 50 = 41.8
18   19   (19 - 21.8) / 2.19 = -1.28     10(-1.28) + 50 = 37.2
19   18   (18 - 21.8) / 2.19 = -1.74     10(-1.74) + 50 = 32.6
20   17   (17 - 21.8) / 2.19 = -2.19     10(-2.19) + 50 = 28.1
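The Z-score and T-score calculations in this appendix subtract the mean (21.8) from each raw score, divide by the standard deviation (2.19), and then apply T = 10Z + 50. The Python sketch below repeats that arithmetic for the 20 raw scores listed. It assumes that 2.19 is the sample standard deviation (n - 1 in the denominator), since that choice reproduces the figure used here, and it rounds at each step the way a hand calculation would. The variable names are illustrative only.

```python
import statistics

# Raw scores X of the 20 students, as listed in this appendix.
scores = [26, 25, 24, 23, 23, 23, 23, 23, 22, 22,
          22, 22, 21, 21, 21, 21, 20, 19, 18, 17]

mean = statistics.mean(scores)
# Assumption: sample standard deviation (n - 1), rounded to two decimals
# as in the hand calculation.
sd = round(statistics.stdev(scores), 2)

rows = []
for x in scores:
    z = round((x - mean) / sd, 2)   # Z = (X - mean) / SD
    t = round(10 * z + 50, 1)       # T = 10Z + 50
    rows.append((x, z, t))

print(f"mean = {mean}, SD = {sd}")
for number, (x, z, t) in enumerate(rows, start=1):
    print(f"{number:>2}  X = {x}  Z = {z:>5}  T = {t}")
```

Rounding the standard deviation and each Z-score before computing T keeps the output in step with the hand-worked values. Carrying full precision through instead can shift a few entries in the last decimal place.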
9. Calculations of item difficulty and item discrimination
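no   Item Difficulty, P         Item Discrimination, D
1    (7 + 4) / 20 = 0.55        (7 - 4) / 10 = 0.3
2    (1 + 1) / 20 = 0.1         (7 - 4) / 10 = 0.3
3    (9 + 6) / 20 = 0.75        (9 - 6) / 10 = 0.3
4    (9 + 9) / 20 = 0.9         (9 - 9) / 10 = 0
5    (10 + 9) / 20 = 0.95       (10 - 9) / 10 = 0.1
6    (8 + 8) / 20 = 0.8         (8 - 8) / 10 = 0
7    (4 + 3) / 20 = 0.35        (4 - 3) / 10 = 0.1
8    (9 + 5) / 20 = 0.7         (9 - 5) / 10 = 0.4
9    (10 + 10) / 20 = 1         (10 - 10) / 10 = 0
10   (9 + 2) / 20 = 0.55        (9 - 2) / 10 = 0.7
11   (6 + 5) / 20 = 0.55        (6 - 5) / 10 = 0.1
12   (10 + 10) / 20 = 1         (10 - 10) / 10 = 0
13   (10 + 10) / 20 = 1         (10 - 10) / 10 = 0
14   (10 + 10) / 20 = 1         (10 - 10) / 10 = 0
15   (9 + 10) / 20 = 0.95       (9 - 10) / 10 = -0.1
16   (1 + 4) / 20 = 0.25        (1 - 4) / 10 = -0.3
17   (10 + 10) / 20 = 1         (10 - 10) / 10 = 0
18   (10 + 7) / 20 = 0.85       (10 - 7) / 10 = 0.3
19   (3 + 3) / 20 = 0.3         (3 - 3) / 10 = 0
20   (8 + 8) / 20 = 0.8         (8 - 8) / 10 = 0
21   (10 + 6) / 20 = 0.6        (10 - 6) / 10 = 0.4
22   (5 + 3) / 20 = 0.4         (5 - 3) / 10 = 0.2
23   (6 + 4) / 20 = 0.5         (6 - 4) / 10 = 0.2
24   (10 + 8) / 20 = 0.9        (10 - 8) / 10 = 0.2
25   (10 + 8) / 20 = 0.9        (10 - 8) / 10 = 0.2
26   (10 + 10) / 20 = 1         (10 - 10) / 10 = 0
27   (4 + 2) / 20 = 0.3         (4 - 2) / 10 = 0.2
28   (10 + 8) / 20 = 0.9        (10 - 8) / 10 = 0.2
29   (8 + 9) / 20 = 0.85        (8 - 9) / 10 = -0.1
30   (9 + 9) / 20 = 0.9         (9 - 9) / 10 = 0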
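The calculations in this appendix add the number of correct answers in the upper and lower groups and divide by 20 to get the item difficulty, then subtract the two counts and divide by 10 to get the item discrimination. The Python sketch below expresses that arithmetic as two small functions. The function and parameter names are illustrative assumptions, and the default group sizes (10 students per group, 20 in total) are taken from the denominators used here.

```python
def item_difficulty(upper_correct: int, lower_correct: int,
                    total_students: int = 20) -> float:
    """P: proportion of all students who answered the item correctly."""
    return (upper_correct + lower_correct) / total_students


def item_discrimination(upper_correct: int, lower_correct: int,
                        group_size: int = 10) -> float:
    """D: difference between upper- and lower-group correct counts,
    divided by the size of one group."""
    return (upper_correct - lower_correct) / group_size


# Item 10: 9 correct in the upper group, 2 correct in the lower group.
p_10 = item_difficulty(9, 2)
d_10 = item_discrimination(9, 2)

# Item 16: 1 correct in the upper group, 4 correct in the lower group.
p_16 = item_difficulty(1, 4)
d_16 = item_discrimination(1, 4)

print(f"Item 10: P = {p_10}, D = {d_10}")
print(f"Item 16: P = {p_16}, D = {d_16}")
```

A negative D, as for item 16, means that more lower-group than upper-group students answered correctly.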
UNIVERSITI TEKNOLOGI MARA
KAMPUS BANDARAYA MELAKA
Prepared for:
DR. DAVID LOH ER FUU
PRINCIPLES OF TESTING AND EVALUATION (TSL 480)
Prepared by:
NOOR IZZATI MUHAMAD NASIR
2007297688
NOOR ALINA NAMAMI
2007297732
MUHAMMAD NABIL MUSTAFA
2007297686
ADI FARHAN GHAZALI
2007297782
Student of B. Ed. TESL
Faculty of Education
UiTM KAMPUS BANDARAYA MELAKA
20th April 2009