Date post: | 02-Dec-2014 |
Upload: | dennis-bautista |
CHAPTER 1
A Perspective on Educational Assessment,
Measurement, and Evaluation
Because teaching is meant to bring about learning, teachers need to be
thoroughly aware of the processes for determining how successful they are
in that task. They need to know whether their students are
successfully acquiring the knowledge, skills, and values inherent in their
lessons. For this reason, it is critical for beginning teachers to build a
repertoire of techniques for the measurement and evaluation of student
learning. This chapter is
geared towards equipping you with the basic concepts in educational
assessment, measurement, and evaluation.
Measurement, Assessment, and Evaluation
Measurement as used in education is the quantification of what
students learned through the use of tests, questionnaires, rating scales,
checklists, and other devices. A teacher who gives his class a
10-item quiz after a lesson on subject-verb agreement, for example, is
undertaking measurement of what the students learned in that
particular lesson.
Assessment, however, refers to the full range of information
gathered and synthesized by teachers about their students and their
classrooms (Arends, 1994). This information can be gathered in informal
ways, such as through observation or verbal exchange. It can also be
gathered through formal ways, such as assignments, tests, and written
reports or outputs.
While measurement refers to the quantification of students’
performance and assessment to the gathering and synthesizing of
information, evaluation is the process of making judgments, assigning value,
or deciding on the worth of students’ performance. Thus, when a teacher
assigns a grade to the score you obtained in a chapter quiz or term
examination, he is performing an evaluative act, because he places
value on the information gathered through the test.
Measurement answers the question, “How much does a student
learn or know?” Assessment looks into how much change has occurred in
the student’s acquisition of a skill, knowledge, or value before and after a
given learning experience. Since evaluation is concerned with making
judgments on the worth or value of a performance, it answers the question,
“How good, adequate, or desirable is it?” Measurement and assessment are,
therefore, both essential to evaluation.
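The three questions can be illustrated with a small sketch. This is only an illustration under stated assumptions, not a procedure from the cited sources: the scores, the function names, and the 75% standard are all hypothetical.

```python
# Measurement: raw scores on a 10-item quiz (hypothetical numbers).
TOTAL_ITEMS = 10
pre_score = 6    # score before the lesson
post_score = 9   # score after the lesson


def change_in_score(before, after):
    """Assessment, in the sense used above: how much change occurred."""
    return after - before


def judge(score, total, passing_fraction=0.75):
    """Evaluation: attach a value judgment to a score.

    The 75% standard is an assumption made for this example only.
    """
    if score / total >= passing_fraction:
        return "satisfactory"
    return "needs improvement"


gain = change_in_score(pre_score, post_score)
judgment = judge(post_score, TOTAL_ITEMS)
print(gain, judgment)
```

The raw scores alone say nothing about worth. The judgment appears only when a standard is applied, which is why measurement and assessment feed into evaluation rather than replace it.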
Educational Assessment: A Context for Educational
Measurement and Evaluation
As a framework for educational measurement and evaluation,
educational assessment is quite difficult to define. According to Stiggins
and his colleagues (1996), assessment is a method of evaluating
personality in which an individual, living in a group, meets and solves a
variety of lifelike problems. From the viewpoint of Cronbach, as cited by
Jaeger (1997), three principal features of assessment are identifiable: (1)
the use of a variety of techniques; (2) reliance on observations in structured
and unstructured situations; and (3) integration of information. This
definition and these features of assessment are applicable to a
classroom situation. The term personality in the definition of assessment
refers to an individual’s characteristics, which may be cognitive, affective,
or psychomotor. The classroom setting is essentially social and
provides both structured and unstructured phases. Problem-solving, too,
is a major learning task. Holistic appraisal of a learner, his or her
environment, and his or her accomplishments is the principal objective of
educational assessment.
Bloom (1970) has this to say on the process of educational
assessment:
Assessment characteristically starts with an analysis of the
criterion and the environment in which an individual lives, learns,
and works. It attempts to determine the psychological pressures the
environment creates, the role expected, and the demands and
pressures – their hierarchical arrangement, consistency, as well as
conflict. It then proceeds to the determination of the kinds of
evidence that are appropriate about the individuals who are placed
in this environment, such as their relevant strengths and
weaknesses, their needs and personality characteristics, their skills
and abilities.
From the foregoing description of the process of educational
assessment, it is very clear that educational assessment concerns itself
with the total educational setting and is a more inclusive term. This is
because it subsumes measurement and evaluation. It focuses not only on
the nature of the learner but also on what is to be learned and how it is to
be learned. In a real sense, it is diagnostic in intent or purpose. This is
because, through educational assessment, the strengths and
weaknesses of an individual learner can be identified and, at the same
time, the effectiveness of the instructional materials used and the
curriculum can be ascertained.
Assessments are continuously being undertaken in all educational
settings. Decisions are made about content and specific objectives, nature
of students and faculty, faculty morale and satisfaction, and the extent to
which student performances meet standards. Payne (2003) describes a
typical example of how assessments can be a basis for decision making:
1. The teacher reviews a work sample, showing some column
additions are in error and there are frequent carrying errors.
2. He / She assigns simple problems on proceeding pages, with
consistent addition errors in some number combinations, as
well as repeated errors in carrying from one column to
another.
3. He / She gives instruction through verbal explanation,
demonstration, trial and practice.
4. The student becomes successful in the calculations made in
each preparation step after direct teacher instruction.
5. The student returns to the original pages, completes them
correctly, and is monitored closely when new processes are
introduced.
From the foregoing example, it can be seen that there is a very close
association between assessment and instruction. The data useful in
decision-making may be derived from informal assessments, such as
observations from interactions, or from teacher-made tests.
Informed decision-making in education is very important owing to
the obvious benefits it can bring about (Linn, 1999). Foremost among
these benefits is the elevation of feelings of competence in the area of
academic skills. The perception of being able to function
effectively in society is likewise essential. Finally, the affective side of
development is equally important. Personal dimensions, like feelings of
self-worth and being able to adjust to people and cope with various situations,
lead to better overall life adjustment.
Purposes of Educational Assessment, Measurement and
Evaluation
Educational assessment, measurement and evaluation serve the
following purposes (Kellough et al., 1993):
Improvement of Student Learning – Knowing how well students
are performing in class can lead teachers to devise ways and means
of improving student learning.
Identification of Students’ Strengths and Weaknesses – Through
measurement, assessment, and evaluation, teachers can
single out their students’ strengths and weaknesses. Data on these
strengths and weaknesses can serve as bases for undertaking
reinforcement and / or enrichment activities for the students.
Assessment of the Effectiveness of a Particular Teaching
Strategy – Accomplishment of an instructional objective through the
use of a particular teaching strategy is important to teachers.
Competent teachers continuously evaluate their choice of strategies
on the basis of student achievement.
Appraisal of the Effectiveness of the Curriculum – Through
educational measurement, assessment, and evaluation, various
aspects of the curriculum are continuously evaluated by curriculum
committees on the basis of achievement test results.
Assessment and Improvement of Teaching Effectiveness –
Results of testing are used as a basis for determining teaching
effectiveness. Knowledge of the results of testing can provide school
administrators inputs on the instructional competence of teachers
under their charge. Thus, intervention programs to improve teaching
effectiveness can be undertaken by the principals or even
supervisors on account of the results of educational measurement
and evaluation.
Communication with and Involvement of Parents in Their
Children’s Learning – Results of educational measurement,
assessment, and evaluation are utilized by teachers in
communicating to parents their children’s learning difficulties.
Knowing how well their children are performing academically can
lead parents to forge a partnership with the school in improving and
enhancing student learning.
Types of Classroom Assessment
There are three general types of classroom assessment that teachers
engage in (Airisian, 1994): official, sizing up, and
instructional.
Official assessment is undertaken by teachers to carry out the
bureaucratic aspects of teaching, such as giving students grades at the
end of each marking period. This type of assessment can be done through
formal tests, term papers, reports, quizzes, and assignments. Evidence
sought by teachers in official assessment is mainly cognitive.
Sizing up assessment, however, is done to provide teachers with
information regarding the students’ social, academic, and behavioral
characteristics at the beginning of each school year. Information gathered
by teachers in this type of assessment provides a personality profile of
each student, which helps to boost instruction and foster communication and
cooperation in the classroom.
Instructional assessment is utilized in planning instructional
delivery and monitoring the progress of teaching and learning. It is
normally done daily throughout the school year. It, therefore, includes
decisions on lessons to teach, teaching strategy to employ, and
instructional materials and resources to use in the classroom.
Methods of Collecting Assessment Data
Airisian (1994) identified two basic methods of collecting information
about the learners and instruction, namely, paper-and-pencil and
observational techniques.
When learners put their answers to questions
and problems in writing, the assessment method is the paper-and-pencil technique.
Paper-and-pencil evidence that teachers are able to gather includes tests
taken by students, maps drawn, written reports, completed assignments,
and practice exercises. By examining this evidence, teachers are able
to gather information about their students’ progress.
There are two general types of paper-and-pencil techniques: supply
and selection. The supply type requires the student to produce or construct an
answer to the question. Book reports, essay questions, class projects, and
journal entries are examples of the supply type of paper-and-pencil
technique.
The selection type, on the other hand, requires the student to choose
the correct answer from a list of choices or options. Multiple choice,
matching, and alternate response tests are examples of this technique, as the
students answer questions by simply choosing an answer from a set of options
provided.
The second method teachers utilize is observation. This method
involves watching the students as they perform certain learning tasks like
speaking, reading, performing laboratory investigation and participating in
group activities.
Sources of Evaluative Information
To be able to make correct judgments about students’ performance,
there is a need for teachers to gather accurate information. Thus, teachers
have to be familiar with the different sources of evaluative information.
Cumulative Record. It holds all the information collected on
students over the years. It is usually stored in the principal’s office or
guidance office and contains such things as vital statistics, academic
records, conference information, health records, family data and scores on
tests of aptitude, intelligence, and achievement. It may also contain
anecdotal and behavioral comments from previous teachers. These
comments are useful in understanding the causes of the students’
academic and behavioral problems.
Personal Contact. It refers to the teacher’s daily interactions with
his / her students. A teacher’s observation of students as they work
and relax, as well as daily conversation with them, can provide valuable
clues that will be of great help in planning instruction. Observing students
not only tells the teacher how well students are doing but also allows him / her
to provide them with immediate feedback. Observational information is
available in the classroom as the teacher watches and listens to students
in various situations. Examples of these situations are as follows:
1. Oral Reading. Can the student read well or not?
2. Answering Questions. Does the student understand concepts?
3. Following Directions. Does the student follow specified
instructions?
4. Seatwork. Does the student stay on task?
5. Interest in the Subject. Does the student participate actively in
learning activities?
6. Using Instructional Materials. Does the student use the
material correctly?
Through accurate observations, a teacher can determine whether
the students are ready for the next lesson. He / She can also identify those
students who are in need of special assistance.
Analysis. A teacher’s analysis of the errors committed by
students can provide him / her with much information about their
attitude and achievement. Analysis can take place either during or
following instruction. Through analysis, the teacher will be able to identify
students’ learning difficulties immediately. Teachers should therefore file
samples of students’ work for discussion during parent-teacher
conferences.
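The carrying errors in the column-addition example described earlier in this chapter show what such an analysis can look like. The sketch below is only an illustration, not a procedure from the cited sources. It assumes a hypothetical work sample of two-addend addition problems and checks whether a wrong answer matches what a student would get by dropping every carry. The function names and the sample data are assumptions.

```python
def add_without_carrying(a, b):
    """Add two non-negative integers column by column, dropping every carry."""
    result, place = 0, 1
    while a > 0 or b > 0:
        column_sum = (a % 10) + (b % 10)
        result += (column_sum % 10) * place  # keep only the ones digit
        a //= 10
        b //= 10
        place *= 10
    return result


def classify_answer(a, b, student_answer):
    """Label one answer as correct, a likely carrying error, or another error."""
    if student_answer == a + b:
        return "correct"
    if student_answer == add_without_carrying(a, b):
        return "carrying error"
    return "other error"


# Hypothetical work sample: (first addend, second addend, student's answer).
work_sample = [(27, 45, 72), (38, 47, 75), (56, 29, 75), (14, 23, 38)]
diagnosis = [classify_answer(a, b, ans) for a, b, ans in work_sample]
print(diagnosis)
```

A pattern of repeated "carrying error" labels points to a specific difficulty that can be addressed through direct instruction, whereas scattered "other error" labels call for a closer look at the individual items.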
Open-ended Themes and Diaries. One technique that can be
used to provide information about students is by asking them to write about
their lives in and out of the school. Some questions that students can be
asked to react to are as follows:
1. What things do you like and dislike about school?
2. What do you want to become when you grow up?
3. What things have you accomplished which you are proud of?
4. What subjects do you find interesting? Uninteresting?
5. How do you feel about your classmates?
The use of diaries is another method for obtaining data for
evaluative purposes. A diary can consist of a record, written every 3 or 4
days, in which students write about their ideas, concerns, and feelings. An
analysis of students’ diaries often gives valuable evaluative information.
Conferences. Conferences with parents and the students’ previous
teachers can also provide evaluative information. Parents often have
information which can explain why students are experiencing academic
problems. Previous teachers can also describe students’ difficulties and
the techniques they employed in correcting them. Guidance counselors
can also be an excellent source of information. They can also shed light on
test results and personality factors, which might affect students’
performance in class.
Testing. Through testing, teachers can measure students’ cognitive
achievement, as well as their attitudes, values, feelings, and motor skills. It is
probably the most common measurement technique employed by teachers
in the classroom.
Types of Evaluation
Teachers need continuous feedback in order to plan, monitor, and
evaluate their instruction. Obtaining this feedback may take any of the
following types: diagnostic, formative, and summative.
Diagnostic evaluation is normally undertaken before instruction, in
order to assess students’ prior knowledge of a particular topic or lesson. Its
purpose is to anticipate potential learning problems and group / place
students in the proper course or unit of study. Placement of some
elementary school children in special reading programs based on a
reading comprehension test is an example of this type of evaluation.
Requiring entering college freshmen to enroll in Math Plus based on the
results of their entrance test in Mathematics is another example.
Diagnostic evaluation can also be called pre-assessment, since it
is designed to check the ability levels of the students in some areas so that
instructional starting points can be established. Through this type of
evaluation, teachers can obtain valuable information
concerning students’ knowledge, attitudes, and skills when they begin
studying a subject, which can be employed as a basis for remediation or
special instruction. Diagnostic evaluation can be based on teacher-made
tests, standardized tests, or observational techniques.
Formative evaluation is usually administered during the
instructional process to provide feedback to students and teachers on
how well the former are learning the lesson being taught. Results of this
type of evaluation permit teachers to modify instruction as needed. Remedial
work is normally done to remedy the deficiencies noted and bring the slow
learners to the level of their classmates or peers. Basically, formative
evaluation asks, “How are my students doing?” It uses pretests, homework,
seatwork, and classroom questions. Results of formative evaluation are
neither recorded nor graded but are used for modifying or adjusting
instruction.
Summative evaluation is undertaken to determine students’
achievement for grading purposes. Grades provide the teachers the
rationale for passing or failing students, based on a wide range of
accumulated behaviors, skills, and knowledge. Through this type of
evaluation, students’ accomplishments during a particular marking term
are summarized or summed up. It is frequently based on cognitive
knowledge, as expressed through test scores and written outputs.
Examples of summative evaluation are chapter tests, homework
grades, completed project grades, periodical tests, unit tests, and
achievement tests.
This type of evaluation answers the question, “how did my students
fare?” Results of summative evaluation can be utilized not only for judging
student achievement but also for judging the effectiveness of the teacher
and the curriculum.
Approaches to Evaluation
According to Escarilla and Gonzales (1990), there are two
approaches to evaluation, namely: norm-referenced and
criterion-referenced.
Norm-referenced evaluation is one wherein the performance of a
student in a test is compared with the performance of the other students
who took the same examination. The following are examples of
norm-referenced evaluation:
1. Karl’s score in the periodical examination is below the mean.
2. Cynthia ranked fifth in the unit test in Physics.
3. Rey’s percentile rank in the Math achievement test is 88.
Criterion-referenced evaluation, on the other hand, is an
approach to evaluation wherein a student’s performance is compared
against a predetermined or agreed-upon standard. Examples of this
approach are as follows:
1. Sid can construct a pie graph with 75% accuracy.
2. Yves scored 7 out of 10 in the spelling test.
3. Lito can encode an article with no more than 5 errors in
spelling.
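The contrast between the two approaches can be sketched in a few lines. This is a minimal illustration with hypothetical class scores. The percentile rank used here is one simple definition (the percentage of scores in the group that fall below the given score), and other definitions exist. The 75% standard and all function names are assumptions made for the example.

```python
from statistics import mean


def percentile_rank(score, group_scores):
    """Percentage of scores in the group that fall below the given score."""
    below = sum(1 for s in group_scores if s < score)
    return 100 * below / len(group_scores)


def norm_referenced(score, group_scores):
    """Interpret a score by comparing it with the scores of the group."""
    return {
        "above_mean": score > mean(group_scores),
        "rank": 1 + sum(1 for s in group_scores if s > score),
        "percentile_rank": percentile_rank(score, group_scores),
    }


def criterion_referenced(score, total, standard=0.75):
    """Interpret a score by comparing it with a predetermined standard."""
    return score / total >= standard


# Hypothetical scores of ten students on a 10-item test.
class_scores = [4, 5, 6, 6, 7, 8, 8, 9, 9, 10]
report = norm_referenced(7, class_scores)
meets_standard = criterion_referenced(7, 10)
print(report, meets_standard)
```

The same score of 7 yields two different kinds of statements. The norm-referenced report changes whenever the group changes, while the criterion-referenced result depends only on the standard that was set beforehand.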
REFERENCES
Airisian, P. W. (1994). Classroom Assessment, 2nd ed. New York:
McGraw-Hill, Inc.
Bloom, B. S. (1970). The Evaluation of Instruction: Issues and Problems.
New York: Holt, Rinehart & Winston.
Clark, J. & I. Starr (1977). Secondary School Teaching Methods. New
York: Macmillan Publishing Company.
Escarilla, E. R. & E. A. Gonzales (1990). Measurement and Evaluation in
Secondary Schools. Makati: Fund for Assistance to Private
Education (FAPE).
Jaeger, R. M. (1997). Educational Assessment: Trends and Practices.
New York: Holt, Rinehart & Winston.
Kellough, R. D., et al. (1993). Middle School Teaching Methods and
Resources. New York: Macmillan Publishing Company.
Payne, D. A. (2003). Measuring and Evaluating Educational Outcomes.
New York: Macmillan Publishing Company.
CHAPTER 2
Tests and Their Uses in Educational Assessment
The most common and important aspect of student evaluation in most
classrooms involves the tests teachers make and administer to their
students (Grondlund & Linn, 1990). Teachers, therefore, need to
understand the different types of tests and their uses in the assessment
and evaluation of the students’ learning. This chapter orients prospective
teachers on tests and their uses in education.
Test Defined
A test is a systematic procedure for measuring an individual’s
behavior (Brown, 1991). This definition implies that it has to be developed
following specific guidelines. It is a formal and systematic way of gathering
information about the learners’ behavior, usually through a
paper-and-pencil procedure (Airisian, 1989).
Through testing, teachers can measure students’ acquisition of
knowledge, skills, and values in any learning area in the curriculum. While
testing is the most common measurement technique teachers use in the
classroom, there are certain limitations in its use. As pointed out by
Moore (1992), tests cannot measure student motivation, physical
limitations, or environmental factors. The foregoing indicates that
testing is only one measure of students’ learning and achievement.
Uses of Tests
Tests serve many functions for school administrators, supervisors,
teachers, and parents alike (Arends, 1994; Escarilla & Gonzales, 1990).
School administrators utilize test results for making decisions
regarding the promotion or retention of students; improvement or
enrichment of the curriculum; and conduct of staff development programs
for teachers. Through test results, school administrators can also have a
clear picture of the extent to which the objectives of the school’s
instructional program are achieved.
Supervisors use test results in discovering learning areas needing
special attention and identifying teachers’ weaknesses and learning
competencies not mastered by the students. Test results can also provide
supervisors baseline data on curriculum revision.
Teachers, on the other hand, utilize tests for numerous purposes.
Through testing, teachers are able to gather information about the
effectiveness of instruction, give feedback to students about their progress,
and assign grades.
Parents, too, derive benefits from tests administered to their
children. Through test scores, they are able to determine how well their
sons and daughters are faring in school and how well the school is doing
its share in educating their children.
Types of Tests
Numerous types of tests are used in school. There are different
ways of categorizing tests, namely: ease of quantification of response,
mode of response, mode of administration, test constructor, mode of
interpreting results, and nature of response (Manarang & Manarang, 1983;
Louisell & Descamps, 1992).
As to mode of response, tests can be oral, written, or performance.
1. Oral Test – It is a test wherein the test taker gives his answer
orally.
2. Written Test – It is a test where answers to questions are
written by the test taker.
3. Performance Test – It is one in which the test taker creates
an answer or a product that demonstrates his knowledge or
skill, as in cooking and baking.
As to ease of quantification of response, tests can either be
objective or subjective.
1. Objective Test – It is a paper-and-pencil test wherein
students’ answers can be compared and quantified to yield a
numerical score. This is because it requires a convergent or
specific response.
2. Subjective Test – It is a paper-and-pencil test which is not
easily quantified, as students are given the freedom to write
their answer to a question, such as an essay test. Thus, the
answer to this type of test is divergent.
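Because an objective item calls for one specific response, scoring can be reduced to comparing each response with an answer key. The following is a minimal sketch of that idea. The answer key, the student's responses, and the function names are hypothetical, and the normalization step (ignoring letter case and stray spaces) is an assumption about what a scorer would treat as the same response.

```python
def normalize(response):
    """Treat responses that differ only in case or spacing as the same."""
    return " ".join(response.strip().lower().split())


def score_objective_test(responses, answer_key):
    """Count the items whose response matches the keyed answer.

    An unanswered item is treated as an empty response and earns no point.
    """
    return sum(
        1
        for item, keyed in answer_key.items()
        if normalize(responses.get(item, "")) == normalize(keyed)
    )


# Hypothetical five-item key and one student's responses (item 5 left blank).
answer_key = {1: "d", 2: "f", 3: "a", 4: "e", 5: "c"}
student = {1: "D", 2: "f ", 3: "g", 4: "e"}
score = score_objective_test(student, answer_key)
print(score)
```

Any scorer applying the same key arrives at the same number. No comparable key can be written for an essay item, since acceptable answers diverge.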
As to mode of administration, tests can either be individual or
group.
1. Individual Test – It is a test administered to one student at a
time.
2. Group Test – It is one administered to a group of students
simultaneously.
As to test constructor, tests can be classified into standardized and
unstandardized.
1. Standardized Test – It is a test prepared by an expert or
specialist. This type of test samples behavior under uniform
procedures. Questions are administered to students with the
same directions and time limits. Results in this kind of test are
scored following a detailed procedure based on its manual
and interpreted based on specified norms or standards.
2. Unstandardized Test – It is one prepared by teachers for use
in the classroom, with no established norms for scoring and
interpretation of results. It is constructed by a classroom
teacher to meet a particular need.
As to the mode of interpreting results, tests can either be
norm-referenced or criterion-referenced.
1. Norm-referenced Test – It is a test that evaluates a
student’s performance by comparing it to the performance of a
group of students on the same test.
2. Criterion-referenced Test – It is a test that measures a
student’s performance against an agreed-upon or
pre-established level of performance.
As to the nature of the answer, tests can be categorized into the
following types: personality, intelligence, aptitude, achievement,
summative, diagnostic, formative, sociometric, and trade.
1. Personality Test – It is a test designed for assessing some
aspects of an individual’s personality. Some areas tested in
this kind of test include the following: emotional and social
adjustment; dominance and submission; value orientation;
disposition; emotional stability; frustration level; and degree of
introversion or extroversion.
2. Intelligence Test – It is a test that measures the mental ability
of an individual.
3. Aptitude Test – It is a test designed for the purpose of
predicting the likelihood of an individual’s success in a
learning area or field of endeavor.
4. Achievement Test – It is a test given to students to determine
what they have learned from formal instruction in school.
5. Summative Test – It is a test given at the end of instruction to
determine students’ learning and assign grades.
6. Diagnostic Test – It is a test administered to students to
identify their specific strengths and weaknesses in past and
present learning.
7. Formative Test – It is a test given to improve teaching and
learning while it is going on. A test given after teaching the
lesson for the day is an example of this type of test.
8. Sociometric Test – It is a test used in discovering learners’
likes and dislikes, preferences, and their social acceptance, as
well as social relationships existing in a group.
9. Trade Test – It is a test designed to measure an individual’s
skill or competence in an occupation or vocation.
CHAPTER 3
Assessment of Learning in the Cognitive Domain
Learning and achievement in the cognitive domain are usually
measured in school through the use of paper-and-pencil tests (Oliva,
1988). Teachers have to measure students’ achievement in all the levels of
the cognitive domain. Thus, they need to be cognizant of the procedures in
the development of the different types of paper-and-pencil tests. This
chapter is focused on acquainting prospective teachers with methods and
techniques of measuring learning in the cognitive domain.
Behaviors Measured and Assessed in the Cognitive Domain
There are three domains of behavior measured and assessed in
schools. The most commonly assessed, however, is the cognitive domain.
The cognitive domain deals with the recall or recognition of knowledge and
the development of intellectual abilities and skills. It is divided into six
hierarchical levels, namely: knowledge,
comprehension, application, analysis, synthesis, and evaluation.
1. Knowledge Level: behaviors related to recognizing and
remembering facts, concepts, and other important data on any
topic or subject.
2. Comprehension Level: behaviors associated with the
clarification and articulation of the main idea of what students are
learning.
3. Application Level: behaviors that have something to do with
problem-solving and expression, which require students to
apply what they have learned to other situations or cases in their
lives.
4. Analysis Level: behaviors that require students to think critically,
such as looking for motives, assumptions, cause-effect
relationships, differences and similarities, hypotheses, and
conclusions.
5. Synthesis Level: behaviors that call for creative thinking, such
as combining elements in new ways, planning original
experiments, creating original solutions to a problem and building
models.
6. Evaluation Level: behaviors that necessitate judging the value
or worth of a person, object, or idea or giving opinion on an issue.
Preparing for Assessment of Cognitive Learning
Prior to the construction of a paper-and-pencil test to be used in the
measurement of cognitive learning, teachers have to answer the following
questions (Airisian, 1994): what should be tested; what emphasis to give
to the various objectives taught; whether to administer a paper-and-pencil
test or observe each student directly; how long the test should take; and
how best to prepare students for testing.
What Should Be Tested. Identification of the information, skills, and
behaviors to be tested is the first important decision that a teacher has to
make. Knowledge of what is to be tested will enable a teacher to develop
an appropriate test for the purpose. The basic rule to remember, however,
is that testing emphasis should parallel teaching emphasis.
How to Gather Information About What to Test. A teacher has to
decide whether he should give a paper-and-pencil test or simply gather
information through observation. Should he decide to use a paper-and-pencil
test, he has to construct the test items; if he decides to use observation of
students’ performance of the
targeted skill, then he has to develop appropriate devices to use in
recording his observations. Decisions on how to gather information about
what to test depend on the objective or the nature of the behavior to be
tested.
How Long the Test Should Be. The answer to the aforementioned
question depends on the following factors: age and attention span of the
students; and type of questions to be used.
How Best to Prepare Students for Testing. To prepare students
for testing, Airisian (1994) recommends the following measures: (1)
providing learners with good instruction; (2) reviewing students before
testing; (3) familiarizing students with question formats; (4) scheduling the
test; and (5) providing students information about the test.
Assessing Cognitive Learning
Teachers use two types of tests in assessing student learning in the
cognitive domain: objective test and essay test (Reyes, 2000). An objective
test is a kind of test wherein there is only one answer to each item. On the
other hand, an essay test is one wherein the test taker has the freedom to
respond to a question based on how he feels it should be answered.
Types of Objective Tests
There are generally two types of objective tests: supply type and
selection type (Carey, 1995). In the supply type, the student constructs
his / her own answer to each question. Conversely, the student chooses
the right answer to each item in the selection type of objective test.
Supply Types of Objective Tests. The following types of tests fall
under the supply type of test: completion drawing type, completion
statement type, correction type, identification type, simple recall type, and
short explanation type (Ebel & Frisbie, 1998).
Completion Drawing Type – an incomplete drawing is
presented which the student has to complete.
Example: In the following food web, draw arrow lines
indicating which organisms are consumers and
which are producers.
Completion Statement Type – an incomplete sentence is
presented and the student has to complete it by filling in the
blank.
Example: The capital city of the Philippines is
__________________.
Correction Type – a sentence with an underlined word or phrase
is presented, which the student has to replace to make the sentence correct.
Example: Change the underlined word / phrase to make
each of the following statements correct. Write
your answer on the space before each number.
__________ 1. The theory of evolution was popularized by
Gregor Mendel.
__________ 2. Hydrography is the study of oceans and ocean
currents.
Identification Type – a brief description is presented and the
student has to identify what it is.
Example: To what does each of the following refer? Write
your answer on the blank before each number.
__________ 1. A flat representation of all curved surfaces of
the earth.
__________ 2. The transmission of parents’ characteristics
and traits to their offspring.
Simple Recall Type – a direct question is presented for the
student to answer using a word or phrase.
Example: What is the product of two negative numbers?
Who is the national hero of the Philippines?
Short Explanation Type – similar to an essay test but
requires a short answer.
Example: Explain in a complete sentence why the
Philippines was not really discovered by
Magellan.
Selection Types of Objective Test. Included in the category of
selection type are the arrangement type, grouping type, matching type, multiple choice type,
alternative response type, key list test, and interpretive exercise.
Arrangement Type – Terms or objects are to be arranged by
the students in a specified order.
Example 1: Arrange the following events chronologically by
writing the letters A, B, C, D, E on the spaces
provided.
_______ Glorious Revolution _______ Russian Revolution
_______ American Revolution _______ French Revolution
_______ Puritan Revolution
Example 2: Arrange the following planets according to their
nearness to the sun, by using numbers, 1, 2, 3,
4, 5.
_______ Pluto _______ Jupiter _______ Saturn
_______ Venus _______ Mars
Matching Type – A list of numbered items is matched with a list
of lettered choices.
Example: Match the country in Column 1 with its capital city in
Column 2. Write letters only.
Column 1                        Column 2
________ 1. Philippines         a. Washington D. C.
________ 2. Japan               b. Jeddah
________ 3. United States       c. Jerusalem
________ 4. Great Britain       d. Manila
________ 5. Israel              e. London
                                f. Tokyo
                                g. New York
Multiple Choice Type – this type contains a question,
problem or unfinished sentence followed by several responses.
Example: The study of value is (a) axiology (c) epistemology
(b) logic (d) metaphysics.
Alternative Response Type – A test wherein there are only
two possible answers to the question. The true-false format is a
form of alternative response type. Variations on the true-false format
include yes-no, agree-disagree, and right-wrong.
Example: Write True, if the statement is true; False, if it is false.
_________ 1. Lapulapu was the first Asian to repulse European
colonizers in Asia.
_________ 2. Magellan’s expedition to the Philippines led to the
first circumnavigation of the globe.
_________ 3. The early Filipinos were uncivilized before the
Spanish conquest of the archipelago.
_________ 4. The Arabs introduced Islam in Southern
Philippines.
Key List Test – A test wherein the student has to examine
paired concepts based on a specified set of criteria (Olivia, 1998).
Example: Examine the paired items in Column 1 and Column 2.
On the blank before each number, write:
A = If the item in Column 1 is an example of the item in Column 2;
B = If the item in Column 1 is a synonym of the item in Column 2;
C = If the item in Column 2 is the opposite of the item in Column 1; and
D = If the items in Columns 1 and 2 are not related in any way.
         Column 1               Column 2
_____ 1. capitalism             economic system
_____ 2. labor intensive        capital intensive
_____ 3. planned economy        command economy
_____ 4. opportunity cost       demand and supply
_____ 5. free goods             economic goods
Interpretive Exercise – It is a form of a multiple choice type
of test that can assess higher cognitive behaviors. According to
Airasian (1994) and Mitchell (1992), an interpretive exercise provides
students with some information or data followed by a series of
questions on that information. In responding to the questions in
an interpretive exercise, the students have to analyze, interpret,
or apply the material provided, like a map, excerpt of a story,
passage of a poem, data matrix, table or cartoon.
Example: Examine the data on child labor in Europe during the
period immediately after the Industrial Revolution in
the continent. Answer the questions given below
by encircling the letter of your choice.
TABLE 1
Child Labor in the Years Right After the Industrial
Revolution in Europe
Year      Number of Child Laborers
1750      1800
1760      3000
1770      5000
1780      3400
1790      1200
1800      600
1820      150
1. Child labor was most widely used in
____________.
a. 1750 c. 1770
b. 1760 d. 1780
2. As industrialization became rapid, what year indicated a
sudden increase in the number of child laborers?
a. 1760 c. 1780
b. 1770 d. 1790
3. Labor unions and government policies were responsible
for addressing the problems of child labor. In what year was this evident?
a. 1780 c. 1800
b. 1790 d. 1820
Essay Test
This type of test presents a problem or question, and the student
composes a response in paragraph form, using his or her own words and
ideas. There are two forms of the essay test: brief or restricted; and
extended.
Brief or Restricted Essay Test – This form of the essay test
requires a limited amount of writing or requires that a given
problem be solved in a few sentences.
Example: Why did early Filipino revolts fail? Cite and explain
2 reasons.
Extended Essay Test – This form of the essay test requires a
student to present his answer in several paragraphs or pages of
writing. It gives students more freedom to express ideas and
opinions and use synthesizing skills to change knowledge into a
creative idea.
Example: Explain your position on the issue of charter change
in the Philippines.
According to Reyes (2000) and Gay (1985), the essay test is
appropriate to use when learning outcomes cannot be adequately
measured by objective test items. Nevertheless, all levels of cognitive
behaviors can be measured with the use of the essay test as shown below.
Knowledge Level – Explain how Siddhartha Gautama became
the Buddha.
Comprehension Level – What does it mean when a person
has crossed the Rubicon?
Application Level – Cite three instances showing the
application of the Law of Supply and Demand.
Analysis Level – Analyze the annual budget of your college
as to categories of funds, sources of funds, major
expenditures; and needs of your college.
Synthesis Level – Discuss the significance of the People
Power Revolution in the restoration of democracy in the
Philippines.
Evaluation Level – Are you in favor of the political platform of
the People’s Reform Party? Justify your answer.
Choosing the type of test depends on the teacher’s purpose and the
amount of time to be spent for the test. As a general rule, teachers must
create specific tests that will allow students to demonstrate targeted
learning competencies.
CHAPTER 4
An Introduction to the Assessment of Learning in the
Psychomotor and Affective Domains
As pointed out in the previous chapter, there are three domains of
learning objectives that teachers have to assess. While it is true that
achievement in the cognitive domain is what teachers measure most
frequently, students’ growth in the non-cognitive domains of learning should
also be given equal emphasis. This chapter expounds different ways by
which learning in the psychomotor and affective domains can be assessed
and evaluated.
Levels of Learning in the Psychomotor Domain
The psychomotor domain of learning is focused on processes and
skills involving the mind and the body (Eby & Kujawa, 1994). It is the
domain of learning which classifies objectives dealing with physical
movement and coordination (Arends, 1994; Simpson, 1966). Thus,
objectives in the psychomotor domain require significant motor
performance. Playing a musical instrument, singing a song, drawing,
dancing, putting a puzzle together, reading a poem and presenting a
speech are examples of skills developed in the aforementioned domain of
learning.
There are three levels of psychomotor learning: imitation,
manipulation and precision (Gronlund, 1970).
Imitation is the ability to carry out the basic rudiments of a skill
when given directions and under supervision. At this level the
total act is not performed skillfully. Timing and coordination of
the act are not yet refined.
Manipulation is the ability to perform a skill independently.
The entire skill can be performed in sequence. Conscious
effort is no longer needed to perform the skill, but complete
accuracy has not been achieved yet.
Precision is the ability to perform an act accurately, efficiently,
and harmoniously. Complete coordination of the skill has been
acquired. The skill has been internalized to such an extent that it
can be performed unconsciously.
Based on the foregoing list of objectives, it can be noted that these
objectives range from simple reflex reactions to complex actions, which
communicate ideas or emotions to others. Moreover, these objectives
serve as a reminder to every teacher that students under his charge have
to learn a variety of skills and be able to think and act in simple and
complex ways.
Measuring the Acquisition of Motor and Oral Skills
There are two approaches that teachers can use in measuring the
acquisition of motor and oral skills in the classroom: observation of student
performance and evaluation of student products (Gay, 1990).
Observation of Student Performance is an assessment approach
in which the learner does the desired skill in the presence of the teacher.
For instance, in a Physical Education class, the teacher can directly observe
how male students dribble and shoot the basketball. In this approach, the
teacher observes the performance of a student, gives feedback, and keeps
a record of his performance, if appropriate.
Observation of student performance can either be holistic or
atomistic (Louisell & Descamps, 1992). Holistic observation is employed
when the teacher gives a score or feedback based on pre-established
prototypes of how an outstanding, average, or deficient performance looks.
Prior to the observation, the teacher describes the different levels of
performance.
A teacher, for example, who requires his students to make an oral
report on research they undertook describes the factors which go into an
ideal presentation. What the teacher may consider in grading the report
includes the following: knowledge of the topic; organization of the
presentation; enunciation; voice projection; and enthusiasm.
The ideal presentation has to be described, and the teacher has to comment on
each of these factors. A student whose presentation closely matches the
ideal described by the teacher would receive a perfect mark.
The second type of observation that can be utilized is atomistic or
analytic. This type of observation requires that a task analysis be
conducted in order to identify the major subtasks involved in the student
performance. For example, in dribbling the ball, the teacher has to identify
the movements necessary to perform the task. Then, he has to develop a
checklist which enumerates the movements necessary to the performance
of the task. These movements are demonstrated by the teacher. As students
perform the dribbling of the ball, the teacher assigns checkmarks for each
of the various subtasks. After the student has performed the specified
action, all checkmarks are considered and an assessment of the
performance is made.
Evaluation of Student Products is another approach that teachers
can use in the assessment of students’ mastery of skills. For example,
projects in different learning areas may be utilized in assessing students’
progress. Student products include drawings, models, construction paper
products, etc.
The same principles involved in holistic and atomistic observations
apply to the evaluation of projects. The teacher has to identify prototypes
representing different levels of performance for a project or do a task
analysis and assign scores by subtasks. In either case, the students have to
be informed of the criteria and procedures to be used in the assessment of their
work.
Assessing Performance through Student Portfolios
Portfolio assessment is a new form of assessing students’
performance (Mitchell, 1992). A portfolio is simply a collection of a student’s
work (Airasian, 1994). It is used in the classroom to gather a series of
students’ performances or products that show their accomplishment and /
or improvement over time. It consists of carefully selected samples of the
students’ work indicating their growth and development in some curricular
goals. The following can be included in a student’s portfolio: representative
pieces of his / her writing; solved math problems; projects and puzzles
completed; artistic creations; videotapes of performance; and even tape
recordings.
Wolf (1989) says that portfolios can be used for the following
purposes:
Providing examples of student performance to parents;
Showing student improvement over time;
Providing a record of students’ typical performances to pass on to
the next year’s teacher;
Identifying areas of the curriculum that need improvement;
Encouraging students to think about what constitutes good
performance in a learning area; and
Grading students.
According to Airasian (1994), there are four steps to consider in
making use of this type of performance assessment: (1) establishing a
clear purpose; (2) setting performance criteria; (3) creating an appropriate
setting; and (4) forming scoring criteria or predetermined rating.
Purpose is very important in carrying out portfolio assessment. Thus,
there is a need to determine beforehand the objective of the assessment
and the guidelines for student products that will be included in the portfolio
prior to compilation.
While teachers need to collaborate with their colleagues in setting
common criteria, it is crucial that they involve their students in setting
standards of performance. This will enable the latter to claim ownership
over their performance.
Portfolio assessment also needs to consider the setting in which
students’ performance will be gathered. Shall it be a written portfolio? Shall
it be a portfolio of oral or physical performances, science experiments,
artistic productions and the like? Setting has to be looked into since
arrangements have to be made on how desired performance can be
properly collected.
Lastly, scoring methods and judging students’ performance are
required in portfolio assessment. Scoring students’ portfolios, however, is
time consuming as a series of documents and performances has to be
scrutinized and summarized. Rating scales, anecdotal records, and
checklists can be used in scoring students’ portfolios. The content of a
portfolio, however, can be reported in the form of a narrative.
Tools for Measuring Acquisition of Skills
As pointed out previously, observation of student performance and
evaluation of student products are ways by which teachers can measure
the students’ acquisition of motor and oral skills. To overcome the problem
relating to validity and reliability, teachers can use rating scales, checklists
or other written guides to help them come up with unbiased or objective
observations of student performance.
A rating scale is a series of categories arranged in
order of quality. It can be helpful in judging skills, products, and
procedures. According to Reyes (2000), there are three steps to follow in
constructing a rating scale.
Identify qualities of the product to be assessed. Create a scale
for each quality or performance aspect.
Arrange the scales either from positive to negative or vice
versa.
Write directions for accomplishing the rating scale.
Following is an example of a rating scale for judging a student
teacher’s presentation of a lesson.
Rating Scale for Lesson Presentation
Student Teacher ___________________________ Date ______________
Subject _____________________________________________________
Rate the student teacher on each of the skill areas specified below.
Use the following code: 5 = Outstanding; 4 = Very satisfactory; 3 =
Satisfactory; 2 = Fair; 1 = Needs improvement. Encircle the number
corresponding to your rating.
5 4 3 2 1 Audience contact
5 4 3 2 1 Enthusiasm
5 4 3 2 1 Speech quality and delivery
5 4 3 2 1 Involvement of the audience
5 4 3 2 1 Use of non – verbal communication
5 4 3 2 1 Use of questions
5 4 3 2 1 Directions and refocusing
5 4 3 2 1 Use of reinforcement
5 4 3 2 1 Use of teaching aids and instructional materials
A checklist differs from a rating scale as it indicates the presence or
absence of specified characteristics. It is basically a list of criteria upon
which a student’s performance or end product is to be judged. The
checklist is used by simply checking off the criteria items that have been
met.
Responses on a checklist vary. A response can be a simple check mark
indicating that an action took place. For instance, a checklist for observing
student participation in the conduct of a group experiment may appear like
this:
1. Displays interest in the experiment.
2. Helps in setting up the experiment.
3. Participates in the actual conduct of the experiment.
4. Makes worthwhile suggestions.
The rater would simply check the items that occurred during the conduct
of the group experiment.
Another type of checklist requires a yes or no response. The yes is
checked when the action is done satisfactorily; the no is checked when the
action is done unsatisfactorily. Below is an example of this type of
checklist.
Performance Checklist for a Speech Class
Name ___________________________________ Date ______________
Check Yes or No as to whether the specified criterion is met.
Did the student:                            YES                NO
1. Use correct grammar?                _______________   ______________
2. Make clear presentation?            _______________   ______________
3. Stimulate interest?                 _______________   ______________
4. Use clear direction?                _______________   ______________
5. Demonstrate poise?                  _______________   ______________
6. Manifest enthusiasm?                _______________   ______________
7. Use appropriate voice projection?   _______________   ______________
Levels of Learning in the Affective Domain
Objectives in the affective domain are concerned with emotional
development. Thus, the affective domain deals with attitudes, feelings, and
emotions. Learning intent in this domain of learning is organized according
to the degree of internalization. Krathwohl and his colleagues (1964)
identified four levels of learning in the affective domain.
Receiving involves being aware of and being willing to freely attend
to a stimulus.
Responding involves active participation. It involves not only freely
attending to a stimulus but also voluntarily reacting to it in some way.
It requires physical, active behavior.
Valuing refers to voluntarily giving worth to an object, phenomenon
or stimulus. Behaviors at this level reflect a belief, appreciation, or
attitude.
Commitment involves building an internally consistent value system
and freely living by it. A set of criteria is established and applied in
making choices.
Evaluating Affective Learning
Learning in the affective domain is difficult and sometimes
impossible to assess. Attitudes, values and feelings can be intentionally
concealed, because learners have the right not to show their
personal feelings and beliefs if they choose to do so. Although the
achievement of objectives in the affective domain is important in the
educational system, it cannot be measured or observed in the same way as
objectives in the cognitive and psychomotor domains.
Teachers attempt evaluating affective outcomes when they
encourage students to express feelings, attitudes, and values about topics
discussed in class. They can observe students and may find evidence of
some affective learning.
Although it is difficult to assess learning in the affective domain,
there are some tools that teachers can use in assessing learning in this
area. Some of these tools are the following: attitude scale; questionnaire;
simple projective techniques; and self-expression techniques (Escarilla &
Gonzales, 1990; Ahmann & Glock, 1991).
Attitude Scale is a form of rating scale containing statements
designed to gauge students’ feelings on an attitude or behavior. An
example of an attitude scale is shown below.
An Attitude Scale for Determining Interest in Mathematics
Name __________________________________ Date _______________
Each of the statements below expresses a feeling toward
mathematics. Rate each statement on the extent to which you agree. Use
the following response code: SA = Strongly Agree; A = Agree; U = Uncertain; D =
Disagree; SD = Strongly Disagree.
1. I enjoy my assignments in Mathematics.
2. The book we are using in the subject is interesting.
3. The lessons and activities in the subject challenge me to
give my best.
4. I do not find exercises during our lesson boring.
5. Mathematical problems encourage me to think critically.
6. I feel at ease during recitation and board work.
7. My grade in the subject is commensurate to the effort I
exert.
8. My teacher makes the lesson easy to understand.
9. I would like to spend more time in this subject.
10. I like the way our teacher presents the steps in solving
mathematical problems.
Response to the items is based on the response code provided in
the attitude scale. A value ranging from 1 to 5 is assigned to the options
provided. The value of 5 is usually assigned to the option “strongly
agree” and 1 to the option “strongly disagree.” When a statement is
negative, however, the assigned values are usually reversed. The
composite score is determined by adding the scale values and dividing the
sum by the number of statements or items.
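The scoring rule just described can be sketched in a few lines of Python. This is only an illustration, not a prescribed procedure: the function name, the sample responses, and the choice of which item counts as a negative statement are all assumptions made for the example.

```python
def composite_attitude_score(responses, negative_items=()):
    """Average the 1-5 scale values, reversing negatively worded items.

    responses: dict mapping item number -> scale value (1 to 5),
               where 5 = strongly agree and 1 = strongly disagree
    negative_items: item numbers whose values are reversed (assumed input)
    """
    total = 0
    for item, value in responses.items():
        if not 1 <= value <= 5:
            raise ValueError(f"item {item}: value {value} is outside 1-5")
        # Reverse-score a negative statement: 5 -> 1, 4 -> 2, 3 -> 3, and so on.
        total += (6 - value) if item in negative_items else value
    # Composite score = sum of scale values / number of items.
    return total / len(responses)


# Hypothetical responses of one student to a five-item scale,
# with item 4 treated as a negatively worded statement.
answers = {1: 5, 2: 4, 3: 4, 4: 2, 5: 3}
print(composite_attitude_score(answers, negative_items={4}))
```

A composite near 5 suggests a favorable attitude and a composite near 1 an unfavorable one, under the usual value assignment stated above.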
Questionnaire can also be used in evaluating attitudes, feelings,
and opinions. It requires students to examine themselves and react to a
series of statements about their attitudes, feelings, and opinions. The
response style for a questionnaire can take any of the following forms:
checklist type, semantic differential, and Likert scale.
The Checklist type of response provides the students a list of
adjectives for describing or evaluating something and requires them to
check those that apply. For example, a checklist questionnaire on
students’ attitudes in a science class may include the following:
This class is ________________ boring.
________________ exciting.
________________ interesting.
________________ unpleasant.
________________ highly informative.
I find Science ________________ fun.
________________ interesting.
________________ very tiring.
________________ difficult.
________________ easy.
The scoring of this type of test is simple. Subtract the number of
negative statements checked from the number of positive statements
checked.
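As a rough sketch of this scoring rule, the Python snippet below counts the positive and negative adjectives a student checked and subtracts one count from the other. Which adjectives count as positive or negative is an assumption made for this example; a teacher would decide that when writing the questionnaire.

```python
# Assumed classification of the adjectives used in the sample checklist.
POSITIVE = {"exciting", "interesting", "highly informative", "fun", "easy"}
NEGATIVE = {"boring", "unpleasant", "very tiring", "difficult"}


def checklist_score(checked):
    """Return (number of positive checks) - (number of negative checks).

    checked: list of the adjectives the student ticked, one entry per tick
    """
    positives = sum(1 for word in checked if word in POSITIVE)
    negatives = sum(1 for word in checked if word in NEGATIVE)
    return positives - negatives


# Hypothetical student who ticked three positive adjectives and one negative.
print(checklist_score(["exciting", "interesting", "boring", "fun"]))
```

A positive result points to a generally favorable attitude, a negative result to an unfavorable one, and zero to a mixed response.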
Semantic differential is another type of response on a
questionnaire. It is usually a five-point scale showing polar or opposite
adjectives. It is designed so that attitudes, feelings, and opinions can be
measured by degrees from very favorable to very unfavorable. Given
below is an example of a questionnaire employing the aforementioned
response type.
Working with my group members is:
Interesting _____ : _____ : _____ : _____ : _____ Boring
Challenging _____ : _____ : _____ : _____ : _____ Difficult
Fulfilling _____ : _____ : _____ : _____ : _____ Frustrating
The composite score on the total questionnaire is determined by
averaging the scale values given to the items included in the
questionnaire.
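The averaging step can be illustrated with a short Python sketch. It assumes, purely for illustration, that the favorable adjective is printed on the left and that the blank nearest to it is worth 5 points; the function name and the sample marks are hypothetical.

```python
def semantic_differential_score(marks, points=5):
    """Average the scale values of a semantic differential questionnaire.

    marks: dict mapping an adjective pair -> position of the student's mark,
           counted from the left (1 = nearest the favorable adjective)
    points: number of blanks on the scale (five in the sample above)
    """
    values = []
    for pair, position in marks.items():
        if not 1 <= position <= points:
            raise ValueError(f"{pair}: position {position} is off the scale")
        # Assumed convention: the leftmost blank gets the highest value.
        values.append(points + 1 - position)
    return sum(values) / len(values)


# Hypothetical marks for the three adjective pairs in the sample item.
marks = {
    "Interesting-Boring": 1,
    "Challenging-Difficult": 2,
    "Fulfilling-Frustrating": 3,
}
print(semantic_differential_score(marks))
```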
The Likert scale is one of the frequently used styles of response in
attitude measurement. It is oftentimes a five-point scale that links the options
“strongly agree” and “strongly disagree”. An example of this kind of
response is shown below.
A Likert Scale for Assessing Students’ Attitude Towards
Leadership Qualities of Student Leaders
Name ____________________________________ Date _____________
Read each statement carefully. Decide whether you agree or
disagree with each of them. Use the following response code: 5 = Strongly
Agree; 4 = Agree; 3 = Undecided; 2 = Disagree; 1 = Strongly Disagree.
Write your response on the blank before each item.
Student leaders:
1. Have to work for the benefit of the students.
2. Should set example of good behavior to the
members of the organization.
3. Need to help the school in implementing campus
rules and regulations.
4. Have to project a good image of the school in the
community.
5. Must speak constructively of the school’s teachers
and administrators.
Scoring of a Likert scale is similar to the scoring of an attitude scale
earlier presented in this chapter.
Simple projective techniques are usually used when a teacher
wants to probe deeper into the student’s feelings and attitudes. Escarilla
and Gonzales (1990) say that there are three types of simple projective
techniques that can be used in the classroom, namely: word association,
unfinished sentences, and unfinished story.
In word association, the student is given a word and asked to
mention what comes to his / her mind upon hearing it. For example, what
comes to your mind upon hearing the word corruption?
In an unfinished sentence, the students are presented with partial
sentences and are asked to complete them with words that best express
their feelings. For instance:
Given the chance to choose, I _____________________________.
I am happy when _______________________________________.
My greatest failure in life was ______________________________.
In an unfinished story, a story with no ending is deliberately
presented to the students, which they have to finish or complete. Through
this technique, the teacher will be able to sense students’ worries,
problems, and concerns.
Another way by which affective learning can be assessed is through
the use of self-expression techniques. Through these techniques,
students are provided the opportunity to express their emotions and views
about issues, themselves, and others. Self-expression techniques may
take any of the following forms: log book of daily routines or activities,
diaries, essays and other written compositions or themes, and
autobiographies.
CHAPTER REVIEW
1. What is meant by psychomotor learning? What are the levels of
learning under the psychomotor domain? Explain each.
2. What are the two general approaches in measuring the acquisition
of motor and oral skills? Differentiate each.
3. What are the guidelines to observe in undertaking atomistic and
holistic observation?
4. What is portfolio assessment? What are the advantages of using this
type of assessment in evaluating student performance and student
products?
5. What are the guidelines to observe in using portfolio assessment in
the classroom?
6. What are the tools teachers can use in measuring students’
acquisition of motor and oral skills? Briefly define each.
7. What do we mean by affective learning? What are the different
levels of affective learning? Describe each briefly.
8. What are the techniques teachers can employ in evaluating affective
learning? Discuss each very briefly.
CHAPTER 5
Constructing Objective Paper-and-Pencil Tests
Constructing paper-and-pencil tests is a professional skill.
Becoming proficient at it takes study and practice. Owing to the
recognized importance of a testing program, a prospective teacher has to
assume this task seriously and responsibly. He / She needs to be familiar
with the different types of test items and how best to write them. This
chapter seeks to equip prospective teachers with the skill in constructing
objective paper-and-pencil tests.
General Principles of Testing
Ebel and Frisbie (1999) listed five basic principles that should guide
teachers in measuring learning and in constructing their own test. These
principles are discussed below.
Measure all instructional objectives. The test a teacher writes
should be congruent with all the learning objectives focused in class.
Cover all learning tasks. A good test is not focused only on one
type of objective. It must be truly representative of all targeted
learning outcomes.
Use appropriate test items. Test items utilized by a teacher have to
be in consonance with the learning objectives to be measured.
Make test valid and reliable. Teachers have to see to it that the
test they construct measures what it purports to measure. Moreover,
they need to ensure that the test will yield consistent results for the
students taking it for the second time.
Use test to improve learning. Test scores obtained by the students
can serve as springboards for the teachers to re-teach concepts and
skills that the former have not mastered.
Attributes of a Good Test as an Assessment Tool
A good test must possess the following attributes or qualities:
validity; reliability; objectivity; scorability; administrability; relevance;
balance; efficiency; difficulty; discrimination; and fairness (Sparzo, 1990;
Reyes, 2000; Manarang and Manarang, 1993; Medina, 2002).
Validity – It is the degree to which a test measures what it
seeks to measure. To determine whether a test a teacher
constructed is valid or not, he / she has to answer the
following questions:
1. Does the test adequately sample the intended content?
2. Does it test the behaviors / skills important to the
content being tested?
3. Does it test all the instructional objectives of the content
taken up in class?
Reliability – It is the accuracy with which a test consistently
measures that which it does measure. A test, therefore, is
reliable if it produces similar results when used repeatedly. A
test may be reliable but not necessarily valid. On the other
hand, a valid test is always a reliable one.
Objectivity – It is the extent to which personal biases or
subjective judgment of the test scorer is eliminated in checking
the student responses to the test items, as there is only one
correct answer for each question. For a test to be considered
objective, experts must agree on the right or the best answer.
Thus, objectivity is a characteristic of the scoring of the test
and not of the form of the test questions.
Scorability – The test is easy to score or check, as an answer key and
answer sheet are provided.
Administrability – The test is easy to administer, as clear and simple
instructions are provided to students, proctors, and scorers.
Relevance – It is the correspondence between the behavior
required to respond correctly to a test item and the purpose or
objective in writing the item. The test item should be directly
related to the course objectives and actual instruction. When
used in relation to educational assessment, relevance is
considered a major contributor to test validity.
Balance – Balance in a test refers to the degree to which the
proportion of items testing particular outcomes corresponds to
the ideal test. The framework of the test is outlined by a table
of specifications.
Efficiency – It refers to the number of meaningful responses
per unit of time. A compromise has to be made among the available
time for testing, scoring, and relevance.
Difficulty – The test items should be appropriate in difficulty
level to the group being tested. In general, for a
norm-referenced test, a reliable test is one in which each item is
passed by half of the students. For a criterion-referenced
test, difficulty can be judged relative to the percentage passing
before and after instruction. Difficulty will ultimately be based
on the skill and knowledge measured and the students’ ability.
Discrimination – For a norm-referenced test, the ability of an
item to discriminate is generally indexed by the difference
between the proportions of good and poor students who
respond correctly. For a criterion-referenced test,
discrimination is usually associated with pretest and posttest
differences or with the ability of the test or item to distinguish
competent from less competent students.
Fairness – To ensure fairness, the teacher should construct
and administer the test in a manner that allows students an
equal chance to demonstrate their knowledge or skills.
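The difficulty and discrimination attributes described in the list above lend themselves to simple arithmetic. The Python sketch below is one possible way to compute them for a single item on a norm-referenced test. It assumes each answer is recorded as 1 (correct) or 0 (wrong) and that the class has already been split into a "good" (upper) group and a "poor" (lower) group; the function names and the sample data are illustrative only.

```python
def difficulty_index(item_scores):
    """Proportion of students who answered the item correctly.

    item_scores: list of 1 (correct) or 0 (wrong), one entry per student
    """
    return sum(item_scores) / len(item_scores)


def discrimination_index(upper_scores, lower_scores):
    """Difference between the proportions of good and poor students
    who answered the item correctly."""
    return difficulty_index(upper_scores) - difficulty_index(lower_scores)


# Hypothetical results for one item: five students in each group.
upper = [1, 1, 1, 0, 1]   # good students: 4 of 5 correct
lower = [1, 0, 0, 0, 1]   # poor students: 2 of 5 correct

print(round(difficulty_index(upper + lower), 2))     # proportion passing overall
print(round(discrimination_index(upper, lower), 2))  # upper minus lower proportion
```

An item passed by about half of the students matches the norm-referenced guideline given above, and a larger positive difference between the two groups indicates an item that separates good from poor students more sharply.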
Steps in Constructing Classroom Tests
Constructing classroom tests is a skill. As such, there are steps that
a teacher has to follow (Reyes, 2000). These steps are outlined and
discussed below.
Identification of instructional objectives and learning
outcomes. This is the first step a teacher has to undertake
when constructing classroom tests. He / She has to identify
instructional objectives and learning outcomes, which will
serve as his / her guide in writing test items.
Listing of the topics to be covered by the Test. After
identifying the instructional objectives and learning outcome, a
teacher needs to outline the topics to be included in the test.
Preparation of Table of Specification (TOS). The table of
specifications is a two – way table showing the content
coverage of the test and the objectives to be tested. It can
serve as a blueprint in writing the test items later.
Selection of the Appropriate Types of Tests. Based on the
TOS, the teacher has to select test types that will enable him /
her to measure the instructional objectives in the most
effective way. Choice of test type depends on what shall be
measured.
Writing Test Items. After determining the type of test to use,
the teacher proceeds to write the suitable test items.
Sequencing the Items. After constructing the test items, the
teacher has to arrange them based on difficulty. As a general
rule, items have to be sequenced from the easiest to the most
difficult for psychological reasons.
Writing the Directions or Instructions. After sequencing
items, the teacher has to write clear and simple directions,
which the students will follow in answering the test questions.
Preparation of the Answer Sheet and Scoring Key. To
facilitate checking of students’ answers, the teacher has to
provide answer sheets and prepare a scoring key in advance.
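As a small illustration of why a scoring key prepared in advance makes checking mechanical, here is a minimal sketch. The function name, the item numbers, and the sample answers are hypothetical and serve only to show the idea.

```python
def score_answer_sheet(scoring_key, answers):
    """Count the items whose answer matches the scoring key.

    Both arguments map an item number to the chosen option letter.
    Unanswered items are simply absent from `answers` and earn no point.
    """
    return sum(
        1 for item, correct in scoring_key.items()
        if answers.get(item) == correct
    )


# Hypothetical 5-item key and one student's answer sheet (item 4 left blank).
scoring_key = {1: "B", 2: "C", 3: "A", 4: "D", 5: "B"}
student_answers = {1: "B", 2: "C", 3: "D", 5: "B"}
score = score_answer_sheet(scoring_key, student_answers)
```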
Preparing the Table of Specifications (TOS)
As already mentioned, the table of specifications is the teacher’s
blueprint in constructing a test for classroom use. According to Arends
(2001), the TOS is valuable to teachers for two reasons. First, it helps
teachers decide on what to include and leave out in a test. Second, it helps
them determine how much weight to give for each topic covered and
objective to be tested.
There are steps to observe in preparing a table of test specifications.
1. List down the topics covered for inclusion in the test.
2. Determine the objectives to be assessed by the test.
3. Specify the number of days / hours spent for teaching a
particular topic.
4. Determine the percentage allocation of test items for each of the
topics covered. The formula to be applied is as follows:
% for a Topic = (Number of days / hours spent teaching
the topic divided by the total number of days / hours
spent teaching the unit) x 100
Example: Mrs. Sid Garcia utilized 10 hours for teaching the
unit on Pre – Spanish Philippines. She spent 2
hours in teaching the topic, “Early Filipinos and
their Society.” What percentage of test items
should she allocate for the aforementioned topic?
Solution: (2 hours / 10 hours) x 100 = 20%
5. Determine the number of items to construct for each topic.
This can be done by multiplying the percentage allocation for
each topic by the total number of items to be constructed.
Example: Mrs. Sid Garcia decided to prepare a 50 – item test
on the unit, “Pre – Spanish Philippines.” How many
items should she write for the topic mentioned in
step number 4?
Solution: 50 items x 0.20 (20%) = 10 items
6. Distribute the number of items to the objectives to be tested.
The number of items allocated for each objective depends on
the degree of importance attached by the teacher to it.
After going through the six steps, the teacher has to write the TOS in
a grid or matrix, as shown below.
Table of Specification for a 50 – Item Test in Economics
Topic / Objective                     Knowledge  Comprehension  Application  Analysis  Total
The Nature of Economics                   5            2             6           7       20
Economic Systems                          1            3             3           3       10
Law of Demand & Supply                    2            2             3           3       10
Price Elasticity of Demand & Supply       2            3             3           2       10
Total                                    10           10            15          15       50
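The arithmetic in steps 4 and 5 can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are made up, teaching time is given in whole hours, the two topics other than “Early Filipinos and their Society” are hypothetical placeholders, and the rule for handing out leftover items when the products are not whole numbers (largest remainder first) is an assumption, since the chapter does not say how to round.

```python
def percent_for_topic(hours_on_topic, total_hours):
    """Step 4: share of the test, in percent, earned by one topic."""
    return hours_on_topic * 100 / total_hours


def allocate_items(hours_by_topic, total_items):
    """Step 5: number of items per topic, proportional to teaching time.

    Assumption: hours are whole numbers, and any items left over after
    rounding down go to the topics with the largest remainders so that
    the counts still add up to total_items.
    """
    total_hours = sum(hours_by_topic.values())
    counts, remainders = {}, {}
    for topic, hours in hours_by_topic.items():
        counts[topic], remainders[topic] = divmod(hours * total_items, total_hours)
    leftover = total_items - sum(counts.values())
    for topic in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        counts[topic] += 1
    return counts


# Mrs. Garcia's unit: 10 hours in all, 2 of them on the first topic.
hours = {
    "Early Filipinos and their Society": 2,
    "Topic B (hypothetical)": 3,
    "Topic C (hypothetical)": 5,
}
share = percent_for_topic(2, 10)
items = allocate_items(hours, 50)
```

With these inputs the first topic receives 20% of the test, which is 10 of the 50 items, matching the worked solutions in steps 4 and 5.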
General Guidelines in Writing Test Items
Airasian (1994) identified five basic guidelines in writing test items.
These guidelines are as follows:
1. Avoid wording that is ambiguous and confusing.
2. Use appropriate vocabulary and sentence structure.
3. Keep questions short and to the point.
4. Write items that have one correct answer.
5. Do not provide clues to the answer.
Criteria for Providing Test Directions
Test directions are very important in any written test as the inability
of the test taker to understand them affects the validity of a test. Thus,
directions should be complete, clear, and concise. The students must be
aware of what is expected of them. The method of answering has to be
kept as simple as possible. Test directions should also contain instructions
on guessing.
The following criteria should be kept in mind when writing directions
for a test (Linn, 1999):
Assume that the examinees and the examiner know nothing at
all about objective tests.
In writing directions, use a clear, succinct style. Be as explicit
as possible but avoid long, drawn – out explanations.
Emphasize the more important directions and key activities
through the use of underlining, italics, or different type size
or style.
Field – test or pretest the directions with a sample of both
examinees and examiners to identify possible
misunderstandings and inconsistencies and gather
suggestions for improvement.
Keep directions for different forms, subsections or booklets as
uniform as possible.
Where necessary or helpful, give practice items before each
regular section. This is very important when testing young
children or those unfamiliar with objective tests or
separate answer sheets.
Writing Multiple – Choice Items
The most widely used form of test item is the multiple – choice item.
This is because of its versatility. It can be used in measuring different kinds
of content and almost any type of cognitive behavior, from factual
knowledge to analysis of complex data. Furthermore, it is easy to score.
A multiple – choice item is composed of a stem, which sets up the
problem and asks a question, followed by a number of alternative
responses. Only one of the alternatives is the correct answer; the other
alternatives are distractors or foils.
The principal goal in multiple – choice item construction is to write
clear, concise, and unambiguous items. Consider the example below.
Poor: The most serious disease in the world is -
(A) Mental illness (C) Heart disease
(B) AIDS (D) Cancer
The correct answer depends on what is meant by “serious.”
Considering that heart disease leads to more deaths, mental illness affects
a number of people, and AIDS is a world – wide problem nowadays, there
are three possible answers. Nevertheless, the question can be reworded
as follows, for example:
Improved: The leading cause of death in the world today is:
(A) Mental illness (C) Heart disease
(B) AIDS (D) Cancer
To be able to write effective multiple – choice items, the following
guidelines should be followed:
1. Each item should be clearly stated, in the form of a
question or an incomplete statement.
2. Do not provide grammatical or contextual clues to the
correct answer. For instance, the use of “an” before the options
indicates that the answer begins with a vowel.
3. Use language that even the poorest readers will
understand.
4. Write a correct or best answer and several plausible
distractors.
5. Each alternate response should fit the stem in order to
avoid giving clues to its correctness.
6. Refrain from using negatives or double negatives. They
tend to make the items confusing and difficult.
7. Use “all of the above” and “none of the above” only when they
will contribute more than another plausible distractor.
8. Do not use items directly from the textbook. Test for
understanding not memorization.
Examine the following multiple – choice items.
Sample 1: A two – way grid summarizing the relationship
between test scores and criterion scores is
sometimes referred to as an:
(A) Correlation coefficient. (C) Probability histogram.
(B) Expectancy table. (D) Bivariate frequency distribution.
Sample 1 is faulty because of the use of the article “an.” This
article can lead the student to the correct answer, which is B.
Improved: Two – way grids summarizing test – criterion
relationships are sometimes called:
(A) Correlation coefficients. (C) Probability histograms.
(B) Expectancy tables. (D) Bivariate frequency distributions.
Sample 2: Which of the following descriptions makes clear the
meaning of the word “electron”?
(A) An electronic gadget (D) A voting machine
(B) Neutral particles (E) The nuclei of atoms
(C) Negative particles
Sample 2 is poorly written owing to its use of distractors that are not
plausible or closely related to each other. Options A and D are not in
any way associated with the remaining choices or alternatives.
Improved: Which of the following phrases is a description of an
electron?
(A) Neutral particle (D) Related particle
(B) Negative particle (E) Atom nucleus
(C) Neutralized proton
Sample 3: What is the area of a right triangle whose sides
adjacent to the right angle are 4 inches and 3 inches,
respectively?
Sample 3 is also erroneously written as it used the option “none of
the above” without caution. Why? This is because the answer is 6 square
inches and the bright student will definitely choose option D. On the other
hand, the student who solved the problem incorrectly and came up with an
answer not found among the choices would also choose D, thereby getting the
correct answer for the wrong reason. The option “none of the above” can
be a good alternative if the correct answer is included among the options
or choices.
Improved: What is the area of a right triangle whose sides
adjacent to the right angle are 4 inches and 3 inches,
respectively?
(A) 6 square inches (D) 13 square inches
(B) 7 square inches (E) none of the above
(C) 12 square inches
Using Multiple Choice Items in Assessing Problem
Solving and Logical Thinking
Schools today are stressing problem – solving skills owing to
society’s pressure on them to produce individuals with significant
skills in this area. A number of terms have been used to
describe the basic operations of application. Terms like critical thinking and
logical reasoning are used as rubrics under which the basic processes of
problem identification, specification of alternative solutions, evaluation of
consequences, and solution selection are grouped.
Creating problem – solving measures follows a step – by – step
procedure (Haladyna & Downing, 1999).
Step 1. Decide on the principle / s to be tested. The principles
chosen should:
Be known principles, but the situation in which the principles
are to be applied should be new.
Be significantly important.
Be pertinent to a problem or situation common to all students.
Be within the range of comprehension of all students.
Draw data only from valid and reliable sources.
Be interesting to students.
Step 2. Determine the phrasing of the problem situation so as to
require the students in drawing their conclusion to do one
of the following:
Make a prediction.
Choose a course of action.
Offer an explanation for an observed phenomenon.
Criticize a prediction or explanation made by others.
Step 3. Set up the problem situation in which the principle or
principles selected operate. Present the problem to the
class with directions to draw a conclusion or conclusions
and give several supporting reasons for their answer.
Step 4. Edit the students’ answers, selecting those that are most
representative of their thinking. These will include
conclusions and supporting reasons that are both
acceptable and unacceptable.
Step 5. To the conclusions and reasons obtained from the students,
the teacher now adds any others that he or she feels are
necessary to cover the salient points. The total number of
items should be at least 50% more than is desired in the
final form to allow for elimination of poor items. Some types
of statements that can be used are as follows:
True statements of principles and facts
False statements of principles and facts
Acceptable and unacceptable analogies
Appeal to acceptable or unacceptable authority
Ridicule
Assumption of the conclusion
Teleological explanations
Step 6. Submit tests to colleagues or evaluators for criticisms.
Revise test based on these criticisms.
Step 7. Administer test. Follow with thorough class discussion.
Step 8. Conduct an item analysis.
Step 9. In the light of steps 7 and 8, revise the test.
Following are some examples of problem – solving items.
1. Ulysses wanted to go to the US. But Ulysses’ father, who is quite
strict with him, stated emphatically that he could not go unless he
got a grade of 1.25 in both his freshman English courses.
Ulysses’ father always keeps his promises. When summer came,
Ulysses went to the US. If, from this information, you conclude
that Ulysses earned 1.25, you must be assuming that:
(A) Ulysses had never obtained a grade of 1.25 before.
(B) Ulysses had no money of his own.
(C) Ulysses’ father was justified in saying what he did.
(D) Ulysses went to the US with his father’s consent.
(E) Ulysses was very sure that he would be able to go.
2. Consider these facts about the coloring of animals:
Plant lice, which live on the stems of green plants,
are green.
The grayish – mottled moth resembles the bark of
the trees on which it lives.
Insects, birds, and mammals that live in the desert
are usually sandy or grey.
Polar bears and other animals living in the Arctic
region are white.
Which one of the following statements do these facts tend to
support?
(A) Animals that prey on others use colors as disguise.
(B) Some animals imitate the color and shape of other natural
objects for protection.
(C) The coloration of animals has to do with their surroundings.
(D) Protective coloration is found more among insects and
birds than among mammals.
(E) Many animals and insects have protective coloring.
Writing Alternate – Response Items
An alternate – response item is one wherein there are only two
possible answers to the stem. The true – false format is an alternate –
response item. Some variations of the basic true – false item include yes –
no, right – wrong, and agree – disagree items.
Alternate – response items seem easy to construct. Writing good
alternate – response items, however, requires skill so as to avoid triviality.
Writing good true – false items is difficult as there are few assertions that
are unambiguously true or false. Besides, they are sensitive to guessing.
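To see why true – false items are sensitive to guessing, the following sketch computes the chance of reaching a given score by guessing alone. It assumes every guess is independent and equally likely to hit any option; the function name and the figures used (10 items, at least 7 correct) are illustrative only.

```python
from math import comb


def chance_of_score_by_guessing(n_items, min_correct, p_correct=0.5):
    """Probability of at least min_correct right answers from blind guessing."""
    return sum(
        comb(n_items, k) * p_correct**k * (1 - p_correct) ** (n_items - k)
        for k in range(min_correct, n_items + 1)
    )


# Two options per item (true - false) versus four options per item.
p_true_false = chance_of_score_by_guessing(10, 7)
p_four_option = chance_of_score_by_guessing(10, 7, p_correct=0.25)
```

Under these assumptions, a student who guesses blindly on ten true – false items has roughly a 17% chance of getting seven or more right, while the same feat on ten four – option items happens less than 1% of the time.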
Some guidelines to follow in writing alternate – response items are
given below.
1. Avoid the use of negatives.
2. Avoid the use of unfamiliar or esoteric language.
3. Avoid trick items that appear to be true but are false because of
an inconspicuous word or phrase.
4. Use quantitative and precise rather than qualitative language
where possible.
5. Don’t make true items longer than false items.
6. Refrain from creating a pattern of response.
7. Present a similar number of true and false statements.
8. Be sensitive to the use of specific determiners. Words such as
always, all, never, and none indicate sweeping generalizations,
which are associated with false items. Conversely, words like
usually and generally are associated with true items.
9. A statement must only have one central idea.
10. Avoid quoting exact statements from the textbooks.
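Guidelines 6 and 7 above lend themselves to a quick check of a finished answer key. The sketch below is one possible way to do it; the function name and the sample key are hypothetical, and the key is assumed to be a non – empty string of the letters T and F.

```python
def review_true_false_key(key):
    """Summarize an answer key given as a non-empty string of 'T' and 'F'.

    Returns the number of true items, the number of false items, and the
    length of the longest run of identical consecutive answers.
    """
    longest = run = 1
    for previous, current in zip(key, key[1:]):
        run = run + 1 if current == previous else 1
        longest = max(longest, run)
    return key.count("T"), key.count("F"), longest


# Hypothetical key for a 10-item true - false test.
n_true, n_false, longest_run = review_true_false_key("TFFTTFTFTF")
```

A key with very unequal counts or a long run of the same answer is a signal to reorder or rewrite some items before the test is given.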
Let us go over examples of the alternate response test items.
Sample 1. The raison d’etre for capital punishment is retribution
according to some peripatetic politicians.
This sample alternate response item is poorly written for it used
words that are very unfamiliar or difficult to understand by an average
student.
Improved: According to some politicians, the justification for the
existence of capital punishment can be traced to the
biblical statement, “an eye for an eye, a tooth for a tooth.”
Sample 2. From time to time efforts have been made to explain the
notion that there may be a cause – and – effect
relationship between arboreal life and primate anatomy.
Sample 2 is also faulty as it was copied verbatim from the
textbook.
Improved: There is a known relationship between primate anatomy
and arboreal life.
Sample 3. Many people voted for Gloria Macapagal – Arroyo in the
last presidential election.
Sample 3 also violates the rule on writing alternate – response items
owing to its use of imprecise language. As such, it is open to numerous
and ambiguous interpretations.
Improved: Gloria Macapagal – Arroyo received more than 50% of
the votes cast in the last presidential election.
Alternate – response items allow teachers to sample a number of
cognitive behaviors in a limited amount of time. Even the scoring of
alternate – response items tends to be simple and easy. Nonetheless,
there are content and learning outcomes that cannot be adequately
measured by alternate – response items, like problem – solving and
complex learning.
Writing Matching Items
Matching items are designed to measure students’ ability to single
out pairs of matching phrases, words or other related facts from separate
lists. It is basically an efficient arrangement of a set of multiple – choice items
with all stems, called premises, having the same set of possible alternative
answers. Matching items are appropriate to use in measuring verbal
associative knowledge (Moore, 1997) or knowledge such as inventors and
inventions, titles and authors, or objects and their basic characteristics.
To be able to write good matching items, the following guidelines
have to be considered in the process.
1. Specify the basis for matching the premises with the
responses. Sound testing practice dictates that the directions spell
out the nature of the task. It is unfair and unreasonable that the student
should have to read through the stimulus and response list in order
to discern the basis for matching.
2. Be sure that the whole matching exercise is found on one
page only. Splitting the exercise is confusing, distracting, and time –
consuming for the student.
3. Avoid including too many premises on one matching item.
If a matching exercise is too long, the task becomes tedious and the
discrimination too fine.
4. Both the premises and responses should be in the same general
category or class (e.g. inventors – inventions; authors – literary
works; objects – characteristics).
5. Premises or responses composed of one or two words
should be arranged alphabetically.
Analyze the following matching exercise. Does it follow the
suggestions on writing a matching exercise?
Directions: Match Column A with Column B. You will be given one
point for each correct match.
Column A                              Column B
1. Execution of Rizal                 a. 1521
2. Pseudonym of Ricarte               b. 1896
3. Hero of Tirad Pass                 c. Gregorio del Pilar
4. Arrival of the Spaniards in the    d. Spoliarium
   Philippines                        e. Vibora
5. Masterpiece of Juan Luna
The matching exercise is poorly written as the premises in Column A
do not belong to the same category. Thus, answers can easily be guessed by
the student. Below is the improved version of the above matching exercise.
Column A                               Column B
1. National Hero of the Philippines    a. Aguinaldo
2. Hero of Tirad Pass                  b. Bonifacio
3. Brain of the Katipunan              c. Del Pilar
4. Brain of the Philippine Revolution  d. Jacinto
5. The Sublime Paralytic               e. Mabini
                                       f. Rizal
Writing Completion Items
Completion items require the students to associate an incomplete
statement with a word or phrase recalled from memory (Ahman, 1991).
Each completion test item contains a blank, which the student must fill in
correctly with one word or a short phrase. Inasmuch as the student is
required to supply the answer from memory, completion items are useful
for the testing of specific facts.
Guidelines in constructing completion items are as follows:
1. As a general rule, it is best to use only one blank in a
completion item.
2. The blank should be placed near or at the end of the
sentence.
3. Give clear instructions indicating whether synonyms will be
correct and whether spelling will be a factor in scoring.
4. Be definite enough in the incomplete statement so that only
one correct answer is possible.
5. Avoid using direct statements from the textbooks with a
word or two missing.
6. All blanks for all items should be of equal length and long
enough to accommodate the longest response.
Go over the following sample items:
Directions: On your answer sheet, write the expression that
completes each of the following sentences.
1. __________ is money earned from the use of money.
2. The Philippines is at the _________ and ________ of ________.
Sample 1 is poorly written as a well – written completion item should
have its blank either near or at the end of the sentence. In like manner
Sample 2 is also poorly written as the statement is over – mutilated.
Following are the improved versions of these sample items.
1. Money earned from the use of money is called _________.
2. The Philippines is located in the continent of _________.
Writing Arrangement Items
Arrangement items are used to test knowledge of sequence and order.
Arranging words alphabetically, events chronologically,
numbers according to magnitude, stages in a process, or incidents in a
story or novel are a few cases of this type of test. Some guidelines on
preparing this type of test are as follows:
1. Items to be arranged should belong to one category only.
2. Provide instructions on the rationale for arrangement or
sequencing.
3. Specify the response code students have to use in arranging the
items.
4. Provide sufficient space for writing the answer.
Following are examples of arrangement items.
Sample 1 Directions: Arrange the following decimals in the order
of magnitude by placing 1 above the smallest, 2 above
the next, 3 above the third, and 4 above the biggest.
(a) 0.2180 (b) 0.2801 (c) 0.2018 (d) 0.2081
Sample 2 Directions: The following words are arranged at
random. On your answer sheet, rearrange the words so
that they will form a sentence.
much the costs rose
Sample 3 Directions: Each group of letters below spells out a
word if the letters are properly arranged. On your answer
sheet, rearrange the letters in each group to form a
word.
ybo ebul swie atgo
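For an ordering item such as Sample 1, the scoring key is just the rank of each value. The sketch below shows one way to derive it; the function name is hypothetical, and the values are the four decimals from Sample 1.

```python
def rank_key(values_by_label):
    """Map each label to its rank, with 1 for the smallest value."""
    ordered = sorted(values_by_label, key=values_by_label.get)
    return {label: rank for rank, label in enumerate(ordered, start=1)}


# The four decimals from Sample 1.
sample_1 = {"a": 0.2180, "b": 0.2801, "c": 0.2018, "d": 0.2081}
key = rank_key(sample_1)
```

Here the key places 1 above (c), 2 above (d), 3 above (a), and 4 above (b).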
Writing Completion – Drawing Items
As pointed out in the previous chapter, a completion – drawing item
is one wherein an incomplete drawing is presented which the student has
to complete. The following guidelines have to be observed in writing the
aforementioned type of test item:
1. Provide instruction on how the drawing will be completed.
2. Present the drawing to be completed.
Writing Correction Items
The correction type of test item is similar to the completion item,
except that some words or phrases have to be changed to make the
sentence correct. The following have to be considered by the teacher in
writing this kind of test item.
1. Underline or italicize the word or phrase to be corrected in
a sentence.
2. Specify in the instruction where students will write their
correction of the underlined or italicized word or phrase.
3. Write items that measure higher levels of cognitive
behavior.
Following are examples of correction items written following the
guidelines in constructing this kind of item.
Directions: Change the underlined word or phrase to make each of
the following statements correct. Write your answer on
the space before each number.
1. Inflation caused by increased demand is known as
oil – push.
2. Inflation is the phenomenon of falling prices.
3. Expenditure on non – food items increases with
increased income according to Keynes.
4. The additional cost for producing an additional unit
of a product is average cost.
5. The sum of the fixed and variable costs is total
revenue.
Writing Identification Items
An identification type of test item is one wherein an unknown
specimen is to be identified by name or other criterion. In writing this type
of item, teachers have to observe the following guidelines:
1. The directions of the test should indicate clearly what has to
be identified, like persons, instruments, dates, events, steps in a
process, and formulas.
2. Sufficient space has to be provided for the answer to each
item.
3. The question should not be copied verbatim from the
textbook.
Following are examples of identification items written following the
guidelines in constructing this type of test item.
Directions: Following are phrase definitions of terms. Opposite
each number, write the term defined.
1. Weight divided by volume.
2. Degree of hotness or coldness of a body
3. Changing speed of a moving body
4. Ratio of resistance to effort
Writing Enumeration Items
An enumeration item is one wherein the student has to list down
parts or elements / components of a given concept or topic. Guidelines to
follow in writing this type of test item include the following:
1. The exact numbers of expected answers have to be specified.
2. Spaces for the writing of answers have to be provided and should
be of the same length.
Below are examples of enumeration items.
Directions: List down or enumerate what are asked for in each of
the following.
Underlying Causes of World War I and II
1. ______________________ 4. ______________________
2. ______________________ 5. ______________________
3. ______________________
Factors Affecting the Demand for a Product
6. ______________________ 9. ______________________
7. ______________________ 10. _____________________
8. ______________________
Writing Analogy Items
An analogy item consists of a pair of words, which are related to
each other (Calmorin, 1994). This type of item is often used in measuring
the student’s skill in seeing associations between paired words or concepts.
Examples of this type of item are given below.
Example 1: Black is to white, as peace is to ______________.
(a) Unity (c) Harmony
(b) Discord (d) Concord
Example 2: Bonifacio is for the Philippines, while ______________
is for the United States of America.
(a) Jefferson (c) Madison
(b) Lincoln (d) Washington
The following guidelines have to be considered in constructing
analogy items: (Calmorin, 1994).
1. The pattern of relationship in the first pair of words must be
the same pattern in the second pair.
2. Options must be related to the correct answer.
3. The principle of parallelism has to be observed in writing
the options.
4. More than three options have to be included in each
analogy item to lessen guessing.
5. All items must be grammatically consistent.
Writing Interpretative Test Items
An interpretative test item is often used in testing higher cognitive
behavior. This kind of test item may involve analysis of maps, figures, or
charts or even comprehension of written passages. Airasian (1994)
suggested the following guidelines in writing this kind of test item:
1. The interpretative exercise must be related to the
instruction provided to the students.
2. The material to be presented to the students should be
new to the students but similar to what was presented during
instruction.
3. Written passages should be as brief as possible. The
exercise should not be a test of general reading ability.
4. The students have to interpret, apply, analyze and
comprehend in order to answer a given question in the exercise.
Writing Short Explanation Items
This type of item is similar to an essay test but requires a short
response, usually a sentence or two. This type of question is a good
practice for the students in expressing themselves concisely. In writing this
type of test item, the following guidelines have to be considered:
1. Specify in the instruction of the test, the number of
sentences that students can use in answering the question.
2. Make the question brief and to the point for the students not
to be confused.
CHAPTER REVIEW
1. What are the basic principles of testing that teachers must
consider in constructing classroom tests? Explain each briefly.
2. What are the steps or procedures teachers have to follow
in writing their own tests? Explain the importance of each of
them.
3. What is the table of specification (TOS)? How is it
prepared?
4. What are the general guidelines in writing test items?
5. What are the specific guidelines to be observed in writing
the following types of test item:
5.1 Multiple – choice;
5.2 True – false;
5.3 Matching item;
5.4 Arrangement item;
5.5 Identification item;
5.6 Correction item;
5.7 Analogy;
5.8 Interpretative exercise;
5.9 Short explanation item?
CHAPTER 6
Constructing and Scoring Essay Tests
Many new teachers believe that essay tests are the easiest type of
assessment instrument to construct and score. This is not actually true.
The expenditure of time and effort is necessary if essay items and tests
are to yield meaningful information. An essay test permits direct
assessment of the attainment of numerous goals and objectives. In
contrast with the objective test item types, an essay test demands less
construction time per fixed unit of student time but a significant increase in
labor and time for scoring. This chapter exposes you to the problems and
procedures involved in developing, administering, and scoring of essay
tests.
General Types of Essay Items
There are two types of essay items: extended response and
restricted response.
An extended response essay item is one that allows for an in –
depth sampling of a student’s knowledge, thinking processes, and problem
– solving behavior relative to a specific topic. The open – ended nature of
the task posed by an instruction such as “discuss essay and objective tests” is
challenging to a student. In order to answer this question correctly, the
student has to recall specific information and organize, evaluate, and write
an intelligible composition. Since it is poorly structured, such a free –
response essay item would tend to yield a variety of answers from the
examinees, both with respect to content and organization, and thus inhibit
reliable grading. The potential ambiguity of an essay task is probably the
single most important contributor to unreliability. In addition, the more
extensive the responses required, the fewer questions a teacher may
ask, which would definitely result in lower content validity of the test.
On the other hand, a restricted response essay item is one where
the examinee is required to provide limited response based on a specified
criterion for answering the question. It follows, therefore, that a more
restricted response essay item is, in general, preferable. An instruction
such as “discuss the relative advantages and disadvantages of essay tests
with respect to (1) reliability, (2) objectivity, (3) content validity, and (4)
usability” presents a better defined task more likely to lend itself to reliable
scoring and yet allows examinees sufficient opportunity or freedom to
organize and express their ideas creatively.
Learning Outcomes Measured Effectively with Essay
Items
Essay questions are designed to provide the students the
opportunity to answer questions in their own words (Ornstein, 1990). They
can be used in assessing the student’s skill in analyzing, synthesizing,
evaluating, thinking logically, solving problems, and hypothesizing.
According to Gronlund and Linn (1990), there are 12 complex learning
outcomes that can be measured effectively with essay items. These include
the abilities to:
Explain cause – effect relationships;
Describe relevant arguments;
Formulate tenable hypotheses;
State necessary assumptions;
Describe the limitations of data;
Explain methods and procedures;
Produce, organize, and express ideas;
Integrate learning in different areas;
Create original forms; and
Evaluate the worth of ideas.
Content versus Expression
It is frequently claimed that the essay item allows the student to present
his or her knowledge and understanding and to organize the material in a
unique form and style. More often than not, factors like expression, grammar,
spelling and the like are evaluated in relation to content. If the teacher has
attempted to develop students’ skills in expression, and if this learning
outcome is included in the table of specifications, the assessment of such
skills is just right and valid. If these skills are not part of the instructional
program, it is not right to assess them. If the score of each essay question
includes an evaluation of the mechanics of English, this should be made
known to the students. If possible, separate scores should be given to content
and expression.
Specific Types of Essay Questions
The following set of essay questions is presented to illustrate how an
essay item is phrased or worded to elicit particular behaviors and levels of
response.
I. Recall
A. Simple Recall
1. What is the chemical formula for sodium bicarbonate?
2. Who wrote the novel “The Last of the Mohicans”?
B. Selective Recall in which a basis for evaluation
or judgment is suggested
1. Who among the Greek philosophers affected your
thinking as a student?
2. Which method of recycling is the most appropriate to
use at home?
II. Understanding
A. Comparison of two phenomena on a single
designated basis
1. Compare 19th century and present – day Filipino writers
with respect to their involvement in societal affairs.
B. Comparison of two phenomena in general
1. Compare the Philippine Revolution of 1896 with the
People’s Power Revolution of 1986.
C. Explanation of the use or exact meaning of a
phrase or statement.
1. The legal system of the Mesopotamians was anchored
on the principle of an eye for an eye, a tooth for a tooth.
What does this principle mean?
D. Summary of a text or some portion of it
1. What is the central idea of communism as an
economic system?
E. Statement of an artist’s purpose in the
selection or organization
1. Why did Hemingway describe in detail the episode in
which Gordon, lying wounded, engages the oncoming
enemy?
III. Application. It should be clearly understood that
whether or not a question requires application depends on the
preliminary educational experience. If an analysis has been
taught explicitly, a question calling for that analysis is but simple recall.
A. Causes or Effects
1. Why did Fascism prevail in Germany and Italy but not
in Great Britain and France?
2. Why does frequent dependence on penicillin for the
treatment of minor ailments result in its reduced
effectiveness against major invasions of body tissues
by infectious bacteria?
B. Analysis
1. Why was Hamlet torn by conflicting desires?
2. Why was the Propaganda Movement a successful
failure?
C. Statement of Relationship
1. A researcher reported that teaching style correlates
with student achievement at about 0.75. What does this
correlation mean?
D. Illustrations or examples of principles
1. Identify three examples of the uses of the hammer in a
typical Filipino home.
E. Application of rules or principles in specified
situations
1. Would you weigh more or less on the moon? Why?
F. Reorganization of facts
1. Some radical Filipino historians assert that the Filipino
revolution against Spain was a revolution from the top
not from below. Using the same observation, what
other conclusion is possible?
IV. Judgment
A. Decision for or against
1. Should members of the Communist Party of the
Philippines be allowed to teach in colleges and
universities? Why or why not?
2. Nature is more influential than the environment in
shaping an individual’s personality. Prove or disprove
this statement.
B. Discussion
1. Trace the events that led to the downfall of the
dictatorial regime of Ferdinand Marcos.
C. Criticism of the adequacy, correctness, or
relevance of a statement
1. Former President Joseph Estrada was convicted of
plunder by the Sandiganbayan. Comment
on the adequacy of the evidence used by the said
tribunal in reaching a decision on the case filed
against the former chief executive of the country.
D. Formulation of new questions
1. What should be the focus of research in education
to explain the incidence of failure among students with
high intelligence quotients?
2. What questions should parents ask their children in
order to determine the reasons why they join
fraternities and sororities?
Following are examples of essay questions based on Bloom’s
Taxonomy of Cognitive Objectives.
A. Knowledge
Explain how Egypt came to be called the gift of the Nile.
B. Comprehension
What is meant when a person says, “I had just crossed the
bridge?”
C. Application
Give at least three examples of how the law of supply operates in
our economy today.
D. Analysis
Explain the causes and effects of the People’s Power Revolution
on the political and social life of the Filipino people.
E. Synthesis
Describe the origin and significance of the celebration of
Christmas the world over.
Sources of Difficulty in the Use of Essay Tests
There are four sources of difficulty that are likely to be encountered
by teachers in the use of essay tests (Greenberg et al, 1996). Let us go over
each of these difficulties and look into ways to minimize them.
Question Construction. The preparation of the essay item is the
most important step in the development process. Language usage and word
choice are particularly important during the construction process. The
language dimension is very critical not only because it controls the
comprehension level of the item for the examinee, but also because it specifies
the parameters of the task. As a test constructor, you need to narrowly specify,
define, and clarify what it is that you want from the examinees. Examine
this sample essay question: “Comment on the significance of Darwin’s
Origin of Species.” The question is quite broad considering that there are
several ways of responding to it. While the intention of the teacher who
wrote this item was to provide opportunity for the students to display their
mastery of the material, students could write for an hour and still not
discover what their teacher really wants them to say relative to the
aforementioned topic. An improved version of the same question follows:
“Do you agree with Darwin’s concept of natural selection resulting in the
survival of the fittest and the elimination of the unfit? Why or why not?”
Reader Reliability. A number of studies have been conducted over the
years on the reliability of grading free – response test items. The results of
these studies failed to demonstrate consistently satisfactory agreement
among essay raters (Payne, 2003). Some of the specific contributory
factors in the lack of reader reliability include the following: quality of
composition and penmanship; item readability; racial or ethnic prejudice in
essay scoring; and the subjectivity of human judgment.
Instrument Reliability. Even if an acceptable level of scoring
reliability is attained, there is no guarantee that measurement of desired
behaviors will be consistent. There remains the issue of the sampling of
objectives or behaviors represented by the test. One way to increase the
reliability of an essay test is to increase the number of questions and restrict
the length of the answers. The more specific and narrowly defined the
questions, the less likely they are to be ambiguous to the examinee. This
procedure should result in more uniform understanding and performance of
the assigned task and in more uniform scoring. It also helps ensure better
coverage of the domain of objectives.
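The effect of adding questions on reliability can be illustrated with the Spearman-Brown prophecy formula, a standard psychometric result that this text does not itself cite. The sketch below assumes the added questions are comparable in quality to the existing ones; the function name and the starting reliability of 0.50 are hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predict reliability when a test is lengthened by `length_factor`.

    Assumes the added questions are parallel to the existing ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical example: an essay test with a reliability of 0.50 is
# doubled in length (for instance, from 3 to 6 comparable questions).
predicted = spearman_brown(0.50, 2)
print(round(predicted, 2))  # prints 0.67
```

Under these assumptions, doubling the number of questions raises the predicted reliability from 0.50 to about 0.67, which is consistent with the advice to use more questions with shorter answers.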
Instrument Validity. The number of test questions influences both
the validity and reliability of essay questions. As commonly constructed, an
essay test contains a small number of items; thus, the sampling of desired
behaviors represented in the table of specifications will be limited, and the
test will suffer from lowered content validity.
There is another sense in which the validity of an essay test may be
questioned. Theoretically, the essay test allows the examinees to construct
a creative, organized, unique and integrated communication. Nonetheless,
examinees very frequently spend most of their time simply
recalling and organizing information, rather than integrating it. The
behavior elicited by the test, then, is not that hoped for by the teacher or
dictated by the table of specifications. Again, one way of handling the
problem is by increasing the number of items on the test.
Guidelines for Constructing, Evaluating and Using
Essay Tests
Consider the following suggestions for constructing, evaluating and
using essay tests:
Limit the problem that the question poses so that it will have a
clear or definite meaning to most students.
Use simple words which will convey clear meaning to the
students.
Prepare enough questions to sample the material of the
subject area broadly, within a reasonable time limit.
Use the essay question for purposes it best serves, like
organization, handling complicated ideas and writing.
Prepare questions which require considerable thought, but
which can be answered in relatively few words.
Determine in advance how much weight will be accorded each
of the various elements expected in a complete answer.
Without knowledge of students’ names, score each question
for all students.
Require all students to answer all questions on the test.
Write questions about materials immediately relevant to the
subject.
Study past questions to determine how students performed.
Make gross judgments of the relative excellence of answers
as a first step in grading.
Word a question as simply as possible in order to make the
task clear.
Do not judge papers on the basis of external factors
unless they have been clearly stipulated.
Do not make a generalized estimate of an entire paper’s
worth.
Do not construct a test consisting of only one question.
Scoring Essay Tests
Most teachers would agree that the scoring of essay items and tests
is among the most time – consuming and frustrating tasks associated with
classroom assessment. Teachers are frequently not willing to devote the
large chunk of time necessary for checking essay tests. It almost goes
without saying that if reliable scoring is to be achieved, there is a need for
the teacher to spend considerable time and effort.
Before focusing on the specific methods of scoring essay tests, let
us consider the following guidelines. First, it is critical that the teacher
prepare in advance a detailed ideal answer. This is necessary as it will
serve as the criterion by which each student’s response will be judged. If
this is not done, the results could be seriously flawed. The subjectivity of
the teacher could prevent consistent scoring, and it is also possible that
student responses might dictate what constitutes correct answers. Second,
student papers should be scored anonymously, and all answers to a given
item should be scored together, one item at a time, rather than grading each
student’s total test separately.
As already pointed out, essay questions are the most difficult to
check owing to the absence of uniformity of response on the part of the
students who took the test. Moreover, there are a number of distractors in
the students’ responses that can contribute to subjective scoring of an
essay item (Hopkins et al, 1990). These distractors include the following:
handwriting, style, grammar, neatness, and the teacher’s knowledge of the
students.
There are two ways of scoring an essay test: holistic and analytic
(Kubiszyn & Borich, 1990).
Holistic Scoring. In this type of scoring, a total score is assigned to
each essay question based on the teacher’s general impression or overall
assessment. Answers to an essay question are classified into any of the
following categories: outstanding; very satisfactory; fair; and poor. A score
value is then assigned to each of these categories. An outstanding response
gets the highest score, while a poor response gets the lowest score.
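A minimal sketch of holistic scoring follows. The four categories come from the paragraph above; the score values attached to them and the student names are hypothetical and would be set by the teacher.

```python
# Hypothetical score values for the four holistic categories.
HOLISTIC_SCORES = {
    "outstanding": 10,
    "very satisfactory": 8,
    "fair": 5,
    "poor": 2,
}


def holistic_score(category: str) -> int:
    """Return the score value assigned to a holistic category."""
    key = category.strip().lower()
    if key not in HOLISTIC_SCORES:
        raise ValueError(f"Unknown category: {category!r}")
    return HOLISTIC_SCORES[key]


# The teacher classifies each answer, then looks up its score value.
ratings = {"Student A": "Outstanding", "Student B": "fair"}
scores = {name: holistic_score(category) for name, category in ratings.items()}
print(scores)
```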
Analytic Scoring. In this type of scoring, the essay is scored in
terms of its components. An essay scored in this manner has separate
points for organization of ideas; grammar and spelling; and supporting
arguments or proofs.
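Analytic scoring can be sketched in the same way. The three components are those named above; the maximum points per component and the sample essay marks are hypothetical.

```python
# Hypothetical maximum points for each component of the essay.
MAX_POINTS = {
    "organization of ideas": 10,
    "grammar and spelling": 5,
    "supporting arguments or proofs": 10,
}


def analytic_score(awarded: dict) -> int:
    """Sum the points awarded per component, checking each against its maximum."""
    total = 0
    for component, maximum in MAX_POINTS.items():
        points = awarded[component]
        if not 0 <= points <= maximum:
            raise ValueError(f"{component}: {points} is outside 0 to {maximum}")
        total += points
    return total


# Hypothetical marks for one student's essay.
essay_marks = {
    "organization of ideas": 8,
    "grammar and spelling": 4,
    "supporting arguments or proofs": 7,
}
print(analytic_score(essay_marks))  # prints 19
```

Keeping the component scores separate also lets the teacher report content and expression separately, as suggested earlier in the chapter.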
As an essay test is difficult to check, there is a need for teachers to
ensure objectivity in scoring students’ responses (Hopkins et al, 1990). To
minimize subjectivity in scoring an essay test, the following guidelines have
to be considered by the teacher (Airasian, 1994):
Decide what factors constitute a good answer before
administering an essay question.
Explain these factors in the test item.
Read all answers to a single essay question before reading
answers to other questions.
Reread each essay answer a second time after initial scoring.
CHAPTER 7
Administering and Scoring Objective
Paper – and – Pencil Test
While it is true that test formats and content coverage are important
ingredients in constructing paper – and – pencil tests, the conditions under
which students shall take the test are equally essential.
This chapter is focused on how tests should be administered and
scored.
Arranging Test Items
Before administering a teacher – made test, test items have to be
reviewed. Once the review is completed, these items have to be
assembled into a test. The following guidelines should be observed in
assembling a test (Airasian, 1994; Jacobsen et al, 1993):
1. Similar items should be grouped together. For example,
multiple – choice items should be together and separated from
true – false items.
2. Arrange test items logically. Test items have to be arranged
from the easiest to the most difficult.
3. Selection items should be placed at the start of the test and
supply items at the end.
4. Short – answer items should be placed before essay items.
5. Specify directions that students have to follow in responding
to each set of grouped items.
6. Avoid cramming items too close to each other. Leave
enough space for the students to write their answers.
7. Avoid splitting multiple – choice or matching items across
two different pages.
8. Number test items consecutively.
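The grouping, ordering, and numbering guidelines above can be sketched as a simple sort. The `Item` class, the order of item types, the 1-to-5 difficulty scale, and the sample items are all hypothetical; the sketch assumes that items are ordered from easiest to hardest within each group of similar items.

```python
from dataclasses import dataclass

# Hypothetical type order: selection items first, then supply items,
# with short-answer items ahead of essay items.
TYPE_ORDER = ["true-false", "multiple-choice", "matching", "short-answer", "essay"]


@dataclass
class Item:
    text: str
    item_type: str
    difficulty: int  # hypothetical scale: 1 = easiest, 5 = hardest


def assemble_test(items):
    """Group items by type, order each group by difficulty, and number them."""
    ordered = sorted(
        items, key=lambda item: (TYPE_ORDER.index(item.item_type), item.difficulty)
    )
    return list(enumerate(ordered, start=1))


items = [
    Item("Explain how plants make food.", "essay", 4),
    Item("The sun is a star.", "true-false", 1),
    Item("Which planet is the largest?", "multiple-choice", 3),
    Item("Which gas do plants absorb?", "multiple-choice", 2),
    Item("Name the green pigment in leaves.", "short-answer", 2),
]

for number, item in assemble_test(items):
    print(number, item.item_type, item.text)
```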
Administering the Test
Test administration is concerned with the physical and psychological
setting in which students take the test, so that students can do their best
(Airasian, 1994). Some guidelines that teachers should observe in
administering a test are discussed below.
Provide a quiet and comfortable setting. This is essential
as interruptions can affect students’ concentration and their
performance in the test.
Anticipate questions that students may ask. This is also
necessary as students’ questions can interrupt test – taking.
In order to avoid questions, teachers have to proofread their
test questions before administering them to the class.
Set the proper atmosphere for testing. This means that
students have to know in advance that they will be given a
test. In effect, such information can lead them to prepare for
the test and reduce test anxiety.
Discourage cheating. Students cheat for a variety of
reasons. Some of these are pressures from parents and
teachers, as well as intensive competition in the classroom.
To prevent and discourage cheating, Airasian (1994)
recommends two sets of strategies: strategies before
testing and strategies during testing.
Strategies before Testing
Teach well.
Give students sufficient time to prepare for the test.
Acquaint the students with the nature of the test and its
coverage.
Define to the students what is meant by cheating.
Explain the disciplinary action to be imposed on students caught cheating.
Strategies during Testing
Require students to remove unnecessary materials from their
desks.
Have students sit in alternating seats.
Go around the testing room and observe students during
testing.
Prohibit the borrowing of materials like pen and eraser.
Prepare alternate forms of the test.
Implement established cheating rules.
Help students keep track of time.
Scoring Test
After the administration of a test, the teacher needs to check the
students’ test papers in order to summarize their performance on the test.
The difficulty of checking a test differs with the kind of test items used.
Selection items are the easiest to score, followed by short – answer
response and completion items. The most difficult to score, however, is the
essay item.
Scoring Objective Tests. The following guidelines have to be
considered by a teacher in scoring an objective test:
A key to correction has to be prepared in advance for use in
scoring the test papers.
Apply the same rules to all students in checking students’
responses to the test questions.
Score each part of the test to have a clear picture of how
students fared in order to determine areas they failed to
master.
Sum up the scores for grading purposes.
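The guidelines above can be sketched as follows. The answer key, the part names, and the sample responses are hypothetical; the sketch assumes one point per correct answer and treats unanswered items as incorrect.

```python
# Hypothetical key to correction, prepared in advance and organized by part.
ANSWER_KEY = {
    "Part I (true-false)": ["T", "F", "T", "T"],
    "Part II (multiple-choice)": ["b", "d", "a"],
}


def score_objective_test(responses: dict):
    """Score each part against the key, then sum the part scores."""
    part_scores = {}
    for part, key in ANSWER_KEY.items():
        answers = responses.get(part, [])
        # The same rule applies to every student: an exact match with the
        # key (ignoring case and surrounding spaces) earns one point.
        part_scores[part] = sum(
            1
            for given, correct in zip(answers, key)
            if given.strip().lower() == correct.lower()
        )
    return part_scores, sum(part_scores.values())


# Hypothetical responses from one student.
student_responses = {
    "Part I (true-false)": ["T", "T", "T", "F"],
    "Part II (multiple-choice)": ["b", "d", "c"],
}
part_scores, total = score_objective_test(student_responses)
print(part_scores)
print(total)  # prints 4
```

The per-part scores show where the student had difficulty, while the total is what would be used for grading.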
Conducting Post – test Review
After scoring a test and recording results, teachers have to provide
students information on their performance. This can be done by writing
comments on the test paper to indicate how students fared in the test.
Answers to the items have to be reviewed in class for the students to know
where they committed mistakes. In so doing, students will become aware
of the right answer and how the test was scored and graded.