
EDUC 603: Evaluation of Student Learning Course Notes

Note: Italicized objectives will not be addressed within the course exams. Boldfaced objectives will be measured within the exams, but they are not explicitly addressed within the course text.

Part One: Assessment “Big Picture”

Topic 1: Why Do Teachers Need to Know about Evaluation and Assessment?

1.1 Define assessment.

Educational assessment is a formal attempt to determine a learner’s status with respect to educational variables of interest.

1.2 Identify and describe educational variables of interest that are important to you (not just achievement variable), including:

Expected Terminal Outcome Performance
Unexpected Terminal Outcome Performance
Embedded Practice Performance
Expected Attitudes
Unexpected Attitudes
Instructional Component Perception
Social Interaction Concerns
Instructor Concerns
Learner Concerns

See Appendix B: Assessment Item Samples by Variable Type for more details about these different variables.


1.3 Distinguish between definitions and examples of assessment versus evaluation.

Assessment refers to determining the status of learners with respect to educational variables of interest, and evaluation represents placing a value on the status identified. For example, measuring a learner’s level of skill acquisition on a test represents an act of assessment, but labeling the test score with a letter grade represents evaluation. Collecting achievement and attitude data from a group of learners after they experience an educational program represents assessment, but using the assessment data to identify strengths and weaknesses of the program itself, and suggesting changes to the program based on the assessment data, represents evaluation.

1.4 Describe some of the important reasons why educators should apply effective assessment and evaluation principles and procedures within their professional practice.

More Traditional Reasons:

Diagnosing Students' Strengths and Weaknesses
Monitoring Student Progress
Assigning Grades
Determining Effectiveness of Instruction

More Modern Reasons:

Influencing Public Perceptions of Educational Effectiveness
Evaluating Teachers
Clarifying Instructional Intentions
Continually Improving Professional Practice


Topic 2: Reliability of Assessments

2.1 Given a description of reliability measures applied to a specific set of data, determine whether stability (test-retest), alternate-form, or internal consistency reliability has been estimated.

2.2 Given a description of a test analysis need, determine the best type of reliability to estimate (stability, alternate-form, or internal consistency).

Evidence of Internal Validity, External Validity, and Reliability

Validity (Relevance)
  Internal: Does the assessment measure what it purports to measure (did the experimental treatment or instruction make the difference, or was it something else)?
  External: To what extent are assessment results comparable and transferable (can you generalize to other samples in the population)?

Reliability (Consistency)
  Internal: Do the same methods yield the same assessment result?
  External: Using the same methods, will you consistently obtain the same assessment results elsewhere?

Note: An assessment can be reliable but not valid, but an assessment cannot be valid if it isn't reliable.

Stability Reliability (test-retest) = The degree to which a learner's performances on the same assessment administered at two or more different times are similar.

Alternate-Form Reliability = The degree to which a learner's performances on different versions of an assessment are similar.

Internal Consistency Reliability = The degree to which items within an assessment are functioning in a consistent manner.
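All three reliability estimates reduce to fairly simple computations over score data. As a minimal sketch (the score lists are hypothetical, and Cronbach's alpha is used here as one common internal consistency coefficient; the course text may prefer another):

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation between two aligned score lists.
    Used for stability (test-retest) and alternate-form estimates."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def cronbach_alpha(item_scores):
    """Internal consistency estimate. item_scores is a list of
    per-item score lists, one inner list per item, aligned by learner."""
    k = len(item_scores)
    item_vars = sum(statistics.pvariance(item) for item in item_scores)
    totals = [sum(items) for items in zip(*item_scores)]
    return (k / (k - 1)) * (1 - item_vars / statistics.pvariance(totals))

# Stability: the same test administered twice (hypothetical scores)
test1 = [78, 85, 62, 90, 71]
test2 = [80, 83, 65, 92, 70]
print(f"stability estimate r = {pearson_r(test1, test2):.2f}")
```

The same `pearson_r` call applied to scores from two different versions of a test would give an alternate-form estimate instead.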


2.3 Describe the purpose of the standard error of measurement.

2.4 Use the standard error of measurement to help interpret the meaning of individual test scores.

Standard Error of Measurement = An estimate of the consistency of an individual’s test performance on other equally-difficult (or easy) tests. It is calculated from the standard deviation of the test scores as well as the reliability of the test itself. Teachers should always be aware that their tests, as well as those developed by others, are not 100% precise. Any individual score reflects an estimate of ability. The lower the standard error, the more precise the estimate…but it will rarely be zero.
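The standard formula follows directly from the two quantities named above: SEM = SD × √(1 − reliability). A minimal sketch, with hypothetical test figures:

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = standard deviation of the test scores times the
    square root of (1 - the test's reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical test: SD = 10 points, reliability coefficient = 0.91
sem = standard_error_of_measurement(10, 0.91)   # -> 3.0
score = 75
# Roughly 68% of equivalent retest scores would fall within one SEM
low, high = score - sem, score + sem
print(f"SEM = {sem:.1f}; band for an obtained score of {score}: {low:.1f}-{high:.1f}")
```

Note how the band shrinks toward zero as reliability approaches 1.0, matching the point that a lower standard error means a more precise estimate.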

2.5 Classify given data sets as measured along a nominal, ordinal, interval, or ratio scale.

Generally, there are four types of numerical measurement data:

Nominal: Numbers are assigned to categories. Differences between numbers don't mean anything (no mathematical significance) other than differences in categories. Examples: religion, gender, occupation, political party preference, class.

Ordinal: Numbers represent a hierarchical or rank ordering. Differences between numbers mean something, but differences between numbers are not equal. Examples: level of education, attitude survey responses.

Interval: Numbers represent a particular hierarchy and order, and the differences between them are relatively equal, but there is no true zero on the scale. Examples: IQ tests, many other norm-referenced test scores.

Ratio: Numbers represent a particular hierarchy and order, the differences between them are equal, and the scale is anchored by a true zero. Examples: number of siblings, wages earned, number of arrests, performance on criterion-referenced tests.

2.6 Identify the definitions of the following basic statistical terms: Mean, Median, Mode, Range, Standard Deviation, Correlation Coefficient.

See Appendix D: Basic Measurements and Statistics for Teachers.


Topic 3: Evidence of Validity

3.1 Distinguish between definitions and examples of reliability estimates and evidence of validity.

Reliability asks: “Do the same methods yield the same assessment result?” Validity asks: “Does the assessment measure what it purports to measure (did instruction make the difference...or was it something else?)”

3.2 Given descriptions of tests conducted to determine evidence of validity for specific assessment data, determine whether content-related, criterion-related, or construct-related evidence is being estimated.

Evidence of Validity

Content: The extent to which an assessment procedure adequately represents the content of the curricular aim being measured.

Construct: The extent to which empirical evidence confirms that an inferred construct exists, and that a given assessment procedure is measuring the construct accurately.

Criterion: The degree to which performance on an assessment procedure accurately predicts a learner's performance on an external criterion.

3.3 Describe the relationship between validity and reliability.

If an assessment is permitting valid score-based inferences, you can be assured that the assessment is also yielding reasonably reliable scores. In fact, an assessment cannot be valid if it is not reliable. BUT, an assessment can be reliable without being valid.


Topic 4: Fairness

Bias = Offending or unfairly penalizing any group of students on the basis of personal characteristics, such as gender, ethnicity, SES, religion, race, or other variables affecting communication (e.g., learning disabilities, language issues, general writing abilities, etc.)

Biased items lead to distortions in truly identifying how well an individual student can perform the actual skill you are trying to measure.

4.1 Given an assessment item, identify where it might be biased and suggest strategies for improving the item.

Step One: Look for examples of "offensiveness":
Negative stereotypes
Language

Step Two: Determine whether or not the content and/or mode of testing exhibits unfair penalization:
Esoteric content (only understood by a sub-group)
Vocabulary (using words like "esoteric")
Writing skills (biased towards the better writers)

4.2 Describe major difficulties facing educators who wish to use empirical methods to identify and correct item bias within the tests they create themselves.

Major difficulty: By the time you have collected data from enough subjects in a sub-category, the assessment and other factors associated with bias would most likely have changed.


Part Two: Developing and/or Evaluating Effective Assessment Items

Topics 5 & 6: Developing Selected and Constructed Response Test Items

6.1 Edit poorly-worded assessment directions to make them better written.

6.2 Edit poorly-constructed selected response assessment items to make them better written.

6.3 Given an instructional objective, write selected response and/or constructed response assessment items that appropriately measure the SKA (skills, knowledge, and attitudes) indicated within the objectives under the conditions stated (if applicable).

Information and examples to help address these outcomes are presented in the separate Guide for Developing Effective Assessment Items.


Part Three: Assessing Products of Context-Driven Instruction

Topics 7 & 8: Developing Performance and Portfolio Assessments

7.1 Construct a well-written rubric scoring guide for a given instructional goal or task.

Information and examples to help address these outcomes are presented in the separate Guide for Developing Effective Assessment Items.

8.1 Describe the different roles portfolios CAN play within an instructional environment.

Artifact Creation as Instructional Context

An electronic portfolio is defined by the digital artifacts it presents. The content of such artifacts does not often relate directly to the use of technology, but successfully using technology to create artifacts often necessitates the learning and/or application of a variety of worthwhile skills. This represents a very concrete learning context. In addition to defining concrete creation-oriented learning contexts, the actions surrounding the development of digital material often define experiences that involve learning and/or applying problem-solving as well as collaboration skills.

Goal-Setting

Portfolios can help define both large “meta” instructional goals as well as smaller goals. Planning the creation of portfolio artifacts involves teacher-learner communication and clear goal-setting. If analytic rubrics will be used to evaluate the artifacts, specific categories and items within the rubric constitute clear goals available for review at any time throughout the learning process.

Assessment

Successfully developing artifacts for an electronic portfolio can constitute evidence of learning. The learning of content-related as well as technology and collaboration skills can often be clearly identified within a successfully-completed portfolio artifact. Designing and developing electronic portfolio artifacts generally constitutes a complex set of tasks, so detailed assessment instruments (including analytical rubrics) are often used. This type of assessment can encourage the learning and testing of higher-order, critical-thinking intellectual skills. In addition, learners can use detailed assessment rubrics as guides to help them acquire the intended skills.

Reflection

The experience of designing, developing, and presenting electronic artifacts provides numerous opportunities to reflect on the learning experience. It is very easy to include reflection requirements within the portfolio. Directing reflective activities and experiences is a very effective instructional strategy, particularly for adult learners.

Communication

Electronic portfolios make it easy to distribute artifacts to others (family, friends, colleagues, and potential employers), especially if the digital portfolio is Web-based. Electronic portfolios can also provide the mechanisms for helping group members living in different geographic locations work collaboratively on projects.

Instructor Planning and Management Tool

Creating a learning environment in which learners must develop electronic portfolio artifacts can help teachers manage the instructional process by enabling them to view, track, and evaluate progress. Also, determining the types of artifacts to be included within student portfolios and creating the analytic rubrics to help guide student portfolio development constitute effective planning practice.

Learner Organization Tool

Portfolio development can help learners organize their time and resources throughout a learning experience. “Assignments,” “In-Progress” and “Completed” folders, as well as calendars, timelines and progress checklists can help to organize resources and monitor progress. In addition, analytic assessment rubrics can be used as instructional scaffolds, and existing artifacts can be used as instructional examples.

8.2 Given a specific portfolio assessment scenario, critique how well the portfolio adheres to the seven key ingredients of portfolio assessment.

Seven key ingredients of portfolio assessment, mostly according to Popham (pp. 216-217):

1. Make sure your students "own" their portfolios by providing opportunities for creative expression and self-reflection.
2. Ensure a wide variety of artifacts are included, based on the role of the portfolio.
3. Create a clear, simple organizational scheme for collecting and storing artifacts (and include strategies for organizing "assignments," "in-progress," and "completed" work).
4. Clearly identify and communicate how each artifact will be assessed and evaluated.
5. Compel students to continually evaluate their "In-Progress" work and its resulting artifact(s).
6. Schedule and conduct portfolio conferences.
7. Involve parents in the portfolio assessment process (technology can be used to help with this!).


Part Four: Using Assessment Data to Improve Instructional Practice

Topic 13: Improving Teacher-Developed Assessments

13.1 Differentiate between examples of judgmental and empirical assessment item improvement strategies.

Judgmental improvement strategies reflect ways to improve assessments simply by making personal judgments about the items. These strategies might include:

Determining if items follow rules for good design (use the separate Guide)
Judging the accuracy of content
Determining if all objectives are measured
Determining if bias is as absent as possible

Judgmental evaluations of assessments can be conducted by teachers as well as the students who participate in the assessments. Questions can be asked about the clarity of the items, the vocabulary used, etc.

Empirical improvement strategies refer to the analysis of actual data to make decisions about changes needed to improve the quality of assessments.

13.2 Given a set of scores (pretest and/or posttest scores) for a particular test, calculate each item’s discrimination index (D) for both norm-referenced as well as criterion-referenced assessment items.

See the Popham text, Chapter 11.

13.3 Given the results of an item (distracter) analysis for a multiple-choice assessment (including p and D values) along with a general description of the purpose of the test, determine whether or not specific test items should be included in subsequent versions of the test.

See the Popham text, Chapter 11.
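As a sketch of the two empirical quantities named in 13.2 and 13.3: an item's difficulty (p) is the proportion of examinees answering it correctly, and a common textbook formulation of the discrimination index is D = p(upper group) − p(lower group). The code below splits examinees into top and bottom halves by total score rather than the 27% rule Popham may use, and all data are hypothetical:

```python
def item_difficulty(responses):
    """p value: proportion of examinees answering the item correctly.
    responses is a list of 0/1 scores for one item."""
    return sum(responses) / len(responses)

def discrimination_index(responses, totals):
    """D = p(upper half) - p(lower half), where halves are formed by
    ranking examinees on their total test scores. responses: 0/1 scores
    for one item; totals: total test scores, aligned by examinee."""
    ranked = [r for _, r in sorted(zip(totals, responses), reverse=True)]
    half = len(ranked) // 2
    upper, lower = ranked[:half], ranked[-half:]
    return sum(upper) / half - sum(lower) / half

# Hypothetical item: the three highest scorers got it right, the rest wrong
item = [1, 1, 1, 0, 0, 0]
totals = [90, 85, 80, 60, 55, 50]
print("p =", item_difficulty(item))            # -> 0.5
print("D =", discrimination_index(item, totals))  # -> 1.0 (discriminates perfectly)
```

Items with D near zero (or negative) are the usual candidates for removal or revision in subsequent versions of a test.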


Topic 14: Making Sense Out of Standardized Test Scores

14.1 Distinguish between examples of norm-referenced and criterion-referenced assessments.

Criterion-referenced measurements represent "test" or performance items designed to measure performance on an explicit set of skills, knowledge, and attitudes indicated within clearly-stated objectives. Norm-referenced measurements interpret and report "test" or performance results relative to how all subjects perform on the measurement instrument (IQ tests and the Iowa Test of Basic Skills, for example).

Examples of criterion- and norm-referenced measurements:

Criterion-referenced: Juan's performance on the Survey of Essential Skills (SES), the Montgomery County School District's (MCSD) test measuring performance on Virginia SOL reading objectives, indicated that he had mastered every objective for Grade 1. Juan's performance on the SES also indicated that he had mastered all the MCSD reading objectives for the first semester of Grade 2.

Norm-referenced: Juan is a Grade 1 pupil at Harding Avenue Elementary School in Blacksburg. At the end of the year, he obtains a grade-placement score of 2.7 on the Virginia Reading Test (VRT). Juan's score on the VRT places him at the 98th percentile among all children in a large sample that took the VRT during its last revision.

14.2 Compare and contrast percentile, grade-equivalent, and scale test scores.

See the Popham text, Chapter 13.


Some Important Vocabulary…

Aptitude versus Achievement

Aptitude: Likelihood of a student performing well in a future setting (generally academic). IQ and SAT are examples.

Achievement: Knowledge or skills a student possesses. The Virginia SOL and (traditionally) the ACT are examples.

Standardized Test:

Norm-referenced or criterion-referenced aptitude or achievement assessments designed to be administered, scored, and interpreted in a standardized manner. Traditionally selected-response.

Interpreting Standardized Test Scores: Important Terms

Mean: Average assessment raw score for a population.

Standard Deviation: Degree to which raw scores in a population vary from the mean [average difference between individual scores and the population mean].

Standard Error of Measurement: Estimate of the measurement error inherent in an assessment's ability to accurately measure performance. It reflects the general likelihood of scoring the same on an equivalent retest.

Raw Score: Actual number of items scored correctly on an entire testing instrument.

Scale Score: Raw scores converted to some sort of scale, usually to compensate for differences in scores due to different instrument versions. Often hard to interpret.

Percentile Score: Percent of students in the norm group that the student outperformed (relative score). Easy to interpret.

Stanine Score: The position of a score within nine equivalent range segments in a score distribution.

Grade Equivalent Score: Performance inference indicating the probable grade level (left of the decimal) and month (right of the decimal) the student's performance reflects. Most accurate with curricular content common across all members of the norm group (reading, math). Easy to misinterpret.

Pass Rates: Percent of students in a population who scored above the pass/fail cut-off score (raw or scaled scores).
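Several of these terms can be illustrated by converting one raw score against a norm group. A minimal sketch with an invented ten-student norm group; the stanine conversion uses the common approximation of rounding 2z + 5 and clipping to 1-9, which may differ slightly from a published table:

```python
import statistics

def z_score(raw, mean, sd):
    """Standard score: distance from the mean in SD units."""
    return (raw - mean) / sd

def percentile_rank(raw, norm_scores):
    """Percent of the norm group that the student outperformed."""
    below = sum(1 for s in norm_scores if s < raw)
    return 100 * below / len(norm_scores)

def stanine(z):
    """Approximate stanine: round(2z + 5), clipped to the 1-9 scale."""
    return min(9, max(1, round(z * 2 + 5)))

norm = [55, 60, 62, 68, 70, 73, 75, 80, 85, 92]  # hypothetical norm group
m, sd = statistics.mean(norm), statistics.pstdev(norm)
raw = 80
z = z_score(raw, m, sd)
print("z =", round(z, 2))
print("percentile =", percentile_rank(raw, norm))  # -> 70.0
print("stanine =", stanine(z))
```

Note that the percentile and stanine describe the score only relative to this norm group, which is exactly why such scores are norm-referenced rather than criterion-referenced.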


Appendix A: Learning vs. Instruction vs. Teaching

Learning

Development of new knowledge, skills, or attitudes resulting from an individual’s "external" interaction with her/his environment (media*/information) and/or "internal" interaction between new and previously-existing knowledge structures.

* Media (medium is the singular) represent the physical elements within the environment that communicate messages through one or more content forms (text, audio, still images, animations, video, models, etc.).

Realistically, learning can only be inferred by observing a permanent and persistent change in a person's behavior.

Instruction

Arranging an environment (media-presented information) specifically to facilitate learning.

The following list represents some important factors affecting how a person learns from a particular arrangement of the environment. This isn’t THE list (I don’t think THE list exists), just some important factors. Each factor is classified as being “internal” to the learner, or “external” (part of the instructional environment).

Cognitive abilities and developmental level of the learner [internal]
Previous experiences of the learner [internal]
Instructional components and conditions present within the context (e.g., type and amount of practice with feedback) [external]
Degree of message abstraction
Perceived relationship, if any, between verbal and visual stimuli [external and internal]
Clarity and effectiveness of messages presented (e.g., examples, nonexamples) [external]
Amount, type, and rate of information presented at any given time [external]
Amount and type of support mechanisms (scaffolds) accessible throughout the learning experience [external]
Motivational strategies integrated into the learning experience [external]
Type and degree of interactions possible with instructional media present [external]
Social interaction within the learning context [external]
Intrinsic, personal motivational level or "state" [internal]
Personal learning style [internal]
Overall instructional context [external]


Teaching

Teaching represents designing, implementing, facilitating, assessing and evaluating effective instruction. This involves, but is not limited to…

• Deciding what is important to be learned (goals and objectives)
• Planning instructional experiences
• Assessing and evaluating learners
• Managing resources
• Facilitating communication between students, teachers, parents, and experts


Appendix B: Assessment Item Samples by Variable Type

1. Data Type: Expected terminal outcome performance
When: Conclusion of the lesson/unit/course
How: Posttest
Sample: Specific test items referenced precisely to instructional objectives (two items per outcome can help establish reliability)

2. Data Type: Unexpected terminal outcome performance
When: Conclusion of the lesson/unit/course
How: Posttest
Sample: Essay items or problems in which learners must apply skills and knowledge learned to new situations. New conditions are presented which were not included in the original objectives.

3. Data Type: Embedded practice performance
When: Throughout the instructional experience
How: End-of-chapter questions, embedded practice items following information and examples
Sample: All items should be referenced to specific objectives, and the performance elicited within each practice item should be the exact performance stated in its corresponding objective, under the exact same conditions stated in the objective. Feedback should be provided as soon as possible.


4. Data Type: Expected attitudes

When: Usually at the conclusion of the lesson/unit/course
How: Self-administered questionnaire, interviewer-administered questionnaire (telephone, chat, e-mail dialogue), interviewer-administered questionnaire (in person), open-ended in-person interviews, focus groups, direct observations

Samples*:
4.01 The information presented in this program [lesson, unit, etc.] was interesting.
4.02 The information presented in this program was easy for me to understand.
4.03 I tried hard to understand the material presented in this lesson.
4.04 I concentrated on learning throughout this entire lesson.
4.05 I enjoyed learning about the things presented in this lesson.
4.06 I would like to learn more about the information presented in this lesson.
4.07 I am confident that I will perform well on the upcoming test.
4.08 The lesson introductions really caught my attention.
4.09 The information and activities held my attention throughout each of the lessons.
4.10 The information and examples presented in this lesson were relevant to my life.
4.11 I am certain that the things I learned in this program will be useful to me in the real world.
4.12 The things I learned in this course are valuable to me right now.
4.13 The rewards I received for trying hard in this program were fair.
4.14 I am satisfied with what I learned from this course.
4.15 I am confident that I will be able to successfully do the things I learned how to do in this course when I need to in the future.
4.16 I learned a lot from this lesson.


5. Data Type: Unexpected attitudes
When: Usually at the conclusion of the lesson/unit/course
How: Self-administered questionnaire, interviewer-administered questionnaire (telephone, chat, e-mail dialogue), interviewer-administered questionnaire (in person), open-ended in-person interviews, focus groups, direct observations

Samples*:
5.01 Compared to other courses I have taken, this was one of the best.
5.02 I would recommend this program to my friends.
5.03 The best thing about this lesson was _________.
5.04 The worst thing about this lesson was ________.
5.05 If I could change one thing about this program, I would change....
5.06 If I had to teach this course to others, I would want to use the same material presented.


6. Data Type: Program or product ID component perception
When: Usually at the conclusion of the lesson/unit/course, though these items might be more useful in a formative evaluation if they were asked after every lesson or unit. Many of these items would not be asked outside a formative evaluation.
How: Self-administered questionnaire, interviewer-administered questionnaire (telephone, chat, e-mail dialogue), interviewer-administered questionnaire (in person), open-ended in-person interviews, focus groups, direct observations

Samples*:
6.01 It was always clear to me what I was supposed to be learning throughout this program.
6.02 I was informed about what I already needed to know before beginning this program.
6.03 Information in the program helped me relate what I was about to learn with my own previous experiences.
6.04 Information in the program helped me understand how accomplishing the course goals would be useful to me.
6.05 Rewards and incentives for learning a lot and succeeding within this program were made clear.
6.06 It was clear to me how I would be held accountable for learning the things I was supposed to learn in this program.
6.07 It was clear to me what role the instructor was going to play in helping me learn the things I needed to learn within this program.
6.08 I always knew where I could go for help when I needed it during this course.
6.09 The context for learning and applying the things I needed to learn in this program was meaningful and purposeful.
6.10 The information presented within the program was well-organized and clear.
6.11 Enough good examples were provided for the different things I needed to learn.
6.12 It was made clear to me exactly how I was expected to recall the new information I was learning.
6.13 When I had to learn rules, the process of applying the rules to specific situations was broken down into steps, and these steps were clearly communicated to me.
6.14 The instructor/instruction modeled for me how to correctly perform the new things I was learning how to do.
6.15 New terms and concepts were used in sentences that I could easily understand.
6.16 Enough guidance was provided for me as I applied the new skills and knowledge to practice situations.
6.17 I had enough opportunities to explore the learning environment with minimal instructor guidance and intervention.
6.18 I had enough opportunities to practice the things I needed to practice to ensure that I really did learn what the program intended.
6.19 Feedback on my practice performance was provided in a timely manner.
6.20 When I did not perform a practice item correctly, I was clearly informed about what I did wrong.
6.21 When I did not perform a practice item correctly, I was given assistance for learning how to avoid the same mistakes in future practice.
6.22 I was encouraged to summarize the key ideas and restate the instructional objectives at the end of each lesson or unit.
6.23 The quiz items matched the practice items.


EDUC 603 Course Notes Page 19

7. Data Type: Social interaction concerns

When: Usually at the conclusion of the lesson/unit/course, though they might be more useful in a formative evaluation if they were asked after every lesson or unit. Many of these items would not be asked outside a formative evaluation.

How: Self-administered questionnaire, interviewer-administered questionnaire (telephone, chat, e-mail dialogue), interviewer-administered questionnaire (in person), open-ended in-person interviews, MAYBE focus groups (though some items might be too sensitive), direct observations

Samples*:
7.01 Competing against my classmates helped motivate me to learn.
7.02 Being forced to work with a partner was good for me.
7.03 My assigned partner(s) helped me learn the things I needed to learn in this class.
7.04 It was beneficial to me that I could only achieve my learning goals for this course if all my group members also achieved theirs.
7.05 Receiving a joint reward with all my group members for accomplishing our instructional goals was fair.
7.06 I felt as if the group I was assigned to did establish its own identity.
7.07 I was clear about the role I was to perform within my assigned group.
7.08 My assigned role in the group was fair.
7.09 There were too many students assigned to my group.
7.10 I had to help other members of my group learn some of the skills for this course.
7.11 It was easy for me to communicate with my partners.
7.12 I prefer to work by myself on course projects and assignments.


8. Data Type: Instructor concerns

When: Usually at the conclusion of the program, though they might be more useful in a formative evaluation if they were asked after every lesson or unit. Many of these items would not be asked outside a formative evaluation.

How: Self-administered questionnaire, interviewer-administered questionnaire (telephone, chat, e-mail dialogue), interviewer-administered questionnaire (in person), open-ended in-person interviews, focus groups if a number of instructors are involved, direct observations

Samples*:
8.01 I was clear about exactly what my role would be in this instructional program.
8.02 I was able to get adequate help in solving technical problems associated with implementing this instruction.
8.03 There were too many opportunities for the students to communicate with me about the instructional material presented throughout the program.
8.04 Program implementation materials and directions were adequate.
8.05 I felt the content presented within the program was complete and accurate.


9. Data Type: Learner concerns

When: Usually at the conclusion of the lesson/unit/course.

How: Self-administered questionnaire, interviewer-administered questionnaire (telephone, chat, e-mail dialogue), interviewer-administered questionnaire (in person), open-ended in-person interviews, focus groups, direct observations

Samples*:
9.01 There were not enough opportunities for me to interact with the instructor throughout this program.
9.02 It was easy for me to communicate with my instructor when I needed help.
9.03 Accessing the course materials was easy.
9.04 Navigating the course website was clear and easy.
9.05 I feel that having web-based material really helped me learn what I was supposed to learn from this program.
9.06 I had control over the information presented to me related to the course.
9.07 If I needed to, I could choose to access more practice.
9.08 If I needed to, I could easily access additional information and examples of the concepts presented in the course.

*These sample items are worded to suit a self-report questionnaire using a scaled response (Strongly Agree, Agree, Disagree, Strongly Disagree). They could be worded differently to serve interviewer-administered, open-ended interview, or focus-group approaches if desired. Also, most items are worded in a positive fashion for the sake of comparative clarity; each item could also be worded in a negative manner if desired.
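Scaled responses like these are usually coded numerically before summarizing. A minimal sketch in Python; the 1-4 coding, the `item_mean` helper, and the reverse-scoring option for negatively worded items are illustrative choices, not part of the course notes:

```python
# Score a 4-point agreement scale numerically and summarize one item.
# The 1-4 coding and the reverse-scoring option are illustrative choices.
SCALE = {"Strongly Disagree": 1, "Disagree": 2, "Agree": 3, "Strongly Agree": 4}

def item_mean(responses, reverse=False):
    """Average numeric score for one item; reverse=True flips a
    negatively worded item so a high score always means 'favorable'."""
    scores = [SCALE[r] for r in responses]
    if reverse:
        scores = [5 - s for s in scores]  # 1<->4, 2<->3
    return sum(scores) / len(scores)

responses = ["Agree", "Strongly Agree", "Agree", "Disagree"]
print(item_mean(responses))                # (3 + 4 + 3 + 2) / 4 = 3.0
print(item_mean(responses, reverse=True))  # (2 + 1 + 2 + 3) / 4 = 2.0
```

Reverse-scoring matters for items such as 7.12 or 9.01 above, where agreement signals an unfavorable reaction.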


Appendix C: Advantages and Disadvantages of Different Assessment Data Collection Procedures

Criterion-referenced practice items
Advantages: Should already be part of the instructional material. Provides "hard" data for the evaluator and stakeholders.
Disadvantages: Often difficult to record and organize practice performance data. If no practice exists, it may be difficult or impossible to develop and implement within the existing instructional program.

Criterion-referenced terminal assessment items
Advantages: Should already be part of the instructional material. Provides "hard" data for the evaluator and stakeholders.
Disadvantages: If a posttest already exists, there is a good chance that the items do not align perfectly with the objectives. Could be difficult and expensive to develop new assessment instruments.

Open-ended essay items
Advantages: Economical way to elicit assessment of multiple skills. Provides "hard" data for the evaluator and stakeholders.
Disadvantages: Biased against poor writers; often difficult and time-consuming to grade (inter-rater reliability becomes an issue if more than one person evaluates); difficult to perform congruence analysis with objectives.

Transfer/application assessment items
Advantages: Many stakeholders value transfer performance. Can provide an opportunity to validate posttest measures.
Disadvantages: Many instructional programs do not include transfer opportunities that align with program objectives. May be time-consuming.

Self-administered questionnaire
Advantages: Inexpensive. Can be quickly administered if distributed to a group. Can be embedded throughout the program if desired.
Disadvantages: No control for misunderstood questions, missing data, or untruthful responses. Not suited for exploration of complex issues.

Interviewer-administered questionnaire (telephone, chat, e-mail dialogue)
Advantages: Relatively inexpensive. Can be convenient for respondents. Best suited for short, non-sensitive topics.
Disadvantages: As a rule, not suitable for children, the elderly, or non-English-speaking people. Not suitable for lengthy questionnaires or sensitive topics.

Interviewer-administered questionnaire (in person)
Advantages: Interviewer controls the situation and can probe irrelevant or evasive comments; with good rapport, may obtain useful open-ended comments.
Disadvantages: Expensive. May present logistics problems (time, place, privacy, access). Often requires a lengthy data collection period.

Open-ended in-person interviews
Advantages: Usually yields the richest data, details, and new insights. Best if in-depth information is desired.
Disadvantages: Same as above; also, may be difficult to analyze.

Focus groups
Advantages: Useful for gathering ideas, different viewpoints, and new insights. Participants may feel more comfortable sharing feelings they know others also feel.
Disadvantages: Not suited for generalizations about the population being studied. A "crowd" mentality may cause respondents to articulate negative or positive feelings in an untruthful, exaggerated manner. Data may be difficult to analyze.

Direct observations
Advantages: If well executed, best for obtaining data about the behavior of individuals and groups.
Disadvantages: Usually expensive. Needs well-qualified staff. Observation may affect the behavior being studied.

Note: Many of these suggestions were articulated in the NSF publication User-Friendly Handbook for Project Evaluation (NSF 93-152).
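The table above flags inter-rater reliability as a concern whenever more than one person grades essay items. One common chance-corrected index of agreement between two graders is Cohen's kappa; a minimal sketch, with hypothetical pass/fail ratings (the function and data are illustrative, not from the course notes):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for the
    agreement expected by chance from each rater's marginal totals."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b.get(c, 0) for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Two raters scoring the same eight essays as pass/fail (hypothetical data).
rater_1 = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "fail"]
rater_2 = ["pass", "pass", "fail", "fail", "fail", "fail", "pass", "pass"]
print(round(cohens_kappa(rater_1, rater_2), 2))  # 0.5: moderate agreement
```

Kappa of 1.0 means perfect agreement; 0 means agreement no better than chance, which raw percent agreement can mask when one category dominates.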


Appendix D: Basic Measurement and Statistics for Teachers

Descriptive Statistics

Descriptive statistics are numbers representing some characteristics of a set of numerical data.

Measures of Central Tendency
- Mean: average score of the entire set or "distribution" of scores (symbols: X̄, M)
- Mode: most commonly occurring score in a sample
- Median: middle score in a distribution

Measures of Variability
- Variance: degree of variability of individual scores in a sample (symbol: S²)
- Standard Deviation: square root of the variance, expressed on the same scale as the measurement (symbol: SD)

Measures of Relationship
- Correlation Coefficient: degree of correspondence between two characteristics of a sample; +1.00 = perfect positive relationship, -1.00 = perfect negative relationship (symbol: r)
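All of these descriptive measures can be computed with Python's standard `statistics` module. A minimal sketch with made-up scores (the population forms `pvariance`/`pstdev` are used here; `variance`/`stdev` give the sample forms, and the `pearson_r` helper is an illustrative hand computation of r):

```python
import statistics as st

scores = [60, 72, 78, 85, 85, 85, 90, 95]  # one class's quiz scores (made up)

mean = st.mean(scores)           # X-bar or M: 81.25
median = st.median(scores)       # middle score: 85
mode = st.mode(scores)           # most common score: 85
variance = st.pvariance(scores)  # S^2, population form
sd = st.pstdev(scores)           # SD: square root of the variance

# Pearson correlation coefficient (r) computed from its definition:
# average cross-product of deviations divided by the two SDs.
def pearson_r(xs, ys):
    mx, my = st.mean(xs), st.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (st.pstdev(xs) * st.pstdev(ys))

hours = [1, 2, 3, 4, 5]      # hours studied (made up)
quiz = [55, 60, 70, 80, 85]  # quiz scores
r = pearson_r(hours, quiz)   # strong positive relationship, close to +1.00
```

Note that the mean (81.25) falls below the median (85) here because the low score of 60 pulls the average down; the median and mode are unaffected by that outlier.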


Inferential Statistics

Inferential statistics involve a chain of reasoning that connects the observed data to populations of data too large to observe completely.

Ratio (sometimes Interval) data:
- t-test: statistical significance* of the difference between two means (symbols: t, p)
- F-test: statistical significance of the differences between means in an ANOVA, MANOVA, or ANCOVA (symbols: F with degrees of freedom, p)
- Analysis of Variance (ANOVA): statistical significance of the difference between two or more means (F, p)
- Analysis of Covariance (ANCOVA): statistical significance of the difference between two or more means while statistically controlling for one or more related variables (covariates) (F, p)
- Multivariate Analysis of Variance (MANOVA): same as ANOVA except more than one dependent variable is employed (F, p)
- Multiple Comparisons (Tukey, Scheffé, Newman-Keuls, etc.): compare the differences between pairs of means within an ANOVA, ANCOVA, or MANOVA design (t-test, F, p)

Nominal, Ordinal, or Interval data:
- Chi-square: determines whether a relationship exists between two categorical variables (χ²)
- Mann-Whitney U-test, Wilcoxon signed-rank test: same as the t-test except nonparametric data are analyzed (p)
- Kruskal-Wallis test, Friedman test: same as ANOVA except nonparametric data are analyzed (p)

*"Significance" is the probability (p) that any differences are due to chance instead of the independent variable. The alpha (α) is the level of significance chosen in advance by the researcher.
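The t-test above boils down to dividing a difference between means by its standard error. A minimal sketch of the statistic itself, with made-up posttest scores; the `welch_t` helper is illustrative, and converting t into a p-value requires the t distribution (e.g. from `scipy.stats`), which is not shown here:

```python
import math
import statistics as st

def welch_t(a, b):
    """Welch's t statistic for the difference between two independent
    means: (mean difference) / (standard error of the difference)."""
    va, vb = st.variance(a), st.variance(b)  # sample variances, S^2
    se = math.sqrt(va / len(a) + vb / len(b))
    return (st.mean(a) - st.mean(b)) / se

# Hypothetical posttest scores for two groups of five learners.
treatment = [88, 92, 85, 91, 90]
control = [78, 75, 80, 74, 79]
t = welch_t(treatment, control)  # large positive t favors the treatment group
```

The larger |t| is relative to its degrees of freedom, the smaller p becomes; the researcher rejects the chance explanation only when p falls below the preselected alpha.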


Appendix E: Conducting a One-to-One Formative Evaluation

What is the purpose of this evaluation type?

Identify the following:

Content clarity
Clarity of directions
Completeness of instruction
Difficulty level
Quality
Typographical/grammatical errors
General motivational appeal

When is this type of evaluation usually conducted?

One-to-one evaluations are usually conducted after the expert review evaluation but before any other type of formative evaluation.

Who usually conducts this type of evaluation?

Instructional designer(s) [teacher]

Who should participate in the evaluation?

The instructional material developer "walks" through the material with an individual learner from the target population. If possible, this type of evaluation should be repeated with members of the target population representing different skill levels, genders, ethnicities, motivation levels, etc.

What planning strategies should the evaluator employ prior to conducting the evaluation?

The most important planning strategy is simply determining the information that needs to be collected. The information will be either intrinsic information about the instructional material or information about the effects of the instruction. The general criterion for making this determination centers on how "rough" the instruction is at the point of the evaluation. The rougher it is, the more likely intrinsic information will be the most useful in informing future revisions.

Some specific criteria for judging the "roughness" of an instructional unit:

The unit has not been reviewed by anyone yet.
More one-to-one evaluations or small-group and field-test evaluations will be conducted.
Either the learner population, the instructional content, or the instructional strategies are unfamiliar to the designers.
The performance measures are in need of revision.


If the thrust of the one-to-one is intrinsic, the following types of information may be appropriate for collection:

Is the instruction clear?
Are the directions clear?
Is the instruction complete?
Is the instruction too difficult or too easy?
Are the visual or aural qualities accurate?
Are there typographical or grammatical errors?

For learning-effects data, performance measures can be used in which the learner not only completes the measures but also critiques them.

What procedure should be followed during the evaluation?

1. Prepare standard interview questions to ask the learner after each instructional activity.
2. Design standard forms to record and store the learner's reactions during instruction.
3. Prior to the instruction, prepare the learner for the evaluation experience by explaining the goals of the instructional material and the learner's role in helping to improve it.
4. Manage the evaluation using the questions and data collection tools developed prior to the evaluation as a guide.
5. Close the evaluation by administering any data collection instruments developed for the procedure, including a debriefing in which the learner is interviewed for any additional information that may not have found its way into the data collection instruments.
6. Review and analyze the data to develop recommendations for improving the effectiveness of the instruction based on the learner's viewpoint.
7. Revise the instruction if needed.
8. Repeat the evaluation using different learners. It is recommended that three learners be evaluated to validate the corrective actions stated in the recommendations.

What data should be collected (describe typical instruments used)?

Practice and posttest performance
General attitudes (from surveys as well as interview questions)
Procedural problems
Time
Accuracy of material
Ease of use

All data can be evaluated at a glance, with a list of potential revisions documented.

What is the final product?

A "to do" list of revisions.


What are some of the special problems and concerns facing the evaluator(s)?

Distant Subjects

Some subjects may not be able to attend the one-to-one session for logistical reasons. It is suggested that these learners still be reached through other means. For example, written one-to-one questions can be inserted into the learning materials at logical breaking points in the instruction.

The Silent Learner

Some subjects will be reluctant to respond, often because they do not feel comfortable criticizing the work in the presence of its creators. This can be addressed by warming them up with initial conversation, by asking some easy questions up front, or by asking questions that put them in a position of authority. Another method is to deliberately insert some errors early in the instruction in an effort to elicit their responses.

