
Evaluation Methods in Research


Continuum
The Tower Building, 11 York Road, London SE1 7NX
15 East 26th Street, New York, NY 10010

© Judith Bennett 2003

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage or retrieval system, without prior permission in writing from the publishers.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

ISBN: 0 8264 6478 5 (paperback)

Typeset by RefineCatch Limited, Bungay, Suffolk
Printed and bound in Great Britain by MPG Books Ltd, Bodmin, Cornwall



Figure 1 Why evaluation might be important

an evaluation, focusing on a number of key questions which need to be addressed. Finally, in Section 6, three examples of evaluations are presented and discussed, illustrating how principles have been put into practice.


    Introduction

Any book on evaluation needs to address the issue of how far details of general aspects of research methods should be included. In this instance, the decision has been made to include information about key research strategies and techniques, as these are central to the planning and design of an evaluation, but to exclude material on aspects of research methods which are covered in detail in a number of other texts. Thus, matters to do with the detail of designing research instruments (questionnaires, interview schedules, observation schedules) and data analysis are not included in this book. However, the Appendix gives details of further reading on research methods, and a Glossary has been included for those who may be less familiar with some of the terms used in educational research and evaluation.

If you are reading this book, you are likely to be involved in an educational innovation, and asking one or more of the following questions: What works? How does it work? How can it be made to work better? I hope what you read here helps you in your quest to find the answers.


    What is Educational Evaluation?

Box 1.1 What is evaluation?

The process of determining to what extent educational objectives are being realized by the programme of curriculum and instruction (Tyler, 1949, 105-6).

The collection and use of information to make decisions about an educational programme (Cronbach, 1963, 672).

Its purpose is to see whether curriculum objectives are being, or have been, achieved (Kerr, 1968, 21).

Evaluation is concerned with securing evidence on the attainment of specific objectives of instruction (Bloom, 1970, 28).

Curriculum evaluation refers to the process or processes used to weigh the relative merits of those educational alternatives which, at any given time, are deemed to fall within the domain of curriculum practice (Hamilton, 1976, 4).

Educational evaluation is the process of delineating, obtaining and providing useful information for judging decision alternatives (Jenkins, 1976, 6).

Evaluation as illumination (Parlett and Hamilton, 1976, 84).

[Evaluation is] systematic examination of events occurring in and consequent on a contemporary programme - an examination conducted to assist in improving this programme and other programmes having the same general purpose (Cronbach et al., 1980, 14).

Evaluation [is] the assessment of merit or worth (Nevo, 1986, 16).

Evaluation can provide a means for translating an educational idea into practice as well as monitoring and enhancing curriculum development (Hopkins, 1989, 3).

Educational evaluation is about social planning and control (Norris, 1990, 16).

Evaluators are interested in discerning the effects of interventions over-and-above what could have been expected if the intervention had not been applied (Davies et al., 2000, 253).


Before considering in more detail what evaluation involves, it is useful to look briefly at the sort of terminology people use when talking about it.

    Evaluation terminology

There is considerable variety in the terms people use when talking about aspects of evaluation. Evaluation itself may variously and interchangeably be described as educational evaluation, curriculum evaluation, or program(me) evaluation, with the last term being more common in the USA.

Several terms may also be used to describe the change in provision which is being evaluated. These include curriculum development, program(me) development, program(me) implementation, curriculum innovation, innovation, curriculum intervention, intervention or intervention strategies. Again, these terms are interchangeable. The policy normally followed in this book is to use the terms 'evaluation' or 'educational evaluation', and to refer to changes in provision as 'innovations' or 'new programmes'. However, where the work of particular authors is described, the terms used in their original works have been retained.

    What is evaluation?

Almost all the statements in Box 1.1 have in common the notions that evaluation involves learning about a new programme through gathering information, and that this information should be linked to decision-making. The information gathered might include the scores students achieve on tests, other measures of cognitive abilities, measures of attitudes, data from observations, and questionnaire and interview data from students, teachers and others associated with the programme. The statements tend to differ in their view of which aspects of such data should be emphasized, and the purposes to which it should be put. The earlier statements, such as those of Tyler (1949) and Kerr (1968), talk in terms of making judgements about the extent to which the objectives of a new programme have been achieved, whilst later ones, such as those of Jenkins (1976) and Parlett and Hamilton (1976), see learning from the process of introducing a new programme as an important element of evaluation. The most recent (Davies et al., 2000) hints at a return to the ideas underpinning the views expressed in the earlier statements, and indicates that there is ongoing debate over where the balance lies between the learning and judging dimensions of evaluation.

For the purposes of this book, evaluation has been taken to involve the following:

a focus on the introduction of a new programme;
collecting and analysing empirical data;
reaching some form of conclusions or judgements about the data;
communicating these findings to an audience;
using the findings to inform decision-making.

    Why undertake evaluation?

As the statements in Box 1.1 indicate, evaluation may be undertaken for a variety of purposes. The two main reasons are to determine the effectiveness of a new programme once it has been implemented, and to gather information for improving the programme as it is being developed. However, the information gathered through an evaluation can be used in a number of ways. Cronbach (1963) has suggested it serves three important purposes by informing course improvement, decisions about individuals, and administrative regulation. He explains these as follows:

Course improvement: deciding what instructional materials and methods are satisfactory and where change is needed.


Decisions about individuals: identifying the needs of the pupil for the sake of planning his instruction, judging pupil merit for purposes of selection and grouping, acquainting the pupil with his own progress and deficiencies.

Administrative regulation: judging how good the school system is, how good individual teachers are, etc.

(1963, 232)

This book is primarily concerned with the first of these purposes - looking at the ways in which the techniques of evaluation can be used to assess the effects of changes in the curriculum and thus to make decisions about how classroom practice might be improved.

    Who is evaluation for?

Ultimately, it could be argued, evaluation is for the students in the classrooms, in order to provide them with the best possible educational experiences. However, the audiences for evaluation reports are normally one or more groups of decision-makers who can influence what happens in classrooms. These groups include the people who have developed a new programme, the people using the programme, and external agencies and other interested parties such as the people who have sponsored the programme, educational researchers and those responsible for curriculum policy or developing new programmes.

Each of the groups interested in an evaluation is likely to have different, although overlapping, priorities and purposes for its outcomes. Those developing the programme will want to know how it is working and what factors appear to help it to work in as many locations as possible. They are also likely to want or need to gather information for the project sponsors, to help persuade other potential users to adopt the programme and to share what they have learned with others involved in developing new programmes. Users will want to know how the programme is working in their particular location, and how this compares with other approaches they may have used, and how their experience compares with that in other locations. They will also want to let the developers know their views on the programme, and share their experiences with other users. Those sponsoring the programme will be concerned with its effects and also that they are getting 'value for money'. Other people involved in developing new programmes will be interested in what they can learn which might be of use in their own work. Policy-makers will want to know what messages emerge from the evaluation which can usefully inform curriculum planning and legislation.

    Dimensions of evaluation

Attempts to define evaluation point to a number of different dimensions by which it can be characterized. A useful description of these has been developed by Stake (1986), who identified eight different possible dimensions to evaluation studies. These are:

formative-summative
formal-informal
case particular-generalization
product-process
descriptive-judgemental
preordinate-responsive
holistic-analytic
internal-external

(1986, 245-8)

    Formative-summative

These two terms are frequently applied to evaluation. A study which is primarily seeking to gather information on the effectiveness of a programme after it has been implemented is termed a summative evaluation (sometimes also called an outcome or impact evaluation). A summative evaluation seeks answers to questions about what relationships exist between the goals of the programme and its outcomes. A study which is primarily seeking to gather information during the process of implementation, with a view to informing the development of the programme, is called a formative evaluation (sometimes also called a process or progress evaluation). A formative evaluation seeks answers to questions about the process of implementation and how this relates to the achieved curriculum.

    Formal-informal

Stake suggests that informal evaluation is 'a universal and abiding human act, scarcely separable from thinking and feeling' - in other words, people are doing it all the time. Formal evaluation of a programme, however, needs to be systematic, because its findings will be scrutinized and therefore need to be accurate, reliable, credible and of use to those involved.

    Case particular-generalization

The findings of an evaluation of a particular programme may only apply to that programme specifically, or they may apply to other programmes which share similar approaches and features. If the aim of an evaluation is to permit generalizations to be made, then there is a much greater need for careful controls and description to provide a secure basis for these generalizations.

    Product-process

Some evaluations focus primarily on the outcomes of a programme, whilst others focus on the processes which gave rise to these outcomes. Product-oriented evaluation tends to provide information about what effects are associated with a particular programme, and process-oriented evaluation yields information about why those effects occurred.


about positions on one particular dimension are likely to determine positions on other dimensions. For example, a summative evaluation is likely to be preordinate in nature, to focus on the products and to be undertaken by an external evaluator. In contrast, an internal evaluation is likely to be formative and look at processes.

What is the relationship between evaluation and research?

It is clear from the discussion so far that evaluation has links with educational research. A number of authors have explored the relationship between the two, with a variety of views being expressed about areas of overlap and difference. Norris (1990) suggests:

It is generally assumed that evaluation is the application of research methods to elucidate a problem of action. Looked at in this way, evaluation is not strikingly different from research . . . Evaluation is an extension of research, sharing its methods and methodology and demanding similar skills and qualities from its practitioners.

(1990, 97)

This view contrasts with that of MacDonald (1976), who sees

. . . research as a branch of evaluation - a branch whose task it is to solve the technological problems encountered by the evaluator.

(1976, 132)

Whilst both Norris and MacDonald appear to see research and evaluation as closely linked, others have suggested they are more clearly distinct. For example, Smith and Glass (1987) identify eight characteristics which they see as distinguishing research from evaluation. These are summarized in Table 1.1.

An important point to make about the distinctions between research and evaluation summarized in Table 1.1 is that they represent an ideal, and what happens in practice may be rather different. As Smith and Glass point out, the distinctions between research and evaluation may become blurred. Two ways in which they suggest this might happen concern the 'value-free' nature of research, and the findings of evaluation studies. Research is rarely as value-free as it might aspire to be, as researchers inevitably bring their own interests, motivations and agenda to research studies. It is also the case that an evaluation study, although focusing on one particular programme, may generate findings that are of much wider interest and applicability, thus contributing to knowledge more generally.

Another view on the relationship between research and evaluation is given by Laurence Stenhouse in his influential book, An Introduction to Curriculum Research and Development:

Evaluation should, as it were, lead development and be integrated with it. Then the conceptual distinction between development and evaluation is destroyed and the two merge as research.

(1975, 122)

From the variety of views that has been expressed, it is clear that there is considerable overlap between research and evaluation, although people may have different opinions on the degree of overlap. In part, these views arise from different interpretations of the word 'research'. It is normally described in terms which suggest that its aim is the pursuit of new knowledge and, as such, it can take a number of different forms. One type of research - often called pure or, more colloquially, 'blue skies' research - is open-ended and exploratory, seeking new patterns, explanations and theories. This type of research is clearly distinct from evaluation, and is generally more closely associated with the natural sciences than with educational research.

Norris (1990) suggests that one of the problems with attempts to delineate differences between research and evaluation arises from a narrow view of research which


Table 1.1 Ways in which research and evaluation may differ

1 The intent and purpose of the study
Research: To advance the frontiers of knowledge, to gain general understanding about the phenomena being studied
Evaluation: To gather information to judge the value and merit of a specific innovation (or 'parochial', in the words of Smith and Glass), and to inform decisions

2 The scope of the study
Research: May have a narrow focus
Evaluation: More comprehensive

3 The agenda of the study
Research: Set by the researcher
Evaluation: Set by the client commissioning the evaluation

4 The origins of the study
Research: Arises from curiosity and the researcher's need to know
Evaluation: Arises from a client commissioning the evaluation

5 Accountability
Research: To the research community
Evaluation: To the client who commissioned the evaluation

6 Timeliness
Research: Can take place at any time
Evaluation: Takes place when a problem arises or a decision needs to be reached

7 Values
Research: Aspires to neutrality in values
Evaluation: Must represent the multiple values of the various interested groups

8 Criteria for judging study
Research: Internal and external validity
Evaluation: Utility and credibility

Adapted from Smith and Glass, 1987, 33-8.


. . . ignores the social context of educational enquiry, the hierarchic nature of research communities, the reward structure of universities, the role of central governments in supporting certain projects and not others, and the long established relationship between social research and reform.

(1990, 99)

Pure research is often distinguished from a second type, called applied research, which involves the testing of theories and hypotheses. Here, the distinction between research and evaluation is less clear-cut, as it could be argued that any new programme aimed at improving practice is a hypothesis about teaching, and evaluation involves testing that hypothesis.

Whatever conclusions people reach about the relationship between evaluation and research, those undertaking evaluation will inevitably need to draw on the strategies and techniques of research if they want to gather systematic evidence to help answer many of the questions they will be asking.

    Summary

This section has provided some of the background to evaluation, and pointed to issues and areas of debate. These are explored in more detail in the following sections. In particular, it has shown that:

there are various views on the nature and purpose of evaluation;
there are several different potential audiences for evaluation, each with their own priorities;
evaluation may be characterized in a number of different ways;
the distinction between evaluation and research is not clear-cut, but evaluation forms an important area of research in education.


Models and Approaches in Educational Evaluation

    This section looks at:

general ideas about the approaches and models used in educational evaluation;
key features of particular evaluation approaches and models, including the classical approach as exemplified by Ralph Tyler's 'objectives model' and the 'illuminative evaluation' approach of Malcolm Parlett and David Hamilton;
ways of characterizing research and evaluation questions;
the politics of educational evaluation;
recent trends and developments in educational evaluation, including randomized controlled trials (RCTs) and design experiments.

    Approaches and models in educational evaluation

Two general points are worth making before looking in more detail at ways of approaching evaluation. First, many attempts have been made to summarize and describe approaches and models in educational evaluation. Whilst there are significant differences between some of the approaches and models, others overlap to a greater or lesser extent. Thus, these summaries and overviews tend to cover much the same ground in slightly different ways. Second, summaries of approaches and models often present ideal cases or oversimplifications of what actually happens. In practice, evaluators generally recognize the strengths and limitations of individual approaches and the majority of evaluation studies therefore draw on more than one approach or model. Nonetheless, these summaries are useful in providing an overview of the terminology, key features and issues which need to be considered when planning an evaluation study.

    Approaches or models?

The literature tends to use these terms interchangeably. The term 'model' is often used to describe an approach which has been developed by a particular person. Thus, for example, reference is made to 'the Tyler objectives model' or 'Stake's countenance model'. (These, together with other models, are described in more detail later in this section.) These models of educational evaluation are characterized by a specific approach to evaluation design or to a particular set of circumstances to be evaluated.

    Two overviews

Lawton (1980, 1983) in the UK and Stake (1986) in the USA have both attempted to pull together the diversity of approaches used in educational evaluation studies. The structure, emphasis and terminology of the overviews reflect the different traditions and ways in which educational evaluation has developed in each of these countries. (More detail about the development of educational evaluation in the USA and the UK may be found in Norris, 1990.)

Lawton (1980, 1983) developed a taxonomy of six models of educational evaluation:

1 The classical (or 'agricultural botany') research model
2 The research and development (R and D) (or industrial factory) model
3 The illuminative (or anthropological) model
4 The briefing decision-makers (or political) model
5 The teacher-as-researcher (or professional) model
6 The case study (or portrayal) model

Lawton indicates that the order in which these models are listed roughly follows the order in which they were developed, although he acknowledges there are areas of overlap. Some of these models are associated with particular approaches. For example, the classical and research and development models are likely to adopt an experimental approach to evaluation such as is associated with the work of Tyler (1949), involving control groups and pre- and post-testing, whereas the illuminative model uses the more descriptive approaches originally developed by Parlett and Hamilton (1972, 1976) and often takes the form of a case study.

Stake (1986) identified nine approaches to evaluation. These are:

1 Student gain by testing - to measure student performance and progress
2 Institutional self-study by staff - to review and increase staff effectiveness
3 Blue-ribbon panel - to resolve crises and preserve the institution
4 Transaction-observation - to provide understanding of activities
5 Management analysis - to increase rationality in day-to-day decisions
6 Instructional research - to generate explanations and tactics of instruction
7 Social policy analysis - to aid development of institutional policies
8 Goal-free evaluation - to assess the effects of a programme
9 Adversary evaluation - to resolve a two-option choice

Table 2.1 summarizes the key features of these approaches, as outlined by Stake (1986, where fuller description and elaboration may be found). Stake also notes that the descriptions are over-simplifications and that there is overlap. The table is detailed (and it is not necessary to absorb all the detail), but helpful in gaining a feel for some of the key ideas and terminology associated with educational evaluation.

Although the summaries of Lawton and Stake have different structure and terminology, both, it could be argued, have just two principal models (or paradigms) which could be described as distinctly different: the classical research model and illuminative evaluation. Oakley (2000), in discussing what she terms the 'paradigm wars' in educational research and evaluation, has produced a useful summary of the chief characteristics of the two prevailing methodological paradigms, which may be found in Table 2.2. As with Stake's overview, absorbing the detail of this table is less necessary than getting a feel for the key ideas and terminology. As the next section will demonstrate, the classical research model of evaluation reflects many of the characteristics Oakley has associated with what she terms the 'logical positivist/scientific' paradigm, whilst the characteristics of the 'naturalist/interpretivist' paradigm are much closer to those of illuminative evaluation. The next section considers these two main models in more detail, together with briefer descriptions of other approaches to evaluation.

    The classical research model

The classical research model sees the evaluation of a programme as being similar to that of a standard scientific experiment involving the testing of a hypothesis. In its simplest form, an experiment involves testing a hypothesis by making a change in the value of one variable (called the independent variable) and observing the effect of that change on another variable (the dependent variable). In educational contexts, the hypothesis being tested is that a particular intervention, in the form of a new programme, will result in a particular outcome. The model involves four main steps:

1 two groups of students, one a control group and one an experimental group, are tested on a particular part of their programme;


Table 2.1 Stake's nine approaches to educational evaluation

Student gain by testing
Purpose: To measure student performance and progress
Key elements: Goal statements; test score analysis; discrepancy between goal and actuality
Some key protagonists: Ralph Tyler
Risks: Over-simplify educational aims; ignore processes
Payoffs: Emphasize, ascertain student progress

Institutional self-study by staff
Purpose: To review and increase staff effectiveness
Key elements: Committee work; standards set by staff; discussion; professionalism
Risks: Alienate some staff; ignore values of outsiders
Payoffs: Increase staff awareness, sense of responsibility

Blue-ribbon panel
Purpose: To resolve crises and preserve the institution
Key elements: Prestigious panel; the visit; review of existing data and documents
Risks: Postpone action; over-rely on intuition
Payoffs: Gather best insights, judgement

Transaction-observation
Purpose: To provide understanding of activities and values
Key elements: Educational issues; classroom observation; case studies; pluralism
Some key protagonists: Malcolm Parlett and David Hamilton; Robert Stake
Risks: Over-rely on subjective perceptions; ignore causes
Payoffs: Produce broad picture of programme; see conflict in values

Management analysis
Purpose: To increase rationality in day-to-day decisions
Key elements: Lists of options; estimates; feedback loops; costs; efficiency
Risks: Over-value efficiency; undervalue implicits
Payoffs: Feedback for decision making

Instructional research
Purpose: To generate explanations and tactics of instruction
Key elements: Controlled conditions; multivariate analysis; bases for generalization
Risks: Artificial conditions; ignore the humanistic
Payoffs: New principles of teaching and materials development

Social policy analysis
Purpose: To aid development of institutional policies
Key elements: Measures of social conditions and administrative implementation
Risks: Neglect of educational issues, details
Payoffs: Social choices, constraints clarified

Goal-free evaluation
Purpose: To assess effects of programme
Key elements: Ignore proponent claims; follow checklist
Some key protagonists: Michael Scriven
Risks: Over-value documents and record keeping
Payoffs: Data on effect with little co-option

Adversary evaluation
Purpose: To resolve a two-option choice
Key elements: Opposing advocates; cross-examination; the jury
Risks: Personalistic, superficial, time-bound
Payoffs: Information on impact is good; claims put to test

Adapted from Stake, 1986, 252-3.


    Table 2.2 Oakley's summary of the two main paradigms of educational research and evaluation

'(Logical) positivist'/'scientific'/'quantitative'/'positivism' paradigm:
Aims: Testing hypotheses/generalizing
Purpose: Verification
Approach: Top-down
Preferred technique: Quantitative
Research strategy: Structured
Stance: Reductionist/inferential/hypothetico-deductive/outcome-oriented/exclusively rational/oriented to prediction and control
Method: Counting/obtrusive and controlled measurement (surveys, experiments, case control studies, statistical records, structured observations, content analysis)
Implementation of method: Decided a priori
Values: Value-free
Instrument: Physical device/pencil and paper
Researcher's stance: Outsider
Relationship of researcher and subject: Distant/independent

'Naturalist'/'interpretivist'/'qualitative' paradigm:
Aims: Generating hypotheses/describing
Purpose: Discovery
Approach: Bottom-up
Preferred technique: Qualitative
Research strategy: Unstructured
Stance: Expansionist/exploratory/inductive/process-oriented/rational and intuitive/oriented to understanding
Method: Observing (participant observation, in-depth interviewing, action research, case studies, life history methods, focus groups)
Implementation of method: Decided in field setting
Values: Value-bound
Instrument: The researcher
Researcher's stance: Insider
Relationship of researcher and subject: Close/interactive and inseparable


2 some form of educational 'treatment', such as a new teaching technique, is applied to the experimental group;
3 both groups are retested;
4 the performance of the groups is compared to assess the effects of the treatment.

Central to the classical research model is the notion that the aims of the programme can be translated into specific objectives, or intended learning outcomes, which can be measured. The model also places a premium on the reliability and validity of data collected.
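The four steps above lend themselves to a simple quantitative illustration. The sketch below is not taken from the book: it uses invented pre-test and post-test scores for a hypothetical control and experimental group, and an assumed independent-samples t-test on the gain scores, purely to show the shape of the comparison the classical model calls for.

import numpy as np
from scipy import stats

# Hypothetical data only: an illustration of the classical model's four steps.
rng = np.random.default_rng(seed=1)

# Step 1: control and experimental groups are tested (pre-test).
control_pre = rng.normal(50, 10, size=60)
experimental_pre = rng.normal(50, 10, size=60)

# Step 2: the 'treatment' (the new programme) is applied to the experimental
# group only; here its assumed effect is simply simulated.
# Step 3: both groups are retested (post-test).
control_post = control_pre + rng.normal(2, 5, size=60)
experimental_post = experimental_pre + rng.normal(6, 5, size=60)

# Step 4: the performance (gain) of the two groups is compared.
control_gain = control_post - control_pre
experimental_gain = experimental_post - experimental_pre
t_stat, p_value = stats.ttest_ind(experimental_gain, control_gain)

print(f"Mean gain (control):      {control_gain.mean():.1f}")
print(f"Mean gain (experimental): {experimental_gain.mean():.1f}")
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

In this sketch the measured objective is a single test score; in practice the objectives, instruments and analysis would be those specified for the programme being evaluated.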

What Lawton calls the research and development (R & D) model is a variant of the classical model and parallels evaluation with the industrial process of improving a product through testing. As in the classical model, specific, measurable objectives are developed from the aims and tests devised to assess these. The tests are administered before and after the new programme is used in order to assess its effects. The R & D model does not always use control groups.

    Ralph Tyler: the objectives model of evaluation

The classical research model is closely associated with the work of Ralph Tyler in the USA, and most accounts of the development of educational evaluation begin with his highly influential work. Tyler was critical of what he saw as the very unsystematic approach adopted in curriculum development in the USA in the 1940s. In 1949, he published his book The Basic Principles of Curriculum and Instruction (Tyler, 1949), in which he proposed four basic questions which, he argued, were central to curriculum planning and evaluation.

1 What educational purposes should the school seek to attain?
2 What educational experiences can be provided that are likely to attain these purposes?
3 How can these educational experiences be effectively organized?
4 How can we determine whether these purposes are being attained?

(1949, 1)


These questions can be summarized as a four-step sequence

objectives > content > organization > evaluation

with, as the first quotation in Box 1.1 says, educational evaluation then being

. . . the process of determining to what extent the educational objectives are realized by the program of curriculum and instruction.

(1949, 105-6)

This model is often referred to as the 'objectives model' of evaluation, and it has had a significant impact on educational evaluation. Certainly, the apparent simplicity of a model which seeks to compare actual effects with declared goals has its attractions. It also seems reasonable to suggest, as Tyler does, that any new educational programme should have clearly stated objectives, and there should be general agreement over how these can be recognized and measured for the purposes of evaluation. The objectives model underpinned the evaluation of much curriculum development in the USA in the 1960s and 1970s, when its influence was also felt in the UK and elsewhere.

From the policy-makers' perspective, the classical model of evaluation appears to provide an answer to the question, 'what works?'

The classical model has been criticized for a number of reasons. First, it does not take account of the complexity of people's behaviour and the dynamics in teaching and learning situations. The critical label of 'agricultural-botany approach' was applied to the classical model by Parlett and Hamilton (1972, 1976) because of its failure to address differences between the behaviour of humans and plants. As Lawton (1980) put it: 'human beings perform differently when under observation, cabbages do not' (112). Second, educational settings involve a number of variables, not all of which can be easily controlled. Third, the model tends to focus on what can be measured and easily quantified, and an evaluation which focuses solely on these aspects runs the risk of missing other unplanned outcomes which may be of importance. A final criticism is that the objectives model focuses on inputs and outputs, and treats the classroom as a 'black box'. This means that the findings are of limited use because they only demonstrate what has happened, and do not explain why it happened.

One outcome of these criticisms has been an exploration of variations on the experimental design which may be more appropriate to educational settings (see, for example, Fitz-Gibbon and Morris, 1987). A second, more radical, outcome has been the rejection of experimental approaches and the proposal of alternative ways of undertaking evaluation.

    The illuminative evaluation model

    Malcolm Parlett and David Hamilton: illuminative evaluation

The early 1970s saw the emergence of a new style of educational evaluation, whose proponents were highly critical of the classical model of evaluation. In the UK, in their very influential paper 'Evaluation as illumination: a new approach to the study of innovative programmes', Malcolm Parlett and David Hamilton (Parlett and Hamilton, 1972, 1976) argued strongly against classical approaches to evaluation, saying that the notion of matching groups for experimental purposes is impossible in educational settings, first, because there are so many potentially important variables that would need to be controlled, and second, because it is impossible to determine in advance what all the relevant variables might be in any particular situation. Parlett and Hamilton proposed an alternative approach which they termed 'illuminative evaluation', drawing on the methods of social anthropology to study innovations in context and without the need for parallel control groups. Such an approach, they contend

. . . takes account of the wider contexts in which educational innovations function . . . its primary concern is


of educational evaluation which involves progressive focusing through observation, further inquiry, and seeking explanations. The first phase involves relatively open-ended data collection in order to identify issues, the second is a more focused phase in which these issues are explored in more detail, and the last phase involves looking for patterns and explanations in the data. They recommend drawing on four sources of evidence: observation of events to identify common incidents, recurring trends and issues; interviews with participants to probe their views; questionnaire and test data where appropriate; and documentary and other background information to set the innovation in context. Whilst they do not reject quantitative data completely, they see it as less important and informative than qualitative data. The outcome of the evaluation is a detailed case study of the programme in use.

Although illuminative evaluation has strong appeal, there are several potential drawbacks to the approach. It raises a number of methodological issues, particularly in relation to its use of case studies. A key issue concerns the influence that those conducting an evaluation have over the nature of the data collected, and questions the reliability and validity of the data and the extent to which both the data and the interpretation are 'objective' rather than reflecting the views of the evaluators. In order to minimize these concerns, case studies make use of triangulation (i.e. drawing on multiple data sources) in data collection, and the stages in analysis are made transparent through the use of data audit trails (i.e. summaries of all the steps taken in collecting and analysing data). A second issue concerns the extent to which the findings of a case study can be generalized. Those who make use of case studies also see other aspects such as 'trustworthiness' (Lincoln and Guba, 1985) and 'relatability' (Bassey, 1981) as of more central importance than reliability, validity and generalizability. In other words, a good case study will be reported in such a way that the members of a similar group will find it credible, be able to identify with the problems and issues being reported, and draw on these to see ways of solving similar problems in their own situation.


Illuminative evaluation has also been subjected to strong criticism. For example, Delamont (1978) and Atkinson and Delamont (1993) suggest that illuminative evaluation is as limited as classical evaluation in that

Without an adequately formulated body of theory or methods, the illuminators have been, and will be, unable to progress and generate a coherent, cumulative research tradition. They cannot transcend the short-term practicalities of any given programme of curriculum innovation. They merely substitute one variety of atheoretical findings - based mainly on observation and interview - for another - based mainly on test scores.

(1993, 218)

Despite the potential drawbacks and criticism, many have seen the flexibility of the illuminative evaluation approach as being particularly relevant to educational contexts and, as such, it became increasingly popular in the 1970s and 1980s.

    Other models of evaluation

The classical research model and illuminative evaluation illustrate two very contrasting approaches to evaluation. However, other approaches have also been advocated, a number of which combine aspects of both approaches.

    Robert Stake: the countenance model

Robert Stake's model of evaluation (Stake, 1967) emerged from a concern over the narrowness and limitations of the classical method, particularly as it was being used in the USA. Though not as critical of the classical approach as Parlett and Hamilton, Stake felt that the classical method moved too quickly to detailed measurements at the expense of taking in the broader context of the situation being evaluated. He therefore argued for a model of evaluation which broadened the field of data


evaluation. Rather, a checklist is used to rate aspects of the programme in terms of, for example, the need for its introduction, its potential market and its cost-effectiveness. Scriven's model appears to have been less influential than that of illuminative evaluation or Stake's portrayals. Scriven was, however, the first to use the terms summative and formative to distinguish between the evaluation carried out at the end of a programme and that carried out during the programme (Scriven, 1967).

    The teacher-as-researcher model

A different, but very important, model of educational evaluation is that of the teacher-as-researcher, which has its origins in the work of Laurence Stenhouse in the UK. As mentioned in an earlier section, Stenhouse (1975) sees evaluation as a key element of curriculum development, with the two merging as research. Stenhouse also sees teachers as having a crucial role to play in evaluation

. . . all well-founded curriculum research and development . . . is based in the study of classrooms. It thus rests on the work of teachers. It is not enough that teachers' work should be studied: they need to study it themselves. My theme . . . is the role of the teacher as a researcher . . .

(1975, 143)

In proposing his teacher-as-researcher model, Stenhouse argues for the use of social anthropological approaches to evaluative research undertaken by teachers, drawing on observation and interpretation of lessons. The approaches Stenhouse advocates in his model therefore overlap considerably with those of illuminative evaluation, though with the teacher fulfilling the dual role of teacher and researcher/evaluator. For any teacher engaging in this task, there are issues which need to be addressed concerning potential bias and subjectivity in data which is being gathered and interpreted by someone who is a participant-observer. Nonetheless, the teacher-as-researcher (or practitioner researcher) model gained considerable ground in the 1980s and 1990s, with studies often taking the form of case studies and/or action research (i.e. research aimed at improving aspects of practice).

Evaluation and the intended, implemented and achieved curriculum

Another approach to evaluation has been to see a new programme as consisting of three main aspects: the intended curriculum, the implemented curriculum and the achieved curriculum (Robitaille et al., 1993). This model has its origins in that developed for the International Studies of Educational Achievement (IEA studies), undertaken over a number of years in the 1970s and 1980s. The intended curriculum refers to the aims and objectives of the programme as specified by those developing the programme and the materials to support its introduction and use. The implemented curriculum concerns what happens in practice in the classroom, and the teaching approaches, learning activities and materials teachers draw on when using the programme. The implemented curriculum is very likely to differ both from the intended curriculum and from teacher to teacher, as it depends on how teachers respond to the new programme and how they interpret and choose to use its associated materials. One aspect of an evaluation is therefore to explore the extent to which the implemented curriculum matches the intended curriculum. The third aspect, the achieved (or attained) curriculum, relates to the outcomes of the programme: the knowledge, skills, understanding and attitudes displayed by the students who experience the programme. A second stage of evaluation is therefore to look at these aspects and relate them both to the intended and the implemented curriculum.


Ways of characterizing research and evaluation questions

As the preceding discussion has demonstrated, there are many ways in which evaluation may be characterized. One further perspective comes from work done on the sorts of questions evaluations set out to answer. Miles and Huberman (1994) provide a useful typology, categorizing evaluation questions according to whether they are causal (i.e. looking for links between cause and effect) or non-causal (i.e. seeking to gather information), and related to policy and/or management. Table 2.3 summarizes their categories of questions, with examples of each type.

The notion of causal and non-causal questions is also central to two other potentially useful classifications which have emerged in the early 2000s: that of Shavelson and Towne in the USA, and the EPPI-Centre in the UK.

Shavelson and Towne (2001) divide research studies into three broad groups. These are:

description: seeking answers to questions about what is happening;
cause: seeking answers to questions about whether effects are systematic;
process or mechanism: seeking answers to questions about why or how effects are happening.

The Evidence for Policy and Practice Information and Co-ordinating Centre (EPPI-Centre) is overseeing systematic reviews of research studies in a number of areas in education, and has proposed the following classification for studies (EPPI-Centre, 2002).

A Description
B Exploration of relationships
C Evaluation (both of naturally occurring events and those which are researcher-manipulated, i.e. where the researcher introduces and evaluates a change)


Table 2.3 Typology of evaluation questions

Causal research
General form: Does X cause Y?
Sample question: Do children read better as a result of this programme?
General form: Does X cause more of Y than Z causes of Y?
Sample question: Do children read better as a result of this programme compared with another programme?

Non-causal research
General form: What is X?
Sample question: What is the daily experience of the children participating in this programme?
General form: Is X located where Y is lowest?
Sample question: Are the remedial centres located in the areas of primary need?

Non-causal policy
General form: What does Y mean?
Sample question: What do we mean by special education children, and remediation?
General form: Why does S support X?
Sample question: Is this programme receiving support from state and local officials for political rather than educational reasons?

Non-causal evaluation
General form: What makes W good?
Sample question: What are the characteristics of the best Computer Assisted Instruction (CAI) materials being used?
General form: Does T value X?
Sample question: How do the various minority groups view this programme and judge its quality?

Non-causal management
General form: Is X more cost-effective than Z?
Sample question: What is the cost-effectiveness of the programme compared with other programmes?
General form: How are U maximized and V minimized simultaneously?
Sample question: How can we maximize the scheduling of classes at the centre with the minimum of expense?

Adapted from Miles and Huberman, 1994, 24.


D Discussion of methodology
E Reviews

Categories B and C are of particular relevance to evaluation studies.

What is of interest in the context of educational evaluation is that both these classifications have emerged from discussion and debate about the nature of educational research and the appropriateness of experimental approaches in research and evaluation. Although the classifications cover a range of different types of study, the groups who have produced them also appear to be intimating that educational research and evaluation would benefit from including more studies of an experimental nature.

    The politics of educational evaluation

Lawton's inclusion of a political model in his summary (which corresponds to Stake's management and social policy analysis approaches) points to an important aspect of educational evaluation: its political dimension. MacDonald (1976) summarizes the situation for evaluators as follows:

Evaluators rarely see themselves as political figures, yet their work can be regarded as inherently political, and its varying styles and methods expressing different attitudes to the power distribution in education.

(1976, 125)

MacDonald argues that evaluators have a responsibility which goes beyond making judgements and passing these on to decision-makers. They also need to ensure that the information they provide enables a more rational choice to be made and, in providing information, they need to be aware that decision-makers will bring their own values to bear when making choices.

Although MacDonald's paper was written a number of years ago, the issues it raises about the political dimension of educational evaluation are even more pertinent in the current climate, which is characterized by much more prescription and centralized control of the curriculum, and a drive to raise standards. The politicization of educational evaluation has also been exacerbated by the moves towards what Hopkins (1989) refers to as 'categorical funding' of educational initiatives. Here, a central policy is developed, and funds made available to attract those who have the means to develop the resources needed to implement the policy. Those accepting the funds would appear to be, to a very large extent, also accepting the policy itself. As evaluation data will always be influenced by the values of those interpreting the results, any evaluation undertaken within the context of categorically funded initiatives is bound to have a strong political dimension. Moreover, changes in the structure of school education in the last decade have served only to increase this politicization of educational evaluation. (More detailed discussions of the political dimension of educational evaluation may be found in MacDonald (1976) and Norris (1990).)

Recent trends and developments in educational evaluation

In the 1970s and 1980s, the approaches advocated in the illuminative evaluation model influenced much educational evaluation. However, by the late 1980s and early 1990s, it was becoming apparent that the tide was beginning to turn, and people were looking once again at the possibilities offered by more experimental approaches to evaluation. A number of factors contributed to the raising of the profile of experimental approaches at this time. Many countries were experiencing increasingly centralized control of education in a climate of growing accountability and a drive to raise standards. Concern was also being expressed about the lack of impact of the findings of educational research and evaluation.

In the UK, a fierce debate was launched when David Hargreaves, of the University of Cambridge, gave the annual Teacher Training Agency (TTA) lecture in 1996 (Hargreaves, 1996). He argued that schools would be more effective if teaching became a research-based profession, and blamed researchers for supposedly failing to make this happen. Hargreaves also accused researchers of producing 'inconclusive and contestable findings of little worth', and went on to say

. . . just how much research is there which (i) demonstrates conclusively that if teachers change their practice from x to y there will be a significant and enduring improvement in teaching and learning, and (ii) has developed an effective method of convincing teachers of the benefits of, and means to, changing from x to y?

(1996, 5)

Hargreaves was not alone in his criticism, and many of the points he made were echoed in other documents, such as the reports of two subsequent inquiries into educational research (Tooley and Darby, 1998; Hillage et al., 1998). Underpinning all these critiques was the notion that much educational research is 'unscientific', because it fails to draw on the experimental techniques of the natural sciences. As such, it also fails to 'deliver the goods' in terms of making recommendations for practice which can be implemented with confidence.

In his lecture, Hargreaves encouraged the educational research community to look to medical research and the approaches adopted in evidence-based medicine as a good model for research whose procedures supposedly allowed definite conclusions to be reached about what works. In evidence-based medicine, controlled trials of particular treatments are undertaken to establish if they work. Hargreaves argued that educational research should follow a similar approach - evidence-based education.


    Randomized controlled trials

A key technique in evidence-based medicine is that of the randomized controlled trial (RCT). Oakley (2000) describes RCTs as follows:

An RCT is simply an experiment ('trial') which tests alternative ways of handling a situation. Sometimes the intervention is tested against what would have happened had it not been used; sometimes different interventions are compared.

(2000, 18)

A key aspect of an RCT is that, although members are allocated randomly to groups, the groups being compared are as similar in composition as possible. Thus, in medical trials of a particular treatment, groups might contain similar distributions of people in terms of, for example, age, sex and social class. In order to achieve similarity in group composition, and to ensure findings are reliable and valid, RCTs require large sample groups.
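As a purely illustrative sketch of the allocation idea just described (none of the names or numbers come from the book), the fragment below randomly assigns a hypothetical list of participants to a control and an intervention group, shuffling within a single assumed stratifying characteristic ('sex') so that the two groups end up with similar compositions.

import random
from collections import defaultdict

# Hypothetical participant records; 'sex' is an assumed stratifying variable.
participants = [{"id": i, "sex": "F" if i % 2 == 0 else "M"} for i in range(40)]

random.seed(7)
strata = defaultdict(list)
for person in participants:
    strata[person["sex"]].append(person)

control, intervention = [], []
for group in strata.values():
    random.shuffle(group)          # random allocation within each stratum
    half = len(group) // 2
    control.extend(group[:half])
    intervention.extend(group[half:])

print(len(control), "allocated to control;", len(intervention), "to intervention")

A real trial would of course stratify on several characteristics and use far larger samples, as the paragraph above notes.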

In the late 1990s and early 2000s, a number of people have advocated the use of RCTs in educational research and evaluation (for example Boruch, 1997; Fitz-Gibbon, 2000; Torgerson and Torgerson, 2001). The technique is seen as a way of enabling claims about cause and effect to be made with more confidence than has formerly been the case in educational research and evaluation. Although RCTs purport to yield conclusive results about what does and does not work, they need to be treated with caution in educational research and evaluation. As Millar (2002) has pointed out, there are some key differences between medical 'treatments' and educational 'treatments' in terms of what they set out to do. A medical treatment is normally undertaken to restore a desired state of normality; an educational programme is usually developed in order to achieve different outcomes to current programmes. Thus it is not possible to make direct comparisons between the programmes in the way that an RCT seeks to do.


Interest in RCTs is not confined to the UK. Their potential utility has been debated extensively in the USA, where there is a stronger tradition of experimental research. One outcome of the debate was that the National Research Council in the USA set up a panel to consider and advise on approaches to educational research and evaluation. The subsequent report, Scientific Enquiry in Education (Shavelson and Towne, 2001), considers the merits of a number of approaches to research and evaluation, as summarized earlier in this section. The report concludes that RCTs may have a useful role in some situations, but that other approaches which generate more descriptive and explanatory findings also have an important part to play in educational research and evaluation.

Much of the current debate about the potential utility of RCTs is reminiscent of that which took place in the 1970s on the limitations of the classical research model in educational evaluation. A key question with RCTs concerns when such a technique is appropriate, and it will be interesting to see in the next few years the extent of the impact of RCTs on the evaluation of new educational programmes.

    Design experiments

Another comparatively recent development in educational evaluation is the design experiment. The term has its origins in the work of Ann Brown (Brown, 1992) and Allan Collins (Collins, 1993) in the USA. Design experiments draw on the evaluation approaches used in technology and engineering, which aim to explore how a product, developed to solve a particular problem, performs in selected situations. This has clear parallels in educational contexts, where the product being tested is a new programme, developed with the intention of addressing selected problems or shortcomings within the existing system.

A design experiment in educational contexts involves evaluating the effects of a new programme in a limited number of settings. For example, this might involve selecting teachers who teach roughly comparable groups, but who have different


teaching styles, and exploring the effects of the new programme on each group of students. The design experiment would then yield information on the circumstances in which the programme is likely to be most successful. Design experiments see the context in which the programme is introduced as an important factor likely to influence its success, and also acknowledge that those implementing the programme are highly likely to make modifications in order to tailor it to their own particular situations. Thus, there may be considerable variation in what happens in practice from one context to another.
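As a rough sketch of this idea - not drawn from the original text, and using invented figures - the fragment below shows the kind of comparison a design experiment invites: outcomes are examined setting by setting, rather than being pooled into a single verdict on whether the programme 'works'.

    from statistics import mean

    # Invented post-test scores, grouped by the setting in which the new
    # programme was taught (here, teachers with different teaching styles).
    results = {
        "didactic style": [55, 61, 58, 49],
        "enquiry style": [63, 70, 66, 59],
        "mixed style": [60, 57, 64, 62],
    }

    # A design experiment asks in which circumstances the programme is most
    # successful, rather than simply whether it works overall.
    for setting, scores in results.items():
        print("%-15s mean score: %.1f" % (setting, mean(scores)))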

Design experiments have features in common with both illuminative evaluation and the classical research approach to evaluation. They resemble the former in seeking to describe and explain what happens in selected settings. However, within this they also seek to test out particular hypotheses, and as such incorporate a dimension of the classical approach.

As with RCTs, the impact of design experiments in educational evaluation is yet to be seen. Where they seem to have particular advantages is in their ability to encompass the complexity of educational settings and interactions, whilst also enabling the aims of new educational programmes to be tested systematically. As such, they appear to offer a potentially fruitful approach to evaluation which is sympathetic to the nature and purpose of many new educational programmes.

    Summary

This section has provided an overview of different perspectives on the models and approaches associated with educational evaluation. In particular, it has shown that:

• several different models of evaluation have been developed, some of which bring very different perspectives to bear on the process of evaluation;
• the two most contrasting models are provided by the classical research model and illuminative evaluation;


• both the classical research model and illuminative evaluation have their adherents and critics;
• evaluation questions can be characterized in a number of ways, with some simply wanting to know 'what works?', and others wanting to know 'how is it working?' or 'why is it working in this particular way?';
• there is a political dimension to educational evaluation;
• recent moves to encourage the educational research community to look to the approaches adopted in medical research have resulted in an increased interest in the classical research model in the form of Randomized Controlled Trials (RCTs).


Curriculum Innovation and Models of Change

This section looks at models of change which have been developed from examining the effects of curriculum innovation, and the implications of such models for evaluation.

    Introduction

In the previous section, a number of models and approaches to educational evaluation were discussed, which point to issues needing to be taken into consideration when planning and designing an evaluation. These models are complemented by others which have been developed to describe the effects of programme implementation. The two models described in this section are the Concerns-Based Adoption Model (CBAM) developed in the USA, and the typology of continuing professional development (CPD) outcomes developed in the UK. These two models have been selected for a number of reasons. Both are empirically-based and have been developed from detailed studies of the effects of introducing new programmes in a range of different contexts. Both have also sought to identify the factors which seem to contribute to the success - or otherwise - of an innovation. As such, they have a direct bearing on educational evaluation, because they help provide answers to questions about how and why a new programme is - or is not - working. The models differ in that the CBAM model places its emphasis on the process of change which accompanies the introduction


    of a new programme, whereas the typology of CPD outcomes,

as its name suggests, focuses on the effects or outcomes of in-service work provided to support the implementation of a new programme.

The models draw on the work of a number of others who have undertaken detailed studies of curriculum innovation and its effects, most notably that of Michael Fullan in Canada and Bruce Joyce and Beverley Showers in the United States. Their work is summarized briefly here in order to provide context for the two models described in this section.

One of Fullan's key publications is The Meaning of Educational Change (Fullan, 1982; third edition, 2001). Fullan argues that there are three dimensions at stake in implementing a new programme:

• the possible use of new or revised materials;
• the possible use of new teaching approaches;
• the possible alteration of beliefs (2001, 39).

He suggests that change is composed of four phases: initiation, implementation, continuation and outcome. In educational contexts, initiation involves the processes leading up to the decision to adopt a new programme, implementation involves the first experiences of using the new programme, continuation refers to the time when the programme is either integrated into the system or discarded, and outcome is the degree of improvement in, for example, students' learning and attitudes, teacher satisfaction, and overall school improvement. Fullan argues that the lack of success of many innovations can be attributed to the failure of policy-makers, curriculum developers and those implementing the innovation to understand the process of change.

Bruce Joyce and Beverley Showers have focused on staff development as a key element of successful change. In their earlier work, they suggest that there are four categories of levels of impact in in-service training:

• awareness;
• the acquisition of concepts or organized knowledge;


• the learning of principles and skills;
• the ability to apply those principles and skills in the classroom (1980, 380).

Their more recent work (Joyce and Showers, 1995) has identified a number of key components which are necessary for effective in-service training. These include:

• describing new skills to teachers through, for example, talks and lectures;
• demonstrating new skills and techniques to teachers;
• providing opportunities for teachers to develop and practice these skills and techniques in simulated and real settings;
• giving teachers feedback on performance;
• coaching teachers on the job.

Joyce and Showers place particular emphasis on this last aspect - which they term peer coaching - as a central element of effective in-service training.

The Concerns-Based Adoption Model (CBAM)

The Concerns-Based Adoption Model (CBAM) was developed over a number of years by a team at the University of Texas at Austin in the USA. The team was concerned that many new programmes introduced into schools appeared to meet with little success and were often discarded, a situation which is certainly not unique to the USA. This led to a detailed study of the process of change in schools and classrooms covering the introduction of a range of different programmes. The initial work was undertaken by Hall et al. (1973), and is described in detail in Shirley Hord's book, Evaluating Educational Innovation (Hord, 1987), a book written with classroom practitioners as its principal audience. Hord summarizes the CBAM model as follows:

The Concerns-Based Adoption Model is an empirically-based conceptual framework which outlines the developmental


process that individuals experience as they implement an innovation. (1987, 93)

She goes on to suggest that there are three general questions about an innovation that the model can help to answer:

• What would I like to see happen with the innovation?
• How can I make that happen?
• How is it going?

The model is based on seven basic assumptions about change:

1 Change is a process, not an event.
2 Change is made by individuals first.
3 Change is a highly personal experience.
4 Change entails multi-level developmental growth (i.e. it will involve shifts in feelings, skills and behaviours).
5 Change is best understood in operational terms (i.e. teachers who have to implement the change need to be able to relate it readily to what it means for their classroom practice).
6 Change facilitation must suit individual needs (i.e. it must address the concerns and problems of those implementing the change).
7 Change efforts should focus on individuals, not innovations (i.e. the innovation needs to be seen as extending beyond the materials it produces to the role of the individuals who will use the materials).

The CBAM model has four components. The first relates to how teachers feel about an innovation, the second to how they use it, the third to what the innovation means in practice as a result of it being used, and the fourth to implementing strategies to aid the change process.

A cornerstone of the CBAM model is a seven-level description of Stages of Concern, which identifies teachers' feelings in relation to an innovation. This is summarized in Table 3.1. The lowest level of concern in the CBAM model is Stage 0, where an individual has little or no awareness of an innovation


Hord suggests that gathering information during an innovation is essential. This can be done by asking individuals to complete open-ended statements, or by conducting interviews. (The CBAM team has also developed a copyright-protected written questionnaire, the Stages of Concern Questionnaire (Hall et al., 1973), which enables identification of the stage a teacher has reached. The questionnaire and accompanying manual are available for purchase.)

Although teachers' feelings about an innovation are highly likely to influence its effects, what matters in practice is what teachers actually do - the behaviours and skills they demonstrate in relation to an innovation. The CBAM model therefore supplements the seven stages of concern with descriptions of eight levels of use. These are summarized in Table 3.2.

Levels 0, I and II are associated with non-users, although Levels I and II indicate some involvement, first through activities such as attending workshops and discussions with users, and then through gathering together the resources needed to implement the innovation. Levels III onward describe behaviours of users. At Level III, the individual is preoccupied with logistical concerns such as getting organized and preparing materials. Hord's work suggests that many individuals remain at this stage for a long time, and may never get beyond it without training and support. Level IVA corresponds to a 'breathing space', where the immediate stresses and strains associated with implementation have passed. At Level IVB and higher, the user moves beyond basic survival and routine use to behaviours focusing directly on improving the student experience.

Hord suggests that interviews are the most appropriate way of gathering data on levels of use. However, these would appear to run the risk of merely gathering information on reported behaviour, rather than actual behaviour. Supplementing interview data with observation would seem to be essential in gathering valid data on levels of use.

The final strand of the CBAM model focuses on the innovation itself, and what is actually happening when it is being


Table 3.2 The Levels of Use in the Concerns-Based Adoption Model (CBAM)

VI Renewal: the user is seeking more effective alternatives to the established use of the innovation
V Integration: the user is making deliberate efforts to coordinate with others in using the innovation
IVB Refinement: the user is making changes to increase outcomes
IVA Routine: the user is making few or no changes and has an established pattern of use
III Mechanical use: the user is making changes to organize better use of the innovation
II Preparation: the user is preparing for first use of the innovation
I Orientation: the individual is seeking information about the innovation
0 Non-use: no action is being taken with respect to the innovation

Taken from Hord, 1987, 111.
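If interview and observation data have already been coded against these levels, a simple tally can show where the teachers in an evaluation currently sit. The sketch below is illustrative only: the teacher names and codings are invented, and the level labels are abbreviated from Table 3.2.

    from collections import Counter

    # The eight Levels of Use summarized in Table 3.2.
    LEVELS_OF_USE = [
        "0 Non-use", "I Orientation", "II Preparation", "III Mechanical use",
        "IVA Routine", "IVB Refinement", "V Integration", "VI Renewal",
    ]

    # Invented codings: the level judged for each teacher from interview
    # and observation data gathered during the evaluation.
    codings = {
        "Teacher A": "III Mechanical use",
        "Teacher B": "IVA Routine",
        "Teacher C": "III Mechanical use",
        "Teacher D": "I Orientation",
    }

    tally = Counter(codings.values())
    for level in LEVELS_OF_USE:
        print("%-20s %d" % (level, tally.get(level, 0)))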

used. As Hord comments, no two teachers will use an innovation in exactly the same way, but will integrate it in some way with their existing practice. Therefore, judgements about the effectiveness of an innovation need to be set in the context of what the innovation means in practice. The CBAM model terms this the Innovation Configuration. In order to find out what is happening, Hord recommends interviewing key people associated with an innovation. These are the developers, and those she terms the change facilitators: the people providing the training and support for the innovation, who may or may not be the developers. She suggests three key questions should be asked:

• What would you hope to observe when the innovation is operational?
• What would teachers and others be doing?
• What are the critical components of the innovation?

This is then followed by interviews with a small number of users, and observation of their lessons to check on the user's


training sheds light on factors which may contribute to the level of the programme's subsequent success.

Harland and Kinder felt that research into the implementation of new programmes had emphasized the process at the expense of looking at the outcomes, although, as they acknowledge:

Ultimately, of course, any comprehensive theory of INSET must take account of both an empirically-validated model of outcomes and its relationship to the processes associated with the many different forms of CPD provision and activity. (1997, 72)

The development of their model resulted from work on a study of a staff development programme for the introduction of science into primary schools (Kinder and Harland, 1991) following the introduction of the National Curriculum in England and Wales (DES/WO, 1989), which made science a compulsory subject in primary schools. The data on which the model is based were gathered in five case-study schools over a period of four years, through detailed observation and

interviews with key personnel.

Harland and Kinder proposed a typology of nine INSET outcomes which, they suggest, shows how existing models of in-service training, such as those of Fullan and of Joyce and Showers, could be enhanced. Their typology is summarized in Table 3.3.

Harland and Kinder's research suggests that material and provisionary outcomes, or the physical resources made available to teachers, can have a very positive effect on practice. However, their work also indicates that these alone are unlikely to have much effect, unless accompanied by motivational outcomes and new knowledge and skills (see below). They also contrast their model with Fullan's model of change (Fullan, 1991). The latter model has initiation as the first phase, involving acquisition and use of new materials. Harland and Kinder suggest that it is useful to separate these two aspects, as it is possible for teachers to acquire new materials but not use them, and changes in


Table 3.3 A typology of in-service training (INSET) outcomes

1 Material and provisionary outcomes: the physical resources which result from participation in INSET activities
2 Informational outcomes: the state of being briefed or cognisant of background facts and news about curriculum management developments, including their implications for practice
3 New awareness: perceptual or conceptual shift from previous assumptions of what constitutes appropriate content and delivery of a particular curriculum area
4 Value congruence outcomes: the personalized versions of curriculum and classroom management which inform a practitioner's teaching, and how far these 'individual codes of practice' come to coincide with INSET messages about 'good practice'
5 Affective outcomes: the emotional experience inherent in any learning situation
6 Motivational and attitudinal outcomes: enhanced enthusiasm and motivation to implement the ideas received during INSET experiences
7 Knowledge and skills: deeper levels of understanding, critical reflexivity and theoretical outcomes, with regard to both curriculum content and the teaching/learning process
8 Institutional outcomes: collective impact on groups of teachers and their practice
9 Impact on practice: the ultimate intention to bring about changes in practice

Adapted from Kinder et al., 1991, 57-8.

practice can be severely impeded if teachers do not have the necessary resources to support the changes.

Harland and Kinder have also identified two linked sets of outcomes: informational outcomes and new knowledge and skills. The former simply refers to teachers being briefed about the background and facts relating to the innovation, including


presence of certain outcomes was more likely to achieve developments in practice than others. Thus, they proposed a hierarchy of INSET outcomes (see Table 3.4).

Harland and Kinder suggest that in-service experiences which offer - or are perceived to offer - only third order outcomes, i.e. those which raise awareness and provide materials and information, are unlikely to have an impact on practice unless some of the other outcomes are already present. The second order outcomes, including motivational and affective outcomes, were important in contributing to success, but substantial impact on practice was consistently associated with the presence of the two first order outcomes: value congruence, and new knowledge and skills. In reaching this conclusion, Harland and Kinder also note that other outcomes are likely to be present if these two first order outcomes are present, and that successful implementation requires all the outcomes, as prioritized in the hierarchy, to be either achieved through the in-service provision or present as pre-existing conditions.
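One way of reading the hierarchy is as a simple decision rule. The sketch below is a deliberately crude, illustrative rendering of Table 3.4 - it is not Harland and Kinder's own formulation - which only signals likely substantial impact when both first order outcomes are judged to be present.

    # The three levels of the hierarchy in Table 3.4 (third order listed for reference).
    FIRST_ORDER = {"value congruence", "knowledge and skills"}
    SECOND_ORDER = {"motivation", "affective", "institutional"}
    THIRD_ORDER = {"provisionary", "information", "new awareness"}

    def likely_impact(outcomes_present):
        # Crude rule of thumb: substantial impact on practice is only
        # expected when both first order outcomes are present.
        outcomes = {o.lower() for o in outcomes_present}
        if FIRST_ORDER <= outcomes:
            return "substantial impact on practice is likely"
        if outcomes & (FIRST_ORDER | SECOND_ORDER):
            return "some impact possible, but first order outcomes are incomplete"
        return "little impact likely from third order outcomes alone"

    print(likely_impact({"provisionary", "motivation", "value congruence",
                         "knowledge and skills"}))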

The Harland and Kinder typology of CPD outcomes shares the same advantages as the CBAM model in that it 'rings true' to anyone who has been involved with in-service training. As with the CBAM model, it also lends itself to adaptation for a variety of different circumstances.

Table 3.4 A hierarchy of in-service training (INSET) outcomes

INSET input
3rd order: provisionary, information, new awareness
2nd order: motivation, affective, institutional
1st order: value congruence, knowledge and skills
Impact on practice

Taken from Harland and Kinder, 1997, 77.


    Summary

This section has provided an overview of two different models of change, one focusing on the process of change and the other on the outcomes of in-service provision aimed at supporting change. These models have suggested that:

• change is a complex process which involves several stages;
• support is needed for those implementing a new programme to help them move from one stage to another;
• certain factors are crucial to the success of a new programme, such as the extent to which the aims of the programme fit with teachers' views of what constitutes good practice;
• studies of the process of change point to areas which should be explored in an evaluation in order to answer questions about how and why a new programme is working in particular ways.


Table 4.1 Characteristics of experiments and case studies

Experiment
Purpose: to test a hypothesis
Advantages: can provide strong evidence linking particular factors to particular outcomes
Disadvantages: matching of control and experimental groups can be difficult; focuses on outcomes, not process, and therefore does not yield information on how or why particular outcomes arise
Other points: normally associated with summative evaluation and quantitative data; requires sample sizes of at least 30 for random allocation into control and experimental groups

Case study
Purpose: to examine an educational practice in a specific instance
Advantages: can reveal subtleties and intricacies of situations and explanations for outcomes
Disadvantages: the extent to which results are generalizable
Other points: often associated with formative evaluation and qualitative data, though may also make use of some quantitative data; often draws on more than one research technique

A range of research techniques for data collection is available to educational evaluators. The five most commonly used are document study, focus groups, interviews, observation, and questionnaires, although a number of other techniques can also be employed. Table 4.2 summarizes the key characteristics of these techniques.

Decisions about research strategies and techniques are closely linked to the overall purpose of an evaluation and whether it will have a summative or formative emphasis. These decisions usually carry with them implications for the strategy or strategies to be followed, which, in turn, point to


Table 4.2 (extract) Key characteristics of observation and questionnaires

Observation
What it can tell you: the context of a setting; behaviours and actions of participants, including verbal and non-verbal interactions; what is actually happening when a new programme is introduced
Advantages: provides a picture of the context in which a programme is being implemented; can yield information on unexpected outcomes or aspects of which participants are unaware
Disadvantages: time requirements for gathering and analysing data; the impact on the participants of having an observer present; elements of observer bias in selecting the data to be recorded

Questionnaire
What it can tell you: teachers' views of a programme; teachers' reported behaviours in relation to a programme (which can be verified from observation data); students' views on particular aspects of their experience
Advantages: an efficient use of time for both evaluator and respondent; standardization of questions; the possibility of respondent anonymity, which may lead to more candid and honest responses; data analysis is normally straightforward and not overly time-consuming
Disadvantages: difficult to explore issues in depth; respondents can only answer the questions they are asked, therefore unanticipated issues will not emerge; 'questionnaire overload' - many people receive a lot of questionnaires and may therefore be inclined to answer them quickly and superficially, if at all


evaluation plan should unanticipated outcomes worthy of further exploration be encountered;
• they generate multiple sources of data which provide checks on the validity and trustworthiness of the findings.

    Summary

This section has outlined the main types of data collected in evaluation studies, and the principal strategies and techniques employed. In particular, it has shown that:

• evaluation studies can collect quantitative and qualitative data;
• evaluation studies in education are normally associated with experimental research strategies or case studies, each of which has its advantages and disadvantages;
• a variety of techniques is employed to gather evaluation data, including document studies, questionnaires, observation, interviews and focus groups;
• effective and informative educational evaluation is likely to involve a range of techniques and to gather a variety of data, i.e. it uses a multi-method approach.


Planning and Doing an Evaluation

This section explores ways in which the ideas covered above can be applied to the planning and undertaking of an evaluation.

Key questions when planning and undertaking an evaluation

A number of questions need to be addressed when planning and undertaking an evaluation. These are summarized in Box 5.1. A first point to make about these questions is that they point to a number of theoretical and practical issues which need to be

Box 5.1 Key questions when planning an evaluation

• What is being evaluated?
• What form will the evaluation take?
• What practical issues (time, money, staff skills, timescale) need to be taken into account?
• What questions will the evaluation address?
• What use will be made of the findings?
• What types of data will be collected to help answer the evaluation questions?
• What techniques will be used to gather the data?
• Who will gather the data?
• What ethical considerations need to be addressed?
• How will the evaluation be reported?


Box 5.2 An evaluation of a support programme for underachieving students

You are a teacher and member of the senior management team in a high school which hopes to improve its overall performance in national tests and examinations. Current assessment data has revealed that the picture within the school reflects the national picture, which is one where girls are out-performing boys in tests and examinations in most subjects and at most levels. As a result, your school is working in collaboration with three other local schools to introduce a programme aimed at providing support for students aged 11 and 12. The particular target group is that of underachieving boys. This programme takes the form of monthly reviews in which underachieving students are offered advice and support in planning their work, and targets are set and reviewed. Progress is recorded in a 'Progress diary' which students are expected to show to their parents. In addition, a database of information on all students has been set up to record test and examination marks across all subjects.

The schools involved in the programme want to gather some systematic data on its effects, and you have been asked to produce a report evaluating the programme.

Box 5.3 Template for producing a description of the programme to be evaluated

• The programme is addressing the area of . . .
• The aims of the programme are . . .
• The ideas or theories underpinning the programme are . . .
• The programme takes the form of . . .
• The target groups for the programme are . . .
• The key groups of people (stakeholders) involved in the programme are . . .
• The support being provided during the programme takes the form of . . .
• Other aspects worth noting are . . .
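For evaluators who want to keep such descriptions in a more structured form, the Box 5.3 prompts could also be recorded as fields in a small data structure. The sketch below is one possible rendering - the field names are mine rather than the book's - completed with details taken from the scenario in Box 5.2.

    from dataclasses import dataclass, field

    @dataclass
    class ProgrammeDescription:
        # A structured version of the Box 5.3 prompts.
        area_addressed: str = ""
        aims: list = field(default_factory=list)
        underpinning_ideas: str = ""
        form_of_programme: str = ""
        target_groups: list = field(default_factory=list)
        stakeholders: list = field(default_factory=list)
        support_provided: str = ""
        other_aspects: str = ""

    # One possible completion, drawing on the scenario in Box 5.2.
    description = ProgrammeDescription(
        area_addressed="Underachievement in national tests and examinations",
        aims=["Support underachieving students aged 11 and 12 in planning their work"],
        form_of_programme="Monthly reviews, target-setting and a 'Progress diary'",
        target_groups=["Underachieving boys aged 11 and 12"],
        stakeholders=["Students", "Teachers", "Parents", "The four collaborating schools"],
        support_provided="Advice and target reviews; a shared database of test and examination marks",
    )
    print(description)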


    and Box 5.4 illustrates how this might be completed for the

example evaluation.

It should be noted that this template, and the others included in Section 5, are provided as a general framework for structuring thinking about aspects of an evaluation. They are not intended as a checklist to be followed mechanically, and will need to be tailored to individual circumstances.

    What form will the evaluation take?

An evaluation is often characterized by its main purpose

