
Developing standards to evaluate vocational education and training programmes

Wolfgang Beywl; Sandra Speer

In:

Descy, P.; Tessaring, M. (eds)

The foundations of evaluation and impact research. Third report on vocational training research in Europe: background report.

Luxembourg: Office for Official Publications of the European Communities, 2004 (Cedefop Reference series, 58)

Reproduction is authorised provided the source is acknowledged

Additional information on Cedefop’s research reports can be found on: http://www.trainingvillage.gr/etv/Projects_Networks/ResearchLab/

For your information:

• The background report to the third report on vocational training research in Europe contains original contributions from researchers. They are grouped in three volumes published separately in English only. A list of contents is on the next page.

• A synthesis report based on these contributions and with additional research findings is being published in English, French and German.

Bibliographical reference of the English version: Descy, P.; Tessaring, M. Evaluation and impact of education and training: the value of learning. Third report on vocational training research in Europe: synthesis report. Luxembourg: Office for Official Publications of the European Communities (Cedefop Reference series)

• In addition, an executive summary in all EU languages will be available.

The background and synthesis reports will be available from national EU sales offices or from Cedefop.

For further information contact:

Cedefop, PO Box 22427, GR-55102 Thessaloniki
Tel.: (30) 2310 490 111
Fax: (30) 2310 490 102
E-mail: [email protected]
Homepage: www.cedefop.eu.int
Interactive website: www.trainingvillage.gr


Contributions to the background report of the third research report

Impact of education and training

Preface

The impact of human capital on economic growth: a review
Rob A. Wilson, Geoff Briscoe

Empirical analysis of human capital development and economic growth in European regions
Hiro Izushi, Robert Huggins

Non-material benefits of education, training and skills at a macro level
Andy Green, John Preston, Lars-Erik Malmberg

Macroeconometric evaluation of active labour-market policy – a case study for Germany
Reinhard Hujer, Marco Caliendo, Christopher Zeiss

Active policies and measures: impact on integration and reintegration in the labour market and social life
Kenneth Walsh and David J. Parsons

The impact of human capital and human capital investments on company performance. Evidence from literature and European survey results
Bo Hansson, Ulf Johanson, Karl-Heinz Leitner

The benefits of education, training and skills from an individual life-course perspective with a particular focus on life-course and biographical research
Maren Heise, Wolfgang Meyer

The foundations of evaluation and impact research

Preface

Philosophies and types of evaluation research
Elliot Stern

Developing standards to evaluate vocational education and training programmes
Wolfgang Beywl; Sandra Speer

Methods and limitations of evaluation and impact research
Reinhard Hujer, Marco Caliendo, Dubravko Radic

From project to policy evaluation in vocational education and training – possible concepts and tools. Evidence from countries in transition.
Evelyn Viertel, Søren P. Nielsen, David L. Parkes, Søren Poulsen

Look, listen and learn: an international evaluation of adult learning
Beatriz Pont and Patrick Werquin

Measurement and evaluation of competence
Gerald A. Straka

An overarching conceptual framework for assessing key competences. Lessons from an interdisciplinary and policy-oriented approach
Dominique Simone Rychen

Evaluation of systems and programmes

Preface

Evaluating the impact of reforms of vocational education and training: examples of practice
Mike Coles

Evaluating systems’ reform in vocational education and training. Learning from Danish and Dutch cases
Loek Nieuwenhuis, Hanne Shapiro

Evaluation of EU and international programmes and initiatives promoting mobility – selected case studies
Wolfgang Hellwig, Uwe Lauterbach, Hermann-Günter Hesse, Sabine Fabriz

Consultancy for free? Evaluation practice in the European Union and central and eastern Europe. Findings from selected EU programmes
Bernd Baumgartl, Olga Strietska-Ilina, Gerhard Schaumberger

Quasi-market reforms in employment and training services: first experiences and evaluation results
Ludo Struyven, Geert Steurs

Evaluation activities in the European Commission
Josep Molsosa


Developing standards to evaluate vocational education and training programmes

Wolfgang Beywl, Sandra Speer

Abstract

There have been numerous attempts in evaluation research to develop guidelines and standards. The best known are the US standards for program evaluation, established by the Joint Committee on Standards for Educational Evaluation (JC). These standards originated in the school and university sector. Some illustrative examples relate explicitly to the area of initial and continuing vocational training. The goal of the study was to assess the transferability of the US standards, or of the derivative standards of the German Evaluation Society (DeGEval, 2002), to vocational education and training (VET). The study considers the following initial questions:
(a) does the terminology of the standards match the concepts of European initial and continuing vocational training?
(b) are any standards not applicable to initial and continuing vocational training?
(c) do European evaluation experts understand and accept the key concepts conveyed (e.g. definition of ‘evaluation’, differentiation between ‘formative’ and ‘summative’ evaluation, purpose of evaluation, etc.)?
(d) are there specific national differences which should be considered when adapting the groups of standards?

The standards of the DeGEval (2002) were chosen as the reference point for the following analysis. Other relevant standards were presented, with reflections on their intercultural transferability and their applicability to VET. First, VET experts were consulted in discussions in Germany and Austria. Nobody expressed reservations about the transferability of the standards to VET, and no one proposed adaptations. Second, evaluation experts in widely divergent European countries were sent a questionnaire. The majority of those surveyed have a positive attitude to standards and endorse maximum standards. Pluralistic evaluation appears to be an important quality criterion. The individual DeGEval standards are also debated and commented on, drawing on criteria found in recent European literature on VET evaluations.


Table of contents

1. Introduction 55

2. General evaluation standards 58

2.1. Background and purpose of evaluation standards 58

2.2. Philosophy of evaluation standards 60

2.2.1. Excursus on the meaning of the word ‘standard’: minimum vs. maximum standards 61

2.3. Standards for evaluations and guiding principles for evaluators 63

2.4. Evaluation standards and models 64

3. Transferability of standards 67

3.1. Intercultural transferability of standards 67

3.2. Current approaches to development of evaluation standards in Europe 70

3.3. Transferability of evaluation standards to VET 73

4. Dialogue with German and Austrian VET experts on evaluation standards 76

4.1. Focused events with the German Federal Institute for Vocational Training (BIBB) 76

4.2. Focused meeting with the Austrian Federal Institute for Adult Education (BifEb) 78

4.3. Conclusions from the dialogues 79

5. E-mail survey of evaluation experts in Europe 82

5.1. Profession and nationality of respondents 82

5.2. Assessment of existing evaluation standards 84

5.3. Further development of evaluation standards 87

5.4. Summary of survey findings 89

6. Reflections on VET evaluation standards literature 91

6.1. Commentary on the utility standards 92

6.1.1. N1/U1: stakeholder identification 93

6.1.2. N2/U2: clarification of the purposes of the evaluation 94

6.1.3. N3/U3: evaluator credibility and competence 94

6.1.4. N4/U4: information scope and selection 95

6.1.5. N5/U5: transparency of values 96

6.1.6. N6/U6 – report comprehensiveness and clarity: evaluation reports should provide all relevant information and be easily comprehensible 96

6.1.7. N7/U7: evaluation timeliness 97

6.1.8. N8/U8: evaluation utilisation and use 98

6.2. Commentary on the feasibility standards 99

6.2.1. D1/F1: appropriate procedures 99

6.2.2. D2/F2: diplomatic conduct 100

6.2.3. D3/F3: evaluation efficiency 100

6.3. Commentary on the propriety standards 101

6.3.1. F1/P1: formal agreement 101

6.3.2. F2/P2: protection of individual rights 102

6.3.3. F3/P3: complete and fair investigation 103

6.3.4. F4/P4: unbiased conduct and reporting 103

6.3.5. F5/P5: disclosure of findings 103


6.4. Commentary on the accuracy standards 104

6.4.1. G1/A1: description of the evaluand 105

6.4.2. G2/A2: context analysis 105

6.4.3. G3/A3: described purposes and procedures 106

6.4.4. G4/A4: disclosure of information sources 106

6.4.5. G5/A5: valid and reliable information 107

6.4.6. G6/A6: systematic data review 108

6.4.7. G7/A7: analysis of qualitative and quantitative information 108

6.4.8. G8/A8: justified conclusions 109

6.4.9. G9/A9: meta-evaluation 109

6.5. Proposals for expanding existing standards 110

6.5.1. Selection of the evaluation model 110

6.5.2. Selection of suitable methods 110

6.5.3. Explicit reference to evaluation of training programmes 111

7. Summary and outlook 112

7.1. Objectives, questions and method of the study 112

7.2. Results and conclusions 112

7.2.1. Standards for programme evaluation 112

7.2.2. Transferability of standards 112

7.2.3. Results from group discussion on the applicability of DeGEval standards to vocational training 113

7.2.4. Survey of evaluation experts in Europe 113

7.2.5. Reflections on VET evaluation standards literature 114

7.3. Outlook 114

List of abbreviations 116

Annex 1: transformation table 117

Annex 2: questionnaire 118

Annex 3: list of experts answering the e-mail survey 123

References 124


List of tables and figures

Tables
Table 1: Main tasks in an evaluation 60

Table 2: Exemplary models of evaluation by value interpretation 66

Table 3: Particularly culturally sensitive DeGEval standards 69

Table 4: VET examples in the JC standards (1994/2000) 73

Table 5: Primary position in evaluation 83

Table 6: Respondents’ professional background 83

Table 7: Respondents’ relation to VET 83

Table 8: General assessment of evaluation standards 86

Table 9: Preferred type of standards (minimum vs. maximum) 88

Figures
Figure 1: Subject of this paper 57

Figure 2: Evolution of DeGEval standards 59

Figure 3: Respondents’ identification with national professional cultures 84

Figure 4: Respondents’ familiarity with various sets of evaluation guidelines 85

Figure 5: Seven points which make evaluations useful 92

Figure 6: Two points which make evaluations feasible 99

Figure 7: Five guidelines which keep evaluations on a straight course 102

Figure 8: Nine components which make evaluations accurate 105

1. Introduction

The market for evaluations in Europe is growing rapidly. More evaluations are being performed, and they are playing a decisive role in shaping policy, particularly government policy.

After many decades of evaluation, much of which was in the area of vocational education and training (VET), a wide spectrum of evaluation models has emerged. Evolution in VET will continue to change evaluation requirements.

‘In a time of deregulation and decentralisation, evaluation becomes increasingly important as a steering mechanism. This makes it vulnerable to misuse. Evaluations can be used as a spurious justification for practices that are deemed politically expedient rather than objectively serving their purpose. This demands a rigorous discipline as well as ethical standards […]’ (Cedefop, 2001, p. 6).

Nevertheless, no standards are yet recognised as quality requirements and guidelines for evaluations of VET in European Union Member States. After years of experience in evaluation, there continues to be reflection on, and systematisation of, requirements for good evaluation of VET. This desideratum can form a basis for expert discussion.

In this paper, evaluation is broadly defined as ‘systematic investigation of the applicability or merit of an evaluand’ (JC, 2000, p. 25). What makes evaluation distinctive is that the concepts, structures, processes and results of programmes are described and rated, on the basis of empirical scientific methods, in terms of their relationship to target groups or social systems. Evaluation also provides the foundation for impact-oriented programme control.

This paper focuses on the theory and practice of evaluations which address initial and continuing vocational training programmes (1).

The term ‘programme’ can have various meanings, depending on level of reference, field of study and policy area. A macro-programme, for instance, can encompass major bundles of VET measures as part of EU policy. By the same token, local continuing training measures and initial training initiatives in individual corporate divisions can become a programme for evaluation. For people-oriented service programmes, which is what most VET measures are, the intended impact only appears in the desired quantity and quality if target group members actively participate (coproduction, uno actu principle, Haller, 1998). Systematic ratings and descriptions of human service programmes with a claim to intersubjective reliability are highly vulnerable, given varying, even contrary, economic and social interests and values. This applies to all phases of the evaluation: selection of evaluators, definition of information scope, interpretation of data, drafting of the evaluation report and formulation of conclusions and, in some cases, recommendations.

High-quality evaluations are required to achieve acceptance and credibility of evaluations among programme participants and evaluation report addressees. To this end, and thus to increase the acceptance and utilisation of evaluations, norms, rules, guidelines and standards are devised for evaluations. Nuissl (1999, p. 283) writes: ‘A key prerequisite is that education and training evaluation research, which has so far concentrated on scholastic education, should be more involved in the construction of evaluation methods and the development of quality standards and meta-evaluation procedures.’ As a rule, evaluation standards and guiding principles for evaluators spell out organisational, legal, technical and methodological evaluation requirements as well as ethical principles and considerations.

The term ‘standard’ has attracted increased attention recently in education and training circles, not only with reference to evaluation.


(1) ‘Programme’ is a generic term from evaluation jargon. In the VET sector it includes teaching units, courses, series of courses, curricula, a training or university programme, the services of a vocational training provider, and local, regional, national or European VET programmes. Programmes are packages of measures, comprising a succession of activities based on a set of resources, aimed at specific outcomes with defined target groups. A programme comprises a fixed (written) plan or design (programme as plan) and its implementation in practice or conduct (programme as action). Data-based evaluations can describe and assess policies by carving several programmes into evaluands.


The European Commission Action Plan of November 2001, Making a European area of lifelong learning a reality, states: ‘The Commission, the Member States and the social partners will jointly examine the role and character of voluntary minimum quality standards in education and training’ (European Commission, 2002, p. 17). The European Training Foundation writes in its manual Development of standards in vocational education and training: ‘The creation of market economy structures in these countries often brings with it increased, and frequently completely different requirements in terms of the general abilities, knowledge and skills required by employers at the intermediate qualification level. These requirements are documented in vocational education and training standards’ (ETF, 1999, p. 3). In Germany the contribution of the German Institute for International Education Research, Bildungsstandards als Beitrag zur Qualitätsentwicklung des Schulsystems, is the subject of widespread debate (Klieme, 2002). The Federal Institute for Vocational Training sees training standards as a central component of a ‘new paradigm for the creation of vocational profiles’ (Sauter and Schmidt, 2002, p. 21).

Initially, we should specify that the standards presented above, drafted at European level and in Germany, refer to VET measures, programmes, and training and university courses, and portray desirable qualities of the aspects that evaluations describe and rate. From the evaluation perspective these are ‘programme standards’, i.e. quality requirements for the evaluand.

This paper, by contrast, deals with the ‘evaluation standards’ that impose requirements on evaluations themselves. These two levels are repeatedly confused, e.g. when some evaluation models erroneously label programme standards ‘evaluation criteria’, i.e. yardsticks for measuring the merit or applicability of the programme being assessed.

Since VET, with its own VET standards (Section 3.3), is a potentially important reference field for evaluation, it is particularly important to clarify what ‘standard’ means in evaluation terminology. This is vital for clear communication on VET quality between policy-makers, programme managers, social partners and other stakeholders, as well as evaluators. Focusing on German-speaking and Anglo-Saxon countries, we provide an excursus on the meaning of the word ‘standard’ (Section 2.2).

The study addresses the following central questions:
(a) does Europe need a code in the guise of evaluation standards to ensure and improve the quality of VET evaluations?
(b) do existing general evaluation standards win the approval of European experts in evaluation and VET?
(c) what opportunities and what risks are seen in propagating a single set of VET evaluation standards in Europe?
(d) what cultural and professional values and requirements should such a code address?
(e) are any standards not applicable to specific VET contexts?
(f) are there quality requirements for evaluations in VET contexts which are missing from general evaluation standards? Should there be additional standards or extensions of existing standards?
(g) are the standards equally suited to evaluations in organisations (VET institutions), at local/regional level (i.e. cooperation of several institutions, schools and enterprises), and at national and European level?
(h) what recommendations can be made for discussing and disseminating standards for evaluation among the evaluation profession, vocational educators and trainers, and government VET authorities?

The subject of this paper is the evaluation of European VET programmes and measures. It aims to determine the status of evaluation standards in this area and to present well-founded suggestions for their specification.

In the following chapters we will survey a wide range of evaluations and diverse elements of evaluations, from their inputs to the output and its utilisation. They will be situated in the VET context and subjected to a critical assessment with the help of evaluation standards, these standards being honed to the specific requirements of VET evaluations. The discussion reflects the background of the authors, and draws on experts in evaluation and VET and on related European literature.

Chapter 2 presents general – i.e. applicable in all policy fields – sets of evaluation standards as performance processes. Other relevant standards for certain policy areas and political organisations are presented in Chapter 3. Intercultural transferability and application to VET are also addressed. In the following three chapters, experts speak through three channels: dialogue events on standards (VET experts from Germany and Austria), a survey of 19 evaluation experts in widely divergent European countries, and critical analysis of recent European literature on VET evaluations. Chapter 7 summarises the findings of the analysis and recommends refinements to standards for VET evaluations.


Figure 1: Subject of this paper

[Figure: evaluation inputs, evaluation activities, evaluation outputs, utilisation of evaluation outputs and evaluation outcomes, framed by the VET context and the European context, with the evaluation standards applying across all of these elements]

Source: own depiction based on Zorzi et al. (2002)

2. General evaluation standards

This chapter presents the development and context of German, Swiss and US evaluation standards and sketches their goals and composition, using the German code as an example. Subsequently, we explain the basic philosophy of evaluation standards with reference to the content and relationship of the four groups of standards. We then discuss their character as maximum standards which should support dialogue and learning about good evaluation practice. We distinguish between evaluation standards referring to evaluation services and guiding principles that relate to evaluator competence and performance. In conclusion, we trace the connection between evaluation standards and evaluation models.

2.1. Background and purpose of evaluation standards

Professionalisation of evaluation in the US since the mid-1970s has involved the development of various sets of standards to register and control the quality of evaluations. The evaluation standards of the Joint Committee on Standards for Educational Evaluation (JC) are widely known. The JC first published the Standards for evaluation of educational programs, projects and materials in 1981 (JC, 1981). In 1994 the JC, which by then belonged to the later-founded American Evaluation Association, presented the Program evaluation standards, revised in a laborious five-year review process. These now go beyond schools and universities; a reference to education and training is consequently mentioned only in the subheading of the publication.

The JC standards were translated into German (JC, 2000) and initially adapted by the Swiss Evaluation Society (SEVAL, 2002). The DeGEval also decided to base its own standard-setting process on the work of the JC, to harness the 20 years of materials and published expertise relating to the JC standards and to facilitate international exchange. A commission, made up of representatives of various fields of application and academic disciplines, revised the JC standards to match the German and Austrian situation and had them reviewed by qualified commentators. In autumn 2001 the DeGEval (2002) approved the evaluation standards.

This paper focuses on the DeGEval standards. Their basic philosophy, their systematic organisation, their designation of most of the standards and their use of terminology closely follow the JC and SEVAL standards. Like the latter, they help to adapt the US model to European policy and research traditions. Where the DeGEval standards are taken as the starting point for discussion and analysis of applicability to VET, this is done on behalf of the whole ‘family’ of evaluation standards, to which the JC and SEVAL standards both belong. Whenever our statements essentially apply to all three sets of standards, we will call them simply ‘evaluation standards’.

The evaluation standards address evaluators, individuals and organisations who commission evaluations, and stakeholders in the programme undergoing evaluation or in other evaluands. The standards are designed primarily as tools of dialogue and well-founded reference points for evaluations. They furnish adequate, appropriate aids for all evaluation phases. The weighting of the standards depends on the main objective of an evaluation. We distinguish between phase-related tasks in the course of the evaluation cycle and cross-sectional tasks, which are performed several times or continually in the course of an evaluation (2).


(2) For an overview see DeGEval (2002), pp. 38-41.


Figure 2: Evolution of DeGEval standards

[Figure: timeline of the creation and perspectives of the DeGEval standards, from the Standards for educational and psychological tests and manuals (1974) and the founding of the Joint Committee on Standards for Educational Evaluation (1975), through the JC standards of 1981 and the Program evaluation standards of 1994 (revision from 1989), their translation by Beywl/Widmer (1999; second edition 2000) and the SEVAL standards (SEVAL, founded 1996), to the DeGEval standards (DeGEval, founded 1997; development from 1999, Standards for evaluation adopted in 2001, revision from 2002, second edition envisaged for 2004), with input from a public hearing, stimulation from other sets of standards, discussion within DeGEval and discussion with other specialist bodies]

Source: DeGEval (2002, p. 2; slightly revised by the author)


The standards are also meant to be interfaces for initial and continuing training in evaluation. They can likewise be employed in the evaluation of evaluations (meta-evaluation) and, finally, make evaluation transparent to the general public as the performance of a profession.

The DeGEval standards consist of 25 Standards für Evaluation. Like the JC and SEVAL standards, the DeGEval standards prescribe four basic qualities for evaluations: utility, feasibility, propriety and accuracy. The 25 standards are divided into these four categories (3). These standards, limited to three printed pages, are supplemented by materials, explanatory notes, aids and checklists as well as an annex (DeGEval, 2002) (4). A transformation table shows which individual standards from the three related sets correspond and enables users of the less established SEVAL and DeGEval standards to consult the copious body of JC materials (JC, 1994, 2000). To identify the standards unambiguously, we will use the abbreviations listed at the end of this paper (5).

2.2. Philosophy of evaluation standards

The four attributes – utility, feasibility, propriety, accuracy – reflect the thrust of the standards associated with each of the four groups. It is to be hoped that an evaluation observes all four criteria.

The accuracy standards in Group 4 underscore the incontestable demand that evaluation be based on scientific methods. They require that the scope of the evaluation and its findings be stated precisely (G1/A1 and G2/A2) and that the procedure and the sources of information tapped be presented in a manner conducive to comprehension and verification (G3/A3 and G4/A4). Standards G5/A5 to G7/A7 treat validity, reliability, systematic error checking and qualitative and quantitative data analysis, which are crucial requirements of empirical social science research. G8/A8 stresses that conclusions must clearly follow from the empirical data. Finally, G9/A9 demands that evaluations submit to systematic meta-evaluation.

Table 1: Main tasks in an evaluation

Phase-related tasks:
A. Decision on performing an evaluation
B. Definition of the evaluative question
C. Evaluation planning
D. Information collection
E. Information processing
F. Evaluation reporting

Cross-sectional tasks:
G. Evaluation budgeting
H. Evaluation contract
I. Evaluation management
J. Evaluation staffing

Source: author’s representation

(3) The US JC standards are composed of 30 individual standards, which were partially combined in the DeGEval standards, yielding a set of 25. See Annex 1: transformation table.
(4) The annotated DeGEval standards can be found in English translation in the annex. We therefore forego a detailed description at this point.
(5) This publication usually cites the DeGEval standards. To avoid misunderstandings, the reference numbers of both the German and the English texts will be given, e.g. N1/U1 for the Nützlichkeitsstandard No N1 or the translated Utility Standard No 1. In exceptional cases we will also cite the JC standards. They will be indicated as follows: JC-U1 for Joint Committee Utility Standard No 1.


The third group, the propriety standards, contains requirements which we know from the ethics of science (F2/P2, protection of individual rights; F5/P5, disclosure of findings) and additional demands which result from the clash between evaluation as an assessment of social practice and as a scientifically based procedure (F1/P1, formal agreement; F3/P3, complete and fair investigation; and F5/P5, disclosure of findings).

The second group, the feasibility standards, emphasises that in implementing evaluations – in contrast to basic scientific research – one must always consider economic, social, political and organisational factors which impinge on the programmes, etc., to be evaluated. Standards D1/F1 to D3/F3 state that compromises and adaptations must constantly be made. Procedures must be appropriate to the practice which is to be described and evaluated. They must be introduced and performed diplomatically. They must be efficient in terms of their cost-benefit ratio to be accepted by practitioners and to be politically viable.

The first-mentioned group of standards uses ‘utility’ to label the central goal of evaluations and suggests that the information and conclusions they provide should actually be used by the stakeholders of the evaluated programme (N8/U8). Analysis of, and research on, evaluation have derived seven further requirements on which the utilisation and worth of evaluations depend: identified and adequately involved stakeholders, clarified evaluation purposes, credible and competent evaluators, suitable selection of data, transparently presented values, complete and clear reporting, and timeliness of evaluation activities.

It may seem odd that the utility standards come first and the accuracy standards last in the set. This is no indication of their relative status. It highlights the often unresolvable conflict between scientific merits and the requirements of evaluation users which frequently arises in the course of evaluations. ‘In practice, therefore, the evaluator must struggle to find a workable balance between the emphasis to be placed on procedures that help ensure the validity of the evaluation findings and those that make the findings timely, meaningful, useful to the consumers’ (Rossi et al., 1999, p. 31) (6). The systematic listing of the four groups in the DeGEval, SEVAL and JC standards underscores the fact that struggling for an appropriate balance between differing, sometimes diametrically opposed, criteria of evaluation quality is at the heart of good evaluation in theory and practice.

The outline of the evaluation standards does not imply any weighting, either among groups of standards or between individual items. Widmer (2000) states that the fact that each group contains a different number of standards does not permit us to draw any conclusions about the relative importance of any group. Weighting of individual standards should be conducted for each separate evaluation, taking account of its determinants. This is very significant because individual standards sometimes make competing claims. It is the job of the evaluator to decide which standards to prioritise, to state this expressly and to justify the choice. Evaluators are always involved in a bitter tug of war between two or more sides.

Evaluation standards are designed to unfold and explain the broad spectrum of quality norms and bring them to the attention of those concerned. Different quality criteria should not be played off against each other. They should serve as signposts for careful planning, conducting and analysis of evaluations.

2.2.1. Excursus on the meaning of the word ‘standard’: minimum vs. maximum standards

The term standard is used in many ways. We will focus on the difference between minimum and maximum standards.

The German word Standard is derived from its English cognate: ‘yardstick, norm, rule’ (<19th century), borrowed from the modern English word ‘standard’, originally ‘flag’. The shift of meaning in English from ‘flag’ to ‘norm’ has not been reliably mapped (either via ‘guiding’, ‘gauging’ or ‘king’s standard [royal flag], or as a landmark providing orientation) (Kluge, 1999, p. 787).

(6) Whether evaluation is an academic discipline or a scientific profession is the subject of great controversy. The topic was the focus of a workshop at the conference of the European Evaluation Society (EES, 2002).

Webster (1989, p. 1385) gives a total of 28 meanings for standard(s), including the original meaning, No 12 in the list: ‘a flag indicating the presence of a sovereign or public official’ and No 13, ‘a flag, emblematic figure, or other object raised on a pole to indicate the rallying point of an army, fleet, etc.’ The following, non-academic meanings illustrate the versatility of the term:
(a) an object considered by an authority or by general consent as a basis of comparison; an approved model;
(b) anything, as a rule or principle, that is used as a basis for judgement;
(c) an average or a normal requirement, quality, quantity, level, grade, etc.;
(d) standards: those morals, ethics, habits, etc., established by authority, custom or an individual as acceptable;
(e) the authorised example of a unit of weight or measure.

Webster’s differentiation between standards and criteria provides input for discussion of the use of the two terms in the language of evaluation: ‘A “standard” is an authoritative principle or rule that usually implies a model or pattern for guidance, by comparison with which the quantity, excellence, correctness, etc., of other things may be determined. [...] A “criterion” is a rule or principle used to judge the value, suitability, probability, etc., of something, without necessarily implying any comparison.’

The last-quoted definition of standard shows that it can be used for comparison with some specified quantities as well as with less operational items such as ‘excellence’ (7). For the highest possible terminological clarity, we use the universal distinction between the two extreme varieties of standards (which should apply to both evaluations and evaluands). A ‘minimum standard’ states, usually in rather technical terms, specific (ideally quantitatively operationalised) minimum requirements, which must be strictly observed (here: by an evaluation) so that high quality can be ascribed to it. A ‘maximum standard’ states, usually in lay terms which leave scope for interpretation, the envisioned ideal (which an evaluation should fulfil to be judged to be of high quality).

In consultations with evaluation experts which complemented our study, we noted that the term ‘standard’ possesses very different connotations, depending on national origin, academic background and one’s role in the evaluation (commissioner/evaluation team):
(a) colleagues from the United Kingdom primarily associate standard with quantified, unconditionally binding minimum standards (as in a British Standards Institution definition, for example): ‘A standard is a published specification that establishes a common language, and contains a technical specification or other precise criteria and is designed to be used consistently, as a rule, a guideline, or a definition [...]’ (8);
(b) psychologists – at least those who are statistically inclined – usually think in terms of minimum standards, while sociologists tend toward maximum standards;
(c) commissioners (particularly if they have introduced quality management systems) (9) often prefer operationalised minimum standards, e.g. stipulated in requirement specifications, while evaluators favour maximum standards because they guarantee the necessary flexibility for planning and conducting evaluations.

Because of such ambiguities, the drafters of the SEVAL and DeGEval standards considered replacing the term standard with another such as ‘norm’, ‘code’ or Richtlinie. These evaluation societies decided differently, however, because such terms are also ambiguous from discipline to discipline and would not increase clarity. Our experience shows that a consensus can only be achieved through widespread intensive perusal or trial application of the evaluation standards. Faithful to the JC tradition, SEVAL and DeGEval chose to retain the term standard.


(7) See Harvey and Green (1993) for the five quality dimensions of human services and their fundamentally varying capacity for being put into operation.

(8) This layperson’s definition is given on the BSI education sites at http://www.bsi-global.com/Education/index.xalter
(9) The Standards Policy Team of the Regulatory Affairs and Standards Policy Directorate, Industry Canada stresses maximum standards in its first definition section, whereas in a second section it defines minimum standards with reference to the International Organisation for Standardisation (ISO). ‘A standard is broadly defined as a publication that establishes accepted practices, technical requirements and terminologies for diverse fields of human endeavour. The International Organisation for Standardisation (ISO) defines standards as documented agreements containing technical specifications or other precise criteria to be used consistently as rules, guidelines, or definitions of characteristics, to ensure that materials, products, processes and services are fit for their purpose.’ Available from Internet: http://strategis.ic.gc.ca/SSG/sp00447e.html#NSS [cited 13.11.2003]. The German counterpart of the BSI is the Deutsches Institut für Normung, the Austrian is the Österreichisches Normungsinstitut and the German-speaking areas of Switzerland have the Schweizerische Normenvereinigung. Norm is closer in meaning to the international and English word ‘standard’ than to the German word Standard.



In this paper, we have chosen to state expressly each time whether we mean maximum standards (as in the JC, SEVAL and DeGEval standards) or minimum standards (as in quality management).

The evaluation standards discussed in this paper are conceived as maximum standards. An ideal evaluation would adhere to each individual standard that is theoretically applicable to it. The JC standards expressly provide for the possibility of a priori non-applicability of certain standards to a concrete evaluation project (10). This ideal, already a qualified one, can rarely be achieved in practice, whether because the requirements of two or more standards prove to be contradictory or because financial resources do not suffice to meet all standards (11). Even though a specific evaluation can hardly comply with all standards equally, evaluators should strive to take account of each – where applicable – as far as possible.

European VET discussion involves various standards. In the introduction we termed them ‘programme standards’, in contrast to the ‘evaluation standards’ covered in this paper. Typically, VET standards are copious bodies of rules. For example, the German term Ausbildungsordnung (training regulation) has more recently been translated as ‘VET standard’. The government-issued training regulations in Germany dictate requirements for ‘state-recognised training occupations that require formal vocational training’ (Sauter and Schmidt, 2002, p. 7). The 1999 publications of the European Training Foundation aim to create a similarly comprehensive body of standards to support eastern European countries in developing VET standards. Another prominent example of a detailed, descriptive standard containing definitions, specifications, checklists, codes, the reasoning behind them and much more is ISO 9000:2000, comprising approximately 40 printed pages.

However, in this paper we use standard as a label for short, succinct texts, often limited to one sentence and rarely exceeding three (12). These maximum standards are statements for evaluation planning and execution. They constitute a basis for meta-evaluations.

2.3. Standards for evaluations and guiding principles for evaluators

In the US we find, apart from the JC standards, the Guiding principles for evaluators (Shadish et al., 1995) (13). The latter were developed as professional guidelines or codes of ethics by the American Evaluation Association (14).

While the JC, SEVAL and DeGEval standards refer to the quality of evaluations as a service, the American Evaluation Association guiding principles state requirements of the professionals who plan and conduct evaluations, i.e. evaluators, and occasionally also of commissioners (15). While the former focus on the quality of rendering the service, the latter concentrate on evaluators’ professional and personal skills, their adherence to general laws and codes of ethics, and their assumption of professional and personal responsibility (16).


(10) For example, in the Checklist for applying the standards (JC, 1994, p. 18 f). It is also clearly stated in the analogous checklist attached to the DeGEval standards.
(11) Examples of non-applicability and non-achievability of applied standards are given in Section 5.1 of the currently unpublished meta-evaluation by Jenewein (2001). For an example in VET, see Section 5.1.
(12) The DeGEval standards make a clear distinction between standards and explanatory notes. JC and SEVAL publications do the same thing. The standard per se is the ‘presentation of the standard in the form of a should statement’ (JC, 1994, p. 7). ‘The standards [...] comprise a term and a description in one sentence’ (SEVAL, 1999, p. 2).
(13) Their relevance for the American Evaluation Association is evidenced by the fact that these guiding principles are printed verbatim on the initial pages of each issue of the American Journal of Evaluation.
(14) The American Evaluation Association and 14 other organisations were involved in elaborating the JC standards.
(15) The terms ‘standard’ and ‘guiding principle’ are not mutually exclusive and, ultimately, they are chosen arbitrarily. We propose the convention of using ‘standards’ for evaluation services and ‘guiding principles’ for evaluators, cf. Section 3.1.
(16) Further examples of the category ‘guiding principles’ are the Guidelines for the ethical conduct of evaluations of the Australasian Evaluation Society (1998), which address commissioners, users and teachers in the field of evaluation, and the CES Guidelines for Ethical Conduct of the Canadian Evaluation Society. A more comprehensive discussion and a comparison are found in Beywl and Widmer (2000).


The guiding principles are much more general and broader than the standards. Sanders (1995, pp. 50-51) finds no contradictions or inconsistencies between the guiding principles and the JC standards. The former concentrate on evaluators’ professional values, whereas the latter focus on professional performance.

There are five guiding principles. Systematic enquiry is basically contained in the accuracy standards and in JC-U3, Information scope and selection. The guiding principle Competence matches JC-U2, Evaluator credibility, and JC-A12, Meta-evaluation. Integrity and honesty principles are found in the feasibility, propriety and, to some extent, accuracy standards.

The fifth guiding principle, Responsibilities for general and public welfare, touches on several standards from the four groups of the JC, SEVAL and DeGEval standards. JC-U1, Stakeholder identification, specifies ‘the general public’ as a potential stakeholder requiring consideration. JC-D2/F2 addresses political viability: ‘Evaluations are politically viable to the extent that their purposes can be achieved with fair and equitable acknowledgement of the pressures and actions applied by various interest groups with a stake in the evaluation’ (JC, 1994, p. 71). In the Propriety group, JC-P6 elevates disclosure of findings to a central quality criterion. Finally, Accuracy standard JC-A12 requires meta-evaluations ‘[...] (which) should enhance the credibility of particular programme evaluations and the overall evaluation profession’ (JC, 1994, p. 185).

The public welfare duty (‘evaluators have obligations that encompass the public interest and good’) establishes a peculiarity of the guiding principles which was debated most ferociously during their drafting (17). The same goes for the promulgation of the principle ‘freedom of information is essential in a democracy.’ This can be viewed as a vote for making publication of evaluation reports obligatory.

The JC standards, and even more the DeGEval standards, are more reluctant than the guiding principles to express requirements based on such codes of ethics or on the theory of democracy. One reason is that evaluation standards are typically drafted by a team including evaluators and commissioners, and the conflicts of interest between these groups, e.g. on obligatory publication, already lead to compromise solutions at the early stage of negotiations. In contrast, evaluators’ professional organisations are much freer in formulating guiding principles. They can stipulate further voluntary obligations.

Since evaluations of VET programmes and measures in the EU and its Member States are set in an intricate stakeholder mesh (18), existing evaluation standards are suitable starting points for a discussion of VET evaluation standards. This debate may stimulate evaluators active in trade and professional associations in Europe to refine guiding principles.

2.4. Evaluation standards and models

Evaluation standards are designed to be suitable for a huge variety of evaluation approaches and to be applicable to the broadest possible scope of applications. They are generally appropriate both for formative evaluations, which accompany the shaping of the evaluand and attempt to foster improvements, and for summative evaluations, which draw up a balance, usually for a single evaluand.

In past decades evaluation models of all shapes and sizes have emerged (19). They differ, in particular, in their epistemological foundations, the academic field of their authors, the incorporation of social values and interests, participation and use conceptions, evaluation purposes, advance organisers, the relationship of the evaluation to the phases of the programme concerned, the stressed dimensions of the evaluand and methodological preferences.

Sometimes we encounter developers or users of a certain model who assume that this is the best or even the only applicable evaluation model. They then equate their brainchild with evaluation. This may result from a narrow, subject-oriented perspective or from institutional embedding of evaluation tasks in a national or international organisation or agency. It may also be related to the intention of jockeying one’s own model into a more favourable bargaining position in negotiating evaluation policy (20).

(17) See various articles in the special issue of New directions in program evaluation, No 66. San Francisco: Jossey Bass, summer 1995.
(18) Cited examples are the legislative and central government executive branches, employers, unions, and professional teacher and trainer associations.
(19) A survey is found in Beywl (1988), Owen and Rogers (1999), Russon, C. and Russon, K. (2000), Stufflebeam (2001), Kellaghan and Stufflebeam (2002).



The JC, SEVAL and DeGEval standards claim to cover the entire spectrum of evaluation models and incorporate pluralistic epistemology and methodology. On the one hand, they do not favour any specific evaluation model or group of models. On the other hand, it has been shown that some models, especially if they are used ‘purely’, conflict with some standards (21). In practice a mix of evaluation models is applied when drawing on evaluation and analysis experience to design and implement a concrete evaluation. In so doing, evaluators often meet evaluation standards, even if they do not know them. This is not to say that all, or even the majority of, evaluations are of high quality in terms of evaluation standards; judging this requires systematic meta-evaluations, which have not yet been conducted (22).

In this report we cannot provide a systematic survey of evaluation models. Patton (1997, p. 192) lists 57 approaches in a table. Each year anthologies or textbooks introduce a new variant or an entirely new approach (23). Evaluation models in English dominate. Most of them are from the US. A few Continental approaches have also found a foothold or promise to add a new dimension to evaluation theory and practice (e.g. Pawson and Tilley, 1997; Kushner, 2000) (24).

The following survey outlines a few of the most prominent evaluation models employed in widely divergent fields, including VET. The depiction is organised in terms of value interpretation, which standard N5/U5 stresses. This corresponds to the notion that evaluation takes values (25) as a constituent reference point of practice.

The following outline of the four main types is succinct and is no substitute for thorough analysis (26). Categorisation is guided by the evaluation model’s consciousness of values (27). Commonly we find overlaps between categories, which result from ambiguities in model descriptions, particularly when the subject of values is only treated implicitly.

Value-distanced approaches follow the tradition of thinkers such as Max Weber or Karl Popper and eliminate value judgements from the evaluation process. Theoretical framing of an evaluation and implementation in empirical investigations operate ‘objectively’ according to strict rules; the utilisation of evaluation findings is delegated to the external public democratic process (28).

Value-positioned approaches expressly assume that societies are marked by stark power imbalances and social and economic inequality. Evaluations should counterbalance the value hegemony in the political and cultural spheres by strengthening the weak and giving them an audible voice in the political process.

(20) In the 1990s, Germany often saw a monopoly claim to total quality management for large areas of human services. The relation between quality management and evaluation, which are equally represented in VET contexts, has not yet been studied in international comparison.
(21) Stufflebeam (2001), who performs a systematic comparison of a total of 22 evaluation models across the entire board of JC standards, speaks in these cases – rather argumentatively – of pseudo-evaluations and quasi-evaluations.
(22) But this is not the case for Widmer’s (1996) meta-evaluations in a wide range of Swiss policy areas.
(23) For example, Mark et al. (2000), Kushner (2000), Hale (2002).
(24) No assessment which systematically and comparatively presents the evaluation approaches developed in Europe outside of Ireland and the UK has been published.
(25) It is beyond the scope of this paper to define the multifaceted term ‘value’. An intercultural comparison reveals that North American evaluation literature often uses ‘value’ in collocation with ‘material’, ‘social’, etc. (cf. explanation of JC Standard U4).
(26) An initial systematic portrayal is found in Beywl et al. (2003), dealing primarily with evaluations of poverty avoidance and social inclusion policies and programmes. A comparative study and survey of evaluation models focusing on VET is not yet available. A first approximation can be consulted in the annotated bibliography by Beywl and Schobert (1999).
(27) Assignments are not performed analytically, by maintaining, for instance, that cost-benefit analyses are bound ipso facto to the value judgements of shareholders (a stakeholder subgroup) or that goal-free evaluations mainly reflect values that are widespread in society (thus confirming the value hierarchy). Such mutually critical analyses form the nucleus of the ‘paradigmatic debates’ in evaluation methodology (Guba and Lincoln, 1997; Pawson and Tilley, 1997; Philosophies and types of evaluation research authored by Eliot Stern in this publication).
(28) VET evaluations tend to take place in enterprises managed as meritocracies. This would require a fundamental adaptation of evaluation models associated with the ‘open’ and ‘experimental society’. We believe this is a current research objective.



Value-prioritising models also assume strong disequilibria in society, but restrict themselves to making these transparent and accessible to the negotiation of particularly relevant/socially accepted values. For instance, they may demand involvement of all stakeholders in the determination of questions and discussion of findings, and may work toward prioritisation and a minimum consensus.

Value-relativistic models underscore the dominant significance of values in planning, executing and utilising evaluations. They detect value conflicts in all phases and maintain existing tensions without taking sides. Motivation and social energy in using evaluation findings derive from consciously and publicly stated differences in values and interests among stakeholders.

Explicit reference to evaluation models in the conception, and particularly in the written reporting, of evaluations offers an opportunity to assess the suitability of certain approaches for concrete VET evaluation tasks, to criticise them and to contribute to refining evaluation methodology. An evaluation standard could demand specification and justification of the model (or the two or more models) used to design an evaluation, and an explanation of why it or they fit the given evaluation purpose, the evaluation questions and the specific VET external variables. Such a disclosure and justification requirement would encourage the propagation of evaluation models in Europe, discussion of their weaknesses and strengths, and development of an awareness of the need for meta-evaluation (recommendation 7).


Table 2: Exemplary models of evaluation by value interpretation

Model family | Model type | Models | Author (a)
Value-distanced | Effectiveness-oriented evaluation | Goal-oriented effectiveness estimation | Madaus and Stufflebeam (1988)
Value-distanced | Effectiveness-oriented evaluation | Experimental impact model | Shadish et al. (2002)
Value-distanced | Effectiveness-oriented evaluation | Quasi-experimental impact model | Heckman and Smith (1996)
Value-distanced | Efficiency-oriented evaluation | Cost-benefit analyses | Levin and McEwan (2001)
Value-distanced | Result-oriented evaluation | Goal-free result assessment | Vedung (1999)
Value-distanced | Programme-theory-oriented evaluations | Theory-driven evaluations | Chen (1990)
Value-positioned | Participative evaluation | Empowerment evaluation | Fetterman (2000)
Value-positioned | Participative evaluation | Democratically balanced evaluation | House and Howe (2000)
Value-prioritising | Stake-oriented evaluation | Decision-oriented evaluation | Stufflebeam et al. (1971)
Value-prioritising | Stake-oriented evaluation | Utilisation-focused evaluation | Patton (1997)
Value-relativistic | Constructivistic evaluation | Responsive evaluation | Guba and Lincoln (1989)

(a) Here we cite either creators of the evaluation strategy or authors who give a well-founded overview of the given evaluation model.


3. Transferability of standards

This chapter discusses the general question of how transferable evaluation standards originating in the US are to the European social, political and cultural context. It goes on to present the measures undertaken by the EU and its Member States to develop independent evaluation norms, including some international sets of standards. The chapter then conveys some initial perceptions on the standards’ transferability to vocational training. Later chapters will expand these ideas.

3.1. Intercultural transferability of standards

As demonstrated in the previous chapter, the development of evaluation standards in Europe was stimulated by the US JC standards (29). At first glance this seems a good idea because of the high costs of devising standards, but it must also be seen in the light of a general dominance of the US evaluation approach. As already mentioned, Europe has developed hardly any independent evaluation models of its own. This suggests that evaluations on this side of the Atlantic largely follow the American lead. Vedung (1999) is an exception: an evaluation approach developed in Europe (30). Pawson and Tilley (1997) describe an evaluation model which explicitly espouses European traditional thinking and represents a deliberate departure from the US precedent (31).

Professional standards are usually shaped by values and norms, which can vary widely from culture to culture. In addition, the configuration of the parliamentary system is an important determinant of national evaluation culture (32).

In his 1986 publication, Stufflebeam, the long-serving chairman of the Joint Committee, claims that the JC standards have limited use outside the US. He writes that other countries have adopted adaptations of the standards. Few would question the transferability of standards based on procedures derived from the social sciences, the ‘accuracy’ category (33). These cross-cultural norms have been formulated almost identically in very different fields in the US and in European countries, for example in the British Psychological Society’s code of conduct and in the British Sociological Association’s Statement of ethical practice. In addition, quality standards exist for certain parts of programme evaluation. These include standards for the design of experimental research and objectivity, reliability and validity specifications as survey quality criteria. For information on the formal aspects of quality assurance, we refer to quality management concepts such as the ISO/EN/DIN 9000ff norms (34).

In North America, where there is a profound mistrust of State control in general, independent assessment seems a more logical approach. The public expects to be informed of the costs and benefits of government activities and the US has long dedicated considerable resources to


(29) They have been implemented in countries with extremely different evaluation cultures, such as Brazil and Israel. A European example is Sweden (Marklund, 1984).
(30) Vedung (1999) is available in Swedish, English, German and Spanish. See also Beywl and Taut (2000).
(31) Cf. the detailed description in this report (Eliot Stern).
(32) ‘Competitive democracies’ (dominance of the majority principle) and ‘consociational democracies’ (consideration of all relevant interests sometimes going as far as the principle of unanimity) (Jesse, 1993) tend to assign different functions to the evaluation of political programmes and measures. The rapid development of an evaluation culture in Switzerland, the home of consociational democracy, may indicate that independently obtained evaluation findings foster amicable resolution of conflict and willingness to compromise. It would be interesting to analyse national VET evaluation cultures as a function of the respective political system, incorporating structural characteristics of the VET systems.
(33) See also the discussion on accuracy standards in Section 6.4.
(34) Beywl and Schobert (2000) give an overview of the relationship between evaluation and quality management in vocational training. Speer (2001) compares and differentiates between evaluation and benchmarking in company personnel management.


academic evaluations of training and labour-market programmes. In Europe, training as part of labour-market interventions is a much more recent tool, especially in southern EU countries. Since programmes in this field are a relatively new phenomenon, there is a dearth of econometric data and programme designs. The US has a far larger reservoir. Schmidt (2000, p. 427) points out the striking absence or infancy of social science experimentation in Europe. The differing evaluation cultures in North America and Europe must be taken into account.

A survey of 1645 companies in Finland, Germany, Ireland, Northern Ireland and the UK identified differences in the evaluation of training activities (Field, 1998a). The UK conducted more training evaluations than the other countries; this was particularly evident for evaluations of pre-training activities and reflective evaluations. In Germany comparatively little evaluation takes place during training. Finland conducts a relatively large number of evaluations immediately after training courses finish. The countries in which reviews are carried out most often evaluate training as soon as courses end. Next most frequent are evaluations before training starts, followed by evaluations after participants have returned to their jobs. Evaluations are least common during training activities. The purposes of the evaluations vary in focus correspondingly. One often-mentioned aim is to test whether training has fulfilled its objective. Evaluations which concentrate on improving participants’ abilities to perform their job are identified as important. This was especially true in the UK. Thus differences of emphasis characterise the evaluation practices of various European countries. However, this does not affect evaluation standard enforcement options (35).

Those standards which demand a high level of social awareness during planning and management of evaluations in the national/regional context are likely to be sensitive in intercultural applications (Rost, 2000). This applies especially to the following standards:
(a) identification of stakeholders in the evaluated programme (N1/U1);
(b) relative significance of personal, social and evaluand-related skills for the credibility of evaluators (N3/U3);
(c) disclosure and discussion of values and interests as the basis for judgements (N5/U5);
(d) anticipation of the various positions to ensure their advocates’ cooperation and to prevent deliberate obstruction of the project (D2/F2);
(e) consideration of the culturally determined and legally protected inalienable personal interests of all those involved in the evaluation (D2/F2/P2);
(f) attempts to ensure all relevant interests are treated fairly (F4/P4);
(g) publication of findings (F5/P5).

It would be useful to test these assumptions through empirical research, but this may result in difficulties, since professionalisation of evaluation and the emergence of evaluation cultures are recent developments in Europe. Often the standard sets are not sufficiently known, particularly their details. This makes it difficult to conduct surveys to gather statements and critical comments on individual standards and their cross-cultural applicability. To offset this barrier, the empirical part of the investigation uses a mix of group discussions (combined with presentation of evaluation standards), an electronic survey of experts (which assumes a certain degree of familiarity with evaluation standards), and content analysis of current European literature on the subject (a non-interactive process). However, the components of this pilot study are no substitute for an analysis of intercultural transferability. Future studies will have to resolve this issue (36).

Since some standards refer to several aspects of desirable evaluation quality, it is also conceivable to weight the focuses of these standards differently in various European countries. Some standards may be crucial, while others may be meaningless because they are rarely fulfilled in the given cultural context or are nearly always met anyway. Standard N7/U7, ‘evaluation timeliness’, can serve as an example.


(35) The concept of controlling in corporate continuing training, which is partly related to evaluation, was also applied in almost exactly the same way in Germany, the Netherlands and Austria (BIBB et al., 2001).

(36) We highly recommend workshops and meetings which benefit from a systematic data collection procedure as an appropriate test of intercultural compatibility. They may contribute to the further development of European-level evaluation standards (Recommendation 10).


Its relevance can be judged entirely differently from culture to culture. One society may view strictly designated deadlines as evidence of the contractor’s low social status, whereas another may make the ability to fulfil deadlines an automatic prerequisite for winning an evaluation tender. Other standards may prescribe behaviour that is entirely natural in certain cultural contexts and yet completely alien to others. They would, therefore, be superfluous or incomprehensible respectively (F5/P5: disclosure of findings). This can lead to serious conflict within multinational evaluation teams or during evaluations of programmes implemented in several countries.

The European Commission observed that in the years 1997 to 2000 an evaluation culture emerged with the following characteristics (Schmitt von Sydow, 2001, p. 9) (37): the majority of the evaluations are mid-term, the rest are ex post evaluations. Ex ante evaluations are rare. Stakeholder orientation is not a priority of European Commission evaluations. They are formative rather than summative. The white paper points out that it is still too early to speak of a general European Commission evaluation culture. The white paper names several purposes of evaluation (idem, pp. 20-21): ‘[…] to enhance democratic accountability, to assist political decisions about legislation, policies and programmes, to promote closer understanding between stakeholders and to support the implementation and management of existing programmes’. General rules or standards are regarded as more effective for maintaining objectivity and neutrality than, for example, any new, formally independent evaluation functions within the European Commission. Standards for the evaluation process can increase the credibility of evaluators (idem, p. 38, Annexe IV). Rules like this would tighten methodology and data reliability (idem, p. 35, Annexe IV). It was also decided that evaluations should not be restricted to the perspective of single directorates but should pose and answer cross-sectoral evaluation questions. Evaluations should be designed as inputs for annual decisions on policy priorities.

The evaluation culture of southern EU Member States is strongly shaped by their obligation to justify structural appropriations (European Commission, 1999b, Vol. 1, p. 45); Greece, Spain and Portugal rarely conduct evaluations unrelated to structural funds. In contrast, Denmark, Germany, France, the Netherlands, Sweden and the UK carry out many evaluation activities not related to structural funds. It is not surprising that some of the latter States see evaluations as a part of their political culture and as an expression of the democratic process while the southern European countries often regard evaluations as a chore imposed on them from outside. However, the evaluation activities conducted in the context of EU programmes have accelerated the creation of additional evaluation resources in countries like Germany and France (European Commission, 1999b, Vol. 1, p. 46). A third group of countries including Belgium, Ireland, Italy (Northern), Luxembourg, Austria and Finland (i.e. mostly smaller, developed countries) predominantly regard evaluation as improved management of public intervention.

We can identify differences between these various evaluation practices which probably result from varying cultural and institutional traditions. Northern Europe is ascribed a parliamentary-democratic evaluation culture (European Commission, 1999b, Vol. 1, p. 202). Wollmann (2002, p. 5 f.) and Vedung (1999, p. 70 f.) claim that Sweden, considered the European leader in evaluation research, has a consensus-oriented political style moulded by parliamentary commissions. These commissions often award contracts


(37) This white paper involved 27 European Commission employees from different directorates (members of Working Group 2b) and 18 external evaluation experts from various European countries participating in four hearings.

Table 3: Particularly culturally sensitive DeGEval standards

No | Standard (English term)
U1 | Stakeholder identification
U3 | Evaluator credibility and competence
U5 | Transparency of values
F2 | Diplomatic conduct
P2 | Protection of individual rights
P4 | Unbiased conduct and reporting
P5 | Disclosure of findings

Source: author’s representation


for studies or evaluations with political relevance. Wollmann writes that the contract recipients are usually university social scientists. In contrast, the EU primarily commissions external bodies to conduct evaluations and carries out very few internally. Wollmann adds that private consultancy firms have the lion’s share of the market for external evaluations. He distinguishes between central-level evaluations, whose evaluands are whole programmes, and evaluations of national programmes, which are usually conducted by national (private) evaluation institutes, except in Spain where they are undertaken by universities. On the basis of a study he conducted, Leeuw (2000) concludes that the market for evaluations is a growth industry. The demand for evaluations seems to be expanding more quickly at EU level than at national or regional level.

Because of various institutional arrangements and differently developed evaluation markets, some standards may be particularly culturally sensitive. In another context, Smith et al. (1993, p. 12) identified a fundamental cultural difference in the use of evaluation standards. ‘The concept of standards as employed in the US is much less relevant within the Maltese and Indian traditional cultures. Although standards may be imposed from the outside, indigenous standards are unlikely to emerge.’

The discussion of individual standards in Chapter 6 provides details of certain areas of intercultural sensitivities.

3.2. Current approaches to development of evaluation standards in Europe

The institutionalisation of evaluation in the form of evaluation societies is a very recent development in Europe. Societies exist in Denmark, France, Germany, Italy, Spain, Sweden, Switzerland, the UK and Wallonia. Europe also has its own evaluation body, the European Evaluation Society (38). In some European States there has been a critical look at US evaluation standards. To test the transferability of US standards to Europe, we looked at the acceptance of American standards in European countries. Some national societies have designed or adopted their own standards. We contacted them as part of this study if we could not find sufficient information about their standards on their websites, and asked them about the current status of their discussion on standards.

German/Austrian and Swiss standards follow the example of the US standards. The Société Française de l’Évaluation is currently developing its own independent standards (SFE, 2002). It has not yet fixed these standards, but internal discussion has reached an advanced stage. In contrast to the JC standards, the French discussion is focusing on the social usefulness and public interest (utilité sociale et intérêt général) of the evaluations. It also values the principle of honesty (principe d’honnêteté) (39). Referring to product quality policy, the Société Française de l’Évaluation draft includes guidelines for the structure of evaluation reports and rules on their readability. The French draft is very precise on this point (40). The commissioners are responsible for external process management of evaluations and should be directly involved. For example, those responsible for the evaluation should support the development of an evaluation culture in the organisation concerned (IV-6 Culture d’évaluation). The Société Française de l’Évaluation continues to debate how much attention should be paid to French idiosyncrasies.

The Italian linea guida per un codice deontologico del valutatore focuses on the evaluators (41). It clearly stresses their overriding responsibilities. The contents of the majority of the DeGEval, JC and SEVAL propriety standards feature in the linea guida. It is little known in Italy, probably because of the relatively small evaluation market. The Italian Evaluation Society is also considering augmenting the linea guida with its own standards (Bezzi, 2002).

The Finnish Evaluation Society also recently developed its own standards (FES, 2002). They clearly focus on ‘truth’ and ‘community’. Such


(38) For an overview see Toulemonde, 2000, p. 355. The DeGEval website features a constantly updated link list. Available from Internet: http://www.degeval.de/weltweit.htm [Cited 29.10.2003].

(39) The ‘propriety’ standards in the US version do correspond to the term honnêteté, but the definition of ‘propriety’ is much more objective than the ethical appeal the French standards make.
(40) The US original is also very detailed, in contrast to the German and Swiss versions.
(41) This corresponds to the Guiding principles of the US Evaluation Society (Section 2.3).


standards resemble ethical precepts. This seems to be an important consideration for the Finnish evaluation community and its mentality and reflects the origins of the Finnish standards. State institutions played a major role in their establishment. This is not the case for the other national evaluation standards and has obviously influenced the Finns’ alternative approach.

The United Kingdom Evaluation Society’s Guidance for good practice in evaluation (UKES, 2002) focuses on the evaluation process, particularly on cooperation and consultation between the various interest groups. It contains an individual section for each main stakeholder group involved in evaluations: evaluators, commissioners, participants. It also provides guidance and information for participants in self-evaluations. This distinction is not made in any other set of standards. Furthermore, the UK standards contain phrases such as ‘it would be helpful’, less binding than the prescriptive sollen (should) of the DeGEval standards. The guidelines are still the subject of internal negotiations and have not yet been finalised.

Standards for Europe exist alongside those of various national evaluation societies. They resemble the DeGEval standards but are designed for other policy areas than VET, such as development aid. One example is the Danida standards (Danida, 2001). Codes of different national professional organisations are also available and can overlap with evaluation. They are not discussed here but Beywl and Widmer (2000) provide a comprehensive survey.

The European Commission has its own guide and the International Labour Office (ILO) has guidelines which may be relevant for VET in Europe. The following paragraphs describe these publications.

Evaluating EU expenditure programmes: a guide was financed by Directorate General XIX. It was conceived as an aid for evaluating many different kinds of evaluands, including VET programmes and projects with entirely different contexts and contents, and so can be considered pertinent. It identifies the key issues of evaluations as relevance, efficiency, effectiveness, utility and sustainability (European Commission, 1997, p. 18). One of the guide’s main focuses is evaluation management and preparation. Selection of evaluators is one part of this. The guide is very detailed and comments on many aspects of evaluation which also feature in the standards, but in the more substantial form of a handbook.

Guidelines for systems of monitoring and evaluation of ESF assistance in the period 2000-2006 (European Commission, 1999a) was published by the Directorate General for Employment, Industrial Relations and Social Affairs. European Social Fund programmes often include continuing training schemes. Some of these are the ‘training’ part of the ‘measure of assistance to persons’ programme category, and the ‘teacher training’ and ‘creation of training/education curricula’ parts of the ‘measures of assistance to structures and systems’ category. Therefore these guidelines can be classified as directly relevant to VET.

The guidelines stipulate that evaluations should follow the logical framework of intervention. That means that indicators should be used to measure the input, output, outcome and impact of a programme. The guidelines clearly state which indicators should be adopted for each stage of the logical framework. They specify which (quantitative) parameters should be selected and how much data needs to be collected (N4/U4). The guidelines also advocate including collection of qualitative data as part of the evaluation process. The analysis of the evaluation context (G2/A2) should cover the ‘operational context’ and the ‘conditions of implementation’. The guidelines explicitly define certain standards: evaluation timeliness (N7/U7); formal agreement (F1/P1); unbiased implementation (F4/P4); efficiency (D3/F3); and disclosure (D5/P5). Thus, most of the DeGEval standards are included in the guidelines, and some are treated more thoroughly. Evaluation utilisation (N8/U8) reflects the use of findings from the ex post evaluation. This particularly applies to indicator definition and evaluation scheduling. Mid-term evaluations should be formative and ex post evaluations summative.

The MEANS handbooks (European Commission, 1999b) deal with the entire range of potential evaluands from EU politics. Training and employment are most relevant for VET. So the MEANS criteria, which actually originated in regional politics, have also been implemented in other European Commission General Directorates such as DG Employment. The MEANS handbooks stipulate eight quality criteria (42) for evaluations (idem, Vol. 1,


(42) The term ‘criterion’ is somewhat misleading. The term ‘assessment dimension’ would be more accurate.


p. 169): meeting needs; relevant scope; defensible design; reliable data; sound analysis; credible results; impartial conclusions; and clear report. These are explained in detail, corresponding largely to the specifications of the DeGEval standards and their US predecessor. The MEANS handbooks focus on the ‘workmanship’ of evaluation methods; ethics play a negligible role. The MEANS collection has tremendous influence on quality discussions in the evaluation of EU-financed programmes, particularly in countries lacking their own evaluation standards. European countries are very familiar with the requirements found in the MEANS handbooks (43). Some national governments, such as the Finnish, have adopted these criteria (Uusikyla and Virtanen, 2000). The EU Commission also uses the MEANS criteria to assess evaluation reports, grading them from one to four (44). Because the MEANS criteria are also implemented for intermediary reports, they can acquire the character of minimum standards, although they also consider unforeseen circumstances. Since the MEANS criteria are similarly concretised in the DeGEval, SEVAL and JC standards, collaboration on further development could be beneficial.

The International Labour Office (ILO) has devised guidelines for the external evaluation of its own programmes, including vocational training. The ILO is active in many developing countries, as well as in new European States such as Poland, and its guidelines apply to Europe. They specify evaluation aspects which should be given the most consideration: effectiveness, relevance, efficiency, sustainability, causality, unexpected effects, alternative strategies and specific ILO concerns. They regard stakeholder involvement and the role of evaluators to be particularly important aspects of approaches to independent evaluations. From an organisational perspective, they focus on composition, schedules and information sources. The topics ‘qualification profiles and responsibilities of external evaluators’ and ‘role of the stakeholders’ are addressed in further sections. To summarise, the ILO guidelines are much more concrete and detailed than the DeGEval standards, which are, however, enhanced by the highly extensive and comprehensive material in the JC standards.

The Public Management Service’s Best Practice Guidelines for Evaluation are intended to help OECD Member States improve the utilisation of evaluations in performance management systems. They are primarily addressed to those responsible for the political control of evaluations (governmental organisations, politicians and leading public servants). There are a total of nine guidelines, each with two to five itemised paragraphs listing recommendations. They impose strong demands for involvement of stakeholders. The development of an evaluation culture is also seen as an important task at the level of supranational organisations. The PUMA guidelines share many features with the ‘standards’ and give contributing input for decision-making a far higher priority than other objectives. They create a distinct tension between the decision-maker approach and the participatory one.

No empirically supported statements can be made on the scope and depth of the application of standards in Europe. A few prominent examples are known to the authors.

Switzerland has a leading position in Europe with its far-reaching evaluation culture and the use of evaluation standards. The work of Widmer has created a relatively dense information base. In a recent publication (Widmer, 2003) he lists six meta-evaluations (five from Switzerland) which used the JC or SEVAL standards to assess several (in one case, 43) evaluations. However, none of the three comparative case analyses Widmer conducted himself, covering a total of 18 evaluations, deals directly with VET programmes.

German VET evaluations use JC standards in isolated cases (Peltzer, 2002). Further examples of the application of JC standards have been found in Europe outside the VET context (e.g. Jacob and Varone, 2002). However, they have been consulted relatively rarely in Spain, although a Spanish translation is available. In Spain they are also seldom employed for meta-evaluations (Bustelo Ruesta, 1998). In the US, where the JC standards have been established longest, hardly any publications exist on


(43) They are frequently included in meta-evaluations, although often remarkably cursorily. One exception is Polverari and Fitzgerald, 2000, p. 30. These meta-evaluations also often refer to the JC standards, although again usually without explicitly and specifically citing individual standards.

(44) Stated by European Commission employees at the European Evaluation Society Conference (EES, 2002).


systematic surveys on the adoption and application of the standards (45).

In conclusion, we can say that there are no serious discrepancies or contradictions between the European evaluation norms presented here and the DeGEval standards. The various sets of standards simply have separate focuses and are concretised differently. Some are formulated generally, others contain more precisely defined rules. Many of the standards discussed above correspond to central elements of the DeGEval standards. This makes them a suitable specialist evaluation reference, along with the JC standards.

3.3. Transferability of evaluation standards to VET

The US standards were originally developed for the educational sector. Most of the examples in the JC handbook come from elementary schools, high schools, colleges and universities, but also from vocational training and social work. Perusal of the terms and definitions of the 30 individual standards reveals that only one contains educational terminology. This is the standard JC-P1 (service orientation support) which demands that evaluations ‘help ensure that educational and socialisation objectives are appropriate’ (46). Because of its specific nature this standard is not included in the German and Swiss standards, which are designed to be general and applicable to all evaluand fields.

The handbook (JC, 1994) illustrates each JC standard with positive and negative examples and their analyses to clarify the actual text and detailed guidelines. Seven of these examples involve case studies from in-company vocational training and are therefore directly applicable to VET (Widmer and Beywl, 2000, p. 249).

(a) In the illustration contained in JC standard U5 (report clarity) a vocational training planning team commissions an evaluation of a training programme and expects a written report with suggestions for improvement. The reporting could have been better.

(b) JC standard U7 (evaluation impact) gives the example of a formative evaluation of performance-based training in the industrial sector. A checklist devised by trainers and evaluators helps record the (altered) behaviour of trainees. The overall design of this evaluation was excellent. The fact that all stakeholders remained motivated until the very end is a major success.

(c) JC standard P2 (formal agreement) is illustrated by a case where the staff training manager of an enterprise has sought the advice of an evaluation consultant. The consultant has deviated from the agreed evaluation plan. She has conducted a written survey of graduates of a management training course, something that was not originally stipulated. In this case the contract between the commissioner and the evaluator should have been updated.

(d) JC standard P4 (human interactions) contains the following illustration: an internal evaluator is to collect information on the training needs of secretaries in all units of the company, in order to test the effectiveness of the current programme and propose changes. She


(45) A panel discussion on Applying program evaluation standards took place at the American Evaluation Association annual meeting sessions in November 2001. In her paper Standards-based processes for program evaluation at SERVE, Mary Sue Hamann presented several SERVE guidelines. SERVE is an educational organisation serving six American States. The four binding guidelines on evaluation bids, evaluation contracts, evaluation designs and evaluation reports are based on the JC standards. In his paper on Making use of evaluations standards routine Ken Town from the University of Southern Maine described a long-term initiative of the Institute for Public Sector Innovation (IPSI). The intention is systematically to improve evaluation competence among IPSI employees and in the organisation as a whole. The initiative is based on the JC standards. Most of the employees are not evaluation experts so the multidisciplinary JC standards are a useful resource.

(46) We advocate restoring a standard of this nature to VET evaluations (Section 6.5).

Table 4: VET examples in the JC standards (1994/2000)

No | Standard term
JC-U5 | Report clarity
JC-U7 | Evaluation impact
JC-P2 | Formal agreement
JC-P4 | Human interactions
JC-A1 | Programme documentation
JC-A2 | Context analysis
JC-A12 | Meta-evaluation

Source: JC, 1994


conducts focus group interviews but these antagonise a leading personnel manager and have to be abandoned. Forming an advisory committee for stakeholders at the start of the project would have helped avoid this problem.

(e) JC standard A1 (program documentation) is illustrated with an evaluation of an in-company technical training programme including computer-based training. The members of the supervisory panel watch a demonstration of how computer-based training works. This was the only way to acquire the knowledge needed to tackle the core questions of the evaluation.

(f) JC standard A2 (context analysis) is explained through an evaluation of the effectiveness of a training programme for sales representatives. The interviewees were members of various company departments, and the findings of several successive focus group surveys were very different. The evaluator was initially unaware of the personnel changes in the departments which had produced the inconsistent assessments. This important contextual information was required at the planning stage.

(g) The following case exemplifies JC standard A12 (meta-evaluation). An organisation wants to initiate a series of follow-up evaluations to improve its training courses. A meta-evaluation revealed that most of these follow-up evaluations were not completed. The meta-evaluation provided fresh impetus for a future evaluation system.

This brief overview of the JC standards relevant to VET demonstrates that they all have a fundamental similarity. All the examples portray evaluations of individual education and training programmes within enterprises. However, in VET, evaluations of larger programme systems are just as relevant as, for example, evaluations of government initiatives or VET subsystems, or of EU-wide support programmes. The VET spectrum is manifestly broader than the examples from the US handbook suggest (cf. this paper’s Outlook, Section 7.3). The following chapter of this paper will examine what other VET requirements need to be addressed.

One of the standards’ fundamental tenets is that they can be applied to a broad spectrum of political fields. Stockdill (1986) interviewed experts to investigate whether the JC standards for evaluation were appropriate for the US business world. He established that the standards were also suitable for personnel development and the evaluation of other human resource development tasks in the profit-making sector. The original US standards began in the field of education and were then applied to programme evaluations in other policy areas in Europe. The diversification evidently influenced the DeGEval and SEVAL adaptations.

Since the evaluation standards originated in education and are meant to be applicable to all policy areas, we must assume that they are also valid for VET (47). We do not consider it necessary to alter the names and texts of the general evaluation standards. We feel that explanations, and particularly illustrative examples discussing good and bad applications of the standards, would be especially beneficial, making the standards much more accessible to VET specialists (recommendation 6).

An important connection exists between evaluation standards and programme standards. To meet the requirements of standards G1/A1 (description of the evaluand) and G2/A2 (context analysis), evaluators must systematically draw on scientific findings and academic theories, the fundaments of the evaluand field and its quality requirements, and harness them in the planning of evaluations. We also suggest inserting an additional evaluation standard specifically for VET to support the utilisation of scientifically founded, specialist or professional programme standards in VET evaluations.

This standard, Quality orientation support in vocational training, could read as follows: ‘Evaluations should assist VET policy-makers and programme managers to meet quality requirements within the vocational training sector (VET standards). These particularly include standards which require evaluations to consider the needs of target groups, social partners and society, have a scientifically founded theoretical and teaching concept, help shape the structure and


(47) A current textbook on continuing training evaluation (Reischmann, 2003) features the DeGEval standards, explicitly validating their use in VET.


organisation of political education and help manage educational processes and ensure the profitability of VET activities.’ The explanatory notes on this standard should mention well-known, recognised VET standards and point the way to the most relevant sources.

An additional note to JC standard P1 (service orientation) should mention that evaluations are meant to support decision-makers, sponsors and programme managers in tailoring their VET policies and programmes to the needs and situations of the target groups, and to promote gender mainstreaming and social inclusion (recommendation 8).



4. Dialogue with German and Austrian VET experts on evaluation standards

This chapter will first describe how the discussions were implemented in the two countries. The third section will summarise the results of the debates as hypotheses.

4.1. Focused events with the German Federal Institute for Vocational Training (BIBB)

The BIBB (48) coordinator for additional qualifications, learning organisations and process orientation organised a series of workshops from autumn 2000 on concomitant-research methods. The sessions focus on research support (49) for pilot projects on learning organisations, additional qualifications, process-oriented vocational training and cooperation between learning locations. The pilot projects commissioned by BIBB test the practicality of innovative developments in initial and continuing vocational training. The aim is to translate their findings into vocational training practice (50). Their dual purpose is to improve the areas of vocational training practice covered and, at the same time, gain insight into the evaluand’s field.

During the second workshop in April 2001, the topic of standards for evaluation was introduced in a lecture (51). The short discussion focused on the conflict of roles that consulting researchers face, owing to the differing expectations of stakeholders such as administrators, practitioners and academics. How can the close contact of researchers – indispensable for the application of findings in learning organisations – be guaranteed while still ensuring that researchers remain impartial and independent (52)?

Because of their significance for the further exchange of experiences, standards for evaluation, adopted in the interim by the DeGEval, were the main topic of the third meeting, held in Frankfurt am Main in October 2001. The keynote was Prof. Klaus Jenewein’s (53) detailed meta-evaluation lecture relating to a pilot project, Development of occupational skills via a contract type concept for initial vocational training. To test the standards, he applied them to his concomitant research and performed a meta-evaluation of his completed research. Jenewein concluded that most DeGEval standards can be applied to evaluations of vocational training pilot projects. He claimed that one problem was the plethora of objectives which concomitant research into pilot projects must pursue (including development, summary assessment, promotion of mainstreaming, legitimisation), particularly concerning the demands the standards make for impartiality and propriety in testing (F4/P4 and F3/P3). He maintains that, since pilot projects test innovative ideas and vocational training content, rigid demands for valid and reliable data collection and assessment often cannot be met (G5/A5 and G7/A7). He also doubts it is possible to measure


(48) BIBB was founded in 1970 pursuant to the Vocational Education and Training Act of 1981. The federal public law body is supported by the federal budget. It investigates initial and continuing vocational training practice in enterprises. This involves testing new approaches to initial and continuing vocational training and, in conjunction with the social partners, setting company regulations on vocational training and career advancements.
(49) Approximately 20 people participate in the workshops. They are usually concomitant researchers with many years’ experience in initial and continuing vocational training.
(50) The institute is legally obliged to promote pilot projects and their supporting research. This is specified as an objective in its work programme.
(51) Wolfgang Beywl, talk and transparency presentation on Evaluationen an lernende Organisationen anschlussfähig machen – Hinweise und Standards für Programmevaluationen aus der Evaluationspraxis (Making evaluations of learning organisations compatible: instructions and standards for programme evaluation resulting from practice).
(52) Dorothea Schemme, minutes of the second session of the concomitant-research-methods workshop on 26 April 2001 in Stuttgart.
(53) Prof. Klaus Jenewein works in the Vocational Education and Technical Didactics department of the University of Karlsruhe, Germany.


the cost-benefit ratio, required to establish the efficiency of the evaluation (D3/F3) and believes the problem is aggravated by, or even conflicts with, the basic values (ultimately criteria) of the various stakeholders (e.g. target groups, sponsors, companies, schools).

The proceedings of the third workshop emphasise the analytical distinction between a programme and its evaluation, stated in the introduction to the DeGEval standards. This dichotomy provides the opportunity to specify the role of concomitant researchers and can help improve transparency and awareness of the dilemmas outlined by Jenewein. We can define roles and requirements for the interaction between programme managers and evaluation managers, making the performance of both more verifiable and controllable. Given the dual purpose of pilot projects – to improve practice and gain insight – ‘pragmatic orientation, transdisciplinary procedures and a reduction in applied research usually have priority over the precise construction of perfect scientific use of tools known from basic research into single disciplines’ (54).

In May 2002 BIBB scheduled an internal colloquium, open to its entire staff, on Evaluation standards and their application in vocational training. The aim was to introduce dialogue on the adaptation of the DeGEval evaluation standards for vocational training as implemented by the BIBB itself or subcontracted. Participants were to air questions on the standards, establish further discourse requirements and discuss how the dialogue should be continued.

Some 30 BIBB employees from many different initial and continuing vocational training fields took part in the two-hour event. Most participants work primarily or partly as consultants or evaluators. After an introduction to the DeGEval standards (55), the discussion focused on the following aspects:
(a) relations between quality management and evaluation/potential for synergy;
(b) differences and overlaps of concomitant research and evaluation;
(c) validity of the standards for self-evaluations;
(d) suitability of the standards for comparative evaluations;
(e) suitability of the standards for meta-evaluations;
(f) fears that application of the standards might tie down too many resources;
(g) warnings about (potential) contradictions between individual standards;
(h) lack of guidelines, frequent errors and illustrative examples in the JC unabridged version;
(i) intercultural transferability of standards originating in the US.

Points (a), (b) and (c) concern the boundary between evaluation and other forms of academic support for, and assessment of, programmes and projects in vocational training. Such support is the concomitant research approach commonly practised in vocational training, although it is methodologically less elaborate than evaluation, since specialist textbooks are rare. However, quality management or quality assurance, or even systematic development and testing of quality along the lines of consumer reports, are certainly of interest to vocational training. After all, everyday language equates the self-evaluation approach with self-assessment, although in Germany the former has been much more sharply defined and presented in several monographs.

Points (d) and (e) concern the scope of validity of the standards for comparative evaluation and meta-evaluation of programmes or projects. The fact that this was questioned makes it clear that the DeGEval standards – particularly the terse 25 individual standards – are not self-explanatory. We recommend always consulting explanations and the additional US sources as a supplementary reference.

Points (f), (g) and (h) cover queries and critical observations on the evaluation standards. It is clear that the evaluation standards (or the codes of ethics) present fundamental dilemmas. Professional debate and a feedback process are necessary to ensure that they are reliably put into practice. On the one hand, evaluators feel that over-demanding or operationalised standards or a high density of rules might overtax practical evaluations. The standards explicitly refer to this


(54) Dorothea Schemme (BIBB), minutes of the third meeting of the concomitant research methods workshop of 12 February 2002.
(55) The introduction followed this basic structure: a brief definition of ‘evaluation’, the origins and system of the standards, an example of the implementation of a standard.


danger, particularly in the individual standard D1/F1 (appropriate procedures). On the other hand, workshop participants stressed that applying the individual standards to practical evaluation projects could lead to contradictory demands. In such cases compromises will have to be reached and priorities assigned to ‘competing’ standards. This is another problem which the DeGEval and JC standards mention distinctly. Finally, BIBB specialists would like more concrete and more palpable standards to give evaluators, in particular, as much tangible help as possible and to tailor the standards for use as a (self-)education programme.

Point (i) amounted to a brief expression of the general concern about the transferability of standards from a different culture and society.

4.2. Focused meeting with the Austrian Federal Institute for Adult Education (BifEb)

Strobl am Wolfgangsee in Austria hosted a three-day colloquium of around 14 hours from 2 to 4 April 2002. Its title was Quality development in adult education: evaluation standards and methods. The Federal Institute for Adult Education (BifEb) organised the event (56).

Approximately a quarter of the 17 participating specialists work primarily in the vocational training field. Most are trainers who devise or conduct continuing training courses themselves. A few are external evaluators of general or in-company continuing training. The participants had little or no prior knowledge of the standards. They familiarised themselves with the system and content of the standards through lectures, individual and partner activities on the text of the DeGEval standards, and application to their own, usually internal, evaluands. Their primary concern was to put the standards into practice when planning their own evaluations and commissioning them. The course focused on evaluation control, data collection and interpretation, and reporting and utilisation of findings. Two tools were employed to assess evaluation standard suitability in vocational education and continuing training:

(a) a poster survey on application of the standards; towards the end of the seminar, participants were asked to note their responses to the following questions on big posters:
    (i) what consequences do you think the standards should have for your work?
    (ii) what steps should managers of continuing training institutions take with regard to the standards?
    (iii) how should adult education and continuing training legislation and public sponsors react to the standards?
    (iv) how should DeGEval address standards in the area of adult education and continuing training?
All 17 participants contributed to the poster survey (57). Their comments were subsequently discussed in a plenary session, which made it possible to acquire a deeper understanding of some points and to ascertain the intention of each remark;

(b) short printed questionnaires on the suitability of the DeGEval standards; questionnaires were distributed to gain insight into how suitable the participants thought the DeGEval standards were for adult education and vocational training. They addressed the following topics (58):
    (i) arguments for the three evaluation purposes (preparation for decision-making, improvement and gaining insight);
    (ii) distinction between formative and summative activities;
    (iii) interpretation of the standards as maximum standards;


(56) BifEb was founded in 1956 and is the training institute for adult education, supported by the Austrian Federal Ministry of Education, Science and Culture (according to Article 11, Paragraph 1 of the Adult Education Promotion Act of 1973). It employs 30 members of staff with and without educational qualifications. It targets multipliers inside and outside traditional adult education. Its main focuses are vocational and further training of staff, training management, training consultancy, programme creation, organisation, supervision, evaluation and new approaches to teaching and learning. Available from Internet: http://www.bifeb.at [Cited 29.10.2003].
(57) The responses were incorporated into the findings of the event.
(58) The topics were chosen on the basis of the questions posed during the CFT 13 subproject and complemented by points of the discussion during the first working group meeting on the third Cedefop report on vocational training research in Europe, 28 February to 1 March 2002, Thessaloniki.


    (iv) unsuitable individual standards;
    (v) missing individual standards;
    (vi) suitability of terminology;
    (vii) limits of evaluation;
    (viii) European dimension;
    (ix) standards revision processes.

Seven participants completed and returned the forms (59). The comments made it clear that it was difficult for the participants to answer very specific questions. This was particularly the case for questions concerning unsuitable individual standards, missing individual standards and the European dimension (60).

4.3. Conclusions from the dialogues

Neither the Austrian nor the German experts saw any fundamental restrictions to the application of the DeGEval standards to the field of initial and continuing vocational training. Members of the widely varying academic cultures involved in VET evaluations and research regard certain individual standards as unfathomable, insufficiently defined, vague, contradictory and possibly irrelevant. However, they acknowledge that the standards, and the theories and experience they embody, offer tremendous learning and development potential for evaluations and impact investigation in initial and continuing vocational training.

There is accepted applicability in vocational training. No doubts were expressed on the standards’ transferability to vocational training as an evaluand with specific institutional arrangements (for example, the dual system of vocational training in Germany). The experts do not propose specific adaptation, although they would like to see certain standards illustrated through concretely demonstrated examples from initial and continuing vocational training.

There is uncertainty as to validity for different academic cultures. Evaluators (researchers) who have been working for many years within a particular discipline or theoretical tradition have initial concerns that their methodology might not be adequately covered by the evaluation standards. The researchers working for and collaborating with BIBB felt this way. We assume that representatives of other schools may also not initially consider how the standards might apply to their preferred approach. For example, some researchers consider the use of experimental and quasi-experimental design the litmus test of the quality of evaluations (61).

There is ambivalence with regard to maximum standards. The conception of the DeGEval standards as ‘maximum standards’ with a primarily orienting function, intended to inspire dialogue on the quality of evaluations, was received ambivalently. Some acknowledged, if hesitantly, that one advantage of maximum standards is that they can refer to many different approaches and types of evaluations and impact investigations. However, the representatives of certain ‘schools’ regretted the lack of prescribed, obligatory minimum standards. Concomitant researchers, for example, would want the project evaluators to be vocational training specialists and perhaps to have conducted their own independent research based on vocational training theories, or to have published articles in this field. Advocates of more experimental approaches, however, would desire minimum requirements including such features as control group designs, or specific mandatory procedures for random selections.

There are concerns that links to evaluation theory are not sufficiently explicit. The notes explaining the standards state that there are ‘numerous different approaches to professional evaluation’ and that these vary markedly depending on epistemological approach, discipline and professional ethics. The pluralistic foundation of the standards is not immediately clear to experts the first time they read them. They


(59) The results have been incorporated into the proposals.
(60) No question was posed as to whether the standards are equally applicable to evaluations in the micro-, meso- and macro-areas, since the experiences of most of the participants in Strobl have mainly been in the micro-area (organising learning processes, establishing curricula), rarely in the meso-area (evaluations of [external] company continuing training systems) and not at all in the macro-area (vocational training policies and their effects on the whole of society and the general economy). That also explains the difficulty they had expressing an opinion on the European dimension.

(61) We received one refusal for the expert surveys discussed in Section 5. The reason given was that our questionnaire did not explicitly address the relevance of (quasi-)experimental designs.


often worry that the standards will have a restric-tive effect on the approach they advocate, oreven exclude it entirely. Besides, the variousfunctions of evaluation developed by the newerevaluation theorists (e.g. proactive, clarifying,interactive, monitoring and impact evaluation[Owen and Rogers, 1999]) and Stufflebeam’stypology with around 20 evaluation models (2001)are not sufficiently recognised as linked to thepluralistic function of the evaluation standards(recommendations 3 and 7).

There is a perceived conflict of roles in terms of utility, accuracy and independence. University academics in particular, but also those at public and private research institutes, feel that they face a strong conflict of interests. The standards, and their four main tenets of utility, feasibility, propriety and accuracy, have increased this awareness. When public or private bodies commission evaluations and impact analyses, they usually expect immediately utilisable findings. Sponsors and heads of facilities where the data is collected prefer streamlined procedures and tools which do not disrupt current initial and continuing vocational training. Data protection regulations also impose some major limitations, particularly when the performance of teaching personnel is directly or indirectly described or judged by evaluation processes (62). Furthermore, the accuracy group imposes strict requirements of empirical social and economic research on, for example, the validity of tools and the reliability of data collection. Descriptions and assessments must be independent. All these demands coming from different quarters may compete with one another in evaluation practice, and situations could arise where they cannot be reconciled. The evaluation standards expose these contradictions but do not propose any general solutions.

There are limitations to the possibility of self-evaluation. In Germany, the concept of self-evaluation has been widely propagated in the social services and school system, partly by several monographs and manuals (63). The situation in Austria and Switzerland is similar. Non-school initial and continuing vocational training programmes have also started to introduce it. The participants in the Austrian seminar, who primarily work as evaluators in small and medium-sized continuing training institutions, expressed particular interest in the self-evaluation approach. Teachers can implement it at the micro-level of teaching and learning processes. It has few extra costs, e.g. for external evaluation consultancy. Well-versed professional experts are initially uncertain whether the evaluation standards also apply to self-evaluation. The fact that they do not, and that DeGEval has developed separate self-evaluation standards, becomes clear from the explanatory notes, but has often been the subject of inquiries. We should ask whether other European countries are familiar with, and use, self-evaluation approaches to VET evaluation or whether they rely solely on external or internal independent evaluation (64).

The brief summary of standards is perceived to have limited usefulness. Interested parties often read only the brief summary of standards, which is approximately three pages long. Explanations of the individual SEVAL standards cover about one page each. The DeGEval standards are accompanied by similar explanations by the Standards Commission, but these are not a formal component. The JC standards contain several more markedly operationalised guidelines in addition to the explanatory notes for each standard. The committee recommends compliance with these guidelines. They include a list of frequent errors and a few annotated examples, which help elucidate the applicability of a certain standard. During the workshops several people expressed the desire for more comprehensively annotated specifications similar to the JC publication. If possible, these should be supported by illustrative examples of evaluations in the field of initial and continuing vocational training.

The clarifying function of the standards was positively received.


(62) The DeGEval standards, like the JC and SEVAL standards, thus emphasise that they are not suitable for personnel evaluations. For that purpose the JC published the Personnel Evaluation Standards as early as 1984. It did not publish its Student Evaluation Standards until 2002.

(63) It has major similarities to the concepts of empowerment and collaborative evaluation. If, and when, evaluation specialists are involved long term in these projects, they undertake a role as teachers and facilitators (Fetterman, 2000).

(64) DeGEval’s Social Services working group has its own set of standards specifically tailored to self-evaluation. Available from Internet: http://www.degeval.de/ak_soz/index.htm [Cited 30.10.2003].


Many respondents praised the fact that the explanatory notes on the DeGEval standards defined terms. These include, for example, the difference between stakeholders, addressees and users. There is an analytical distinction between the purpose of the evaluation (and the evaluation approach) on the one hand and the aims of the programme/evaluand (and its approach) on the other. This facilitates making an analytical division between the role of evaluation and responsibility for the programme, particularly during formative evaluations or concomitant research. These valued aspects of the standards underline the importance of annotated explanations.


5. E-mail survey of evaluation experts in Europe

A further element of the study into evaluation standards was an e-mail survey of expert opinions. The poll addressed quality requirements for evaluations in VET. First of all, this chapter outlines the questions and the process; the sampling method is also described. The answers to questions 5 to 7 provide an overview of the experts’ attitudes towards evaluation standards, their preferences and their familiarity with various sets of standards. Questions 8 to 12 asked for critical comments on the advantages and disadvantages of the standards and on the establishment of basic values which the evaluation quality requirements should contain (65).

The survey yielded answers to the following two central questions:
(a) does Europe need a codified rule book in the guise of evaluation standards to ensure and increase the quality of VET evaluations?
(b) what cultural and professional values and requirements should be addressed in such a code?

Experts on evaluation and/or vocational training from various European countries were approached by e-mail. We had had no prior contact with most of these experts. Most of them were located through the support of national evaluation associations and members of the board of the European Evaluation Society. We also utilised ERO-CALL, a mailing list mainly featuring VET experts, to invite people to participate in the survey. The questionnaire was sent as a text file. We asked respondents to recommend other experts for the survey, whom we subsequently contacted.

The three-page questionnaire is written in English and consists of a total of 15 items. Seven are closed questions (one with several sub-questions). The eight open questions gave respondents the opportunity to state their opinions and provide feedback.

The e-mail questionnaires sent to the experts were accompanied by the request to fill them out electronically and return them by e-mail, or to fill them out by hand and fax them. We chose to conduct the survey by e-mail since most initial contacts had been made via this medium and because it accelerated the procedure. The questionnaires went out late in August 2002 and the deadline for their return was 4 October 2002.

We used SPSS to process the quantitative data. We analysed the content of the qualitative data on the open questions. Questions 13 to 15 were merely devised to assist organisation of the study (66), so these answers do not feature in this report. The following tables include the text of the original questionnaire for clarity’s sake.

5.1. Profession and nationality of respondents

Limited resources restricted the survey to a small sample from the outset, so it cannot claim to be representative. 19 of the 30 experts who received a questionnaire replied (67). This is a satisfactory response rate (68). The total of 19 returned questionnaires can be seen as a pool of trends and indications that can be scrutinised in conjunction with other investigations to make valid interpretations.
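The short Python sketch below simply recomputes the figures reported in this section (the response rate and the percentage column of Table 5) from the raw counts given in the text. It is purely illustrative: the counts come from this report, but the code and function names are our own and do not reproduce the authors’ actual SPSS processing.

# Illustrative recalculation of the survey figures reported in this section.
# Counts are taken from the text (30 questionnaires sent, 19 returned);
# this is not the original SPSS analysis.

def percentage(count, total):
    # Share in percent, rounded to one decimal place, as in the tables below
    return round(100.0 * count / total, 1)

sent, returned = 30, 19
print("Response rate:", percentage(returned, sent), "%")  # 63.3 %

# Frequency distribution corresponding to Table 5 (primary position in evaluation)
positions = {
    "Evaluator": 9,
    "Client/sponsor/commissioner": 3,
    "Programme director/programme staff": 1,
    "Other": 6,
}
for label, count in positions.items():
    print(label, count, percentage(count, returned))  # 47.4, 15.8, 5.3, 31.6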


(65) See the questionnaire in Annex 2.
(66) Contact addresses, other contact recommendations, hints on relevant literature.
(67) Please see the list in Annex 2.
(68) Regrettably, only one person from the UK responded by the deadline (Figure 3).


Around half the respondents were evaluators, while three commissioned or sponsored evaluations. One person was a programme manager or a member of programme staff. Six respondents had posts outside evaluation. One was an evaluation handbook author, four were researchers and one a regional administrator dealing with evaluations.


Table 5: Primary position in evaluation

Your primary position in/to Evaluation Frequency Percentage

Evaluator 9 47.4

Client/sponsor/commissioner 3 15.8

Programme director/programme staff 1 5.3

Other 6 31.6

Total 19 100.0

Source: author’s representation

Table 6: Respondents’ professional background

What is your main professional background Frequency Percentage

Economics 3 15.8

Social and political sciences 12 63.2

Liberal arts including pedagogy 4 21.1

Total 19 100.0

Source: author’s representation

The professional background of around two thirds of the respondents was social and political sciences. Four respondents represented the liberal arts (including teaching) and three were economists. Two respondents also entered natural sciences as a secondary field. No engineers participated. Seven of the nine evaluators were social and political scientists.

Three respondents each identified themselves with the German and Belgian professional cultures. France and the Netherlands were each named twice. One respondent each cited Denmark, Finland, Sweden, Norway, Northern Ireland, Luxembourg, Portugal and Spain. One respondent named the culture of the EU.

Table 7: Respondents’ relation to VET

What is your relation to VET? Frequency Percentage

VET is my main/most relevant working field 3 15.8

VET is one of my most relevant working fields 6 31.6

VET is a known field for me but I am (nearly) not active in 10 52.6

Total 19 100.0

Source: author’s representation


Over half (10 respondents) of the international experts said that they were familiar with VET but that they were rarely, if at all, involved in it. Around a third (six) identified VET as a relevant working field for them. Only three respondents named VET as their main or most relevant working field. This distribution suggests that evaluation experts in the field of VET who also have a thorough knowledge of standards or other evaluation quality norms hardly seem to exist or are difficult to identify.

Of the nine respondents who named VET as their main or at least one of their most relevant working fields, five are evaluators. Of the remaining 10, for whom VET is not a central field, four are evaluators.

5.2. Assessment of existing evaluation standards

The first block of questions covers the degree of familiarity with various guidelines for evaluation and attitudes to evaluation standards in general as well as to the two alternatives, minimum standards and maximum standards.


Figure 3: Respondents’ identification with national professional cultures (‘The national professional culture you mostly identify with’, n=19): Germany (n=3), Belgium (n=3), France (n=2), the Netherlands (n=2), Denmark (n=1), Finland (n=1), Sweden (n=1), Norway (n=1), Northern Ireland (n=1), Luxembourg (n=1), Portugal (n=1), Spain (n=1), European Union (n=1).

Source: author’s representation


The respondents reported on how familiar they are with various current standards or guidelines for planning and implementing evaluations. The examples given were the US joint committee standards for evaluation and the Guidelines for evaluators published by the American Evaluation Association, the European Commission’s MEANS Collection, the SEVAL standards, the DeGEval standards and the OECD’s Best practice guidelines for evaluation. Respondents could also name other standards or guidelines.

A total of 15 respondents identified at least one standard with which they were familiar. The best known were the Joint Committee Standards and the Guidelines for Evaluators. Twelve of the respondents were familiar with the former and 14 with the latter, to at least some degree. They were followed by the MEANS Collection, OECD’s Best practice guidelines and the DeGEval standards.

Four experts mentioned one other set of guidelines, a theory or literature that they consult, and two experts named two publications. These were quoted as:
(a) the Finnish Evaluation Society’s Ethics of evaluation;
(b) French system AFPD, IEFP rules;
(c) Investors in people standard, United Kingdom;
(d) our own framework for evaluation, including distributional effects, concepts borrowed from A. Sen, duration analysis, cost-benefit analysis;
(e) ISO quality measurement more than standards;
(f) range of textbooks on evaluation theory.

These answers provided interesting insights into what additional sources could be considered in the further development of evaluation standards in Europe.

In general we discovered that, no matter which standards or guidelines we listed, at most just under half the respondents were familiar with them to a considerable or certain degree. The majority of the respondents knew most of the standards listed only fleetingly or not at all. The JC standards are clearly the best known. Only two respondents had not heard of them.


Figure 4: Respondents’ familiarity with various sets of evaluation guidelines (‘How familiar are you with the following sets of standards/guidelines for evaluation?’; response categories: very familiar, quite familiar, know a little bit, don’t know). The sets covered were the Joint Committee Standards for Evaluation (n=18), the Guidelines for Evaluators (n=19), the MEANS Collection (n=17), the Swiss Evaluation Society 2001 (n=16), the German Evaluation Society 2001 (n=18) and the Best practice guidelines for evaluation of the OECD (n=17).

Source: author’s representation


All the experts had a positive attitude towards standards for evaluation. Over a third feel that standards are absolutely necessary. A third believe that standards are important and the remaining third think that standards could be useful but do not yet seem to be sure whether they actually will be. None of the respondents ticked the fourth or fifth option, that the standards do not matter or are unnecessary or even harmful (69).

In response to the open question 8, nearly all respondents argued for the intensive use of standards in VET evaluation (70). Many emphasised that standards lead to improvement in the quality of evaluations. Cited advantages are higher professionalism, the possible use of the standards as a study aid and means of establishing uniform terminology, improvement in the utility and significance of evaluation projects, and, especially, improved transparency and comparability of evaluation projects.

Quoted responses include:
(a) to diminish biases in evaluation. To get more justice for everybody who is evaluated;
(b) I was active in a Dutch consulting project on examination and evaluation in VET. I discovered the importance of a minimum language to be able to exchange between the various educational tracks;
(c) I think that standards utilisation is the best way to increase the quality of evaluation by the development of a common framework for all the stakeholders (commissioners, evaluators, […]);
(d) in countries where evaluation is just being introduced evaluation can have very important functions in changing organisational cultures and the functioning of organisations in many ways. There should, however, be clear standards in order to protect all participants.

The respondents also see in the standards an improved opportunity for evaluators’ work to appear more legitimate, transparent and verifiable to outsiders. This could help protect all stakeholders:
(a) raise credibility and professionalism of evaluation and evaluators, provide a valuable checklist for evaluators and those wishing to appoint evaluators, identify expectations and benefits from the evaluation process [...];
(b) in [our country; W.B./S.S.], in the case of absence of standards, the profession will never exist as a special profession of a group of professionals, whether in VET or any other domain;
(c) enhance relevance, usefulness, and utilisation of evaluation;
(d) if nothing else, it can at least enhance discussion about the relationship between evaluation and ethics.


(69) We cannot exclude the possibility that the eleven people who did not respond to the survey, or other unidentified VET evaluation experts, have this sceptical or negative attitude to evaluation standards.

(70) Only two of the 19 respondents did not advocate more intensive use of standards. One of these was not familiar with any of the standards mentioned in question 7.

Table 8: General assessment of evaluation standards

General position to standards for evaluation Frequency Percentage Cumulative percentage

Standards are absolutely necessary 7 36.8 36.8

Standards are important 6 31.6 68.4

Standards could be useful 6 31.6 100.0

Standards for evaluation do not matter 0 0.0 100.0

Standards for evaluation are not necessary or even harmful 0 0.0 100.0

Total 19 100.0

Source: author’s representation



Two respondents regretted that evaluations are often understood in a very one-sided manner, being either restricted to their summative function or concentrating purely on short-term effects. They hope that the evaluation standards will help extend the scope and time frame of evaluations.

One respondent stressed that evaluation standards should be generally applicable and not designed specifically for one field, such as VET.

Although many respondents felt that setting rigid standards could jeopardise the plurality and flexibility of evaluation (see the disadvantages mentioned in answer to question 9), several of their colleagues believe that standards can safeguard against one-sidedness and the loss of flexibility by emphasising methodological variety and plurality of perspectives. Observers hope that evaluations based on standards will thrive, since comprehensible guidelines have prepared the ground for reaping tangible benefits. Fewer respondents mentioned the opportunity for professional exchange on the subject of evaluations which the discussion of standards provides, but those who did mention it value it.

Of the 19 experts who responded to the open question 9, 6 did not identify any disadvantages in a more intensive use of standards (71). The misgiving most often expressed was that standards could lead to a loss of plurality and flexibility, and thus to rigidity, in the theory and practice of evaluation, creating barriers to innovation. Some respondents believe that the multitude of cultural and historical approaches to evaluation cannot be reflected in standards:
(a) ‘standard’ may not always do justice to national/historical idiosyncrasies; an explorative attitude is necessary also in evaluation;
(b) […] different evaluation cultures in different countries; lacking flexibility if standards are not further developed/updated; sponsors/donors could feel to be hampered in their programmes;
(c) there is the problem and fear of harmonisation;
(d) when something becomes institutionalised and written, many negative, unexpected and unintended side effects may occur, e.g. lip-service kind of talk;
(e) it is too restraining, the evaluator might loose interesting development features in the field.

An idea expressed almost as frequently was that prescribing standards could lead to mechanical application which would not suit the evaluation focus or the evaluand. Respondents suggested that the alleged objectivity of rigid standards could eclipse the individual (ethical) decisions of the evaluators, and ultimately undermine the real quality of evaluations, if unquestioning obedience to standards were to become the overriding principle:
(a) [...] they might be applied in a mechanic way if they are too technical. Standards always transport values and methodological as well as theoretical applications that would narrow the scope of approaches and might hinder innovation [...];
(b) risk to focus the evaluation on the respect of procedure rather than of its purpose. Risk of rough benchmarking and comparison.

The standards could also hinder evaluation. For example, commissioners might employ them as an instrument of control or pressure, or smaller organisations might refrain from evaluating if they are obliged to follow standards slavishly. One person rejected the idea of a possible seal of approval for institutions and/or evaluators. Some warned against competition for distinctions of this nature or recognition for ‘compliance to standards’:
(a) if a standard ‘kite mark’ became attainable it should not prohibit small companies from applying to attain the standard; standards should be reviewed; community development evaluation (which often includes areas of VET);
(b) it takes a lot of effort by the evaluators.
At this stage respondents also pointed out that standards must be worded very carefully to eliminate the risk of ‘poor standards’.

5.3. Further development of evaluation standards

Respondents were asked to decide which of the two following standard types they prefer. Minimum standards are precise, operationally indispensable minimum conditions that the evaluation must fulfil.


(71) Three explicitly answered ‘none’, three provided no answer.


If one minimum standard is not observed, the evaluation is not acceptable. Maximum standards describe desiderata which evaluators should keep in sight. If one or more maximum standards are not applicable to an evaluation, or could not be met, this should be disclosed and justified (72).


(72) For details see the excursus on the meaning of the word ‘standards’ in Section 2.2.
(73) We could not detect a clear pattern for this answer. It did not correlate with the primary position in evaluations, proximity to the VET field or professional background.

Table 9: Preferred type of standards (minimum vs. maximum)

Preferred type of standards Frequency Percentage

Strongly prefer maximum standards 5 29.4

Prefer maximum standards 5 29.4

Cannot decide 3 17.6

Prefer minimum standards 2 11.8

Strongly prefer minimum standards 2 11.8

Total 17 100.0

The majority of respondents preferred maximum standards. Only four endorsed minimum standards, two of them strongly, two less so. Three were undecided (73).

The open question 12 asked what fundamental values evaluation standards should embody.

The most frequently mentioned values were participation with, cooperation between, and inclusion of, all stakeholders.

Transparency, integrity and frankness of the implementers were also often mentioned. Respondents also said that standards should provide leeway for adopting many different methods.

Some respondents regard reproducibility and transferability of findings as the central determinants of evaluation quality. The evaluation’s findings should be useful and its impact beneficial.

The following criteria were mentioned: propriety; validity of findings; adoption of a long-term perspective with follow-up studies; stakeholder acknowledgement; implementation of a formative evaluation or a process evaluation; and competence of implementers and their responsibility for promoting the public good. Slightly over a third of the respondents did not answer this question. One participant feels that most basic values are already incorporated in the JC standards.

Question 10 asked if current standards had significant gaps or omissions. Of the 19 respondents, 8 did not name any or did not answer the question.

Those who did answer usually felt the absence of a stipulation that the focus of studies should be generally extended, e.g. that indicators of success other than ‘rate of employment’ should be included in evaluations, or that studies should also feature variables like the macroeconomic and societal effects of evaluands or their social and educational environment.

Mirroring fears expressed frequently in the answers to question 9, some respondents desired additional emphasis on flexible and pluralistic evaluation approaches and consideration of cultural and historical idiosyncrasies. They also reiterated the principle that in each individual case evaluators should be free to make decisions according to their own ethical convictions.

Some respondents who play a major role in VET evaluations want them to be clearly directed towards supporting objectives and processes of vocational training, for example by more fully involving active participants:


(a) ‘[...] content items, pedagogical items, items linked to management, teaching and support staff, items linked to participants (pupils, students, apprentices), physical resources, organisation [...]’;
(b) ‘involvement of trainers, trainees, employers and other stakeholders in the evaluation process [...]’;
(c) ‘recognition of the need to develop effective processes to evaluate intangible outcomes – e.g. the impact on individuals and communities in terms of quality of life, personal development, etc.; which make a real difference [...]’.

This is an allusion to JC standard P1 on service orientation, which does not exist in the more general SEVAL and DeGEval standards, since it refers explicitly to the evaluation of education and training programmes: ‘Evaluations should be designed to assist organisations to address and effectively serve the needs of the full range of targeted participants’. The explanatory notes on the standards contain the additional comment that the evaluations should help ensure that education and training objectives are appropriate, that learners’ development is sufficiently heeded and that programmes which are useless or even harmful are abandoned. In this way evaluations can contribute towards making projects accountable to society and the community. Planners, implementers, users and participants must look beyond the interests of educators and organisations and aim to further learners’ development and improve society as a whole. Evaluations should serve the interests of community programme participants and society. The JC guidelines on the standards explicitly state:
(a) ‘Evaluations should be planned which foster the quality of programmes for education, initial and continuing training.’;
(b) ‘Evaluations should be used to identify intended and unintended effects of the programme on the learners.’;
(c) ‘Teaching and learning processes should be interrupted as little as possible, but at the same time effort should be made to realise the evaluation project’.

We feel it would be useful to draft a correspondingly formulated VET standard (74).

Other wishes were expressed by a few individuals: paying more attention to external consistency than internal; expounding the qualifications and experience of evaluators; incorporating long-term perspectives; establishing uniform basic terms and definitions, for the sake of international comparisons; and defining various sets of standards for the different capabilities of the implementers.

Over half the participants did not respond to question 11, which solicited alternatives to the standards that would improve VET evaluation quality. The rare suggestions that were made would primarily complement the standards rather than replace them. Examples include improving exchange between all stakeholders through regular conferences or establishing electronic communication networks. Also mentioned were introducing a system for certifying evaluators and/or institutes to guarantee their competence and developing certain aids (such as publishing survey guidelines) where possible.

The only possible alternatives to the use of standards that were proposed were social cost-benefit analysis and the capabilities and functionings theory (1987) devised by the 1998 Economics Nobel Prize winner Amartya Sen.

5.4. Summary of survey findings

This was the first survey on evaluation standards involving experts from most EU Member States. Despite the small sample and short questionnaire, the poll enabled us to identify tendencies and provided numerous stimuli for discussion on the further development of evaluation standards.

The sample mainly consisted of evaluators and researchers. The respondents were scholars in social and political science, the liberal arts and economics. Engineers and natural scientists were rare. The respondents identified themselves with a total of 13 different national professional cultures, giving the study a broad spectrum. The best-represented area was northern and western Europe. Around half the experts are familiar with VET but have had very little involvement in the field.


(74) See Summary and Outlook.


The remaining respondents described VET as a major or their main working field.

Everyone has a generally positive attitude towards evaluation standards. None of the respondents felt that standards do not matter or are unnecessary or even harmful. The best-known sets of standards are the US joint committee standards for evaluation and the Guidelines for evaluators. The vast majority of the respondents named at least one set of standards with which they are at least somewhat familiar.

Respondents see the main benefits of evaluation standards as improvement in the quality of evaluations and an opportunity to make evaluators’ work more legitimate and transparent. However, they fear that utilisation of the standards could restrict the plurality and flexibility of evaluations in theory and in practice, or that standards could be applied too rigidly.

When given the choice, the majority preferred maximum standards, which provide orientation, stimulate competent dialogue on evaluations and their methods and are open to innovation and further development. Only a few favoured precisely formulated minimum standards.

Respondents named involvement of all stakeholders, transparency and use of a wide variety of methods as the most important hallmarks of evaluation standards. Correspondingly, well-known standard sets were felt to lack a standard which emphasises the desired flexibility and plurality of evaluation approaches and models. Most respondents did not see a superior alternative to evaluation standards and only suggested enhancements which concern evaluation management.

In summary, we can see that many respondents who agreed to take part in the survey due to an interest in the topic had already had some experience with evaluation standards (75). The selection procedure (to some degree self-selection) could account for the very positive overall assessment of the standards (question 5). Objections to VET standards are equally applicable to minimum standards and are thus consistent with the fact that the majority of the respondents favoured maximum standards. Evaluation plurality seems to be an important fundamental value in European evaluation. On the one hand, this is explicitly stated in certain standards. On the other hand, erratic developments such as the inappropriately rigid application of evaluation standards could jeopardise it.


(75) See answers to question 6.


6. Reflections on VET evaluation standards literature

Documentation research involved a search for pertinent articles from the last five years on the quality of VET evaluations and evaluation requirements. Reflection can take place at the end of an evaluation or from a scientific/methodological perspective during evaluation research. Older literature has only been consulted when it is perceived as being particularly relevant or regarded as a standard work in this field.

Although evaluation methodology has most of its roots in North America, where it is widely used and has a long research tradition, the literature to be assessed should stem from European authors or reflect a European background. This should ensure that European cultures and unique institutions receive appropriate attention. This approach should prevent the unqualified import of American evaluation culture, which could reduce acceptance and provoke resistance. Mistrust of government intervention and a public right to information characterise American evaluation culture (Schmidt, 2000). Empirical social science research is common in the US. In Europe, however, labour-market policy studies emphasise different programme outcomes by comparing employment and income effects. European countries rarely conduct empirical social science research. We should observe separate innovations in European States (Toulemonde, 2000).

Literature in English and German is systematically researched. French and Italian sources are consulted in exceptional cases.

The process involved methodical evaluation of reference databases, particularly those of Cedefop, the University of Osnabrück Library (comprehensive social science section) and the University of Cologne Library (designated as German Economic Science Library), and Internet research on evaluation standard terms.

Perusal of the literature and subsequent categorisation of text segments according to individual standards consistently show that the standards overlap. Notes on evaluation quality requirements, therefore, cannot always be clearly assigned to a single standard. Below, evidence of use is cited under the most applicable standard; reference to overlap is made where necessary. Overlap also stems from the fact that some standards are either directly or indirectly related. The individual standards belong to very different analytical levels and, consequently, the number of comments on each standard differs markedly.

The literature analysis reveals that development of the theoretical basis for evaluation has slowed during the past decade (76). It has given way to a phase of consolidation and application of established evaluation models. One field of application is VET. Literature on VET evaluation encompasses a broad palette of perspectives. Some articles focus on model theory or evaluation methods, while others report on completed evaluations. Another clear trend is documentation which provides guidance on conducting VET evaluations. We can define various levels within the VET evaluation literature examined, covering evaluations on supporting government decisions or improving the quality of individual programmes and evaluations as a mechanism to initiate public VET debate. Evaluands in VET literature range from the use of new media in continuing training, through vocational training pilot projects, to individual programmes as part of in-company continuing training. Assessment of evaluation standards should incorporate the whole spectrum of possible VET evaluands.

When dividing VET into micro-, meso- and macro-perspectives, we can generally assign these levels to different reference disciplines. Economics tends to dominate the macro-perspective. Economists often use quasi-experimental investigation forms or ‘advanced’ quantitative procedures. Economic theories such as the human capital theory are employed to try to explain many meso-level phenomena. However, sociological and educational methods and theories may also apply, depending on the line of investigation. The micro-level is primarily viewed from a psychological or educational perspective. Standards which are to apply specifically to VET should therefore comply with key scientific criteria in all reference disciplines.

(76) See Stufflebeam (2001) for a survey of diverse evaluation models over the past decades.



Various evaluation studies exist for the levels distinguished here. Universities, related institutions and individual researchers conduct macro-evaluations as a rule. They usually observe and comply with all general scientific (methodological) standards, some of which feature in the evaluation standards, as a matter of course. Academic expertise is much less evident at meso- and micro-levels. For example, part of the job of staff developers is to conduct evaluations, but they have insufficient methodological training for this task. Clear standards could act as guidelines with an initial and continuing training function for this group in particular.

Not all standards are equally applicable to every evaluation project, which is also true for VET evaluations. Nevertheless, the validity of each individual DeGEval standard has been confirmed in various VET contexts. The literature studied contains concrete quality requirements, advice and guidance on using evaluations which resemble the individual DeGEval standards. We can illustrate the individual standards in terms of the VET evaluand and its characteristics and we can refine some of the standards further. We will therefore proceed by briefly introducing each group of standards and adding a commentary on individual standards. This commentary is partly illustrative and descriptive, and partly more reflective, depending on the conflict potential which each standard contains.

Below are the 25 DeGEval standards, grouped according to the four standard categories, with notes gleaned from European VET-oriented literature. We have refrained from providing a detailed description of each standard, as this appears in the attached version of the DeGEval standards and their explanatory notes (printed in the Annex).

6.1. Commentary on the utility standards

‘The Utility Standards are intended to ensure that an evaluation is guided by both the clarified purposes of the evaluation and the information needs of its intended users’ (DeGEval, 2002, p. 8). The utility standards are particularly relevant when interfacing with the (intended) users of evaluations and their findings. Stakeholders may be (programme) managers or employee representatives. Evaluator competence depends on experience in the field of enquiry and intercultural awareness. In view of the vast disparities, discussion of values is particularly important not only within various target groups, but also in different European countries. The utility standards seem to be relatively sensitive to cultural diversity.

Standards N1/U1, N2/U2 and N5/U5 help clarify the basis and hence the interests, influences, purposes and values of a specific evaluation.


Figure 5: Seven points which make evaluations useful. The elements shown are: stakeholder identification; clarification of evaluation purposes; transparency of values; evaluator credibility and competence; evaluation timeliness; information scope and selection; report comprehensiveness and clarity; evaluation utilisation and use.

Source: author’s representation


This is linked to evaluator competence and credibility (N3/U3). These standards must also be respected in the following operational planning steps. These include information scope and selection (N4/U4) and report timeliness and dissemination (N7/U7). They should be brought together into a comprehensive and clear report (N6/U6) and result in a high usage of evaluation (N8/U8).

6.1.1. N1/U1: stakeholder identification
‘Persons or groups involved in or affected by the evaluand should be identified, so that their interests can be clarified and taken into consideration when designing the evaluation.’

Evaluations generally involve a concerted effort on the part of all participants, especially in initial and continuing vocational training. They share responsibility for the evaluation process. As a rule, all parties discuss evaluation planning, implementation and findings. ‘Goals [in the context of evaluations] [...] are often negotiated between at least three interest groups, namely between management and works council representatives on the one hand and researchers on the other’ (Antoni, 1993, p. 315). This applies to evaluation of both in-company and extra-plant vocational training. Many different groups of decision-makers and other stakeholders exist, particularly in training partnerships. The EU also stipulates that ‘involvement of the social partners in all phases of the evaluation is crucial for finding viable ways of meeting the prevailing local labour-market requirements and solutions to employment issues’ (Gontzou, 1997, p. 62).

According to Reischmann (2003), evaluations must not muzzle people if adult education is to encourage them to become independent and active citizens, responsible employees and well-rounded personalities. Participants should have the opportunity to play an active role in the assessment process and to take advantage of this aspect of learning within evaluations to nurture their own development.

For evaluation of in-company training measures in large enterprises, the evaluator should develop tools in consultation with the head of the training department and/or the head of the personnel department, in cooperation with other department heads (Tremea, 2002). The role of trainers in the evaluation should be clarified. They could participate in the evaluation by observing, measuring performance or identifying areas where more training is needed. The form of trainee involvement in the evaluation process must also be established, e.g. completion of questionnaires, interviews, self-assessment. Increases in productivity as a result of training measures could be determined by consulting external parties interacting with the company. These could include suppliers, distributors, current and potential end users, employers’ associations, trade unions, etc.

Stakeholders for evaluation of pilot projects in school-related education include the pupils and teachers of government schools, educational research institutes and their various departments (e.g. vocational school departments) and academic institutes providing educational support, as well as the pilot project sponsors, such as the central education ministry, a regional ministry or a national vocational training institute. ‘These people always have divergent interests. Their allegiances are to quite different institutions. This leads to contrasting interpretations of a pilot project’s mandate and widely varying concepts of their own function, their role in the pilot project and the functions and roles of other players’ (Sloane, 1995, p. 13).

Stakeholder orientation is also dictated by culture, particularly relating to hierarchy or egalitarianism. In essentially egalitarian societies, interpretation takes for granted the incorporation of different stakeholders in the evaluation process. In other societies a certain degree of unequal empowerment and strongly differentiated spheres of influence are both legitimate and desirable (Taut, 2000). Reflecting on the American standards, Jang (2000) notes that in South Korea it is common practice only to consider the expectations of the commissioner. Although the cultural diversity within the EU is certainly not as great as between the US and South Korea, differences between European countries also have to be taken into account (77).


(77) Hofstede’s (1980) investigation of the ‘power distance’ dimension showed that it is relatively high in Belgium, Greece, Spain, France and Portugal in comparison with other European countries.


6.1.2. N2/U2: clarification of the purposes of the evaluation
‘The purposes of the evaluation should be stated clearly, so that the stakeholders can provide relevant comments on these purposes, and so that the evaluation team knows exactly what it is expected to do.’

Antoni (1993) maintains that different interest groups and their varying goals sometimes obscure the evaluation purpose. Nevertheless, the purposes of an evaluation should be explained in accordance with standard N2/U2.

Possible evaluation purposes are preparation of government decisions, preparation of company decisions on training, information on individual decisions and quality improvement for specific programmes (78).

‘At its most effective, the evaluation process needs to relate to the needs and objectives of the organisation, its component parts (e.g. departments or teams) and the individual employee. It should be recognised that the requirements, and therefore the objectives, may be different for each of these. Provided this is recognised, and the expectations from the training or development activities are recognised, then the needs of all three can be accommodated. There should be a coherent structure in the evaluation process that starts with expectations, leads through to reaction and measures the changes’ (Field, 1999, p. 218). The evaluation purpose should, therefore, coincide with the goals of organisation units or the corporate strategy.

A survey of 2000 enterprises in Europe on the purpose of training evaluations revealed the following points (Field, 1998b, p. 72; authors’ additions in parentheses):
(a) to measure the extent to which objectives have been met (Q);
(b) to encourage the effective use of resources (P);
(c) to further develop individuals and their careers (P);
(d) to improve the organisation’s turnover (P);
(e) to increase the organisation’s competitiveness (P);
(f) to obtain feedback on the training provision (Q);
(g) to identify the impact of the training activity on the employee’s job performance (Q);
(h) to justify money spent on training (P);
(i) to identify the contribution to business objectives (Q);
(j) to identify the contribution to organisational performance (Q);
(k) to measure the effectiveness of the training (Q);
(l) to provide information for sponsors (P).

This list demonstrates the difficulty in distinguishing between evaluation purposes (P) and questions (Q) which the evaluation should answer (79). Purposes describe something which an evaluation should set in motion in the social and economic environment. Questions describe something which the evaluation should clarify (N4/U4).

Ideally, all evaluation processes should disclose and explain the evaluation purpose to the stakeholders. Everyone should know what will happen to the survey data and what kind of feedback they can expect (Field, 1998b, p. 25).

6.1.3. N3/U3: evaluator credibility and competence
‘The persons conducting an evaluation should be trustworthy as well as methodologically and professionally competent, so that the evaluation findings achieve maximum credibility and acceptance.’

The competence of evaluators is a significant factor, since they do not have a standardised job profile. Independent evaluation and evaluation research courses in social sciences are relatively rare at European universities. A few European countries offer postgraduate courses (80).

Application and interpretation of existing tools requires empirical and methodological knowledge. This applies all the more to tailoring of tools. In addition to these methodological skills, also cited in the accuracy standards, evaluators must demonstrate knowledge of the evaluand and its context.


(78) For more details, see Grubb and Ryan (1999), pp. 21 f.
(79) Confusion about evaluation purposes and programme goals is not uncommon.
(80) E.g. Spain, Sweden and Switzerland.


‘Professional competence as an evaluator in the technical field and geographical area of the project is one of the principal elements in the selection criteria for designating the evaluation team members. Objectivity and independence are the other key considerations for selecting evaluators. The degree of independence, however, depends on who designates the evaluators’ (ILO, 1999, p. 10). Evaluators who work exclusively in VET and show outstanding expertise in this field are in danger of becoming blind to programme malfunctions and positive side-effects.

It would be sensible for an (internationally active) evaluation team to include at least one evaluation specialist and one VET expert, and other people with knowledge and awareness of the economic and social needs and problems of each country in which the evaluation takes place (Grubb and Ryan, 1999).

Other authors, such as Wottawa, emphasise that academic competence must be transferred to corporate practice (Wottawa, 1999, pp. 112-113). ‘The “academic background” (education, social sciences, psychology, economics) of potential evaluators is secondary. The business world is less interested in which disciplines employees have training in, focusing more on whether they show practical competence which transcends subject boundaries. For most vocational training evaluation procedures it is vital to work with people from diverse specialist backgrounds. Many evaluation projects must integrate academics, education experts, management and participants themselves. They all have different educational backgrounds.’

Evaluators must also be aware of the limits of their own knowledge and skill. This could prompt them to consult other experts and delegate certain duties to other parties. VET could, for example, involve determining the motivation of participants in a measure. If an evaluator’s psychology expertise does not suffice for this, it makes sense to apply standardised procedures or to leave the collection and interpretation of findings to other psychologists (Tremea, 2002). The same applies to determining psychological profiles, which are also very sensitive.

In deciding who should conduct the evaluation, one must also consider whether it should be internal or external. Schmidt (2001) argues that external evaluation is advisable, since programme planning could be based on false premises. The competence of an external evaluator could be helpful, and outsiders are more scientific and independent.

If evaluators conduct evaluations in unfamiliar countries, cultural distance will play a role. However, they will also have interpersonal distance from the people in the country concerned. This can be advantageous, enabling them to assume a ‘balanced view’ (Hendricks and Conner, 1995). If evaluators work abroad, intercultural skills may be part of their qualifications. Sensitivity to social, cultural and economic differences between the various stakeholders is crucial.

How different cultures determine evaluator credibility can vary dramatically. A society with a valid ‘seniority principle’, for example, may automatically regard the older generation as the more, or even only, competent group (Jang, 2000). Social status and gender can also significantly affect assessment of evaluator competence and credibility, depending on the culture.

6.1.4. N4/U4: information scope and selection
‘The scope and selection of the collected information should make it possible to answer relevant questions about the evaluand and, at the same time, consider the information needs of the client and other stakeholders.’

The logic model is one tool which can be applied to clarify objectives and structure the programme for evaluation (81). This specifies overall goals, interim goals, indicators and effects and puts them into context. The tool is widely used in evaluations for structuring internal programme logic and formulating questions to be addressed by the evaluation.
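As an illustration of how such a logic model can be laid out, the following sketch arranges a purely hypothetical continuing training programme into the elements named above (overall goal, interim goals, indicators, intended effects) and derives evaluation questions from them. The programme content, indicators and questions are invented for illustration only and are not taken from the literature discussed here.

# Hypothetical example of a logic model used to structure an evaluation.
# All programme content below is invented for illustration only.
logic_model = {
    "overall_goal": "Improve the employability of participants in a continuing training course",
    "interim_goals": [
        "Participants acquire up-to-date technical skills",
        "Participants apply the new skills at the workplace",
    ],
    "indicators": [
        "Share of participants passing the final assessment",
        "Supervisor-rated change in job performance after six months",
    ],
    "intended_effects": [
        "Higher rate of stable employment among participants",
    ],
    "evaluation_questions": [
        "To what extent were the interim goals reached?",
        "Which observed effects can plausibly be attributed to the programme?",
    ],
}

# The evaluation team can then check that every question is backed by at least one indicator.
for question in logic_model["evaluation_questions"]:
    print(question)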

An oft-cited and popular approach for detailing VET evaluation questions is Kirkpatrick’s (1994) four-level model. This first examines learner reactions, and then what participants have gained from the programme. The third stage evaluates behaviour in the work environment, and the fourth studies the results from an organisational perspective. This last stage entails a return on investment. Thus we have various evaluands. It would no doubt be more sensible to establish the evaluation purposes (N2/U2) before formulating questions or indicators.


(81) The ‘logic model’ is often used to structure evaluations.


‘This obliges evaluation providers to consider the information needs of decision-makers (in business, not in research) even more closely when selecting their strategies and assessment indicators. If this does not happen, there is a risk that decision-makers, who ultimately provide the funding, will opt for alternatives, i.e. at best other evaluators or, at worst, even to dispense with scientifically sound evaluations altogether’ (Wottawa, 1999, p. 108). The information purpose determines the value of knowledge, and not the quantity of information, according to Weiß (1997, p. 108). Information for evaluations should be chosen and condensed in such a way that it can serve as a basis for decision-making (82).

6.1.5. N5/U5: transparency of values
‘The perspectives, procedures and thought processes that serve as a basis for the evaluation and the interpretation of the evaluation findings should be described carefully to clarify their underlying values.’

The ILO guidelines concur: ‘Before undertaking the mission, the team members should also familiarise themselves with the cultural and social values and characteristics of the recipients and intended beneficiaries’ (ILO, 1999, p. 12). Cultural values can vary dramatically within a country and between companies and organisations. This standard on identification of values is highly relevant to evaluations which encompass several European states or which are conducted in different European countries. Standard N3/U3 also applies, as intercultural competence strongly influences the identification of values.

‘Naturally, integrative concepts will spark considerable debate as to their explicit value judgements with regard to the weighting of different types of outcomes as well as the time preferences or even group preferences’ (Schmidt, 2001, p. 9). Trade-offs can occur between different times or different social groups. Selection of individual parameters for evaluations is vulnerable to biases, as it can affect the significance and even determine the survival or demise of political and corporate programmes.

6.1.6. N6/U6 – report comprehensiveness and clarity: evaluation reports should provide all relevant information and be easily comprehensible

Annex 2 to the guide to evaluation of EU expenditure programmes (European Commission, 1997) formulates specific questions for assessing the quality of evaluation reports: ‘Is the report well presented? [...] Is the scope of the report adequate? [...] Is the methodology of the report appropriate? [...] Are the report’s conclusions and recommendations credible?’ (see also G8/A8).

These key questions are elucidated further. For example, the last question is complemented by the following additional questions: ‘Are findings based firmly on evidence? Are conclusions systematically supported by findings? Are recommendations adequately derived from conclusions?’ They not only articulate requirements for the form of the report and the style of presentation, but also impose clear quality demands on the content. This overlaps with considerations of methodology and data quality in other standards (83).

In the context of vocational training pilot project research, Zimmer (1998, p. 598) comments that the findings should be processed in such a way that other enterprises and training institutions can benefit from them. It should be possible, therefore, to transfer available interim results, and not just conclusive findings, to other companies or training establishments with similar problems. According to Kaiser (1998), the potential for gaining scientific insights from pilot projects depends in particular on the structure and presentation of texts produced in the course of the scheme, such as the final report and project documentation on teaching and learning arrangements. Addressees, teachers, school administrators, trainers, company managers, education policy-makers, cultural bureaucrats and education coordinators will undoubtedly read a final report on a pilot project only if it is not too extensive and overloaded with details and jargon.

The foundations of evaluation and impact research96

(82) It goes without saying that there are other evaluation purposes besides providing a basis for decision-making, such as ongoingimprovement or accumulation of general knowledge.

(83) See the accuracy standards G1/A1, G2/A2, G3/A3, G4/A4 and G8/A8.

Page 49: Developing standards to evaluate vocational education and ...

that important findings and results are accessibleto vocational training policy-makers and useful tovocational training research. Reports should betailored to the relevant target group (see also usergroups N1/U1).

An evaluation report should also contain a list of any problems regarding concepts, contents and methods which may surface (Kaiser, 1998, p. 547). This relates to the accuracy standards and is crucial to the subsequent meta-evaluation.

6.1.7. N7/U7: evaluation timeliness

'The evaluation should be initiated and completed in a timely fashion, so that its findings can inform pending decision and improvement processes.'

Ideally, the report should be completed immediately after gathering data. Often deadlines are based on the needs of third parties, such as data for important meetings in which results are presented (Field, 1998b, p. 12).

Moreover, it has been ascertained in connection with N7/U7 that evaluation design and quality heavily depend on the timing of evaluation planning. Evaluations planned after the launch of a programme lack certain opportunities to influence the evaluation design, to ensure evaluability and to allocate participants to test and control groups. This applies especially to experimental studies. Before-and-after comparisons also cannot be accurate if evaluation planning only commences after the start of a programme. Beginning evaluation design only after the decision to run a programme, after the successful launch of the programme or even after its conclusion is a common occurrence in VET (84). In such cases it is possible to complete the evaluation report in good time, but the evaluation itself cannot begin punctually. This affects both evaluation content and method.

Here we must note that 'timeliness' of the evaluation as described in the text to the DeGEval standards can favour the production of quick results. Most evaluations at the end of a VET programme study short-term effects appearing in 30 to 90 days. Evaluations which measure effects after two years tend to have more complex, often randomised designs. However, since short-term and long-term effects are not necessarily related, a longer perspective is needed. Evaluations which only observe short-term developments may approve programmes with immediate impact and underestimate those whose effects only become evident or mature after several years. Focusing on immediate benefits can hamper observation of long-term effects. The potential worth of a programme for vocational training can increase or decrease over the course of time.

Gaude (1997, p. 55) hypothesises that the income of former vocational further training measure participants could be higher after a period of job-seeking than that of the control group. The increased competence of the former trainees would permit them to reach a higher rung on the career ladder. However, they may not have upgraded their qualifications and could also stagnate in poorly paid jobs. These possible effects can only be observed and measured over several, longer survey periods. Gaude states that many evaluations do not last long enough to gather this data. Extending the evaluation over several years is the only solution. Grubb and Ryan (1999) propose five to six years.

Fay (1997, p. 111) also calls for longer evaluation periods, especially for training programmes. The following relationships could be of interest (Tremea, 2002): training and employment, training and promotion, training and job retention. These questions could be helpful: Are the former participants employed in the occupation for which they trained? Do training participants use training lessons regularly? What training would participants have needed to perform their current duties more efficiently?

Calls for longer evaluation periods increase evaluation complexity and costs. Furthermore, it takes longer to publish final evaluation reports. Stretching evaluation periods can also encourage the separation of programme evaluation from political cycles. Programme and evaluation duration should be interdependent. The duration of a programme's potential impact, including multiplier effects, also has a close bearing on evaluation duration and timing of surveys.


(84) Expert discussion in the Vocational and in-company continuing training task force at the DeGEval conference in Mainz on 17 October 2002.


6.1.8. N8/U8: evaluation utilisation and use

'The evaluation should be planned, conducted, and reported in ways that encourage attentive follow-through by stakeholders and utilisation of the evaluation findings.'

Meta-evaluations are conducted to establish how VET evaluations are utilised. We know of no European studies on this subject.

A survey of enterprises in Europe obtained the following responses to the question of training evaluation use (Field, 1998b, p. 73):
(a) facilitating and reflecting on the transfer of learning to the workplace;
(b) reducing staff turnover;
(c) ensuring that training meets company and individual objectives;
(d) raising awareness of the benefits of training;
(e) increasing staff motivation;
(f) improving the effectiveness of training activities;
(g) measuring productivity increase;
(h) increasing individuals' responsibility for their own training and personal development;
(i) involving managers in the training and evaluation process.

This list demonstrates that both the evaluation process (goals become clearer, broken links in the chain of training elements are discovered and repaired, etc.) and its findings (well-founded decisions are made, which increases both worker motivation and productivity in the long term) can trigger the stated forms of utilisation. Some evaluation approaches prefer process use (e.g. the qualitative approach related to organisational development) while others favour findings use (as in quasi-experimental approaches). As standard N2/U2 shows, stakeholder use requirements dictate prioritisation of the central evaluation benefit. Models and methods should adapt to use requirements, not vice versa. More academically oriented evaluators and pragmatic evaluation approaches often clash.

Evaluations of VET measures should provide commissioners with clear instructions and comments not only to demonstrate that a training programme was completed more or less successfully, but also to encourage further utilisation of the findings. This applies particularly to formative evaluations. They should formulate specific proposals for possible changes, as 'the purpose of evaluation of training is not to prove, but to improve' (Tremea, 2002). Evaluation stakeholders, such as training measure purchasers, training measure providers and training participants, must be activated. Potential evaluation utilisation should perhaps encompass a wider group to ensure that colleagues of participants, or entire departments, enterprises or organisations, are also informed. The actual or unrealised benefit of an evaluation and the (non-)application of proposals should be recorded (follow-up). Was the training programme restructured on the basis of the evaluation? Did selection of trainers reflect the previous evaluation? If additional training was recommended, has it already taken place?

Evaluations can strengthen interpersonal relations and worker motivation within a company. Workers who may also be training participants could appreciate colleagues listening to their opinions and adopting their ideas, if there is keen interest in the results of a training initiative (Tremea, 2002). In the longer term it is vital to make conspicuous use of the information gathered, in order to motivate workers to participate in future evaluations.

Several factors can bolster the use, and hence the success, of evaluations within enterprises and organisations. A company with a corporate culture based on trust rather than mistrust promotes training as an investment. This kind of environment also tends to support data compilation and utilisation. Linking evaluations to relevant strategic and organisational goals increases the probability that recommendations will be respected and implemented (Field, 1998b, p. 75). The attitude of senior management to, and support of, training and its evaluation is a deciding factor in the utilisation of evaluations and their findings.

Efforts to establish continuity can also boost evaluation utility. This includes concurrent development of monitoring systems (85) which can supply data for evaluations and channel the information obtained in evaluations into the monitoring process. This applies particularly to state-financed and state-run initial and continuing training activities.

It should be emphasised that evaluation activities are not always beneficial. They can be worthless or even harmful. Reischmann (2003) coins the merit criterion of 'didactic utility' specifically for adult education. He maintains that evaluations can only help improve the understanding and structure of adult education if they apply this criterion from the outset. Reischmann attaches more weight to this factor than to any other. He states that evaluations are only a valid aspect of adult education if their andragogical and didactic intentions and consequences are clear.

(85) See Auer and Kruppe (1996) for an overview of monitoring systems in the EU.

6.2. Commentary on the feasibility standards

'The Feasibility Standards are intended to ensure that an evaluation is planned and conducted in a realistic, thoughtful, diplomatic and cost-effective manner' (DeGEval, 2002, p. 9).

The feasibility standards are highly relevant to formative evaluations. Diplomatic conduct is especially adaptable to different cultures and thus more sensitive to national evaluation environments than the other two feasibility standards. Both appropriateness of the procedure employed (D1/F1) and diplomatic conduct (D2/F2) significantly affect evaluation efficiency (D3/F3).

6.2.1. D1/F1: appropriate procedures

'Evaluation procedures, including information collection procedures, should be chosen so that the burden placed on the evaluand or the stakeholders is appropriate in comparison to the expected benefits of the evaluation.'

Decisions on the evaluation design must reflect the type of programme being assessed and the nature of the programme's expected impact, which is affected by the following programme attributes (Lindley, 1996, pp. 853-854):
(a) scale:
    (i) the coverage of the programme relative to the size of the socioeconomic space for which the evaluation is being conducted;
    (ii) the extent of the tax expenditure involved relative to the costs perceived by the actors whose behaviour is being influenced;
(b) selective: dealing only with a broad section of economic activity, whether distinguished by aggregate sector (e.g. agriculture or manufacturing), spatial area (e.g. poorer nations or regions) or major socioeconomic group (e.g. women);
(c) targeted: focused more sharply on particular sectors (e.g. coal mining), subregions (e.g. level 2 of the Eurostat regional classification) or labour force groups (e.g. unemployed young people, women returning to the labour force);
(d) transitory: where the policy is seen by the actors as being merely a temporary measure, or one which may be used only recurrently from time to time;
(e) countercyclical: where policy intervention is a reasonably predictable form of countercyclical measure (rather than being considered to be so ex post);
(f) long-term: where the policy intervention is seen to be a long-term measure, even though aspects of it may be subject to variation according to socioeconomic conditions.

At one end of the scale, evaluands can be relatively small, target a specific group and cover a limited period. The other extreme comprises extensive, long-lasting programmes with diverse, sometimes hierarchically related target groups (e.g. provider managers, trainer trainers, trainers, and end consumers such as young people and their parents). Both extremes, and all gradations in between, require different evaluation models and methods.

Quantitative, standardised tools, which necessitate considerable investment in development or adaptation, may be efficient for large programmes. Qualitative, flexible and practical tools are often more appropriate for small programmes.

A common situation involves nationally or even European-funded programmes which are implemented in many locations and function almost independently. This raises the question of whether the combination of blanket monitoring and local case studies, quasi-experiments using control or comparison groups, or a cluster evaluation is the most suitable evaluation design (Beywl et al., 2003).

Figure 6: Two points which make evaluations feasible (appropriate procedures and diplomatic conduct, which together support evaluation efficiency). Source: author's representation

Butz (2000, p. 432) maintains that assumptions about the likely acceptance by respondents and interviewers should steer the selection or construction of the actual survey materials. If programme organisers, for example, are involved in a dense, binding monitoring system, they will resist additional written surveys, but are more likely to accept telephone interviews or interactive group survey procedures with integrated exchange. Reischmann (2003) also advises omitting everything which will not be evaluated if information is obtained directly from participants. It is important to ascertain whether data compilation can be spread among various (groups of) people so that nobody is overtaxed. It is also prudent to check whether materials or documents already contain some necessary information which does not have to be gathered separately.

6.2.2. D2/F2: diplomatic conduct

'The evaluation should be planned and conducted so that it achieves maximal acceptance by the different stakeholders with regard to evaluation process and findings.'

Views on appropriate diplomatic conduct depend on national cultures and differ between commercial and non-profit organisational environments. Even the term 'diplomatic' has divergent or even contradictory connotations (cautious, adept, covert, indirect, manipulative, etc.), depending on the culture (86). This alone indicates the standard's high cultural sensitivity. At the same time the standards N1/U1 and F2/P2 are relevant.

Practice shows that resistance from employee associations such as works councils can thwart evaluation projects. Data protection officers can also exert a strong influence in Germany. To avoid unexpected barriers, employee representatives in an enterprise or organisation, who are granted a voice under national law, should be involved as extensively as possible in planning from an early stage.

Resistance may stem from negative experiences of preservation of anonymity and confidentiality in previous surveys. For example, unofficial but widely distributed evaluation documents may contain the names of individual trainers. Or, particularly in small organisations, it may be possible to deduce who assessed performance negatively or positively because the presentation of data in the final report is too specific.

At worst, this can result in a warning, transfer or termination of contract for the affected programme organiser or trainer, despite assurances of anonymity and confidentiality. Such occurrences deter enterprises and organisations from participating in evaluations. To avoid this effectively, Butz (2000, p. 437) recommends involving works councils, data protection officers and staff representatives from the evaluation design stage. It may be wise to conclude a written evaluation agreement with the works council. Names of participants and other individuals should not be mentioned in public evaluation documents. Reports on smaller departments should be summarised.

6.2.3. D3/F3: evaluation efficiency

'The relationship between cost and benefit of the evaluation should be appropriate.'

Nuissl (1999, p. 73) notes: 'It is necessary to develop an acceptable system of evaluation, assessment and monitoring to assess the overall effectiveness of projects and to ensure the quality of the outcomes.' When commissioning, tendering for, planning and implementing an evaluation, one must ensure that the invested resources are economically proportional to the expected use of the evaluation. This concerns personnel involvement in the evaluation as well as the costs and burdens which the enterprise or other organisation hosting the evaluation may incur through data collection or supervisory panel meetings.

Decision-making on the overall investment an evaluation warrants should consider the planned scope of the evaluation findings (Tenberg, 1998, p. 533). For a pilot project it may be sensible to scale the evaluation generously, in proportion to the costs of the entire programme rather than the pilot alone, as subsequent transfer of the project will affect a large number of participants and demand a correspondingly high budget. Company-level evaluation of every kind of continuing training is superfluous: only one or two employees may participate in a training programme, or minimal investment has been needed, or all parties without exception are convinced of the use of a well-established training programme. In these and similar cases an evaluation will not be conducted or will only take place on certain levels.

(86) The titles of JC standard F2, Political viability, and SEVAL standard D2 of the same name often provoke ambivalent reactions, as 'political' is associated with unfounded, irrational or arbitrary in enterprises.

The PAVE project’s evaluation resource packasks companies the following questions to helpthem decide whether an evaluation should beconducted, and if so, on what scale (Field,1998b, p. 52).(a) how many people does the training or devel-

opment measure affect?(b) how crucial is achieving the expected training

goal for the company?(c) how likely is this measure to run again?(d) has the training provider received a contract

before?(e) is the type of training new to the company, e.g.

new communication technique, new skills?(f) to what extent do the training measure and

the evaluation process support other areas ofcorporate policy?

(g) is evaluation of the training or developmentmeasure (urgently) needed?

Weiß (1997, p. 107) states that the more precise the tools and the more differentiated the measurement criteria, the higher the investment. Commissioners and evaluators should consider what level of perfection is required. In practice, it will often be necessary to compromise between the desire for accuracy and the resources available. This applies both to individual programmes and to the selection of sub-projects for evaluation (Lindley, 1996).

6.3. Commentary on the propriety standards

'The propriety standards are intended to ensure that in the course of the evaluation all stakeholders are treated with respect and fairness' (DeGEval, 2002, p. 9).

Model contracts and legal foundations provide clear points of reference for the first two propriety standards, formal agreement (F1/P1) and protection of individual rights (F2/P2). Complete and fair investigation (F3/P3) and unbiased conduct and reporting (F4/P4), in contrast, are much harder to clarify and judge amid fierce conflicts of interest. The form of the published findings (F5/P5) should embody the result of the precautions taken in the first four standards. Are all pertinent findings published, or only those which do not collide head-on with the interests of key participants? This crucial decision should be made as soon as possible in the course of an evaluation, formally agreed (F1/P1) and communicated to the stakeholders (D2/F2).

The propriety standards set out the industrial relations requirements for VET evaluations, such as decision-making regulations and data protection. They also articulate cultural differences in the treatment and protection of minorities (87).

The propriety standards impose specific demands on evaluators' legal knowledge and social awareness (N3/U3). This is particularly important when evaluators work abroad.

However, the service orientation standard in the American JC standards (JC-P1) could be relevant to VET evaluations. The DeGEval standards do not contain this standard, as they are designed to apply beyond the field of human services (Section 6.5).

6.3.1. F1/P1: formal agreement

'Obligations of the formal parties to an evaluation (what is to be done, how, by whom, when) should be agreed to in writing, so that these parties are obligated to adhere to all conditions of the agreement or to renegotiate it.'

Evaluating EU Expenditure Programmes (European Commission, 1997) and the MEANS handbook (European Commission, 1999b, Vol. 1, p. 76) list key elements which a contract should normally contain: 'the legal base and motivation for the evaluation, the future uses and users of the evaluation, a description of the programme to be evaluated, the scope of the evaluation, the main evaluation questions, the methodologies to be followed in data collection and analysis, the work plan, organisational structure and budget, the selection criteria for external evaluators, the expected structure of the final evaluation report' (European Commission, 1997, p. 38 f.). The JC standards include detailed guidelines on this (JC, 1994, p. 88).

(87) We are unaware of any study which appraises the various existing stipulations regulating fair and legal implementation of VET evaluations in EU Member States and general data compilation in the various subsystems (enterprises, public authorities, schools …).

We know of no lawsuits between commissioners and contract recipients in Europe to date which have appealed to the standards. However, this could be a future role of the standards, as has always been intended. When in doubt, courts will consult professional standards. The formal agreement should explicitly state whether the evaluation standards form the basis of evaluation implementation, to create clarity between commissioners and contract recipients. We found no specific references to formal agreements on evaluations in VET literature.

6.3.2. F2/P2: protection of individual rights

'The evaluation should be designed and conducted in a way that protects the welfare, dignity and rights of all stakeholders.'

Initial or continuing training participants often have a high stake in their programmes. They are counting on obtaining a vocational qualification, which will open the door to certain professions, a livelihood and social status. When (re-)entering the world of work, continuing training participants can achieve promotion and secure their jobs, but they may also lose out if they are transferred, or their contract is not renewed or is terminated. Full-time VET staff and freelance training providers, in particular, associate evaluations with great opportunities and high risks.

Personal data protection and appropriate handling of performance data and findings which can be traced back to individuals should be a priority in VET evaluations. For example, an evaluation may require psychological profiles, such as measurement of intelligence or other personal traits, with especially sensitive information. It is untypical for evaluations to gather this kind of data. If they do, to assess the aptness of a training concept to participants' initial cognitive status or to explain learning difficulties, for example, confidential handling of this data is vital (Tremea, 2002).

In any case it must be emphasised, ideally in the formal agreement (F1/P1), that neither the grading of participants nor the assessment of trainers is the aim of programme evaluations. There are independent standard sets for this. They prescribe much more precise and narrow regulations for protecting personal rights than the Programme Evaluation Standards (JC, 1988; Gullickson, 2002).

The gender issue is another important aspect. The status of men and women is culturally dependent and varies throughout Europe. Evaluations do not presume that a certain gender leads to better or worse training results. However, some occupational groups tend to employ mainly men or mainly women. Moreover, different European countries have diverging views on the role of women in the workplace. An evaluation must consider these aspects and decide whether or not to record participant gender.

The same applies to data on participant age. Older employees have more difficulty finding employment in some European countries than in others. In Scandinavia, for example, age tends to have less effect on the probability of finding a job. The ethnic or minority composition of a training group can also affect participants' chances of employment. Evaluators should only gather or assess such sociologically and politically sensitive data if commissioners expressly request it and explain why (Tremea, 2002).

This highlights the cross-reference to the transparency of values standard (N5/U5), which should be respected at the conception stage of data collection in VET evaluations spanning national boundaries.


Figure 7: Five guidelines which keep evaluations on a straight course: formal agreement, protection of individual rights, complete and fair investigation, unbiased conduct and reporting, and disclosure of findings. Source: author's representation


6.3.3. F3/P3: complete and fair investigation

'The evaluation should undertake a complete and fair examination and description of strengths and weaknesses of the evaluand, so that strengths can be built upon and problem areas addressed.'

Identifying and eliminating weaknesses during evaluation implementation is conceivable. In practice this is more helpful than waiting to make changes until publication of the final report. Reischmann (2003, p. 253) maintains that in extreme cases, a final report could include the following: 'We have identified the following weaknesses: [...] We employed the following measures to eliminate them successfully and permanently: [...] The evaluation report thus has no further recommendations!' However, changes which have already been implemented should still be identified and documented in detail.

For this standard we have found neither explicit references to intercultural idiosyncrasies, nor references to VET. However, we know that approaches to programme errors or weaknesses and strengths can vary widely between cultures. For example, Germans are quoted as expressing their disagreement very bluntly and directly ('You are wrong!') and are very sparing with praise. The British, in contrast, 'wrap up' criticism or disagreement in polite phrases ('To a certain extent I agree with you, but I'm not totally convinced') and may express agreement very strongly ('We see eye to eye on this affair') (Bosewitz and Kleinschroth, 1997). An intercultural evaluation certainly demands ample knowledge and confidence in communicating strengths and weaknesses.

6.3.4. F4/P4: unbiased conduct and reporting

'The evaluation should take into account the different views of the stakeholders concerning the evaluand and the evaluation findings. Similar to the entire evaluation process, the evaluation report should evidence the impartial position of the evaluation team. Value judgements should be made as unemotionally as possible.'

The nature of impartial conduct may differ between various nationalities and even between subcultures within a country. Evaluations in countries where VET institutions integrate social partners almost automatically consider employer, union and public viewpoints so that they can be seen to be unbiased. In other cases, the status of a public organisation can indicate high dependence or a high level of independence. For example, evaluators who work full-time at a university are perceived to be less biased than those who work for a business consultancy or as freelancers, even if the reverse is true. In hierarchical organisations such as patriarchal companies or authorities, impartiality may be undesirable. This puts evaluators in a difficult position.

Culture affects preferences for the minimum necessary degree of consideration of various perspectives versus the maximum permissible, and their mode of representation. Public debate, which reveals clear differences of opinion, may either be inappropriate or second nature, depending on the culture (Smith and Jang, 2002).

In some cases it may even be very difficult to ascertain different viewpoints. Depending on the position on the 'individualism-collectivism' dimension (Smith and Jang, 2002), participants tend to present a more or less united front, particularly during group interviews. In other situations, a private conversation may well be perceived as an insinuation that group discussion does not permit the frankness desired. Choice of method can also encourage or hinder the disclosure of stakeholder perspectives.

Impartiality can be especially problematic when evaluators help develop the programme, formatively support its implementation and then describe and assess its results and effects (88). This inevitably leads to role conflicts, challenging professional competence to the utmost. This conflict could, no doubt, be avoided by replacing the evaluation team between the formative and the summative stages. However, this would increase the cost of the evaluation. This standard thus places individuals in two or more incompatible roles in some evaluations (89).

6.3.5. F5/P5: disclosure of findings

'To the extent possible, all stakeholders should have access to the evaluation findings.'

This DeGEval standard focuses on informing the stakeholders: 'If an evaluation should serve to improve, justify and boost comprehension of continuing training initiatives, the relevant parties should also have access to the evaluation investigations.'

(88) This is typical of pilot projects run by the BIBB (Section 4.1).
(89) The SEVAL standard K6, Declaration of conflicts of interests, is formulated 'more realistically' than the DeGEval standard.

Reischmann (2003, p. 256) goes a step further, referring to JC standard K6. He points out that if confidentiality does not dictate otherwise, it is sensible to disseminate the report, e.g. among interested colleagues, decision-makers, the mass media and academic journals. This is the way to reach a much broader audience, i.e. academics, politicians and the general public.

Publication of findings can trigger conflicts in vocational training pilot projects. Evaluators and academic research institutions are predominantly interested in publishing their findings because these largely influence their reputation in academic circles and/or on the evaluation market. Pilot project sponsors, in contrast, have little interest in, or are even opposed to, publication, as they fear that the competition could benefit from their knowledge (Zimmer, 1998, p. 600). They are against any transfer of findings other than self-presentation as an innovative enterprise. According to Zimmer, evaluators also tend not to favour transfer, as further pilot projects in other companies could lead to new contracts.

The consequences of this standard must be adapted in various ways for evaluations conducted by private enterprises, the state or institutions with tax advantages (such as foundations), as the phrase '[...] as far as possible' indicates. In enterprises, 'public' basically means the entire company, and therefore encompasses management, staff and shareholders. The publicly financed sector targets a much wider audience, incorporating the mass media and citizens. If a particular evaluation is largely funded as a public-private partnership, its commissioning must provide the clearest possible arrangements to avoid subsequent disagreements and even legal action.

Public commissioners also have interests which must be protected, e.g. when a programme is still being developed and the evaluation commissioner discovers serious deficits at an early stage. Arrangements should be made for this eventuality.

This standard is closely related to the accuracy standards: 'the publication of the research methods, in particular of the identification assumptions underlying the derivation of a set of results, and on statements regarding the extent of any remaining uncertainty' (Schmidt, 2001, p. 7) is important.

6.4. Commentary on the accuracy standards

'The accuracy standards are intended to ensure that an evaluation produces and discloses valid and useful information and findings pertaining to the evaluation questions' (DeGEval, 2002, p. 10).

The nine standards in this group can be broken down into four categories. The first two standards (G1/A1 and G2/A2) address the definition of the evaluand in context and demand a description of it. The next two standards (G3/A3 and G4/A4) demand identification of the purpose, procedures and information sources used in the evaluation. The next four standards refer to the actual processes for collecting, monitoring, evaluating and utilising data to draft conclusions. They define requirements for gathering and sifting information to reach findings. Standard G9/A9 imposes meta-evaluation as a method of evaluation quality assurance and improvement.

Literature on quality requirements for VET evaluations only touches on some aspects of the accuracy standards. The standards for empirical data gathering, in particular, match the criteria which apply to the quality of scientific investigations in general. Social science textbooks detail these criteria, possibly explaining why they are not addressed separately (90). However, we must remember that evaluations are not conducted exclusively by empiricists and that evaluation commissioners using the standards should be taught to recognise 'good evaluation' characteristics. Standard sets for these accuracy requirements have been issued by national trade organisations, academic societies and research promotion institutions like the German Research Committee (Assurance of good scientific practice) (91).

As evaluations are a special application of empirical scientific methods, the quality criteria, which stem from basic research, must be adapted to the evaluand environment. This goes for VET and other evaluation applications.

(90) See, for example, the references in the explanatory notes to the DeGEval Standards G5/A5 and G6/A6.
(91) ibid.

Curiously, VET literature on accuracy standards focuses more on pilot training projects and macroeconomic evaluations than on business studies. The first two types prioritise generalisability of findings, whereas application in the third area often only concerns one company.

6.4.1. G1/A1: description of the evaluand

'The evaluand should be described and documented clearly and accurately, so that it can be unequivocally identified.'

Description of the evaluand enhances understanding of results and findings and clarifies whether, and how far, they can be transferred to similar programmes. The entire programme or its parts and their relevant characteristics should be specified. A distinction can be made between:
(a) concept: goals, content, didactic focus, duration, scope in hours of attendance;
(b) input: number, gender, age and previous qualifications of participants and number and qualifications of trainers;
(c) structure: sponsoring organisation, premises, financial outlay, teaching and study aids.

Beyond this informative function, the transferability of results can only be assessed if the evaluand is identifiable. This standard is closely linked to disclosure of findings (F5/P5), G3/A3 and G5/A5, and is also discussed there.

Users often do not value evaluation reports which lack such basic information.

6.4.2. G2/A2: context analysis

'The context of the evaluand should be examined and analysed in enough detail.'

Schmid (1996) notes that conventional evaluation research tends to neglect economics. Consideration of the context, however, is vital when interregional, intertemporal or international lessons are at stake. Turbin (2000) also states that VET systems are firmly rooted in their social context. If we are to learn from evaluations (best practices in other countries), the socioeconomic variables which affect these programmes and policies have to be identified. The ultimate goal is to assimilate the lessons in different environments.

Context also includes programme implementation conditions. Schmid calls for both programme evaluation and goal-oriented evaluation procedures, describing them as 'guidelines for international comparative research' (Schmid, 1996, p. 205). His type of analytic strategy includes structural components of labour-market policy regimes and institutional components. Here he is implicitly referring to the organisational structure of political regimes, their responsiveness or implementation forms and their organisational efficiency (Schmid, 1996, p. 210).

Figure 8: Nine components which make evaluations accurate: description of the evaluand, context analysis, described purposes and procedures, disclosure of information sources, valid and reliable information, systematic data review, analysis of qualitative and quantitative information, justified conclusions, and meta-evaluation.

These elements far exceed mere context description and could fall under evaluation purpose (N2/U2). This underlines the demand for context description.

According to Zimmer (1998) and Kaiser (1998), an evaluation of vocational training pilot projects naturally includes analysis of the social, economic, technical, occupational, cultural and educational environment, requirements and conditions for the project concerned, and articulation of these circumstances.

Stakeholder information needs (N1/U1) and the evaluation purpose (N2/U2) dictate the scope and depth of programme context description.

6.4.3. G3/A3: described purposes and procedures

'Object, purposes, questions and procedures of an evaluation, including the applied methods, should be accurately documented and described, so that they can be identified and assessed.'

The evaluation purposes should be specified and described both as a basic orientation in the detailed planning and in the evaluation report (N2/U2). The questions formulated at the beginning of the evaluation, and the way they were adapted and extended, should be recorded so that it is possible to judge whether the evaluation has answered them adequately (N4/U4). Timing, phases, methods applied, sampling and evaluation procedures should be presented. The description should also document any subsequent changes.

Although the DeGEval standards are constructed as maximum standards with scope for interpretation, as explained above, this standard specifies elements which no evaluation description should lack. Due to the universality of G3/A3, no VET-specific adaptation needs exist, as they do for G4/A4 and G6/A6.

6.4.4. G4/A4: disclosure of information sources

'The information sources used in the course of the evaluation should be documented in appropriate detail, so that the reliability and adequacy of the information can be assessed.'

This standard is also fundamental for assuring valid empirical practice. It should reduce the danger of 'unscientific procedure', particularly in research. Sources should be quoted precisely to guarantee intersubjective reliability. Standard G7/A7 is also relevant here. A mixture of methods, both qualitative and quantitative, can reduce the risk of erroneous procedure (Antoni, 1993) (92).

Summative evaluations generally use quantitative methods, and formative evaluations usually employ qualitative methods, although it is virtually impossible to separate the two approaches strictly. Moreover, evaluators are often required to use both a formative and a summative procedure and apply the two methods appropriately. Antoni (1993) advocates an integrative approach. However, there is a danger that a comprehensive evaluation approach could lead to insufficient control and an increase in costs, rendering the evaluation impossible to implement. Data on general and vocational training and on training and labour markets should be appropriately correlated. Discussion of the use of qualitative and quantitative information often reflects the conflict between micro- and macro-perspectives.

Incorporation of macroeconomic data when evaluating initial and continuing training is often requested, as public investment may be involved. Benefits to society as a whole, and not just for the individual participants or a company and its profit or efficiency, are then of interest. Brüss (1997, p. 119 f.) argues against an obligation to link micro- and macro-perspectives in evaluations. If the programme budget constitutes only a tiny fraction of government spending, it is almost impossible to measure any macroeconomic effects of the programme. Furthermore, evaluations of labour-market programmes show that their influence on employment is outweighed by other factors, such as general business trends.


(92) See more detailed discussion under G7/A7.


James and Roffe (2000, p. 13) point out that problems can arise between evaluators and commissioners if the latter have specific methods in mind. Evaluators must then justify themselves if they opt for less well-known methods, such as focus groups.

The literature shows there has been in-depth discussion of method selection and application for VET evaluations. Methodological issues often influence the choice of a specific evaluation model (Section 2.4).

We recommend referring to the International handbook of labour market policy and evaluation. This manual presents various methods and endorses them for certain investigations, e.g. experimental and non-experimental designs for evaluations of labour-market policy. These approaches are hotly debated in Europe (93).

The goal-oriented evaluation model is designed to avoid the negative effects of one-dimensional impact assessment. These effects could be avoided by more complex analysis, including study of the socioeconomic context, monitoring and impact. Process-based and dialogue-oriented evaluation procedures are preferred (Schmid et al., 1996; Bangel et al., 2000). These authors urge marrying quantitative and qualitative procedures.

Empirical methods and tools should be properly tailored to evaluation purposes and the evaluand. Most authors claim neutrality in terms of the various research models. However, methods and tools dictate structure and are in no way neutral. The DeGEval commentary on the analysis of qualitative and quantitative information (G7/A7) demands that attention be paid to the validity of methods and their limitations. Kaiser (1998, p. 540) calls for elaboration of an evaluation concept and the publication of survey methods and data processing systems. We could go a step further and demand full disclosure of the evaluation models as well as of the methods employed.

6.4.5. G5/A5: valid and reliable information

'The data collection procedures should be chosen and developed and then applied in a way that ensures the reliability and validity of the data with regard to answering the evaluation questions. The technical criteria should be based on the standards of quantitative and qualitative social research.'

Validity and reliability are fundamental prerequisites for empirical investigations. Numerous distinctions exist, e.g. between internal and external validity, content, criterion and construct validity, etc. These originated in quantitative research (e.g. testing procedures).

This is highly relevant for VET evaluations which use quantitative methods such as aptitude tests, personality inventories or standardised achievement tests for evaluation purposes. These gauges of quality are also essential for evaluation models which are based primarily on quantitative procedures. Schmidt (2000, p. 429) therefore advocates including an appropriate control group in labour-market policy evaluations. A convincing programme evaluation would hinge on this. Schmidt claims process analyses or before-and-after comparisons cannot replace this comparison situation. The literature speaks of the 'fundamental evaluation problem', as a counterfactual situation must often be postulated. Non-experimental procedures require educated estimates of what would have happened if trainees had not participated in the measure.
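To make the reasoning behind comparison groups concrete, the following sketch contrasts a naive before-and-after comparison with a simple difference-in-differences estimate, in which the comparison group stands in for the counterfactual development of participants. It is purely illustrative and not taken from the source: the wage figures, group sizes and the use of Python are invented for this example.

```python
# Illustrative sketch only: naive before-after change versus a
# difference-in-differences estimate of a training effect.
# All figures are hypothetical monthly wages in EUR.

def mean(values):
    return sum(values) / len(values)

participants_before = [1800, 1750, 1900, 1820]   # treatment group, before training
participants_after  = [2050, 1980, 2150, 2040]   # treatment group, after training
comparison_before   = [1790, 1760, 1880, 1810]   # comparison group, same periods
comparison_after    = [1900, 1850, 1990, 1900]

# Naive before-and-after comparison: ignores general wage trends
before_after = mean(participants_after) - mean(participants_before)

# Difference-in-differences: subtract the trend observed in the comparison
# group, which approximates what would have happened without the measure
did = before_after - (mean(comparison_after) - mean(comparison_before))

print(f"Before-after change for participants: {before_after:.0f} EUR")
print(f"Difference-in-differences estimate:   {did:.0f} EUR")
```

In this invented example the naive comparison would overstate the training effect, because part of the wage increase also occurs in the comparison group; this is the gap the 'fundamental evaluation problem' refers to.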

Commissioners often have input regarding method selection and choice of appropriate merit criteria, or they have preconceptions of what method to use. We can also assume that method selection depends on the evaluator's speciality. Like Schmidt (2000), leading econometricians demand the construction of quasi-experiments or comparison situations. Evaluators with a teaching background may prefer to gather biographical data and narration from programme participants.

Different evaluation approaches entail differing methodological preferences. One approach adheres to the university research tradition and is often employed by academics. It chiefly relies on quantitative methods and indicators. Seyfried (1998) says this is too far removed from reality. Other approaches favour management methods even for evaluating training programmes. For example, the European Foundation for Quality Management process measures the quality of findings with either (monetary) benchmarks or participant statements. This engenders considerable validity problems (what is being measured: the learning and transfer results of the training, or purely teacher or trainee attitudes?).

(93) Heckman and Smith (1996) or Nobel Lecture, Heckman (2001); see also Schmid (1996).

We recommend considering Cronbach's position (94). He describes evaluation as an art which, as such, differs fundamentally from science. He maintains that each evaluation involves an attempt to supply the commissioner and other interest groups with the maximum useful information for the given situation. Methodological standards, therefore, sometimes play a subordinate role in evaluation research. One would sometimes have to be content with a 'fair research design' and chiefly consider commissioner and stakeholder interests.

Undoubtedly, evaluations constantly have to compromise on methodological quality in the face of tight deadlines and budgets, but failure to reach a certain minimum methodological level should be regarded as substandard and unacceptable.

These two stances were outlined to demonstrate briefly possible interpretations of quality requirements for evaluation methods. They represent a broad spectrum of opinions on (internal and external) validity and reliability and weigh them differently. Stufflebeam (2001) provides an overview of the various evaluation models, which also require different methods.

The standards demand valid information, and this element of validity is not subdivided into internal or external validity. Various evaluation researchers (95) believe that external validity, i.e. the extent to which we can generalise findings, is a research issue and not an evaluation criterion. They say we must draw a clear line. Even if no explicit distinction is made between external and internal validity, the first sentence of G5/A5 suggests that it refers to internal validity, i.e. answering the evaluation questions, which will rarely refer to external validity. However, the second sentence of the standard speaks of merit criteria for quantitative and qualitative social research.

For internal validity, i.e. the danger that observed effects have not actually been caused by the training measures, we must note the following factors: history, maturation, testing, instrumentation, selection, mortality and the Hawthorne effect. Awareness of these potential effects is essential to avoiding them.

One final critical note on this standard: the merit criteria of validity and reliability stem from the quantitative social research tradition. Some authors maintain that these criteria can be transferred, or at least adapted, to qualitative methods (Bortz and Döring, 2002, pp. 327-329). Others propose separate criteria for qualitative methods, such as trustworthiness instead of validity, dependability instead of reliability, and transferability instead of generalisability (Guba and Lincoln, 1989, pp. 233-251).

6.4.6. G6/A6: systematic data review

'The data collected, analysed and presented in the course of the evaluation should be systematically examined for possible errors.'

Collected facts and figures must be checked for accuracy. Pitfalls can occur in any phase of data gathering and evaluation, and infallibility cannot be guaranteed. Wottawa and Thierau (1998) therefore recommend correcting project-related errors by means of organisational measures.

Professional standards dictate plausibility tests following data analysis. Plausibility tests involve identifying improbable data, e.g. by checking minimum/maximum values, compiling ratios (e.g. continuing training expenditure in euro per employee), calculating group averages, etc. Monitoring the homogeneity (obtained from the variance) of the data actually obtained in relation to the overall sample can provide key indicators. In addition, Schmidt (2000) stresses stating all potential sources of error in an evaluation report.
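A minimal sketch of such plausibility checks, assuming invented survey records and an arbitrary spending threshold (neither is taken from the source), might look as follows in Python:

```python
# Illustrative sketch only: flag implausible survey records using simple
# checks of the kind described above (impossible values, spend-per-employee
# ratio, group average as a cross-check). All data are invented.

records = [
    {"company": "A", "employees": 120, "training_spend_eur": 36000,  "hours": 16},
    {"company": "B", "employees": 45,  "training_spend_eur": 900000, "hours": 12},  # suspicious spend
    {"company": "C", "employees": 60,  "training_spend_eur": 15000,  "hours": -4},  # impossible value
]

def flag_implausible(rec, max_spend_per_employee=5000):
    """Return a list of plausibility problems found in one record."""
    problems = []
    if rec["hours"] < 0:
        problems.append("negative training hours")
    spend_per_employee = rec["training_spend_eur"] / rec["employees"]
    if spend_per_employee > max_spend_per_employee:
        problems.append(f"spend per employee {spend_per_employee:.0f} EUR exceeds threshold")
    return problems

for rec in records:
    issues = flag_implausible(rec)
    if issues:
        print(rec["company"], "->", "; ".join(issues))

# Group average as an additional cross-check against the overall sample
avg_hours = sum(r["hours"] for r in records) / len(records)
print(f"Average reported training hours: {avg_hours:.1f}")
```

The thresholds and checks would of course have to be derived from the evaluand and the evaluation questions; the point is simply that such tests can be run systematically rather than ad hoc.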

6.4.7. G7/A7: analysis of qualitative and quantitative information

'Qualitative and quantitative information should be analysed in an appropriate, systematic way, so that the evaluation questions can be effectively answered.'

Elaboration of data evaluation plans before commencing the actual data processing is indispensable. Interpretation can then be tailored to the questions and assumptions which form the basis of the evaluation design. This also encourages targeted and efficient interpretation, which is essential for both quantitative and qualitative approaches.

(94) Cronbach (1982) refers not only to VET but also to educational and social programmes which encompass VET.
(95) Fitz-Gibbon, among others, at the EES conference (EES 2002) expounded the opinion that monitoring external validity could not be the task of an evaluation.

Antoni (1993) writes that a combination of quantitative and qualitative procedures is often appropriate for meeting the situational requirements of evaluation problems in organisational and work psychology in particular. '[...] It is clear that qualitative and quantitative methods should be used in a complementary fashion in order to derive the highest value from research in this area,' observes Barrett (1998, p. 20) in a chapter on the relationship of quantitative and qualitative methods in continuing vocational training. He emphasises the importance of qualitative methods for enhancing quantitative approaches and vice versa. Gaude (1997) also underlines the significance of linking quantitative and qualitative procedures. Only quantitative procedures can demonstrate the effects of further training programmes on employment and income. Qualitative studies, in contrast, are necessary to explain why some programmes are more successful than others. They are also the only means of showing possible ways to improve programmes. Gaude deplores the mutual isolation of the various research disciplines, which means the more qualitative evaluations often lack information on the effects on income and employment, and qualitative evaluations are seldom conducted as part of quantitative evaluations.

6.4.8. G8/A8: justified conclusions

'The conclusions reached in the evaluation should be explicitly justified, so that the audiences can assess them.'

Conclusions condense the data gathered and their interpretation into findings, which may take the form of tenets, for example. This is a separate, essential task which the evaluator must perform. Reports which mainly present data, including diagrams, but which make no effort to draw conclusions from the summarised findings, are unacceptable.

However, it must also be possible to follow the argumentation behind conclusions. Rieper (1997, p. 42) quotes a European expert examination of ex post evaluations which reveals that 'most reports neglected to explain the type of information on which their conclusions were based and how this information had been obtained.' This lack of transparency with regard to the methods applied makes it difficult to assess the credibility and the potential use of the results and conclusions.

Disclosure of specific difficulties, survey methods and forms of data interpretation is particularly important for evaluating pilot projects (Kaiser, 1998, p. 540). Some authors also believe that a summary of the findings at the end of the investigation should be accompanied by derived recommendations. Conclusions and recommendations are closely connected: 'To summarise, we can conclude the following from this measure on the basis of our evaluation [...] We deduce the following recommendations […]' (Reischmann, 2003, p. 255).

The DeGEval standards do not specify that recommendations should be part of the evaluation or the report, as the evaluation model determines whether the evaluator is responsible for providing recommendations as well as drawing conclusions.

6.4.9. G9/A9: meta-evaluation

'The evaluation should be documented and archived appropriately, so that a meta-evaluation can be undertaken.'

Seyfried (1998) comments that very little transparent communication and discussion of methods and findings from VET evaluations can be found in Europe. Meta-evaluations could redress this. Lindley (1996) has developed a model for increasing evaluation transparency for European ESF projects which should also facilitate meta-evaluation. He proposes compiling a European evaluation database, which would list evaluations according to key indicators and record programme characteristics and their varying effects. He also recommends creating spatial typologies. Rieper (1997) also believes that comprehensive meta-evaluations are necessary to improve the quality of European Commission evaluations. The EU is a major commissioner.

Publication of evaluation reports could afford more opportunities to conduct meta-analyses as well as meta-evaluations. Disclosure of findings, as F5/P5 stipulates, should really be categorised under impact of the evaluation results. In contrast, the meta-analysis option is particularly interesting for those not affected, such as researchers, programme developers and government budgeters. This is the only way for evaluation findings to flow into strategic planning and decision-making processes (Fay, 1997, p. 113).



6.5. Proposals for expanding existing standards

Below we list gaps or ambiguities in the DeGEval standards that should be discussed at European level and clarified when further developing and adapting the evaluation standards.

6.5.1. Selection of the evaluation model

Numerous evaluation models are presented in the outline of evaluation standards and detailed in European literature. They differ considerably in their epistemological basis, their identification of values, their focus on specific elements of the evaluated programme (e.g. goals versus process versus effects), and many other issues. The reasons for selecting a particular model and its assumed strengths and limits are seldom discussed when the evaluation contract is processed.

All the well-known sets of standards fail to prescribe explicit disclosure of the selected evaluation model. Such a standard could further clarify the interaction between commissioners and contract recipients. It would also encourage more explicit presentation of evaluation theory and expose it to critical debate. In any case, evaluators should give their grounds for selecting a particular evaluation model (or combination of several models) and review them when the mission is accomplished.

6.5.2. Selection of suitable methods
Choice of method is often, although not always, closely linked to selection of the evaluation model. The DeGEval standards mention method selection frequently. In standard N4/U4, Information scope and selection, method choice focuses on the utility of the information gathered. Standard D1/F1, Appropriate procedures, prioritises minimising inconvenience to the evaluand and the stakeholders in relation to the expected benefit of the evaluation. The explanatory notes to this standard point out that 'the most conclusive methods from a scientific point of view are often unsuitable because they are too laborious or ethically unacceptable in the situation concerned. The evaluation team should clarify advantages and disadvantages and justify the relevance of the chosen procedure.'

Some sources scrutinise methodological aspects. Surprisingly, the DeGEval standards feature no separate standard on investigation design choice and thus method justification. Methods should encourage optimal response to the evaluation questions. The MEANS handbooks contain short guides to selecting various methods at different points in the evaluation (prospective versus retrospective analysis) and for different types of evaluands (e.g. overall programme evaluation versus in-depth evaluation tools) (European Commission, 1999b, Vol. 3, p. 219).

Method selection often involves making and justifying a decision on control groups or other suitable survey designs. The literature frequently insists that various levels, such as micro- and macro-evaluation, must be dovetailed if evaluation is to be meaningful. It also focuses on the problems of selecting and linking quantitative and qualitative survey methods. Here we discern a gap which expanding the existing standards or formulating a new standard could close.

Selecting the right evaluation methodology can be crucial to the success of an evaluation. Many evaluation methods exist, and not every method is suitable for every evaluation purpose; the optimal choice depends on the questions asked and the solutions sought. 'It is important to choose the right methodology for evaluation, and there is a broad range to choose from. The fact that there are so many different approaches in use really reflects the view that no single methodology can be universally applied. The optimum choice depends on the questions and solutions that are sought.' (James and Roffe, 2000, p. 17)

The MEANS handbooks consider method selection in relation to the evaluation team profile and to assessment of the quality of an evaluation bid (European Commission, 1999b, Vol. 1, pp. 82-83). They point out that choosing whether to award a future evaluation contract to a business consultant or university researcher also constitutes a decision for or against a certain approach. They suggest setting a budget, rather than stipulating a method, ideally in invitations to tender, and then selecting the team with the most interesting methodological proposal. The commissioner must then judge bid quality in a subsequent step. The method or methods chosen must be those best suited to answering the prescribed questions.


6.5.3. Explicit reference to evaluation of training programmes

Standard P1 from the JC standards (service orientation support) does not feature in the German and Swiss evaluation standards, as it refers explicitly to training programme evaluation, whereas they are intended to be universally applicable. Standard JC-P1 reads as follows: 'Evaluations should be designed to assist organisations to address and effectively serve the needs of the full range of targeted participants.' It adds that evaluations should also play a supporting role in ensuring that education and training goals are appropriate and that sufficient attention is paid to learner development, that promised services are rendered and that non-beneficial or even harmful programmes are abandoned. In this way evaluations should contribute towards making projects accountable to stakeholders and society. Evaluations in the VET sector should basically be designed to serve the interests of current or future learners.

Evaluators, commissioners and politicians must look beyond the short and medium-term interests of programme organisers and sponsoring organisations and also focus on the development of the educational system and its interaction with society.

An additional VET standard corresponding to JC-P1 would be feasible. It could be based on the guidelines to this JC standard, which include the following (96):
(a) 'evaluations should be planned which foster the quality of programmes for education, initial and continuing training';
(b) 'evaluations should serve to identify intended and unintended effects of the programme on the learners';
(c) 'teaching and learning processes should be disrupted as little as possible, but an effort should be made to realise the evaluation project.'

JC-P1 content and comments could be highly relevant to VET evaluations in Europe.


(96) Guidelines A, D and H.


7. Summary and outlook

7.1. Objectives, questions and method of the study

The objective of the report is to examine the transferability of evaluation standards in the European VET context. The following initial questions are considered:

Does the terminology of the standards match concepts in the area of European initial and continuing vocational training? Are any standards not applicable in the context of initial and continuing vocational training? Do European evaluation experts understand and accept the key concepts conveyed (e.g. definition of evaluation, differentiation between formative and summative evaluation, purpose of evaluation, etc.)? Are there any specific national differences which should be considered in defining standards? The standards of the DeGEval (2002) form a reference point for the analysis. Other relevant standards are presented and reflections on intercultural transferability and applicability to the VET evaluand are made. The opinions of experts are also included in the discussion: first, through documented events on the standards attended by vocational training experts in Germany and Austria; second, through a questionnaire sent to evaluation experts in widely divergent European countries. Finally, the DeGEval standards are also debated in commentaries and in the formulation of criteria in recent European literature on VET evaluations.

7.2. Results and conclusions

7.2.1. Standards for programme evaluation
The background, evolution and constitution of DeGEval's evaluation standards are described in some detail. The DeGEval standards draw heavily on the constitution and content of the most widely known standards of the JC. In 1981 the committee first published the Standards for evaluation of educational programs, projects and materials (JC, 1981) and issued a revised edition, The program evaluation standards, in 1994. The DeGEval standards consist of standards for evaluation assigned to four different groups with explanatory notes and annexes (DeGEval, 2002), the English translation of which is attached to this report. Evaluations should thus demonstrate the following four basic attributes: utility, feasibility, propriety and accuracy. It is presumed that an evaluation will simultaneously take account of all four criteria to fulfil specialist and professional requirements. The 25 standards are divided into these four categories.

7.2.2. Transferability of standards
To summarise, we can say that there are no discrepancies or contradictions between the various European evaluation standards presented here. Their respective evolution, emphases and differentiation bear witness to different approaches. The development of standards in some European countries, such as Germany and Switzerland, draws on US evaluation standards in their constitution and contents. Other countries, such as France and Finland, attempt to create or adopt their own. In France, for example, the aspect of social utility is more intensely discussed, while in the UK attention is primarily given to the coordination processes between the various interest groups. Differences are also apparent in how the standards are presented. Some sets of standards are formulated in more general terms, such as the Finnish ethical guideline standards, while others are very concrete and prescriptive, such as those governing the readability of reports. Many of the standards considered contain the principal components of the DeGEval standards. Equally, the European Commission's MEANS criteria and the guidelines of the International Labour Office (ILO) show some similarities with the DeGEval standards or their American bases.

The original US standards were initially used in education and were then applied to programme evaluations in other policy areas, which in turn influenced their content. Since the standards originated in education and are supposed to be applicable to all policy areas, the initial presumption is that they are also valid in VET. Furthermore, seven illustrative examples from the JC standards are drawn from in-company vocational training and are therefore directly applicable to VET.

7.2.3. Results from group discussion on the applicability of DeGEval standards to vocational training

In general, the standards for evaluation have been well received by the evaluation and/or VET experts who participated in the investigations within the framework of the discussion. Neither the German nor the Austrian experts have any reservations as to the applicability of evaluation standards to the field of vocational and continuing training. No doubts were expressed as to the transferability of the standards to vocational training as an evaluand with specific institutional arrangements (for example, the dual system of vocational training in Germany). The experts do not propose a specific adaptation, although they would like to see certain standards illustrated by examples from initial and continuing vocational training.

In line with the publications of the Joint Committee, there is a call to have the standards complemented by extensive explanatory notes with guidelines and practical examples from VET. It would be a great advantage for these to include explanations of concepts such as the difference between stakeholders, addressees and other users. Furthermore, workshop participants, particularly academics from universities and public and independent research institutes, have drawn attention to the stark conflict between the utility and accuracy groups of standards. Expectations of immediately utilisable results and the demands of empirical social and economic research for such qualities as the validity of instruments and the reliability of data compilation are often in direct competition.

Some participants, who have been working for years within a particular discipline or theoretical tradition, have expressed initial concerns that their methodology might not be adequately covered by the evaluation standards. A general desire for maximum standards has been accompanied by ambivalence to them. On the one hand, participants appreciate the advantage that maximum standards can refer to many different approaches and types of evaluations and impact investigations. On the other hand, representatives of certain schools of thought bemoan the lack of obligatory minimum standards. Furthermore, participants want more emphasis on the fact that the standards do not prescribe a given evaluation approach. The explanatory notes to the standards already state that there are 'numerous different approaches to professional evaluation' and that these contrast starkly depending on epistemological approach, discipline and professional ethics. The pluralistic foundation of the standards is, however, sometimes not immediately clear to experts the first time they read the text. They often worry that the standards will have a restrictive effect on the approach they advocate, or even exclude it entirely.

One important point for further research should be the question of whether other European countries are aware of self-evaluation approaches to VET evaluation or whether they have their own interpretation of external or internal independent evaluation.

7.2.4. Survey of evaluation experts in Europe
The experts interviewed generally have a positive attitude towards standards for evaluation. None of the respondents feel that standards do not matter or are unnecessary or even harmful, and they express a preference for maximum standards.

The best-known sets of standards are the US Joint Committee standards for evaluation and the American guidelines for evaluators. The vast majority of the respondents named at least one set of standards with which they are familiar.

The main benefits of evaluation standards named in critical discussion are improvement in the quality of evaluations and the opportunity to make evaluators' work more legitimate and transparent. However, respondents fear that utilisation of the standards could restrict the plurality and flexibility of evaluations in theory and in practice, or that standards could be applied too rigidly. The majority of respondents mention no preferable alternative to the standards.

Respondents named involvement of all stakeholders, transparency and use of a wide variety of suitable methods as the most important hallmarks of evaluation standards.


7.2.5. Reflections on VET evaluation standards literature

Not all standards are equally applicable to every evaluation project. This also goes for VET evaluations, of course. Nevertheless, the validity of each individual DeGEval standard has been confirmed in various VET contexts. The literature studied contains concrete quality requirements, advice and guidance on using evaluations which resemble the individual DeGEval standards. We can thus illustrate the individual standards in terms of the VET evaluand and its characteristics. Moreover, we can refine some of the standards further.

The text, therefore, features a brief introduction to each group of standards, followed by a commentary on individual standards. This commentary is partly illustrative and descriptive, and partly more reflective, depending on the conflict potential which each standard contains for VET. At the same time, it is noteworthy that the utility, feasibility and propriety standard groups yielded many more points of reference for VET evaluation than do those of accuracy, which repeatedly formulate universal demands on empirical investigations.

The polarity between the groups of standards relating to accuracy and utility proves to be the cause of a lasting and irrevocable clash. On occasions, expectations of immediately utilisable results and the demands of empirical social and economic research for such qualities as validity and reliability are scarcely reconcilable, so that either one or the other must make sacrifices.

7.3. Outlook

The following contains proposed tenets for the utilisation and elaboration of evaluation standards in European initial and continuing training.

European and national organisations working in VET should determine a set of standards by a given deadline to provide orientation and guidelines for professional VET evaluations they commission.

The selection and prescription of such a set of standards should be undertaken through dialogue between European evaluation societies and supplemented by academic specialist and professional associations operating particularly in the field of VET. It may be advisable to allow associations, in particular the European Evaluation Society, to take the initiative.

The text accompanying the evaluation standards should emphasise that, in the light of national and disciplinary peculiarities, evaluation theory and practice evidence divergent traditions and models, and that appropriate adaptation of standards is possible and desirable.

The duty of evaluation to the 'common good' may be addressed in the context of national evaluation traditions, but should be set at a European level in cases where broader consensus exists.

The application of evaluation standards throughout Europe demands a suitable degree of intercultural competence on the part of evaluators and evaluation commissioners, which is to be promoted by appropriate training and processes of systematic reappraisal (e.g. in European evaluation journals and international congresses).

Standards are to be maximum standards. The description should be as plain as possible and cite the ideal that an evaluation is to strive towards in the respective categories for it to be judged high quality. Such maximum standards offer clear orientation, but also leave sufficient scope for flexibility and national and local adaptation.

Attention is to be drawn to the inappropriateness of rigid application of evaluation standards and to continuing incompatibility between individual standards, particularly between those of accuracy and utility.

Publications on evaluation standards in Europe are to contain key definitions for concepts such as evaluation, evaluation model, evaluation purpose, evaluation questions, formative evaluation, process evaluation, etc. A multilingual glossary could improve cooperation between evaluators working within European programmes and policy-making.

Sets of standards are to contain evaluation standards that generally apply to evaluation and therefore also to VET evaluation. Accompanying material must be made available for VET. This should offer illustrative examples from conducted VET evaluations for as many relevant system levels as possible (i.e. self-study, companies/schools, associations of learning locations, communities and regions, national and pan-European VET programmes). This is essential to demonstrate the validity of evaluation standards to all VET system levels and to minimise existing reservations among professionals who are unfamiliar with standards.

Because of the present gap in research, meta-evaluations are to review systematically whether evaluation standards are applicable and appropriate to VET evaluations of national programmes or to EU policies, and which additions are necessary, particularly to supplementary materials, to steer and evaluate evaluations.

A separate, general evaluation standard should be formulated which calls on those responsible for evaluations to explain the model or models used for a given evaluation and to justify its/their suitability to the evaluation in question. Such a call for disclosure and justification might support the propagation of evaluation models, the discussion of their strengths and weaknesses and the culture of meta-evaluation.

Formulation of an additional VET-specific standard is proposed. This standard, Quality orientation support in vocational training, could read as follows: 'Evaluations should assist VET policy-makers and programme managers to meet quality requirements within the vocational training sector (VET standards). These particularly include standards which require evaluations to consider the needs of target groups, social partners and society, have a scientifically founded theoretical and teaching concept, help shape the structure and organisation of political education and help manage educational processes and ensure the profitability of VET activities.' The explanatory notes on this standard should mention well-known, recognised VET standards and point the way to the most important sources.

Since teaching personnel in VET in particular repeatedly raise questions about the tool of self-evaluation, evaluation standards should specify that they are primarily relevant to internal and external investigations undertaken piecemeal by specialists. Self-evaluation in the field of education should employ a set of standards oriented towards the general standards and adapted to those ends.

As the professionalisation of evaluation is still fairly recent in the majority of Member States, further investigations, founded on broad empirical data, are particularly necessary to clarify matters bearing on the compatibility of culturally sensitive individual standards and the role of self-evaluation in VET. We consider workshops and conferences related to data collection, as used in these studies, especially useful to these ends.

This report appraises standards for programme evaluation. In the future, further evaluation standards, such as the Personnel evaluation standards and the Student evaluation standards, are to be analysed in relation to VET, and their transferability across Europe and their intercompatibility examined.


List of abbreviations

Abbreviations of individual standards used in the text
Nn/Un (Nützlichkeit/Utility), Dn/Fn (Durchführbarkeit/Feasibility), Fn/Pn (Fairness/Propriety) and Gn/An (Genauigkeit/Accuracy) refer to the DeGEval standards (2001). The capital letter to the left of the slash signifies the German original designation of the group of standards, the capital letter to the right of the slash the English translation.

The Arabic numeral to the right of each capital letter indicates the individual standard in the order in which it is listed in the appropriate group of standards (e.g. N2/U2, Klärung der Evaluationszwecke/Clarification of the purposes of the evaluation).

A standard from Group U, F, P or A of the Joint Committee standards (1994) is addressed by prefixing 'JC', e.g. JC-A7 – Systematic Information.

BIBB German Federal Institute for Vocational Training

BifEb Austrian Federal Institute for Adult Education

DeGEval Deutsche Gesellschaft für Evaluation [German Evaluation Society]

EES European Evaluation Society

ISO International Organisation for Standardisation

JC Joint Committee on Standards for Educational Evaluation

SEVAL Schweizerische Evaluationsgesellschaft [Swiss Evaluation Society]

SFE Société Française de l’Évaluation [French Evaluation Society]


Annex 1: transformation table

Deutsche Gesellschaft für Evaluation | Joint Committee on Standards (US)

U1 Stakeholder identification | U1 Stakeholder identification
U2 Clarification of the purposes of the evaluation | missing
U3 Evaluator credibility and competence | U2 Evaluator credibility
U4 Information scope and selection | U3 Information scope and selection
U5 Transparency of values | U4 Values identification
U6 Report comprehensiveness and clarity | U5 Report clarity
U7 Evaluation timeliness | U6 Report timeliness and dissemination
U8 Evaluation utilisation and use | U7 Evaluation impact
F1 Appropriate procedures | F1 Practical procedures
F2 Diplomatic conduct | F2 Political viability
F3 Evaluation efficiency | F3 Cost effectiveness
inapplicable | P1 Service orientation
P1 Formal agreements | P2 Formal agreements
P2 Protection of individual rights | P3 Rights of human subjects; P4 Human interactions
P3 Complete and fair investigation | P5 Complete and fair assessment
P5 Disclosure of findings | P6 Disclosure of findings
in P4 Unbiased conduct and reporting | P7 Conflict of interest
in F3 Evaluation efficiency | P8 Fiscal responsibility
A1 Description of the evaluand | A1 Program documentation
A2 Context analysis | A2 Context analysis
A3 Described purposes and procedures | A3 Described purposes and procedures
A4 Disclosure of information sources | A4 Defensible information sources
A5 Valid and reliable information | A5 Valid information; A6 Reliable information
A6 Systematic data review | A7 Systematic information
A7 Analysis of qualitative and quantitative information | A8 Analysis of quantitative information; A9 Analysis of qualitative information
A8 Justified conclusions | A10 Justified conclusions
in P4 Unbiased conduct and reporting | A11 Impartial reporting
A9 Meta-evaluation | A12 Meta-evaluation


Annex 2: questionnaire

Quality requirements for evaluations in vocational education and training (VET)

Dear …,
we would like to invite you to participate in a pilot study on quality in evaluation. Please answer our short questionnaire. We would need it back at the latest by ……….

We contact you as one of about 30 experts in evaluation and/or VET from all European countries. We have got your name and address from ……, who recommended that we contact you.

Aim of the survey
We should appreciate your answer to the question whether or not VET evaluations in Europe need a professional codified framework for securing and enhancing the quality of evaluation practice. We also would like to ask for your advice: which values and demands should be considered in such a framework?

This study is commissioned by the European Centre for the Development of Vocational Training (Cedefop), an agency of the EU. The results of this study will be included in the third Cedefop research report entitled Research on evaluation and impact of vocational education and training, which will be published in 2004. The title of our paper will be Ethical and normative standards for evaluation practices.

This is a pilot study!
We have just started this pilot study to bring more clarity into an emerging field: the evaluation of VET measures and programmes in European countries. VET evaluation as a theme across the European countries is just in its prime and we consider an open dialogue necessary to promote it! You can read more about the background of this pilot study in the attached description.

What is your investment? How to send back your answers?
It takes around 10 minutes to fill in the following questionnaire. If you are very short of time, please answer all closed questions and skip one or another open-ended question which may not be so important for you. If you want to comment on some questions in more detail we would appreciate it.

Please print out the document, fill it in by hand and fax it back to us (+49 221 4248072). If you would like the document by fax please let us know (+49 221 4248071).

What do we offer for your active participation?
We will compile a documentation and summary of the answers to this questionnaire and will send this material and our theses/conclusions to you in autumn this year.

We would appreciate your feedback on our conclusions but this is really voluntary!
At the end of 2002, we will send you an electronic version of the survey report and ask you whether or not you want to be mentioned as a participant of the pilot study. You can find the following files as attachments:

(a) our questionnaire as a Word file;
(b) our questionnaire as a pdf file;
(c) a short description of the pilot study.

Thank you in advance for your kind cooperation.

Wolfgang Beywl Sandra Speer

Univation – Institute for Evaluation
Zuelpicher Str. 58
D-50674 Koeln
Tel: +49 221 424 8071
Fax: +49 221 424 8072


Quality requirements for evaluations in vocational education and training (VET)

About this study
This study is commissioned by the European Centre for the Development of Vocational Training (Cedefop), an agency of the EU. The results of this study will be included in the third Cedefop research report entitled Research on evaluation and impact of vocational education and training, which will be published in 2004. The title of our paper will be Ethical and normative standards for evaluation practices. In this study we will discuss important issues of applying standards to evaluations of VET measures and programmes.

What do we offer for your active participation?
We will compile a documentation and summary of the answers to this questionnaire and will send this material and our theses/conclusions to you in autumn this year. At the end of 2002 we will send you an electronic version of the survey report and ask you whether or not you want to be mentioned as a participant of the pilot study in this document.

What about confidentiality?
We will ensure full confidentiality of all information you give us and handle it anonymously. In the documentation we will only mention the country the respondent refers to (see question 3), and no names. After we have finalised the final report, we will send it to you and ask whether or not you want to be included in the expert list which will be added to the report. By doing so we want to enable you to express your considerations and arguments plus your emerging ideas and issues in an open manner.

Any questions/reservations?
Please do not hesitate to contact us by e-mail ([email protected]) or phone (+49 221 424 8071). We will answer your questions immediately and phone back if you wish (in this case please attach your phone number).

Questionnaire (please mark the box belonging to the fitting answer ⌧)

1) Your primary position in/to evaluation (choose one alternative).
☐ Client/sponsor/commissioner
☐ Evaluator
☐ Programme director/programme staff
☐ Other: ……………………………………… (Please specify)

2) What is your main professional background? (choose one alternative)
☐ Economics
☐ Social and political sciences
☐ Natural sciences
☐ Liberal arts incl. pedagogics
☐ Engineering
☐ Other: ……………………………………… (Please specify)

3) The national professional culture you mostly identify with (this might be the country where you have been educated/studied, the country you work in normally/at present, or it might or might not be the nationality in your passport).
International country code: ………………………………………


4) What is your relation to vocational education and training (VET)? (choose one alternative)
☐ VET is my main/most relevant working field
☐ VET is one of my most relevant working fields
☐ VET is a known field for me but I am (nearly) not active in VET

5) What are, in the nearer and distant future, the strongest competitors of evaluation in VET in the country you mainly work in (if you work on an international level please answer the question for the EU and its Member States)? (Please mark the best fitting category in each row)

(columns: Very strong / Strong / Weak / Very weak/not existing)
Auditing ☐ ☐ ☐ ☐
Benchmarking ☐ ☐ ☐ ☐
Certification/accreditation ☐ ☐ ☐ ☐
Monitoring ☐ ☐ ☐ ☐
Performance/results based management ☐ ☐ ☐ ☐
Quality management/assurance ☐ ☐ ☐ ☐
State supervision ☐ ☐ ☐ ☐
Other: ………………………….. ☐ ☐ ☐ ☐
Other: ………………………….. ☐ ☐ ☐ ☐

6) If you would describe your general position to standards for evaluation, which of the following statements would mostly express your opinion? (Choose one alternative)
☐ Standards for evaluation are not necessary or even detrimental
☐ Standards for evaluation do not matter
☐ Standards for evaluation could be useful; but I am not convinced whether they will be in fact
☐ Standards for evaluation are important
☐ Standards for evaluation are absolutely necessary

7) Preferred type of standards (If you answered 'unnecessary' or 'do not matter' in question 6, skip this question)

There are two distinct concepts of standards:

Minimum standards (as in engineering or work security): they describe features of evaluation in a very precise, operational way; if one or more standards are not fulfilled, the evaluation will be judged as 'poor' or 'non-professional'.

Maximum standards (as in education or consulting) are standards one should strive for; it should be clearly justified if one or more standards are not taken into account in evaluation practice. Some standards not considered within an evaluation would not automatically lead to a negative judgement of the evaluation as a whole.

Which kind of standards do you prefer for evaluation? (Choose one alternative)

Minimum standards   ☐ Strongly prefer   ☐ Prefer   ☐ Cannot decide   ☐ Prefer   ☐ Strongly prefer   Maximum standards


8) How familiar are you with the following sets of standards/guidelines for evaluation? (Please mark one alternative within every row)

(columns: Very familiar / Quite familiar / Know a little bit / Do not know)
1 Joint Committee standards for evaluation (USA 1994) ☐ ☐ ☐ ☐
  English: http://www.eval.org/EvaluationDocuments/progeval.html
2 Guidelines for evaluators (American Evaluation Association, 1994) ☐ ☐ ☐ ☐
  English: http://www.eval.org/EvaluationDocuments/aeaprin6.html
3 The MEANS collection, European Communities, Directorate General XVI. Luxembourg, 1999. (Not available on the Internet) ☐ ☐ ☐ ☐
4 Swiss evaluation society (2001) ☐ ☐ ☐ ☐
  German: http://www.seval.ch/deutsch/stad/stad1.htm
  French: http://www.seval.ch/franz/staf/staf1.html
5 German evaluation society (2001) ☐ ☐ ☐ ☐
  German: http://www.degeval.de/standards/standards.htm
  English: http://www.degeval.de/standards/Standards_engl.pdf
6 Best practice guidelines for evaluation of the OECD ☐ ☐ ☐ ☐
  English and French: http://www.oecd.org/home [search]
7 Other: ..........……………………………………… ☐ ☐ ☐ ☐
8 Other: ..........……………………………………… ☐ ☐ ☐ ☐

9) Pro's for standards for VET evaluation.
Please write down some arguments (if any) which call for a more intensive use of standards for evaluation in VET evaluation.
………………………………………………………………………………………………
………………………………………………………………………………………………

10) Con's against standards of VET evaluation.
Please write down some arguments (if any) which speak against a more intensive use of standards for evaluation in VET evaluation.
………………………………………………………………………………………………
………………………………………………………………………………………………

11) Are there essential omissions?
Please indicate essential omissions (what lacks?) in the standard set(s) you know which should be supplemented for VET evaluations, or indicate important demands/aspects a set of standards should include.
………………………………………………………………………………………………
………………………………………………………………………………………………

12) Better alternative?
Is there some tool/regulation which suits better than standards for evaluation to enhance/secure quality of VET evaluations?
………………………………………………………………………………………………
………………………………………………………………………………………………

13) Basic values which should be included in regulations for VET evaluations.
We are looking for basic values you would associate with 'good' VET evaluation. Please name one to five attributes which are essential for VET evaluation quality.
………………………………………………………………………………………………
………………………………………………………………………………………………
………………………………………………………………………………………………


14) Address for contact
Please state your name, phone number and e-mail address, so that we can contact you, if you have any question.
………………………………………………………………………………………………
………………………………………………………………………………………………

15) A second respondent you propose
Maybe you have an idea to whom else from your country or elsewhere the questionnaire should be sent. If you like, please state his/her name and e-mail address.
………………………………………………………………………………………………
………………………………………………………………………………………………

We would like to thank you for your kind cooperation.

Wolfgang Beywl Sandra Speer


Annex 3: list of experts answering the e-mail survey

(12 out of 19 persons agreed to publish their names and addresses)

Name, surname; function in VET/evaluation | Institution, city, country, e-mail | Other functions in VET/evaluation

Barbier, Jean-Claude; Research Director | Centre d'Études de l'Emploi, Noisy-le-Grand – France; [email protected] | Evaluation of public policies

Bjørnkilde, Thomas; Manager | PLS Rambøll Management A/S, Copenhagen – Denmark; [email protected] |

Field, Jane; Consultant, specialising in evaluation and LLL | Education and Development, Whitehead, Co Antrim, Northern Ireland; [email protected] | Author of Evaluating Community Development Projects; NIACE, March 2003

Franz, Hans-Werner; Researcher, consultant, manager | Sozialforschungsstelle Dortmund, Landesinstitut, Labour-related research and advice, Dortmund; [email protected] | VET, CVET, TQM, EFQM; several books and articles on the subject

Hartkamp, Jannes; Researcher | DESAN Research Solutions, Amsterdam – The Netherlands; [email protected] | Main fields: VET, transition from education to work, metadata standards

Kirsch, Jean-Louis; Researcher | Centre d'études et de recherches sur les qualifications (Centre for research on education, training and employment), Marseille; [email protected] | Statistics, accompaniment of actions in the field of training and employment

Nicaise, Ides | HIVA (Higher Institute for Labour Studies) and Dept of Education, University of Leuven; [email protected] |

Nurmi, Johanna; Senior Adviser | Finnish Ministry of Finance, Public Management Department, Valtioneuvosto; [email protected] | Secretary of the Finnish Evaluation Society (FES)

Rouland, Olivier; Administrator | DG Budget – Evaluation Unite, Brussels – Belgium; [email protected] |

Schiefer, Ulrich | ISCTE – Higher Institute for Labour and Business Studies, Lisbon; [email protected] | Board member of the European Evaluation Society

Smid, Gerhard; Programme manager | Interuniversity Centre for Development in Organisation and Change Management, Utrecht – The Netherlands; [email protected] |

Vedung, Evert; Evaluation teacher | Uppsala University, Institute for Housing and Urban Research – IBF, Gävle – Sweden; Department of Government, Uppsala; [email protected] |


References

Antoni, C. H. Evaluationsforschung in der Arbeits- und Organisationspsychologie. In: Bungart, W.; Herrmann, T. (eds) Arbeits- und Organisationspsychologie im Spannungsfeld zwischen Grundlagenorientierung und Anwendung. Bern et al.: Huber, 1993, p. 309-337.

Auer, P.; Kruppe, Th. Monitoring of labour market policy in EU Member States. In: Schmid, G. et al. (eds) International handbook of labour market policy and evaluation. Cheltenham: Edward Elgar, 1996, p. 899-922.

Bangel, B. et al. Arbeitsmarktpolitik. In: Stockmann, R. (ed.) Evaluationsforschung, Sozialwissenschaftliche Evaluationsforschung 1. Opladen: Leske und Budrich, 2000, p. 309-341.

Barrett, A. Methodological introduction. In: Elson-Rogers, S. (ed.) Approaches and obstacles to the evaluation of investment in continuing vocational training: discussion and case studies from six Member States of the European Union. Luxembourg: Office for Official Publications of the European Communities, 1998, p. 18-24 (Cedefop Panorama Series, 5078).

Beywl, W. Zur Weiterentwicklung der Evaluationsmethodologie. Frankfurt: Lang, 1988.

Beywl, W.; Schobert, B. Evaluation – Controlling – Qualitätsmanagement in der betrieblichen Weiterbildung: kommentierte Auswahlbibliographie. Bielefeld: Bertelsmann, 2000.

Beywl, W.; Taut, S. Standards: Aktuelle Strategie zur Qualitätsentwicklung in der Evaluation. DIW Vierteljahresheft, 2000, No 3, p. 358-370.

Beywl, W.; Widmer, Th. Die 'Standards' im Vergleich mit weiteren Regelwerken zur Qualität fachlicher Leistungserstellung. In: Sanders, J. R. (ed.) Handbuch der Evaluationsstandards. Opladen: Leske und Budrich, 2000, p. 259-295.

Beywl, W.; Speer, S.; Kehr, J. Wirkungsorientierte Evaluation in der Armuts- und Reichtumsberichterstattung – Eine Perspektivstudie. Bonn: BMGS (ed.), 2003 (in print).

Bezzi, C. Claudio Bezzi's statements about evaluation standards. In: The 2002 EES Conference – Three movements in contemporary evaluation: learning, theory and evidence. Sevilla, 10-12 October 2002 (Evidence – Roundtable EV SR 4: Do we need European evaluation standards?).

BIBB – Bundesinstitut für Berufsbildung et al. (eds) A European comparison of controlling in corporate continuing training. Bielefeld: Bertelsmann, 2001.

Björklund, A.; Regnér, H. Experimental evaluation of European labour market policy. In: Schmid, G. et al. (eds) International handbook of labour market policy and evaluation. Cheltenham: Edward Elgar, 1996.

Bortz, J.; Döring, N. Forschungsmethoden und Evaluation für Human- und Sozialwissenschaftler (Third edition). Heidelberg: Springer, 2002.

Bosewitz, R.; Kleinschroth, R. Getting through at meetings. Business English für Konferenzen und Präsentationen. Reinbek: Rowohlt, 1997.

Brüss, K. Approaches to evaluate the effects of labour market policies in Germany, and in particular those co-financed by the European Social Fund (ESF). In: Evaluation of European training, employment and human resource programmes. Luxembourg: Office for Official Publications of the European Communities, 1997, p. 115-123 (Cedefop Panorama, 5062).

Bustelo Ruesta, M. Deontología de la evaluación: el modelo de los códigos éticos anglosajones. Gestión y Análisis de Política Pública, 1998, No 11-12, p. 141-156.

Butz, M. Evaluationsverfahren in der betrieblichen Weiterbildung im IT- und TK-Bereich. In: Heinrich, L. J.; Häntschel, I. (eds) Evaluation und Evaluationsforschung in der Wirtschaftsinformatik. Munich; Vienna: Oldenbourg, 2000, p. 423-438.

Cedefop. Ethical and normative standards for evaluation practices. Thessaloniki, 2001 (Project 0730 [MT/PDE RR3-13]).

Chen, H. Theory-driven evaluation. London: Sage, 1990.

Cronbach, L. J. Designing evaluations of educational and social programs. San Francisco: Jossey-Bass, 1982.

Danida – Danish Agency for Development and Aid. Evaluation Guidelines – February 1999 (2nd edition, revised 2001). Available from Internet: http://www.um.dk/danida/evalueringsrapporter/eval-gui/index.asp [cited 24.10.2003].

DeGEval – Deutsche Gesellschaft für Evaluation (ed.) Standards für Evaluation. Cologne: Eigenverlag, 2002.

EES – European Evaluation Society. The 2002 EES Conference – Three movements in contemporary evaluation: learning, theory and evidence. Sevilla, 10-12 October 2002. Available from Internet: http://www.europeanevaluation.org/general/ees_conferences.htm [cited 13.10.2003].

ETF – European Training Foundation. Development of standards in vocational education and training. Luxembourg: Office for Official Publications of the European Communities, 1999, Vol. 1.

European Commission. Evaluating EU expenditure programmes: a guide. Ex post and intermediate evaluation. Directorate-General XIX-Budgets, January 1997. Available from Internet: http://europa.eu.int/comm/budget/evaluation/pdf/guide_en.pdf [cited 24.10.2003].

European Commission. Guidelines for systems of monitoring and evaluation of ESF assistance in the period 2000-2006. Luxembourg: Office for Official Publications of the European Communities, 1999a. Available from Internet: http://europa.eu.int/comm/employment_social/esf2000/guidelines/evaluation/en.pdf [cited 24.10.2003].

European Commission. The MEANS Collection. Luxembourg: Office for Official Publications of the European Communities, 1999b, Vol. 1-6.

European Commission. DG for Education and Culture and DG for Employment and Social Affairs. Making a European area of lifelong learning. Luxembourg: Office for Official Publications of the European Communities, 2002.

Fay, R. G. What can we learn from evaluations of active labour market policies undertaken in OECD countries? The case of training. In: Evaluation of European training, employment and human resource programmes. Luxembourg: Office for Official Publications of the European Communities, 1997, p. 101-113 (Cedefop Panorama, 5062).

Fetterman, D. M. Foundations of empowerment evaluation. Thousand Oaks: Sage Publications, 2000.

Field, J. Promoting added value through the evaluation of training (PAVE) Handbook. University of Plymouth, 1998a.

Field, J. Promoting added value through the evaluation of training (PAVE) Evaluation Resource Pack. University of Plymouth, 1998b.

Field, J. Promoting added value through the evaluation of training – PAVE. In: Künzel, K. (ed.) Evaluation der Weiterbildung. Internationales Jahrbuch der Erwachsenenbildung. Cologne: Böhlau, 1999, Vol. 27, p. 215-229.

FES – Finnish Evaluation Society, 2002. Available from Internet: http://www.finnishevaluationsociety.net/index_en.php [cited 24.10.2003].

Finné, S. Guidelines for the intermediate monitoring and evaluation of Structural Fund operations. In: Evaluation of European training, employment and human resource programmes. Luxembourg: Office for Official Publications of the European Communities, 1997, p. 47-50 (Cedefop Panorama, 5062).

Gaude, J. Evaluating public training and employment programmes. In: Evaluation of European training, employment and human resource programmes. Luxembourg: Office for Official Publications of the European Communities, 1997, p. 51-58 (Cedefop Panorama, 5062).

Gontzou, C. Methodological aspects of European training and employment programmes. In: Evaluation of European training, employment and human resource programmes. Luxembourg: Office for Official Publications of the European Communities, 1997, p. 61-62 (Cedefop Panorama, 5062).

Grubb, N. W.; Ryan, P. The roles of evaluation for vocational education and training. Geneva: International Labour Office, 1999.

Guba, E.; Lincoln, Y. Fourth generation evaluation. London: Sage, 1989.

Gullickson, A. R. The student evaluation standards: how to improve evaluations of students. Thousand Oaks: Corwin Press, 2002.

Hägele, H. Experteninterviews in der öffentlichen Verwaltung: ausgewählte praktische Probleme. In: Brinkmann, Ch. et al. (eds) Experteninterviews in der Arbeitsmarktforschung, Beiträge zur Arbeitsmarkt- und Berufsforschung, 1995, No 191, p. 69-72.

Hale, J. Performance-based evaluation: tools and techniques to measure the impact of training. San Francisco: Jossey-Bass, 2002.


Haller, S. Beurteilung von Dienstleistungsqualität. Dynamische Betrachtung des Qualitätsurteils im Weiterbildungsbereich. Wiesbaden: 1998.

Heckman, J. J. Micro data, heterogeneity and the evaluation of public policy: Nobel Lecture. Journal of Political Economy, 2001, Vol. 109, No 4, p. 673-748.

Heckman, J. J.; Smith, J. A. Experimental and nonexperimental evaluation. In: Schmid, G. et al. (eds) International handbook of labour market policy and evaluation. Cheltenham: Edward Elgar, 1996, p. 37-87.

Hendricks, M.; Conner, R. F. International perspectives on the guiding principles. In: Shadish, W. R. et al. (eds) Guiding principles for evaluators. New directions for program evaluation. San Francisco: Jossey-Bass, Summer 1995, No 66, p. 77-90.

Hofstede, G. Culture's consequences: international differences in work-related values. London et al.: Sage Publications, 1980.

House, E. R.; Howe, K. R. Deliberative democratic evaluation. New Directions for Evaluation, 2000, No 85, p. 3-12.

Hujer, R. et al. Evaluation aktiver Arbeitsmarktpolitik – Probleme und Perspektiven. MittAB, 2000, No 3, p. 341-344.

ILO – International Labour Office. Guidelines for the preparation of independent evaluations of ILO programmes and projects. Last update 15 November 1999. Available from Internet: http://www.ilo.org/public/english/bureau/program/guides/indpen/index.htm [cited 3.11.2003].

Jacob, St.; Varone, F. L'évaluation au concret en Belgique: méta-évaluation au niveau fédéral. Deuxième rapport intermédiaire du projet de recherche AM/10/016 par les SSTC, provisional version, May 2002.

James, C.; Roffe, I. The evaluation of goal and goal-free training innovation. Journal of European Industrial Training, 2000, Vol. 24, No 1, p. 12-20.

Jang, S. The appropriateness of Joint Committee standards in non-western settings: a case study of South Korea. In: Russon, C. (ed.) The program evaluation standards in international settings. Kalamazoo, MI: The Evaluation Center, 2000, p. 41-59 (Occasional Papers Series, May 1).

Jesse, E. Typologie politischer Systeme der Gegenwart. In: Bundeszentrale für politische Bildung (ed.) Grundwissen Politik. Bonn: 1993, p. 165-227.

JC – Joint Committee on Standards for Educational Evaluation (ed.) The program evaluation standards. How to assess evaluations of educational programs. Thousand Oaks: Sage, 1994.

JC – Joint Committee on Standards for Educational Evaluation (ed.) Standards for evaluation of educational programs, projects and materials. New York: JC, 1981.

JC – Joint Committee on Standards for Educational Evaluation (ed.) The personnel evaluation standards. Newbury Park, CA: Sage Publications, 1988.

JC – Joint Committee on Standards for Educational Evaluation (ed.) Handbuch der Evaluationsstandards. Opladen: Leske und Budrich, 2000.


Kaiser, F.-J. Fremdevaluation: Inwieweit sind die Erkenntnisse aus Modellversuchen inhaltlich und methodologisch für die Berufsbildungsforschung verwendbar? In: Euler, D. (ed.) Berufliches Lernen im Wandel – Konsequenzen für die Lernorte? Beiträge zur Arbeitsmarkt- und Berufsforschung, 1998, No 214, p. 537-550.

Kellaghan, T.; Stufflebeam, D. L. (eds) International handbook of educational evaluation. Dordrecht: Kluwer, 2002.

Kirkpatrick, D. L. Evaluating training programs: the four levels. San Francisco, CA: Berrett Koehler, 1994.

Klieme, E. Bildungsstandards als Beitrag zur Qualitätsentwicklung im Schulsystem. DIPF informiert, August 2002, No 3, p. 2-6.

Kluge, F. Etymologisches Wörterbuch der deutschen Sprache. Berlin/New York: de Gruyter, 1999.

Knox, A. B. Evaluation of continuing education in the USA. In: Künzel, K. (ed.) Evaluation der Weiterbildung. Cologne: Böhlau, p. 201-213 (Internationales Jahrbuch der Erwachsenenbildung, Vol. 27).

Kuffner, A. Evaluation von Nachhaltigkeitsaspekten – Nachhaltige Evaluation? Dissertation, University of Vienna, 2000 (unpublished).

Kushner, S. Personalizing evaluation. Thousand Oaks: Sage, 2000.

Leeuw, F. L. Evaluation in Europe. In: Stockmann, R. (ed.) Evaluationsforschung, Sozialwissenschaftliche Evaluationsforschung 1. Opladen: Leske und Budrich, 2000, p. 57-76.


Levin, H. M.; McEwan, P. J. Cost-effectiveness analysis. London: Sage, 2001.

Lindley, R. M. The European Social Fund: a strategy for generic evaluation. In: Schmid, G.; O'Reilly, J.; Schömann, K. (eds) International handbook of labour market policy and evaluation. Cheltenham: Edward Elgar, 1996, p. 843-867.

Luschei, F.; Trube, A. Evaluation und Qualitätsmanagement in der Arbeitsmarktpolitik – Einige systematische Vorüberlegungen und praktische Ansätze zur lokalen Umsetzung. MittAB, 2000, No 3, p. 533-549.

Madaus, G. F.; Stufflebeam, D. L. Educational evaluation: the classical writings of Ralph W. Tyler. Boston: Kluwer, 1988.

Mark, M. M.; Henry, G. T.; Julnes, G. Evaluation: an integrated framework for understanding, guiding, and improving policies and programs. San Francisco: Jossey-Bass, 2000.

Marklund, S. Applicability of standards for evaluations of educational programs, projects and materials in an international setting. Evaluation and Program Planning, 1984, Vol. 7, p. 355-362.

Nuissl, E. Adult education and learning in Europe – Evaluation of the adult education action within the Socrates programme. Frankfurt a.M.: DIE – Deutsches Institut für Erwachsenenbildung, 1999.

Oliva, D.; Samek Lodovici, M. Le politiche formative tra occupazione e valorizzazione delle risorse umane. Rassegna Italiana di Valutazione, 1999, No 13. Available from Internet: http://www.valutazioneitaliana.it/riv/rivista99/13-olivasamek.doc [cited 6.11.2003].

Owen, J. M.; Rogers, P. J. Program evaluation. London: Sage Publications, 1999.

Patton, M. Q. Utilization-focused evaluation: the new century text. Thousand Oaks: Sage, 1997.

Pawson, R.; Tilley, N. Realistic evaluation. Thousand Oaks: Sage, 1997.

Peltzer, U. Formative Prozessbegleitung des Forschungs- und Entwicklungsprogramms 'Lernkultur Kompetenzentwicklung'. Berufliche Kompetenzentwicklung Bulletin, 2002, No 2, p. 8-10.

Perret, B.; Barbier, J.-C. [2000 update]. Ethical guidelines, process and product quality standards, what for? An SFE (French Evaluation Society) perspective. Paper presented at the European Evaluation Society Conference in Lausanne, 12-14 October 2000. Available from Internet: http://www.europeanevaluation.org/pdf/6-3_barbier-perret.pdf [cited 4.11.2003].

Polverari, L.; Fitzgerald, R. Integrating gender equality in the evaluation of the Irish 2000-06 National Development Plan, Vol. 2: Tool kit for gender evaluation. Glasgow: 2002.

Raabe, B. Wirkungen aktiver Arbeitsmarktpolitik. Evaluierungsergebnisse für Deutschland, Schweden, Dänemark und die Niederlande. Wissenschaftszentrum Berlin für Sozialforschung, July 2000 (Discussion Paper FS I 00-208).

Reischmann, J. Weiterbildungs-Evaluation. Neuwied, Kriftel: Luchterhand, 2003.

Rieper, O. Evaluation practice of European Structural Funds. In: Evaluation of European training, employment and human resource programmes. Luxembourg: Office for Official Publications of the European Communities, 1997, p. 37-43 (Cedefop Panorama, 5062).

Rossi, P. H.; Freeman, H. E.; Lipsey, M. W. Evaluation: a systematic approach. Thousand Oaks: Sage, 1999.

Rost, J. Allgemeine Standards für die Evaluationsforschung. In: Hager, W. et al. (eds) Evaluation psychologischer Interventionsmaßnahmen. Bern: Hans Huber, 2000, p. 129-140.

Russon, C.; Russon, K. (eds) The annotated bibliography of international programme evaluation. Dordrecht: Kluwer, 2000.

Sanders, J. R. Standards and principles. In: Shadish, W. R. et al. (eds) Guiding principles for evaluators, New directions for program evaluation. San Francisco: Jossey-Bass, Summer 1995, No 66, p. 47-52.

Sauter, E.; Schmidt, H. Training standards in Germany. The development of new vocational education and training standards. Bonn: Federal Institute for Vocational Training, 2002.

Schmid, G. Process evaluation: policy formation and implementation. In: Schmid, G. et al. (eds) International handbook of labour market policy and evaluation. Cheltenham: Edward Elgar, 1996, p. 198-231.

Schmid, G. et al. (eds) International handbook of labour market policy and evaluation. Cheltenham: Edward Elgar, 1996.

Schmidt, Chr. Arbeitsmarktpolitische Maßnahmen und ihre Evaluierung: Eine Bestandsaufnahme. DIW Vierteljahresheft, 2000, No 3, p. 425-437.


Schmidt, Ch. Knowing what works: the case forrigorous program evaluation. London: CEPR,2001 (CEPR Working paper).

Schmitt von Sydow, H. Report of the workinggroup ‘Evaluation and transparency. WhitePaper on European governance, Work AreaNo 2: Handling the process of producing andimplementing community rules. July 2001.Available from Internet: http://europa.eu.int/comm/governance/areas/group4/report_en.pdf [cited 3.11.2003].

Schömann, K. Longitudinal designs in evaluationstudies. In: Schmid, G. et al. (eds) Internationalhandbook of labour market policy and evaluation.Cheltenham: Edward Elgar, 1996, p. 115-142.

Sen, A. K. The standard of living. Cambridge 1987.Seyfried, E. Evaluation of quality aspects in voca-

tional training programmes. Luxembourg:Office for Official Publications of the EuropeanCommunities, 1998. (Cedefop Document,1171).

Shadish, W. R. et al. (eds) Guiding principles forevaluators, New Directions for Program Evalua-tion. San Francisco: Jossey-Bass, Summer1995, No 66.

Shadish, W. R.; Cook, T. D.; Campbell, D. T. Exper-imental and quasi-experimental designs forgeneralized causal inference. Boston:Houghton Mifflin, 2002.

Sloane, P. F. E. Das Potential von Modellversuchs-feldern für die wissenschaftliche Erkenntnis-gewinnung. In: Bentler, P. et al. (eds) Modellver-suchsforschung als Berufsbildungsforschung.Cologne: Böhlau, 1995, p. 11-44.

Smith, N. L. et al. Considerations on the develop-ment of culturally relevant evaluation stan-dards. Studies in Educational Evaluation, 1993,Vol. 19, No 1, p. 3-13.

Smith, N. L.; Jang, S. Increasing cultural sensitivity in evaluation practice. Studies in Educational Evaluation, 2002, Vol. 28, No 1, p. 61-69.

SFE – Société Française de l'Évaluation. Projet de ‘texte préliminaire pour une charte’, présenté au colloque SFE, June 2001 (internal document, unpublished).

Speer, S. Evaluation und Benchmarking – Ein Methodenvergleich am Beispiel der betrieblichen Personalarbeit. In: Geise, W. (ed.) Ökonomische Bildung zur Bewältigung von Lebenssituationen. Bergisch-Gladbach: Hobein, 2001, p. 51-67.

Stake, R. E. The art of case study research. Thousand Oaks: Sage, 1995.

Stockdill, S. H. Evaluation standards. A study of their appropriateness in business and education. University of Minnesota: DAI – Dissertation Abstracts International, 1986 (47-08 A:2967).

Stufflebeam, D. L. Standards of practice for evaluators. Lecture at the annual conference of the American Educational Research Association, San Francisco, 1986.

Stufflebeam, D. L. Evaluation Models. In: New Directions for Evaluation. San Francisco: Jossey-Bass, Spring 2001, No 89.

Stufflebeam, D. L. et al. Educational evaluation and decision making. Itasca, Ill.: Peacock, 1971.

Taut, S. Cross-cultural transferability of the program evaluation standards. In: Russon, C. (ed.) The program evaluation standards in international settings. Kalamazoo, MI: The Evaluation Center, 2000, p. 5-27 (Occasional Papers Series, May 1).

Tenberg, R. Selbstevaluation des Modellversuchs Fächerübergreifender Unterricht in der Berufsschule durch den Lehrstuhl für Pädagogik der Technischen Universität München. In: Euler, D. (ed.) Berufliches Lernen im Wandel – Konsequenzen für die Lernorte? Beiträge zur Arbeitsmarkt- und Berufsforschung, 1998, No 214, p. 527-535.

Toulemonde, J. Evaluation culture(s) in Europe: differences and convergence between national practices. DIW Vierteljahresheft, 2000, No 3, p. 350-357.

Tremea – Training Effectiveness Measurement. Tremea Handbook. A Guide for Evaluating Training Programmes. 2002. Available from Internet: www.tremea.gr [cited 3.11.2003].

Turbin, J. Policy borrowing: lessons from European attempts to transfer training practices. Leicester: Centre for Labour Market Studies, 2000 (CLMS Working Paper No 27).

UKES – United Kingdom Evaluation Society. Guidelines for good practice in evaluation. Available from Internet: http://www.evaluation.org.uk/ukes_new/Pub_library/Guidance-Good Practice.doc [cited October 2002].

Uusikyla, P.; Virtanen, P. Meta-evaluation as a tool for learning: a case study of the European structural fund evaluations in Finland. Evaluation, 2000, Vol. 6, Issue 1, p. 50-65.

Vedung, E. Evaluation im öffentlichen Sektor. Vienna: Böhlau, 1999.

Villa, A. Estándares para la Evaluación de Programas. Bilbao: Ediciones Mensajero, 1998.

Villa, A. Estándares de Evaluación de Personal. Bilbao: Ediciones Mensajero, 1999.

Webster’s encyclopedic unabridged dictionary of the English language. New York: Thunder Bay Press, 1989.

Weiß, R. Methoden und Faktoren der Erfolgsmessung in der betrieblichen Weiterbildung. GdWZ, 1997, Vol. 3, p. 104-108.

Widmer, Th. Meta-Evaluation: Kriterien zur Bewertung von Evaluationen. Bern: Haupt, 1996.

Widmer, Th. Kontext, Inhalt und Funktion der ‘Standards’ für die Evaluation von Programmen. In: Müller-Kohlenberg, H.; Münstermann, K. (eds) Qualität von Humandienstleistungen. Opladen: Leske und Budrich, 2000, p. 77-88.

Widmer, Th. Instruments and procedures for assuring evaluation quality: a Swiss perspective. In: Schwartz, R.; Mayne, J.; Toulemonde, J. (eds) Assuring the quality of evaluative information: prospects and pitfalls. New Brunswick: Transaction, 2003 (forthcoming).

Widmer, Th.; Beywl, W. Die Übertragbarkeit der Evaluationsstandards auf unterschiedliche Anwendungsfelder. In: JC; Sanders, J. R. (eds) Handbuch der Evaluationsstandards. Opladen: Leske und Budrich, 2000, p. 243-257.

Widmer, T.; Landert, Ch.; Bachmann, N. Evaluation standards. Genève: SEVAL – Swiss Evaluation Society, December 2000. Available from Internet: http://www.seval.ch/en/documents/SEVAL_Standards_2000_en.pdf [cited 4.11.2003].

Wingens, M.; Sackmann, R. Evaluation AFG-finanzierter Weiterbildung. MittAB, 2000, No 1, p. 39-53.

Wollmann, H. Policy knowledge and contractual research. International Encyclopedia of Social and Behavioral Sciences, 2002, Vol. 5.4 (forthcoming).

Wottawa, H. Evaluation in der betrieblichen Bildung. In: Künzel, K. (ed.) Evaluation der Weiterbildung. Cologne: Böhlau, 1999, p. 105-116 (Internationales Jahrbuch der Erwachsenenbildung, Vol. 27).

Wottawa, H.; Thierau, H. Lehrbuch Evaluation. Bern: Hans Huber, 1998.

Zimmer, G. Durch Modellversuche zu Erkenntnisgewinn und Praxisinnovation? Zur Positions-, Funktions- und Interessenbestimmung der wissenschaftlichen Begleitforschung. In: Euler, D. (ed.) Berufliches Lernen im Wandel – Konsequenzen für die Lernorte? Beiträge zur Arbeitsmarkt- und Berufsforschung, 1998, No 214, p. 596-607.

Zorzi, R. et al. Canadian evaluation society project in support of advocacy and professional development. Evaluation benefits, outputs, and knowledge elements. Toronto: Zorzi and Associates, October 2002. Available from Internet: http://consultation.evaluationcanada.ca/pdf/ZorziCESReportDuplex.pdf [cited 4.11.2003].
