Page 1: Evaluation Methods

Evaluation Methods

Research & Project Methods SECC504

Professor Julian Newman, 15/10/08

Page 2: Evaluation Methods

Why should we accept your proposal?

Bologna Process and CPHC/QAA Benchmarking Standards

An MSc Project/Dissertation should demonstrate:
– a sound justification for the approach adopted
– self-critical evaluation of effectiveness
– a sense of vision about the direction of developments in aspects of the discipline

The Masters Ethos requires you to:
• be acquainted with the newest theories, methods and techniques in your specialised field
• have sufficient competence in techniques of independent research and be able to interpret the results at an advanced level
• demonstrate the ability to apply the principles and practices of the discipline in making an original contribution by tackling a significant technical problem

Summarised from CPHC (2008), Benchmarking Standards for Masters Degrees

Page 3: Evaluation Methods

Why should we support you? (1)

• Crane: Invisible Colleges
  – Influential scientists & engineers determine
    • What is worth pursuing
    • What was successful
    • “Published papers are not for information”
  – An exaggerated view – with a grain of truth
  – Working scientists use pre-print servers etc.

• Whitely: Results are valued if other researchers can build upon them.

Page 4: Evaluation Methods

Why should we support you? (2)

• Latour (School of Mining, Paris): there are “Internal” and “External” scientists
  – Internal Scientists do the lab work
  – External Scientists
    • promote the importance of the research area
    • mobilise resources
  – “An isolated scientist cannot even create a controversy”
  – To count as “research” your work must relate to a recognisable “research programme”

• Foucault: Power resides in Discourse

Page 5: Evaluation Methods

Why should we pay attention to your results? – relevance and reliability

• Parnas: There are different research Paradigms followed in Science and in Engineering.

• In Science, the problems are taken from the literature (i.e. from other researchers)

• In Engineering, the problems are taken from practitioners (i.e. real-world problems)

• Software Engineers too often take problems from Literature, like Scientists, ignoring real-world problems.

• Hence practising software developers find the research literature irrelevant to their problems and ignore it.

• Literature search should not be the only source of problems – look also at problems of existing technology.

Page 6: Evaluation Methods

Evaluation in Scientific and Technological Discourse

• Hume’s problem of Induction: we cannot logically reason from fact (specific) to theory (general).

• So how do we persuade the scientific or professional community of the importance of our results?

• This is not just a philosophical issue; it is one of practical importance.

• Evaluation is essential in mobilising resources and in “selling” your results.

Page 7: Evaluation Methods

Jamie Fleck’s “Credibility Cycle” (adapted from Bruno Latour)

[Cycle diagram linking Discourse, Resources and Effectiveness]
• Discourse – generation of ideas
• Resources – needed to test out the ideas
• Effectiveness – show that the ideas work

Page 8: Evaluation Methods

Example from the History of Artificial Intelligence in the 1970s

• AI pioneers made big claims about what could be achieved and on what timescale.

– “Computers will soon have an IQ of 120 and then we shall have to give them the vote.”

• UK Government commissioned Sir James Lighthill (a Physicist) to report on whether it was worth continuing to fund AI.

• Lighthill said only Robotics was worthwhile.

Page 9: Evaluation Methods

Lighthill blocks AI’s Credibility, thus denying resources

[Credibility cycle diagram with the link from Discourse to Resources blocked]
• Discourse – ideas and motivation; Lighthill: “Most AI is not worthwhile, except robotics”
• Resources – Govt allocates inadequate resources to AI
• Effectiveness – without resources, it is hard to prove the effectiveness of ideas

Page 10: Evaluation Methods

McCarthy’s Review of Lighthill

• The AI community criticised the Lighthill report as based on a misunderstanding of what AI is about.

• John McCarthy, the US ‘father of AI’, also criticised Lighthill, but said that the AI community itself was partly to blame.

• Too much published AI research suffered from the “Look Ma, No hands!” disease:
  – Described what the program did, and pointed out that no program had done it before
  – But did not elucidate any general lessons, principles or insights

Page 11: Evaluation Methods

The Credibility Cycle of an MSc

[Credibility cycle diagram applied to the MSc]
• Discourse – your Proposal; Literature Search & Technology Review; your Dissertation
• Resources – Lab Access, Supervision, etc.
• Effectiveness – Evaluation of Results; Assessment

Page 12: Evaluation Methods

Timing of Evaluations

• Designers often distinguish “Early” and “Late” Evaluations.
  – Early evaluation is often “Predictive Evaluation”, i.e. based on a model or theory plus either
    • previous published results, or
    • simulation (see the sketch after this list)
  – Late evaluation is likely to be in a Usability Lab.
• Accountants distinguish “Ex Ante” and “Ex Post” Evaluations (before and after investment).
• Scriven introduced the distinction between
  – “Formative” Evaluation (during development, helping to improve the artefact or system) and
  – “Summative” Evaluation (of the completed end-product).

Page 13: Evaluation Methods

Evaluation of a Project

• Beforehand: Proposal Evaluation (e.g. for funding or registration)
• Evaluation of a Product or Prototype
  – ‘Early evaluation’ of proposed design
  – Usability labs
  – Stakeholder evaluation (a wider group than “users”)
• Investigators’ evaluation of the answers to Research Issues (possibly Hypotheses)
  – Critical self-analysis of the work done
  – Statistical tests
  – Design rationale
• Evaluation of the Project
  – Evaluation of Outputs (e.g. by journal referees)
  – Evaluation of the Project as a Whole (e.g. MSc assessment, Research Council IGR, EC Project Review, etc. – similar reviews of industrial projects take place within companies)

Page 14: Evaluation Methods

Evaluation of a Masters’ Proposal (1)

A Proposal will be evaluated with respect to the following classes of issue:

• Logistical and Practical Issues
  – Is it possible?
• Methodological and General Issues
  – Does it make sense?
  – Is it worthwhile?

Page 15: Evaluation Methods

Evaluation of a Masters’ Proposal (2): Logistical and Practical Issues

• Is the proposed work realistic for a 4 person-month project?
• Is the Plan based on a Work Breakdown that relates Activities to the Aim and Objectives?
• Is the Plan sufficiently detailed for progress monitoring?
• Does the Plan allow enough time for initial Literature Search and for final Writing-up of the Dissertation?
• Has the student provided a clear assessment of the major Risks in the project and a strategy for managing these?
• Has access been assured for all Resources required?
• Have Ethical issues been correctly addressed? (see form EC5)

Page 16: Evaluation Methods

Evaluation of a Masters’ Proposal (3): Methodological and General Issues

• Is the Topic suitable for the Programme on which the student is enrolled?
• Does the proposal clearly state the Aim and Objectives?
• Is the Scope of the work clearly defined?
• Is the proposed work sufficiently novel and rigorous for MSc/MA?
• Does the proposal justify the importance of the problem?
• Is the intended Method clearly described?
• Does the Method specify an appropriate practical element?
• Is the Method logically adequate to the stated Aim and Objectives?

Page 17: Evaluation Methods

“Is the Method Logically Adequate to the Stated Aim?”

• We previously
  – Noted that the Hypothetico-Deductive method was ONE approach to analysing RESEARCH DISCOURSE;
  – Identified some alternatives to Hypothesis-testing:
    • Comparison of techniques through toy implementation (see the sketch after this list)
    • Development to meet real-world requirements
    • Experiments to establish parameters
    • Observation for understanding in Usability Labs or in the Field
    • Case Study of a particular situation
    • Detailed Ethnographic study of work practices
• All of these pose a problem: how to justify generalisations?
• The answer to this may lie in the idea of “Design Science” or the “Sciences of the Artificial”.
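As one hedged illustration of “comparison of techniques through toy implementation”, the Python sketch below times two search techniques on the same toy data set; the techniques, data sizes and timing approach are chosen purely for the example and are not drawn from the lecture.

```python
import bisect
import random
import time

def linear_search(items, target):
    """Scan elements one by one until the target is found."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1

def binary_search(sorted_items, target):
    """Use bisect on data that is already sorted."""
    i = bisect.bisect_left(sorted_items, target)
    return i if i < len(sorted_items) and sorted_items[i] == target else -1

# Toy data: a sorted sample of integers plus a batch of queries drawn from it.
data = sorted(random.sample(range(1_000_000), 100_000))
queries = random.choices(data, k=1_000)

for name, search in (("linear", linear_search), ("binary", binary_search)):
    start = time.perf_counter()
    for q in queries:
        search(data, q)
    elapsed = time.perf_counter() - start
    print(f"{name} search: {elapsed:.3f} s for {len(queries)} queries")
```

A toy comparison like this only supports a generalisation if the report also argues why the toy setting is representative of the real problem – which is exactly the evaluation question raised above.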

Page 18: Evaluation Methods

Design research and the “reflective practitioner”.

• ‘Design Studies’ aims to establish Design as separate from the Humanities & Sciences

• What is the form of generalisation that constitutes design knowledge?

• When is ‘design’ ‘design research’?
  – Schon: Reflective practitioner
  – Simon: Sciences of the Artificial
  – Maclean: Design space analysis
  – Sutcliffe, Carroll: Claims Analysis

Page 19: Evaluation Methods

Reflective Practitioner

• Donald Schon advanced the notion that education for practical professions should be based on “reflection”.

• The research component consists of systematically analysing and reflecting upon design decisions and implicit design principles.

• Keeping a Design Journal can be an important aid to being a reflective practitioner.

Page 20: Evaluation Methods

Reflection and the MSc Dissertation

• Systematic reflection, especially keeping a Design Journal, will assist in evaluative assessment of your own work within the Dissertation.

• Systematic reflection supported by a Design Journal will also link to Personal Development Planning, and provide material that can be used in improving CV, job applications, etc.

• However, the Dissertation itself is not a piece of personal autobiography: thus evaluation within the Dissertation needs to be framed in an academic and professional register.

Page 21: Evaluation Methods

Sciences of the artificial (H A Simon)

• “Artificial” systems have a given form and behaviour only because they are:
  – ADAPTED TO ENVIRONMENT IN REFERENCE TO GOALS
• Analogy between:
  – EVOLVED SYSTEMS
  – DESIGNED SYSTEMS

Page 22: Evaluation Methods

Sciences of the artificial (H A Simon)

• Principles:
  – Stable intermediate forms speed the development of complex systems
  – Human decision-making is conditioned by limited capacity for handling information

Page 23: Evaluation Methods

Sciences of the artificial (H A Simon)

• MAIN CHARACTERISTICS OF DESIGN (H A Simon, as developed by William Newman, John Long, Allan Maclean …)
  – Inner environment: i.e. the Technology that the Designer selects as the means of bringing about Change
  – Outer environment: in which change is to be produced
  – Inner environment acts on outer environment across an Interface

Page 24: Evaluation Methods

Sciences of the artificial (H A Simon)

• MAIN CHARACTERISTICS OF DESIGN (CONTD)
  – Interface protects the outer environment from the internal complexities of the inner environment
  – Designer needs to model both Inner and Outer environment
  – Designer needs to subdivide the problem into manageable parts
  – Simulation can show how a complex system is likely to behave

Page 25: Evaluation Methods

Sciences of the artificial (H A Simon)

• POSSIBLE IMPLICATIONS
  – Is your Dissertation aiming to model the Inner or the Outer environment?
    • A Development project might focus on the Inner Environment (Technology)
    • An Experimental or Case Study dissertation might focus on the Outer environment (Application Context)
    • A Usability study might focus on the Interface
    • Technology Interfaces can also be studied
  – Is the proposed Dissertation a ‘manageable part’ of a larger problem?

Page 26: Evaluation Methods

Evaluation Within a Project

• BASELINE is an Initial Evaluation of
  – State of the Art (Technology Assessment)
  – Current theory and practice
• Evaluation of Design Alternatives
  – Important to identify the reasoning that underlies your design decisions
  – Avoid the “Look Ma, no hands!” disease
• Evaluation of Results
  – E.g. Usability Lab, Hypothesis Testing, etc.

Page 27: Evaluation Methods

Evaluation of Lessons Learned

• Showing generality of results

• Design as search in a problem space (see the sketch after this list)
  – Decision space
  – Evaluation space
• Problems of evaluation in the real world
  – Large Complex (usually Distributed) Systems
  – Organisational factors affecting real-world evaluation
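To give a hedged, concrete reading of “design as search in a problem space”, the Python sketch below enumerates a tiny invented decision space (cache size and replication factor) and ranks each candidate with an invented evaluation function; every name, parameter and weight here is an assumption made up for illustration.

```python
from itertools import product

# Hypothetical decision space: each combination of choices is one candidate design.
cache_sizes_mb = [64, 128, 256]
replication_factors = [1, 2, 3]

def evaluate(cache_mb, replicas):
    """Toy evaluation function mapping a design point to a single penalty score
    (invented weights; lower is better)."""
    cost = cache_mb * 0.5 + replicas * 30.0      # pretend monthly running cost
    latency = 100.0 / cache_mb + 5.0 / replicas  # pretend latency measure
    return cost + 100.0 * latency

# Search the decision space and rank the candidates in the evaluation space.
scored = sorted((evaluate(c, r), c, r)
                for c, r in product(cache_sizes_mb, replication_factors))
best_score, best_cache, best_replicas = scored[0]
print(f"best candidate: cache={best_cache} MB, replicas={best_replicas}, "
      f"penalty={best_score:.1f}")
```

Real-world evaluation is rarely this tidy: for large, distributed systems the evaluation function itself is contested, which is the organisational problem the slide points to.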

Page 28: Evaluation Methods

In an Experimental project, how do we show generality of results?

Experimental report (body)
• General

• Specific

• General

Page 29: Evaluation Methods

In an Experimental project, how do we show generality of results?

Experimental report (body)
• Introduction
• Hypothesis (proposed generality)

Page 30: Evaluation Methods

In an Experimental project, how do we show generality of results?

Experimental report (body)
• Introduction
• Hypothesis (proposed generality)
• Method (specifics)
• Results

Page 31: Evaluation Methods

In an Experimental project, how do we show generality of results?

Experimental report (body)
• Introduction
• Hypothesis (proposed generality)
• Method (specifics)
• Results
• Statistical test (assessed generality; see the sketch after this list)
• Conclusions & Discussion
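As a hedged illustration of how a statistical test turns specific results back into an assessed generality, the sketch below applies an independent-samples t-test (via SciPy) to two invented sets of task-completion times; the data values and the 0.05 threshold are assumptions for the example only.

```python
from scipy import stats

# Invented task-completion times (seconds) for two interface designs.
design_a = [41.2, 38.5, 44.1, 39.8, 42.3, 40.7, 43.5, 37.9]
design_b = [35.4, 33.8, 36.9, 32.7, 34.5, 36.1, 33.2, 35.0]

# Independent-samples t-test: do the observed (specific) differences
# support a general claim that design B is faster?
t_stat, p_value = stats.ttest_ind(design_a, design_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

if p_value < 0.05:  # illustrative significance threshold
    print("Difference is unlikely to be chance alone: some support for the general claim.")
else:
    print("No statistical grounds for generalising beyond these specific runs.")
```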

Page 32: Evaluation Methods

In a Developmental project, how do we show generality of results?

Report (body)
• Introduction
• Problem
• Method/Design Principles
• Results
• What goes here?
• Conclusions & Discussion

Page 33: Evaluation Methods

Design Rationale can take the place of a formal hypothesis test

Report body
• Introduction
• Problem
• Method
• Results
• Design Rationale (relate design decisions to general principles)
• Conclusions & Discussion

