CAL evaluation: A cautionary word

Comput. Educ. Vol. 8, No. 1, pp. 77-84, 1984. 0360-1315/84 $3.00+0.00. Printed in Great Britain. Pergamon Press Ltd


MARILYN E. KIDD(1) and GLYN HOLMES(2)

(1)Huron College, 1349 Western Road, London, Ontario, Canada N6G 1H3 and (2)Department of French, University of Western Ontario, London, Ontario, Canada N6A 3K7

Abstract: Teachers, school administrators and funding agencies are often reluctant to support the acquisition of a CAL facility unless some sort of statistical evidence can be brought forth that will prove the desirability of CAL. There appears to be a widespread belief that statistically tabulated data is the ultimate form of evaluation and an infallible indicator of success or failure. Although statistical evidence should not be overlooked, it is necessary to place it in a more realistic perspective.

By considering a representative discipline, second language learning, this paper discusses several types of statistical evaluation in order to indicate their advantages and limitations. The paper suggests that other forms of evaluation, previously discredited because they do not conform to the norms of scientific measurement, are nonetheless valid and reliable: evaluation based on informed opinion, personal judgement and observations formed as a result of direct experience. Although statistical evaluations may be useful to the prospective implementor as a rough indicator of the potential of the computer medium, they can neither guarantee the success nor forecast the failure of a particular CAL application. Each CAL application is unique and involves a host of local variables: hardware, courseware, place in the curriculum, nature of users, attitude of faculty and so on. Any decision concerning the desirability of CAL should therefore rest not on blind faith in a set of figures but on personal judgement aided by an awareness of local conditions, a good knowledge of the medium and one's own pedagogical expertise.

THE TREND TO EMPIRICISM

The past several years have witnessed not only a growing interest in computer-assisted learning (CAL) but also a marked increase in the number of educators who are considering implementing a computer facility. Before they become involved, however, they would like empirical evidence, in the form of statistical results, that will prove the desirability of CAL. Because of our expertise in the field of CAL, we have received, and continue to receive, a number of requests from would-be implementors to furnish them with statistical data, of any description, that would indicate the effectiveness of CAL. The desire for statistical evidence and the faith in its reliability are also apparent among members of funding agencies and school administrators, who are loath to become involved in CAL or to approve the allocation of funds for this new type of educational technology unless such statistical data can be provided. There appears to be a widespread belief that statistically tabulated data is the ultimate form of evaluation and that decisions on whether or not to implement should be made on the basis of its findings. Even the detractors of CAL cite statistical criteria as a basis for their arguments. A colleague at the University of Western Ontario strongly opposed the installation of a CAL facility on the grounds that it had not yet been shown "in even a semi-scientific manner" what contribution CAL could make to the study of languages, the suggestion being that scientific proof constitutes the only reliable form of evaluation. The dependence on statistical data, the belief in the precision and infallibility of numbers, is quite pervasive among educators today. Since the early part of the century, when scientific methods were introduced into the realm of education, a trend has developed towards an increasing reliance on this one form of evaluation.
The dangers of such an approach to education have been indicated recently by at least one educator, Elliot W. Eisner, in his book The Educational Imagination [1]. Commenting on the growth of scientism in education promoted by such eminent figures as Edward L. Thorndike and John Dewey, Eisner states that it has led to the present-day educational researcher who resembles nothing so much as a scientist. As he rightly points out, the validation of the scientific method has come at the expense of other forms of inquiry. Eisner criticizes this attitude when he observes, with heavy sarcasm, that to do historical or critical analysis, or to engage in philosophical inquiry, is not to do research. To count is somehow better, perhaps because counting or measuring yields numbers that can be carried to the third or fourth decimal place and hence provide the illusion of precision. Eisner does not wish, and neither do we, to reject statistical evidence totally. What one is trying to do is to redress the balance and view it in a more realistic perspective. Statistical evidence is not the only form of evaluation, nor is it necessarily the best or most accurate form. It has several limitations that restrict its reliability. Other forms of evaluation based on informed opinion, personal judgement or intuition can be very useful indicators and should not be discounted.

In what follows, by considering the material available for a representative discipline, in this case second language learning, we shall discuss a variety of types of evaluation in order to indicate their particular advantages and limitations.

STATISTICAL ANALYSES

The prospective implementor in search of statistical data proving the desirability and effectiveness of CAL in the field of second language learning (and in many other disciplines) will find that relatively few rigorous statistical analyses are at his disposal. His first problem, therefore, is the dearth of such data. In addition, the results of such studies are not always easily consulted, often being published in progress reports of limited circulation. The statistical studies that are at hand fall into three categories. First, there is the full-scale experiment using a control group in order to test the effectiveness of CAL. Then there is the more localized study to test a specific aspect of student performance. Finally, there is the study designed to evaluate student reaction.

Statistical evaluation with control group

Information on studies of this type is quite limited in academic journals. Such evaluations aim to determine the effectiveness of the computer medium relative to traditional classroom instruction. Two groups are set up for purposes of comparison: one using CAL, the other receiving no CAL. In a study conducted at SUNY (State University of New York) [2], a group of randomly selected students in elementary German received their grammar instruction and language lab training via a computer equipped with audio and slide projector peripherals. Although students had some classroom instruction, the CAL facility provided all grammar practice and also tested the students in translation, aural comprehension and dictation. At the end of the experiment, both the CAL group and a control group who had not been using CAL took the Modern Language Aptitude Test covering the four skills of reading, writing, speaking and listening. The statistical results indicated that the CAL group was better in reading and writing but weaker, though not significantly so, in speaking and listening comprehension (Fig. 1). In another control-group study, performed at Stanford University [3], a group of volunteer students received no classroom instruction whatsoever; it was replaced by 5 hours of CAL a week. They did, however, maintain their traditional language lab sessions. Their performance was compared with that of the control group on the basis of a series of in-house term tests consisting of the translation of sentences from English to Russian. The results indicated that the CAL students performed as well as or better than the non-CAL group (Fig. 2).
Although the statistical results in both studies point to the success of CAL, the two experiments differ in the courseware used, the mode of implementation, the skills developed, the areas of performance tested and the means of testing, so their results remain inextricably wedded to the specific set of conditions that produced them. It would be erroneous to take these statistics as proof that computer-assisted learning in second languages will meet with the same success in another institution under another set of variables.

[Fig. 1 graphic: four panels, Reading Achievement, Writing Achievement, Speaking Achievement and Listening Achievement, each plotting results for the CAI group and the ALM group against percentile rank in group (0 to 100).]

Fig. 1. Results of CAL experiment in German at SUNY. CAI = Computer assisted instruction; ALM = audio-lingual method.

Localized evaluation

Other, less rigorous studies of a statistical nature would seem to raise greater problems as to the significance of their results. In an experiment conducted at the University of Alberta [4], a program was devised to help drill students in elementary French grammar. Statistics were tabulated concerning the time it took students to complete the program (Fig. 3). Although the results seemed to indicate that students took less time than would normally be required in the classroom, a look behind the statistical results reveals that the study was done in the absence of any control group. Moreover, whereas the program was devised for independent use, most students were doing the program in addition to their normal classroom work. As the author herself states, these data should be viewed with caution. Statistical results are only as reliable as the measurement tool that produces them and should not be taken at face value. Furthermore, experiments that attempt to measure only one aspect of a CAL program run the risk of producing a distorted picture. Focusing attention on a single element of what is essentially a multi-faceted experience usually precludes observing the influence of other aspects that are not being measured. Because any CAL program is composed of a complex assemblage of elements, only one of which is the medium itself, it is very difficult to ascertain which element is causing the particular results.
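The two tables of Fig. 3 also hint at how much such figures can rest on extrapolation. Every entry in the table of projected hours is, to within the last printed digit, exactly half the corresponding entry in the table of minutes per unit, which is consistent with multiplying each per-unit time by a fixed number of units (30 units of 60 minutes gives the factor of one half). That unit count is inferred here from the arithmetic and is not stated in the report; the names `ASSUMED_UNITS`, `project_hours` and `max_discrepancy` are hypothetical. A minimal Python sketch of the check, under that assumption:

```python
# Fig. 3 figures as printed: mean, S.D., low and high end of range.
MINUTES_PER_UNIT = {
    "Elementary": (82.56, 24.04, 51.11, 134.25),
    "Junior High": (51.79, 10.07, 36.84, 72.90),
    "Senior High": (37.72, 7.97, 25.34, 56.00),
    "Adult": (33.28, 11.27, 20.30, 65.62),
}
PROJECTED_HOURS = {
    "Elementary": (41.28, 12.02, 25.55, 67.12),
    "Junior High": (25.89, 5.03, 18.42, 36.45),
    "Senior High": (18.86, 3.98, 12.67, 28.00),
    "Adult": (16.64, 5.63, 10.15, 32.81),
}

# Hypothetical: a program of 30 units (inferred, not stated in the report).
ASSUMED_UNITS = 30


def project_hours(minutes_per_unit: float, units: int = ASSUMED_UNITS) -> float:
    """Convert minutes per unit into projected hours for the whole program.

    A linear rescaling like this multiplies the mean, the S.D. and both
    ends of the range by the same factor (units / 60), so all four
    columns can be checked in the same way.
    """
    return minutes_per_unit * units / 60


def max_discrepancy() -> float:
    """Largest gap between a projected value and the printed one."""
    worst = 0.0
    for group, per_unit in MINUTES_PER_UNIT.items():
        for minutes, printed_hours in zip(per_unit, PROJECTED_HOURS[group]):
            worst = max(worst, abs(project_hours(minutes) - printed_hours))
    return worst


if __name__ == "__main__":
    # The largest gap is about 0.005, i.e. within the last printed digit.
    print(f"largest discrepancy: {max_discrepancy():.4f}")
```

On this reading, the projected hours were computed from the per-unit times rather than observed directly.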

[Fig. 2 tables: error distributions for the Russian program, giving the number of students in the computer-based section and in the regular section for each number of errors, with the total number of students and the average number of errors. Table 12-14: common portion of the Autumn Quarter final examination. Table 12-15: common portion of the Winter Quarter final examination. Table 12-18: Spring Quarter final examination.]

Fig. 2. Results of CAL experiment in Russian at Stanford University.

Means, Standard Deviations, and Ranges of the Number of Minutes Taken to Complete One Unit of Instruction by the Four Student Groups

Group          N    Mean    S.D.    Range
Elementary     16   82.56   24.04   51.11 - 134.25
Junior High    16   51.79   10.07   36.84 - 72.90
Senior High    14   37.72    7.97   25.34 - 56.00
Adult          12   33.28   11.27   20.30 - 65.62

Table I

Means, Standard Deviations, and Ranges of the Projected Number of Hours Required to Complete the Program by the Four Student Groups

Group          N    Mean    S.D.    Range
Elementary     16   41.28   12.02   25.55 - 67.12
Junior High    16   25.89    5.03   18.42 - 36.45
Senior High    14   18.86    3.98   12.67 - 28.00
Adult          12   16.64    5.63   10.15 - 32.81

Fig. 3. Results of CAL experiment in French at the University of Alberta.

Attitude evaluation

The prospective implementor will also come across statistical studies pertaining to student reaction [5]. Such studies present information in a very convenient format that can be perused rapidly to obtain a general impression of student attitude. This kind of evaluation also permits a researcher to take a sounding of student reaction in areas he considers pertinent to learning with a computer, such as attitude towards the medium, lesson quality and skills acquired. What it does not do is allow the student to identify elements that he himself has perceived as significant. In addition, because his answer is expressed in numerical terms or as a choice from already formulated answers (Figs 4 and 5), the student is unable to qualify his response. Those of us who have undergone numerical student evaluations are well aware that this form of expression lacks the subtlety and precision of a verbal response. In attempting to reduce something essentially very complex to a unidimensional symbol, one must simplify, and this process eliminates many meaningful nuances.
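The Mean column of Fig. 4 is one such unidimensional symbol: each distribution of percentages is collapsed into a weighted average of the response codes. The Python sketch below recomputes that average from the printed percentages for item 1 (Semester 1) and then compares two invented distributions that share a mean of 3.0. The helper `likert_summary` and the two hypothetical classes are illustrative assumptions, not data from any of the studies cited; the printed means may also have been computed from raw counts rather than rounded percentages, so other items need not match to the last digit.

```python
from math import sqrt

# Response codes for the four-point FRELEM scale in Fig. 4:
# 1 = strongly agree, 2 = slightly agree, 3 = slightly disagree,
# 4 = strongly disagree.
CODES = [1, 2, 3, 4]


def likert_summary(percentages, codes=CODES):
    """Return (mean, standard deviation) of the response codes, weighting
    each code by the percentage of students who chose it. Weights are
    normalised by their own sum because rounded percentages need not add
    up to exactly 100."""
    total = sum(percentages)
    mean = sum(p * c for p, c in zip(percentages, codes)) / total
    variance = sum(p * (c - mean) ** 2 for p, c in zip(percentages, codes)) / total
    return mean, sqrt(variance)


# Item 1, Semester 1 of Fig. 4 ("Using FRELEM is frightening").
item1_mean, item1_sd = likert_summary([1, 4, 15, 80])

# Two hypothetical classes (invented for illustration): every student
# "slightly disagrees", versus a class split evenly between
# "slightly agree" and "strongly disagree".
uniform_mean, uniform_sd = likert_summary([0, 0, 100, 0])
split_mean, split_sd = likert_summary([0, 50, 0, 50])

if __name__ == "__main__":
    print(round(item1_mean, 2))      # 3.74, the printed mean for item 1
    print(uniform_mean, split_mean)  # 3.0 3.0: identical means
    print(uniform_sd, split_sd)      # 0.0 1.0: very different spreads
```

The standard deviation separates the two hypothetical classes, but it too is a single number: it shows that the second class is divided without indicating why.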

We hope to have shown that statistically tabulated evaluation, although useful, has its limitations. It is not the unerring, irrefutable indicator that some would have us believe. It rests on a narrowly circumscribed set of variables, and its values are not absolute but relative to the manner in which they were obtained. Their reliability varies with the quality of the measurement tool and the conditions under which the measurement took place. Under optimum conditions, statistically tabulated evaluations of a CAL project can give some indication of the potential of the medium, but they can give no guarantee that this potential will be realized in any application other than the one in question.

OTHER FORMS OF EVALUATION

If statistically tabulated evaluation is placed in what we believe to be a more balanced perspective, other forms of evaluation will emerge as valid. These are forms that in recent years have often been discredited merely because they do not conform to the norms of scientific measurement: evaluation based on informed opinion, personal judgement and observation based on direct experience.

Evaluation based on informed opinion

The opinions of teachers and students who have had direct experience with CAL can offer the prospective implementor valuable insights into this form of educational technology. An example of such non-scientific analysis is found in the comments of educators such as Fernand Marty of the University of Illinois [6], made in reference to his PLATO program for elementary French. He states that one of the basic advantages for the student using computerized materials is that he achieves higher levels of concentration for longer periods of time, with a resulting increase in retention. Although his evaluation has no statistical basis, that does not mean that it must therefore be unreliable. In the case of Fernand Marty, it is founded on insights gained in 35 years of teaching, 6 years of working with classes using CAL, and numerous discussions with students who have used CAL. In addition, observations culled from direct experience with the medium can provide a wealth of detail that is not found in statistical compilations. Furthermore, such evidence is usually quite abundant within the confines of a specific discipline and covers a broad spectrum of topics, including the effectiveness of various types of implementation (mainline, adjunctive), the aspects of language learning best suited to the medium, the use of peripherals and effective courseware design, as well as student performance and reaction. For the would-be implementor, this type of evaluation by a seasoned practitioner can prove invaluable in assessing the desirability of the medium for his own institution.

TABLE 1

Statements about FRELEM

Items 1 to 3: 1 = Strongly agree; 2 = Slightly agree; 3 = Slightly disagree; 4 = Strongly disagree. N = number of students responding to the item. Figures are the percentage of those answering the question who marked each response.

1. Using FRELEM is frightening.
                      1.   2.   3.   4.   Mean
   Sem. 1, N = 89      1    4   15   80   3.74
   Sem. 2, N = 109     1    4   10   86   3.83

2. Using FRELEM is boring.
                      1.   2.   3.   4.   Mean
   Sem. 1, N = 91      2   17   41   40   3.19
   Sem. 2, N = 113     2   17   39   41   3.17

3. Using FRELEM is fun.
                      1.   2.   3.   4.   Mean
   Sem. 1, N = 93     33   54   14    0   1.83
   Sem. 2, N = 111    23   63    9    4   1.92

5. The best items in FRELEM are the...
   (1) scrambled sentences (2) translations (3) transformations (4) fill in blanks (5) all equal (6) don't know
                      1.   2.   3.   4.   5.   6.
   Sem. 1, N = 94      3   15    9   21   48    4
   Sem. 2, N = 114     4    9   14   25   39   10

Items 9 and 10: 6 = strongly agree ... 1 = strongly disagree.

9. I learned fundamental principles or theories.
                      6.   5.   4.   3.   2.   1.   Mean
   Sem. 1, N = 93     25   32   31   10    1    1   4.67
   Sem. 2, N = 109    13   31   37   12    4    4   4.27

10. I acquired a basic understanding of the subject area.
                      6.   5.   4.   3.   2.   1.   Mean
   Sem. 1, N = 93     25   42   30    3    0    0   4.88
   Sem. 2, N = 111    18   48   25    6    2    1   4.71

TABLE 2

Scale: 6 (agree) to 1 (disagree).

1. I became more interested in the subject.
                      6.   5.   4.   3.   2.   1.   Mean
   Sem. 1, N = 95     22   29   41    4    2    1   4.62
   Sem. 2, N = 114    11   28   45   10    3    2   4.28

2. I was motivated to do work beyond minimum requirements.
                      6.   5.   4.   3.   2.   1.   Mean
   Sem. 1, N = 94     18   33   33   15    0    1   4.51
   Sem. 2, N = 112    12   32   34   18    1    4   4.25

3. Concepts were presented in a manner that aided my learning.
                      6.   5.   4.   3.   2.   1.   Mean
   Sem. 1, N = 94     40   33   20    5    0    1   5.05
   Sem. 2, N = 113    33   35   23    5    3    1   4.88

4. My work was evaluated in ways that were meaningful to me.
                      6.   5.   4.   3.   2.   1.   Mean
   Sem. 1, N = 93     22   34   32    9    1    1   4.66
   Sem. 2, N = 108    13   39   32   12    2    1   4.44

Fig. 4. Sample of student evaluation of CAL at University of Iowa.


How satisfied were you with what you learned from the CAI course?
   very much 17 (11); much 47 (41); so-so 30 (40); little 6 (8)

Would you say that the computer was like a private tutor?
   strongly agree 5 (15); agree 65 (67); disagree 23 (13); strongly disagree 0 (2)

How would you compare the effectiveness of CAI as opposed to exercises done in class or at home?
   much better 16 (12); better 39 (37); same 32 (40); worse 11 (14)

Did the grammar explanation at the beginning of each section reinforce what you learned in class?
   was better 29 (7); reinforced 67 (86); confused me 2 (7)

What was your overall reaction to the responses from the computer when you made a mistake?
   helped recognize what was wrong 35 (28); reinforced grammar rules 28 (30); guided me to correct answer 33 (31); useless 4 (0)

The figures represent the results for each of two classes expressed in terms of percentage.

Fig. 5. Sample of student evaluation of CAL at Ohio State University.

In the same way, appraisals of a CAL facility made by its users, the students themselves, can be very useful and revealing. Student users at the University of California, Riverside [7] were asked to identify what they considered to be the advantages and disadvantages of the CAL facility for French. A sample reply identified the advantages as being that you study only what you are weak in, each exercise is different and the computer provides instant correction. On the negative side the student complained of a lack of accent characters and felt it would be advantageous to have a program that tested all the verb tenses. This type of appraisal makes no assumptions but allows the student to reveal what he has found to be beneficial or deficient in the system.

Naturally, when one is dealing with subjective evaluation, there is always the risk of human error or personal bias. Yet there is equally the potential for keen perception and highly accurate analysis. Above all, it offers the possibility of evaluation over a wide range of areas that do not yield readily to statistical analysis.

Personal judgement

Although a good number of studies of CAL projects express some form of evaluation, either verbal or statistical, there is much literature available that is purely descriptive in nature. Not only are no statistical data presented; the practitioner does not feel inclined to make any personal assessment either. He is content merely to provide detailed information about various aspects of his particular facility: the hardware configuration used, the method of implementation, the elements of his courseware design, lesson content, program logic and mode of record keeping [8]. Once again, such studies can be valuable to the prospective implementor by providing him with information about the various aspects involved in the creation and implementation of a CAL facility. However, the absence of critical comments or evaluations means that the implementor is obliged to assess the viability of the project and its various elements himself. The advantage of other forms of evaluation is that this assessment is done for him; when faced with a mere description of a project, he has to rely on his own critical powers and make his own evaluation. Ultimately, this can be a highly desirable situation, for no one knows better than he the conditions that exist in his institution, the nature of the students who will be working with the computer, the uses to which it will be put, the place in the curriculum and so on. He alone can accurately judge whether the installation of CAL would be appropriate for his particular situation. A thorough knowledge of the varieties of programs that have been devised, and of the uses to which they have been put, should equip him to make an informed judgement in which he can feel confident.

Direct observation

Finally, the definitive evaluation for anyone contemplating the implementation of CAL, as well as for those who are called upon to make decisions on the allocation of funds, should come from first-hand experience of the medium. Reliance on the experiences of others is useful up to a point, but beyond that point personal judgement based on direct observation should be the deciding factor. Here, as with other areas of pedagogical innovation, it would be unwise to underestimate personal evaluation made in the light of a thorough knowledge of the medium, an awareness of local conditions and a reliance upon one's own pedagogical expertise.

CONCLUSION

By way of a summary, let us say that, because of the tendency among present-day educators to consider statistical evidence the ultimate deciding factor when attempting to evaluate a new educational medium, other forms of assessment are ignored or rejected as unreliable or unimportant. We maintain that this position is extremely short-sighted. Because of the limited number of such evaluations available, as well as the limitations inherent in statistically tabulated assessment, such evidence is inadequate to provide a reliable indicator. In the area of CAL, the would-be implementor, or even the school administrator, must look at other forms of evaluation that will provide the wealth of information he needs in order to arrive at a well-founded decision as to whether CAL is desirable for his institution. This will entail a movement away from blind faith in a set of figures towards a reliance on one's own judgement. Although this movement may be difficult for some, we believe that it will ultimately lead to better decision making and a more successful implementation of CAL.

REFERENCES

1. Eisner E. W., The Educational Imagination. Macmillan, New York (1979).
2. Morrison H. W. and Adams E. N., Pilot study of a CAI laboratory in German. Mod. Lang. J. 52, 279-287 (1968).
3. Holtzman W. H., Computer-Assisted Instruction, Testing, and Guidance. Harper & Row, New York (1970).
4. McEwen N., Computer-assisted instruction in second-language learning: an Alberta project. Can. Mod. Lang. Rev. 33, 333-343 (1977).
5. Hope G., Elementary French computer assisted instruction. Foreign Lang. Ann. 15, 347-353 (1982); and Taylor H. F., Students' reactions to computer assisted instruction in German. Foreign Lang. Ann. 12, 289-291 (1979).
6. Marty F., Reflections on the use of computers in second-language acquisition. Stud. Lang. Learn. 3, 25-53 (1981).
7. Decker H., Computer-aided instruction in French syntax. Mod. Lang. J. 60, 263-267 (1976). Copies of student evaluations were supplied to us by the author.
8. See, for example, Olmsted H. M., Two models of computer-based drill: Teaching Russian with APL. Slav. East Eur. J. 19, 11-29 (1975); Smith P. D., $2400, a computer assisted instructional review of Basic Spanish Grammar. System 4, 182-190 (1976); Collett M. J., A tenses computer program for students of French. Mod. Lang. J. 66, 170-179 (1982).
