Comput. Educ. Vol. 8, No. 1, pp. 77-84, 1984. 0360-1315/84 $3.00+0.00. Printed in Great Britain. Pergamon Press Ltd
CAL EVALUATION: A CAUTIONARY WORD
MARILYN E. KIDD¹ and GLYN HOLMES²
¹Huron College, 1349 Western Road, London, Ontario, Canada N6G 1H3 and ²Department of French, University of Western Ontario, London, Ontario, Canada N6A 3K7
Abstract: Teachers, school administrators and funding agencies are often reluctant to support the acquisition of a CAL facility unless some sort of statistical evidence can be brought forth that will prove the desirability of CAL. There appears to be a widespread belief that statistically tabulated data is the ultimate form of evaluation and an infallible indicator of success or failure. Although statistical evidence should not be overlooked, it is necessary to place it in a more realistic perspective.
Through the consideration of a representative discipline, second language learning, the following paper discusses a variety of statistical evaluation types in order to indicate their advantages and their limitations. The paper suggests that other forms of evaluation, previously discredited because they do not conform to the norms of scientific measurement, are nonetheless valid and reliable: evaluation based on informed opinion, personal judgement and observations formed as a result of direct experience. Although statistical evaluations may be useful to the prospective implementor as a rough indicator of the potential of the computer medium, they can neither guarantee the success nor forecast the failure of a particular CAL application. Each CAL application is unique and involves a host of local variables: hardware, courseware, place in curriculum, nature of users, attitude of faculty and so on. Any decision concerning the desirability of CAL should therefore rely not on blind faith in a set of figures but on personal judgement aided by an awareness of local conditions, a good knowledge of the medium and one's own pedagogical expertise.
THE TREND TO EMPIRICISM
The past several years have witnessed not only a growing interest in computer-assisted learning (CAL) but also a very definite upsurge in the number of educators who are considering the implementation of a computer facility. However, before they become involved they would like to have empirical evidence, in the form of statistical results, that will prove the desirability of CAL. Because of our experience in the field of CAL we have received, and continue to receive, a number of requests from would-be implementors to furnish them with statistical data, of any description, that would indicate the effectiveness of CAL. The desire for statistical evidence and the faith in its reliability are also apparent among members of funding agencies and school administrators, who are loath to become involved in CAL or to approve the allocation of funds for this new type of educational technology unless such statistical data can be provided. There appears to be a widespread belief that statistically tabulated data is the ultimate form of evaluation and that decisions as to whether or not to implement should be made on the basis of its findings. Even the detractors of CAL cite statistical criteria as a basis for their arguments. A colleague at the University of Western Ontario strongly opposed the installation of a CAL facility on the grounds that it had not yet been shown "in even a semi-scientific manner" what contribution CAL could make to the study of languages, the suggestion being that scientific proof constitutes the only reliable form of evaluation. The dependence on statistical data, and the belief in the precision and infallibility of numbers, is quite pervasive among educators today. Since the earlier part of the century, when scientific methods were introduced into the realm of education, a trend has developed towards an increasing reliance on this one form of evaluation.
The dangers of such an approach to education have been indicated recently by at least one educator, Elliot W. Eisner, in his book The Educational Imagination [1]. Commenting on the growth of scientism in education promoted by such eminent figures as Edward L. Thorndike and John Dewey, Eisner states that it has led to the present-day educational researcher who resembles nothing so much as a scientist. As he points out, and quite rightly, the validation of the scientific method has been at the expense of other forms of inquiry. Eisner criticizes this attitude when he states with heavy sarcasm that to do historical or critical analysis, to engage in philosophical inquiry is not to do research. To count is somehow better, perhaps, because counting or measuring yields numbers that can be carried to the third or fourth decimal place and hence provide the illusion of precision. Eisner does not wish, and neither do we, to totally reject statistical evidence. What one is trying to do is to redress the balance and view it in a more realistic perspective. Statistical evidence is not the only form of evaluation, nor is it necessarily the best or most accurate form. It has several limitations which restrict its reliability. Other forms of evaluation based on informed opinion, personal judgement or intuition can be very useful indicators and should not be discounted.
In the following paper, through the consideration of material available for a representative discipline, in this case second language learning, we shall discuss a variety of different types of evaluation in order to indicate their particular advantages and limitations.
STATISTICAL ANALYSES
The prospective implementor in search of statistical data proving the desirability and effectiveness of CAL in the field of second language learning (and in many other disciplines) will find that relatively few rigorous statistical analyses are at his disposal. His first problem is, therefore, that there is a dearth of data of this kind available. In addition, results of such studies are not always easily consulted, often being published in progress reports of limited circulation. The types of statistical study that are at hand fall into three categories. First there is the full-scale experiment using a control group in order to test the effectiveness of CAL. Then there is the more localized study to test a specific aspect of student performance. Finally there is the study designed to evaluate student reaction.
Statistical evaluation with control group
Information on studies of this type is quite limited in academic journals. Such evaluations are aimed at determining the effectiveness of the computer medium relative to traditional classroom instruction. Two groups are set up for the purposes of comparison, one using CAL, the other receiving no CAL. In a study conducted at SUNY (State University of New York) [2] a group of randomly selected students in elementary German received their grammar instruction and language lab training via a computer equipped with audio and slide projector peripherals. Although students had some classroom instruction, the CAL facility provided all grammar practice and also tested the student in translation, aural comprehension, and dictation. At the end of the experiment students from the control group, who had not been using CAL, and the CAL group both took the Modern Language Aptitude Test covering the four skills of reading, writing, speaking and listening. The statistical results indicated that the CAL group was better in reading and writing but weaker, although not significantly weaker, in speaking and listening comprehension (Fig. 1). In another control group study performed at Stanford University [3] a group of volunteer students were given no classroom instruction whatsoever, the latter being replaced by 5 hours of CAL a week. They did, however, maintain their traditional language lab sessions. Performance was compared with that of the control group on the basis of a series of in-house term tests consisting of the translation of sentences from English to Russian. The results indicated that CAL students performed as well as or better than the non-CAL group (Fig. 2).
Although the statistical results in both studies point to the success of CAL, due to the lack of similarities between the two experiments in terms of the courseware used, the mode of implementation, the skills developed, the areas of performance tested and the means of testing, these results remain inextricably wed to the specific set of conditions which produced them. It would ultimately be erroneous to take these statistics as proof that computer-assisted learning in second languages will meet with the same success in another institution under another set of variables.
Fig. 1. Results of CAL experiment in German at SUNY. Four panels (Reading Achievement, Writing Achievement, Speaking Achievement, Listening Achievement) plot percentile rank in group on a 0-100 scale for the CAI and ALM groups. CAI = computer assisted instruction; ALM = audio-lingual method. [Plots not reproduced.]

Localized evaluation
Other less rigorous studies of a statistical nature would seem to raise greater problems as to the significance of their results. In an experiment conducted at the University of Alberta [4] a program was devised to help drill students in elementary French grammar. Statistics were tabulated concerning the time it took students to complete the program (Fig. 3). Although the results seemed to indicate that students took less time than that which would normally be required in the classroom, if one looks behind the statistical results one discovers that the study was done in the absence of any control group. Whereas the program was devised for independent use, most students were doing the program in addition to their normal classroom work. As the author herself states, these data should be viewed with caution. Statistical results are only as reliable as the quality of the measurement tool and should not be taken at face value. Furthermore, experiments which attempt to measure only one aspect of a CAL program run the risk of producing a distorted picture. The focusing of attention on a single element of what is essentially a multi-faceted experience usually precludes the observation of the influence of other aspects that are not being measured. Because any CAL program is composed of a complex assemblage of elements, only one of which is the medium itself, it is very difficult to ascertain which is the determining factor that is causing the particular results.
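One way of looking behind the Alberta figures is to check how the two tables reproduced in Fig. 3 relate to each other. The sketch below assumes that the projected hours were obtained by multiplying the minutes taken per unit by a fixed number of units; the value of 30 units is an inference from the printed ratio and is not stated in the source, and all names are illustrative.

```python
# Group means and standard deviations as printed in Fig. 3.
minutes_per_unit = {
    "Elementary": (82.56, 24.04),
    "Junior High": (51.79, 10.07),
    "Senior High": (37.72, 7.97),
    "Adult": (33.28, 11.27),
}
projected_hours = {
    "Elementary": (41.28, 12.02),
    "Junior High": (25.89, 5.03),
    "Senior High": (18.86, 3.98),
    "Adult": (16.64, 5.63),
}

UNITS_ASSUMED = 30  # assumption inferred from the ratio, not given in the source


def project_hours(minutes, units=UNITS_ASSUMED):
    """Convert minutes per unit into hours for the whole program."""
    return minutes * units / 60


def max_discrepancy():
    """Largest gap between the rescaled minutes and the printed hours."""
    worst = 0.0
    for group, (mean_min, sd_min) in minutes_per_unit.items():
        mean_h, sd_h = projected_hours[group]
        worst = max(
            worst,
            abs(project_hours(mean_min) - mean_h),
            abs(project_hours(sd_min) - sd_h),
        )
    return worst


# True if every printed hours figure is the rescaled minutes figure, up to rounding.
print(max_discrepancy() < 0.01)
```

Under that assumption the second table adds no new measurement to the first: it is the same timing data on a different scale, and the absence of a control group applies equally to both.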
Fig. 2. Results of CAL experiment in Russian at Stanford University. [Three tables (numbered 12-14, 12-15 and 12-18 in the scan) give the error distribution for the autumn, winter and spring quarter final examinations of the Russian program: number of students by number of errors in the computer-based and regular sections, with the total number of students and the average number of errors for each section, and footnotes on enrolment changes. Table contents not recoverable.]

Means, Standard Deviations, and Ranges of the Number of Minutes Taken to Complete One Unit of Instruction by the Four Student Groups

Group          N    Mean    S.D.    Range
Elementary    16   82.56   24.04   51.11 - 134.25
Junior High   16   51.79   10.07   36.84 - 72.90
Senior High   14   37.72    7.97   25.34 - 56.00
Adult         12   33.28   11.27   20.30 - 65.62

Table I

Means, Standard Deviations, and Ranges of the Projected Number of Hours Required to Complete the Program by the Four Student Groups

Group          N    Mean    S.D.    Range
Elementary    16   41.28   12.02   25.55 - 67.12
Junior High   16   25.89    5.03   18.42 - 36.45
Senior High   14   18.86    3.98   12.67 - 28.00
Adult         12   16.64    5.63   10.15 - 32.81

Fig. 3. Results of CAL experiment in French at the University of Alberta.

Attitude evaluation
The prospective implementor will also come across statistical studies pertaining to student reaction [5]. Such information provides a very convenient format that can be perused rapidly in order to obtain a general impression concerning student attitude. The evaluation also permits a researcher to take a sounding of student reaction in areas that he feels are pertinent to learning with a computer, areas such as attitude towards the medium, lesson quality, skills acquired. What this type of evaluation does not do is allow the student to identify elements that he himself has perceived as significant. In addition, because his answer is expressed in numerical terms or as a choice from already formulated answers (Figs 4 and 5) the student is unable to qualify his response. Those of us who have undergone student evaluations of the numerical kind are well aware that this form of expression lacks the subtlety and precision of a verbal response. In attempting to reduce something essentially very complex to a unidimensional symbol one must simplify, and this process leads to the elimination of many meaningful nuances.
We hope to have shown that statistically tabulated evaluation, although useful, does have its limitations. It is not the unerring, irrefutable, indicator that some would have us believe. Its basis resides in a narrowly circumscribed set of variables, and its values are not absolute but relative to the manner in which they were obtained. Their reliability will vary directly in proportion to the quality of the measurement tool and the conditions under which this measurement took place. Under optimum conditions statistically tabulated evaluations of a CAL project can give some indication as to the possible potential of the medium but they can give no guarantee that this potential will be realized in any application other than the one in question.
OTHER FORMS OF EVALUATION
If statistically tabulated evaluation is placed in what we believe to be a more balanced perspective other forms of evaluation will emerge as valid--forms that in recent years have often been discredited merely because they do not conform to the norms of scientific measurement: evaluation based on informed opinion, personal judgement and observation based on direct experience.
Evaluation based on informed opinion
The opinions of teachers and students who have had direct experience with CAL can offer the prospective implementor valuable insights into this form of educational technology. An example of such non-scientific analysis is apparent in the comments of educators such as Fernand Marty of the University of Illinois [6], made in reference to his PLATO program for elementary French. He states that one of the basic advantages for the student using computerized materials is that he achieves higher levels of concentration for longer periods of time with a resulting increase in retention. Although his evaluation has no statistical basis, that does not mean that it must therefore be unreliable. In the case of Fernand Marty it is founded on insights gained in 35 years of teaching, 6 years of working with classes using CAL, and numerous discussions with students who have used CAL. In addition, observations culled from direct experience with the medium can provide a wealth of detail that is not found in statistical compilations. Furthermore, such evidence is usually quite abundant within the confines of a specific discipline and covers a broad spectrum of topics including the effectiveness of various types of implementation (mainline, adjunctive), the aspects of language learning best suited to the medium, the use of peripherals, effective design of courseware, as well as student performance and reaction. For the would-be implementor this type of evaluation made by a seasoned practitioner can prove invaluable in assisting him to assess the desirability of the medium for his own institution.

TABLE I
Statements about FRELEM
1 = Strongly Agree; 2 = Slightly Agree; 3 = Slightly Disagree; 4 = Strongly Disagree. N = Number of students responding to item. Entries are the percentage of those answering the question who marked each response.

1. Using FRELEM is frightening.
                     1.   2.   3.   4.   Mean
  Sem. 1  N = 89      1    4   15   80   3.74
  Sem. 2  N = 109     1    4   10   86   3.83

2. Using FRELEM is boring.
                     1.   2.   3.   4.   Mean
  Sem. 1  N = 91      2   17   41   40   3.19
  Sem. 2  N = 113     2   17   39   41   3.17

3. Using FRELEM is fun.
                     1.   2.   3.   4.   Mean
  Sem. 1  N = 93     33   54   14    0   1.83
  Sem. 2  N = 111    23   63    9    4   1.92

[5 or 8; item number illegible]. The best items in FRELEM are the... (1) scrambled sentences (2) translations (3) transformations (4) fill in blanks (5) all equal (6) don't know
                     1.   2.   3.   4.   5.   6.
  Sem. 1  N = 94      3   15    9   21   48    4
  Sem. 2  N = 114     4    9   14   25   39   10

9. I learned fundamental principles or theories. (6 = strongly agree; 1 = strongly disagree)
                     6.   5.   4.   3.   2.   1.   Mean
  Sem. 1  N = 93     25   32   31   10    1    1   4.67
  Sem. 2  N = 109    13   31   37   12    4    4   4.27

10. I acquired a basic understanding of the subject area.
                     6.   5.   4.   3.   2.   1.   Mean
  Sem. 1  N = 93     25   42   30    3    0    0   4.88
  Sem. 2  N = 111    18   48   25    6    2    1   4.71

TABLE 2
Scale runs from 6 (AGREE) to 1 (DISAGREE).

1. I became more interested in the subject.
                     6.   5.   4.   3.   2.   1.   Mean
  Sem. 1  N = 95     22   29   41    4    2    1   4.62
  Sem. 2  N = 114    [illegible]   28   45   10    3    2   4.28

2. I was motivated to do work beyond minimum requirements.
                     6.   5.   4.   3.   2.   1.   Mean
  Sem. 1  N = 94     18   33   33   15    0    1   4.51
  Sem. 2  N = 112    12   32   34   18    1    4   4.25

3. Concepts were presented in a manner that aided my learning.
                     6.   5.   4.   3.   2.   1.   Mean
  Sem. 1  N = 94     40   33   20    5    0    1   5.05
  Sem. 2  N = 113    33   35   23    5    3    1   4.88

4. My work was evaluated in ways that were meaningful to me.
                     6.   5.   4.   3.   2.   1.   Mean
  Sem. 1  N = 93     22   34   32    9    1    1   4.66
  Sem. 2  N = 108    13   39   32   12    2    1   4.44

Fig. 4. Sample of student evaluation of CAL at University of Iowa.
How satisfied were you with what you learned from the CAI course?
  very much: 17 (11); much: 47 (41); so-so: 30 (40); very little: 6 (8)

Would you say that the computer was like a private tutor?
  strongly agree: 5 (15); agree: 65 (67); disagree: 23 (13); strongly disagree: 0 (2)

How would you compare the effectiveness of CAI as opposed to exercises done in class or at home?
  much better: 16 (12); better: 39 (37); same: 32 (40); worse: 11 (14)

Did the grammar explanation at the beginning of each section reinforce what you learned in class?
  was better: 29 (7); reinforced: 67 (86); confused me: 2 (7)

What was your overall reaction to the responses from the computer when you made a mistake?
  helped recognize what was wrong: 35 (28); reinforced grammar rules: 28 (30); guided me to correct answer: 33 (31); useless: 4 (0)

The figures represent the results for each of two classes expressed in terms of percentage.

Fig. 5. Sample of student evaluation of CAL at Ohio State University.
In the same way, appraisals of a CAL facility made by its users, the students themselves, can be very useful and revealing. Student users at the University of California, Riverside [7] were asked to identify what they considered to be the advantages and disadvantages of the CAL facility for French. A sample reply identified the advantages as being that you study only what you are weak in, each exercise is different and the computer provides instant correction. On the negative side the student complained of a lack of accent characters and felt it would be advantageous to have a program that tested all the verb tenses. This type of appraisal makes no assumptions but allows the student to reveal what he has found to be beneficial or deficient in the system.
Naturally, when one is dealing with subjective evaluation there is always the risk of human error or personal bias. Yet, there is equally the potential for keen perception and dead accurate analysis. Above all it offers the possibility of evaluation over a wide range of areas that do not yield very readily to statistical analysis.
Personal judgement
Although a good number of studies of CAL projects express some form of evaluation, either verbal or statistical, there is much literature available that is purely descriptive in nature. Not only are no statistical data presented, but the practitioner does not feel inclined to make any personal assessment. He is content merely to provide detailed information about various aspects of his particular facility: the hardware configuration used, the method of implementation, the elements of his courseware design, lesson content, program logic and mode of record keeping [8]. Once again, such studies can be valuable to the prospective implementor by providing him with information about the various aspects involved in the creation and implementation of a CAL facility. However, the absence of critical comments or evaluations does mean that the implementor is obliged to assess the viability of the project and its various elements himself. The advantage of other forms of evaluation is that everything is done for him. When faced with a mere description of a project he has to rely on his own critical powers and make his own evaluation. Ultimately, this can be a highly desirable situation, for no one knows better than he the conditions that exist in his institution, the nature of the students who will be working with the computer, the uses to which it will be put, the place in the curriculum and so on. He alone can accurately judge if the installation of CAL would be appropriate for his particular situation. A thorough knowledge of the varieties of programs that have been devised, and the uses to which they have been put, should equip him to make an informed judgement about which he can feel confident.
Direct observation
Finally, the definitive evaluation for anyone contemplating the implementation of CAL, as well as those who are called upon to make decisions on the allocation of funds, should come from first-hand experience of the medium. Reliance on the experiences of others is fine to a point but beyond that personal judgement based on direct observation should be the deciding factor. Here, as with other areas of pedagogical innovation, it would be unwise to underestimate personal evaluation made in the light of a thorough knowledge of the medium, an awareness of local conditions and a reliance upon one's own pedagogical expertise.
CONCLUSION
By way of a summary let us say that, because of the tendency among present-day educators to consider statistical evidence as being the ultimate deciding factor when attempting to evaluate a new educational medium, other forms of assessment are ignored or rejected as being unreliable or unimportant. We maintain that this position is extremely short-sighted. Because of the limited number of such evaluations available, as well as the limitations inherent in this type of statistically tabulated assessment, such evidence is inadequate to provide a reliable indicator. In the area of CAL the would-be implementor, or even the school administrator, must look at other forms of evaluation that will provide the wealth of information he needs in order to arrive at a well-founded decision as to whether CAL is desirable for his institution. This will entail a movement away from blind faith in a set of figures towards a reliance on one's own judgement. Although this movement may be difficult for some, we believe that it will ultimately lead to better decision making and a more successful implementation of CAL.
REFERENCES
1. Eisner E. W., The Educational Imagination. Macmillan, New York (1979).
2. Morrison H. W. and Adams E. N., Pilot study of a CAI laboratory in German. Mod. Lang. J. 52, 279-287 (1968).
3. Holtzman W. H., Computer-Assisted Instruction, Testing, and Guidance. Harper & Row, New York (1970).
4. McEwen N., Computer-assisted instruction in second-language learning: an Alberta project. Can. Mod. Lang. Rev. 33, 333-343 (1977).
5. Hope G., Elementary French computer assisted instruction. Foreign Lang. Ann. 15, 347-353 (1982) and Taylor H. F., Students' reactions to computer assisted instruction in German. Foreign Lang. Ann. 12, 289-291 (1979).
6. Marty F., Reflections on the use of computers in second-language acquisition. Stud. Lang. Learn. 3, 25-53 (1981).
7. Decker H., Computer-aided instruction in French syntax. Mod. Lang. J. 60, 263-267 (1976). Copies of student evaluations were supplied to us by the author.
8. See for example, Olmsted H. M., Two models of computer-based drill: Teaching Russian with APL. Slav. East Eur. J. 19, 11-29 (1975); Smith P. D., $2400, a computer assisted instructional review of Basic Spanish Grammar. System 4, 182-190 (1976); Collett M. J., A tenses computer program for students of French. Mod. Lang. J. 66, 170-179 (1982).