Exploring Student-Centred Teaching,
Open-Ended Tasks, and Real Data Analysis
to Promote Students’ Reasoning about Variation
Thesis submitted for the MSc in Mathematics and Science Education
Dian Kusumawati
Research Supervisor: drs. André Heck
AMSTEL Instituut
Universiteit van Amsterdam
Science Park 904
1098 XH Amsterdam
The Netherlands
July 2010
Abstract
In this master’s thesis I report on a study in which I explored the influence of a specific approach to teaching variation on the progress and development of students’ statistical reasoning about variation. A socio-constructivist teaching and learning approach was designed and tried out in a pretest-posttest experimental-control-group research design. This was done with students of a social science stream in a secondary school in a rural area of Indonesia. The teaching approach contained three new key elements, namely, the use of (1) real data within a context instead of artificial data; (2) open-ended tasks; and (3) group work. The research results indicated that the experimental teaching approach provided students with a more conducive learning environment for developing statistical reasoning. Although students in both the experimental and the control group were mostly at a low level of reasoning, the quantitative and qualitative analysis of their responses indicated that more students in the experimental group improved their level of statistical reasoning. Qualitatively, students in the experimental group began to use central measures in making their conclusions. Regarding procedural knowledge, there was no statistically significant difference in performance between the two groups. These results, and the fact that the cooperating teacher was ready to adopt the teaching approach, have encouraged me to conclude that the chosen teaching approach has the potential to help students develop and progress in statistical reasoning about variation. Based on the analysis of the teaching experiment, recommendations for adopting the teaching approach in future practice are given.
Acknowledgement
I would like to express my deep gratitude to my supervisor, drs. André Heck, whose valuable advice, supervision, flexibility, and motivational support have enabled me to complete this thesis. Many thanks also go to dr. Mary Beth Key, who helped me get through this master’s program without much trouble.
I thank the cooperating teachers who accommodated my teaching experiment at an unusual time in their school schedule. I would also like to thank dr. Willem Jan Gerver from Maastricht University, who gave me raw data from a recent Indonesian growth survey for use in the teaching and learning activities in my study.
Not least, I thank my fellow students at the AMSTEL Institute and my friends in Amsterdam for all the support they have given me, especially Clea Matson, Lilia Ekimova, and Budi Mulyono.
Finally, by nature I tend to worry easily and to be negative about myself. This trait played tricks on me while I was finishing my master’s thesis. I wish to thank my parents, my brother, my sisters, and my dear friend Dharma for listening to my worries when self-negativity engulfed me, and for cheering me up all the way. To my mother and father: this thesis is simply dedicated to you.
Table of Contents
ABSTRACT
ACKNOWLEDGEMENT
1. INTRODUCTION
1.1. Statement of the Problem
1.2. The Indonesian Education System
1.3. Statistics in the Indonesian Mathematics Curriculum
1.4. The Purpose of the Study
2. THEORETICAL BACKGROUND
2.1. Statistical Literacy, Statistical Reasoning and Statistical Thinking
2.2. Assessment of Statistical Reasoning
2.2.1. The Structure of the Observed Learning Outcome (SOLO) Taxonomy
2.2.2. Statistical Thinking
2.2.3. Statistical Reasoning about Variation
2.3. Teaching Variation
2.3.1. Conceptions about Variability
2.3.2. Suggestions from Research Studies about Contexts
2.3.3. Principles Underpinning the Design of My Lesson Activities
3. RESEARCH DESIGN AND METHODOLOGY
3.1. Research Question
3.2. Research Setting and Research Methodology
3.2.1. The School Setting
3.2.2. Research Methodology
3.2.3. The Teaching Materials
3.3. Research Instruments
3.3.1. The Pretest and Posttest
3.3.2. The Questionnaire
3.3.3. The Interview
4. RESULTS AND ANALYSIS OF THE TEACHING EXPERIMENT
4.1. Classroom Observations Prior to the Teaching Experiment
4.2. The Teaching Experiment
Lesson 1: Activity 1
Lesson 2: Activity 1 revisited
Lesson 3: Activity 2
Lesson 4: Activity 2
4.3. The Teaching in the Control Group
4.4. The Questionnaire
The Use of Real Data
Group Work
The Teaching Approach
Students’ Free Feedback
4.5. The Interview with the teacher of the experimental group
4.6. Analysis of the Teaching Experiment
5. RESULTS AND ANALYSIS OF THE PRETEST AND POSTTEST
5.1. Subtest A: Question 1 and 2
5.2. Subtest B: Question 3 and 4
5.3. Subtest C: Question 5-10
5.4. Result of the Interview
5.5. Summary and Analysis of the Findings from Pretest and Posttest
5.5.1. Subtest A
5.5.2. Subtest B
5.5.3. Subtest C
6. CONCLUSIONS AND DISCUSSIONS
6.1. Conclusions
6.2. Limitations of my Study and Suggestions for Future Research
REFERENCES
EXTENDED BIBLIOGRAPHY
LIST OF APPENDICES
Appendix A. Garfield and Ben-Zvi’s Framework for Assessing Reasoning about Variability
Appendix B. Students’ Activity Sheets
Appendix C. Pretest/Posttest
Appendix D. The Questionnaire
1. Introduction
In this chapter, I explain the background of my research, including my motivation. Firstly, I state the background problem in general. Secondly, I describe the Indonesian education system. Thirdly, I give an overview of the Indonesian statistics curriculum at primary and secondary levels. Finally, I explain the purpose of my research. I hope that this chapter also conveys my personal aims and motivation for doing this research.
1.1. Statement of the Problem
Statistics has become part of the primary and secondary mathematics curriculum across the world, although the breadth and depth of its content differ from country to country. Since many people consider statistics to be a part of mathematics, it is no surprise that statistics teaching in school practice does not differ much from mathematics teaching. Consequently, recent reform efforts in mathematics education based on a constructivist view of education have also influenced statistics education (cf. Moore, 1997).
Recent reforms in statistics education promote the idea that teaching and learning statistics should focus on understanding statistical concepts rather than on procedural knowledge and skills. As Moore & McCabe (2005, p. xxxi) wrote: “The goal of statistics is to gain understanding from data.” Thus, students should be able to do more than compute statistical measures: what they learn should centre on understanding statistical ideas and concepts. To this end, statistics should be taught less through lectures and more through investigations of real data carried out by students. Cobb & Moore (1997, p. 801) pointed out that the role of context differs between statistics and mathematics: “Statistics requires a different kind of thinking, because data are not just numbers, they are numbers with a context”. Consequently, any interesting statistics problem requires a real-world context.
What can be said about constructivist approaches in Indonesian mathematics education? Since 2000, the Indonesian Ministry of Education has enforced a new curriculum that promotes a constructivist view of education (Badan Litbang Puskur, 2002). Student-centred teaching and learning is endorsed, and the use of ICT in education is promoted. However, the implementation of this top-down reform has not yet been successful. Teachers still rely heavily on textbooks, and their teaching still tends to emphasize formulas and procedures (Sembiring et al., 2008). In other words, rote learning and teacher-centred activities are still the dominant modes of knowledge transfer. Moreover, the in-service teacher training for this new curriculum has not been effective.
If current mathematics teaching in Indonesia still gives students the impression that mathematics is only about plugging numbers into formulas to get the correct answer to a problem, then I believe that students get this impression even more strongly about statistics, as statistics teaching is still mostly based on textbooks that mainly deal with formulas, computation and closed problems with artificial data. My belief is in line with Ben-Zvi & Garfield (2004, p. 4), who wrote: “Students equate statistics with mathematics and expect the focus to be in numbers, computations, formulas, and one right answer.”
Indeed, I still remember that, when I was a high school student, statistics meant nothing more to me than plugging numbers into formulas. I first became interested in statistics when I took a course in Regression Analysis during my bachelor’s study. Only then, when I came into contact with data analysis, did I start to see that statistics is useful in making decisions and drawing conclusions.
The problem in statistics education at Indonesian secondary schools described above motivated me to investigate, in my master’s study, a different approach to statistics teaching that helps students improve their understanding of and reasoning about statistical concepts and ideas, rather than just learning how to do statistical computations. The usual teaching sequence of (1) explaining the formulas, (2) working out examples, and (3) giving procedural problems has not been successful enough in enabling students to reason statistically at a proficient level. Exploratory data analysis by students seems to me more promising.
My personal experience as an Indonesian secondary school student and as a teacher-student in mathematics education also motivated me to try out a student-centred approach in which students analyze real data. I saw, and still see, no reason why students could not learn to draw conclusions from real data using the simple descriptive statistics they had learned before. The depth of the data analysis can be adjusted to the content of the Indonesian curriculum.
In my study, I conducted an experiment in a secondary school class in which I tried out a constructivist approach to learning about variation. I compared the results of the experimental group with those of a control group that received traditional teaching. I hoped and expected that the results of my study would lead to recommendations for current and future teachers and give them ideas about better ways of teaching the subject of variation.
1.2. The Indonesian Education System
According to Law Number 20 of 2003 (UU no. 20/2003) on the national education system, Indonesian national education consists of formal, non-formal and informal education. Formal education consists of primary education (Grades 1-9), secondary education (Grades 10-12) and higher education (see Figure 1). Primary education consists of 6 years of elementary school, or Sekolah Dasar (SD), and 3 years of lower (literally, first) secondary school, or Sekolah Menengah Pertama (SMP). It is free of charge and compulsory for every child aged 7 to 15. There are two types of secondary education: general secondary school, or Sekolah Menengah Atas (SMA), and vocational secondary school, or Sekolah Menengah Kejuruan (SMK). In SMA, there are three streams:
Natural science or Ilmu Pengetahuan Alam (IPA)
Social science or Ilmu Pengetahuan Sosial (IPS)
Language or Bahasa.
A student graduates from secondary education through a nationwide examination.
Figure 1. The education system in Indonesia. [Levels shown: Elementary School (SD, 6 years) and Lower Secondary (SMP, 3 years), together forming Primary Education; Secondary Education (SMA/SMK, 3 years); Higher Education; two National Examination points.]
As mentioned in Section 1.1, the government introduced a new curriculum in 2000 that is competency-based and promotes a constructivist, student-centred approach. In 2005, the government introduced another curriculum called Kurikulum Tingkat Satuan Pendidikan (KTSP), or Education Unit Level Curriculum (Naskah Akademik KTSP Pendidikan Dasar dan Menengah, Puskur, 2005). KTSP is basically an extension and diversification of Curriculum 2000 in the spirit of school autonomy and local government autonomy. The curriculum is still competency-based, but the central government gives every school the freedom to develop its own implementation of the curriculum, based on the potential of its students and on the social characteristics and potential of the local community. The KTSP implementation must conform to the basic structure of the formal curriculum and to the graduate competency standards set by the Ministry of Education.
Personally, I strongly agree with this curricular scheme, and I particularly like the fact that teachers have the freedom to develop their own subject curriculum, which is later compiled, together with those of all other subjects, into the school’s KTSP. I also agree that education should be tailored to the potential of the students. In practice, however, teachers and schools still have trouble designing and implementing their own curriculum. As mentioned in Section 1.1, many teachers have not yet adopted the student-centred approach of Curriculum 2000, and in the end the syllabi and KTSP of many schools are copied from other schools’ KTSP or taken from in-service teacher training events held by the government (Kajian Kebijakan Kurikulum Matematika, Puskur, 2007). Usually only better-equipped schools in the bigger cities are able to produce their own curriculum. Efforts are still needed to improve teacher professionalism so that the goal of accommodating each student’s needs can be realized. With this research study I hope to contribute to this goal with respect to teaching the statistical notion of variation.
1.3. Statistics in the Indonesian Mathematics Curriculum
The Indonesian mathematics curriculum is somewhat modular in the sense that each big mathematical concept is taught in a separate textbook chapter. Once a chapter has been completed, students are unlikely to return to the subject for a while, except to review it when it is needed as a prerequisite for later chapters. The Indonesian curriculum is also a spiral curriculum in the sense that the breadth and depth of a big concept increase at every higher level of education.
At each level of schooling (elementary, lower secondary and upper secondary), there is a textbook chapter about statistics (see Table 1). In elementary school (SD), statistics is taught in Grade 6, in the first semester, under the topic ‘Data Analysis’. In this grade, students mainly learn to analyze data in simple ways, to present data in simple graphs and tables, and to interpret them. I reviewed one government textbook (Sumanto et al., 2008), and measures of centre are indeed absent from it. However, a primary school teacher told me in personal communication that the common measures of centre, namely mode, median and mean, are in fact taught, because they usually appear in the final school examination.
In lower secondary school (SMP), statistics is taught in Grade 9, first semester, under the topic ‘Statistics and Probability’. The students learn more ways to represent data and, in addition, they learn about central measures. Moreover, probability is introduced.
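As an illustration only, the three measures of centre mentioned above can be computed with a minimal Python sketch using the standard statistics module. The six scores are a hypothetical data set invented for this example and do not come from the study. They were chosen so that the mean, median and mode all differ.

```python
import statistics

# Hypothetical test scores of six students (illustrative only, not study data).
scores = [5, 6, 6, 7, 8, 10]

mean = statistics.mean(scores)      # sum divided by count: 42 / 6
median = statistics.median(scores)  # even count, so the average of the two middle values
mode = statistics.mode(scores)      # the most frequent value

print(f"mean = {mean}, median = {median}, mode = {mode}")
```

Running the sketch prints mean = 7, median = 6.5, mode = 6. Because the number of scores is even, the median is the average of the two middle values, 6 and 7.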
Finally, in upper secondary school (SMA/SMK), statistics is taught in Grade 11 under the topic ‘Statistics and Probability’. As Table 1 shows, measures of dispersion are included in the SMA statistics content. Furthermore, the probability content is more advanced than in SMP. The three streams share the same standard statistics content, but their probability content differs: students in the natural science stream (IPA) learn more about probability. Another difference is the time allocated to this topic. For IPA students, ‘Statistics and Probability’ is only one of three topics in the first semester, whereas students in the social science stream (IPS) study only this topic for the whole semester, and students in the language stream (Bahasa) have the whole year to learn Statistics and Probability. I believe that the underlying idea is to adjust the pace of mathematics learning to each stream.
Level | Standard Competency | Explanation of Standard Competency
SD, Grade 6, Semester 1 (Data Analysis) | Collecting and analyzing data | Collect and read data; analyze and present data in table form; interpret data representations
SMP, Grade 9, Semester 1 (Statistics and Probability) | Analyzing and presenting data | Determine the mean, median, and mode; present data in tabular form, bar chart, line graph and pie chart
SMP, Grade 9, Semester 1 (Statistics and Probability) | Understanding the probability of simple events | Determine the sample space of an experiment; determine the probability of a simple event
SMA IPA, Grade 11, Semester 1 (Statistics and Probability) | Using the rules of statistics, counting rules, and properties of probability in problem solving | Read data in tabular form, bar chart, line graph, pie chart, and ogive; present data in the form of table, bar chart, line graph, pie chart, and ogive, and interpret them; compute measures of centre, location and dispersion, and interpret them; use the rules of multiplication, permutation and combination in problem solving; determine the sample space of an experiment; determine the probability of an event and interpret it (its meaning)
SMA IPS, Grade 11, Semester 1 (Statistics and Probability) | Using rules of statistics, counting rules, and properties of probability in problem solving | Read data in the form of table, bar chart, line graph, pie chart, and ogive; present data in the form of table, bar chart, line graph, pie chart, and ogive, and interpret them; compute measures of centre, location and dispersion, and interpret them; use the rules of multiplication, permutation and combination in problem solving; determine the sample space of an experiment; determine the probability of an event and interpret it (its meaning)
SMA Bahasa, Grade 11, Semester 1 (Statistics and Probability) | Analyzing, presenting, and interpreting data | Read data in the form of table, bar chart, line graph, pie chart, and ogive; present data in the form of table, bar chart, line graph, pie chart, and ogive, and interpret them; compute measures of centre, location and dispersion, and interpret them
SMA Bahasa, Grade 11, Semester 2 (Statistics and Probability) | Using counting rules to determine the probability of an event and interpret it | Use the rules of multiplication, permutation and combination in problem solving; determine the sample space of an experiment; determine the probability of an event and interpret it (its meaning)
Table 1. The standard curriculum of statistics and probability.
1.4. The Purpose of the Study
In my research, I chose one main topic in the secondary school curriculum: measures of dispersion (variation). Compared with research on central measures such as the mean, research on the notion of a measure of variation is rather limited. This is unfortunate, since variation can be considered one of the basic concepts of statistical thinking (Cobb & Moore, 1997, p. 801).
I conducted a small teaching experiment about the notion of variation and the measure of dispersion (specifically the standard deviation) in a class of IPS students at one school. Another class of IPS students at the same school acted as a control group, in which the usual teacher-centred approach was applied. My research was in essence a case study, whose results depended strongly on the characteristics of the students in the particular school where the experiment took place. This implies that, at this stage of the research, the results cannot easily be generalized. I designed the study so that the teaching experiment can be repeated in other regular SMA classes in Indonesian schools, in order to obtain more general results in the future.
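A short Python sketch can illustrate why a measure of dispersion is needed alongside a measure of centre. The two groups below are hypothetical and invented only for this example. They have the same mean but differ in spread, which the range and the standard deviation detect and the mean does not. The sketch assumes the population standard deviation (divisor n), computed by hand with a helper called population_sd (a name introduced here) and checked against statistics.pstdev. The curriculum documents cited here do not say which convention is taught; statistics.stdev would divide by n − 1 instead.

```python
import math
import statistics

def population_sd(data):
    """Standard deviation with divisor n: sqrt(sum((x - mean)^2) / n)."""
    m = sum(data) / len(data)
    squared_deviations = [(x - m) ** 2 for x in data]
    return math.sqrt(sum(squared_deviations) / len(data))

# Two hypothetical groups (illustrative only): same mean, different spread.
group_a = [6, 6, 8, 8]
group_b = [4, 4, 10, 10]

for name, data in [("A", group_a), ("B", group_b)]:
    data_range = max(data) - min(data)
    sd = population_sd(data)
    # statistics.pstdev also divides by n, so both values should agree.
    assert math.isclose(sd, statistics.pstdev(data))
    print(f"group {name}: mean = {statistics.mean(data)}, range = {data_range}, SD = {sd:.2f}")
```

Running it prints "group A: mean = 7, range = 2, SD = 1.00" and "group B: mean = 7, range = 6, SD = 3.00". The mean cannot tell the two groups apart, but both measures of dispersion can.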
In my research, I investigated and compared students’ statistical reasoning about variation before and after teaching the topic of variation and measures of dispersion. Another objective was that the results would lead to recommendations for mathematics teachers regarding statistics teaching and learning, and would hopefully have a positive effect on statistics teaching in Indonesia.
2. Theoretical Background
In my study, I wanted to investigate to what extent my teaching and learning approach affected students’ statistical reasoning, especially reasoning about variation. In this chapter, I present definitions of statistical thinking and reasoning from the research literature, together with the assessment framework that I used in my research. I also summarize suggestions from the research literature about teaching statistics, and about teaching variation in particular, that I took into account in my study.
2.1. Statistical Literacy, Statistical Reasoning and Statistical Thinking
In the last two decades, research in statistics education has shifted its focus from misconceptions about statistical concepts and ideas to how students learn and reason about statistical concepts (Shaughnessy, 2007). Many researchers in the field of statistics education organize this research around three big ideas: statistical literacy, statistical reasoning, and statistical thinking. There are as yet no formally agreed definitions of these three ideas. Ben-Zvi & Garfield (2004, p. 7) defined them as follows:
“Statistical literacy includes basic and important skills that may be used in understanding statistical information or research results. These skills include being able to organize data, construct and display tables, and work with different representations of data. Statistical literacy also includes an understanding of the concepts, vocabulary, and symbols, and includes an understanding of probability as a measure of uncertainty.
Statistical reasoning may be defined as the way people reason with statistical ideas and make sense of statistical information. This involves making interpretation based on sets of data, representation of data, or statistical summary of data. Statistical reasoning may involve connecting one concept and another (e.g., center and spread), or it may combine ideas about data and chance. Reasoning means understanding and being able to explain statistical processes and being able to fully interpret statistical results.
Statistical thinking involves an understanding of why and how statistical investigations are conducted and the ‘big ideas’ that underlie statistical investigations. These ideas include the omnipresence nature of variation and when and how to use appropriate methods of data analysis such as numerical summaries and visual display of data. Statistical thinking involves an understanding of the natures of sampling, how we make inferences from samples to populations, and why design experiments are needed in order to establish causation…”
Chance (2002) reviewed the literature on the definition of statistical thinking and concluded:
“Perhaps what is unique to statistical thinking, beyond reasoning and literacy, is the ability to see the process as a whole (with iteration), including ‘why,’ to understand the relationship and meaning of variation in this process, to have the ability to explore data in ways beyond what has been prescribed in texts, and to generate new questions beyond those asked by the principal investigator. While literacy can be narrowly viewed as understanding and interpreting statistical information presented, for example in the media, and reasoning can be narrowly viewed as working through the tools and concepts learned in the course, the statistical thinker is able to move beyond what is taught in the course, to spontaneously question and investigate the issues and data involved in a specific context.”
Mooney (2002) adopted the definition of statistical thinking given by Shaughnessy, Garfield, and Greer (1996), in which statistical thinking means the cognitive actions that students engage in during the data-handling processes of describing, organizing and reducing, representing, and analyzing and interpreting data.
From the above definitions, it seems to me that statistical literacy, reasoning and thinking are not mutually exclusive. By statistical reasoning about variation, I mean the way people reason with variation (as Ben-Zvi and Garfield defined this) and the way people make use of the concept of variation to investigate issues and data (as Chance defined statistical thinking). However, considering Chance’s (2002) review more closely and reading Ben-Zvi and Garfield’s (2004) definition of statistical thinking carefully led me to conclude that the students in my research project were hardly involved in this type of activity: they did not have to learn about or carry out a statistical inquiry. In my teaching experiment I engaged students in data exploration activities that mainly involved statistical literacy and statistical reasoning in the sense of Ben-Zvi and Garfield (2004), two ideas which, in my view, together constitute Mooney’s (2002) definition of statistical thinking. To avoid terminological confusion, in this thesis I adopt the distinction between literacy and reasoning; when I refer to Mooney’s statistical thinking, I understand it mostly as statistical reasoning.
2.2. Assessment of Statistical Reasoning
Because I wanted to investigate whether my teaching and learning approach would improve students’ reasoning, I searched the literature for an appropriate assessment tool, preferably one suitable for classroom practice (i.e., usable beyond small-group or laboratory settings, and easy for teachers to use in their practice). The following subsections describe the frameworks for assessing statistical thinking and reasoning that I selected from the existing literature.
2.2.1. The Structure of the Observed Learning Outcome (SOLO) Taxonomy
The SOLO taxonomy is a neo-Piagetian framework proposed by Biggs and Collis (1982) to
analyze the complexity level at which students carry out tasks and answer questions. The
SOLO taxonomy was not specifically designed for statistics or mathematics, but I discovered that many researchers in statistics education use it for characterizing and assessing students’ statistical thinking and reasoning. For this reason I briefly review the SOLO taxonomy here. Further details about the general model can be found in Biggs & Collis (1982); for its application in statistics education I refer to Jones et al. (2004) and Shaughnessy et al. (2007), and references therein.
The SOLO taxonomy posits five modes of functioning (similar to Piaget’s developmental stages: preoperational, early concrete, middle concrete, concrete generalization, and formal operations) and five hierarchical levels of complexity at which tasks can in principle be carried out (prestructural, unistructural, multistructural, relational, and extended abstract). The developmental stage mainly sets the upper limit for the cognitive level that can be reached, but this does not mean that lower levels of complexity can no longer be observed at that stage.
The five levels of complexity of students’ responses to tasks, usually referred to as the SOLO levels, are as follows. At the prestructural level, a student avoids the question (denial), repeats the question (tautology), or engages in the task but is distracted or misled by irrelevant aspects belonging to an earlier mode of functioning. At the unistructural level, the student focuses on the relevant domain and picks up on one relevant aspect of the task, thereby running the risk of reaching only a limited conclusion or a dogmatic answer. At the multistructural level, the student picks up several disjoint, relevant aspects of the task but does not integrate them, and ignores inconsistencies or conflicts in the provided information. At the relational level, the student integrates the various aspects and produces a more coherent understanding of the task. At the extended abstract level, the student recognizes that a given example is an instance of a more general case, that is, (s)he generalizes the structure to take in new and more abstract features that represent thinking in a higher mode of functioning. As Biggs and Collis already noted, only the first four levels are encountered up to and including secondary education; the extended abstract level is hardly ever observed in classroom practice.
Biggs & Collis (1982) described certain crucial characteristics of each SOLO level in terms of three dimensions: capacity (the required amount of working memory or attention span), relating operation (the relation between cue and response), and consistency and closure (the absence of contradictions in the final conclusion). See Table 2, taken from Biggs & Collis (1982, pp. 24-25).
Developmental Base Stage with Minimal Age | SOLO Description | Capacity | Relating Operation | Consistency and Closure
Extended Abstract (16+ years) | Extended Abstract | Maximal: cue + relevant data + interrelations + hypotheses | Deduction and induction. Can generalize to situations not experienced | Inconsistencies resolved. No felt need to give closed decisions; conclusions held open, or qualified to allow logically possible alternatives.
Concrete generalization (13-15 years) | Relational | High: cue + relevant data + interrelations | Induction. Can generalize within given or experienced context using related aspects | No inconsistency within the given system, but since closure is unique, inconsistencies may occur when he goes outside the system.
Middle concrete (10-12 years) | Multistructural | Medium: cue + isolated relevant data | Can “generalize” only in terms of a few limited and independent aspects | Although has feeling for consistency, can be inconsistent because closes too soon on basis of isolated fixations on data, and so can come to different conclusions with the same data.
Early concrete (7-9 years) | Unistructural | Low: cue + one relevant datum | Can “generalize” only in terms of one aspect | No felt need for consistency, thus closes too quickly; jumps to conclusions on one aspect, and so can be very inconsistent.
Pre-operational (4-6 years) | Prestructural | Minimal: cue and response confused | Denial, tautology, transduction. Bound to specifics | No felt need for consistency. Closes without even seeing the problem.
Table 2. Base stage of cognitive development and response description according to the SOLO Taxonomy (note that the SOLO description in the 2nd column refers to the maximum level at the given developmental stage in the corresponding entry in the 1st column).
2.2.2. Statistical Thinking
There are several studies whose main goal was to develop a framework for characterizing and assessing statistical thinking (in the sense of Mooney, 2002). Below, I discuss three of them.
Jones et al. (2000) developed a framework, situated in the SOLO taxonomy, for characterizing elementary school children’s statistical thinking. They focused on data handling and used the following four constructs in their framework: (1) describing; (2) organizing and reducing; (3) representing; and (4) analyzing and interpreting data. To characterize students’ statistical thinking in each of these constructs, they used four levels corresponding to the first four levels of the SOLO taxonomy:
1) Idiosyncratic: idiosyncratic students are engaged in a task but they are easily distracted or
misled by irrelevant aspects;
2) Transitional : students focus on a single relevant aspect of a task;
3) Quantitative: students can focus on multiple relevant aspects of the task but have
problems in integrating them;
4) Analytical: students are able to make links between different aspects of the task
(demonstrate relational level of thinking).
Jones et al. (2000) conducted their study by analyzing the interviews of sixteen students
(grade 2-5) who responded to several data handling tasks with questions for every construct.
Statistics concepts like average, spread were probed at elementary level and the way children
would work with basic data displays like bar graphs.
Mooney (2002) used the framework of Jones et al. (2000) as his initial framework to assess middle school students’ statistical thinking in data handling tasks and extended it with a fifth level of statistical thinking: Extended Analytical, meaning that students can examine data from more than one perspective. However, Mooney did not find data supporting the existence of this fifth level among middle school students and thus used only the four levels of statistical thinking above in his results. Mooney’s (2002) final framework, which he applied to the statistical concepts of measures of centre and spread, is reproduced in Figure 2. In this study, I took an eclectic approach and selected suitable parts of Mooney’s framework to assess my students’ statistical reasoning (see Chapter 4).
Figure 2. Mooney’s framework of middle school students’ statistical thinking.
Groth (2003) sought a framework for describing high school students’ statistical thinking with respect to describing, organizing and reducing, representing, analyzing, and collecting data. Groth conducted a qualitative study to identify characteristics or patterns for the four constructs that were used by Mooney (2002) and Jones et al. (2000). He developed a set of statistical thinking tasks and used it in structured, task-based clinical interview sessions with high school students and recent high school graduates. Students were asked to solve these tasks, and their responses were then analyzed, using the SOLO taxonomy, to define patterns of responses to questions about the processes of data handling. In my study, I used a modification of Question 8 from his fifth task (Question 7 in my pretest; see Subsection 3.3.1), which was part of a set of questions Groth used to probe students’ understanding of summarizing data through a measure of centre and a measure of spread. The pattern descriptors for using measures of centre and spread that Groth (2003, pp. 85, 90) identified are listed in Table 3.
Four Pattern Descriptors for Using Measures of Centre
A student uses:
1. reasonable formal measures to locate centres of data sets
2. a combination of reasonable formal and visual measures to locate centres of data sets.
3. a combination of formal and visual measures to find centres of data sets, only some of which are reasonable for the given set of data
4. only visual approaches to find centres of data sets, only some of which are reasonable for
the given sets of data.
Three Pattern Descriptors for Using Measures of Spread
A student gives:
1. quantifications and subjective verbal descriptions of spread that are suitable for given sets of data.
2. quantifications and subjective verbal descriptions of spread. Some descriptions or quantifications are not suitable for given sets of data
3. subjective verbal descriptions of spread rather than quantifications
Table 3. Groth’s pattern descriptors for using measures of centre and spread.
2.2.3. Statistical Reasoning about Variation
The terms ‘variation’ and ‘variability’ are often used interchangeably in the research literature (cf. Shaughnessy, 2007; Reading & Shaughnessy, 2004). However, Reading & Shaughnessy (2004, p. 202) made the following distinction between the two terms:
“The term variability will be taken to mean the characteristics of the entity that is
observable, and the term variation to mean the describing or measuring that
characteristics. Consequently, … ‘reasoning about variation’ will deal with the
cognitive process involved in describing the observed phenomena in situations that
exhibit variability, or the propensity to change.”
In my study, I adopted this definition of statistical reasoning about variation by Reading & Shaughnessy (2004). Note, however, that the Indonesian language has no word for variation in this sense; only words such as variability, diversity and variety exist in everyday speech. In mathematics textbooks the term variability is also used for variation (in the sense of Reading & Shaughnessy). Therefore I may sometimes use the two terms interchangeably.
In this subsection I briefly discuss the literature on reasoning about variation. To begin with, Watson et al. (2003) conducted a study to measure understanding of variation in a chance setting. They gave questionnaires to students in grades 3, 5, 7, and 9 and, from an analysis initially based on the SOLO taxonomy, defined four ability levels of students’ understanding of variation:
1) prerequisites for variation: working out the environment, table/graph reading,
intuitive reasoning for chance;
2) partial recognition of variation: putting ideas in context, tendency to focus on single
aspects and neglect others;
3) applications of variation: consolidating and using ideas in context, inconsistent in
picking salient features;
4) critical aspects of variation: employing complex justification or critical reasoning.
Watson et al. (2003) did not give clear descriptors for each of these levels, but it seemed to me that these four levels are equivalent to Mooney’s four levels of statistical thinking (for example, compare Mooney’s transitional level with Watson et al.’s level of partial recognition of variation). My teaching design also concerned measures of variation (range, interquartile range, average deviation, and standard deviation), but I wanted to investigate students’ reasoning about variation without a specific connection to chance processes or sampling. The above descriptors of students’ understanding of variation are nevertheless of interest to me, but they are too general to apply fruitfully in my research. As mentioned in Subsection 2.2.2, I used part of Mooney’s framework for assessing students’ reasoning about variation (see Chapter 4).
In addition to the above results of Watson and coworkers, Garfield & Ben-Zvi (2005,
pp. 93-95) identified seven areas of knowledge of variability and seven corresponding assess-
ment areas. The seven areas of knowledge are:
1) developing intuitive ideas of variability;
2) describing and representing variability;
3) using variability to make comparisons;
4) recognizing variability in special types of distributions;
5) identifying patterns of variability in fitting models;
6) using variability to predict random samples or outcomes;
7) considering variability as part of statistical thinking.
I used these areas of knowledge as guidance in developing my assessment test. However, Areas 4-7 are not relevant to my study, which was situated in the Indonesian curriculum. Thus I only used the first three areas of this framework. For completeness and the possible interest of readers, the assessment items for all areas are presented in Appendix A.
2.3. Teaching Variation
In this section I present what I consider to be the most relevant research literature about teaching variation. This includes a summary of suggestions made in various studies about teaching variation in different contexts. At the end of this section I list the principles that I incorporated in the design of my activities.
2.3.1. Conceptions about Variability
Shaughnessy (2007) summarized students‟ conceptions of variability into eight types:
(1) variability in particular values, including extremes or outliers;
(2) variability as change over time;
(3) variability as whole range (the spread) of all possible values;
(4) variability as the likely range of a sample;
(5) variability as distance or difference from some fixed point;
(6) variability as the sum of residuals;
(7) variation as covariation or association;
(8) variation as distribution.
These conceptions resemble the areas of knowledge about variability in the framework of Garfield & Ben-Zvi (2005), which was discussed in Subsection 2.2.3. In my study I focused on helping the students develop the conceptions of variability of types 1, 3, 5, and 8.
2.3.2. Suggestions from Research Studies about Contexts
In the statistics education literature one can find several contexts for teaching variability. Garfield (2008, pp. 205-206) reviewed the contexts used in mostly exploratory and qualitative research, including the following:
• measurement and natural context: students investigate the variation in the height of plants through measurement activities and by comparing distributions of plants;
• a ‘growing sample’: students reason about what happens to graphs as the sample size gets larger;
• measurement of minutes per day spent on various activities, e.g., time spent on studying or talking on the phone: students make conjectures and reason informally about the distribution;
• variability in data;
• bivariate relationships;
• comparing variability within and between data sets;
• standard deviation and histograms: students explore the concept of standard deviation by comparing histograms;
• probability contexts;
• sampling contexts.
As mentioned before, I decided not to use probability and sampling contexts in my study. The statistics unit selected for my study focused on descriptive statistics, and thus the context ‘bivariate relationships’, for example, was not suitable. The context ‘comparing data’ is one I wanted to use, though not as the first introduction to variability. In comparing data sets, there are at least two types of comparisons:
1. reading between data sets;
2. reading beyond data sets.
Curcio (1989, p. 384) explained that reading between the data involves making comparisons and using mathematical concepts and skills, whereas reading beyond the data involves making extensions, predictions and inferences. As reasoning about variation between data sets requires a higher level of statistical thinking than reasoning about variation within a group (Mooney, 2002; Jones et al., 2000), I decided to use the context ‘comparing variability within data sets’, meaning analyzing one data set, to introduce measures of dispersion, and only thereafter to use ‘comparing variability between data sets’.
2.3.3. Principles Underpinning the Design of My Lesson Activities
I originally planned to use the following principles in designing my activities:
1. Use ICT to shift students’ computational efforts into reasoning efforts.
The use of ICT enables the use of real and large data sets (Reading & Shaughnessy, 2004, p. 223) and helps students to visualize and explore data (Garfield & Ben-Zvi, 2007). I considered a statistical software package, or simply a calculator, to ease the computational efforts of my students. Note, however, that the use of ICT in statistics education is not the main focus of my research.
2. Foster the students’ integration of the concepts of central measures and variability during data exploration (Reading & Shaughnessy, 2004, p. 223; Shaughnessy, 2007, p. 1002).
This is particularly important in my setting because textbooks in the Indonesian curriculum treat four statistical concepts separately. Furthermore, the shape of a distribution should be included in the same lesson package as centre and spread, because “a brief description of a distribution should include its shape and numbers describing its center and spread” (Moore & McCabe, 2005, p. 40).
3. Discuss variability in different contexts and with different questions (Reading & Shaughnessy, 2004, p. 223).
4. Use comparisons of real data to reason about variability (see, for example, Garfield & Ben-Zvi, 2005; Konold & Pollatsek, 2002; Shaughnessy, 2007).
5. Link the following two kinds of measures of variation: range-based measures (range and interquartile range) and deviation from the mean (average deviation and standard deviation).
6. Combine student-centred teaching and lecture-based teaching.
The main approach of my teaching design was constructivist. However, I combined it with a typical (but short) lecture plus exercises at the end, to make sure that the formal measures were discussed.
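As an illustration of principle 5 only (the lessons themselves used basic calculators, not software), the following minimal Python sketch computes both kinds of measures side by side for the seven values of pretest Question 3. Two conventions are assumptions on my part: quartiles are taken with Python's `statistics.quantiles` using the "exclusive" method, which for seven values coincides with taking the median of each half, and the standard deviation divides by n (population formula). Textbooks differ on both conventions.

```python
import statistics


def variation_measures(data):
    """Compute range-based and deviation-from-the-mean measures of variation."""
    values = sorted(data)
    mean = statistics.fmean(values)

    # Range-based measures: spread of the whole data set and of its middle half.
    data_range = values[-1] - values[0]
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1

    # Deviation-from-the-mean measures.
    average_deviation = statistics.fmean(abs(x - mean) for x in values)
    standard_deviation = statistics.pstdev(values)  # divides by n, not n - 1

    return {
        "mean": mean,
        "range": data_range,
        "interquartile range": iqr,
        "average deviation": average_deviation,
        "standard deviation": standard_deviation,
    }


measures = variation_measures([11, 32, 17, 34, 24, 15, 28])
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
# mean: 23.00
# range: 23.00
# interquartile range: 17.00
# average deviation: 7.43
# standard deviation: 8.21
```

Both kinds of measure start from the same sorted data, which is one way to make the link between them visible. A sample standard deviation (dividing by n − 1, `statistics.stdev`) would give about 8.87 for these data.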
3. Research Design and Methodology
In this chapter I explain my research question, research instruments and the methods I applied in data collection and data analysis.
3.1. Research Question
The traditional approach to teaching statistics in Indonesia usually combines lecturing on statistical ideas with practising computational and procedural problems. There are exercises that ask for interpretation, but the expected answers are superficial. This makes it possible, and even plausible, that students successfully rely on memorizing an acceptable answer, e.g., the meaning of a high standard deviation. This leads to a limited understanding of statistical concepts. In this study, I designed a short teaching experiment about variation, tried it out in a secondary school and then investigated how it helped the students to develop statistical reasoning about variation. One goal was to check the effectiveness of the approach; in other words, I wanted to answer the following question:
Research Question
To what extent did the student-centred teaching of variation using real data
and open-ended tasks help to improve Indonesian social science stream (IPS)
students’ reasoning about variation?
The teaching approach has three new characteristics:
• student-centredness;
• use of real data;
• use of open-ended tasks and group work.
Moreover, the teaching was intended to be social constructivist in nature, because in solving the open-ended tasks students worked in groups and their work was discussed in the classroom. These components are rarely employed in statistics and mathematics education in Indonesia.
I hypothesized that the student-centred teaching and learning approach would help students reason better in statistics than the traditional approach of using artificial data and closed tasks. I investigated whether this hypothesis held for the IPS students in my case study, who mostly had motivational problems and had not performed very well in mathematics. From this investigation, I expected to arrive at recommendations for a teaching approach in statistics education in Indonesia. Therefore my reflection question was the following:
Reflection Question
What recommendations for the teaching of measures of variation in the Indonesian
secondary school curriculum followed from the teaching experiment?
3.2. Research Setting and Research Methodology
To answer the research question, I conducted a study in a secondary school classroom in a rural area of Indonesia with students of the social science stream (IPS). A pretest-posttest experimental-control group design was used to investigate the improvement in the students’ statistical reasoning resulting from the treatment. The original plan was that, of two parallel classes, one class would be randomly selected as the experimental group and the other would become the control group. In the end, however, I had to accept that one teacher did not want to teach the control group. Therefore, this teacher’s class was set as the experimental group and the other teacher’s class as the control group. I taught the students of the experimental group with the new approach, and the regular teacher taught the control group following the regular program without my intervention. Before and after the teaching, pre- and posttests were given to both classes. In the following, I explain the school setting and describe in chronological order the methods that I used for data collection and data analysis.
3.2.1. The School Setting
The research was conducted in a public secondary school, Sekolah Menengah Atas Negeri (SMAN) No. 1 Lebong Tengah, Lebong, Bengkulu, Indonesia. The school had two parallel grade 11 social science stream (IPS) classes, each with 40 students and each taught by a different teacher. These two classes took part in the research.
The school can be considered one of the better schools in Lebong, although compared with schools in Bengkulu city, the capital of the province, it would be considered weak because its teaching and learning facilities are meagre. The computer lab does not have enough computers for 40 students, the usual number of students per class, and thus the subject ‘Information and Technology’ is commonly taught and learned only in theory. There is a science lab, but in practice it can be considered non-existent due to insufficient facilities.
Lebong is a mountainous and not very prosperous agricultural area. It is a relatively new district and thus still faces a shortage of teachers, especially mathematics teachers. I observed that students entering this secondary school apparently had not received good prior mathematics education, as their mathematics skills were below what graduates of lower secondary school should have mastered.
3.2.2. Research Methodology
Below I summarize the methods that I applied for data collection and data analysis.
Collecting Data
a. Classroom observations prior to the teaching experiment
The classes were taught by different teachers. In this phase, I talked to both teachers about their students and checked whether the students of the two classes were comparable. I observed the two parallel classes before I started teaching with the new approach. The aim was to confirm that both classes were taught with the traditional approach and to ensure that the prerequisite knowledge had been taught. The observations also gave me an impression of the students’ attitudes and performance. Furthermore, I collected the students’ semester reports from the previous semester so that I could later compare their ability in mathematics and language with an independent samples t-test.
b. The teaching experiment and the lessons of the control group
I taught the experimental group and its regular teacher acted as an observer. The control group was taught by its regular teacher and I acted as an observer. Both groups had the same amount of time (4 lessons). I asked the regular teacher of the experimental group to observe my teaching and to write an observation note after each lesson. I also talked with the teacher after each lesson about how, in each of our views, it had gone. However, the teacher did not have time to write her observation notes, so I used our talks and a final interview as my data.
c. Pretest and posttest
Students were given a pretest before the teacher of the control group and I started to teach variation. The same test was given as a posttest after the teaching was completed. The idea was that this would give information about the improvement of the students in both groups.
d. Administering the questionnaire
e. Conducting the interviews
Analyzing Data
a. Checking the comparability of the experimental and control group
I used a two-tailed independent samples t-test on the students’ mathematics and language marks from their semester reports to check whether there was any indication that the ability levels of the two participating groups differed, or whether they could indeed be considered groups of students with similar competencies.
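A check of this kind can be sketched in a few lines. The minimal Python sketch below is not the Minitab 15 procedure used in the study; it computes the pooled-variance (equal-variances) t statistic for two independent samples, which is one common form of the independent samples t-test. The function name `independent_t` and the marks in the example are hypothetical. For a two-tailed decision, |t| is compared with the critical t value for the returned degrees of freedom; if SciPy is available, `scipy.stats.ttest_ind(a, b)` computes the same pooled statistic together with a two-sided p-value.

```python
import math
import statistics


def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    # Sample variances (divide by n - 1), combined into one pooled estimate.
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_variance = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df


# Hypothetical semester marks, for illustration only (not the thesis data).
experimental_marks = [72, 68, 75, 80, 66, 70]
control_marks = [70, 71, 74, 78, 65, 69]
t, df = independent_t(experimental_marks, control_marks)
# Two-tailed test: a small |t| relative to the critical value for df
# gives no indication that the groups differ in ability.
print(f"t = {t:.3f}, df = {df}")
```

The pooled form assumes roughly equal variances in the two classes; with two classes of 40 students, df would be 78.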
b. Investigating students’ statistical reasoning
I analyzed the results from the pre- and posttests and the interviews. I examined the written reasoning of all students in the experimental group to get an overview of their responses. I used this overview together with the theoretical framework to create descriptors of the categories of reasoning identified in students’ answers. The same descriptors were used for the written responses of the students in the control group.
c. Checking students’ improvement in statistical reasoning
I analyzed quantitatively whether there was an indication of improvement from the pretest to the posttest in the two groups and compared the improvement between the two groups. Where possible, I used a t-test in the Minitab 15 software to analyze the improvement in reasoning quantitatively.
d. Analyzing all data to answer the research question
The class observations, the videotape of the teaching, the talks with the teachers, and the students’ questionnaires and interviews were analyzed to illuminate the findings from the pre- and posttest and to answer the research question. Together with the journal I kept of my own teaching experiment, these data helped me reflect on the strengths and weaknesses of my teaching approach and formulate suggestions and recommendations for future statistics teaching in Indonesian secondary school classrooms.
3.2.3. The Teaching Materials
In accordance with the research-based principles I had chosen (see Subsection 2.3.3), I designed a teaching experiment consisting of three activities, each 90 minutes long. I selected human growth as the first learning context and used the original, raw data set of 434 subjects in Jakarta collected in a recent growth study in Indonesia (Batubara et al., 2006). For the teaching experiment, this raw data set was reduced to the data of 20 boys and 20 girls aged 15-16 years. The context of human growth was used in the first two lessons.
In the first activity, students were given the reduced data of the measured height and weight of these 20 boys and 20 girls of the same age (see Appendix B). Given this data set, the students’ tasks were to:
• make histograms of the data and describe how the data were distributed;
• develop a rule, and explain it, to decide whether a child’s height and weight were very common, normal but in need of attention, or abnormal;
• make a visual aid that makes the rule easy to use and to explain to other people.
Students were to work together in groups of four, and, as mentioned in Subsection 2.3.3, I encouraged the use of a basic (i.e., not scientific1) calculator to save computational time during students’ data exploration. The aim of this activity was to provide a rich context for students to think intuitively about deviation from the mean. Most people, including children, have conceptions and/or preferences about what height and weight they consider perfect. Thus I thought this context could lead them to define an acceptable range of deviation from the perfect height and weight they have in mind when deciding what height and weight are normal or abnormal. The concept of deviation from the mean was later discussed and formalized as standard deviation in a whole-class discussion.
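One way a rule of this kind could be formalized is to measure each child's deviation from the mean in units of the standard deviation. The minimal Python sketch below assumes cut-offs of 1 and 2 standard deviations and uses made-up heights; neither the cut-offs nor the data come from the lessons or from Batubara et al. (2006), and the students' own rules need not take this form. The function name `classify` is hypothetical.

```python
import statistics


def classify(value, data, attention_sd=1.0, abnormal_sd=2.0):
    """Label a value by how many standard deviations it lies from the mean of data.

    The cut-offs of 1 and 2 standard deviations are illustrative assumptions.
    """
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)  # population standard deviation
    distance = abs(value - mean) / sd  # deviation from the mean in SD units
    if distance <= attention_sd:
        return "very common"
    if distance <= abnormal_sd:
        return "normal but needs attention"
    return "abnormal"


# Hypothetical heights in cm, for illustration only.
heights = [150, 155, 160, 165, 170]
for h in (160, 171, 180):
    print(h, classify(h, heights))
```

Expressing the rule in standard-deviation units is what connects the students' intuitive "range of deviation from the perfect height" to the formal measure discussed in class.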
The second lesson consisted of two parts: 2a and 2b. In the first part, I provided a histogram of boys’ heights, and the task was to compute the mean and standard deviation of the boys’ heights based on the histogram. The students had learned how to approximate the mean from grouped data prior to my intervention, and I assumed that they would be able to find the mean. As for the standard deviation, the students and I would already have discussed this formal measure in the first lesson, so I expected that students could transfer the formula for single-valued data to grouped data (i.e., to a histogram). If they could not, I would help them, but the experience of trying would presumably still help them to understand the concept.
In part 2b, as planned (see Subsection 2.3.2), after comparing within groups, I asked students to compare heights between two groups: boys in Jakarta and boys in Bengkulu. Jakarta is the capital of Indonesia and Bengkulu is the province in which I conducted my study. Given two histograms with the same range and bar width, students were first asked to guess which group had the larger standard deviation, without doing any computation. Afterwards, they were asked to compute the value of the standard deviation (the mean was given) and to check their previous response. The main task for students was to answer the following question:
‘Can you conclude that boys in Jakarta are taller than boys in
Bengkulu? Explain your reasons.’
I wanted to check whether students applied some understanding of standard deviation in comparing the two groups of boys.
1 It is unlikely anyway for these students to own scientific calculators (see Subsection 3.2.1).
In the previous two activities, the mean as a central measure and the standard deviation as a measure of variation were the focus of the lesson activities. In the third activity, I also focused on the median and interquartile range. The students were to compare data about the hours that they spent on several activities at home. These data were collected prior to the lesson: for this purpose I had distributed a questionnaire (see Appendix B), and after its return I had summarized the data so that they could be used in the third lesson activity and compared with data that I had collected earlier at another school in Bengkulu city. The students were to analyze these summarized data. Their task was to answer the following question:
“Who spends more time on activity X: students from Bengkulu city or
Lebong?2 Explain your reasons!”
There were data about hours spent on nine activities. The students chose one of these activities as activity X to work on during the lesson. I had checked beforehand that for this kind of data, the median and interquartile range were more appropriate measures than the mean and standard deviation.
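The minimal Python sketch below, with made-up hours (not the questionnaire data), illustrates why the median and interquartile range can be more appropriate when data of this kind are skewed by a few large values. The function name `summarize` is hypothetical, and the quartiles use Python's `statistics.quantiles` with the "exclusive" method, which is one of several conventions.

```python
import statistics


def summarize(hours):
    """Mean, median and interquartile range of a list of hours."""
    q1, _, q3 = statistics.quantiles(hours, n=4, method="exclusive")
    return {
        "mean": statistics.fmean(hours),
        "median": statistics.median(hours),
        "interquartile range": q3 - q1,
    }


# Hypothetical hours per week spent on one activity, for illustration only.
bengkulu_city = [1, 1, 2, 2, 2, 3, 3, 4, 10, 12]
lebong = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
for name, hours in (("Bengkulu city", bengkulu_city), ("Lebong", lebong)):
    print(name, summarize(hours))
```

With these made-up numbers the means point one way (4.0 hours for Bengkulu city against 3.2 for Lebong) and the medians the other (2.5 against 3.0), because two large values pull the Bengkulu city mean upward. In such a situation, the median and interquartile range describe the typical student more faithfully.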
In summary, through all the activities, I aimed to help students learn to understand and reason about the use of central measures and measures of variation in data analysis. In the lesson activities, the students attempted to analyze real data and draw conclusions in two contexts to which they could relate. I hoped that this experience of analyzing real data would promote understanding of measures of variation and foster statistical reasoning about variation. Since the aim of my study is to check this hypothesis, I consider the teaching materials in this thesis as part of my research setting.
3.3. Research Instruments
There were four main research instruments that I used to answer my research question and reflection question: the pre- and posttest, a questionnaire, interviews with the regular teacher, and interviews with some students of the experimental group.
3.3.1. The Pretest and Posttest
The pretest and posttest consisted of the same problem set in the same order (see Appendix C). The test consisted of two problems on the understanding of the term ‘variation’ (referred to as Subtest A), two computational problems that emphasize computational skills (Subtest B), and six more open-ended problems that emphasize reasoning skills (Subtest C). Of the ten questions, only the two computational problems (Subtest B) explicitly asked students to find a measure of variation, mainly the standard deviation (Question 3 also asks for the range and interquartile range). For the questions in Subtest C, I expected students to consider variation by employing a measure of variation that they had learned. Below, each question is given together with an explanation of its general purpose.
2 Bengkulu is the capital city of Bengkulu Province and Lebong is a district in Bengkulu Province (see Subsection 3.2.1 for the research setting).
In the first question, I wanted to explore what students intuitively understand by the term ‘variation’ and whether their understanding differed before and after the teaching. The teaching intervention itself did not include a formal definition of the term ‘variation’. The second question is also about students’ understanding of what variation means in practical cases. Knowing when variation should be minimized or maximized is one sign of understanding and of statistical reasoning ability.
Subtest A
1. Based on your experience, what does variation mean to you? Give an explanation
and/or an example.
2. For each of the following cases, answer the following question: “Which is more
desirable: high variation or low variation?” Add your reason.
a. Age of trees in a national forest.
b. Diameter of new tires coming off one production line.
c. Scores on an aptitude test given to a large number of job applicants.
d. Weight of boxes of milk of the same brand.
Subtest B
3. Given the data: 11, 32, 17, 34, 24, 15, 28
For the data above, fill in the table below.
Range
Mean
Median
Standard Deviation
Interquartile Range
4. Below are the data of monthly income.
Monthly Income (in hundred thousand rupiah) | Number of People
3 – 5 | 3
6 – 8 | 4
9 – 11 | 9
12 – 14 | 6
15 – 17 | 2
The mean of the data above is ________.
The standard deviation is _______.
Questions 3 and 4 concern computational skills. Since two different teaching approaches were being compared, it was fair to the control group to include ‘traditional’ questions in the pre- and posttest. Besides this, with regard to my reflection question, I hoped that the teaching intervention would improve not only statistical reasoning skills but also the computational skills that are needed in the national examination. Question 3 deals with single-valued (ungrouped) data and Question 4 with grouped data.
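Question 4 can be worked through with the usual grouped-data formulas, in which each class is represented by its midpoint (e.g., 4 for the class 3 – 5). The minimal Python sketch below assumes this midpoint convention and a standard deviation that divides by the total frequency n, as for a population; a version dividing by n − 1 would give a slightly larger value. The function name `grouped_mean_sd` is hypothetical.

```python
import math


def grouped_mean_sd(classes, frequencies):
    """Approximate mean and standard deviation of grouped data via class midpoints."""
    midpoints = [(low + high) / 2 for low, high in classes]
    n = sum(frequencies)
    mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
    # Frequency-weighted squared deviations of the midpoints from the mean.
    variance = sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints)) / n
    return mean, math.sqrt(variance)


# Monthly income (in hundred thousand rupiah) from Question 4.
classes = [(3, 5), (6, 8), (9, 11), (12, 14), (15, 17)]
frequencies = [3, 4, 9, 6, 2]
mean, sd = grouped_mean_sd(classes, frequencies)
print(f"mean = {mean:.2f}, standard deviation = {sd:.2f}")
# mean = 10.00, standard deviation = 3.35
```

This is the same transfer from single-valued to grouped data that students were expected to make in lesson 2a: each midpoint counts as many times as its class frequency.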
One aspect of the teaching intervention was to teach variation by letting students analyze data sets. In data analysis, looking at a graphical display of the data, especially in the form of histograms, is important. Thus, the ability to read and interpret graphical displays, especially histograms, was assessed through Question 5. In addition, this question assessed students’ ability to describe the variation of a distribution.
Subtest C
5. Four histograms and two descriptions of data are displayed below.
i. A data set of test scores where the test was very easy
ii. A data set of wrist circumferences (measured in centimetres) of
newborn female babies.
a. Which histogram best matches the data in description (i)? Give your
reason.
b. Which histogram best matches the data in description (ii)? Give your
reason.
In Question 6, I wanted to find out whether students use any measure of variation in comparing two data sets. This question can be considered to assess the ability to read beyond the data (in the sense of Curcio, 1987) in the process of analyzing and interpreting data. Based on the two data sets, this question implicitly asked students to predict the future performance of students A and B, and thus I could verify whether students took variation into consideration in their reasoning.
Given a data set, students were asked in Question 7 to make a summary and to draw a conclusion. I wanted to find out which measures of centre and variation the students would use to draw an informal inference, and whether they adjusted their views on the measure of centre when an outlier is present. This question can also be considered to assess the ability to make comparisons within a data set.
7. One day Dedi caught a very big catfish in his rice field. He wanted to be
sure of the weight of the fish and therefore he weighed it 7 times on the same
scale/balance. Below are the measurements (in kilograms) that he found:
2.9; 2.7; 5.1; 3.1; 3.0; 2.8; 3.0 kg.
a. How spread out are the measurements he obtained?
b. How many kilograms do you think the true weight of the catfish was? Give
your reason.
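A short computation shows what Question 7 probes. In the minimal Python sketch below, the single reading of 5.1 kg pulls the mean above every other weighing and makes the range six times as large as that of the remaining readings, while the median (3.0 kg) stays with the bulk of the data. Treating the largest reading as the outlier is an assumption made for illustration; the sketch is one possible line of reasoning, not an answer key.

```python
import statistics

# Dedi's seven weighings of the catfish (kg), from Question 7.
weights = [2.9, 2.7, 5.1, 3.1, 3.0, 2.8, 3.0]

mean = statistics.fmean(weights)
median = statistics.median(weights)
spread_all = max(weights) - min(weights)

# Set aside the largest reading (the suspected outlier) to see how much it inflates the range.
without_outlier = sorted(weights)[:-1]
spread_without = max(without_outlier) - min(without_outlier)

print(f"mean = {mean:.2f} kg, median = {median:.2f} kg")
print(f"range = {spread_all:.1f} kg (all readings), {spread_without:.1f} kg (without the largest)")
# mean = 3.23 kg, median = 3.00 kg
# range = 2.4 kg (all readings), 0.4 kg (without the largest)
```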
6. Two students who took mathematics tests received the following scores (out of
100):
Student A: 60, 90, 80, 60, 80
Student B: 40, 100, 100, 40, 90
If you had an upcoming mathematics test, who would you prefer to be your
study partner, A or B? Why?
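The scores in Question 6 are such that a measure of centre alone cannot separate the two students: both have a mean score of 74, so only a measure of variation distinguishes them. The minimal Python sketch below uses the population standard deviation (dividing by n), which is an assumed convention, and illustrates one line of reasoning rather than an answer key.

```python
import statistics

# Scores (out of 100) from Question 6.
scores = {
    "A": [60, 90, 80, 60, 80],
    "B": [40, 100, 100, 40, 90],
}

for student, values in scores.items():
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population formula, dividing by n
    print(f"Student {student}: mean = {mean:.1f}, standard deviation = {sd:.1f}")
# Student A: mean = 74.0, standard deviation = 12.0
# Student B: mean = 74.0, standard deviation = 28.0
```

A student who reasons with variation might prefer A as the more consistent partner, but the question is open-ended and other well-argued answers are possible.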
The third part of Question 8 assessed students’ understanding of variation indirectly. The main emphasis of the question was on assessing the students’ knowledge of the statistics that they had learned prior to the intervention, i.e., measures of centre, and whether this knowledge had improved after the intervention. Parts a and b were explicitly about central measures. However, in part c, students needed to decide which measure of centre best describes the nature of the data, taking into account the variation or the shape of the graphical display. Herein lay the indirect assessment of students’ consideration of variation.
8. The histogram below shows the number of hours per week that the marketing staff of a bank spend exercising.
a. Compute the median. _______________
b. Compute the mean. ________________
c. Based on the histogram, how many hours per week do the staff in this company usually exercise? Give your reason.
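Parts a and b of question 8 require reading a frequency table off the histogram. Because the histogram itself is not reproduced here, the Python sketch below uses hypothetical frequencies (not the test data) and assumes one bar per whole number of hours. It shows how a few staff members who exercise a lot pull the mean above the median, which is the kind of consideration part c is after.

```python
from statistics import mean, median

# Hypothetical frequency table: hours of exercise per week -> number of staff
# (illustrative values only; not the data from the test histogram)
freq = {1: 6, 2: 9, 3: 5, 4: 2, 8: 1, 12: 1}

# Expand the table into the individual observations, in order
hours = [h for h, count in sorted(freq.items()) for _ in range(count)]

print("n      =", len(hours))
print("mean   =", round(mean(hours), 2))  # pulled up by the 8 and 12 hours
print("median =", median(hours))          # middle of the ordered data
```

With these made-up values the mean is about 2.79 hours and the median 2 hours; for a right-skewed distribution like this one, the median (here also the mode) better describes what is usual.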
9. Forty college students participated in a study of the effect of sleep on test scores. Twenty of the students studied all night before the test the following morning (the no-sleep group), while the other 20 students (the control group) went to bed by 11.00 pm on the evening before the test. The test scores for each group are shown in the diagrams below. Each dot on the diagram represents a particular student's score. For example, the two dots above the 80 in the bottom diagram indicate that two students in the sleep group scored 80 on the test.
[Figure: two dot plots of test scores on a common axis from 30 to 100, titled "Test Scores: No-Sleep Group" and "Test Scores: Sleep Group"; each dot represents one student.]
Examine the two diagrams carefully. Which group is better: the sleep group or the no-
sleep group? Explain your reasons.
Then circle the one of the 6 possible conclusions listed below that you agree with most.3
a. The no-sleep group did better because none of these students scored below 35 and the highest score was achieved by a student in this group.
b. The no-sleep group did better because its average appears to be a little higher
than the average of the sleep group.
c. There is no difference between the two groups because there is considerable
overlap in the scores of the two groups.
d. There is no difference between the two groups because the difference between their
averages is small compared to the amount of variation in the scores.
e. The sleep group did better because more students in this group scored 80 or above.
f. The sleep group did better because its average appears to be a little higher than
the average of the no-sleep group.
3 In the actual pre- and posttest, the multiple-choice part was printed on the back of the page containing the open part.
In this question, I wanted to investigate whether students would use a combination of measures of centre and variation when comparing two data sets. The multiple-choice options reflect misconceptions that students commonly have, for example, attending only to the extreme values or only to the average. I wanted to test whether students realized that, in comparing two data sets, they need to consider not only measures of centre but also measures of variation.
Finally, in the last question, students were again asked to compare two data sets, in the form
of graphical displays, namely histograms. The ability to understand histograms was essential
here and I wanted to test whether the intervention improved this ability.
3.3.2. The Questionnaire
I used a questionnaire for all students in the experimental group (see Appendix D). The questionnaire was designed to elicit students' opinions about the teaching intervention. It consisted of the following four parts:
(i) the use of real data (Questions 1-3);
(ii) group work (Questions 4-7);
(iii) the teaching approach (Question 8);
(iv) feedback about the lesson (Questions 9-10).
10. Below are histograms of the scores of a mathematics test in two classes.
[Figure: histograms for Class A and Class B; horizontal axis: Scores (tick labels 55 to 95 in steps of 10); vertical axis: Frequency of scores (0 to 24).]
a. Comparing the two histograms, one could infer that:
i. Variability of scores in Class A is higher than in Class B. (The scores in Class A vary more than the scores in Class B.)
ii. Variability of scores in Class B is higher than in Class A. (The scores in Class B vary more than the scores in Class A.)
iii. Class A and Class B have equal variability.
iv. I don't know.
b. Why? Give your reason.
3.3.3. The Interview
I interviewed the regular teacher of the experimental group to get feedback about the teaching experiment. I also interviewed several students from the experimental group. I had intended to interview students of the control group as well, but due to time constraints this was not possible. The plan was to explore the answers of several students in more depth. With the help of the regular teacher, I selected a number of students with a range of abilities from the experimental group and went through their answers to the pre- and posttests to get a better impression of the reasoning behind them.
4. Results and Analysis of the Teaching Experiment
I present the results related to the classroom observation before the experiment, the teaching
experiment itself, and the feedback about it from the questionnaire and the interview with the
regular teacher of the experimental group. Data collection took about four to five weeks, from the second week of November 2009 to the second week of December 2009.
Below is the timeline of the teaching experiment.
Date | Activities
2nd week of November 2009 | Observation of the regular teaching in the experimental and control groups
November 18, 2009 | Pretest of the experimental group
November 21, 2009 | Pretest of the control group
November 20 to December 1, 2009 | Teaching of variation in the experimental group
November 26 to December 2, 2009 | Teaching of variation in the control group
December 2, 2009 | Posttest in the experimental group
December 3, 2009 | Posttest in the control group
1st to 2nd week of December 2009 | Interviews with the students and teachers
Table 4. Timeline of the experiment.
4.1. Classroom Observations Prior to the Teaching Experiment
One week before the teaching experiment, I talked to the teachers of the control and experimental groups. We discussed the students' mathematical ability and the topics that had been taught so far. I also discussed my planned activities and the pretest material, but only with the teacher of the experimental group, so as not to influence the teaching in the control group.
Regarding the mathematics topics already taught, the experimental group had almost finished the topic of measures of centre, while the control group was behind. It turned out that neither teacher had taught histograms in the lessons on data representation. I therefore asked them both to teach histograms before my lessons on variation. This also gave me a chance to observe their teaching styles. I observed two lessons, one of which was about histograms.
From my observations, I concluded that both teachers used a teacher-centred approach. The main teaching activities were cycles of:
lectures about how to construct a histogram (procedural knowledge);
working out examples;
giving students exercises and/or homework.
The students listened without observable active participation and replicated the examples in the exercises. However, in my second observation of the experimental group I noticed that the teacher slightly changed her approach when lecturing: she tried to engage her students by asking questions before presenting the procedural knowledge. It should be noted, however, that this second observation took place after our discussion of the planned activities and the pretest material, in which I explained my approach and aims. Our discussion seems to have had a small influence on her teaching approach; she showed interest in more student involvement.
Regarding the students' mathematical ability, students from both groups had been taught in grade 10, the previous academic year, by the current teacher of the control group. The choice of stream was decided primarily by the students' performance in mathematics and science subjects. Thus, the social science (IPS) students in both groups of my study had not performed well in mathematics and science in grade 10. I conducted two two-tailed independent-samples t-tests to compare the students' mathematics grades and their Bahasa (language) grades. In both tests the null hypothesis was that there is no difference between the mean grades of the students in the experimental and control groups. For both subjects, the null hypothesis could not be rejected at the 5% significance level (t = 0.306, df = 76, two-tailed p-value = 0.761 for mathematics, and t = 1.828, df = 76, two-tailed p-value = 0.071 for Bahasa). These results indicate that the students of the experimental group and the control group did not differ significantly in their mathematics and language ability.
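For readers who want to reproduce this kind of comparison, the sketch below computes the two-sample t statistic with pooled variance, i.e. the classical Student's t-test that assumes equal variances in both groups. Which variant was used here is not stated, so the pooled version is an assumption, although a pooled test with df = 76 corresponds to 78 students in total. The grades in the example are hypothetical; the two-tailed p-value would then be read from the t distribution with these degrees of freedom, for instance with `scipy.stats.ttest_ind(x, y, equal_var=True)` if SciPy is available.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_test(x, y):
    """Return (t, df) for Student's two-sample t-test with pooled variance."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance (sample variances use n - 1)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    # Standard error of the difference between the two means
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, df

# Hypothetical grades, for illustration only
experimental = [70, 75, 80]
control = [65, 70, 75]
t, df = pooled_t_test(experimental, control)
print(f"t = {t:.3f}, df = {df}")
```

With these toy grades the means differ by 5 points and both sample variances are 25, so t is about 1.225 with 4 degrees of freedom.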
4.2. The Teaching Experiment
Originally, I had planned to give the students three activities in three lessons (90 minutes each). These three lessons were designed to teach measures of variation, namely the standard deviation and the interquartile range (see Section 3.2.3). However, the students needed more time than expected to finish the first two activities, and I ended up teaching these two activities, which focused on the concept of standard deviation, in four lessons (90 minutes each). There was no time for the third activity. Therefore, I introduced the concept of interquartile range through my own explanation in a discussion session at the end of the last lesson.
Before I give an impression of how the lessons went, I first want to describe the nature of group work in the class. The students were divided into 10 groups of 4 students each. I asked the students to arrange their seating before the beginning of the class. This was possible because the mathematics lesson was the first lesson of the day, so students could easily rearrange their seating in a few minutes before school started (in this way no lesson time was wasted moving chairs and desks), and because the classroom was large enough for a different arrangement of desks and chairs. Figure 3 shows the seating arrangement in the classroom. Students normally4 sit in the same seats in the classroom, and the groups were formed solely on the basis of these seating positions: two pairs in adjacent rows formed one group (see Figure 2).
Figure 3. Picture of the classroom with students working in groups of four.
The group work did not go as well as I had hoped. I did not observe lively discussion or argumentation within groups. Students tended to work individually and only consulted their group mates when they had difficulties or wanted to check their answers. Unfortunately, my decision to give every student their own worksheet gave them the freedom to work individually. I urged them to discuss within their groups first what to do with the questions, or at least to divide the work and discuss in pairs. In the third lesson, the observing teacher told me that, in her view, the group work had started to improve. However, I still observed few discussions within groups. The group on the right-hand side of the picture in Figure 3 (only two members are visible) was one of the groups that functioned a little better as a team. I tried to use the video recording of the lessons to analyze discussions within groups, but unfortunately the audio quality was not good enough to extract much detailed information.
Lesson 1: Activity 1
According to the lesson plan, students were expected to do three tasks. The first task was to draw histograms (see Section 3.2.3). In the actual lesson, I decided to reduce the students' tasks because only a few students had calculators and most students were slow in computing. I omitted the third question in Activity 1, and the students only had to work on one data set (boys' data or girls' data) instead of both. This saved time without sacrificing the purpose of the designed lesson.
4 Sometimes homeroom teachers decide to rotate students' seating a few times during the year, but generally students sit in the same seats.
Introduction: talking about data creation
The lesson started with an introduction of the context of human growth. I wanted to engage
students with the context of the real data they were about to work with by talking about how
the data were created. The intention was that by understanding well how the real data were
created, students would be engaged more in the later activities and could reason using the
knowledge about the context. I asked questions, for example, about growth measurements of the students' siblings that they had seen, about why they thought growth measurements are needed, and about their ideas of how growth measurements are carried out in detail. Unfortunately, the students responded very little in this introductory session, and my introduction turned into a short lecture. The students seemed to be unfamiliar with the context of human growth measurements, which the teacher confirmed after the lesson.
Task a: Making a histogram and describing the shape of the data
I distributed the activity sheets to the students and asked them to read the sheets first and to ask me if there was anything they did not understand. I then explained the tabular representation of the
data and asked them to start working. The first task was to draw histograms of data sets and to
describe the shape of the data. While students were working, I was walking around the class-
room and observed the students‟ progress.
I immediately saw that the students did not know how to construct a histogram, even though they had studied it shortly before my lesson. The students neither ordered nor grouped the data. I asked one group whether histograms are used for individual or grouped data; the group could not answer. Walking around and asking the same question, I noticed that the students did not know what to do. One group even answered that the frequencies would all be one because there was no mode. Finally, I asked the whole class for attention and gave the hint that they should group the data first.
After letting them try to group the data for several minutes without my instruction, I again asked the whole class for attention and decided to lead a question-and-answer session5 on how to group the data, using the boys' height data as an example. I tried to elicit ideas on what interval (class width) we should use by asking: "How many centimetres of height difference do you think are needed to tell people's heights apart easily?" There was no response. I continued the question-and-answer session with further stimulating questions, but there were still few responses, and in the end I decided myself to use 5 cm as the class width. I then explained how to draw histograms and let the students work on it. After the first period (45 minutes) of the lesson, progress was still very slow. Pressed for time, I let them continue for several more minutes and then urged the students to start on the second question of the activity.6
5 A question-and-answer session can be described as an interactive lecture: I ask students questions as scaffolding or as hints to guide them along the intended learning path.
I had expected the histogram part of the activity to be an easy procedural task that would not take much time, especially because the use of a calculator was allowed and I had reduced the tasks. I assumed that the related task, describing what could be said about the shape of the histograms, was the more challenging part of this first question. However, it turned out that the single lesson on histograms that the students had received earlier was not enough for them to carry out the task, or perhaps the students were confused by the different tabular representation of the data. I also had not anticipated that the students would have difficulties with non-integer data; they had never dealt with non-integer data before, and this slowed them down. As a result of this mismatch between my expectations of the students' ability levels and their actual competences, I had to spend a significant part of this 90-minute lesson on teaching how to make histograms.
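The grouping step that the students struggled with can be sketched in a few lines of Python. The heights below are hypothetical non-integer values, not the data from the activity sheet; the class width of 5 cm matches the choice made in the lesson, while the starting point of 145 cm and the half-open classes [lower, lower + 5) are assumptions for illustration.

```python
from collections import Counter
from math import floor

def histogram_counts(values, width, start):
    """Count how many values fall in each class [lower, lower + width)."""
    # Index of the class each value belongs to, counted from `start`
    counts = Counter(floor((v - start) / width) for v in values)
    return {
        (start + k * width, start + (k + 1) * width): counts[k]
        for k in range(min(counts), max(counts) + 1)
    }

# Hypothetical heights in centimetres (one decimal, as in real measurements)
heights = [148.2, 151.5, 153.0, 155.0, 157.4, 158.9,
           160.1, 161.7, 163.2, 166.8, 171.3]

table = histogram_counts(heights, width=5, start=145)
for (lower, upper), count in table.items():
    # A simple text histogram: one asterisk per student
    print(f"{lower}-{upper} cm | {'*' * count}")
```

The data do not need to be sorted first: each value is assigned to its class directly, and a class between the smallest and largest one that contains no values would still be listed with a count of zero.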
Task b: Finding a rule to decide normal and abnormal height and weight
Students also had difficulties with the second task: making a rule for classifying a given height as normal or abnormal. I explained that abnormal height could mean that a person is too short or too tall, and I asked them to play the role of a doctor and comment on the given data. While walking around, I saw groups deciding for each datum whether it was very common, still normal, or abnormal. However, when I asked one group about the basis of their decision, they could not answer.
At some point, I drew the whole class's attention and tried to help the students with an analogy in the context of the weight of Durian7 fruits: "If you are a Durian seller harvesting Durian, what would you answer if I asked you the typical weight of a Durian that is considered big?" In this question-and-answer session, mode and average came up, and I concluded that for a small data set it is better to use the average. I pointed out that the task was about the same thing and urged them to use the statistics they had already learned and connect them to the task. My hints did not help much.
6 This might not have been a good decision, but I do not think it did much harm. Even if the students could not finish all questions, I wanted them at least to try all tasks.
7 Durian is a popular seasonal fruit in Indonesia.
When the second period of the lesson was almost over, I decided to give more help. I started a question-and-answer session with the following question: "If the common height is the average, how do we decide what is too short and what is too tall?" The average was computed and I wrote it on the whiteboard. I repeated the question, but there was no response. In general, the students responded very little in the discussion. I tried to lead the discussion to the concept of deviation from the mean. I wrote the minimum value, the average, and the maximum value on the whiteboard and suggested finding ranges or intervals for 'too short' and 'too tall'. Two students proposed an interval, but when I asked for the reason behind their choice, they could not explain it. Finally, I suggested that we base the rule on how far a datum is from the average. The teaching became more and more teacher-centred. I introduced the formal notation for the distance of a datum from the mean and showed some example calculations.
Because time had run out, I asked them to finish the histogram at home and also to compute the distance of each datum from the average as homework. I ended the lesson by answering the first question about the shape, centre and spread of the histogram.
From a research point of view, what is interesting is which rules the students came up with in this task. As described above, most groups did not come up with any rule, but some groups decided on the normality of each datum. However, they were unable to explain their reasoning or procedure. I concluded that the students were unable to formulate a general rule for categorizing the data independently, or perhaps they just had difficulty expressing in words the rules they had in mind.
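One possible formalization of the rule the class was working towards, deciding normality by the distance of a datum from the average, is sketched below in Python. The cut-offs of one and two standard deviations, the category names and the heights are hypothetical choices for illustration; they are not the intervals the class agreed on or the data from the activity.

```python
from statistics import mean, pstdev

def classify_heights(heights, k_common=1.0, k_normal=2.0):
    """Label each height by its distance from the mean, in standard deviations.

    The cut-offs k_common and k_normal are illustrative assumptions.
    """
    m = mean(heights)
    s = pstdev(heights)  # population SD (divisor n)
    labels = []
    for h in heights:
        distance = abs(h - m)  # distance of the datum from the mean
        if distance <= k_common * s:
            labels.append("very common")
        elif distance <= k_normal * s:
            labels.append("still normal")
        else:
            labels.append("abnormal")
    return m, s, labels

# Hypothetical heights of 16-year-old boys (cm)
heights = [155, 162, 163, 164, 165, 165, 166, 167, 168, 185]
m, s, labels = classify_heights(heights)
print(f"mean = {m} cm, SD = {s:.2f} cm")
for h, label in zip(heights, labels):
    print(h, label)
```

With these values the mean is 166 cm and the standard deviation about 7.2 cm, so 155 cm is labelled "still normal", 185 cm "abnormal", and all other heights "very common".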
Lesson 2: Activity 1 revisited
In the second lesson, I started by reintroducing the context of human growth. I felt that I had not done this as thoroughly as I had wanted in the first lesson. Moreover, in the first lesson many students had arrived late and interrupted my introduction, and there had been few responses. In this second lesson, my teaching became more teacher-centred. I reintroduced the context in lecture form, covering the need for statistics to summarize data, sampling from a population, the need for visual summaries of the data, and the purpose of a growth chart. After that, I let the students do the second task again, limiting it to the boys' data only.
The lesson then continued with a discussion of the students' results. Two groups drew their histograms on the whiteboard (see Figure 4), and I led a question-and-answer session to discuss their answers to the first and second questions. I explained the use of deviation from the mean as one way to make the rule. Based on the histogram on the whiteboard, we decided on proper intervals for very common height and abnormal height (Figure 5, left-hand side). After this informal rule, I tried to lead the students to the formal standard deviation by mentioning that one number can be used to describe the spread of the data. I drew a table on the whiteboard and we computed the deviation of each datum from the mean (Figure 5, right-hand side). The students completed the table, and through a question-and-answer session we arrived at the formal formulas for the mean deviation and the standard deviation. I explained how the standard deviation can be used as a measure of spread. I also led a question-and-answer session about quartiles, mentioning that in practice quartiles are also used in creating growth charts. To end the discussion, I distributed the Indonesian growth chart from Batubara et al. (2006), together with the Dutch growth chart for a light-hearted comparison, and explained how to read a growth chart. Finally, I gave the students homework to find the quartiles and standard deviation of some data sets in order to practise their procedural skills.
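For reference, the formulas reached in this derivation can be written as follows for data x_1, ..., x_n with mean x̄. Using the divisor n in the standard deviation (rather than n - 1) is an assumption, since the version written on the whiteboard is not shown here. The first line explains why the raw deviations cannot simply be averaged, and the last line is a small worked example with hypothetical data.

```latex
\sum_{i=1}^{n} (x_i - \bar{x}) = 0
\qquad \text{(positive and negative deviations cancel)}

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\mathrm{MD} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert, \qquad
s = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}

% Worked example with hypothetical data 2, 4, 6:
\bar{x} = 4, \qquad
\mathrm{MD} = \frac{2 + 0 + 2}{3} = \frac{4}{3}, \qquad
s = \sqrt{\frac{4 + 0 + 4}{3}} = \sqrt{\frac{8}{3}} \approx 1.63
```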
Figure 4. Students' histograms drawn on the whiteboard.
Figure 5. Intervals for the rule (left-hand side) and derivation of the formulas of standard deviation.
Lesson 3: Activity 2
I planned to start the third lesson by checking the students' homework. However, it turned out that most of the students had not finished their homework. Therefore, to check their procedural skills, I made up a small data set and asked them to find its standard deviation; the results were then checked. I did this because, in accordance with one of my design principles (Subsection 2.3.3), I wanted the students to master procedural knowledge as well.
As an introduction to the second activity, I reviewed the content of the previous lesson and led a question-and-answer session on the interpretation of a large or small standard deviation. For example, I used the case of basketball players' heights. I told the students that the average height of the basketball team in their class was smaller than the average height of the basketball team from another school, and then asked them who would win the basketball match. Most students answered that the team from the other school would win. I then showed a cartoon of 4 players of about the same height and one very tall player and asked whether the students would change their minds about who would win. They did, and so I pointed out that it is not good enough to compare data sets by their means only.
I then distributed the activity sheets. The students worked in groups and I walked around to assist when needed. Unfortunately, the students could not finish in the allocated time, and I decided to hold the discussion in the next lesson.
Lesson 4: Activity 2
In this final lesson, I asked the students to present their results of Activity 2 and I tried to
encourage discussions. Answer sheets were hung up on the whiteboard and four groups pre-
sented their result (see Figure 6). Unfortunately, one group lost their answer sheet and could
not manage to present their results without it.
It was not very surprising that the students' answers were not very sophisticated. The students explained little of the reasoning behind their answers, and this affected the discussion to some extent. The discussion was not as lively as I had expected; groups or individual students rarely volunteered to comment. Although I tried to stimulate discussion as much as possible, in general I felt that there was not much discussion going on. I believe this was partly because presenting and discussing were new experiences for the students in mathematics classrooms and partly because of my limited experience in leading classroom discussions. A more experienced teacher who knows the students in the class might do a better job.
Figure 6. A group was presenting their answers.
In Activity 2, the students compared two data sets of the heights of boys from Bengkulu and Jakarta. Despite my emphasis in the previous lesson that it is insufficient to compare data sets on the basis of their means only, the majority of students still based their conclusions on the means alone. Table 5 presents the answers (some quoted verbatim) to the question of comparing the two data sets (2.b.3 in the activity sheet): 'Can you conclude that Jakarta's boys are taller than Bengkulu's boys?'.
No. | Answer | Number of groups
1. | Yes, because the mean height of Jakarta's boys is bigger than that of Bengkulu's boys. | 5
2. | "No, because boys in Bengkulu are taller than (boys in) Jakarta and the average in Bengkulu is more varied than the average (in) Jakarta" | 1
3. | No, because there were more boys in Bengkulu whose height was 160 cm. | 2
4. | "No, because there were no 16-year old boys in Jakarta whose height was under 150 (cm) and none above 175 (cm)" | 1
Table 5. Students' reasoning in comparing two data sets.
When I later analyzed the students' answer sheets, I did not understand the second answer in Table 5, and it did not come up in the presentations and discussions. The third and fourth answers in Table 5, on the other hand, are of the same type, in the sense that the students counted the frequency of certain values and compared these frequencies. This gave me the opportunity to comment on this type of answer during the discussion with the students. I pointed out that the different sizes of the two data sets were a factor to be considered here: a larger data set will tend to contain more occurrences of a given value. I then introduced another measure of variation, the interquartile range, which can deal with this issue.
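The point about data sets of different sizes can be illustrated with a short Python sketch. The heights are hypothetical, and the quartiles are computed as medians of the lower and upper halves (excluding the overall median when n is odd), which is one common school convention; statistical software often interpolates differently and may give slightly different quartiles. Duplicating the data set, so that every height appears twice, doubles the count at 160 cm but leaves both the proportion and the interquartile range unchanged.

```python
from statistics import median

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves.

    The overall median is excluded from both halves when n is odd
    (one common convention; other methods interpolate differently).
    """
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(upper)

def iqr(data):
    q1, q3 = quartiles(data)
    return q3 - q1

small = [150, 155, 160, 160, 165, 170, 175]  # hypothetical heights (cm)
large = small * 2                            # same shape, twice as many boys

for name, xs in [("small", small), ("large", large)]:
    count_160 = xs.count(160)
    print(f"{name}: n = {len(xs)}, count of 160 cm = {count_160}, "
          f"share = {count_160 / len(xs):.2f}, IQR = {iqr(xs)}")
```

Both data sets have quartiles 155 cm and 170 cm and thus an interquartile range of 15 cm, although the raw count at 160 cm is 2 in one and 4 in the other.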
To conclude the lesson, the students and I reviewed all the formal measures of variation that had been studied, namely range, mean deviation, standard deviation, and interquartile range. I talked about sources of variability: natural variation and variation due to measurement (I had several students measure the height of flowers and discussed the results). We also talked about high and low variation in data; one issue I raised was that higher variation is sometimes more desirable than low variation, depending on the context of the data.
4.3. The Teaching in the Control Group
The students in the control group had the same teaching time, four lessons of 90 minutes. The teacher had covered all the material by the end of the third lesson, so the last lesson was dedicated to more exercises on variation. I observed the first three lessons but did not attend the exercise lesson. During my observations, I sat at the back of the classroom and did not interfere with the teaching and learning process.
Based on my observations, I concluded that the teaching usually followed the same sequence: the teacher lectured on the material, the students did exercise problems individually, the results were checked before the end of the class if the students managed to finish the exercises in time, and finally the students were given homework. I also observed that the students did not respond to the teacher's questions. For example, in the first lesson during the introduction, the teacher gave two small data sets with the same average and asked the students "Which data set is better? Which data set is more stable?" The students did not volunteer any answers. Furthermore, the students did not engage well in the exercise session, and most of them did not do their homework. They did not finish their exercise problems, which consisted of computing the standard deviation with the formula that the teacher had given earlier. The other lessons proceeded in more or less the same style.
From a research point of view, how comparable were the students of the experimental group and the students of the control group? No two classes are ever exactly the same, so I report here only the main similarities and differences that I noticed. First, the groups had different teachers, and each teacher had his or her own teaching style. Both teachers' instruction was teacher-centred, and in my opinion their teaching styles shaped to some extent the way the students behaved toward tasks. However, the differences in the behaviour of the students in the two classes were small. In general, I observed the same attitude toward mathematics learning in the two classes, but the students of the control group were more careless about their learning. For example, more students did not do their exercise problems in class, and some students skipped mathematics lessons. The teachers' strictness regarding the completion of tasks and homework might play a role here. In addition, the fact that the teacher of the experimental group was also their homeroom teacher8 made a difference: in Indonesian schools, homeroom teachers play an important role in deciding a student's advancement to the next grade. Second, the experimental group's mathematics classes were in the morning (the first two 45-minute periods of the school day), while the control group's classes were later in the afternoon.
Apart from these two differences, the students' motivation and behaviour were in general comparable. Regarding the aspects of teaching that I studied, the control group's teaching did not use real data, open-ended questions, or group work.
4.4. The Questionnaire
I gave a questionnaire, which can be found in Appendix D, to the students of the experimental group to learn about their opinions of the teaching intervention. It consisted of ten items: seven statements with 5-point Likert scales; one statement for which students had to mark one or more options; and two open-ended questions for students' comments and feedback. The findings from the questionnaire are summarized below by grouping the items into the four themes mentioned in Subsection 3.3.2.
The Use of Real Data
The first three items of the questionnaire concerned the use of real data. All three were designed to check my assumptions about the positive effect of using real data instead of artificial data in statistics lessons. The assumptions were that the use of real data in statistics lessons makes learning more interesting and fun (Statement 1), easier to understand (Statement 2), and more application-oriented (Statement 3). The results showed that about three quarters or more of the students (29 to 34 of the 39 respondents, i.e., 74% to 87%) agreed or strongly agreed with each of the three statements (see Table 6). It is interesting to note that no students disagreed or strongly disagreed with the third statement.
8 Homeroom teachers in Indonesian schools are teachers who usually know the most about the
students in their class. Their tasks include monitoring daily attendances, organizing students for school
events, writing the end of year/semester report, or dealing with behavioural problems and parents.
Statement
Number of Students
Strongly
Agree
Agree Neutral Disagree Strongly
Disagree
1 The use of real data makes learning
Statistics more interesting and fun
8 26 3 2 0
2 Analyzing real data makes it easier for
me to understand statistical concepts;
for example standard deviation.
5 24 5 5 0
3 The use of real data shows me the
application of statistics in real life. 9 25 5 0 0
Table 6. Result for Statements 1-3: the use of real data.
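To make the proportions behind Table 6 explicit, the following Python sketch computes, for each statement, the share of the 39 respondents who agreed or strongly agreed. The counts are copied from Table 6; the dictionary layout and the function name `agreement_share` are illustrative choices, not part of the original analysis.

```python
# Counts copied from Table 6, in the order:
# (strongly agree, agree, neutral, disagree, strongly disagree)
table6 = {
    1: (8, 26, 3, 2, 0),
    2: (5, 24, 5, 5, 0),
    3: (9, 25, 5, 0, 0),
}


def agreement_share(counts):
    """Return the fraction of respondents who chose 'strongly agree' or 'agree'."""
    strongly_agree, agree = counts[0], counts[1]
    return (strongly_agree + agree) / sum(counts)


for statement, counts in table6.items():
    share = agreement_share(counts)
    print(f"Statement {statement}: {share:.1%} agreed or strongly agreed")
```

Running it gives 87.2%, 74.4% and 87.2% for Statements 1, 2 and 3, so Statement 2 sits just below three quarters.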
Group Work
Group work was an important part of the teaching in the experimental group, and I wanted to investigate how students felt about working in groups, considering that, according to the teacher, this was their first experience with group work in mathematics class. As mentioned in the description of the teaching intervention, the group work did not run very well. Despite this, Table 7 shows that 31 of the 39 students believed that they were active participants in the group discussions (Statement 7). A positive attitude toward working in groups can also be seen in the students’ responses to Statements 5 and 6: 36 students indicated that they liked working in groups and 35 agreed that it helped them to understand the concepts.
A notable result concerns Statement 4. According to the teacher, the students had not been used to group work in mathematics lessons, yet only 6 students disagreed with the statement. I assume this is because the students did not read the statement as referring to mathematics lessons only.
Statement (number of students: Strongly Agree / Agree / Neutral / Disagree / Strongly Disagree)
4 I am used to working in groups. 8 / 16 / 9 / 6 / 0
5 I like working in groups. 15 / 21 / 0 / 3 / 0
6 The group’s discussion helps me understand statistical concepts. 13 / 22 / 1 / 3 / 0
7 I actively participate in contributing ideas and in group discussions. 6 / 25 / 7 / 1 / 0
Table 7. Results for Statements 4-7: group work.
The Teaching Approach
Through this questionnaire, I also tried to find out whether the students recognized the new elements I had tried to incorporate in my teaching intervention. The results are shown in Table 8. The original form did not label the options; I label them a-e here for reference. For the elements that I believed were clearly visible in the experimental lessons (options a, b and c), at least 25 of the 39 students (more than 64%) recognized them as well. With options d and e, I intended to confirm that there was some degree of constructivist spirit in the teaching. Because of the difficulties I had in the lessons and my improvised shift toward more teacher-centred teaching, I expected that students might find these two aspects harder to see. However, 32 of the 39 students selected options d and e. One option allowed students to write their own comments, but these comments were irrelevant to Statement 8.
Statement 8: “I think that the way of learning and teaching in the last two lessons differs from the way of learning mathematics I usually experience in the following sense:” (number of students)
a Using real data, not only artificial numbers. 26
b Using group work. 31
c Using a calculator is allowed. 25
d Demanding students to develop their ideas and then defend those ideas through correct arguments. 32
e Giving students the chance to explore the solving of problems, not just “telling” the correct ways. 32
Table 8. Results for Statement 8 concerning recognition of components in the new teaching approach.
Students’ Free Feedback
In the open-ended questions at the end of the questionnaire, I asked the students for comments and suggestions. Contrary to my hopes, the students’ comments mostly did not address specific aspects of the teaching; they were general statements of liking or disliking without specific reasons, and many were not informative. Two examples, one positive and one negative, are: “fun, because students were taught to be brave to give opinions and suggestions” and “the materials were difficult to understand.”
I received quite a few suggestions about my way of speaking while teaching, such as to “not speak too fast” or “not to use difficult and high9 language.” Four suggestions indicated a preference for traditional teaching, such as “Give an explanation and formula first.” Finally, two students suggested using data from Indonesia rather than from abroad. I found this surprising, since the data they analyzed were Indonesian growth data and I had used a Dutch growth chart only in the last minutes of one lesson, as an informative comparison.
9 I am not sure about the meaning of ‘high’ here.
4.5. The Interview with the Teacher of the Experimental Group
After the first lesson, I immediately discussed the lesson with the regular teacher of the experimental group. She expressed her disappointment that her students did not automatically think of ordering the data, although she had emphasized this aspect in her teaching. She told me that natural-science students would have ordered the data immediately and would have done the tasks better. When I asked what she thought about the lesson, she said that while observing it, it had occurred to her that it might be better if the students also collected data by measuring their own heights. I agreed but suggested that, due to time limitations, it would be more effective to integrate data collection into the teaching of the first part of the statistics chapter, namely data representation. I also learned from the teacher that growth charts are unfamiliar to people in this district: growth measurements are rarely done, even for babies. Growth measurement was therefore, unfortunately, not a familiar topic for the students.
The teacher was too busy to write observation notes; she filled in the observation sheet only once, after the first lesson. One interview was conducted in writing: she could not make time at that moment, so she asked me to write down my questions and answered them later. For the subsequent lessons, I therefore noted the important issues from our talk after each lesson and summarized them in an interview after the teaching experiment was completed. The summary of my interviews with the teacher of the experimental group is given below.
Students’ socio-economic background
The teacher was also the homeroom teacher of the experimental class, so she knew the students’ backgrounds well. In general, students in this school came from poor families. Students from better-off families mostly attended the oldest school in the district, which parents considered better. Most parents were farmers who did not own rice fields and were therefore employed in other farmers’ rice fields. Parents gave very little attention and support to their children’s schooling. In my opinion, this lack of parental support was one factor behind the students’ poor basic computational skills (skills that should have been acquired at the previous level of education).
Students’ comment on researcher’s language in classroom
Since many students commented that I spoke too fast and used ‘high’ language, I asked the teacher’s opinion about this. She pointed out that I actually did not talk much in class (I confirmed that I talked mostly during the discussion part), but she said that I did seem to speak a little fast.
“(You did) speak a little bit fast. By fast I mean direct: ‘Oh, this, this, this, topic, this, this.’ Right?”
She also commented that I may have been too fast when explaining how to read the growth chart. I used formal statistical terms such as standard deviation there, and she said that reading charts was already difficult for the students and that this episode with formal language was perhaps why they commented on my language use.
Regarding the level of language, she disagreed with the students. In her experience, the students’ language skills were not very good; early in her teaching career, students once asked her not to use good and correct Bahasa Indonesia.
“… based on my experience, ‘Bu (Ma’am), don’t use Bahasa Indonesia’, meaning don’t use good and correct Bahasa Indonesia. Use some Bengkulu language, Lebong language.10 That is the language (issue).”
Discussion within groups
I concluded that the group work did not work as well as I had hoped. Although the teacher told me that the students worked together better in the third and fourth lessons, I observed that their cooperation mainly concerned procedural skills. Unfortunately, I could not recheck this from the video recordings of the lessons: the audio quality was too poor to capture even the discussions of the groups nearest to the camera. In my observations, students only discussed how to compute particular quantities or checked their computational results. Therefore, I asked whether the teacher had observed students discussing the essence of the question.
Researcher (R): In the first activity the task was to create a rule to decide whether a child’s height is very normal, normal, or abnormal. Did you hear any discussion about that?
Teacher (T): No.
R: No, right? Well, yes, I did not eith- (interrupted)
T: The (students’) discussion happened in your discussion. In the discussion, (you asked) how would the answer be? Then (the students) started to think about it. But when the students worked (in groups), they did not discuss what the task was about.
R: No?
T: They just did the computing.
10 The language of instruction in schools is the Indonesian national language, Bahasa Indonesia. Bengkulu language is used at the provincial level, and this district has its own language (the students’ mother tongue).
The teacher also stated that the students were able to compute the average (mean) and that this was in fact the statistic they used when dealing with data.
Willingness to teach
I told the teacher that the students might do better if she taught the lesson instead of me, since they were familiar and comfortable with her. I then asked her what preparation she would need if she were to teach the lesson. She said she would need to prepare herself on box plots and on how to read a growth chart.
After one lesson, the teacher also told me that she considered the experimental teaching approach the right one for the class because it makes the students active. However, she said she would need ready-made materials because she would not have time to prepare such activities herself. If the materials were provided, she was willing to try the approach. In fact, she asked me to develop materials for the whole academic year so that she could use them in her teaching.
4.6. Analysis of the Teaching Experiment
It was my personal aim to introduce an alternative approach to teaching statistics in an Indonesian secondary school classroom. The teaching experiment was designed around the concept of social constructivism. By social constructivism, I mean introducing the formal statistical terms on the basis of the students’ informal understanding, which they have jointly developed while trying to analyze real data in small teams. Here I analyze the extent to which the teaching experiment was social-constructivist by examining the three components I focused on and by reflecting on things that went differently or not as well as expected.
The positive
First of all, the teaching and learning was, in my opinion, social-constructivist to some extent. The key components of the teaching approach that I wanted to try out were present, and the questionnaire results seemed to confirm this. I also observed that the teacher was interested in applying this approach in her daily teaching. She suggested that I also try the lessons with the natural science students, saying that the results would be better; however, there was no time to do this during my stay. I even observed a change in her teaching after I explained my goals and the ideas underpinning my activities to her.11 In our discussion, we were both optimistic that the results would be better if this approach were used right from the beginning of the statistics unit.
11 The teacher of the control group was not influenced by my approach because I explained the lessons only to the teacher of the experimental group. The control group teacher asked for a copy of the pretest after the experiment, but I did not have time to discuss it with him.
The less positive
The quality of the discussion within groups was not as good as I expected. Students were so used to being passive, individual learners that they did not seem to know what to do when given tasks without first being taught the procedure. Moreover, the tasks were open questions, and the students seemed to find it difficult to verbalize their reasoning and express their opinions. Given more time and guidance, I believe the students would work better in groups and express their reasoning well.
Next, I found out after the lesson that the context of human growth was unfamiliar to the students, and this seemed to be one factor that reduced their engagement in the activity. The students did not respond well to my efforts to discuss the creation of the data in the task. They did work in their groups (so to some extent they were engaged), but in the first activity, for example, they did not produce any clear rules that I could use as a foundation or bridge to formal statistics. No group developed a specific rule for deciding whether a height was normal. Some groups seemed to have a rule for classifying whether a datum was normal or not, but no rule was made explicit.
This does not mean that students had to be able to create sophisticated rules in Activity 1. I think it means that the teaching experiment was not as social-constructivist as I had planned, because in the end I frequently used question-and-answer sessions. Nevertheless, I tried to avoid being too teacher-centred by asking questions that made students think, giving hints, and encouraging students to express their opinions.
Reflection on my own teaching
I acknowledge my limitations as the teacher in the experimental approach. My teaching experience is limited in years, and I had never taught social science students before. This may not have prepared me to be sensitive enough to the needs of these students, for example by slowing down my speech a little. An experienced Indonesian teacher once told me that, in her experience, low-achieving students readily complain that their teacher explains too fast. In my teaching experiment, the students were unfamiliar with me and thus perhaps hesitated to interrupt me and ask me to slow down, and I was not sensitive enough to notice. I am also not an expert in constructivist teaching and learning: although I have always encouraged my students to be active learners, I have mostly applied a teacher-centred approach. Given these limitations, I believe that the teaching approach has the potential to work better when replicated in similar situations.
5. Results and Analysis of the Pretest and Posttest
There were 39 students in the experimental group and 39 students in the control group who took the pre- and posttest. In this section, I present the results of the pretest and posttest as the main source for answering my research question:
To what extent did the student-centred teaching of variation using real data and open-ended tasks help to improve Indonesian social science stream (IPS) students’ reasoning about variation?
The pre- and posttest consisted of three subtests, and the results are presented in the order of the questions. For ease of reading, I repeat the questions and their intended purposes, which I described in general terms in Subsection 3.3.1 and detail here where needed. Anticipated answers are also given.
First, I describe the results of the pretest and posttest separately. Next, I present the results of the student interviews about the pretest and posttest. Finally, I give my analysis of the findings of the pre- and posttest.
5.1. Subtest A: Questions 1 and 2
Questions in Subtest A were intended to probe students’ intuitive understanding of the term ‘variation’ and its use in statistical contexts. The questions were taken from Meletiou (2000).
Question 1
Based on your experience, what does variation mean to you? Give an explanation and/or an
example.
Purpose
In the first question, I wanted to explore the students’ intuitive understanding of the term ‘variation’, and whether their understanding had changed after the teaching. The teaching intervention itself did not include a formal definition of the term ‘variation’. I also wanted to test whether the Indonesian word for variation that I chose made sense to the students. In textbooks, the statistical term ‘variance’, i.e., the square of the standard deviation, is ‘ragam’ in Indonesian, and the everyday word ‘variation’ is ‘variasi’. However, several textbooks also refer to ‘measure of variation’ in Indonesian as ‘measure of keragaman’12 instead of ‘measure of variasi’. I support this use of ‘keragaman’ because, to me, ‘variasi’ is just a loanword, and I suspected that its meaning would not be immediately familiar to students. Therefore, I decided to use the word ‘keragaman’ (which can also mean ‘diversity’) for ‘variation’.
12 ‘Keragaman’ is a noun formed from the root ‘ragam’; both are nouns. ‘Keragaman’ means the state or condition of being varied, and ‘ragam’ is the quality of being variable. In my study, I use ‘keragaman’ for ‘variation’ and ‘variability’ (interchangeably), and I use ‘ragam’ for ‘variance’, in accordance with the wording in Indonesian mathematics textbooks.
Anticipated answer
The word ‘keragaman’ is similar in sound to the word ‘keanekaragaman’ (translation: diversity). I predicted that the students’ answers would be close to the meaning of ‘keanekaragaman’. I did not consider a definition of ‘keragaman’ in the sense of ‘keanekaragaman’ (diversity) to be a wrong answer; this question was intended to be exploratory and informative. However, after the lesson on variation, I hoped students would consider the meaning of variation in a statistical sense, that is, would consider data sets as distributions rather than as collections of individual values. One meaning of variation that I anticipated relates to the distribution of data, such as ‘differences of values from measurement results’.
Results
During the pretest, I could see that the students had very little idea of what to do for Questions 1 and 2, especially Question 2. Hence, in order to explain Question 2, i.e., what is meant by high and low variation, without giving away the answer to the first question, I asked the students what they knew about the meaning of variation. After some silence, one student answered that it means ‘bermacam-macam’ (translation: varied), followed by another student’s ‘variasi’ (translation: variation). I confirmed both answers and then explained that the higher the variation, the more the values differ, and the lower the variation, the more similar they are.
In summary, there were three types of responses observed (see Table 9):
A. Variation means varied or having many kinds
B. Variation means sameness or similarity
C. Blank or unclear answer
Perhaps because I had confirmed this answer, the students’ answers were mostly ‘bermacam-macam’ (see Table 9). In the experimental group, 25 of the 39 students consistently answered in their pre- and posttest that variation means ‘varied’ or ‘a number of kinds’ (Meaning A). I observed similar results in the pretest of the control group. The teaching seemed to have little or no effect on students’ understanding of the term ‘variation’.
Meaning of ‘Variation’ (number of students: Experimental Pretest / Experimental Posttest / Control Pretest / Control Posttest)
Meaning A: varied or having many kinds. 25 / 25 / 24 / 28
Meaning B: sameness or similarity. 8 / 3 / 2 / 1
Meaning C: blank or unclear. 6 / 11 / 13 / 10
Table 9. Three types of answers for the meaning of variation and the distribution of the students’ answers.
However, I also analyzed the examples that the students gave and how these examples changed from pretest to posttest. Since the question did not explicitly require an example, not all students gave one. In the experimental group, 12 of the 39 students gave no example in the pretest and 9 in the posttest. In the control group pretest, 4 of the 39 students gave blank answers and 9 others gave answers that either made no sense to me or clearly deviated from the intention of the question.
Table 10 presents some of the students’ examples of variation. In the pretest, the examples given by both groups tended to reflect the students’ understanding of variation as the state of having many different kinds, such as variation of cultures, fishes, ethnic groups and human personalities. I was hoping that in the posttest, students would give examples indicating their ability to view data as aggregates. This did not happen in the control group: its students still seemed to see data as collections of individual values. In the experimental group, there was a small positive change. A few students in the experimental group gave examples that reflected my expected answer, such as human height, sizes of Durian fruit and body weight. This was not very surprising, because the lesson activities involved data on human height and weight. Nevertheless, it was a small improvement, showing that a small number of students in the experimental group were starting to see the statistical meaning of variation.
Samples of Students' Examples of Variation
Experimental Group Control Group
Pretest Posttest Pretest Posttest
animals human height “Some trees are tall,
some others are short”
Plants
short, tall, thin, fat body weight Human personalities Houses, ways of
talking
brain capability
(cleverness)
sizes of Durian Ethnics Animals and plants
variation of cultures variation of marks Kinds of plants Marks
shapes of trees facial shape Culture
variation in opinion clothing brand
Table 10. A sample of students' examples of variation.
What about individual changes in answers? In the experimental group, six students consistently wrote that variation means similarity or sameness (Meaning B), and eight students changed their answer from pretest to posttest; five of them changed from Meaning B or Blank to Meaning A. Of these five, two gave body height as an example (see Table 11). The control group showed more individual changes (see Table 12), not all of which can be considered positive.
Pretest Posttest
Meaning B without an example Meaning A, example: height
Blank Meaning A, example: height
Meaning B without an example Meaning A without an example
Meaning A, example: table of histogram Blank
Blank Meaning A without an example
Blank Example: variation in marks
Blank Meaning B without an example
Meaning A without an example Blank
Table 11. Changes of students’ answers for Question 1 from pretest to posttest for the experimental group.
Pretest Posttest
Meaning B Not Clear
Meaning A, example: trees Not clear
Blank Meaning A, no example
Meaning A, example: colors Not clear
Not clear Meaning A ** , example: house, ways of talking
Meaning A, example: ethnics Not clear
Not clear Meaning A, no example
Meaning A, example: books, trees Meaning A *, example: marks of tests
Not clear Meaning A, example: marks from tests
Not clear Meaning A, no example
Blank Meaning A, example: weight, colors, ages
Blank Meaning A, example: marks of a test
Blank Not clear
Table 12. Changes of students’ answers for Question 1 from pretest to posttest for the control group.
** concluded from the examples as the answer was not clear: “that we often see”
* seemed to have a consideration of variation in data sets: “varied values”
Question 2
For each of the following cases, answer the following question: “Which is more
desirable: high variation or low variation?” Add your reason.
a. Age of trees in a national forest.
b. Diameter of new tires coming off one production line.
c. Scores on an aptitude test given to a large number of job applicants.
d. Weight of a box of milk of the same brand.
Purpose
The second question is about students‟ understanding of what variation means in practical
cases. Knowing when to minimize or maximize variation is one of the signs of understanding
and statistical reasoning ability.
Anticipated Answer
a. high variation
b. low variation
c. high variation
d. low variation
Results
Despite my effort to explain what is meant by high and low variation, the students still seemed to have difficulties; most of them struggled to understand the term ‘the desirable variation’. I classified the students’ answers into five levels (0-4):
0. No answer, or an answer without any explanation.
1. No consideration of variation.
In this type, the students only described the common or actual situation of the case being asked about, such as in Question 2a (the age of trees in a national forest), where a student (MU) wrote ‘high, because old trees can protect from national disaster,’ or they described the desirable value in the case (instead of the desirable variation).
2. Wrong understanding of high or low variation and no consideration of what variation is ‘desirable’.
Students considered variation but did so incorrectly. This happened mostly with students who gave Meaning B in Question 1. For example, for the age of trees in a national forest, a student answered “low, because some trees are old, some have just been planted.”
3. Fair understanding of high or low variation, but still no consideration of what variation is ‘desirable’; for example, for the scores on an aptitude test: “high, so that we can assess the ability of the job seeker.”
4. Good understanding of high or low variation and of what variation is ‘desirable’; for example, for the scores on an aptitude test: “high, so that it won’t be difficult for us to select.”
Table 13 shows the students’ levels of reasoning according to my categorization. I present Questions 2a and 2c together and Questions 2b and 2d together, because the students had problems understanding Questions 2b and 2d. One student did not know what a ‘diameter’ is, and most students misinterpreted the phrase ‘coming off one production line’ in Question 2b. The same happened with the phrase ‘of the same brand’ in Question 2d. Because of this language issue, when categorizing students’ answers for 2b and 2d I followed the students’ interpretations of the question and checked whether their reasoning contained any consideration of variation. Table 13 illustrates that the number of students at Levels 3 and 4 increased very little from pretest to posttest. The majority of students were at Level 1 (no consideration of variation).
Level of answers
Experimental Group Control Group
2a 2c 2a 2c
Pretest Posttest Pretest Posttest Pretest Posttest Pretest Posttest
0 7 4 11 4 4 2 9 6
1 18 12 17 17 26 26 22 25
2 2 5 2 6 0 0 0 0
3 10 14 8 9 5 6 4 6
4 2 4 1 3 4 5 4 2
Level of answers
Experimental Group Control Group
2b 2d 2b 2d
Pretest Posttest Pretest Posttest Pretest Posttest Pretest Posttest
0 11 7 14 6 10 9 16 12
1 14 15 15 14 18 18 16 17
2 4 5 4 8 0 1 0 2
3 9 6 5 7 7 8 4 5
4 1 6 1 4 4 3 3 3
Table 13. Distribution of the students’ answers for Question 2.
Was there any individual improvement from pretest to posttest in either group? Since the levels are roughly hierarchical, I investigated this by checking the change in each student’s level. Table 14 shows that 16, 12, 13, and 17 students in the experimental group showed a positive change in Questions 2a, 2b, 2c, and 2d, respectively, whereas in the control group 8, 10, 8, and 12 students did. In both groups, more students improved than declined. The students from the experimental group showed a greater tendency to change their answers between pre- and posttest, while the students from the control group showed very few changes. It seemed that the control group students’ reasoning, whether good or weak before the lessons, remained largely unaffected by the teaching and learning. All this indicates that students in the experimental group improved more than students in the control group.
Change of level   2a (Experimental)   2a (Control)   2b (Experimental)   2b (Control)   2c (Experimental)   2c (Control)   2d (Experimental)   2d (Control)
-4 1
-3 1 3 1 1 1
-2 1 1 1 0 0 0
-1 2 0 5 3 1 2 2 4
0 19 29 21 23 24 28 20 22
1 8 3 8 6 8 6 12 8
2 5 3 1 2 2 2 2 2
3 2 1 1 2 1 1 2
4 1 1 2 2 2
Table 14. The distribution of individual changes from pre- to posttest for Question 2 for students in the
experimental group and the control group.
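The individual changes in Table 14 are the posttest level minus the pretest level for each student, tallied per question and group. The following Python sketch shows such a tally under that reading; the level lists are hypothetical and are not the study’s data, and `change_distribution` is an illustrative name.

```python
from collections import Counter


def change_distribution(pre_levels, post_levels):
    """Count how many students moved by each number of levels (posttest - pretest)."""
    if len(pre_levels) != len(post_levels):
        raise ValueError("each student needs both a pretest and a posttest level")
    changes = [post - pre for pre, post in zip(pre_levels, post_levels)]
    return Counter(changes)


# Hypothetical levels (0-4) for five students; NOT data from this study.
pre = [1, 1, 3, 0, 2]
post = [2, 1, 4, 3, 1]

dist = change_distribution(pre, post)
improved = sum(count for change, count in dist.items() if change > 0)
print(sorted(dist.items()), "improved:", improved)
```

Summing the counts of the positive rows in this way gives the numbers of students with a positive change that are reported for each question.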
5.2. Subtest B: Questions 3 and 4
The questions in this subtest measured students’ computational and procedural skills.
Questions 3 and 4
3. Given the data: 11, 32, 17, 34, 24, 15, 28
For the data above, fill in the table below.
Range
Mean
Median
Upper Quartile
Standard Deviation
Interquartile Range
4. Below are the data of monthly income.
Monthly Income (in hundred thousand rupiah)    Number of People
3 – 5    3
6 – 8    4
9 – 11    9
12 – 14    6
15 – 17    2
The mean of the data above is ________.
The standard deviation is _______.
Purpose
Questions 3 and 4 concerned computational skills. Since a comparison between two different teaching approaches was attempted, it was fair to the control group to include ‘traditional’ questions in the pre- and posttest. Besides, regarding my reflection question, I hoped that the teaching intervention would improve not only statistical reasoning skills but also the computational skills needed in the nationwide examination. Question 3 dealt with ungrouped (individual) data and Question 4 with grouped data.
Anticipated Answer
For Question 3, the correct answer is:
Range 23
Mean 23
Median 24
Upper quartile 32
Standard Deviation 8.2
Interquartile Range 17
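The anticipated values for Question 3 can be checked with a short Python sketch. It assumes the conventions that reproduce the answer key above: the quartiles are the medians of the lower and upper halves of the ordered data (the median itself is left out when n is odd), and the standard deviation divides by n. Textbooks and software differ on both points (with divisor n − 1 the standard deviation would be about 8.9), so the function `describe` is an illustration, not a prescribed method.

```python
from math import sqrt
from statistics import median


def describe(data):
    """Summary statistics for ungrouped data (a sketch).

    Assumed conventions: quartiles are medians of the lower and upper halves,
    excluding the overall median when n is odd; the standard deviation uses divisor n.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]              # values below the median
    upper = xs[half + n % 2:]      # values above the median
    mean = sum(xs) / n
    q1, q3 = median(lower), median(upper)
    sd = sqrt(sum((x - mean) ** 2 for x in xs) / n)  # divisor n
    return {
        "range": xs[-1] - xs[0],
        "mean": mean,
        "median": median(xs),
        "upper_quartile": q3,
        "standard_deviation": round(sd, 1),
        "interquartile_range": q3 - q1,
    }


stats = describe([11, 32, 17, 34, 24, 15, 28])
for name, value in stats.items():
    print(f"{name}: {value}")
```

For the sorted data 11, 15, 17, 24, 28, 32, 34, the lower half is 11, 15, 17 and the upper half is 28, 32, 34, which gives the quartiles 15 and 32 and hence an interquartile range of 17.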
For Question 4, the mean = 10 and the standard deviation ≈ 3.4 (using the class midpoints 4, 7, 10, 13 and 16 and divisor n, the variance is 270/24 = 11.25).
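For the grouped data in Question 4, a common textbook approach replaces each class by its midpoint and weights the midpoints by the class frequencies. The Python sketch below assumes that approach with divisor n; `grouped_mean_sd` is an illustrative name. Under these assumptions the mean is 10 and the variance is 270/24 = 11.25, giving a standard deviation of about 3.35 (about 3.43 with divisor n − 1).

```python
from math import sqrt

# Class limits (in hundred thousand rupiah) and frequencies from Question 4.
classes = [(3, 5), (6, 8), (9, 11), (12, 14), (15, 17)]
freqs = [3, 4, 9, 6, 2]


def grouped_mean_sd(classes, freqs):
    """Approximate mean and SD of grouped data from class midpoints (divisor n)."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    n = sum(freqs)
    mean = sum(f * m for f, m in zip(freqs, mids)) / n
    var = sum(f * (m - mean) ** 2 for f, m in zip(freqs, mids)) / n
    return mean, sqrt(var)


mean, sd = grouped_mean_sd(classes, freqs)
print(f"mean = {mean:.1f}, standard deviation = {sd:.2f}")
```

Because every midpoint lies between 4 and 16, no value is more than 6 away from the mean of 10, so any standard deviation computed from these midpoints must be at most 6.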
Results
The questions in this subtest were meant to check the computational and procedural skills of the students. Questions of this kind were the type the students were used to. I wanted to ensure that the students in the experimental group could perform at least as well as the students in the control group. Tables 15 and 16 show the results for Questions 3 and 4, respectively, for both groups.
Initially, I expected students to use calculators for these questions, and I did ask and urge them to bring calculators to their mathematics classes. However, it turned out that most students did not own a calculator. This partly explains why students from both groups did not perform well on items that asked them to compute the standard deviation. These were social science students whose arithmetic skills were weak and who perhaps suffered from mathematics anxiety.
So, were the performances of the two groups comparable? Yes. First, comparing the number of blank answers in the experimental and control groups, and between the pretest and posttest (Tables 15 and 16), I concluded that the students of the control group made less effort even to attempt the questions. Second, the increase in the number of correct answers from pretest to posttest was slightly larger in the experimental group than in the control group.
Question item         Experimental Group                  Control Group
                      Pretest           Posttest          Pretest           Posttest
                      Correct  Blank    Correct  Blank    Correct  Blank    Correct  Blank
Range                 19       9        26       3        3        25       9        29
Mean                  16       5        21       2        6        23       9        23
Median                20       4        26       0        13       15       11       27
Upper quartile        9        12       7        12       2        23       2        24
Standard deviation    0        26       2        12       0        12       0        30
Interquartile range   9        15       13       7        1        19       0        24
Table 15. Distribution of the correct answers for Question 3.
Question item         Experimental Group                  Control Group
                      Pretest           Posttest          Pretest           Posttest
                      Correct  Blank    Correct  Blank    Correct  Blank    Correct  Blank
Mean                  1        9        2        3        3        26       1        34
Standard deviation    0        23       0        10       0        19       0        36
Table 16. Distribution of students’ answers for Question 4.
5.3. Subtest C: Questions 5-10
The questions in this subtest were mainly related to statistical reasoning about variation.
Students were expected to combine their understanding of histograms and central measures
with consideration of the variation in the data while drawing conclusions from data and
graphical displays. There was also a question about the understanding of central measures.
Question 5
Four histograms and two descriptions of data are given below.
i. A data set of Mathematics test scores where the test was very easy
ii. A data set of wrist circumferences of newborn female babies (measured in
centimeters).
a. Which histogram best matches the data in description (i)? Give your reason.
b. Which histogram best matches the data in description (ii)? Give your reason.
Purpose
One aspect of the teaching intervention was to teach variation by letting students analyze real
data sets. In data analysis, reading and interpreting graphical displays of the data, especially
histograms, is important. This question assessed the ability to read and interpret histograms
and, in addition, students’ ability to describe the variation in a data distribution.
Anticipated Answer
This question was taken from the Comprehensive Assessment of Outcomes in a First
Statistics course (CAOS) test (Web ARTIST Project, 2005) and the answer key provided is:
a. Histogram B
b. Histogram A.
For sub-question a, students needed to reason that the marks would mostly be high, so that the
most suitable histogram is Histogram B. For sub-question b, the anticipated reasoning was that
the measurements vary, just as the weights of newborn babies vary; therefore, Histogram C
would also be an answer that reflects good consideration of variation. However, the large
majority of newborn babies have more or less the same weight, so the distribution is more
symmetric around a central value. Therefore, Histogram A is considered the most suitable.
Results
I considered the reasoning required in Question 5a more straightforward than that in Question
5b. The context of 5a is also more familiar to the students than the context of 5b. Both
questions required consideration of variation, but for Question 5b students additionally
needed to consider that a distribution of data such as the wrist circumferences of newborn
female babies tends towards normality. The difference in difficulty between the two
questions was reflected in the distribution of the students’ answers (see Table 17). Of the 39
students in each group, only 5 (pretest) and 7 (posttest) students in the experimental group,
and 6 and 7 students in the control group, chose Histogram A for Question 5b, whereas more
students chose the correct Histogram B for Question 5a.
In general, even when students managed to consider the variation in the data, their
difficulty in reading histograms prevented them from answering this question correctly. For
example, the modal answer for Question 5a was Histogram D. Students’ explanations for
choosing Histogram D in 5a reflected a common misconception about histograms, namely,
that the height of a bar represents a case value rather than a frequency. For Question 5b, the
modal answer was Histogram B. Students’ explanations showed that they still had the
misconception of seeing a data set not as an aggregate but as individual values: many of them
reasoned that since a baby grows, the wrist circumference increases. Combined with reading
bars as case values, this made Histogram B the modal answer for Question 5b.
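The misconception described above, reading bar heights as case values, can be contrasted with how a histogram is actually built. In the minimal sketch below, the scores for a very easy test, the bin width of 10 and the helper names `histogram_counts` and `print_histogram` are all hypothetical; the sketch assumes every value is at least the starting value of the first bin. Each bar's height is the number of students whose score falls in that bin, so the score values are carried by the bar's position on the horizontal axis, not by its height.

```python
from collections import Counter


def histogram_counts(values, bin_width, start):
    """Count how many values fall in each bin [start + k*w, start + (k+1)*w)."""
    # Assumes all values are >= start, so every bin index k is non-negative
    counts = Counter((v - start) // bin_width for v in values)
    last_bin = max(counts)
    return [(start + k * bin_width, counts.get(k, 0)) for k in range(last_bin + 1)]


def print_histogram(bins, bin_width):
    for lower, count in bins:
        # The bar length is a frequency (number of students), not a score
        print(f"{lower:3d}-{lower + bin_width - 1:3d} | {'#' * count} ({count})")


# Hypothetical scores on a very easy test: most scores are high
easy_test = [55, 68, 72, 78, 81, 84, 86, 88, 89, 91, 92, 93, 95, 96, 97, 98]
bins = histogram_counts(easy_test, bin_width=10, start=50)
print_histogram(bins, bin_width=10)
```

The tallest bar appears at the high end of the score axis because many students scored there; its height (7) is a count of students, not a score.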
Answer            Experimental Group                     Control Group
                  5a                  5b                 5a                  5b
                  Pretest  Posttest   Pretest  Posttest  Pretest  Posttest   Pretest  Posttest
A                 6        7          5        7         9        6          6        7
B                 9        11         12       13        13       12         11       10
C                 2        -          1        6         1        3          4        4
D                 17       18         11       8         11       14         9        8
Blank             3        2          8        4         3        3          8        9
Multiple answers  2        1          2        1         2        1          1        1
Table 17. Distribution of the students’ answers of Question 5.
I used the SOLO taxonomy (Biggs & Collis, 1982) to categorize students’ reasoning. I
created the rubric shown in Table 18, based on the reasoning of students from both the
experimental and the control group. The reasoning of students in the two groups had many
aspects in common. Clearly, students of both the experimental and the control group still had
difficulties understanding histograms. Judging from the amount of time spent on histograms in
the lessons and the traditional style of teaching histograms that I observed in both groups, this
was not really a surprise. In fact, both teachers were ready to leave out histograms altogether if
I had not asked them to teach histograms before teaching variation. Several students showed
some ability to read the histogram in one question but failed to do so in the other. This
indicated that the sophistication of reasoning about variation also depended strongly on the
context of the question.
Level 1: Prestructural
General indicators: The task is not attacked appropriately; the student hasn’t really understood
the point and uses too simple a way of going about it.
Specific indicators: No consideration of variation in the data (e.g., for 5a, that there are many
high scores and no low scores in the test; for 5b, that the measurements vary but most of them
lie around a central measure). Inability to read histograms.
Examples: Just stating that the chosen histogram fitted the description of the data. Explaining
the shape of the chosen histogram. Blank or nonsensical answers.
Students’ answers: “Histogram D because it best matches (the data)” (5a); “Histogram B
because the scores increased drastically” (5a); “Histogram C because wrist circumference of
newborns are very small” (5b); “Histogram B because babies grow so that the circumferences
increase” (5b).

Level 2: Unistructural
General indicators: The student’s response focuses on only one relevant aspect.
Specific indicators: Recognition of variation in the data (as described for level 1), but focusing
only on this. Inability to read histograms.
Examples: Choosing a histogram based on the height of the bars.
Students’ answers: “Histogram C because there were many that got high scores” (5a);
“Histogram C because it showed variation” (5b).

Level 3: Multistructural
General indicators: The student’s response focuses on several relevant aspects, but they are
treated independently and additively.
Specific indicators: Able to consider the variation in the data and to read the histogram, but
treats these independently.
Examples: Choosing a histogram (correct or not) based on the two main aspects.
Students’ answers: “Histogram B because most of the scores are high” (5a); “Histogram D
because the values vary” (5b).

Level 4: Relational
General indicators: The different aspects have become integrated into a coherent whole. This
level is what is normally meant by an adequate understanding of some topic.
Specific indicators: Able to consider the variation in the data and to choose the correct
histogram in the context.
Examples: Giving a good explanation of the variation in the data and connecting it to the shape
of the histogram.
Students’ answers: None.

Table 18. Levels of students’ reasoning for Question 5 based on the SOLO taxonomy.
So, how did the students of the experimental group perform compared to the students
of the control group? The answer choices listed in Table 17 do not give a correct picture of
students’ reasoning, as I found that students might choose the correct histogram for the wrong
reasons. The distribution of levels of reasoning based on the rubric in Table 18 can be found in
Table 19. From this table, I inferred that students of the experimental group performed better
than students of the control group. For both questions, the experimental group showed some
improvement in level of reasoning in the posttest, whereas students in the control group
showed barely any improvement.
Level   Question 5a                             Question 5b
        Experimental Group  Control Group       Experimental Group  Control Group
        Pretest  Posttest   Pretest  Posttest   Pretest  Posttest   Pretest  Posttest
1       18       11         26       27         32       24         33       32
2       18       21         12       11         5        9          6        7
3       3        7          1        1          2        6          -        -
4       -        -          -        -          -        -          -        -
Table 19. Distribution of the students’ levels of reasoning for Question 5.
I also compared the individual changes in level. Table 20 shows that for Question 5a more
students in the experimental group than in the control group improved their reasoning level
(11 students versus 5), and fewer students in the experimental group declined. For Question
5b, the levels of the students in the control group hardly changed at all.
Change of level   Experimental Group    Control Group
                  5a        5b          5a        5b
-2                1         -           1         -
-1                1         4           5         -
0                 26        24          28        38
1                 9         9           4         1
2                 2         2           1         -
Table 20. Distribution of the individual changes of reasoning level for Question 5.
Comparison of the pretest results and the mean gains between groups
Firstly, for each sub-question I conducted a two-tailed independent-samples t-test on the
pretest scores to test the null hypothesis that there was no difference in students’ levels of
reasoning at the start of the intervention. In both cases, the null hypothesis could not be
rejected: the pretest mean scores of students in the control group were not statistically
different from those of students in the experimental group, in Question 5a (t = 1.59, df = 76,
two-tailed p-value = 0.115) and in Question 5b (t = 0.54, df = 76, two-tailed p-value = 0.592).
Secondly, I conducted a paired-samples t-test for both the experimental and the control
group (39 students each) to test, using the pretest and posttest scores, the null hypothesis that
there was no difference in students’ levels of reasoning before and after the teaching. For the
experimental group, the null hypothesis was rejected: in Question 5a (t = 2.24, two-tailed
p-value = 0.031) and in Question 5b (t = 2.04, two-tailed p-value = 0.048). For the control
group, the null hypothesis could not be rejected: in Question 5a (t = -0.24, two-tailed p-value
= 0.812) and in Question 5b (t = 1.00, two-tailed p-value = 0.324). These results indicated that
there was a significant gain in the experimental group, but not in the control group.
Finally, for each sub-question I conducted a two-tailed independent-samples t-test to
test the null hypothesis that there was no difference in the mean gains between the
experimental and the control group. The null hypothesis could not be rejected, neither for
Question 5a (t = 1.80, df = 76, two-tailed p-value = 0.076) nor for Question 5b (t = 1.77,
df = 76, two-tailed p-value = 0.081). Thus, although the gain within the experimental group
was statistically significant, it did not differ enough from the gain of the control group to be
statistically significant.
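As a check on the reported statistics, the individual gains can be reconstructed from the change distributions in Table 20 (for example, in the experimental group for Question 5a, one student dropped two levels, one dropped one level, 26 did not change, 9 rose one level and 2 rose two levels). The sketch below uses only Python's standard library; the helper names are hypothetical. It computes the paired t statistic as a one-sample t-test of the gains against zero, and the between-group statistic as a pooled-variance (equal-variance) two-sample test on the gains, which is an assumption consistent with the reported df = 76. P-values would need a t-distribution function (for instance from SciPy) and are omitted.

```python
import math
import statistics


def expand(change_counts):
    """Turn {change of level: number of students} into a list of individual gains."""
    return [change for change, count in change_counts.items() for _ in range(count)]


def paired_t(gains):
    """Paired t statistic = one-sample t-test of the gains against 0 (df = n - 1)."""
    return statistics.mean(gains) / (statistics.stdev(gains) / math.sqrt(len(gains)))


def pooled_t(a, b):
    """Independent-samples t statistic with pooled variance (df = len(a) + len(b) - 2)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Changes of reasoning level read from Table 20 (39 students per group)
exp_5a = expand({-2: 1, -1: 1, 0: 26, 1: 9, 2: 2})
exp_5b = expand({-1: 4, 0: 24, 1: 9, 2: 2})
ctrl_5a = expand({-2: 1, -1: 5, 0: 28, 1: 4, 2: 1})
ctrl_5b = expand({0: 38, 1: 1})

for label, gains in [("exp 5a", exp_5a), ("exp 5b", exp_5b),
                     ("ctrl 5a", ctrl_5a), ("ctrl 5b", ctrl_5b)]:
    print(f"paired t, {label}: {paired_t(gains):.2f}")
print(f"difference in gains, 5a: t = {pooled_t(exp_5a, ctrl_5a):.2f}")
print(f"difference in gains, 5b: t = {pooled_t(exp_5b, ctrl_5b):.2f}")
```

Running the sketch gives t = 2.24, 2.04, -0.24 and 1.00 for the within-group gains and t = 1.80 and 1.77 for the differences in gains, matching the values reported above.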
Question 6
Two students who took mathematics tests received the following scores (out of 100):
Student A: 60, 90, 80, 60, 80
Student B: 40, 100, 100, 40, 90
If you had an upcoming mathematics test, who would you prefer to be your study partner,
A or B? Why?
Purpose
In this question, I wanted to find out whether students used any measure of variation in
comparing two data sets. This question can be considered as assessing the ability to read
beyond the data (in the sense of Curcio, 1987) in the process of analyzing and interpreting
data. Based on the two data sets, the question implicitly asked students to predict the future
performance of students A and B, which allowed me to check whether students considered
variation in their reasoning.
Anticipated Answer
This question was taken from Meletiou (2000), who found the following among 30
university students: (i) 50% of the responses chose ‘Student A’ because there was less
variation (more consistency); (ii) 33% of the responses chose ‘Student B’, viewing the perfect
score as a sign of potential; and (iii) 15% of the responses said “it wouldn’t really matter who
you study with since they both are essentially the same grade standing of 74%. They
compensate each other” (p. 131). All three kinds of answers could also be found in my study.
However, I realized that the answer to this question might depend on the students’ experience
and expectations, since students might consider their own situation in answering the question.
Strictly speaking, this is not a problem, because there is no single correct answer: I wanted to
check whether students considered variation in their answer, rather than only informally
invented measures or central measures. Nevertheless, because Student A and Student B have
the same average score, I considered ‘Student A’ the correct answer, because his/her scores are
less variable, which makes him/her a more reliable study partner.
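The intended reasoning can be made concrete by computing a measure of centre and two measures of spread for both students' marks. This minimal sketch uses the population standard deviation (`statistics.pstdev`, dividing by n); with the sample version (`statistics.stdev`) the values change, but the comparison between the two students does not.

```python
import statistics

# Marks of the two students in Question 6 (out of 100)
scores = {
    "A": [60, 90, 80, 60, 80],
    "B": [40, 100, 100, 40, 90],
}

for student, marks in scores.items():
    mean = statistics.mean(marks)
    spread = max(marks) - min(marks)  # range
    sd = statistics.pstdev(marks)     # population SD: squared deviations averaged over n
    print(f"Student {student}: mean = {mean}, range = {spread}, SD = {sd:.1f}")
```

Both students have the same mean (74), but Student B's marks have twice the range (60 versus 30) and more than twice the standard deviation (28 versus 12), which is the consistency argument for choosing Student A.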
Results
For this question, I observed three types of students’ answers (Table 21):
Student A;
Student B;
Anyone is fine.
The modal answer was ‘Student A’.
Although ‘Student A’, the answer I considered correct, was the modal answer, the
students’ reasoning behind it was mostly not what I had expected. Students mostly based their
conclusion on an informal measure, such as the extreme values or a particular standard value
such as the passing mark. In the pretest, none of the students seemed to consider the standard
deviation in their reasoning.
Answer          Experimental Group    Control Group
                Pretest  Posttest     Pretest  Posttest
Blank/Unclear   -        -            2        -
Student A       35       34           28       24
Student B       1        3            6        11
Anyone          3        2            3        4
Table 21. Distribution of the students’ answers of Question 6.
To investigate the students’ reasoning, I used the framework of Mooney (2002) to
categorize students’ answers. Mooney’s framework has four levels of statistical reasoning,
and I defined the descriptors for each level based on the students’ answers. The descriptors of
the four levels (see also Subsection 2.2.2) that I identified are:
1. Idiosyncratic: giving blank or unclear answers.
Examples:
“A because his marks are greater than B’s.”
“A because (he) has satisfying marks.”
“I chose A because his marks are little (numbers) so it is easy to compute without a
calculator.”
2. Transitional: using informal or invented central measures, such as the extreme values
or values based on everyday experience (e.g., 60 as the standard passing score), or an
informal measure of variation.
Examples:
“A because A’s scores are more satisfying. Although B got 100 twice, all of A’s
scores are above the standard.”
“A because his marks are higher than B’s and the differences are not too far.”
“In my opinion, I will choose B because his results are satisfying. Although there is a
40, the 100 is enough to compensate for the deficiency.”
“B, because the standard deviation of B’s marks is higher than A’s.”
3. Quantitative: using formal central measures, namely the mean or the median. Since the
two data sets contain the same number of values, I also count the sum as a formal measure here.
Example: “Because the average scores of the two students are the same, I choose
student A because all his marks are above the passing mark.”
4. Analytical: using measures of variation in combination with central measures.
Example: “A because his typical score is not too far away from the sum of his marks.”
The distribution of the students’ levels of reasoning is presented in Table 22. The
majority of students in both the experimental and the control group were at level 2. As
mentioned earlier, most students drew their conclusions from an informal measure, most
notably whether all marks were above 60, the standard passing mark. Another typical line of
reasoning was to base the answer on extreme values, e.g., the perfect mark. Not many changes
occurred in the posttest, although the number of students at level 1 decreased in both groups.
Table 23 likewise shows few changes of level.
Level   Experimental Group    Control Group
        Pretest  Posttest     Pretest  Posttest
1       9        5            12       8
2       25       29           23       25
3       5        4            4        6
4       -        1            -        -
Table 22. Distribution of the students’ levels of reasoning for Question 6.
Change of level   Experimental Group   Control Group
-1                4                    1
0                 29                   32
1                 4                    5
2                 1                    1
3                 1                    -
Table 23. Distribution of the individual changes of levels of reasoning for Question 6.
Comparison of the pretest results and the mean gains between groups
Firstly, I conducted a two-tailed independent-samples t-test on the pretest scores to test the
null hypothesis that there was no difference in students’ levels of reasoning at the start of the
intervention. The null hypothesis could not be rejected: the pretest mean scores of students in
the control group were not statistically different from those of students in the experimental
group (t = 0.75, df = 76, two-tailed p-value = 0.457).
Secondly, I conducted a two-tailed paired t-test for each group to test, using the
pretest and posttest scores, the null hypothesis that there was no difference in students’ levels
of reasoning before and after the intervention. For the experimental group, the null hypothesis
could not be rejected (t = 1.09, two-tailed p-value = 0.281). For the control group, it could not
be rejected either (t = 1.97, two-tailed p-value = 0.057). These results indicated that there was
no statistically significant gain in either the experimental group or the control group.
Finally, I conducted a two-tailed independent-samples t-test to test the null hypothesis
that there was no difference in the mean gains between the experimental and the control
group. The null hypothesis could not be rejected (t = -0.18, df = 76, two-tailed p-value =
0.856). Thus, there was no statistically significant difference between the mean gain of the
experimental group and that of the control group.
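The three tests for Question 6 can be re-derived from the frequency tables in the same way. In this sketch, the pretest levels come from Table 22 (experimental group: 9 students at level 1, 25 at level 2 and 5 at level 3; control group: 12, 23 and 4) and the gains from Table 23 (experimental group: 4 students down one level, 29 unchanged, 4 up one, 1 up two and 1 up three; control group: 1 down one level, 32 unchanged, 5 up one and 1 up two). The helper names are hypothetical, and a pooled-variance two-sample test is assumed for the between-group comparisons, consistent with df = 76; p-values are omitted because they need a t-distribution function.

```python
import math
import statistics


def expand(counts):
    """Turn {value: number of students} into a list of individual values."""
    return [value for value, count in counts.items() for _ in range(count)]


def pooled_t(a, b):
    """Independent-samples t statistic with pooled variance (df = len(a) + len(b) - 2)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def paired_t(gains):
    """Paired t statistic computed as a one-sample t-test of the gains against 0."""
    return statistics.mean(gains) / (statistics.stdev(gains) / math.sqrt(len(gains)))


# Pretest levels of reasoning (Table 22)
pre_exp = expand({1: 9, 2: 25, 3: 5})
pre_ctrl = expand({1: 12, 2: 23, 3: 4})
# Individual changes of level (Table 23)
gain_exp = expand({-1: 4, 0: 29, 1: 4, 2: 1, 3: 1})
gain_ctrl = expand({-1: 1, 0: 32, 1: 5, 2: 1})

print(f"pretest difference: t = {pooled_t(pre_exp, pre_ctrl):.2f}")
print(f"paired t, experimental: {paired_t(gain_exp):.2f}")
print(f"paired t, control: {paired_t(gain_ctrl):.2f}")
print(f"difference in gains: t = {pooled_t(gain_exp, gain_ctrl):.2f}")
```

The sketch gives t = 0.75 for the pretest comparison, 1.09 and 1.97 for the within-group gains, and -0.18 for the difference in gains, matching the values reported above.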
Question 7
One day Dedi caught a very big catfish from his rice field. He wanted to be sure of the
weight of the fish and therefore he weighed it 7 times on the same scale/balance. Below
are the measurements (in kilograms) that he found:
2.9; 2.7; 5.1; 3.1; 3.0; 2.8; 3.0 kg.
a. How spread out are the measurements he obtained?
b. How many kilograms do you think the true weight of the catfish was? Give your
reason.
Purpose
This question was a modified version of one of the questions used by Groth (2003) as a tool to
describe high school students’ statistical thinking. In his study, Groth used the question to
investigate students’ thinking about measures of centre (7b) and measures of spread (7a). In
my pretest, I used the same question but changed the context and the data. In Groth’s study
the measurements were all different, whereas in mine I deliberately included two identical
measurements to find out whether students immediately used the formal measure of mode. I
wanted to find out which measures of centre and variation the students might use to draw an
informal conclusion, and whether their views on the measures of centre were adjusted when
an outlier was present.
Anticipated Answer
For the first question, I expected that in the pretest most students would answer by
mentioning the easiest measure of variation, namely the range, for example, that the
measurements are spread out from 2.7 kg to 5.1 kg. Because there is an outlier, I also expected
that after the lessons students might come up with the interquartile range, describing most
measurements as lying “around 1-2 ons (0.1 to 0.2 kg) from 3.0 kg.”
For the second question, I thought most students would think of using the mode.
However, I preferred that students either used the median to estimate the true weight or used
the mean after first omitting the outlier.
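The candidate answers to Question 7 can be compared by computing them directly from Dedi's seven measurements. In this sketch, the outlier (5.1 kg) is identified by eye rather than by a formal outlier rule, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`; other quartile conventions give a somewhat wider interquartile range (for instance 2.8 to 3.1 kg with the default "exclusive" method).

```python
import statistics

weights = [2.9, 2.7, 5.1, 3.1, 3.0, 2.8, 3.0]  # Dedi's seven measurements (kg)

# Measures of spread (Question 7a)
value_range = max(weights) - min(weights)
q1, _, q3 = statistics.quantiles(weights, n=4, method="inclusive")
print(f"range: {min(weights)} to {max(weights)} kg ({value_range:.1f} kg)")
print(f"IQR: {q1:.2f} to {q3:.2f} kg ({q3 - q1:.2f} kg)")

# Measures of centre (Question 7b)
print(f"mode: {statistics.mode(weights)} kg")
print(f"median: {statistics.median(weights)} kg")
print(f"mean of all data: {statistics.mean(weights):.2f} kg")

# Mean after omitting the outlier 5.1 kg (chosen by eye here, not by a formal rule)
without_outlier = [w for w in weights if w != 5.1]
print(f"mean without outlier: {statistics.mean(without_outlier):.2f} kg")
```

The median (3.0 kg) and the mean without the outlier (about 2.92 kg) stay close to the bulk of the measurements, whereas the mean of all seven values (about 3.23 kg) is pulled upward by the single 5.1 kg reading.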
Results
Question 7a was perhaps the most misunderstood question in both the pretest and the
posttest. The phrase “how spread out are…”, which I assumed would be easy to grasp, turned
out to be confusing for the students. I assumed that the most probable answer, the one students
would immediately think of, would use one of the measures of spread, namely the range. The
range had been taught previously to the students of the experimental group by their regular
teachers as part of the five-number summary, and I assumed the same for the students of the
control group.
However, none of the students in either group gave the range as an answer in the
pretest, and some did not answer at all (see Table 24). In fact, in the control group no measure
of spread was used even in the posttest. The students’ explanations did not indicate any
understanding of the question. I concluded that this was due to their limited linguistic skills.
Therefore, I decided to exclude Question 7a from my analysis.
Answer (7a)                               Experimental Group    Control Group
                                          Pretest  Posttest     Pretest  Posttest
Blank                                     12       3            9        5
Not use any formal measures of spread     27       29           30       34
Use the range                             -        7            -        -
Table 24. Distribution of the students’ answers of Question 7a.
For the second question, Table 25 shows that when some data values are equal, many
students use the mode as the central measure. My expectation that students would consider the
effect of the outlier on the central measure was not met: no student computed the mean after
first excluding the outlier. However, more students in the experimental group used the mean
of all data (from 3 in the pretest to 11 in the posttest), and a few used the median (1 in the
pretest and 2 in the posttest). In contrast, there was almost no change in the control group
(from 4 students using the mean in the pretest to 5 in the posttest). The modal answer in the
control group was still not to use any central measure, and no student in the control group
used the median. This suggests that the teaching intervention stimulated students to use
formal measures.
Answer (7b)                               Experimental Group    Control Group
                                          Pretest  Posttest     Pretest  Posttest
Blank or no explanation                   8        2            6        3
Not use any formal central measure        12       8            19       19
Use the mode                              15       16           10       12
Use the mean of all data                  3        11           4        5
Use the median                            1        2            -        -
Table 25. Distribution of the students’ answers for Question 7b.
Question 8
The histogram below shows the number of hours of exercise per week of the marketing
staff of a bank.
[Figure: Histogram of Number of Exercising Hours Per Week. Horizontal axis: Number of
Hours (0 to 8); vertical axis: Number of People (0 to 9).]
a. Compute the median. _______________
b. Compute the mean. ________________
c. Based on the histogram, how many hours do the staff in this company usually
exercise per week? Give your reason.
Purpose
The third part of this question assessed students’ understanding of variation indirectly. The
main emphasis was on assessing the students’ knowledge of the statistics they had learned
prior to the intervention, i.e., measures of centre, and whether this knowledge had improved
after the intervention. Parts a and b were explicitly about central measures. In part c, however,
students needed to decide which measure of centre best describes the data, taking into account
the variation or the shape of the graphical display. This is where any consideration of
variation was assessed indirectly.
Anticipated Answer
The histogram is skewed, so it is better to use the median rather than the mean to answer part
c. To compute the median and the mean, students must be able to read histograms; otherwise,
misconceptions such as taking the middle value of the horizontal axis as the median could
occur.
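To compute the median and the mean from a histogram, students have to treat each bar height as a number of people who share that value. The sketch below does this for hypothetical, right-skewed frequencies (the values of the actual histogram in Question 8 are not reproduced here) and contrasts the result with the misconception of taking the middle of the horizontal axis as the median.

```python
import statistics

# Hypothetical frequencies (NOT the actual Question 8 histogram):
# hours of exercise per week -> number of staff members
frequencies = {0: 2, 1: 8, 2: 9, 3: 6, 4: 4, 5: 3, 6: 2, 7: 1, 8: 1}

# Rebuild the individual data: each bar contributes `count` people with that many hours
hours = [h for h, count in frequencies.items() for _ in range(count)]

median = statistics.median(hours)
mean = statistics.mean(hours)
mid_axis = (min(frequencies) + max(frequencies)) / 2  # the misconception: middle of the x-axis

print(f"n = {len(hours)}, median = {median}, mean = {mean:.2f}, middle of axis = {mid_axis}")
```

For these hypothetical data the median is 2 hours and the mean about 2.83 hours: the few staff members who exercise many hours pull the mean upward, while the middle of the axis (4 hours) has nothing to do with where the middle person lies.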
Results
The purpose of Question 8 was to probe students’ understanding of the meaning of central
measures, specifically their understanding of when to use the mean versus the median, and
their ability to compute these values from a histogram. Because my study focused on
students’ reasoning about variation, I did not analyze the first two sub-questions in depth.
Research on students’ understanding and misconceptions of central measures, in
particular the mean and the median, is abundant (see, for example, Garfield & Ben-Zvi, 2008,
Chapter 9, and references therein), and although I observed misconceptions mentioned in the
research literature, I do not discuss them here. My focus for Questions 8a and 8b was whether
students could compute the mean and the median, and the results in Table 26 indicate that
students had difficulties doing so. I suspect that, similar to the results for Question 5,
students’ understanding of histograms was poor and that this was the main reason why they
could not compute the mean and the median.
Answer    Question 8a                             Question 8b
          Experimental Group  Control Group       Experimental Group  Control Group
          Pretest  Posttest   Pretest  Posttest   Pretest  Posttest   Pretest  Posttest
Blank     10       7          16       10         15       8          17       13
Wrong     28       29         20       21         24       29         22       26
Correct   1        3          3        8          -        2          -        -
Table 26. Distribution of the students’ answers for Question 8a and 8b.
For Question 8c, I checked which central measure the students used and the reasoning
behind it. The results for Question 8c are listed in Table 27. The performance in the pretest
was very poor: most students either did not answer or answered without using any measure of
centre. The lack of understanding of histograms again played a role here. The title of the
histogram clearly stated that the data were about the number of exercising hours per week, so
the horizontal axis showed the number of hours. Students’ answers revealed that many
interpreted the horizontal axis as the numbers of hours spent by one person in different weeks.
I also found that some students misread the horizontal axis as the number of hours per day, so
that when they used the mode, they multiplied it by 7 and sometimes also by 24. In the
posttest, the number of students using the mode increased in both groups, and in the
experimental group some students also used the mean or the median. Table 28 shows that
more students improved in the experimental group (17 students) than in the control group
(9 students).
Category                 Experimental Group    Control Group
                         Pretest  Posttest     Pretest  Posttest
0 Blank                  17       10           18       14
1 No central measures    15       9            13       14
2 The mean               -        9            -        -
3 The mode               7        9            8        11
4 The median             -        2            -        -
Table 27. Distribution of the students’ answers for Question 8c.
Change of level   Experimental Group   Control Group
-3                1                    -
-2                -                    2
-1                3                    3
0                 18                   25
1                 7                    4
2                 5                    2
3                 5                    3
Table 28. Distribution of the individual changes of levels of reasoning for Question 8c.
Question 9
Forty college students participated in a study of the effect of sleep on test scores. Twenty
of the students studied all night before the test the following morning (the no-sleep group),
while the other 20 students (the control group) went to bed by 11.00 pm the evening before.
The test scores for each group are shown in the diagrams below. Each dot on the diagram
represents a particular student’s score. For example, the two dots above the 80 in the
bottom diagram indicate that two students in the sleep group scored 80 on the test.
[Figure: two dot plots of test scores on a common scale from 30 to 100, titled “Test Scores:
No-Sleep Group” (top) and “Test Scores: Sleep Group” (bottom).]
Examine the two diagrams carefully. Which group is better: the sleep group or the no-
sleep group? Explain your reasons.
(Note: In the actual test sheet, the following question was printed on the back of the page
containing the question above. This was designed so that students would give their own
reasoning before choosing one of the multiple choices below.)
Then circle the one of the 6 possible conclusions listed below that you most agree with.
a. The no-sleep group did better because none of these students scored below 35 and
the highest score was achieved by a student in this group.
b. The no-sleep group did better because its average appears to be a little higher
than the average of the sleep group.
c. There is no difference between the two groups because there is considerable
overlap in the scores of the two groups.
d. There is no difference between the two groups because the difference between their
averages is small compared to the amount of variation in the scores.
e. The sleep group did better because more students in this group scored 80 or
above.
f. The sleep group did better because its average appears to be a little higher than
the average of the no-sleep group.
Purpose
In this question, I wanted to find out whether students would use a combination of measures
of centre and variation when comparing two data sets. This problem was taken from the
Statistical Reasoning Assessment (SRA) designed by Garfield (2003). The correct answer is
(d); the other choices reflect common misunderstandings of students, for example, paying
attention only to the extreme values or only to the average. I wanted to test whether students
realized that, in comparing two data sets, they needed to consider not only central measures
but also measures of variation.
Anticipated Answer
Garfield (2003) designed the question to find out whether students took the variation of the
scores into consideration when comparing the two data sets. A correct reasoning skill,
‘understanding sampling variability’, corresponds to option d, and a misconception,
‘comparing groups based on their averages’, corresponds to options b and f (Garfield, 2003,
p. 27). I predicted that quite a few students would choose the options reflecting this
misconception, but I hoped that in the posttest many students would choose option d.
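Option d compares the difference between the averages with the amount of variation in the scores. One common way to make this comparison numeric, which is not part of the SRA item and is shown here only as an illustration, is to divide the difference in means by the pooled standard deviation (a standardized mean difference, often called Cohen's d). The scores below are hypothetical, since the dot plots of the item are not reproduced here, and the helper name `standardized_difference` is invented for the example.

```python
import math
import statistics


def standardized_difference(a, b):
    """Difference in means divided by the pooled standard deviation (Cohen's d)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


# Hypothetical scores for two groups of 10 students each (NOT the SRA data)
sleep = [45, 55, 60, 65, 70, 72, 75, 80, 85, 95]
no_sleep = [40, 50, 58, 62, 66, 70, 74, 78, 82, 90]

diff = statistics.mean(sleep) - statistics.mean(no_sleep)
d = standardized_difference(sleep, no_sleep)
print(f"difference in means: {diff:.1f} points")
print(f"standardized difference: {d:.2f} pooled standard deviations")
```

Here the means differ by 3.2 points, only about 0.21 of a pooled standard deviation, so the two distributions overlap heavily; this is the kind of situation option d describes.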
Results
This question was formatted such that students were first expected to explore the open-ended
question freely before choosing one of the multiple choices, which were provided on the next
page. This design was unsuccessful. From now on, I refer to the open-ended question as the
first part and to the closed (multiple-choice) question as the second part. Because students
were unfamiliar with open-ended questions like the first part, most of them just browsed
through the pages and thus saw the multiple choices. As a result, many students simply copied
one of the multiple choices of the second part into their answer to the first part.
Adopting the levels of statistical thinking for ‘Analyzing and Interpreting Data’ from Mooney
(2002), I categorized students’ answers into the following four levels, with descriptors for
each level:
1. Idiosyncratic: students make no inferences or inferences that are not based on the data
or based on irrelevant contextual issues.
Blank answers or answers that do not make sense and answers based on anecdotal
experience are categorized into this level.
2. Transitional: students make inferences that are primarily based on the data.
Options a, c, and e correspond to this level. Students base their answers on extreme
values, certain values, or simple graphical properties. A typical answer compared the
numbers of high or low marks, for example, “Sleep group, because the sleep group has
more high marks than the no-sleep group” and “Sleep group, because in the sleep group
only 10 people got marks below 60.”
3. Quantitative: students make reasonable inferences based on data and the context.
I consider options b and f, in which the mean is the only basis for comparing the data
sets, to belong to this level.
4. Analytical: students make reasonable inferences based on data and the context using
multiple perspectives.
Consideration of both variation and the mean is the underlying aim of this question;
therefore, if students’ reasoning shows consideration of both variation and central
measures, as in option d, I categorize it at this level.
The modal answer for the second part was option e in the experimental group and
option f in the control group (see Table 29). This indicated that many students of the
experimental group were at the Transitional level, since they used informal measures to draw
conclusions. Students of the control group showed this as well, but more of them chose option
f, which means that they primarily used the mean to draw conclusions (Quantitative level).
However, the results for the open part (see Table 30) showed that the students of the
experimental group seemed to improve more than those of the control group: fewer students
reasoned idiosyncratically and more students reasoned quantitatively. In the control group,
the majority of the students were at the idiosyncratic level in the open part, and, unlike in the
experimental group, the number of students at the quantitative level did not increase.
Answer             Experimental Group     Control Group
                   Pretest   Posttest     Pretest   Posttest
a                  4         2            1         1
b                  1         6            5         3
c                  1         2            1         5
d                  5         5            7         4
e                  11        10           6         4
f                  6         8            9         14
Blank              10        3            9         6
Multiple Answers   1         3            1         2
Table 29. Distribution of the students' answers for the closed part of Question 9.
Level              Experimental Group                       Control Group
                   Open                Closed               Open                Closed
                   Pretest  Posttest   Pretest  Posttest    Pretest  Posttest   Pretest  Posttest
1-Idiosyncratic    12       6          10       3           22       25         9        7
2-Transitional     24       24         16       15          12       11         8        10
3-Quantitative     3        9          7        16          5        3          15       18
4-Analytical       -        -          6        5           -        -          7        4
Table 30. Distribution of the students' levels of reasoning for Question 9.
Comparison of the pretest results and the mean gains between groups
Firstly, for each part I conducted a two-tailed independent two-sample t-test on the pretest scores to evaluate the null hypothesis that there was no difference in the students' levels of reasoning at the start of the intervention. For the closed part, the null hypothesis was rejected (t = -3.87, df = 76, two-tailed p < 0.001). For the open part, the null hypothesis was also rejected (t = 3.36, df = 76, two-tailed p = 0.001). Thus, the experimental and control groups already differed significantly, in the statistical sense, before the intervention. For this reason, I do not compare the two groups with each other below, but only look at the gain within each group of students.
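The pooled-variance independent two-sample t-test used above can be sketched in a few lines. This is a minimal standard-library illustration, not the procedure actually run in this study. The function name `pooled_t` and the level scores are hypothetical, not the study's data. The sketch computes only the t statistic and the degrees of freedom, n1 + n2 - 2, which is how two groups of 39 give the df = 76 reported above. The two-tailed p-value would then be read from the t-distribution, for example with `scipy.stats.ttest_ind`, whose default `equal_var=True` performs this pooled test.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x, y):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances (n - 1 denominators) weighted by their df.
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df

# Hypothetical pretest levels (1 = Idiosyncratic ... 4 = Analytical), not the thesis data.
experimental = [1, 2, 2, 3, 2, 1, 2, 4]
control = [2, 3, 3, 2, 4, 3, 2, 3]
t, df = pooled_t(experimental, control)
print(f"t = {t:.2f}, df = {df}")
```

A negative t here simply means that the first group listed has the lower mean. The sign therefore depends on the order in which the groups are passed.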
Secondly, for each group of students (N = 39) I conducted a two-tailed paired t-test on the pretest and posttest scores to evaluate the null hypothesis that there was no difference in the students' levels of reasoning before and after the intervention. For the experimental group, the null hypothesis was rejected for the closed part (t = 2.51, two-tailed p = 0.017), but it could not be rejected for the open part (t = 1.97, two-tailed p = 0.056). This indicates that the experimental group made a statistically significant gain in the closed part, but not in the open part. For the control group, on the other hand, the null hypothesis could not be rejected for either the closed part (t = -0.16, two-tailed p = 0.872) or the open part (t = -1.22, two-tailed p = 0.230). Thus, there was no statistically significant gain in the control group. For the closed part this was not a big surprise, because the students in the control group were already at a higher level than those in the experimental group. (In retrospect, normalized gain would have been a better variable to use in the statistical analysis.)
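A paired t-test works on each student's own difference between posttest and pretest. The normalized gain mentioned above rescales the gain by the room that was left for improvement. The sketch below is a minimal illustration under stated assumptions: the helper names are hypothetical, and the scores are made-up levels on the 1 to 4 scale. It computes the paired t statistic with df = n - 1 (38 for N = 39). It also computes Hake's class-average normalized gain, g = (mean post - mean pre) / (max - mean pre). This is only one common variant of normalized gain, so it should not be read as the one this study would have used.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(pre, post):
    """Paired-samples t statistic on the differences post - pre; returns (t, df)."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

def normalized_gain(pre, post, max_score):
    """Class-average normalized gain: (mean(post) - mean(pre)) / (max_score - mean(pre))."""
    pre_mean = mean(pre)
    return (mean(post) - pre_mean) / (max_score - pre_mean)

# Hypothetical levels of reasoning (1 to 4) for the same eight students.
pre = [1, 2, 2, 1, 3, 2, 1, 2]
post = [2, 2, 3, 2, 3, 3, 1, 2]

t, df = paired_t(pre, post)
g = normalized_gain(pre, post, max_score=4)
print(f"paired t = {t:.2f}, df = {df}, normalized gain = {g:.2f}")
```

Because g divides by the distance to the maximum level, a group that starts higher is not penalized for having less room to gain. That was the issue with the closed part for the control group.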
Question 10
Below are the histograms of the scores of a mathematics test in two classes.
[Figure: histograms of the test scores for Class A and Class B. Horizontal axis: Scores (55, 65, 75, 85, 95); vertical axis: Frequency of scores (0 to 24).]
a. Comparing the two histograms, one could infer that
i. Variation of scores in Class A is higher than in Class B. (The scores in Class A vary more than the scores in Class B.)
ii. Variation of scores in Class B is higher than in Class A. (The scores in Class B vary more than the scores in Class A.)
iii. Class A and Class B have equal variation of scores.
iv. I don't know.
b. Why? Give your reason.
Purpose
In the last question, students were asked to compare the variation in two data sets presented as graphical displays, namely histograms. The ability to understand histograms was essential here, and I wanted to test whether the intervention improved this ability.
Anticipated Answer
In my opinion, the correct answer is that the variation of scores in Class B is higher than the variation of scores in Class A. This problem was taken from Cooper & Shore (2008), but I decided to use the word 'variation' instead of 'variability'. Besides reasoning with both the case values and the frequencies shown in the histograms, I thought that students could also answer this question correctly by considering the standard deviation as a measure of variation. I think that this is especially true in the Indonesian curriculum, where students learn how to estimate the mean and the standard deviation of grouped data. Intuitively, as most scores in Class A are concentrated around the centre compared to Class B, the standard deviation of Class A should be smaller, and therefore its variation would be less.
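The standard-deviation route described above can be illustrated with the usual textbook procedure for grouped data, in which every score in a class is represented by the class midpoint. This is a minimal sketch with three assumptions. First, it assumes class midpoints of 55, 65, 75, 85 and 95, as suggested by the axis labels of the histograms. Second, the frequencies are hypothetical, since the original bar heights are not reproduced here. Third, the function name is illustrative. The sketch divides by the total frequency n (the population form); some textbooks divide by n - 1 instead.

```python
from math import sqrt

def grouped_mean_sd(freq):
    """Mean and standard deviation of grouped data, treating every observation in a
    class as equal to the class midpoint (population form: divide by n)."""
    n = sum(freq.values())
    m = sum(x * f for x, f in freq.items()) / n
    var = sum(f * (x - m) ** 2 for x, f in freq.items()) / n
    return m, sqrt(var)

# Hypothetical frequencies at the assumed class midpoints 55, 65, 75, 85, 95.
class_a = {55: 2, 65: 6, 75: 20, 85: 6, 95: 2}   # concentrated around the centre
class_b = {55: 6, 65: 8, 75: 8, 85: 8, 95: 6}    # spread more evenly over the classes

for name, freq in [("A", class_a), ("B", class_b)]:
    m, s = grouped_mean_sd(freq)
    print(f"Class {name}: mean = {m:.1f}, SD = {s:.1f}")
```

With these made-up frequencies both classes have a mean of 75. The standard deviation is about 8.8 for Class A and about 13.3 for Class B. This matches the intuition that scores concentrated around the centre give a smaller standard deviation.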
Results
Similar to Question 9, by adopting the levels of statistical thinking for 'Analyzing and Interpreting Data' from Mooney (2002), I categorized the students' answers into four levels and identified descriptors for each level:
1. Idiosyncratic: students make no inferences or inferences that are not based on the data
or based on irrelevant contextual issues.
Blank or unclear answers and answers based on anecdotal experience are categorized
into this level.
2. Transitional: students make inferences that are primarily based on the data.
Students base their answers on extreme values, certain values, or simple graphical properties. A typical response compared the number of high or low marks, for example “because class A gets more mark 75 than class B” and “… and also in class B there are many students who get pretty high marks.” A response based on the range of values also belongs to this level. Another example of reasoning at this level was the following: “because both diagrams have symmetrical shapes.”
3. Quantitative: students make reasonable inferences based on data and the context.
I consider answers in which a formal statistical measure is the basis for comparing the data sets to belong to this level. Students at this level did some further exploration, such as considering a central measure or a measure of variation. One example was the following answer: “because the average value in class B is a little larger than the average value in class A.”
4. Analytical: students make reasonable inferences based on the data and the context
using multiple perspectives.
At this level, students consider both the central measure and the measure of variation. One simple example was the consideration of the deviation from the central value: “… and also values in Class A are far from the average value.”
The modal answer for this question in both the experimental and the control group was that the 'variation of marks in Class B is higher than that in Class A' (see Table 31). Table 31 also shows that the distribution of answers changed very little between the pretest and the posttest in either group.
Although the modal answer was the one I considered correct, I could not conclude that the students had a good understanding of variation or histograms. Most students in both the experimental and control groups gave explanations that did not make much sense at all (the idiosyncratic level) or based their reasons only on some invalid informal measures (the transitional level; see Table 32). Popular reasons were, for example, repeating the options, comparing the frequency of the highest mark, or simply comparing the height of the middle bar. This happened in both groups.
Answers                                                        Experimental Group    Control Group
                                                               Pretest   Posttest    Pretest   Posttest
Variation of marks in Class A is higher than that in Class B   14        15          10        8
Variation of marks in Class B is higher than that in Class A   16        17          10        12
Class A and Class B have equal variation                       7         5           5         7
I don't know 2
Blank 2 14 12
Table 31. Distribution of the students' answers to Question 10.
Level               Experimental Group    Control Group
                    Pretest   Posttest    Pretest   Posttest
1 Idiosyncratic     10        12          28        31
2 Transitional      27        22          11        8
3 Quantitative      2         5           0         0
4 Pre-Analytical    0         0           0         0
Table 32. Distribution of the students' levels of reasoning for Question 10.
Comparison of the pretest results and the mean gains between groups
Firstly, I conducted a two-tailed independent samples t-test on the pretest scores to evaluate the null hypothesis that there was no difference in the students' levels of reasoning at the start of the intervention. The null hypothesis could not be rejected (t = 1.19, df = 37, two-tailed p = 0.240). Thus, the pretest mean score of the students in the control group was not statistically different from that of the students in the experimental group.
Secondly, for each group I conducted a paired samples t-test on the pretest and posttest scores to evaluate the null hypothesis that there was no difference in the students' levels of reasoning before and after the intervention. For the experimental group, the null hypothesis could not be rejected (t = 0.22, two-tailed p = 0.831). For the control group, the null hypothesis could not be rejected either (t = -0.90, two-tailed p = 0.373). These results indicate that there was no statistically significant gain in either the experimental or the control group.
Finally, I conducted a two-tailed independent samples t-test to evaluate the null hypothesis that there was no difference in the mean gains between the experimental and the control group. The null hypothesis could not be rejected (t = 0.70, df = 76, two-tailed p = 0.487). Thus, there was no statistically significant difference between the mean gains of the experimental group and the control group.
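Comparing mean gains between groups amounts to two steps: computing each student's gain, then running an independent two-sample test on the two lists of gains. The minimal standard-library sketch below assumes the pooled-variance (Student) form of the test, which is consistent with the reported df = 76 for two groups of 39. The helper names and the data are hypothetical. Such a comparison is only meaningful when the groups start at comparable levels, which is why it was not applied to Question 9.

```python
from math import sqrt
from statistics import mean, variance

def gains(pre, post):
    """Per-student gain: posttest level minus pretest level."""
    return [b - a for a, b in zip(pre, post)]

def pooled_t(x, y):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2)), df

# Hypothetical pretest/posttest levels for two small groups (not the study's data).
exp_gain = gains([1, 2, 2, 1, 3, 2], [2, 2, 3, 2, 3, 2])
ctrl_gain = gains([2, 1, 2, 3, 1, 2], [2, 1, 1, 3, 2, 2])
t, df = pooled_t(exp_gain, ctrl_gain)
print(f"gains: {exp_gain} vs {ctrl_gain}; t = {t:.2f}, df = {df}")
```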
5.4. Result of the Interview
I interviewed six students of different levels of readiness from the experimental group after the teaching experiment and the posttest had ended. Because the examination period had started, I could not interview students from the control group. In the interviews, I tried to find out whether I could gain more insight into the students' reasoning in answering the pretest and posttest than what they had shown in their written responses. As mentioned in chapter 5, the students seemed to have some difficulties with language. I wanted to check whether their reasoning might be more sophisticated than they were able to express in writing.
The interviews did not give me more insight than what I had already obtained from the pretest and posttest. I went through the students' answers to the pretest and posttest with them, and the students did not seem to remember what their answers were without looking at what they had written. This might indicate that their reasoning levels were not very high. One of my questions was whether the students had done any computation for any of the test questions, for example for Question 6, in which pupils were to choose a study partner. It seemed that if they did not indicate a computation in their answer, then they indeed had not done any.
I interviewed a student who had been active in the teaching experiment. According to the teacher, this student was an average learner in the class. However, his pretest result was amongst the better ones in the class, and during my interactive lecture and discussion he was one of the very few who came up with ideas. He did his posttest in the teachers' room because he had not been at school on the day the posttest was given, so I had the opportunity to observe his effort in doing the posttest. Due to the bad timing of the posttest (it was the last day of school before the exam period and school ended early), he was, however, impatient to finish. I saw that in Question 2, for example, he gave his answer without any reasoning, so I urged him to work seriously because he still had plenty of time. When I finally allowed him to finish, he told me that the pretest was not a mathematics test but more of a Bahasa test (language test), and that he had never taken such a test before. The result of his posttest was better than that of his pretest, and I considered him among the better students in the class. In my interview, he showed a good interpretation of standard deviation. I asked him the following question:
Researcher (R): Now, if for example you want to sell Durian.13 You are a
Durian farmer, just did harvesting, and you have plenty of
Durian. Because you do not have time to sell the Durians
yourself, you want to look for two people.
Student (S): yes.
13 A popular seasonal fruit. I made up different contexts for this kind of question for different students, in case a student told the next interviewee which questions I had asked.
R: Now, suppose there are these two people. If- You hired them too last year,
let's name them A and B.
I explained that he had observed the two people's sales performance by recording their daily sales for a week. He then computed the means and standard deviations and found that the means were equal but the standard deviations differed.
R: If A's and B's daily-sales means are equal, for example their mean is
selling 50 Durian per day, but the standard deviation for A is 1 and
standard deviation for B is 10. Which of them will you hire, if you only
want to hire one person? A whose standard deviation is 1 or B whose
standard deviation is 10?
S: the one with standard deviation 1, Mam.
R: Why?
(The student paused (thought) for around a minute so I told him he could use
his native language if he had difficulty in Bahasa Indonesia)
S: because--- (almost two-minute pause/thinking)
S: because he can sell, in a day, Mam, for example (pause) 30. Maybe (he
can sell) the next day 30.
R: Okay. If let say today 30, tomorrow 30. How about the other person?
S: The other person, if today (he can sell) 20, maybe tomorrow (he can sell)
1, and the day after that (he can) reach 50.
From his answer, I infer that he could connect the values of the standard deviation to making predictions. I could not see this understanding of standard deviation in the posttest, for example in Question 6. His answers to Question 6 were the same in the pretest and posttest: anyone could be a study partner because the average marks are the same. He did not seem to consider any measure of variation. This was partly due to the different contexts of the questions, but it might also indicate that at an early stage of learning statistical reasoning, students need to be explicitly prompted to use a measure of variation. This particular student had shown a glimpse of intuitive reasoning about the term 'variation' (in Questions 1 and 2) in the pretest, and his reasoning improved in the posttest.
I asked all the other interviewed students the same question (sometimes in a different context) and found that the students who had not performed well in the posttest could not answer it.
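The interview question can be made concrete with made-up numbers. The sketch below assumes a week of hypothetical daily Durian sales for two sellers. Both have the same mean of 50 per day, with sample standard deviations of exactly 1 and 10, as in the question posed to the students. The band of mean plus or minus one standard deviation is used here only as a rough indication of a 'typical day'. It shows why the seller with the smaller standard deviation is the more predictable one, which is the idea the student expressed in his own words.

```python
from statistics import mean, stdev

# Hypothetical daily Durian sales over one week (not data from the study).
seller_a = [49, 51, 49, 51, 49, 51, 50]
seller_b = [34, 60, 57, 42, 61, 47, 49]

for name, sales in [("A", seller_a), ("B", seller_b)]:
    m, s = mean(sales), stdev(sales)
    # Rough band of typical daily sales: mean minus/plus one sample standard deviation.
    print(f"Seller {name}: mean = {m}, SD = {s:.1f}, typical day about {m - s:.0f} to {m + s:.0f}")
```

Both sellers average 50 Durians a day. Seller A's typical day stays between about 49 and 51, while seller B's ranges from about 40 to 60. This is the same contrast the student described as "today 30, tomorrow 30" versus sales that jump from day to day.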
In summary, the interviews did not seem to give me more information about the students' reasoning than I had already observed in the pretest and posttest. My limited experience in interviewing might be a factor. However, I consider the student described above as one indication of positive effects of the teaching approach. He had some behavioral problems (he did not really take school seriously; for example, the teacher told me he had faked his parent's signature on his permission letter on the posttest day), and I believe that traditional teaching could let his potential go unnoticed and undeveloped.
5.5. Summary and Analysis of the Findings from Pretest and Posttest
5.5.1. Subtest A
First of all, I noticed that students had difficulty with the wording of the questions in this subtest. In future use, the wording needs to be made more appropriate for pupils. Secondly, the performance of the students in the experimental and control groups was comparable. However, the teaching intervention seemed to give students more opportunity to change their prior understanding of the meaning of variation. For both questions in Subtest A, positive changes were present in the experimental group, although the improvement was very limited. The traditional teaching seemed to have less impact on the students' reasoning, as fewer students in the control group showed positive improvement.
Regarding the meaning of the term 'variation', the majority of students still intuitively understood it as 'diversity'. Among the examples of variation the students gave, I found more examples from the experimental group that connected 'variation' with the distribution of data. Because the students in the experimental group had dealt with real data while learning about variation, I expected many such examples in their posttest answers, but the number was not as large as I had hoped.
I found the two types of meaning of variation that were also present in the results of Meletiou's study (2000), from which the questions were taken. Several university students in Meletiou's study gave definitions indicating that they viewed 'variability' as 'variety', or as something that takes multiple values. This is similar to Meaning A that I found: 'varied or having many kinds'. Other students defined 'variability' by equating it with the mathematical notion of 'variable'. This is similar to the mistake that students in my study made: equating 'keragaman' (variation, diversity) with 'keseragaman' (uniformity).
However, for Question 2, the university students in Meletiou's study gave reasonable answers that indicated their understanding of when to expect high or low variability. In my study, the students seemed not to understand that the question asked about the desirable variation. I suspect that this was due to the students' limited language proficiency. I found out afterwards that some students did not know the meaning of 'diameter'. It seemed that understanding a written text was still something of a challenge for most students. In a future replication of this study, I would make the wording and/or layout of the question easier for this type of students.
5.5.2. Subtest B
Students from both the experimental and control groups did not perform well in Questions 3 and 4, especially in the task of finding the standard deviation. One possible reason was that the students did not use calculators to ease the computations. Qualitative analysis showed that there was no significant difference in the performance of the students in the experimental and control groups. This indicates that the teaching experiment did not negatively affect the computational and procedural skills of the students.
5.5.3. Subtest C
The results from the t-tests I performed for this subtest are listed in Table 33. Based on the independent two-sample t-tests I performed on the pretest results, I conclude that there was no significant difference in the ability level of the students in the two groups (3 out of 4 questions showed no statistically significant difference). The two groups were more or less comparable.
The t-tests comparing the mean gains between groups did not give a strong indication that the students in the experimental group improved more in their reasoning level. However, the paired t-tests showed that, out of four questions, the students in the experimental group gained significantly in one and a half questions: Question 5 and the closed part of Question 9. The control group gained significantly in none. This is not a strong indication, but it shows positive potential.
Two-tailed t-test                                   Statistically significant for Question (Y/N)
                                                    5      6      9                        10
Difference of mean pretest results between groups   N      N      Y                        N
Gain within groups (experimental/control)           Y/N    N/N    Y (closed part only)/N   N/N
Difference of mean gains between groups             N      N      not applied              N
Table 33. Summary of the t-test results for Questions 5, 6, 9, and 10.
Regarding Questions 7 and 8, I chose not to perform any statistical tests because the categories of reasoning that I employed were not strictly hierarchical. Again, from these two questions, the students in the control group seemed to keep their reasoning intact: few of them changed their reasoning after the lessons (see Table 28). A striking difference is that more students in the experimental group than in the control group used formal central measures in the posttest. In fact, for Question 8c I could not find any responses from the control group that used the mean or median in their reasoning (see Table 27).
Finally, despite the evidence of improvement, the students' responses showed that they were mostly at a low level of reasoning about variation. Many of them were still at the idiosyncratic or prestructural level (Level 1) of reasoning after the learning process. The majority of students were found to be at Level 2, the level at which students use only one aspect of the data (see Tables 19, 22, 30, and 32). I hardly ever found a response that belonged to the highest level (the relational or pre-analytical level), a level which shows good reasoning about variation. Moreover, the students rarely used any formal statistical measures when dealing with data.
One possible reason why the teaching experiment did not show a strong effect was that the students in the experimental group were exposed to only one type of data: the growth data. Jones et al. (2004) concluded that students need experiences with different kinds of data to help them move beyond their idiosyncratic descriptions. In the original plan, I designed another activity with data on how students spend their time on activities. However, the plan could not be carried out due to time constraints.
Another possible cause was, as my analysis of the teaching experiment indicated, the students' unfamiliarity with open-ended problems. Research in a similar setting seems to confirm this: Sharma (2006) studied 14- to 16-year-old Fijian students' reasoning in understanding data in the form of tables and graphs through individual interviews. In one of his interview tasks, students were asked to compare the temperature data of two cities in Fiji and conclude which city is warmer. The question was similar to Question 6 in my pretest; the difference was that he presented the data in a table. Out of five students, four answered the question based on their everyday experiences (the idiosyncratic or prestructural level in my categories). The students in Sharma's study were also accustomed to mathematics tasks that expect one single correct answer and were not accustomed to expressing their reasoning verbally.
Finally, I think that the students' insufficient reasoning about central measures and histograms played a role too. It is not uncommon to find that students do not use any statistical measures in comparing data, even after these measures have been taught in class (cf. Gal et al., 1989; Sharma, 2006). Misconceptions that Lee & Meletiou-Mavrotheris (2003) found in students' reasoning when comparing two histograms also appeared in my study. An example is seeing the height of a bar as a case value instead of a frequency. I hoped to improve the understanding of central measures and histograms through the teaching experiment as well. But perhaps due to the short teaching period and the issues described in chapter four, the experiment did not reveal a significant improvement in the students' understanding of central measures and histograms.
6. Conclusions and Discussions
I conclude this thesis by answering my research question and reflection question, describing the limitations of my study, and making recommendations for future research.
6.1. Conclusions
Research Question
To what extent did the student-centred teaching of variation using real data and open-ended
tasks help to improve Indonesian social science stream (IPS) students' reasoning about
variation?
My overarching answer to this question is that the student-centred teaching of variation using real data and open-ended tasks provided the social science students in this particular study a more conducive learning opportunity to develop their reasoning about variation. In all questions in the pretest and posttest, the students in the control group showed little change in their answers. In the experimental group, by contrast, I observed more change between the pretest and the posttest, albeit not as much as I had hoped. This is most probably due to the short duration of the teaching experiment. The students in the control group, on the other hand, seemed to keep whatever ideas they had prior to the teaching of variation: the traditional teaching seemed neither to add to nor to change their reasoning.
To go into more detail, I return to the framework of two knowledge areas (Garfield & Ben-Zvi, 2005) that I hoped the teaching experiment would help to develop (see Appendix A).
Developing Intuitive Ideas of Variation
In this knowledge area of variation, I can focus on two things, namely whether students were
able to:
1. see that variation is present in both qualitative and quantitative variables and to see
data as an entity or aggregate.
2. see that variation can be expected to be high or low depending on the sources and
context of the variation.
In the first question of the pretest and posttest, I asked the students for their definition of 'variation' and/or an example. The responses show that developing the idea of data as an entity or aggregate is indeed not an easy task for students. Partly due to the justification I gave in the pretest (see Chapter 5, p. 52), the majority of students in the experimental group and the control group defined 'variation' as 'varied or having many kinds'. This indicates the idea of variation in qualitative variables. Therefore, I looked more closely at the examples the students had given. In the control group, I observed only one variable: marks. I saw more examples of quantitative variables in the posttest results of the experimental group, for example human height and weight, sizes of Durian fruit, and marks. However, these were variables that the students and I had talked about in the classroom. Therefore, I conclude that the teaching approach gives teachers and students more opportunity to discuss data as an entity or aggregate. For example, the sizes of Durian fruit came up in discussion unplanned. In particular, the use of real data can start students thinking of 'variation' as 'an entity in a distribution', instead of only as 'variation of individual values'.
Regarding the ability of students to reason about the desirable variation, the majority
of students from both groups did not show any consideration of variation. However, the
quantitative analysis indicated that the individual positive changes in the experimental group were bigger than those in the control group.
Using Variation to Make Comparisons
Reasoning about variation develops through a long and gradual process. The majority of students in both groups were at a low level of reasoning: the Unistructural level in the SOLO taxonomy of Biggs and Collis (1982), or the Transitional level in Mooney's framework (2002). At this level, the students mostly used only one aspect of the data sets in making comparisons within or between groups, for example the extreme values or some standard values. Despite this, the results of the quantitative analysis showed that the mean gain of the experimental group was statistically significant for some of the questions (Question 5 and the closed part of Question 9), while the mean gain of the control group was not. Thus, the students in the experimental group seemed to gain more. From the qualitative analysis, the teaching approach seemed to help students start using the mean when comparing data sets.
In summary, the teaching approach provided students a more conducive learning opportunity to develop ideas of variation as an entity and to see variation in quantitative variables. The students also started to develop ideas of using the mean for comparisons within or between groups. With more exposure to the teaching approach, I am optimistic that the improvement in students' reasoning about variation would be greater.
Reflection Question
What recommendations for the teaching of measures of variation in Indonesian secondary
school curriculum followed from the teaching experiment?
I answer this question from my own perspectives as both a researcher and a beginning teacher. As the researcher who designed this study, I identified and selected appropriate teaching concepts and principles (see Subsection 2.3). When implementing them, I believed those teaching principles would help students to develop their statistical reasoning, particularly reasoning about variation. On the other hand, I am also a beginning teacher who had no experience with students in a social science stream. In this conclusion, I select several principles I have tested and reflect on my teaching experiment in order to come up with recommendations.
1. The Use of Real Data: Human Growth
In my study, I tested the suggestion to use real data within a context instead of artificial data, which are meaningless numbers without context. The feedback from the questionnaires was positive. The majority of students considered that the use of real data:
makes the learning of statistics interesting and fun;
makes it easier to understand the concepts (of standard deviation);
makes the students see the real-life application of statistics.
Unfortunately, the context of human growth was not a familiar context to the particular students in my study. To some extent, this unfamiliarity was one of the factors that affected the students' engagement in the activities. This problem would have been less likely to arise had I known the students' background (that is, if the context had been chosen by the regular teacher, who knows the students' background very well). Therefore, I recommend the use of real data in the teaching of statistics within a context that is close to the students' experience and background.
2. The Context of Teaching Variation: Doing Data Analysis
In my study, I asked the students to do (real) data analysis of some given data sets in the first two lesson activities and then to create their own data. I wanted to find out which sequence of data analysis activities works better: should we start by letting students analyze their own data, or by analyzing given data sets? Unfortunately, the third activity in my plan was not realized, so I cannot comment much on this sequence. However, after the first lesson, the cooperating teacher was inspired to do similar activities with data from the students themselves: letting the students measure themselves. Would that work better?
From my teaching experience in the study, the main problem I faced was that the students' prerequisite knowledge was insufficient. First, the students were to some extent capable in the procedural knowledge of central measures, namely the mean and median, but they were not able to reason with them. In comparing data sets, the students usually used extreme values or some values related to their experience; the median did not appear in their reasoning at all. Secondly, the students' understanding of histograms was poor. The usual teaching separates data representation, central measures, and measures of variation, and this is probably why the students could not connect all these concepts when dealing with data.
In this study, the students had been taught the concepts of data representation and central measures in a teacher-centred approach and started doing data analysis when learning the concepts of variation. Doing data analysis for the first time while reasoning about the standard deviation, which is one of the most difficult basic concepts, seemed to be problematic, especially because I linked the central measures and measures of variation in the activities. I recommend doing data analysis right from the beginning of the statistics unit, without separating data representation, central measures, and measures of variation, and I recommend having students collect their own data to analyze.
3. The Teaching Approach: Student-Centredness, Open Tasks, Group Work and Linking Data
Representations, Central Measures and Measures of Variation
The first three components of the teaching approach were new to the students and to me (as the teacher), and I believe this is why the lessons did not go as well as I had hoped and expected. The participating students were students who had not performed well in mathematics. When they first dealt with open tasks, I could see that they were unsure of what to do because they were used to closed tasks. For example, the question “what is the mean of the fish weight?” is more familiar and less confusing to them than the question “what is the true weight of the fish?” In group work, students tended to work individually and went to other groups to check their answers. In groups of low-achieving students, it was even worse, because no member was willing to take charge. As their teacher, I also had difficulties in managing 10 groups. Because the groups did not work very well, I had difficulty giving assistance or scaffolding to all groups in due time.
Despite the difficulties, the students in the experimental group did not perform worse
than the students in the control group, even regarding procedural knowledge. In addition,
the group work had improved by the third lesson. Reflecting upon my experience, I can
recommend this teaching approach. Teachers should take into account that they need to be
patient at first, for example by relaxing the time schedule for finishing the tasks, especially if
the students suffer from mathematics anxiety. Regarding curriculum demands and time
constraints, teachers can stress procedural knowledge in the homework, as long as they make
sure that students care about their learning and do the homework.14
6.2. Limitations of my Study and Suggestions for Future Research
There are four main limitations of my study. Firstly, the study was conducted over a short
period of time. In fact, it was conducted toward the end of the semester, close to the exam
period. I had to borrow other teachers’ classes to complete the activities, especially for the
control group, which started the lessons about variation a little later than the experimental
group. Secondly, the study only involved students from a social science stream. Involving
students from the natural science or language streams as well would give a more complete
picture of students’ reasoning. Thirdly, the control and experimental groups were taught
by different teachers. Cognitively, the students are comparable. However, based on my
observation, there seemed to be differences in behaviour, for example in how seriously they
took their mathematics lessons. Fourthly, the language of instruction in this study was the
students’ second language, not the language of everyday speech in the district, and the
students’ socio-economic conditions had not given them enough exposure to it. All these
limitations restrict the generalization of my conclusions.
Therefore, to obtain generalizable results, a longitudinal study or a large cross-sectional
study would give much better information about aspects of best teaching practice and the
development of students’ statistical reasoning. The results of this case study indicate that
using real data analysis in a socio-constructivist approach is promising, even for students who,
like those in this study, could be characterized as not having had optimal prior education. It
would be enlightening to compare the performance of students from other streams with that of
the social science students in this study, or to do cross-sectional studies with social science
students in many regions. A longitudinal study could also address the issue I encountered in
the teaching, namely that the students were new to the socio-constructivist approach.
The limitations of this study also prevented me from making intensive use of ICT in the
teaching. It would be informative to see whether the teaching approach deployed here,
combined with more intensive use of relevant ICT, could lead to better results. Further study
on the benefit of using real data analysis in a socio-constructivist approach plus ICT might
give us a better idea of students’ reasoning about variation.
14 In my teaching experiment, the students did not finish their homework.
Finally, in this study I deliberately separated the statistics unit from the probability unit in
the learning of variation. I did not use probability contexts in the teaching and learning because
of the structure of the Indonesian curriculum. Trying out the approach used here in probabilistic
contexts might help students broaden their understanding of and reasoning about variation.
References
Badan Litbang Puskur. (2007). Kajian Kebijakan Kurikulum Mata Pelajaran Matematika. Indonesia:
Ministry of National Education. Retrieved 6 September 2009 from
www.puskur.net/download/prod2007/50_kajian kebijakan kurikulum matematika.pdf
Batubara, J., Alisjahbana, A., Gerver-Jansen, A.J.G.M., Alisjahbana, B., Sadjimin, T., Juhariah, Y.T.,
Tririni, A., Padmosiwi, W.I., Listiaty, T., Delemarre-van de Waal, H.A., & Gerver, W.J. (2006).
Growth Diagrams of Indonesian Children. The Nationwide Survey of 2005. Paediatrica Indonesiana,
46 (5-6), 118-126.
Ben-Zvi, D., & Garfield, J. (Eds.) (2004). The Challenge of Developing Statistical Literacy, Reasoning,
and Thinking. Dordrecht: Kluwer Academic Publishers.
Biggs, J.B., & Collis, K.F. (1982). Evaluating the Quality of Learning: The SOLO Taxonomy
(Structure of the Observed Learning Outcome). London, UK: Academic Press.
Chance, B.L. (2002). Components of Statistical Thinking and Implications for Instruction and
Assessment. Journal of Statistics Education, 10 (3). Retrieved 27 July 2009 from
www.amstat.org/publications/jse/v10n3/chance.htm
Cobb, G.W., & Moore, D.S. (1997). Mathematics, Statistics, and Teaching. The American
Mathematical Monthly, 104 (9), 801-823.
Cooper, L. L., & Shore, F. S. (2008). Students’ Misconceptions in Interpreting Center and Variability
of Data Represented via Histograms and Stem-and-Leaf Plots. Journal of Statistics Education, 16 (2).
Retrieved 28 July 2009 from www.amstat.org/publications/jse/v16n2/cooper.html
Curcio, F.R. (1987). Comprehension of Mathematical Relationships Expressed in Graphs. Journal for
Research in Mathematics Education, 18(5), 382-393.
Gal, I., Rothschild, K., & Wagner, D.A. (1989). Which group is better? The Development of Statistical
Reasoning in School Children. Paper presented at the meeting of the Society for Research in Child
Development, Kansas City, KS. Retrieved 21 July 2009 from www.eric.ed.gov/PDFS/ED315270.pdf
Garfield, J. (2003). Assessing Statistical Reasoning. Statistics Education Research Journal, 2 (1), 22-38.
Garfield, J., & Ben-Zvi, D. (2005). A Framework for Teaching and Assessing Reasoning about
Variability. Statistics Education Research Journal, 4 (1), 92-99.
Garfield, J., & Ben-Zvi, D. (2007). How Students Learn Statistics Revisited: A Current Review of
Research on Teaching and Learning Statistics. International Statistical Review, 75 (3), 372-396.
Garfield, J., & Ben-Zvi, D. (2008). Developing Students’ Statistical Reasoning. New York: Springer
Verlag.
Groth, R.E. (2003). Development of a High School Statistical Thinking Framework. Dissertation.
Retrieved from: www.stat.auckland.ac.nz/~iase/publications/dissertations/03.groth.disertation.pdf
Jones, G.A., Langrall, C.W., Mooney, E.S., & Thornton, C.A. (2004). Models of Development in
Statistical Reasoning. In D. Ben-Zvi & J. Garfield (Eds.), The Challenge of Developing Statistical
Literacy, Reasoning, and Thinking (pp. 97-117). Dordrecht: Kluwer Academic Publishers.
Jones, G. A., Langrall, C. W., Thornton, C.A., Mooney, E.S., Wares, A., Jones, M.R., Perry, B., Putt,
I.J., & Nisbet, S. (2001). Using Students’ Statistical Thinking to Inform Instruction. Journal of
Mathematical Behavior, 20 (1), 109-144.
Jones, G.A., Thornton, C.A., Langrall, C.W., Mooney, E.S., Perry, B., & Putt, I.J. (2000). A
Framework for Characterizing Children’s Statistical Thinking. Mathematical Thinking and Learning,
2(4), 269-307.
Konold, C., & Pollatsek, A. (2002). Data Analysis as the Search for Signals in Noisy Processes.
Journal for Research in Mathematics Education, 33 (4), 259-289.
Lee, C., & Meletiou-Mavrotheris, M. (2003). Some Difficulties of Learning Histograms in
Introductory Statistics. In 2003 Proceedings of the American Statistical Association, Statistics
Education Section, pp. 2326-2333. Alexandria, VA: American Statistical Association. Retrieved
from: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.8456&rep=rep1&type=pdf
Meletiou, M.M. (2000). Developing Students’ Conceptions of Variation: An Untapped Well into
Statistical Reasoning. Dissertation. Retrieved from:
www.stat.auckland.ac.nz/~iase/publications/dissertations/00.Meletiou.Dissertation.pdf
Mooney, E.S. (2002). A Framework for Characterizing Middle School Students' Statistical Thinking.
Mathematical Thinking and Learning, 4 (1), 23-63.
Moore, D.S. (1997). New Pedagogy and New Content: The Case of Statistics. International Statistical
Review, 65 (2), 123-137.
Moore, D.S., & McCabe, G.P. (2005). Introduction to The Practice of Statistics (5th Ed.). New York:
W.H. Freeman & Company.
Reading, C., & Shaughnessy, J.M. (2004). Reasoning About Variation. In D. Ben-Zvi & J. Garfield
(Eds.), The Challenge of Developing Statistical Literacy, Reasoning, and Thinking (pp. 209-226).
Dordrecht: Kluwer Academic Publishers.
Sembiring, R.K., Hadi, S., & Dolk, M. (2008). Reforming Mathematics Learning in Indonesian
Classrooms through RME. ZDM Mathematics Education, 40 (6), 927-939.
Sharma, S. (2006). High School Students Interpreting Tables and Graphs: Implications for Research.
International Journal of Science and Mathematics Education, 4 (2), 241-268.
Shaughnessy, J.M. (2007). Research on Statistics Learning and Reasoning. In F.K. Lester, Jr. (Ed.),
Second Handbook of Research on Mathematics Teaching and Learning (pp. 957-1009). Charlotte,
NC: Information Age Publishing.
Shaughnessy, J. M., Garfield, J., & Greer, B. (1996). Data Handling. In A.J. Bishop, K. Clements, C.
Keitel, J. Kilpatrick, & C. Laborde (Eds.), International Handbook of Mathematics Education (pp.
205–237). Dordrecht: Kluwer Academic Publishers.
Sumanto, Y.D., Kusumawati, H., & Aksin, N. (2008). Gemar Matematika 6 untuk SD/MI Kelas VI.
Jakarta: Pusat Perbukuan Departemen Pendidikan Nasional.
Watson, J.M., Kelly, B.A., Callingham, R.A., & Shaughnessy, J.M. (2003). The Measurement of School
Students’ Understanding of Statistical Variation. International Journal of Mathematical Education in
Science and Technology, 34 (1), 1-29.
Web ARTIST Project. (2005). Comprehensive Assessment of Outcomes for a first course in Statistics
(CAOS) 4. Retrieved 28 July 2009 from https://app.gen.umn.edu/artist/
Extended Bibliography
Ben-Zvi, D. (2004). Reasoning about Variability in Comparing Distributions. Statistics Education
Research Journal, 3 (2), 42-63.
Cobb, P. (1999). Individual and Collective Mathematical Development: The Case of Statistical Data
Analysis. Mathematical Thinking and Learning, 1 (1), 5-43.
delMas, R., Garfield, J., Ooms, A., & Chance, B. (2007). Assessing Students’ Conceptual
Understanding After a First Course in Statistics. Statistics Education Research Journal, 6 (2), 28-58.
Garfield, J. (2002). The Challenge of Developing Statistical Reasoning. Journal of Statistics
Education, 10 (3).
Groth, R.E. (2005). An Investigation of Statistical Thinking in Two Different Contexts: Detecting a
Signal in a Noisy Process and Determining a Typical Value. Journal of Mathematical Behavior, 24 (2),
109-124.
Hancock, C. (1992). Authentic Inquiry with Data: Critical Barriers to Classroom Implementation.
Educational Psychologist, 27 (3), 337-364.
Lehrer, R., & Schauble, L. (2000). Inventing Data Structures for Representational Purposes:
Elementary Grade Students’ Classification Models. Mathematical Thinking and Learning, 2 (1&2),
51-74.
Lehrer, R., & Schauble, L. (2004). Modeling Natural Variation through Distribution. American
Educational Research Journal, 41 (3), 635-679.
Lehrer, R., & Kim, M. (2009). Structuring Variability by Negotiating its Measure. Mathematics
Education Research Journal, 21 (2), 116-133.
Makar, K., & Confrey, J. (2005). “Variation-Talk”: Articulating Meaning in Statistics. Statistics
Education Research Journal, 4 (1), 27-54.
Moore, D.S. (1990). Uncertainty. In L. Steen (Ed.) On the Shoulders of Giants: New Approaches to
Numeracy (pp. 95-137). Washington, DC: National Academy Press.
Pfannkuch, M. (2005). Thinking Tools and Variation. Statistics Education Research Journal, 4 (1),
83-91.
Reading, C. (2004). Student Description of Variation while Working with Weather Data. Statistics
Education Research Journal, 3 (2), 84-105.
Reading, C., & Reid, J. (2006). An Emerging Hierarchy of Reasoning about Distribution: From a
Variation Perspective. Statistics Education Research Journal, 5 (2), 42-68.
Torok, R., & Watson, J. (2000). Development of the Concept of Statistical Variation: An Exploratory
Study. Mathematics Education Research Journal, 12 (2), 147-169.
Watson, J.M., & Kelly, B.A. (2002). Variation as part of chance and data in Grades 7 and 9, in B.
Barton, K.C. Irwin, M. Pfannkuch and M.O.J. Thomas (Eds.). Mathematics Education in the South
Pacific: Proceedings of the 25th Annual Conference of the Mathematics Research Group of
Australasia, Auckland, NZ, Vol. 2, MERGA Sydney, 682-689.
List of Appendices
Appendix A. Garfield and Ben-Zvi’s Framework for Assessing Reasoning about Variability.
Appendix B. Students’ Activity Sheets.
Appendix C. Pretest/Posttest.
Appendix D. The Questionnaire.
Appendix A. Garfield and Ben-Zvi’s Framework for Assessing Reasoning about Variability
Garfield, J., & Ben-Zvi, D. (2005). A Framework for Teaching and Assessing Reasoning about
Variability, Statistics Education Research Journal, 4(1), 92-99.
1. Assessment - Developing intuitive ideas of variability
Items that provide descriptions of variables or raw data sets (e.g., the ages of children in a grade school, or the height of these children) and ask students to describe the variability or the shape of the distribution.
Items that ask students to make predictions about data sets that are not provided (e.g., if the students in this class were given a very easy test, what would you predict for the expected graph and expected variability of the test scores?).
Given a context, students are asked to think of ways to decrease the variability of a variable (e.g., measurements of one student’s jump).
Items that ask students to compare two or more graphs and reason about which one would have larger or smaller measures of variability (e.g., Range or Standard Deviation).
2. Assessment - Describing and representing variability
Items that provide a graph and summary measures, and ask students to interpret it and write a description of the variability for each variable.
Items that ask students to choose appropriate measures of variability for particular distributions (e.g., IQR for a skewed distribution) and to select appropriate measures of center (e.g., median with IQR, mean with SD).
Items that provide a data set with an outlier and ask students to analyze the effect on different measures of spread if the outlier is removed. Or, given a data set without an outlier, items that ask students what effect adding an outlier will have on measures of variability.
Items that ask students to draw graphs of distributions for data sets with given center and spread.
3. Assessment - Using variability to make comparisons
Items that present two or more graphs and ask students to make a comparison either to see if an intervention has made a difference or to see if intact groups differ. For example, asking students to compare two graphs to determine which one of two medicines is more effective in treating a disease, or whether there is a difference in length of first names for boys and girls in a class.
Items that ask students which graph shows less (or more) variability, where they have to coordinate shape, center, and different measures of spread.
4. Assessment - Recognizing variability in special types of distributions
Items that provide the mean and standard deviation for a data set that has a normal distribution and students are asked to use these to draw graphs showing the spread of the data.
Items that provide a scatterplot for a specific bivariate data set and students have to consider if values are outliers for either the x or y variables or for both.
Items that provide graphs of bivariate data sets where students are asked to determine if the variability in one variable (y) can be explained by the variability in the other variable (x).
5. Assessment - Identifying patterns of variability in fitting models
Items that ask students to determine if a set of data appear normal, or if a bivariate plot suggests a linear relationship, based on scatter from a fitted line.
6. Assessment - Using variability to predict random samples or outcomes
Items that provide students choices of sample statistics (e.g., proportions) from a specified population (e.g., colored candies) for a given sample size and ask which sequence of statistics is most plausible.
Items that ask students to predict one or more samples of data from a given population.
Items that ask students which outcome is most likely as a result of a random experiment when all outcomes are equally likely (e.g., different sequences of colors of candies)
Items that ask students to make conjectures about a sample statistic given the variability of possible sample means.
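The items in category 6 rest on the idea that a sample statistic varies from one random sample to the next. As an illustration only, the sketch below simulates that variation for the candy example. The population proportion of red candies (0.5), the sample size (10), the number of samples and the function name simulate_sample_proportions are hypothetical choices made for this sketch; they are not part of Garfield and Ben-Zvi’s framework.

```python
import random


def simulate_sample_proportions(p_red, sample_size, n_samples, seed=None):
    """Return the proportion of red candies in each of n_samples random samples.

    Hypothetical model: every candy drawn is red with probability p_red,
    independently of the others (i.e. a very large population).
    """
    rng = random.Random(seed)  # separate generator so results are reproducible
    proportions = []
    for _ in range(n_samples):
        reds = sum(1 for _ in range(sample_size) if rng.random() < p_red)
        proportions.append(reds / sample_size)
    return proportions


if __name__ == "__main__":
    props = simulate_sample_proportions(p_red=0.5, sample_size=10, n_samples=20, seed=1)
    print("sample proportions:", props)
    print("smallest:", min(props), "largest:", max(props))
```

The printed proportions scatter around 0.5; increasing sample_size narrows that scatter, which is the kind of pattern the items above ask students to anticipate.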
7. Assessment - Considering variability as part of statistical thinking
Items that give students a problem to investigate along with a data set, that requires them to graph, describe, and explain the variability in solving the problem.
Items that allow students to carry out the steps of a statistical investigation, revealing if and how the students consider
Appendix B. Students’ Activity Sheets
Activity 1 Am I Normal?
As we have discussed, it is common practice to check a child’s height, weight, head
circumference, etc., to see whether his or her growth is normal.
In this activity, you will use the following data to create an easy rule to decide whether a boy
or girl of your age is growing normally, based on his or her height and weight.
Below are the height and weight data of 16-year-old boys and girls in Jakarta. These data
were collected in 2005 by a PhD student of VU University, Amsterdam, for the construction of an
Indonesian growth chart.
Boys                       Girls
Height (cm)  Weight (kg)   Height (cm)  Weight (kg)
170.2        49.7          156.4        61.5
175.3        43.9          153.5        44.8
168.0        89.7          155.7        46.8
161.4        48.8          160.4        58.7
175.9        62.3          164.0        53.6
169.7        51.4          150.9        45.5
166.2        55.5          154.4        45.2
148.9        82.8          152.8        53.2
163.2        52.2          150.7        38.7
164.4        40.2          157.4        38.7
162.8        62.9          142.7        38.6
159.3        42.7          164.8        48.3
158.9        63.7          165.1        40.2
165.0        46.9          158.6        42.4
172.8        55.0          150.8        44.6
159.9        50.8          146.6        50.1
159.2        46.6          149.1        63.5
169.7        73.3          150.2        44.2
167.3        42.1          157.4        37.4
167.3        40.0          146.9        43.8
a. Make histograms of the boys’ height and weight data. From these histograms, what can
you say about the boys’ height and weight? How are the data spread out?
b. Make a rule that allows you to determine whether a 16-year-old boy or girl has a
height that is:
- Very common;
- Still normal, but needs attention;
- Abnormal (this does not mean there is a health problem, only that the child should be
checked up).
Explain how your rule works, why it might be a good one, and how it could work in
practice.
If your rule uses numbers, you must explain how you computed those numbers.
c. Make an easy visual aid (for example, a table or a diagram) that allows you to:
- Quickly apply your rule;
- Explain it to others, for example, to your classmates or your parents.
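One possible rule for part b, given here only as an illustration and not as the answer the activity expects, classifies a height by its distance from the mean measured in standard deviations: within 1 SD is “very common”, between 1 and 2 SD is “still normal, but needs attention”, and beyond 2 SD is “abnormal”. The 1-SD and 2-SD cut-offs and the function name classify_height are assumptions of this sketch; it uses the sample standard deviation of the 20 boys’ heights in the table above.

```python
import statistics

# Heights (cm) of the 20 boys in the Activity 1 table.
BOYS_HEIGHT_CM = [
    170.2, 175.3, 168.0, 161.4, 175.9, 169.7, 166.2, 148.9, 163.2, 164.4,
    162.8, 159.3, 158.9, 165.0, 172.8, 159.9, 159.2, 169.7, 167.3, 167.3,
]

MEAN = statistics.mean(BOYS_HEIGHT_CM)
SD = statistics.stdev(BOYS_HEIGHT_CM)  # sample standard deviation (divides by n - 1)


def classify_height(height_cm, mean=MEAN, sd=SD):
    """Hypothetical rule: label a height by how many SDs it lies from the mean."""
    z = abs(height_cm - mean) / sd
    if z <= 1:
        return "very common"
    if z <= 2:
        return "still normal, but needs attention"
    return "abnormal, needs a check-up"


if __name__ == "__main__":
    print(f"mean = {MEAN:.2f} cm, sd = {SD:.2f} cm")
    for k in (1, 2):
        print(f"within {k} SD: {MEAN - k * SD:.1f} to {MEAN + k * SD:.1f} cm")
    for h in (165.0, 175.9, 148.9):
        print(h, "->", classify_height(h))
```

With these data the sketch gives a mean of about 165.3 cm and a sample SD of about 6.4 cm, so the 1-SD band runs from roughly 158.8 to 171.7 cm. Students may of course propose other rules, for example ones based on the median and the interquartile range.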
Activity 2a Who is Taller?
In 2005, a PhD student conducted a study about the growth of Indonesian children and
created a growth chart. Below is the histogram of boys’ height data collected in Jakarta for this
study.
[Figure: Histogram of Boys' Height in Jakarta. Horizontal axis: Height (cm), 150 to 174; vertical axis: Frequency of height, 0 to 12.]
2.a. Compute the mean and standard deviation of the boys’ height in Jakarta, based on the
histogram above. Show your work/computation.
Activity 2b Who is Taller?
In 2007, the Ministry of Health had a social survey carried out in all provinces of Indonesia.
This survey covered many topics in public health. The histogram below shows the height data
of boys in Bengkulu obtained from this survey.
[Figure: Histogram of Boys' Height in Bengkulu. Horizontal axis: Height (cm), 105 to 177; vertical axis: Frequency of height, 0 to 30.]
The mean of the raw data of boys’ height in Bengkulu is 154.7 cm.
The next page shows two histograms of boys’ height in Bengkulu and Jakarta. Use
these histograms to answer the following questions.
2b.1. Without doing any computation, and based only on the two histograms on the next page, is
the standard deviation of Bengkulu’s data higher than the standard deviation of
Jakarta’s data (6.1)? Give your reasons.
2b.2. Now check your answer to 2b.1 by computing: What is the standard deviation of the
boys’ height in Bengkulu?
2b.3. Can you conclude that boys in Jakarta are taller than boys in Bengkulu? Explain your
reasoning.
2b.4. What makes the histogram of boys’ height in Bengkulu look the way it does?
[Figure: Histogram of Boys' Height in Bengkulu. Horizontal axis: Height (cm), 105 to 177; vertical axis: Frequency of height, 0 to 30.]
[Figure: Histogram of Boys' Height in Jakarta, on the same axes. Horizontal axis: Height (cm), 105 to 177; vertical axis: Frequency of height, 0 to 30.]
Where does the time fly? Homework
From your questionnaires, I have collected data on the time you spend on various activities.
Analyze the data (on the next page) and work in groups to answer the questions below:
a) On which activity do students in your class spend most time per week? Give your reasons.
b) Which activity is the most popular, that is, the one the most students participate in? Is this the same activity as the one identified in a)?
c) On which activity do students in your class spend the least amount of time per week? Give your reasons.
d) Which activity is the least popular? Is this also the one on which students in your class spend the least amount of time in a week?
Activity 3
Where does the time fly?
Rural Vs Urban?
In this activity you are to analyze similar data (on the next page), which were collected at SMAN No. 5 Bengkulu.
There are 9 activities on the data sheet. Choose just one activity to analyze, for example doing homework.
We call the activity you have chosen activity X.
Compare these data (of activity X) with the data of activity X from your class and answer the following question:
“Who spends more time on activity X: students from Bengkulu city or Lebong?”
Explain your reasons!
No  Activity                                         Frequency per number of hours per week
                                                     0-2  3-4  5-6  6-8  8-10  11-12  13-14
1.  Doing Homework
2.  Reading (not school work)
3.  Playing computer, Play Station or video games
4.  Watching TV, videos or movies
5.  Playing or listening to music
6.  Doing jobs at home
7.  Working for pay outside the home
8.  Participating in sports
9.  Hanging out with friends
Questionnaire
Name: __________________   School: _________________________
Age: __________________   Grade: ___________________
In the last week, approximately how much time did you spend on each of the following activities?
No Activities Number of Hours Per Week
1. Doing Homework
2. Reading (not school work)
3. Playing computer, Play Station or video games
4. Watching TV, videos or movies
5. Playing or listening to music
6. Doing jobs at home
7. Working for pay outside the home
8. Participating in sports
9. Hanging out with friends
Appendix C. Pretest/Posttest
Name: __________________________
Class: __________________________
Do the following problems as carefully and as well as you can.
You may use a calculator if needed.
1. Based on your experience, what does variation mean to you? Give an explanation and/or
an example.
2. For each of the following cases, answer the following question:
“Which is more desirable: high variation or low variation?”
Add your reason.
a. Age of trees in a national forest.
b. Diameter of new tires coming off one production line.
c. Scores on an aptitude test given to a large number of job applicants.
d. Weight of a box of milk of the same brand.
3. Given the data: 11, 32, 17, 34, 24, 15, 28
For the data above, fill in the table below.
Range
Mean
Median
Standard Deviation
Interquartile Range
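Answers to item 3 can be checked quickly with Python’s standard statistics module. The item does not say whether the population or the sample standard deviation is meant, nor which quartile convention is used, so this sketch prints both standard deviations and uses the module’s default (“exclusive”) quartile method; both choices are assumptions.

```python
import statistics

# Pretest item 3.
data = [11, 32, 17, 34, 24, 15, 28]

data_range = max(data) - min(data)
mean = statistics.mean(data)
median = statistics.median(data)
sd_population = statistics.pstdev(data)  # divides the sum of squares by n
sd_sample = statistics.stdev(data)       # divides the sum of squares by n - 1

# Quartiles with the module's default "exclusive" method; other conventions
# can give a different interquartile range for small data sets.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print("Range:", data_range)
print("Mean:", mean)
print("Median:", median)
print(f"Standard deviation: {sd_population:.2f} (population), {sd_sample:.2f} (sample)")
print("Interquartile range:", iqr)
```

For these seven values the range is 23, the mean 23, the median 24, the standard deviation about 8.21 (population) or 8.87 (sample), and the IQR 32 - 15 = 17. With method="inclusive" the quartiles would be 16 and 30 (IQR 14), which shows why the convention matters for small data sets.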
4. Below are the data of monthly income.
Monthly income (in thousand euros)   Number of people
3 – 5                                3
6 – 8                                4
9 – 11                               9
12 – 14                              6
15 – 17                              2
The mean of the data above is ________.
The standard deviation is _______.
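Item 4, like Activity 2a, asks for a mean and standard deviation from grouped data. A common approach, assumed here rather than taken from the thesis, replaces every value in a class by the class midpoint. The sketch below does that for the monthly-income table and prints both the population and the sample standard deviation, since the item does not say which is intended.

```python
import math

# Pretest item 4: monthly income classes (thousand euros) and frequencies.
classes = [(3, 5), (6, 8), (9, 11), (12, 14), (15, 17)]
frequencies = [3, 4, 9, 6, 2]

# Assumption: each value in a class is represented by the class midpoint.
midpoints = [(low + high) / 2 for low, high in classes]
n = sum(frequencies)

grouped_mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n
sum_sq = sum(f * (x - grouped_mean) ** 2 for f, x in zip(frequencies, midpoints))
grouped_sd_population = math.sqrt(sum_sq / n)
grouped_sd_sample = math.sqrt(sum_sq / (n - 1))

print("midpoints:", midpoints)
print("mean:", grouped_mean)
print(f"SD: {grouped_sd_population:.2f} (population), {grouped_sd_sample:.2f} (sample)")
```

Here the midpoints are 4, 7, 10, 13 and 16 (thousand euros) for 24 people, so the grouped mean is 240 / 24 = 10 and the population SD is √11.25 ≈ 3.35 (sample SD ≈ 3.43). Because the midpoints stand in for unknown raw values, these figures only approximate the mean and SD of the original data.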
5. Four histograms and two descriptions of data are displayed below.
i. A data set of Mathematics test scores where the test was very easy
ii. A data set of wrist circumferences of newborn female babies (measured in
centimeters).
a. Which histogram best matches the data in description (i)? Give your reason.
b. Which histogram best matches the data in description (ii)? Give your reason.
6. Two students who took mathematics tests received the following scores (out of 100):
Student A: 60, 90, 80, 60, 80
Student B: 40, 100, 100, 40, 90
If you had an upcoming mathematics test next week, who would you prefer to be your
study partner, A or B? Why?
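Item 6 has no single correct answer, but one statistically grounded line of reasoning is that the two students have the same mean score while B’s scores vary much more. The sketch below only computes those summaries; using the population standard deviation (statistics.pstdev) rather than the sample one is an assumption, and the helper name summarize is hypothetical.

```python
import statistics

# Pretest item 6: scores (out of 100) of two students on five tests.
scores = {
    "A": [60, 90, 80, 60, 80],
    "B": [40, 100, 100, 40, 90],
}


def summarize(values):
    """Return the mean, range and population standard deviation of a list."""
    return {
        "mean": statistics.mean(values),
        "range": max(values) - min(values),
        "sd": statistics.pstdev(values),
    }


summary = {student: summarize(s) for student, s in scores.items()}
for student, stats in summary.items():
    print(student, stats)
```

Both students average 74, but A’s scores have a range of 30 and an SD of 12, whereas B’s have a range of 60 and an SD of 28, so A is the more consistent of the two.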
7. One day Jeroen caught a very big catfish. He wanted to be sure of the weight of the fish
and therefore he weighed it 7 times on the same scale/balance. Below are the
measurements (in kilograms) that he found:
2.9; 2.7; 5.1; 3.1; 3.0; 2.8; 3.0 kg.
e. How spread out are the measurements he obtained?
f. How many kilograms do you think the true weight of the catfish was? Give your
reason.
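In item 7 the reading of 5.1 kg lies far from the other six weighings, which suggests a measurement error. The sketch below, offered as an illustration, compares the mean, median and range with and without that reading; treating 5.1 kg as an outlier is a judgement made for this example, not something the item states, and the helper name describe is hypothetical.

```python
import statistics

# Pretest item 7: seven weighings (kg) of the same catfish.
weighings = [2.9, 2.7, 5.1, 3.1, 3.0, 2.8, 3.0]
# Judgement made for this sketch: treat the 5.1 kg reading as an outlier.
without_outlier = [w for w in weighings if w != 5.1]


def describe(values):
    """Return the mean, median and range of a list of measurements."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
    }


with_all = describe(weighings)
trimmed = describe(without_outlier)
for label, d in (("all 7 readings", with_all), ("without 5.1 kg", trimmed)):
    print(f"{label}: mean = {d['mean']:.2f} kg, median = {d['median']:.2f} kg, "
          f"range = {d['range']:.1f} kg")
```

With all seven readings the mean is about 3.23 kg and the median 3.0 kg; without 5.1 kg the mean drops to about 2.92 kg while the median only moves to 2.95 kg, and the range shrinks from 2.4 kg to 0.4 kg. This is why the median is the more robust estimate of the true weight here.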
8. The histogram below shows the number of hours of exercise per week of the marketing
staff of a bank.
[Figure: Histogram of Number of Exercising Hours Per Week. Horizontal axis: Number of Hours, 0 to 8; vertical axis: Number of People, 0 to 9.]
g. Compute the median. _______________
h. Compute the mean. ________________
i. Based on the histogram, what is the typical number of hours of exercise per
week of the staff in this bank? Give your reason.
9. Forty college students participated in a study of the effect of sleep on test scores. Twenty
of the students studied all night before a test the following morning (the no-sleep group),
while the other 20 students (the control group, referred to below as the sleep group) went to
bed by 11.00 pm that evening. The test scores for each group are shown in the diagrams below.
Each dot on a diagram represents a particular student’s score. For example, the two dots above
the 80 in the bottom diagram indicate that two students in the sleep group scored 80 on the test.
[Figure: Two dot plots of test scores on a scale from 30 to 100: Test Scores: No-Sleep Group (top) and Test Scores: Sleep Group (bottom).]
Examine the two diagrams carefully.
Which group is better: the sleep group or the no-sleep group? Explain your reasons.
Then, from the 6 possible conclusions listed below, circle the one you most agree with.
a. The no-sleep group did better because none of these students scored below 35 and the
highest score was achieved by a student in this group
b. The no-sleep group did better because its average appears to be a little higher than the
average of the sleep group.
c. There is no difference between the two groups because there is considerable overlap in
the scores of the two groups.
d. There is no difference between the two groups because the difference between their
averages is small compared to the amount of variation in the scores.
e. The sleep group did better because more students in this group scored 80 or above.
f. The sleep group did better because its average appears to be a little higher than the
average of the no-sleep group.
10. Below are histograms of the scores on a mathematics test in two classes.
[Figure: Two histograms, Class A and Class B. Horizontal axis: Scores, 55 to 95; vertical axis: Frequency of scores, 0 to 24.]
b. Comparing the two histograms, one could infer
i. The variation of scores in Class A is higher than in Class B. (The scores
in Class A vary more than the scores in Class B.)
ii. The variation of scores in Class B is higher than in Class A. (The scores in Class B
vary more than the scores in Class A.)
iii. Class A and Class B have equal variation.
iv. I don’t know.
c. Why? Give your reason.
Appendix D. The Questionnaire
For statements 1-7, choose the one answer that best matches your opinion.
1. The use of real data makes the learning of Statistics more interesting and fun.
a. I strongly agree b. I agree c. Neutral d. I disagree e. I strongly disagree
2. Analyzing real data makes it easier for me to understand statistical concepts, for example,
standard deviation.
a. I strongly agree b. I agree c. Neutral d. I disagree e. I strongly disagree
3. The use of real data shows me the application of Statistics in real life.
a. I strongly agree b. I agree c. Neutral d. I disagree e. I strongly disagree
4. I am used to working in groups.
a. I strongly agree b. I agree c. Neutral d. I disagree e. I strongly disagree
5. I like working in groups.
a. I strongly agree b. I agree c. Neutral d. I disagree e. I strongly disagree
6. The group discussion helps me understand statistical concepts.
a. I strongly agree b. I agree c. Neutral d. I disagree e. I strongly disagree
7. I actively participate in contributing ideas and in group discussions.
a. I strongly agree b. I agree c. Neutral d. I disagree e. I strongly disagree
For statement no. 8, put a tick mark √ in the box beside each statement that you agree
with. If you have something to add, please write it in the provided box.
8. I think that the way of learning and teaching in the last two lessons differs from the way of
learning mathematics I usually experience in the following sense:
[ ] Using real data, not only artificial numbers.
[ ] Requiring students to develop their ideas and then defend those ideas through
correct arguments.
[ ] Giving students the chance to try solving problems, not directly “telling” them the
correct ways.
[ ] Using a calculator is allowed.
[ ] Others (please write it in the box below)
For questions 9-11, please fill in your answers in the provided boxes.
9. What are according to you the strengths and/or weaknesses of the last two lessons?
10. What suggestions do you have for future improvement of the last two lessons?
11. Do you have any other comment about the last two lessons? If so, please write it down.
Thank you for filling in the questionnaire!