
Exploring Student-Centred Teaching,

Open-Ended Tasks, and Real Data Analysis

to Promote Students’ Reasoning about Variation

Thesis submitted for the MSc in Mathematics and Science Education

Dian Kusumawati

Research Supervisor: drs. André Heck

AMSTEL Instituut

Universiteit van Amsterdam

Science Park 904

1098 XH Amsterdam

The Netherlands

July 2010


Abstract

In this master’s research thesis I report on a study in which I explored the influence of a

specific approach to teaching variation on the progress and development of students’

statistical reasoning about variation. A socio-constructivist teaching and learning approach

was designed and tried out in a pretest-posttest experimental-control-group research design.

This was done with students of a social science stream in a secondary school in a rural area of

Indonesia. The teaching approach contained three new key elements, namely, the use of

(1) real data within a context instead of the use of artificial data; (2) open-ended tasks; and

(3) group work. The research results indicated that the experimental teaching approach provided

students with a learning environment more conducive to developing statistical reasoning.

Although students from both the experimental and control groups were mostly at a low level of

reasoning, the quantitative and qualitative analysis of their responses indicated that more

students in the experimental group improved their level of statistical

reasoning. Qualitatively, students in the experimental group began to use central measures in

making their conclusions. Regarding the procedural knowledge, there was no statistically

significant difference in the performance between the two groups. These results and the fact

that the cooperating teacher was ready to adopt the teaching approach have encouraged me to

conclude that the chosen teaching approach has potential to help students develop and

progress with statistical reasoning about variation. Based on the analysis of the teaching

experiment, recommendations for adopting the teaching approach in future practice are given.


Acknowledgement

I would like to express my deep gratitude to my supervisor, drs. André Heck, whose valuable

advice, supervision, flexibility, and motivational support have enabled me to complete this

thesis. Many thanks go to dr. Mary Beth Key, who helped me get through this

master’s programme without much trouble.

I thank the cooperating teachers who made it possible for me to carry out my teaching experiment

at an unusual time in their school plan. I would also like to thank dr. Willem Jan Gerver from

Maastricht University who gave me raw data of a recent Indonesian growth survey for use in

the teaching and learning activities in my study.

Not least importantly, I also thank my fellow students at the AMSTEL Institute and my

friends in Amsterdam for all the support they have given me, especially Clea Matson, Lilia

Ekimova, and Budi Mulyono.

Finally, by nature I tend to be easily worried and self-negative. This trait played tricks on me

when finishing the master’s thesis. I wish to thank my parents, my brother, my sisters, and my

dear friend Dharma, for listening to my worries when the self-negativity engulfed me and

cheering me up all the way. To my mother and father, this thesis is simply dedicated to you.


Table of Contents

ABSTRACT
ACKNOWLEDGEMENT

1. INTRODUCTION
1.1. Statement of the Problem
1.2. The Indonesian Education System
1.3. Statistics in the Indonesian Mathematics Curriculum
1.4. The Purpose of the Study

2. THEORETICAL BACKGROUND
2.1. Statistical Literacy, Statistical Reasoning and Statistical Thinking
2.2. Assessment of Statistical Reasoning
    2.2.1. The Structure of the Observed Learning Outcome (SOLO) Taxonomy
    2.2.2. Statistical Thinking
    2.2.3. Statistical Reasoning about Variation
2.3. Teaching Variation
    2.3.1. Conceptions about Variability
    2.3.2. Suggestions from Research Studies about Contexts
    2.3.3. Principles Underpinning the Design of My Lesson Activities

3. RESEARCH DESIGN AND METHODOLOGY
3.1. Research Question
3.2. Research Setting and Research Methodology
    3.2.1. The School Setting
    3.2.2. Research Methodology
    3.2.3. The Teaching Materials
3.3. Research Instruments
    3.3.1. The Pretest and Posttest
    3.3.2. The Questionnaire
    3.3.3. The Interview

4. RESULTS AND ANALYSIS OF THE TEACHING EXPERIMENT
4.1. Classroom Observations Prior to the Teaching Experiment
4.2. The Teaching Experiment
    Lesson 1: Activity 1
    Lesson 2: Activity 1 revisited
    Lesson 3: Activity 2
    Lesson 4: Activity 2
4.3. The Teaching in the Control Group
4.4. The Questionnaire
    The Use of Real Data
    Group Work
    The Teaching Approach
    Students’ Free Feedback
4.5. The Interview with the teacher of the experimental group
4.6. Analysis of the Teaching Experiment

5. RESULTS AND ANALYSIS OF THE PRETEST AND POSTTEST
5.1. Subtest A: Question 1 and 2
5.2. Subtest B: Question 3 and 4
5.3. Subtest C: Question 5-10
5.4. Result of the Interview
5.5. Summary and Analysis of the Findings from Pretest and Posttest
    5.5.1. Subtest A
    5.5.2. Subtest B
    5.5.3. Subtest C

6. CONCLUSIONS AND DISCUSSIONS
6.1. Conclusions
6.2. Limitations of my Study and Suggestions for Future Research

REFERENCES
EXTENDED BIBLIOGRAPHY

LIST OF APPENDICES
Appendix A. Garfield and Ben-Zvi’s Framework for Assessing Reasoning about Variability
Appendix B. Students’ Activity Sheets
Appendix C. Pretest/Posttest
Appendix D. The Questionnaire


1. Introduction

In this chapter, I explain the background of my research, including my motivation. Firstly, I

state the background problem in general. Secondly, I describe the Indonesian education

system. Thirdly, I give an overview of the Indonesian statistics curriculum at primary and

secondary level. Finally, I explain the purpose of my research. I hope that my personal aims

and motivation for doing this research can be grasped from the contents of this chapter.

1.1. Statement of the Problem

Statistics has become part of primary and secondary mathematics education curriculum across

the world, although the breadth and depth of its content differ from country to country.

Because many people consider statistics to be a part of mathematics, it is no surprise that

statistics teaching in school practice does not differ much from mathematics teaching.

Therefore, the recent reform efforts in mathematics education based on a constructivist view

of education have also influenced statistics education (cf. Moore, 1997).

Recent reforms in statistics education promote the idea that the focus of teaching and

learning statistics should be on the understanding of statistical concepts, rather than on the

procedural knowledge and skills. As Moore & Cabe (2005, p. xxxi) wrote: “The goal of

statistics is to gain understanding from data.” Thus, students should not merely be able to

compute statistical measures; what they learn should centre on understanding

statistical ideas and concepts. To this end,

statistics should be taught less through lectures and more through real data investigations carried out

by students. Cobb & Moore (1997, p. 801) pointed out that the roles of context in statistics and

mathematics are different: “Statistics requires a different kind of thinking, because data are

not just numbers, they are numbers with a context”. This requires the introduction of a real-world

context in any interesting statistics problem.

What can be said about constructivist approaches in Indonesian mathematics

education? Since 2000, the Indonesian Ministry of Education has enforced a new curriculum

that promotes a constructivist view of education (Badan Litbang Puskur, 2002). Student-

centred teaching and learning is endorsed and the use of ICT in education is promoted.

However, the implementation of the top-down reform has not been successful yet. Teachers

still rely heavily on textbooks, and their teaching and learning still tends to emphasize

formulas and procedures (Sembiring et al., 2008). In other words, rote learning and teacher-

centred activities are still the dominant ways of knowledge transfer. In particular, the in-

service teacher training for this new curriculum has not been effective.


If current mathematics teaching in Indonesia still gives students the impression that

mathematics is only about plugging numbers into formulas to get the correct answer of a

problem, then I believe that the impression is even more strongly felt by students toward

statistics, as statistics teaching is still mostly based on textbooks which mainly deal with

formulas, computation and closed problems with artificial data. My belief is in line with Ben-

Zvi & Garfield (2004, p. 4) who wrote: “Students equate statistics with mathematics and

expect the focus to be on numbers, computations, formulas, and one right answer.”

In fact, I can still remember that statistics meant just number plugging to me when I

was a high school student. I first got interested in statistics when I did a course in Regression

Analysis during my bachelor’s study. Only then, when I came into contact with data analysis, did I

start to see that statistics is useful in making decisions and drawing conclusions.

The above problem in statistics education in Indonesia at secondary school level

motivated me in my master’s study to investigate a different approach to statistics

teaching that helps students improve their understanding of and reasoning about statistical

concepts and ideas, and not just learn how to do statistical computations. The usual teaching

sequence of (1) explaining the formulas, (2) working out examples, and (3) giving procedural

problems has not been a sufficiently successful approach to enable students to reason

statistically at a proficient level. Exploratory data analysis by students seems to me more promising.

My personal experience as an Indonesian secondary school student and as a teacher-

student in mathematics education also motivated me to try out a student-centred approach in

which students would analyze real data. I saw and still see no reason why the students could

not learn how to draw conclusions based on real data and simple descriptive statistics which

they had learned before. The depth of the data analysis can be adjusted to the content of the

Indonesian curriculum.

In my study, I conducted an experiment in a secondary school class in which I tried

out a constructivist approach to learning about variation. I compared the results of the

experimental group with those of a control group, which received traditional teaching. I hoped

and expected that the results of my study could lead to recommendations for teachers and

future teachers and give them ideas and/or suggestions about better ways of teaching the

subject of variation.

1.2. The Indonesian Education System

According to Law Number 20 of 2003 [UU no. 20 year 2003] on the Indonesian education system, the

national education system consists of formal, non-formal and informal education. The system

of formal education consists of primary education (Grade 1-9), secondary education


(Grade 10-12) and higher education (see Figure 1). Primary education consists of 6 years of

elementary school or Sekolah Dasar (SD) and 3 years of lower (literally, first) secondary school

or Sekolah Menengah Pertama (SMP). It is free of charge and compulsory for every child

between 7 and 15 years of age. There are two types of secondary education: general secondary

school or Sekolah Menengah Atas (SMA) and vocational secondary school or Sekolah

Menengah Kejuruan (SMK). In SMA, there are three streams:

Natural science or Ilmu Pengetahuan Alam (IPA)

Social science or Ilmu Pengetahuan Sosial (IPS)

Language or Bahasa.

A student graduates from secondary education through a nationwide examination.

Figure 1. The education system in Indonesia. The figure shows Elementary School (SD, 6 years) and Lower Secondary (SMP, 3 years), together labelled Primary Education, followed by Secondary Education (SMA/SMK, 3 years) and Higher Education, with two National Examinations marked.

Regarding the curriculum, as mentioned in Section 1.1, the government introduced in

2000 a new curriculum, which is competency-based and promotes a constructivist

student-centred approach. In 2005, the government introduced another curriculum called Kurikulum

Tingkat Satuan Pendidikan (KTSP) or Curriculum of an Education Unit Level (Naskah

Akademik KTSP Pendidikan Dasar dan Menengah, Puskur, 2005). KTSP is basically an

extension and diversification of curriculum 2000 in the spirit of school autonomy and local

government autonomy. The curriculum is still competency-based but the central government

gives freedom to every school to develop its own implementation of the curriculum, based

upon the potential of its own students, the social characteristics and the potential of the local

community. The KTSP implementation must conform to the basic structure of the formal

curriculum and the competency standards of graduates dictated by the Ministry of Education.

Personally, I strongly agree with this curricular scheme. I like that

teachers have the freedom to develop their own subject curriculum, which is later compiled

together with those of all other subjects into the KTSP implementation of the school. I also

agree that education should be tailored according to the potential of the students. However, in

practice, teachers and schools still have trouble with the design and implementation of their

own curriculum. As mentioned in Section 1.1, the student-centred approach of curriculum

2000 has not yet been adopted by many teachers and, in the end, the syllabus and KTSP of

many schools are copied from other schools’ KTSP or taken from an in-service teacher

training event held by the government (Kajian Kebijakan Kurikulum Matematika, Puskur,

2007). Usually only better-facilitated schools in the bigger cities are able to produce their own

curriculum. Efforts are still needed to improve teacher professionalism so that the goal of

accommodating each student’s needs can be realized. With this research study I hope to

contribute to such a standard of education with regard to the statistical notion of variation.

1.3. Statistics in the Indonesian Mathematics Curriculum

The Indonesian mathematics curriculum is somewhat modular in the sense that each big

mathematical concept is taught in a separate chapter of the textbook. Once it has been

completed, students are not likely to touch upon the subject again for a while, except for

reviewing or refreshing purposes when needed as prerequisites of subsequent book chapters.

The Indonesian curriculum is also a spiral curriculum in the sense that at every higher level of

education, the breadth and depth of a big concept are increased.

At each level of schooling, elementary, lower and upper secondary level, there is a

book chapter about statistics (see Table 1). In elementary school (SD), statistics is taught in

grade 6, in the first semester, under the topic of ‘Data Analysis’. In this grade, students mainly

learn to analyze data in simple ways, to present data in simple graphs and tables and to

interpret them. I reviewed one government textbook (Sumanto et al., 2008), and in this

textbook measures of centre are indeed not present. However, my personal communication

with a primary school teacher revealed that the common measures of centre, namely mode,

median and mean, are taught in reality because they usually appear in the final school

examination.

In lower secondary school (SMP), statistics is taught in Grade 9, first semester, under

the topic ‘Statistics and Probability’. The students learn more ways to represent data and, in

addition, they learn about central measures. Moreover, probability is introduced.


Finally, in upper secondary school (SMA/SMK), statistics is taught in grade 11, under

the topic of ‘Statistics and Probability’. In Table 1, it is shown that measures of dispersion are

included in the contents of statistics in SMA. Furthermore, the probability content is more

advanced compared to that in SMP. Regarding the standard contents of statistics, the three

streams have the same statistics contents, but the contents of probability differ. Students in

the natural science stream (IPA) learn more about probability. Another difference is the time

allocation for learning this topic. For IPA students, the topic ‘Statistics and Probability’ is only

one of three topics in the first semester. On the other hand, students in the social science

stream (IPS) learn only this topic for the whole semester, and students in the language stream

(Bahasa) have the whole year to learn Statistics and Probability. I believe that the underlying

idea is to adjust the pace of mathematical learning of students from stream to stream.

Level: SD Grade 6, Semester 1 (Data Analysis)
  Standard Competency:
    - Collecting and analyzing data
  Explanation of Standard Competency:
    - Collect and read data
    - Analyze and present data in table form
    - Interpret data representations

Level: SMP Grade 9, Semester 1 (Statistics and Probability)
  Standard Competency:
    - Analyzing and presenting data
    - Understanding the probability of simple events
  Explanation of Standard Competency:
    - Determine the mean, median, and mode
    - Present data in tabular forms, bar chart, line graph and pie chart
    - Determine the sample space of an experiment
    - Determine the probability of a simple event

Level: SMA IPA Grade 11, Semester 1 (Statistics and Probability)
  Standard Competency:
    - Using the rules of statistics, counting rules, and properties of probability in problem solving
  Explanation of Standard Competency:
    - Read data in tabular forms, bar chart, line graph, pie chart, and ogive
    - Present data in forms of table, bar chart, line graph, pie chart, and ogive, and interpret them
    - Compute measures of centre, location and dispersion, and interpret them
    - Use the rules of multiplication, permutation and combination in problem solving
    - […] experiment
    - Determine the probability of an event and interpret it (the meaning)

Level: SMA IPS Grade 11, Semester 1 (Statistics and Probability)
  Standard Competency:
    - Using rules of statistics, counting rules, and properties of probability in problem solving
  Explanation of Standard Competency:
    - Read data in forms of table, bar chart, line graph, pie chart, and ogive
    - Present data in forms of table, bar chart, line graph, pie chart, and ogive and interpret them
    - […]
    - […] permutation and combination in problem solving
    - […] experiment
    - […]

Level: SMA Bahasa Grade 11, Semester 1 (Statistics and Probability)
  Standard Competency:
    - Analyzing, presenting, and interpreting data
  Explanation of Standard Competency:
    - Read data in forms of table, bar chart, line graph, pie chart, and ogive
    - Present data in forms of table, bar chart, line graph, pie chart, and ogive and interpret them
    - […]

Level: SMA Bahasa Grade 11, Semester 2 (Statistics and Probability)
  Standard Competency:
    - Using counting rules to determine the probability of an event and interpret it
  Explanation of Standard Competency:
    - […] permutation and combination in problem solving
    - […] experiment
    - […]

Table 1. The standard curriculum of statistics and probability.

1.4. The Purpose of the Study

In my research, I chose one main topic in the secondary school curriculum: measures of

dispersion/variation. Unlike research on central measures such as the mean, research on measures

of variation is rather limited. This is unfortunate since variation can be considered one of

the basic concepts of statistical thinking (Cobb & Moore, 1997, p. 801).

I conducted a small teaching experiment about the notion of variation and the measure

of dispersion (specifically standard deviation) in one school in a class of IPS students.

Another class of IPS students at the same school acted as a control group in which the usual

teacher-centred approach was applied. My research was in essence a case study, the results of

which depended much on the characteristics of the students in this particular school where

this experiment took place. This implies that at this stage of the research the results cannot

easily be generalized. I designed the research study such that the

teaching experiment can be repeated in other regular SMA classes in Indonesian schools in

order to obtain more general results in the future.


In my research, I investigated and compared students’ statistical reasoning about

variation before and after teaching the topic of variation and measures of dispersion. One of my

other objectives was that the results would lead to recommendations to mathematics teachers

regarding statistics teaching and learning, and hopefully would have a positive effect on

statistics teaching in Indonesia.


2. Theoretical background

In my study, I wanted to investigate to what extent my teaching and learning approach

affected students’ statistical reasoning, especially reasoning about variation. In this chapter, I

present definitions and an assessment framework of statistical thinking and reasoning that I took

from the research literature and used for my research. I also summarize suggestions from

research literature regarding teaching statistics, in particular teaching about variation, that

were taken into account in my study.

2.1. Statistical Literacy, Statistical Reasoning and Statistical Thinking

Research in statistics education in the last two decades has changed direction from research

on misconceptions about statistical concepts or ideas into research focusing on how students

learn and reason about statistical concepts (Shaughnessy, 2007). Many researchers in the field

of statistics education classify this research into three big ideas: statistical literacy, statistical

reasoning, and statistical thinking. There is no formally agreed definition yet of these three

ideas. Ben-Zvi & Garfield (2004, p. 7) defined them as follows:

“Statistical literacy includes basic and important skills that may be used in understanding

statistical information or research results. These skills include being able to

organize data, construct and display tables, and work with different representations of

data. Statistical literacy also includes an understanding of the concepts, vocabulary,

and symbols, and includes an understanding of probability as a measure of uncertainty.

Statistical reasoning may be defined as the way people reason with statistical

ideas and make sense of statistical information. This involves making interpretation

based on sets of data, representation of data, or statistical summary of data. Statistical

reasoning may involve connecting one concept and another (e.g., center and spread),

or it may combine ideas about data and chance. Reasoning means understanding and

being able to explain statistical processes and being able to fully interpret statistical

results.

Statistical thinking involves an understanding of why and how statistical investigations

are conducted and the ‘big ideas’ that underlie statistical investigations. These

ideas include the omnipresent nature of variation and when and how to use appropriate

methods of data analysis such as numerical summaries and visual display of

data. Statistical thinking involves an understanding of the nature of sampling, how

we make inferences from samples to populations, and why designed experiments are

needed in order to establish causation…”

Chance (2002) reviewed the literature on the definition of statistical thinking and

concluded that:

“Perhaps what is unique to statistical thinking, beyond reasoning and literacy, is the

ability to see the process as a whole (with iteration), including ‘why,’ to understand

the relationship and meaning of variation in this process, to have the ability to

explore data in ways beyond what has been prescribed in texts, and to generate new

questions beyond those asked by the principal investigator. While literacy can be


narrowly viewed as understanding and interpreting statistical information presented,

for example in the media, and reasoning can be narrowly viewed as working through

the tools and concepts learned in the course, the statistical thinker is able to move

beyond what is taught in the course, to spontaneously question and investigate the

issues and data involved in a specific context.”

Mooney (2002) adopted the definition of statistical thinking from Shaughnessy,

Garfield, & Greer (1996), where it means the cognitive actions that students engage in during

the data-handling processes of describing, organizing and reducing, representing, and

analyzing and interpreting data.

From the above definitions, it seemed to me that the definitions of statistical literacy,

reasoning and thinking are not mutually exclusive. By statistical reasoning about variation, I

mean the way people reason with variation (as Ben-Zvi and Garfield defined this) and the

way people make use of the concept of variation to investigate issues and data (as Chance

defined statistical thinking). However, taking the review of Chance (2002) into

deeper account and carefully reading Ben-Zvi & Garfield’s (2004) definition of statistical

thinking led me to the conclusion that the students in my research project were not much

involved in this type of activity; that is, they did not have to learn about or carry out a

statistical inquiry. In my teaching experiment I engaged students in data exploration activities

that mainly involved statistical literacy and statistical reasoning in the sense of Ben-Zvi and

Garfield (2004), the two ideas which in my view constitute Mooney’s (2002)

definition of statistical thinking. To avoid confusion in terminology, I prefer in this thesis to

adopt the distinction between literacy and reasoning. When I refer to Mooney’s statistical

thinking I understand it mostly as statistical reasoning.

2.2. Assessment of Statistical Reasoning

Because I wanted to investigate in my research whether my teaching and learning

approach would improve students’ reasoning, I searched the literature to find an appropriate

assessment tool, preferably one suitable for classroom practice (i.e., not only suitable in a

small group setting or laboratory setting, and furthermore easy to use by teachers in their

practice). The following subsections are about the assessment framework of statistical

thinking and reasoning from existing literature that I selected for use.

2.2.1. The Structure of the Observed Learning Outcome (SOLO) Taxonomy

The SOLO taxonomy is a neo-Piagetian framework proposed by Biggs and Collis (1982) to

analyze the complexity level at which students carry out tasks and answer questions. The

SOLO taxonomy is not specifically designed for statistics or mathematics, but I discovered


that many researchers in statistics education use this taxonomy for characterizing and

assessing the students’ statistical thinking and reasoning. For this reason I briefly review the

SOLO taxonomy; further details about the general model can be found in Biggs & Collis

(1982) and for its application in statistics education I refer to (Jones et al., 2004; Shaughnessy

et al., 2007) and references therein.

The SOLO taxonomy posits five modes of functioning (similar to Piaget’s

developmental stages: preoperational, early concrete, middle concrete, concrete generalization,

and formal operation) and five hierarchical levels of complexity at which tasks can be carried

out in principle (prestructural, unistructural, multistructural, relational, and extended abstract).

The developmental stage mainly sets the upper limit for the cognitive level that can be reached,

but this does not mean that at this stage of functioning lower levels of complexity cannot be

observed anymore.

The five levels of complexity of students’ responses to tasks, usually referred to as the

SOLO levels, are as follows. At the prestructural level, a student avoids the question (denial),

repeats the question (tautology), or engages in the task but is distracted or misled by irrelevant

aspects belonging to an earlier mode of functioning. At the unistructural level, the student

focuses on the relevant domain and picks up on one relevant aspect of the task, thereby running

the risk of coming to a limited conclusion or a dogmatic answer only. At the multistructural

level, the student picks up several disjoint and relevant aspects of the task but does not

integrate them and ignores inconsistencies or conflicts in the provided information. At the

relational level, the student integrates the various aspects and produces a more coherent

understanding of the task. At the extended abstract level, the student recognizes that a given

example is an instance of a more general case, that is, (s)he generalizes the structure to take in

new and more abstract features that represent thinking in a higher mode of functioning. As

noted already by Biggs and Collis, only the first four cognitive levels are encountered up to

and including secondary education; one hardly notices the extended abstract level in

classroom practice.
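
The level descriptions above can be read as a decision rule for coding a response. The Python sketch below is only an illustration of that reading, not an instrument used in this study: the names `SoloLevel` and `code_response` and the three input features are hypothetical simplifications of the criteria of Biggs and Collis (1982).

```python
from enum import IntEnum


class SoloLevel(IntEnum):
    """The five SOLO levels, ordered from lowest to highest complexity."""
    PRESTRUCTURAL = 0
    UNISTRUCTURAL = 1
    MULTISTRUCTURAL = 2
    RELATIONAL = 3
    EXTENDED_ABSTRACT = 4


def code_response(relevant_aspects: int, integrated: bool, generalized: bool) -> SoloLevel:
    """Assign a SOLO level from three simplified (assumed) response features.

    relevant_aspects: number of task-relevant aspects the response picks up
    integrated:       whether those aspects are connected into a coherent whole
    generalized:      whether the response goes beyond the given case
    """
    if relevant_aspects <= 0:
        # Denial, tautology, or only irrelevant aspects.
        return SoloLevel.PRESTRUCTURAL
    if relevant_aspects == 1:
        return SoloLevel.UNISTRUCTURAL
    if not integrated:
        # Several relevant aspects, but left disjoint.
        return SoloLevel.MULTISTRUCTURAL
    if not generalized:
        return SoloLevel.RELATIONAL
    return SoloLevel.EXTENDED_ABSTRACT
```

For instance, a response that mentions two relevant aspects without connecting them would be coded by `code_response(2, integrated=False, generalized=False)` as `SoloLevel.MULTISTRUCTURAL`. Because the levels are ordered integers, two coded responses can be compared directly with `<` and `>`.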

Biggs & Collis (1982) described certain crucial characteristics of each SOLO level in

terms of the dimensions of capacity (the required amount of working memory or attention

span), relating operation (between cue and response), and consistency and closure (no

contradictions in the final conclusion). See Table 2, taken from Biggs & Collis (1982, pp. 24-25).


Developmental Base Stage with Minimal Age: Extended Abstract (16+ years)
  SOLO Description: Extended Abstract
  Capacity: Maximal: cue + relevant data + interrelations + hypotheses
  Relating Operation: Deduction and Induction. Can generalize to situations not experienced
  Consistency and Closure: Inconsistencies resolved. No felt need to give closed decisions; conclusions held open, or qualified to allow logically possible alternatives.

Developmental Base Stage with Minimal Age: Concrete generalization (13-15 years)
  SOLO Description: Relational
  Capacity: High: cue + relevant data + interrelations
  Relating Operation: Induction. Can generalize within given or experienced context using related aspects
  Consistency and Closure: No inconsistency within the given system, but since closure is unique, inconsistencies may occur when he goes outside the system.

Developmental Base Stage with Minimal Age: Middle Concrete (10-12 years)
  SOLO Description: Multistructural
  Capacity: Medium: cue + isolated relevant data
  Relating Operation: Can “generalize” only in terms of a few limited and independent aspects
  Consistency and Closure: Although has feeling for consistency can be inconsistent because closes too soon on basis of isolated fixations on data, and so can come to different conclusions with the same data

Developmental Base Stage with Minimal Age: Early Concrete (7-9 years)
  SOLO Description: Unistructural
  Capacity: Low: cue + one relevant datum
  Relating Operation: Can “generalize” only in terms of one aspect.
  Consistency and Closure: No felt need for consistency, thus closes too quickly; jumps to conclusions on one aspect, and so can be very inconsistent.

Developmental Base Stage with Minimal Age: Pre-operational (4-6 years)
  SOLO Description: Prestructural
  Capacity: Minimal: cue and response confused
  Relating Operation: Denial, tautology, transduction. Bound to specifics
  Consistency and Closure: No felt need for consistency. Closes without even seeing the problem.

Table 2. Base stage of cognitive development and response description according to the SOLO Taxonomy (note that the SOLO description refers to the maximum level at the given developmental base stage).


2.2.2. Statistical Thinking

There are several studies in which the main goal was to develop a framework for characterizing

and assessing statistical thinking (in the sense of Mooney, 2002). Below, I will discuss

three of them.

Jones et al. (2000) developed a framework for characterizing elementary children’s

statistical thinking situated in the SOLO taxonomy. They focused on data handling and used

the following four constructs in their framework: (1) describing; (2) organizing and

reducing; (3) representing; and (4) analyzing and interpreting data. To characterize students’

statistical thinking in each of these constructs, they used four levels corresponding with the

first four levels in the SOLO taxonomy:

1) Idiosyncratic: students are engaged in a task but are easily distracted or misled by irrelevant aspects;

2) Transitional: students focus on a single relevant aspect of a task;

3) Quantitative: students can focus on multiple relevant aspects of the task but have

problems in integrating them;

4) Analytical: students are able to make links between different aspects of the task

(demonstrate relational level of thinking).

Jones et al. (2000) conducted their study by analyzing interviews with sixteen students (grades 2-5) who responded to several data handling tasks with questions for every construct. Statistical concepts such as average and spread were probed at an elementary level, as was the way children work with basic data displays such as bar graphs.

Mooney (2002) used the framework of Jones et al. (2000) as his initial framework to assess middle school students’ statistical thinking in data handling tasks and extended it with another level of statistical thinking: Extended Analytical, meaning that students can examine data from more than one perspective. However, Mooney did not find data that support the existence of this fifth level in middle school students and thus also used the four levels of statistical thinking above in his results. The final framework of Mooney (2002) is reproduced in Figure 2. It was actually used for the statistical concepts of measures of centre and spread. In this study, I took an eclectic approach and selected suitable parts of Mooney’s framework to assess my students’ statistical reasoning (see Chapter 4).

Figure 2. Mooney’s framework of middle school students’ statistical thinking.

Groth (2003) sought a framework for describing high school students’ statistical thinking when it comes to describing, organizing and reducing, representing, analyzing, and collecting data. Groth conducted a qualitative study to find characteristics or patterns for the four constructs that were used by Mooney (2002) and Jones et al. (2000). He developed a set of statistical thinking tasks and used it in structured, task-based clinical interview sessions with high school students and recent high school graduates. Students were asked to solve these tasks, and their responses were then analyzed, applying the SOLO taxonomy, to define patterns of responses to questions regarding processes of data handling. In my study, I used a modification of Question 8 from his fifth task (Question 7 in my pretest, see p. 25), which was part of a set of questions Groth used to probe students’ understanding of summarizing data through a measure of centre and a measure of spread. The pattern descriptors for using measures of centre and spread that Groth (2003, pp. 85, 90) identified are listed in Table 3.

Four Pattern Descriptors for Using Measures of Centre

A student uses:

1. reasonable formal measures to locate centres of data sets;

2. a combination of reasonable formal and visual measures to locate centres of data sets;

3. a combination of formal and visual measures to find centres of data sets, only some of which are reasonable for the given sets of data;

4. only visual approaches to find centres of data sets, only some of which are reasonable for the given sets of data.

Three Pattern Descriptors for Using Measures of Spread

A student gives:

1. quantifications and subjective verbal descriptions of spread that are suitable for the given sets of data;

2. quantifications and subjective verbal descriptions of spread; some descriptions or quantifications are not suitable for the given sets of data;

3. subjective verbal descriptions of spread rather than quantifications.

Table 3. Groth’s pattern descriptors for using measures of centre and spread.

2.2.3. Statistical Reasoning about Variation

The terms ‘variation’ and ‘variability’ are often used interchangeably in the research literature (cf. Shaughnessy, 2007; Reading & Shaughnessy, 2004). However, Reading & Shaughnessy (2004, p. 202) made the following distinction between the two terms:

“The term variability will be taken to mean the characteristics of the entity that is observable, and the term variation to mean the describing or measuring that characteristics. Consequently, … ‘reasoning about variation’ will deal with the cognitive process involved in describing the observed phenomena in situations that exhibit variability, or the propensity to change.”

In my study, I adopted the above definition by Reading & Shaughnessy (2004) of statistical reasoning about variation. Note, however, that the Indonesian language has no word for variation in this sense; only words such as variability, diversity and variety exist in everyday speech. In mathematics textbooks the term variability is also used for variation (in the sense of Reading & Shaughnessy). Therefore I may at times use the two terms interchangeably.

In this section I briefly discuss the literature on reasoning about variation. To begin with, Watson et al. (2003) conducted a study to measure understanding of variation in a chance setting. They gave questionnaires to students in grades 3, 5, 7, and 9 and, from the results of their analysis, which was initially based on the SOLO taxonomy, they defined four ability levels of students’ understanding of variation:

1) prerequisites for variation: working out the environment, table/graph reading,

intuitive reasoning for chance;

2) partial recognition of variation: putting ideas in context, tendency to focus on single

aspects and neglect others;

3) applications of variation: consolidating and using ideas in context, inconsistent in picking salient features;

4) critical aspects of variation: employing complex justification or critical reasoning.

Watson et al. (2003) did not give clear descriptors for each of these levels, but it seemed to me that these four levels are equivalent to Mooney’s four levels of statistical thinking (for example, compare Mooney’s transitional level with Watson et al.’s level of partial recognition of variation). My teaching design was also about understanding measures of variation (range, interquartile range, average deviation, and standard deviation), but I wanted to investigate students’ reasoning about variation without specific connection to chance processes or sampling. The above descriptors of students’ understanding of variation are nevertheless of interest to me, but they are too general to apply fruitfully in my research. As mentioned in Subsection 2.2.2, I therefore used part of Mooney’s framework for assessing students’ reasoning about variation (see Chapter 4).

In addition to the above results of Watson and coworkers, Garfield & Ben-Zvi (2005,

pp. 93-95) identified seven areas of knowledge of variability and seven corresponding assess-

ment areas. The seven areas of knowledge are:

1) developing intuitive ideas of variability;

2) describing and representing variability;

3) using variability to make comparisons;

4) recognizing variability in special types of distributions;

5) identifying patterns of variability in fitting models;

6) using variability to predict random samples or outcomes;

7) considering variability as part of statistical thinking.

I used these areas of knowledge as guidance in developing my assessment test. However, Areas 4-7 are not relevant to my study, which was situated in the Indonesian curriculum. Thus I only used the first three areas of this framework. For completeness and for interested readers, the assessment items for all areas are presented in Appendix A.

2.3. Teaching Variation

In this section I present what I consider as the most relevant research literature about teaching

variation. This includes a summary of suggestions made in various studies about teaching

variation in various contexts. At the end of this section I list the principles that I incorporated

in my designed activities.

2.3.1. Conceptions about Variability

Shaughnessy (2007) summarized students’ conceptions of variability into eight types:

(1) variability in particular values, including extremes or outliers;

(2) variability as change over time;

(3) variability as whole range (the spread) of all possible values;

(4) variability as the likely range of a sample;

(5) variability as distance or difference from some fixed point;

(6) variability as the sum of residuals;

(7) variation as covariation or association;

(8) variation as distribution.

These conceptions resemble the areas of knowledge about variability in the framework of Garfield & Ben-Zvi (2005), which was discussed in Subsection 2.2.3. In my study I focused on helping the students develop conceptions of variability of types 1, 3, 5, and 8.

2.3.2. Suggestions from Research Studies about Contexts

In the statistics education literature one can find several contexts to teach variability. Garfield

(2008, pp. 205-206) reviewed contexts used in mostly exploratory and qualitative research.

Below are several contexts that Garfield (2008, pp. 205-206) found:

measurement and natural context: students investigate the variety in height of plants by measurement activities and comparing distributions of plants;

a ‘growing sample’: students reason about what happens to graphs if the sample size gets larger;

measurement of minutes per day spent on various activities, e.g., time spent on studying or talking on the phone: students make conjectures; students reason informally about the distribution;

variability in data;

bivariate relationships;

comparing variability within and between data sets;

standard deviation and histogram: students explore the concept of standard deviation through comparing histograms;

probability contexts;

sampling contexts.

As mentioned before, I decided not to use the contexts of probability and sampling in my study. The focus of the statistics unit selected for my study was descriptive statistics, and thus the context ‘bivariate relationships’, for example, was not suitable. The context ‘comparing data’ is one I wanted to use, although not as the first introduction to variability. In comparing data sets, there are at least two types of comparisons:

1. reading between data sets;

2. reading beyond data sets.

According to Curcio (1989, p. 384), in reading between the data one makes comparisons and uses mathematical concepts and skills, whereas in reading beyond the data one makes extensions, predictions and inferences. As reasoning about variation between data sets needs a higher level of statistical thinking than reasoning about variation within a data set (Mooney, 2002; Jones et al., 2000), I decided to use the context ‘comparing variability within data sets’, meaning analyzing one data set, to introduce measures of dispersion and only thereafter to use ‘comparing variability between data sets’.

2.3.3. Principles Underpinning the Design of My Lesson Activities

I originally planned to use the following principles in designing my activities:

1. Use ICT to change students’ computational efforts into reasoning efforts.

The use of ICT enables the use of real and large data sets (Reading & Shaughnessy, 2004, p. 223) and helps students to visualize and explore data (Garfield & Ben-Zvi, 2007). I considered a statistical software package or simply a calculator to ease the computational efforts of my students. Note, however, that the use of ICT in statistics education is not the main focus of my research.

2. Foster the students’ integration of concepts of central measures and variability during data exploration (Reading & Shaughnessy, 2004, p. 223; Shaughnessy, 2007, p. 1002).

This is particularly important in my setting because textbooks in the Indonesian curriculum treat four concepts of statistics separately. Furthermore, the shape of a distribution should also be in a lesson package with centre and spread because “a brief description of a distribution should include its shape and numbers describing its center and spread” (Moore & McCabe, 2005, p. 40).

3. Discuss variability in various contexts and using different questions (Reading & Shaughnessy, 2004, p. 223).

4. Use real data comparison to reason about variability (see, for example, Garfield & Ben-Zvi, 2005; Konold & Pollatsek, 2002; Shaughnessy, 2007).

5. Link the following two kinds of measures of variation: range (range and interquartile range) and deviation from the mean (average deviation and standard deviation).

6. Combine student-centred teaching and lecture-based teaching.

The main approach of my teaching design was a constructivist approach. However, I planned to combine it with the typical (but short) lecture plus exercises at the end, to make sure that the formal measures were discussed.
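To illustrate the two kinds of measures named in principle 5, the following minimal Python sketch computes all four for a single data set. The function name and the heights are hypothetical and invented for illustration; the sketch assumes the population form of the standard deviation (division by n) and the default quartile convention of Python's `statistics.quantiles`, which may differ from the convention in a given textbook.

```python
import statistics

def variation_measures(data):
    """Return range, interquartile range, average deviation and standard deviation."""
    mean = statistics.mean(data)
    # quantiles(..., n=4) returns the three quartiles; the default
    # "exclusive" method is only one of several textbook conventions.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        # Range-type measures: distances between positions in the ordered data.
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        # Deviation-type measures: typical distance of the values from the mean.
        "average_deviation": sum(abs(x - mean) for x in data) / len(data),
        # Population form (divide by n); statistics.stdev gives the n-1 form.
        "standard_deviation": statistics.pstdev(data),
    }

# Hypothetical heights in cm, invented for illustration only.
heights = [150, 152, 155, 158, 160, 161, 163, 165, 170, 176]
measures = variation_measures(heights)
print(measures)
```

The two range-type measures depend only on positions in the ordered data, whereas the two deviation-type measures use every value's distance from the mean, which is the link the principle refers to.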

3. Research Design and Methodology

In this chapter I explain my research question, research instruments and the methods I applied in data collection and data analysis.

3.1. Research Question

The traditional statistics teaching approach in Indonesia is usually the combination of

lecturing the statistical ideas and practicing with computational and procedural problems.

There are exercises that ask for interpretation, but the expected answers are just surface-deep.

This makes it possible and plausible that students use memorization of an acceptable answer,

e.g., the meaning of high standard deviation, as a successful strategy. This leads to a limited

understanding of statistical concepts. In this study, I designed a short teaching experiment

about variation, tried it out in a secondary school and then investigated how it helped the

students to develop statistical reasoning about variation. One goal was to check the effectiveness of the approach; in other words, I wanted to answer the following question:

Research Question

To what extent did the student-centred teaching of variation using real data and open-ended tasks help to improve Indonesian social science stream (IPS) students’ reasoning about variation?

The teaching approach has three new characteristics:

student-centredness

use of real data

use of open-ended tasks and group work

Moreover, the nature of the teaching was also intended to be social constructivist because, in solving the open-ended tasks, students were working in groups and their work was discussed in the classroom. These components are rarely employed in statistics/mathematics education in Indonesia.

I hypothesized that the student-centred teaching and learning approach would help students reason better in statistics, compared to the traditional approach of using artificial data and closed tasks. I investigated whether this hypothesis was true for the IPS students in my case study, who mostly had motivational problems and had not performed very well in mathematics. From this investigation, I expected to come up with recommendations for a teaching approach in statistics education in Indonesia. Therefore my reflection question was the following:

Reflection Question

What recommendations for the teaching of measures of variation in the Indonesian secondary school curriculum followed from the teaching experiment?

3.2. Research Setting and Research Methodology

To answer the research question, I conducted a study in a secondary school classroom in a rural area of Indonesia with students of the social science stream (IPS). A pretest-posttest control-experimental group design was used to investigate the improvement of the students’ statistical reasoning under the treatment. The original plan was that, out of two parallel classes, one class would be randomly selected as the experimental group and the other would become the control group. In reality, I had to accept in the end that one teacher did not want to be the teacher of the control group. Therefore, this teacher’s class was set as the experimental group and the class of the other teacher as the control group. I taught the students of the experimental group with the new approach and the regular teacher taught the control group following the regular program without my intervention. Before and after the teaching, pre- and posttests were given to both classes. In the following, I explain the school setting and describe in chronological order the methods that I used for data collection and data analysis.

3.2.1. The School Setting

The research was conducted in a public secondary school, Sekolah Menengah Atas Negeri (SMAN) No. 1 Lebong Tengah, Lebong, Bengkulu, Indonesia. The school had two parallel grade 11 social science stream (IPS) classes; each class had 40 students, and the two classes were taught by different teachers. These two classes were used in the research.

The secondary school can be considered to belong to the better schools in Lebong, although compared to schools in Bengkulu city, the capital of the province, this school would be considered weak because the teaching and learning facilities are meagre. The computer lab does not have enough computers for 40 students, the usual number of students per class, and thus the subject ‘Information and Technology’ is commonly taught and learned theoretically. A science lab is present, but in reality it can be considered non-existent due to insufficient facilities.

Lebong is a mountainous but not very prosperous agricultural area. It is a relatively new district and thus still deals with a shortage of teachers, especially mathematics teachers. I observed that students coming to this secondary school apparently had not received good prior mathematics education, as their mathematics skills were below what graduates of lower secondary school should have mastered.

3.2.2. Research Methodology

Below I summarize the methods that I applied for data collection and data analysis.

Collecting Data

a. Classroom observations prior to the teaching experiment

The classes were taught by different teachers. In this phase, I talked to both teachers about their students and checked whether the students of the two classes were comparable. I observed the two parallel classes before I started teaching with the new approach. The aim was to confirm that both classes were taught with the traditional approach and to ensure that the prerequisite knowledge had been taught. It could also give me an impression of the students’ attitudes and performance. Furthermore, I collected the students’ semester reports from the previous semester, to be used later in an independent samples t-test comparing their ability in mathematics and language.

b. The teaching experiment and the lessons of the control group

I taught the experimental group and its regular teacher acted as an observer. The control group was taught by its regular teacher and I acted as an observer. Both groups had the same amount of time (4 lessons). I asked the regular teacher of the experimental group to observe my teaching and to write an observation note after each lesson. I also talked to the teacher after each lesson about how each of us thought things had gone. However, the teacher did not have time to write her observation notes, so I used our talks and the final interview as my data.

c. Pretest and posttest

Students were given a pretest before the teacher of the control group and I started to teach variation. The same test was given as a posttest after the teaching was completed. The idea was that this would give information about the improvement of the students in both groups.

d. Giving the questionnaire

e. Doing the interview

Analyzing Data

a. Checking the comparability of the experimental and control group

I used a two-tailed independent samples t-test on the students’ mathematics and language marks from their semester reports to check whether there was an indication of different ability levels between the two participating groups or whether they could indeed be considered groups of students with similar competencies.

b. Investigating students’ statistical reasoning

I analyzed the results from the pre- and posttests and the interview. I checked the written reasoning of all students in the experimental group to get an overview of their responses. I used this overview together with the theoretical framework to create descriptors of identified categories for the reasoning in students’ answers. The same descriptors were used for the written responses of the students in the control group.

c. Checking students’ improvement in statistical reasoning

I analyzed quantitatively whether there was an indication of improvement from the pretest to the posttest in the two groups and compared the improvement between the two groups. Where possible, I used a t-test implemented in the Minitab 15 software to analyze the reasoning improvement quantitatively.

d. Analyzing all data to answer the research question

The class observation, the videotape of the teaching, the talks with the teachers, and the students’ questionnaire and interviews were analyzed to illuminate the findings from the pre- and posttest and to answer the research question. Using also the journal of my own teaching experiment, I tried to reflect on the strengths and weaknesses of my teaching approach and formulated suggestions and recommendations for future use in statistics teaching in Indonesian secondary school classrooms.
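The two-tailed independent samples t-test mentioned in steps a and c can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the Minitab procedure used in the study: it assumes the unequal-variance (Welch) form of the test, the function names are hypothetical, and the p-value is obtained by numerically integrating the t density rather than from a statistics library.

```python
import math
import statistics

def two_tailed_p(t, df, steps=10_000):
    """Two-tailed p-value from Simpson integration of the t density over [0, |t|]."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    central_area = 2 * total * h / 3  # probability mass between -|t| and |t|
    return max(0.0, 1.0 - central_area)

def welch_t_test(a, b):
    """Return (t, df, p) for a two-tailed test that does not assume equal variances."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, two_tailed_p(t, df)

# Hypothetical marks of two small groups, invented for illustration only.
group_1 = [1, 2, 3, 4, 5]
group_2 = [2, 3, 4, 5, 6]
t, df, p = welch_t_test(group_1, group_2)
print(t, df, p)
```

A large p-value gives no indication that the two groups differ in mean mark, which is the sense in which two classes would be treated as comparable; a small p-value would indicate a difference.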

3.2.3. The Teaching Materials

In accordance with the research-based principles I had chosen (see Subsection 2.3.3), I designed a teaching experiment consisting of three activities, each 90 minutes long. I selected human growth as the first learning context and I used the original, raw data set of 434 subjects in Jakarta collected in a recent growth study in Indonesia (Batubara et al., 2006). This raw data set was reduced for the teaching experiment to the data of 20 boys and 20 girls aged 15-16 years. The context of human growth was used in the first two lessons.

In the first activity, students were given the reduced data of measured height and weight of 20 boys and 20 girls of the same age (see Appendix B). Given this data set, the students’ tasks were to:

make histograms of the data and describe how the data were distributed;

develop a rule to decide whether a child’s height and weight was: very common; normal but needing attention; or abnormal (and explain it);

make a visual aid that is easy to use and explains the rule to other people.

Students were to work together in groups of four and I encouraged the use of a basic (i.e., not scientific¹) calculator, as mentioned in Subsection 2.3.3, to save computational time during students’ data exploration. The aim of this activity was to provide a rich context for students to think intuitively about deviation from the mean. Most people, including children, have conceptions and/or preferences about what height and weight they consider perfect. Thus I thought this context could lead them to define a range of deviation from the perfect height and weight that they have in mind when deciding what height and weight are normal or abnormal. The concept of deviation from the mean was later discussed and brought to the formal concept of standard deviation in a whole-class discussion.
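One possible formalization of such a rule, once deviation from the mean has been linked to the standard deviation, is sketched below in Python. The cut-offs of one and two standard deviations, the function name and the heights are all assumptions made for illustration; they are not the rule that students produced and not the Batubara et al. data.

```python
import statistics

def classify(value, data, inner=1.0, outer=2.0):
    """Label a value by its distance from the mean, measured in standard deviations.

    The cut-offs `inner` and `outer` are illustrative assumptions.
    """
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population form, divide by n
    distance = abs(value - mean) / sd
    if distance <= inner:
        return "very common"
    if distance <= outer:
        return "normal but needs attention"
    return "abnormal"

# Hypothetical heights in cm, invented for illustration only.
heights = [150, 152, 155, 158, 160, 161, 163, 165, 170, 176]
for h in (160, 172, 180):
    print(h, classify(h, heights))
```

Any rule of this kind makes the students' informal "range around the perfect height" explicit as a number of standard deviations around the mean.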

The second lesson consisted of two parts: 2a and 2b. In the first part, I provided a histogram of boys’ heights and the task was to compute the mean and standard deviation of the boys’ heights, based on the histogram. The students had learned how to approximate the mean from grouped data prior to my intervention and I assumed that they would be able to find the mean. As for the standard deviation, the students and I would have discussed this formal measure in the first lesson, so I expected that students could transfer the formula for single-valued data to grouped data (i.e., to a histogram). If they could not, I would help, but the experience of trying would presumably help them to understand the concept.
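The transfer from single-valued to grouped data can be written out as follows. This is a sketch under the assumption that the population form of the standard deviation (division by the number of observations) is used; the symbols are introduced here for illustration, with $x_j$ the individual values, $n$ their number, and $m_i$ and $f_i$ the midpoint and frequency of class $i$ out of $k$ classes.

```latex
% Single-valued data
\[
\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j ,
\qquad
s = \sqrt{\frac{1}{n}\sum_{j=1}^{n} \left(x_j - \bar{x}\right)^2}
\]

% Grouped data: every value in class i is approximated by the midpoint m_i
\[
\bar{x} \approx \frac{\sum_{i=1}^{k} f_i\, m_i}{\sum_{i=1}^{k} f_i} ,
\qquad
s \approx \sqrt{\frac{\sum_{i=1}^{k} f_i \left(m_i - \bar{x}\right)^2}{\sum_{i=1}^{k} f_i}}
\]
```

The grouped formulas are the single-valued formulas applied to a data set in which each midpoint $m_i$ occurs $f_i$ times, so that $n = \sum_{i=1}^{k} f_i$.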

In part 2b, as planned (see Subsection 2.3.2), after comparing within groups, I asked students to compare the heights of two groups: boys in Jakarta and boys in Bengkulu. Jakarta is the capital of Indonesia and Bengkulu is the province in which I conducted my study. Given two histograms with the same range and bar width, students were first asked to guess which group had the larger standard deviation, without doing any computation. Afterwards, they were asked to compute the value of the standard deviation (the mean was given) and to check their previous response. The main task for students was to answer the following question:

‘Can you conclude that boys in Jakarta are taller than boys in Bengkulu? Explain your reasons.’

I wanted to check whether students applied some understanding of standard deviation in comparing the two groups of boys.

¹ It is unlikely anyway for these students to own scientific calculators (see Subsection 3.3.1).

In the previous two activities, the mean as a central measure and the standard deviation as a measure of variation were the focus of the lesson activities. In the third activity, I also focused on the median and interquartile range. The students were to compare data about the hours that they spent on several activities at home. These data were collected prior to the lesson; for this purpose I had distributed a questionnaire (see Appendix B) and, after its return, I had summarized the data so that they could be used in the third lesson activity and compared with data that I had collected before from another school in Bengkulu city. The students were to analyze those summarized data. The task was for them to answer the following question:

“Who spends more time on activity X: students from Bengkulu city or Lebong?² Explain your reasons!”

There were data about hours spent on nine activities. The students chose one of these activities as activity X to work on during the lesson. I had checked beforehand that for this kind of data, the median and interquartile range were more appropriate measures than the mean and standard deviation.
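A check of this kind could look like the following minimal Python sketch. It assumes one common approach, namely comparing mean and median and flagging values outside the 1.5 × IQR fences; the function name and the hours are hypothetical and invented for illustration, and the sketch does not reproduce the check actually carried out in the study.

```python
import statistics

def summarize(hours):
    """Return mean, median, IQR and the values outside the 1.5 x IQR fences."""
    q1, median, q3 = statistics.quantiles(hours, n=4)  # default "exclusive" method
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(hours),
        "median": median,
        "iqr": iqr,
        "outliers": [h for h in hours if h < low or h > high],
    }

# Hypothetical hours spent on one activity; invented and deliberately right-skewed.
hours = [1, 1, 2, 2, 2, 3, 3, 4, 5, 14]
summary = summarize(hours)
print(summary)
```

When the mean lies well above the median and a few values fall outside the fences, as in this invented example, the median and interquartile range describe the typical student better than the mean and standard deviation.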

In summary, through all the activities, my aim was that students would learn to understand and reason about the use of central measures and measures of variation in data analysis. In the lesson activities, the students attempted to analyze real data and draw conclusions in two contexts to which they could relate. I hoped that the experience of analyzing real data would promote understanding of measures of variation and foster statistical reasoning about variation. Since the aim of my study was to check this hypothesis, I consider the teaching materials in this thesis as part of my research setting.

3.3. Research Instruments

There were four main research instruments that I used to answer my research question and

reflection question: the pre- and posttest, a questionnaire, and interviews with the regular

teacher and some students of the experimental group.

3.3.1. The Pretest and Posttest

The pretest and posttest consisted of the same problem set in the same order (see Appendix C). It consisted of two problems on the understanding of the term ‘variation’ (referred to as Subtest A), two computational problems that emphasize computational skills (Subtest B), and six more open-ended problems that emphasize reasoning skills (Subtest C). Of the ten questions, only the two computational problems (Subtest B) asked students explicitly to find a measure of variation, in this case the standard deviation. For the questions in Subtest C, I expected students to consider variation by employing a measure of variation that they had learned. Below, the questions are given in italics and afterwards the general purpose of each question is explained.

² Bengkulu city is the capital of Bengkulu Province and Lebong is a district in Bengkulu Province (see Section 3.2 for the research setting).

In the first question, I wanted to explore what students intuitively understand by the term ‘variation’ and whether their understanding before and after the teaching differed. The teaching intervention itself did not include a formal definition of the term ‘variation’. The second question is also about students’ understanding of what variation means in practical cases. Knowing when to minimize or maximize variation is one of the signs of understanding and statistical reasoning ability.

Subtest A

1. Based on your experience, what does variation mean to you? Give an explanation

and/or an example.

2. For each of the following cases, answer the following question: “Which is more

desirable: high variation or low variation?” Add your reason.

a. Age of trees in a national forest.

b. Diameter of new tires coming off one production line.

c. Scores on an aptitude test given to a large number of job applicants.

d. Weight of boxes of milk of the same brand.

Subtest B

3. Given the data: 11, 32, 17, 34, 24, 15, 28

For the data above, fill in the table below.

Range

Mean

Median

Standard Deviation

Interquartile Range

4. Below are the data of monthly income.

Monthly Income (in hundred thousand rupiah) | Number of People

3 – 5 | 3

6 – 8 | 4

9 – 11 | 9

12 – 14 | 6

15 – 17 | 2

The mean of the data above is ________.

The standard deviation is _______.

Questions 3 and 4 are related to computational skills. Since a comparison between two different teaching approaches was attempted, it was fair to the control group to include ‘traditional’ questions in the pre- and posttest. Besides this, regarding my reflection question, I hoped that the teaching intervention would improve not only the statistical reasoning skills, but also the computational skills that are needed in the nationwide examination. Question 3 deals with single-valued data and question 4 deals with grouped data.
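The computations that questions 3 and 4 ask for can be sketched in Python as follows. The data are those given in the two questions; the sketch assumes the population form of the standard deviation (division by n), class midpoints as representatives of the grouped data, and the default quartile convention of `statistics.quantiles`. Other conventions, such as division by n-1 or a different quartile rule, give slightly different values, and the variable names are chosen here for illustration.

```python
import math
import statistics

# Question 3: single-valued data.
data = [11, 32, 17, 34, 24, 15, 28]
q1, median, q3 = statistics.quantiles(data, n=4)  # default "exclusive" method
single = {
    "range": max(data) - min(data),    # 34 - 11 = 23
    "mean": statistics.mean(data),     # 161 / 7 = 23
    "median": median,                  # 24
    "sd": statistics.pstdev(data),     # sqrt(472 / 7), about 8.21
    "iqr": q3 - q1,                    # 32 - 15 = 17
}

# Question 4: grouped data, each class represented by its midpoint.
classes = [(3, 5), (6, 8), (9, 11), (12, 14), (15, 17)]
freqs = [3, 4, 9, 6, 2]
mids = [(low + high) / 2 for low, high in classes]
n = sum(freqs)
grouped_mean = sum(f * m for f, m in zip(freqs, mids)) / n  # 240 / 24 = 10
grouped_sd = math.sqrt(
    sum(f * (m - grouped_mean) ** 2 for f, m in zip(freqs, mids)) / n
)  # sqrt(270 / 24), about 3.35

print(single)
print(grouped_mean, grouped_sd)
```

The grouped results are in the units of the table, that is, hundred thousand rupiah.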

One aspect of the teaching intervention was to teach variation by letting students analyze data sets. In data analysis, looking at a graphical display of the data, especially in the form of histograms, is important. Thus, the ability to read and interpret graphical displays, especially histograms, was assessed through question 5. In addition, this question assessed students’ ability to describe the variation of a distribution.

Subtest C

5. Four histograms and two descriptions of data are displayed below.

i. A data set of test scores where the test was very easy

ii. A data set of wrist circumferences (measured in centimetres) of

newborn female babies.

a. Which histogram best matches the data in description (i)? Give your

reason.

b. Which histogram best matches the data in description (ii)? Give your

reason.

In question 6, I wanted to find out whether students use any measure of variation in

comparing two data sets. This question can be considered as a question that assesses the

ability to read beyond the data (in the sense of Curcio, 1987) in the process of analyzing and

interpreting data. Based on the two data sets, this question implicitly asked students to predict

the future performance of students A and B and thus I could verify whether students used any

consideration of variation in their reasoning.

Given a data set, students were asked in question 7 to make a summary and to draw a conclusion. I wanted to find out what measures of centre and variation the students would use to draw an informal inference and whether their views on the measure of centre were adjusted when an outlier was present. This question can also be considered a question that assesses the ability to make comparisons within a data set.

7. One day Dedi caught a very big catfish from his rice field. He wanted to be

sure of the weight of the fish and therefore he weighed it 7 times on the same

scale/balance. Below are the measurements (in kilograms) that he found:

2.9; 2.7; 5.1; 3.1; 3.0; 2.8; 3.0 kg.

a. How spread out are the measurements he obtained?

b. How many kilograms do you think the true weight of the catfish was? Give

your reason.
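The effect that question 7 probes can be illustrated with a minimal sketch, assuming Python's standard `statistics` module. It summarizes the seven catfish measurements with and without the 5.1 kg reading. Treating 5.1 kg as the outlier, and the helper name `summarize`, are assumptions made only for this illustration.

```python
import statistics

# Measurements from question 7 (in kilograms).
weights = [2.9, 2.7, 5.1, 3.1, 3.0, 2.8, 3.0]

# Assumption for illustration: 5.1 kg is treated as the outlier.
without_outlier = [w for w in weights if w != 5.1]


def summarize(data):
    """Return the range, mean, and median of a list of numbers."""
    return {
        "range": max(data) - min(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
    }


all_summary = summarize(weights)
trimmed_summary = summarize(without_outlier)

print("all measurements:", all_summary)
print("without 5.1 kg:  ", trimmed_summary)
```

With all seven values the mean is about 3.23 kg while the median is 3.0 kg. Without the 5.1 kg reading the mean is about 2.92 kg and the median 2.95 kg, so the median is much less affected by the single extreme value.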

6. Two students who took mathematics tests received the following scores (out of

100):

Student A: 60, 90, 80, 60, 80

Student B: 40, 100, 100, 40, 90

If you had an upcoming mathematics test, who would you prefer to be your

study partner, A or B? Why?
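Why question 6 calls for a measure of variation can be seen from a short sketch, again assuming Python's standard `statistics` module. The use of the population standard deviation (division by n) is an assumption; the test itself did not prescribe any particular measure.

```python
import statistics

# Scores from question 6.
student_a = [60, 90, 80, 60, 80]
student_b = [40, 100, 100, 40, 90]


def centre_and_spread(scores):
    """Return (mean, range, population standard deviation)."""
    mean = statistics.mean(scores)
    value_range = max(scores) - min(scores)
    # Assumption: the population formula (division by n) is used.
    sd = statistics.pstdev(scores)
    return mean, value_range, sd


summary_a = centre_and_spread(student_a)
summary_b = centre_and_spread(student_b)

print("A: mean, range, sd =", summary_a)
print("B: mean, range, sd =", summary_b)
```

Both students have a mean score of 74, so the mean alone cannot separate them. The range (30 versus 60) and the standard deviation (12 versus 28) show that the scores of student B vary far more.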


The third part of question 8 indirectly assessed students’ understanding of variation.

The main emphasis was on assessing the students’ knowledge of the statistics that

they had learned prior to the intervention, i.e., measures of centre, and whether this

knowledge had improved after the intervention. Parts a and b were explicitly about central measures.

However, in part c, students needed to decide which measure of centre best explains the

nature of the data, taking into account the variation or the shape of the graphical display.

Herein lay the indirect assessment of any consideration of variation.

8. The histogram below shows the number of hours of exercise per week by

marketing staff of a bank.

a. Compute the median. _______________

b. Compute the mean. ________________

c. Based on the histogram, how many hours do the staff in this company

usually exercise per week? Give your reason.


9. Forty college students participated in a study of the effect of sleep on test scores. Twenty

of the students studied all night before the test on the following morning (no-sleep group)

while the other 20 students (the control group) went to bed by 11.00 pm on the evening before the test.

The test scores for each group are shown in the diagrams below. Each dot on the

diagram represents a particular student’s score. For example, the two dots above the 80

in the bottom diagram indicate that two students in the sleep group scored 80 on the test.

[Dot plot: Test Scores: No-Sleep Group (horizontal axis: test scores from 30 to 100)]

[Dot plot: Test Scores: Sleep Group (horizontal axis: test scores from 30 to 100)]

Examine the two diagrams carefully. Which group is better: the sleep group or the no-

sleep group? Explain your reasons.

Then circle the one of the 6 possible conclusions listed below that you agree with most.³

a. The no-sleep group did better because none of these students scored below 35 and

the highest score was achieved by a student in this group

b. The no-sleep group did better because its average appears to be a little higher

than the average of the sleep group.

c. There is no difference between the two groups because there is considerable

overlap in the scores of the two groups.

d. There is no difference between the two groups because the difference between their

averages is small compared to the amount of variation in the scores.

e. The sleep group did better because more students in this group scored 80 or

above.

f. The sleep group did better because its average appears to be a little higher than

the average of the no-sleep group.

³ In the actual pre- and posttest, the multiple-choice part was formatted to be on the back page

of the open part.


In this question, I wanted to investigate whether students would use a combination of measures

of centre and variation when comparing two data sets. The multiple-choice options reflect

misunderstandings that students commonly have, for example, paying attention only to the

extreme values or only to the average. I wanted to test whether students realized that in comparing

two data sets, they need to consider not only central measures but also measures of variation.

Finally, in the last question, students were again asked to compare two data sets, in the form

of graphical displays, namely histograms. The ability to understand histograms was essential

here and I wanted to test whether the intervention improved this ability.

3.3.2. The Questionnaire

I gave a questionnaire to all students in the experimental group (see Appendix D). The

questionnaire was designed to elicit students’ opinions regarding the teaching intervention. The

questionnaire consisted of the following four parts:

(i) the use of real data (Questions 1-3);

(ii) group work (Questions 4-7);

(iii) the teaching approach (Question 8);

(iv) feedback about the lesson (Questions 9-10).

10. Below is the histogram of the scores of a mathematics test in two classes.

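[Histogram, Class A: horizontal axis "Scores" (55, 65, 75, 85, 95); vertical axis "Frequency of scores" (0 to 24 in steps of 3)]

[Histogram, Class B: horizontal axis "Scores" (55, 65, 75, 85, 95); vertical axis "Frequency of scores" (0 to 24 in steps of 3)]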

a. Comparing the two histograms, one could infer

i. Variability of scores in Class A is higher than in Class B.

(The scores in Class A vary more than the scores in Class B.)

ii. Variability of scores in Class B is higher than in Class A. (The scores in

Class B vary more than the scores in Class A.)

iii. Class A and Class B have equal variability.

iv. I don’t know.

b. Why? Give your reason.


3.3.3. The Interview

I interviewed the regular teacher of the experimental group to get feedback about the teaching

experiment. I also interviewed several students from the experimental group. I intended to

interview students of the control group, but due to time constraints this was not possible. The

plan was to explore in more depth the answers from several students. With the help of the

regular teacher, I chose a number of students with a range of abilities from the experimental

group and went through their answers to the pre- and posttests. I tried to get a

better impression of the reasoning behind their answers.


4. Results and Analysis of the Teaching Experiment

I present the results related to the classroom observation before the experiment, the teaching

experiment itself, and the feedback about it from the questionnaire and the interview with the

regular teacher of the experimental group. The data collection was conducted in about 4-5

weeks, from the second week of November 2009 to the second week of December 2009.

Below is the timeline of the teaching experiment.

Date                              Activities
--------------------------------  ---------------------------------------------------------------------
2nd week of November              Observation of the regular teaching in the experimental and control groups
November 18, 2009                 Pretest of the experimental group
November 21, 2009                 Pretest of the control group
November 20 – December 1, 2009    Teaching of variation in the experimental group
November 26 – December 2, 2009    Teaching of variation in the control group
December 2, 2009                  Posttest in the experimental group
December 3, 2009                  Posttest in the control group
1st - 2nd week of December        Interviews with the students and teachers

Table 4. Timeline of the experiment.

4.1. Classroom Observations Prior to the Teaching Experiment

One week before the teaching experiment, I talked to the teachers of the control and

experimental groups. I discussed the students’ mathematical ability and the topics that had

been taught so far. I also discussed my planned activities and the pretest material, but only

with the teacher of the experimental group, to keep the control group’s teaching from being

influenced.

Regarding the mathematics topics that the teachers had taught, the experimental group

had almost finished learning central measures while the control group was behind. It

turned out that neither teacher had taught histograms in the lessons about data

representation. I therefore asked them both to teach histograms prior to my lesson about variation.

This also gave me a chance to observe their style of teaching. I observed two lessons and one

of those was about histograms.

From my observations, I concluded that both teachers taught in a teacher-centred

approach. The main teaching activities were cycles of:

• lectures about how to construct a histogram (procedural knowledge);

• working out examples;

• giving students exercises and/or homework.

Students listened without observable active participation and replicated the examples

in the exercises. However, I noticed in my second observation of the experimental group that the

teacher showed a slight change of approach when she lectured. She tried to engage her students

by asking questions before presenting the procedural knowledge. It should be noted, however,

that this second observation took place after our discussion about the planned activities and the

pretest material, in which I explained my approach and aims. It seemed that our discussion had a

small influence on her teaching approach. She showed interest in more student involvement.

Regarding the students’ mathematical ability, students from both groups had been

taught by the current teacher of the control group in the previous academic year, grade 10.

Students’ selection of streams was primarily decided by their performance in mathematics and

science subjects. Thus, the students in my study, the social science (IPS) students of both groups, had

not performed well in mathematics and science in grade 10. I conducted two two-tailed

independent samples t-tests to compare the two groups’ mathematics grades and their

Bahasa (language) grades. In both tests the null hypothesis was that there was no difference in

the mean grades of students in the experimental and control groups. For both

subjects, the null hypothesis could not be rejected (t = 0.306, df = 76, two-tailed p-value = 0.761 for

mathematics, and t = 1.828, df = 76, two-tailed p-value = 0.071 for Bahasa). These results

indicated that the students of the experimental group and the control group did not differ

significantly regarding their ability in mathematics and language.
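How such a test statistic and p-value are obtained can be sketched in plain Python. This is an illustration under stated assumptions, not a record of how the values above were computed: the pooled-variance form of the test is assumed because the reported df = 76 is a whole number, the p-value is approximated by numerical integration (Simpson's rule) rather than taken from statistical software, and the toy grades are hypothetical because the students' grades are not listed here.

```python
import math


def pooled_t_statistic(group_a, group_b):
    """Return (t, df) for an independent samples t-test with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    ss_a = sum((x - mean_a) ** 2 for x in group_a)
    ss_b = sum((x - mean_b) ** 2 for x in group_b)
    df = n_a + n_b - 2
    pooled_variance = (ss_a + ss_b) / df
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df


def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_constant = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                    - 0.5 * math.log(df * math.pi))
    return math.exp(log_constant - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p_value(t, df, steps=2000):
    """Approximate the two-tailed p-value with Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_area = total * h / 3
    return 1.0 - 2.0 * central_area


# Test statistics as reported in the text (df = 76 in both cases).
p_mathematics = two_tailed_p_value(0.306, 76)
p_bahasa = two_tailed_p_value(1.828, 76)
print("mathematics:", round(p_mathematics, 3), "Bahasa:", round(p_bahasa, 3))

# Hypothetical toy grades, only to show how t and df are obtained.
toy_t, toy_df = pooled_t_statistic([1, 2, 3], [2, 3, 4])
print("toy example: t =", round(toy_t, 3), "df =", toy_df)
```

Both p-values should come out close to the reported 0.761 and 0.071, that is, above the conventional 0.05 level, which is why the null hypothesis of equal mean grades is not rejected.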

4.2. The Teaching Experiment

Originally, I had planned to give the students three activities within three lessons (90

minutes each). These three lessons were designed to teach measures of variation, namely

standard deviation and the interquartile range (see Section 3.2.3). However, it turned out that

the students needed more time to finish the first two planned activities and I ended up

teaching the first two activities, which focused on the concept of standard deviation, in four

lessons (90 minutes each). There was no time for the third activity. Therefore, the concept of

interquartile range was introduced through my explanation in a

discussion session at the end of the last lesson.

Before I give an impression of how the lessons went, I first want to describe the nature

of group work in the class. The students were divided into 10 groups of 4 students each. I

asked the students to arrange their seating prior to the beginning of the class. This was

possible because the mathematics lesson was the first lesson of the day, so that students could

easily arrange their seating in the few minutes before school started (in this way no time was

wasted in moving chairs and desks), and because the classroom was large enough for a

different arrangement of desks and chairs. Figure 3 shows the seating arrangement in the


classroom. Students normally⁴ sit in the same seats in the classroom. The

groups were formed solely on the basis of these seating positions. Two adjacent rows of pairs

became one group (see Figure 2).

Figure 3. Picture of the classroom with students working in groups of four.

The group work did not go as well as I had hoped. I did not observe lively

discussion or argumentation within groups. Students tended to work individually and only

consulted their group mates if they had difficulties or wanted to check their answers. My decision to

give all students their own worksheet unfortunately gave them this freedom to work

individually. I urged them to discuss first within groups what to do with the questions or at

least divide the jobs and discuss in pairs. In the third lesson, the observing teacher told me

that, in her view, the group work had started to improve. However, I still did not observe many

discussions within groups. The group on the right-hand side of the picture shown in Figure 3

(only two members are visible) was one group that was functioning a little better as a team. I

tried to use the video recording of the teaching for analyzing discussions within groups, but

the audio quality was not good enough to extract much detailed information.

Lesson 1: Activity 1

According to the lesson plan, students were expected to do three tasks. The first task was to

draw histograms (see Section 3.2.3). In the realization of the lesson, I decided to reduce the

⁴ Sometimes some homeroom teachers decide to rotate students’ seating a few times during the

year, but generally students sit in the same seats.


students’ tasks because only a few students had calculators and most students were slow in

computing. I decided to omit the third question in Activity 1, and the students only had to

work on one data set (boys’ data or girls’ data) instead of both. This was to save time without

sacrificing the purpose of the designed lesson.

Introduction: talking about data creation

The lesson started with an introduction of the context of human growth. I wanted to engage

students with the context of the real data they were about to work with by talking about how

the data were created. The intention was that by understanding well how the real data were

created, students would be engaged more in the later activities and could reason using the

knowledge about the context. I asked questions, for example about the growth measurements

of the students’ siblings that they had seen, about why the students thought we need to do

growth measurements, or about the students’ ideas of how exactly the growth measurements

were done. Unfortunately, I observed a lack of responses from students in this introductory

session and my introduction became more of a short lecture. It seemed that the students were

unfamiliar with the context of human growth measurements. After the lesson, the teacher

confirmed this.

Task a: Making a histogram and describing the shape of the data

I distributed the activity sheets to the students and asked them to read these first and to ask me if

there was anything they did not understand. I then explained the tabular representation of the

data and asked them to start working. The first task was to draw histograms of data sets and to

describe the shape of the data. While students were working, I walked around the classroom

and observed the students’ progress.

I immediately saw that the students did not know how to construct the histogram, even though

they had studied it not long before my lesson. The students neither ordered nor grouped the data. I

asked one group whether histograms are used for individual or grouped data. The

group could not answer. Walking around and asking the same question, I noticed that the

students did not know what to do. One group even answered that the frequencies would all be

one because there were no modes. Finally, I asked the whole class for attention and gave the

clue that they should group the data first.

After several minutes of letting them group the data without my instruction, I again

asked the whole class for attention and decided to lead a question-and-answer session⁵ on how

to group the data. I used the height data of boys as an example. I tried to elicit students’

ideas on what interval (class width) we should use, by prompting: “How many centimetres of

height difference do you think are needed to be able to differentiate people’s heights easily?”

There was no response. I continued the question-and-answer session with stimulating

questions, but there was still a lack of responses and in the end I cut the knot and chose 5 cm as the

interval. I then went on to explain how to draw histograms and let the students work

on them. After the first period (45 minutes) of the lesson, progress was still very slow.

Pressed for time, I gave the students several more minutes to continue their work and then urged them to

start on the second question in the activity.⁶

⁵ A question-and-answer session can be defined as an interactive lecture. I give students questions as

scaffolding or as hints to guide them along the intended learning path.
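The grouping step that preceded the histogram can be sketched as follows. This is a minimal illustration in Python: the heights are hypothetical non-integer values, not the data from the activity sheet, and the starting point of 150 cm and the function name `group_into_classes` are assumptions. Only the class width of 5 cm is taken from the lesson.

```python
import math
from collections import Counter


def group_into_classes(values, start, width):
    """Count how many values fall in each class [lower, lower + width)."""
    counts = Counter()
    for v in values:
        index = math.floor((v - start) / width)
        lower = start + index * width
        counts[lower] += 1
    return dict(sorted(counts.items()))


# Hypothetical, non-integer heights in cm (not the data used in the lesson).
heights = [151.5, 153.2, 156.0, 158.4, 159.9, 160.0, 162.3, 163.7, 167.1, 171.8]

frequency_table = group_into_classes(heights, start=150, width=5)
for lower, count in frequency_table.items():
    print(f"{lower} to under {lower + 5} cm: {count}")
```

In the ungrouped hypothetical data every value occurs exactly once, which matches the remark of the group that expected all frequencies to be one. Only after grouping do the class frequencies give the histogram a shape.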

In my expectation, this histogram part of the activity would be just an easy procedural

task and would not be time-consuming, especially because the use of a calculator was allowed

and I had reduced the tasks. I assumed that the related task on what could be said about the

shape of histograms was the more challenging part of this first question. However, it turned

out that the one lesson on histograms that the students had previously received was not

enough for them to carry out the task, or perhaps the students were confused by the

different tabular representation of the data. I also had not anticipated that the students could

have difficulties with non-integer data, but unfortunately they had never dealt with non-integer

data before and this slowed down their pace. As a result of this mismatch between my

expectations about the students’ ability levels and their real competences, I had to spend a

significant amount of time out of this 90-minute lesson on teaching how to make histograms.

Task b: Finding a rule to decide normal and abnormal height and weight

Students also had difficulties with the second task of making a rule for classifying a

given height. I explained that abnormal height could mean that a person is too short or too

tall, and I asked them to play the role of a doctor and comment on given data. While walking

around, I saw groups deciding for each datum whether it was very common, still normal, or

abnormal. However, when I asked one group about the basis of their decision, they could not

answer.

At some point, I asked for the whole class’s attention and tried to help students by giving an

analogy in the context of the weight of Durian⁷ fruits: “If you are a Durian seller, and you are

harvesting Durian, what will be your answer if I ask you the typical weight of Durian that is

considered big?” In this question-and-answer session, mode and average came up and I

concluded that if the data set is small, it is better to use the average. I pointed out that

the task was about the same thing. I urged them to try using statistics they had already learned

and connect them to the task. I found that my hints did not help much.

⁶ This might not have been a good decision but I do not think it did much harm. Even if the students

could not finish all questions, I wanted them at least to try doing all tasks.

⁷ Durian is a popular seasonal fruit in Indonesia.

When the second period of the lesson was almost over, I decided to give more help.

The question-and-answer session was started by the following question: “If the common

height is the average, how do we decide what is too short and what is too tall?” The average

was computed and I wrote it on the whiteboard. I repeated the question and there was no

response. In general, the students gave very little response in the discussion. I tried to lead the

discussion to the concept of deviation from the mean. I wrote the minimum value, the

average, and the maximum value on the whiteboard and suggested finding ranges or intervals

for ‘too short’ or ‘too tall’. Two students gave an interval but when I asked for the reason behind

the choice, they could not explain. I finally suggested that we base the rule on

how far a datum is away from the average. The teaching became more and more

teacher-centred. I introduced the formal notation for the distance of a datum from the mean

value and showed some examples of calculation.
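A rule of the kind discussed here, based on the distance of each datum from the mean, can be sketched as follows. The heights and both cut-off values are hypothetical and chosen only for illustration. The labels "very common", "still normal", and "abnormal" are the categories the groups used in this task.

```python
import statistics


def classify_heights(heights, common_limit, normal_limit):
    """Label each height by its distance from the mean.

    common_limit and normal_limit are hypothetical cut-offs in cm.
    """
    mean = statistics.mean(heights)
    labels = []
    for h in heights:
        distance = abs(h - mean)
        if distance <= common_limit:
            labels.append((h, "very common"))
        elif distance <= normal_limit:
            labels.append((h, "still normal"))
        else:
            labels.append((h, "abnormal"))
    return mean, labels


# Hypothetical heights in cm, not the data from the activity sheet.
heights = [150, 155, 160, 162, 163, 165, 170, 175]
mean, labels = classify_heights(heights, common_limit=5, normal_limit=10)

print("mean height:", mean)
for height, label in labels:
    print(height, label)
```

The point of the sketch is that one explicit rule, stated in terms of distance from the mean, replaces the datum-by-datum decisions that the groups could not justify.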

Because the time was already over, I asked them to finish the histogram at home and

also to compute the distance of each datum from the average as homework. I ended the lesson

by answering the first question about the shape, centre and spread of the histogram.

What is interesting from a research point of view is what rules students came up with in

this task. As described above, most groups did not come up with any rule, but there

were some groups who decided the normality for each datum. However, they were unable to

explain their reasoning or mechanism. I concluded that students were unable to formulate a

general rule for categorizing the data independently, or that maybe they just had difficulty

expressing in words any rules they had in mind.

Lesson 2: Activity 1 revisited

In the second lesson, I started with reintroducing the context of human growth. I felt that I had

not done it as elaborately as I had wanted to in the first lesson. Also, in the first lesson, many

students were late and interrupted my introduction, and there was a lack of responses. In this

second lesson, my teaching became more teacher-centred. I reintroduced the context in

lecture form, covering the need for statistics to summarize data, sampling from the population, the need

for visual summaries of the data, and the purpose of a growth chart. After that, I let the students

do the second task again and limited the task to the boys’ data only.

The lesson then continued with a discussion of the students’ results. Two groups drew

their histograms on the whiteboard (see Figure 4) and I led a question-and-answer session to

discuss their answers to the first question and the second question. I explained the use of


deviation from the mean as one way to make the rule. Based on the histogram on the

whiteboard we decided about proper intervals for very common height and abnormal height

(Figure 5, left-hand side). After this informal rule, I tried to bring the students to the formal

standard deviation by mentioning that we can use one number to describe the spread of the

data. I drew a table on the whiteboard and we computed the deviation of each datum from the

mean (Figure 5, right-hand side). Students completed the table and through a question-and-answer

session we arrived at the formal formulas for mean deviation and standard deviation. I

explained how the standard deviation can be used as a measure of spread. I also led a

question-and-answer session about quartiles, mentioning that in practice, quartiles are also

used in creating growth charts. To end the discussion, I distributed the Indonesian growth

chart from Batubara et al. (2006) and also the Dutch growth chart as a fun comparison. I

explained how to read the growth chart. Finally, I gave the students homework to find quartiles

and the standard deviation of some data for the purpose of practicing procedural skills.
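The whiteboard table of deviations and the two resulting measures can be sketched in a few lines of Python. The data set is hypothetical, and dividing by n (the population form) is an assumption; some textbooks divide by n - 1 instead.

```python
import math

# Hypothetical small data set of heights in cm.
heights = [150, 155, 160, 165, 170]

n = len(heights)
mean = sum(heights) / n
deviations = [x - mean for x in heights]

# Mean deviation: average absolute distance from the mean.
mean_deviation = sum(abs(d) for d in deviations) / n

# Standard deviation: square root of the average squared deviation.
# Assumption: division by n (population formula).
variance = sum(d ** 2 for d in deviations) / n
standard_deviation = math.sqrt(variance)

print("x      x - mean   (x - mean)^2")
for x, d in zip(heights, deviations):
    print(f"{x:<6} {d:<10.1f} {d ** 2:.1f}")
print("mean deviation:", mean_deviation)
print("standard deviation:", round(standard_deviation, 2))
```

For this data set the mean is 160 cm, the mean deviation is 6 cm, and the standard deviation is about 7.07 cm, so each measure condenses the whole column of deviations into one number describing the spread.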

Figure 4. Students' histograms drawn on the whiteboard.

Figure 5. Intervals for the rule (left-hand side) and derivation of the formulas of standard deviation (right-hand side).



I planned to start the third lesson by checking the students’ homework. However, it turned out

that most of the students had not finished their homework. Therefore, to check their procedural

skills I made up a small data set and asked them to find the standard deviation.

Later the results were checked. I did this because, in accordance with one of my design principles

(Subsection 2.3.3), I wanted the students to master procedural knowledge as well.

As an introduction to the second activity, I reviewed the content of the previous lesson

and did a question-and-answer session with the students regarding interpretation of large or

small standard deviation of data. For example, I gave the case of the basketball players’

height. I told the students that the average height of students of the basketball team in their

classroom is smaller than the average height of the basketball team from another school. I

then asked the students who would win the basketball match. Most students answered that the

team from the other school would win. I then showed a cartoon of four players of about

the same height and one really tall player, and asked if the students changed their mind about

who would win. They did, and so I pointed out that it is not good enough to compare data sets

by the value of their means only.

After this, I distributed the activity sheets for the students to work on. Students worked in groups

and I walked around to assist when needed. Unfortunately, students could not finish in the

allocated time and I decided to hold the discussion in the next lesson.


In this final lesson, I asked the students to present their results of Activity 2 and I tried to

encourage discussions. Answer sheets were hung up on the whiteboard and four groups presented

their results (see Figure 6). Unfortunately, one group lost their answer sheet and could

not manage to present their results without it.

It was not so surprising that the students’ answers were not very sophisticated. The

students did not explain much of the reasoning behind their answers and this affected to some

extent how the discussion went. The discussion was not as lively as I had expected. Groups or

students rarely volunteered to comment. Although I tried to stimulate the discussion as much as

possible, in general, I felt that there was not much discussion going on. I believe this was

partly because presenting and discussing were a new experience for the students in mathematics

classrooms and partly because of my limited experience in leading classroom discussions.

A more experienced teacher who knows the students in the classroom might do a better

job.


Figure 6. A group presenting their answers.

In Activity 2, the students compared two data sets of heights of boys from

Bengkulu and Jakarta. Despite my emphasis in the previous lesson that it is insufficient to

make a comparison based on the mean values only, the majority of students still based their

conclusions on the mean values only. In Table 5 I present the answers (some are quotations)

to the question of comparing two data sets (2.b.3 in the activity sheet): ‘Can you conclude

that Jakarta’s boys are taller than Bengkulu’s boys?’.

No.  Answer                                                                  Number of Groups
1.   Yes, because the mean height of Jakarta’s boys is bigger than that of   5
     Bengkulu’s boys
2.   “No, because boys in Bengkulu are taller than (boys in) Jakarta and     1
     the average in Bengkulu is more varied than the average (in) Jakarta”
3.   No, because there were more boys in Bengkulu whose height was           2
     160 cm.
4.   “No, because there were no 16-year old boys in Jakarta whose height     1
     was under 150 (cm) and none above 175 (cm)”

Table 5. Students’ reasoning in comparing two data sets.

Afterwards, when analyzing the students’ answer sheets, I did not understand the

second answer in Table 5, and it had not come up in the presentation and discussions. The

third and the fourth answers in Table 5, on the other hand, are of the same type, in the sense that

the students counted the frequency of certain values and compared these frequencies. This

gave me the opportunity to comment on this type of answer during the discussion with the

students and to point out that the different sizes of the two data sets were a factor to

be considered here. I then introduced another measure of variation: the interquartile range,

which can deal with this issue.
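The issue with comparing raw frequencies across data sets of different sizes can be illustrated with a small sketch. The two data sets are hypothetical (the larger one is simply three copies of the smaller one), and the quartile convention used, `method="inclusive"` in Python's `statistics.quantiles`, is one of several in use and is an assumption here.

```python
import statistics


def relative_frequency(data, value):
    """Share of the data set that equals the given value."""
    return data.count(value) / len(data)


def interquartile_range(data):
    """Q3 - Q1, using the 'inclusive' quartile convention."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1


# Hypothetical heights in cm; the large set is three copies of the small one.
small_set = [150, 155, 160, 165, 170]
large_set = sorted(small_set * 3)

count_small = small_set.count(160)
count_large = large_set.count(160)
share_small = relative_frequency(small_set, 160)
share_large = relative_frequency(large_set, 160)
iqr_small = interquartile_range(small_set)
iqr_large = interquartile_range(large_set)

print("counts of 160 cm:", count_small, count_large)
print("shares of 160 cm:", share_small, share_large)
print("interquartile ranges:", iqr_small, iqr_large)
```

The raw counts differ (1 versus 3) only because the second data set is larger, whereas the relative frequency and the interquartile range are the same for both sets.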

In conclusion of the lesson, the students and I reviewed all formal measures of variation

that had been studied, namely: range, mean deviation, standard deviation, and interquartile

range. I talked about sources of variability: natural variation or variation due to measurement

(I had several students measure the height of flowers and discussed the results). We also

talked about high and low variation in data; an issue that I brought up was that sometimes

higher variation is more desirable than low variation, depending on the context of the data.

4.3. The Teaching in the Control Group

The students in the control group got the same period of teaching and learning, which was

four lessons of 90 minutes. The teacher had finished presenting all the material after the third

lesson and therefore the last lesson was dedicated to students doing more exercises on the

subject of variation. I observed the first three lessons and I did not attend the exercise lesson.

During my observations, I sat at the back of the classroom and did not interfere with the

teaching and learning process.

Based on my observations, I concluded that the teaching usually followed the same

sequence: the teacher lectured the materials, the students did exercise problems individually

and if the students were able to finish the exercises in time, the results were checked before

the class ended, and finally students were given homework. I also observed that students did

not respond to the teacher’s questions. For example, in the first lesson during the introduction,

the teacher gave two small data sets of the same average and asked the students “Which data

set is better? Which data set is more stable?” The students did not volunteer to give answers.

Furthermore, the students did not engage well in the exercise session and most of the students

did not do their homework. The students did not finish their exercise problems, which were

computing standard deviation based on the formula that the teacher had given earlier. The

other lessons proceeded in more or less the same style.

From a research point of view, how comparable were the students of the experimental

group and the students of the control group? There are never two classes that are exactly the

same, so I want to report here only about the main similarities and differences that I noticed.

First, the groups had different teachers and each teacher had his/her own style of teaching.

Both teachers’ instruction was teacher-centred and, in my opinion, their teaching style to some extent

shaped the way the students behaved toward tasks. However, the differences in

behaviour of the students in the two classes were small. In general, I did observe the same


attitude toward mathematics learning in the two classes, but the students of the control group

were more careless about their learning. For example, there were more students who did not

do their exercise problems in the class. There were students who skipped the mathematics

lesson as well. The strictness of the teachers regarding task or homework completion might

play some role here. The fact that the teacher of the experimental group was also the

homeroom teacher⁸ also made a difference. In Indonesian schools, homeroom teachers play an

important role in deciding a student’s advancement to the next grade. Second, the experimental

group’s mathematics classes were in the morning (the first two 45-minute periods of

school) while the control group’s classes were later in the afternoon.

Apart from these two differences, the students’ motivation and behaviour were in general

comparable. Regarding the aspects of teaching that I studied, the control group’s teaching

did not use real data, did not use open-ended questions, and did not use group work.

4.4. The Questionnaire

I gave a questionnaire, which can be found in Appendix D, to the students of the experimental

group to learn about their opinions regarding the teaching intervention. It consisted of ten

items: seven statements with 5-point Likert scales; one statement for which students had to

mark one or more options; and two open-ended questions for students’ comments and feedback.

The findings from the questionnaire are summarized by grouping the questions into the

four themes mentioned in Subsection 3.3.2.

The Use of Real Data

The first three items of the questionnaire concerned the use of real data. All three items were
designed to enable me to check my assumptions about the positive effect of using real data,
instead of artificial data, in statistics lessons. The assumptions were that the use of real
data in statistics lessons makes the learning more interesting and fun (Statement 1), easier
to understand (Statement 2), and application-oriented (Statement 3). The results showed that
about three quarters or more of the students (at least 29 out of 39) agreed or strongly agreed
with each of the three statements (see Table 6). It is interesting to note that no student
disagreed or strongly disagreed with the third statement.

[8] Homeroom teachers in Indonesian schools are teachers who usually know the most about the
students in their class. Their tasks include monitoring daily attendance, organizing students for school
events, writing the end-of-year or end-of-semester reports, and dealing with behavioural problems and parents.


                                               Number of Students
                                               Strongly                                Strongly
Statement                                      Agree     Agree   Neutral   Disagree   Disagree
1  The use of real data makes learning
   Statistics more interesting and fun            8        26       3         2          0
2  Analyzing real data makes it easier for
   me to understand statistical concepts;
   for example standard deviation.                5        24       5         5          0
3  The use of real data shows me the
   application of statistics in real life.        9        25       5         0          0

Table 6. Result for Statements 1-3: the use of real data.
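The agreement shares behind this summary can be recomputed directly from the counts in Table 6. The short Python sketch below does so; the names `table6` and `share_agreeing` are illustrative and not part of the study’s materials.

```python
# Counts from Table 6, ordered as:
# strongly agree, agree, neutral, disagree, strongly disagree.
table6 = {
    "Statement 1": [8, 26, 3, 2, 0],
    "Statement 2": [5, 24, 5, 5, 0],
    "Statement 3": [9, 25, 5, 0, 0],
}


def share_agreeing(counts):
    """Return the percentage of respondents who agreed or strongly agreed."""
    return 100 * (counts[0] + counts[1]) / sum(counts)


shares = {name: share_agreeing(counts) for name, counts in table6.items()}
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}% agreed or strongly agreed")
```

With these counts, Statements 1 and 3 come out at about 87% and Statement 2 at about 74%.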

Group Work

Group work was an important part of the teaching in the experimental group and I wanted to
investigate how students felt about working in groups, considering that, according to the
teacher, this was their first experience with group work in mathematics class. As mentioned in
the description of the teaching intervention, group work did not run very well. Despite this,
Table 7 shows that 31 out of 39 students believed that they were active participants in the
group discussion (Statement 7). A positive attitude toward working in groups can also be seen
in the students’ responses to Statements 5 and 6: 36 students indicated that they liked working
in groups and 35 agreed that it helped them to understand the concepts.

The result for Statement 4 is quite interesting. According to the teacher, the students
were not used to group work in mathematics lessons, yet only 6 students disagreed with this
statement. I assume that this is because the students did not consider the statement with
regard to mathematics lessons only.

                                               Number of Students
                                               Strongly                                Strongly
Statement                                      Agree     Agree   Neutral   Disagree   Disagree
4  I am used to working in groups.                8        16       9         6          0
5  I like working in groups.                     15        21       0         3          0
6  The group’s discussion helps me
   understanding statistical concepts.           13        22       1         3          0
7  I actively participate in contributing
   ideas and in group discussions.                6        25       7         1          0

Table 7. Results for Statement 4-7: group work.

The Teaching Approach

Through this questionnaire, I tried to find out whether the new elements that I incorporated in
my teaching intervention were indeed recognized by the students. The results are shown in
Table 8. Note that the actual form did not label the options; I label them here with a-e for
reference. Regarding the elements that I believed were quite obvious in the experimental
lessons, namely options a, b and c, at least 64% of the students thought so too. With options
d and e, I intended to confirm that there was some degree of constructivist spirit in the
teaching. Due to the difficulties I had in the lessons and my improvised shift toward more
teacher-centred teaching, I thought that it might be difficult for students to see these two
aspects. However, 32 out of 39 students marked each of options d and e. There was also one
option where students could freely write comments. However, the comments were irrelevant to
the issue in Statement 8.

Statement 8                                                                  Number of
                                                                             Students
I think that the way of learning and teaching in the last two lessons differs
from the way of learning mathematics I usually experience in the following
sense:
a  Using real data, not only artificial numbers.                                26
b  Using group work                                                             31
c  Using calculator is allowed                                                  25
d  Demanding students to develop their ideas and then defend those ideas
   through correct arguments                                                    32
e  Giving students the chance to explore the solving of problems, not just
   “telling” the correct ways.                                                  32

Table 8. Results for Statement 8 concerning recognition of components in the new teaching approach.

Students’ Free Feedback

In the last two questions on the questionnaire, I asked the students for comments and
suggestions. Contrary to what I had hoped, the students’ comments were mostly not specific
about any aspect of the teaching. The comments were close to general statements of liking or
disliking without specific reasons, and many were not informative. Two examples
showing like and dislike are: “fun, because students were taught to be brave to give opinions
and suggestions” and “the materials were difficult to understand.”

I received quite a few suggestions regarding my way of speaking while teaching, such as
to “not speak too fast” or “not to use difficult and high[9] language.” There were
four suggestions that indicated a preference for traditional teaching, such as “Give an
explanation and formula first.” Finally, two students suggested using data
from Indonesia, not from abroad. I found this surprising since the data they analyzed were
Indonesian growth data and I only used a Dutch growth chart in the last minutes of one lesson
as an informative comparison.

[9] I am not sure about the meaning of ‘high’ here.

4.5. The Interview with the Teacher of the Experimental Group

After the first lesson, I immediately talked about the lesson with the regular teacher of the
experimental group. She expressed her disappointment that her students did not automatically
think of ordering the data, although she had emphasized this aspect in her teaching.
She told me that the natural-science students would have immediately ordered the data and
would have been able to do the tasks better. When I asked what she thought about the lesson,
she told me that, while observing it, it had occurred to her that it might be
better if the students also collected data by measuring their own heights. I agreed but
suggested that, due to time limitations, it would be more effective to integrate data collection
into the teaching of the first part of the statistics chapter, namely data representation. From
the teacher, I learned that the notion of a growth chart is unfamiliar to people in this district.
Growth measurements are rarely done, even for babies. Growth measurement was therefore
unfortunately not a familiar topic for the students.

The teacher was too busy to write observation notes. She filled in the observation
sheet only once, after the first lesson. There was also one written interview: she asked me to
write down my questions because she could not make time at that moment, and said she would
write down her answers later. Therefore, for the subsequent lessons, I wrote down important
issues from our talk after each lesson and summarized them in the interview after the teaching
experiment was done. The summary of my interviews with the teacher of the experimental group
is given below.

Students’ socio-economic background

The teacher was also the homeroom teacher of the experimental class and therefore
knew the students’ backgrounds well. In general, students in this school were from poor
families. Students from economically better-off families mostly went to the oldest school in
the district, which parents considered better. Most parents were farmers who did not have
their own rice fields and were thus employed in other farmers’ rice fields. Parents’ attention
to and support for their children’s schooling was very low. In my opinion, this lack of parental
support for schooling was one factor that caused the students’ poor basic computational skills
(skills that should have been acquired at the previous level of education).

Students’ comment on researcher’s language in classroom

Since I received many comments from students that I spoke too fast and used ‘high’ language,
I asked the teacher for her opinion about this. She pointed out that I actually did not do much
talking in class (I confirm this; I talked mostly in the discussion part), but she said that I did
seem to speak a little fast.

“(you did) speak a little bit fast. By fast I mean direct ‘O this, this, this, topic,
this, this. Right?’”

She also commented that I was perhaps too fast when explaining how to read the growth
chart. I used formal statistical terms like standard deviation there, and she said that reading
charts was already difficult for the students and that this episode with formal
language was perhaps why the students commented on the language use.

Regarding the level of language, she disagreed with the students. Based on her own
experience, the students’ language skills were not so good. Students from early in her teaching
career once asked her not to use good and correct Bahasa Indonesia.

“… based on my experience, ‘Bu, don’t use Bahasa Indonesia’, meaning don’t use
good and correct Bahasa Indonesia. Use some Bengkulu language, Lebong
language[10]. That is the language (issue).”

Discussion within groups

I concluded that the group work did not work as well as I had hoped. Although the teacher
told me that the students got better at working cooperatively in the third and fourth lessons, I
observed that the cooperative work mainly concerned procedural skills. Unfortunately, I
could not recheck this issue with the video recording of the teaching and learning, because
its audio quality was too poor for me to catch even the discussions within the groups nearest
to the camera. From my observation, I
only saw students discussing how to compute this and that, or checking their computational
results. Therefore I asked whether the teacher had observed students discussing the essence of
the question.

Researcher (R): In the first activity the task was to create a rule to decide whether
                a child’s height is very normal, normal, and abnormal. Did you
                hear any discussion about that?
Teacher (T): No
R: No, right? Well, yes, I did not eith- (interrupted)
T: The (students’) discussion happened in your discussion. In the discussion,
   (you asked) how would the answer be? Then (the students) started to think
   about it. But when the students worked (in group), they did not discuss what
   the task was about.
R: No?
T: They just did the computing.

[10] The language of instruction in schools is the Indonesian national language, Bahasa Indonesia. The
Bengkulu language is used at the provincial level, and this district has its own language (the students’
mother tongue).


The teacher also stated that the students were able to compute the average (mean) and that this
was in fact the statistic they used when dealing with data.

Willingness to teach

I told the teacher that perhaps the students could do better if she taught the lessons
instead of me, since they were familiar and comfortable with her. I then asked her what kind of
preparation she would need if she were going to teach the lessons. She commented that she
would need to prepare herself on box-plot diagrams and on how to read a growth chart.

The teacher also told me after a lesson that she was of the opinion that the teaching
approach in the experiment is the right approach to use in class because it makes the
students active. However, she said she would need ready-made materials because she would not
have time to prepare such activities. If the materials were provided, she was willing to try the
approach. In fact, she asked me to develop materials for the whole academic year, which she
would then use in her teaching.

4.6. Analysis of the Teaching Experiment

It was my personal aim to introduce an alternative approach to teaching statistics in an
Indonesian secondary school classroom. The teaching experiment was designed around the
concept of social constructivism. By social constructivism, I mean here that formal
statistical terms are introduced from the students’ informal understanding, which they have
jointly developed after trying to analyze real data in small teams. I analyze here the extent of
the social constructivist nature of the teaching experiment by analyzing the three components
that I focused on and by reflecting on things that went differently or not as well as expected.

The positive

First of all, the teaching and learning was, in my opinion, to some extent social-constructivist.
The key components of the teaching approach that I wanted to try out were
present and the results from the questionnaire seemed to confirm this. I also observed that the
teacher was interested in applying this approach in her daily teaching. She suggested that I try
the lessons with the natural science students as well and said that the results would be better.
However, there was no time to do this during my stay. I even observed a change in her
teaching after I explained to her my goals and the ideas that underpinned my activities.[11] In our
discussion, we were both optimistic that, if we tried this approach right from the beginning of
the statistics unit, the results would be better.

[11] The teacher of the control group was not influenced by my approach because I only explained the lessons to the
teacher of the experimental group. The control group teacher asked for a copy of the pretest after the experiment
but I did not have time to discuss it with him.

The less positive

The quality of discussion within group work was not as good as I expected. Students were so
used to being passive and individual learners that they did not seem to know what to do when
they were given tasks without first being taught a procedure. Moreover,
the tasks were open questions and it seemed to be difficult for them to verbalize their
reasoning and to express their opinions. Given time and guidance, I believe that students
would work better in groups and express their reasoning well.

Next, I found out after the lesson that the context of human growth was not familiar to
the students and this seemed to be a factor that reduced the students’ engagement in the
activity. The students did not engage well in my efforts to discuss the creation of the data in
the task. Yes, they were really working in their groups (so to some extent they were engaged), but
in the first activity, for example, students did not produce any clear rules that I could use as a
foundation or bridge to the formal statistics. No specific rule for deciding the
normality of height was developed by the groups. Some groups seemed to have
some rule to classify whether a datum was normal or not, but no rule was made explicit or
externalized.

This does not mean that students had to be able to create sophisticated rules in Activity 1.
I think it means that the teaching experiment was not as social constructivist as I had planned
it to be, because in the end I frequently used question-and-answer sessions. Nevertheless, I
tried to avoid being too teacher-centred in my instructions by asking questions that made
students think, giving hints and encouraging students to express their opinions.

Reflection on my own teaching

I acknowledge my limitations as the teacher of the experimental approach. My teaching
experience is limited in years and I had never taught social science students before. This
might not have prepared me to be sensitive enough to the needs of this type of student, for
example to slow down my talking speed a little. An experienced Indonesian teacher once told
me that, in her experience, low-achieving students tend to complain readily that their
teacher explains too fast. In my teaching experiment, the students were unfamiliar with me
and thus perhaps felt hesitant to interrupt me and ask me to slow down, and I was not
sensitive to this. I am also not an expert in constructivist teaching and learning. Although I
have always encouraged my students to be active learners, I had mostly applied a teacher-centred
approach. Given all these limitations, the teaching approach has, in my opinion, the potential
to work better when replicated in similar situations.

5. Results and Analysis of the Pretest and Posttest

There were 39 students in the experimental group and 39 students in the control group who
did the pre- and posttest. In this section, I present the results of the pretest and posttest as the
main source to answer my research question:

To what extent did the student-centred teaching of variation using real data and open-ended
tasks help to improve Indonesian social science stream (IPS) students’ reasoning about variation?

The pre- and posttest questions are divided into three subtests and the results are presented
in the order of the questions. For ease of reading, I repeat the questions and their designed
purpose, which I have described in general terms in Subsection 3.3.1 and which I will detail
where needed. Anticipated answers are also given.

First, I describe the results of the pretest and posttest separately. Next, I present the results
of the interviews with students about the pretest and posttest. Finally, I give my analysis of the
findings of the pre- and posttest.

5.1. Subtest A: Question 1 and 2

Questions in Subtest A were intended to probe students’ intuitive understanding of the term
‘variation’ and its use in statistical contexts. The questions were taken from Meletiou (2000).

Question 1

Based on your experience, what does variation mean to you? Give an explanation and/or an

example.

Purpose

In the first question, I wanted to explore the students’ intuitive understanding of the term
‘variation’, and whether their understanding had changed after the teaching. The teaching
intervention itself did not include a formal definition of the term ‘variation’. I also wanted to
test whether the Indonesian word for variation that I chose made sense to the students. In
textbooks, the statistical term ‘variance’, i.e., the square of the standard deviation, is ‘ragam’
in Indonesian and the everyday word ‘variation’ is ‘variasi’. However, several textbooks refer
to ‘measure of variation’ in Indonesian as ‘measure of keragaman’[12] instead of
‘measure of variasi’. I support this use of ‘keragaman’ because for me ‘variasi’ is just a
loanword, and I suspected that its meaning would not be immediately familiar to students.
Therefore, I decided to use the word ‘keragaman’ (which can also mean ‘diversity’)
for ‘variation’.

[12] ‘Keragaman’ is a noun which is formed from the root word ‘ragam’. Both are nouns. ‘Keragaman’
means the state or condition of being varied and ‘ragam’ is the quality of being variable. In my study, I
use ‘keragaman’ for ‘variation’ and ‘variability’ (interchangeably), and I use ‘ragam’ for ‘variance’ in
accordance with the wording in Indonesian mathematics textbooks.

Anticipated answer

The word ‘keragaman’ is similar in sound to the word ‘keanekaragaman’ (translation:
diversity). I predicted that the students’ answers would be close to the meaning of
‘keanekaragaman’. I did not consider a definition of ‘keragaman’ in the sense of
‘keanekaragaman’ (diversity) to be a wrong answer. This question was intended to be
exploratory and informative. However, after the lessons on variation, I hoped students would
consider the meaning of variation in a statistical sense, that is, consider data sets as
distributions and not as collections of individual values. A meaning of variation that I
anticipated is related to the distribution of data, such as ‘differences of values from
measurement results’.

Results

During the pretest, I could see that the students were largely at a loss as to what to do for
Questions 1 and 2, especially Question 2. Hence, in order to explain Question 2, i.e.
what is meant by high and low variation, without giving away the answer to the first question,
I asked the students what they knew about the meaning of variation. After some silence,
a student answered that it means ‘bermacam-macam’ (translation: varied), followed by another
student’s ‘variasi’ (translation: variation). I confirmed both answers and then explained
that the higher the variation, the more variable something becomes, and the lower the
variation, the more similar it becomes.

In summary, there were three types of responses observed (see Table 9):

A. Variation means varied or having many kinds

B. Variation means sameness or similarity

C. Blank or unclear answer

Perhaps because I had endorsed the two answers given in class, the students’ answers were
mostly ‘bermacam-macam’ (see Table 9). There were 25 out of 39 students in the experimental
group who consistently answered in their pre- and posttest that it means ‘varied’ or ‘a number
of kinds’ (Meaning A). I observed similar results in the pretest of the control group. The
teaching seemed to have little or no effect on students’ understanding of the term ‘variation’.

Meaning of ‘Variation’                        Experimental Group      Control Group
                                              Pretest   Posttest      Pretest   Posttest
Meaning A: Varied or having many kinds           25        25            24        28
Meaning B: Sameness or similarity                 8         3             2         1
Meaning C: Blank or unclear                       6        11            13        10

Table 9. Three types of answers for the meaning of variation and the distribution of the students’ answers.

However, I also analyzed the examples that the students gave and how the examples
changed from pretest to posttest. Since the question did not insist on examples, not all
students gave one. In the experimental group, out of 39 students, 12 students in the
pretest and 9 students in the posttest gave no example. In the control group, out of 39
students, 4 students gave blank answers in the pretest and 9 other students gave answers that
either made no sense to me or clearly deviated from the intention of the question.

Table 10 presents some of the students’ examples of variation. In the pretest, the
examples given by both groups tended to reflect the students’ understanding of variation as
the state of having many different kinds, such as variation of cultures, fishes, ethnic groups
and human personalities. I was hoping that in the posttest, students could give examples that
indicated their ability to view data as aggregates. This did not happen in the
control group: its students still seemed to see data as collections of individual
values. For the experimental group, there was a small positive change. A few students in the
experimental group gave examples that reflected my expected answer, such as human height,
sizes of Durian fruit and body weight. This was not very surprising because the lesson
activities involved data on human height and weight. Nevertheless, it was a small improvement,
illustrating that a small number of students in the experimental group were starting to
see the statistical meaning of variation.

Samples of Students' Examples of Variation

Experimental Group                                Control Group
animals                     human height          “Some trees are tall,        Plants
                                                  some others are short”
short, tall, thin, fat      body weight           Human personalities          Houses, ways of talking
brain capability            sizes of Durian       Ethnics                      Animals and plants
(cleverness)
variation of cultures       variation of marks    Kinds of plants              Marks
shapes of trees             facial shape          Culture
variation in opinion        clothing brand

Table 10. A sample of students' examples of variation.

How about the individual changes of answers? In the experimental group, six students
consistently wrote that variation means similarity or sameness (Meaning B) and eight students
changed their pretest answer in the posttest; five of them changed from either Meaning B or
Blank to Meaning A. Of these five, two gave body height as an example (see Table 11).
For the control group, there were more individual changes (see Table 12), not all of which
can be considered positive.

Pretest                                    Posttest
Meaning B without an example               Meaning A, example: height
Blank                                      Meaning A, example: height
Meaning B without an example               Meaning A without an example
Meaning A, example: table of histogram     Blank
Blank                                      Meaning A without an example
Blank                                      Example: variation in marks
Blank                                      Meaning B without an example
Meaning A without an example               Blank

Table 11. Changes of students’ answers for Question 1 from pretest to posttest for the experimental group.


Pretest                               Posttest
Meaning B                             Not clear
Meaning A, example: trees             Not clear
Blank                                 Meaning A, no example
Meaning A, example: colors            Not clear
Not clear                             Meaning A **, example: house, ways of talking
Meaning A, example: ethnics           Not clear
Not clear                             Meaning A, no example
Meaning A, example: books, trees      Meaning A *, example: marks of tests
Not clear                             Meaning A, example: marks from tests
Not clear                             Meaning A, no example
Blank                                 Meaning A, example: weight, colors, ages
Blank                                 Meaning A, example: marks of a test
Blank                                 Not clear

Table 12. Changes of students’ answer for Question 1 from pretest to posttest for the control group.
** concluded from the examples as the answer was not clear: “that we often see”
* seemed to have a consideration of variation in data sets: “varied values”

Question 2

For each of the following cases, answer the following question: “Which is more

desirable: high variation or low variation?” Add your reason.





Purpose

The second question is about students’ understanding of what variation means in practical
cases. Knowing when to minimize or maximize variation is one of the signs of understanding
and statistical reasoning ability.

Anticipated Answer

a. high variation

b. low variation

c. high variation

d. low variation


Results

Despite my effort to explain what is meant by high and low variation, the students continued
to have difficulties. Most students seemed to have difficulty in understanding the term
‘the desirable variation’. I classified the students’ answers into the following levels:

0. No answer or answers without any explanation.
1. No consideration of variation.
   In this type, the students only described the common or real situation of the case
   in question, or they described the desirable value of the case (instead of its desirable
   variation). For example, for Question 2a (the age of trees in a national forest), a student
   (MU) wrote ‘high, because old trees can protect from national disaster.’
2. Wrong understanding of high or low variation and no consideration of what variation
   is ‘desirable’.
   Students considered variation but did it wrongly. This happened mostly with students
   who gave answer Meaning B in Question 1. For example, for the case the age of trees
   in a national forest, a student answered “low, because some trees are old, some have
   just been planted.”
3. Fair understanding of high or low variation, but still no consideration of what variation
   is ‘desirable’; such as for the case scores on an aptitude test: “high, so that we
   can assess the ability of the job seeker.”
4. Good understanding of high or low variation and what variation is ‘desirable’; such
   as for the case scores on an aptitude test, “high, so that it won’t be difficult for us to
   select.”

Table 13 shows the students’ levels of reasoning identified using my categorization.
I grouped Questions 2a and 2c, and I also grouped Questions 2b and 2d, because the results
showed that students had problems in understanding Questions 2b and 2d. One student
did not know what a ‘diameter’ is and the phrase ‘coming off one production line’ in
Question 2b was misinterpreted by most of the students. The same happened with
the phrase ‘of the same brand’ in Question 2d. Due to this language issue, in categorizing
students’ answers for 2b and 2d I followed the students’ interpretations of the question and
checked whether there was any consideration of variation in their reasoning. Table 13
illustrates that the number of students who were at Levels 3 and 4 increased very little from
pretest to posttest. The majority of students were at Level 1 (No consideration of variation).


Level of     Experimental Group                         Control Group
answers      2a                   2c                    2a                   2c
             Pretest  Posttest    Pretest  Posttest     Pretest  Posttest    Pretest  Posttest
0               7        4          11        4            4        2           9        6
1              18       12          17       17           26       26          22       25
2               2        5           2        6            0        0           0        0
3              10       14           8        9            5        6           4        6
4               2        4           1        3            4        5           4        2

Level of     Experimental Group                         Control Group
answers      2b                   2d                    2b                   2d
             Pretest  Posttest    Pretest  Posttest     Pretest  Posttest    Pretest  Posttest
0              11        7          14        6           10        9          16       12
1              14       15          15       14           18       18          16       17
2               4        5           4        8            0        1           0        2
3               9        6           5        7            7        8           4        5
4               1        6           1        4            4        3           3        3

Table 13. Distribution of the students’ answers for Question 2.
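The number of students at Levels 3 and 4 can be tallied from Table 13 with a few lines of Python. This is only a sketch: it assumes that, for each pair of questions, the first four data columns of the table belong to the experimental group and the last four to the control group, and the names `table13` and `high_level` are illustrative.

```python
# Counts per level (0, 1, 2, 3, 4) from Table 13, stored as (pretest, posttest).
# Assumption: the first four data columns are the experimental group,
# the last four the control group.
table13 = {
    ("Experimental", "2a"): ([7, 18, 2, 10, 2], [4, 12, 5, 14, 4]),
    ("Experimental", "2c"): ([11, 17, 2, 8, 1], [4, 17, 6, 9, 3]),
    ("Control", "2a"): ([4, 26, 0, 5, 4], [2, 26, 0, 6, 5]),
    ("Control", "2c"): ([9, 22, 0, 4, 4], [6, 25, 0, 6, 2]),
    ("Experimental", "2b"): ([11, 14, 4, 9, 1], [7, 15, 5, 6, 6]),
    ("Experimental", "2d"): ([14, 15, 4, 5, 1], [6, 14, 8, 7, 4]),
    ("Control", "2b"): ([10, 18, 0, 7, 4], [9, 18, 1, 8, 3]),
    ("Control", "2d"): ([16, 16, 0, 4, 3], [12, 17, 2, 5, 3]),
}


def high_level(counts):
    """Number of students at Level 3 or Level 4."""
    return counts[3] + counts[4]


summary = {
    key: (high_level(pre), high_level(post))
    for key, (pre, post) in table13.items()
}
for (group, question), (pre, post) in summary.items():
    print(f"{group} {question}: {pre} -> {post} students at Levels 3-4")
```

Every column sums to 39, which supports the column reading assumed here.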

Was there any individual improvement from pretest to posttest in both groups? Since
the levels are somewhat hierarchical, I investigated this by checking the changes in the answers
of individual students. In Table 14, we can see that there were 16, 12, 13, and 17 students in
the experimental group who showed a positive change in Questions 2a, 2b, 2c, and 2d,
respectively, while in the control group there were 8, 10, 8, and 12 students who showed a
positive change in the posttest. In both groups, more students improved than declined.
The students from the experimental group showed a greater tendency to change their
answers between pre- and posttest, while the students from the control group showed fewer
changes. It seemed that the control group students’ reasoning prior to the lessons, whether
good or weak, remained largely unaffected by the teaching and learning. All this indicated that
students in the experimental group improved more than students in the control group.

Change       2a                       2b                       2c                       2d
of level     Experimental  Control    Experimental  Control    Experimental  Control    Experimental  Control
             Group         Group      Group         Group      Group         Group      Group         Group

-4 1

-3 1 3 1 1 1

-2 1 1 1 0 0 0

-1 2 0 5 3 1 2 2 4

0 19 29 21 23 24 28 20 22

1 8 3 8 6 8 6 12 8

2 5 3 1 2 2 2 2 2

3 2 1 1 2 1 1 2

4 1 1 2 2 2

Table 14. The distribution of individual changes from pre- to posttest for Question 2 for students in the

experimental group and the control group.
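The counts of improving students quoted in the text can be checked against the positive rows of Table 14. The sketch below is a reconstruction under an assumption: rows +3 and +4 have empty cells in the table, and these are entered here as zeros in the positions that make the column totals agree with the figures in the text. The names `columns`, `positive_rows` and `improved` are illustrative.

```python
# Rows +1 to +4 of Table 14. Column order: 2a, 2b, 2c, 2d, each as
# (experimental group, control group). Empty cells are entered as 0;
# their positions are an assumption chosen to match the quoted totals.
columns = [
    "2a Exp", "2a Ctrl", "2b Exp", "2b Ctrl",
    "2c Exp", "2c Ctrl", "2d Exp", "2d Ctrl",
]
positive_rows = {
    1: [8, 3, 8, 6, 8, 6, 12, 8],
    2: [5, 3, 1, 2, 2, 2, 2, 2],
    3: [2, 1, 1, 2, 1, 0, 1, 2],
    4: [1, 1, 2, 0, 2, 0, 2, 0],
}

# Number of students whose level rose by at least one step.
improved = {
    col: sum(row[i] for row in positive_rows.values())
    for i, col in enumerate(columns)
}
for col, count in improved.items():
    print(f"{col}: {count} students improved")
```

Under this reading the totals are 16, 12, 13 and 17 for the experimental group and 8, 10, 8 and 12 for the control group, as stated in the text.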


5.2. Subtest B: Question 3 and 4

The questions in this subtest measured students’ computational and procedural skills.

Question 3 and 4

3. Given the data: 11, 32, 17, 34, 24, 15, 28


Range

Mean

Median

Upper Quartile

Standard Deviation

Interquartile Range


Monthly Income                    Number of
(in hundred thousand rupiah)      People
3 – 5                             3
6 – 8                             4
9 – 11                            9
12 – 14                           6
15 – 17                           2



Purpose

Question 3 and Question 4 were related to computational skills. Since a comparison between
two different teaching approaches was attempted, it was fair to the control group to include
‘traditional’ questions in the pre- and posttest. Besides, regarding my reflection question, I
hoped that the teaching intervention would improve not only the statistical reasoning skills,
but also the computational skills that are needed in the nation-wide examination. Question 3
dealt with individual data and Question 4 dealt with grouped data.

Anticipated Answer

For Question 3, the correct answer is:

Range 23

Mean 23

Median 24

Upper quartile 32

Standard Deviation 8.2

Interquartile Range 17

For Question 4, the mean = 10 and the standard deviation = 6.2.
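The anticipated answers can be recomputed with Python’s standard library. The sketch below rests on three assumptions that are not spelled out in the text: quartiles are taken as the medians of the lower and upper halves of the ordered data (leaving out the overall median), the standard deviation uses the population divisor n, and each income class in Question 4 is represented by its midpoint. All variable names are illustrative.

```python
import math
import statistics

# Question 3: ungrouped data.
data = [11, 32, 17, 34, 24, 15, 28]
ordered = sorted(data)

data_range = ordered[-1] - ordered[0]
mean = statistics.mean(data)
median = statistics.median(data)

# Quartiles as medians of the lower and upper halves; with an odd
# number of values the overall median is left out of both halves.
half = len(ordered) // 2
lower_quartile = statistics.median(ordered[:half])
upper_quartile = statistics.median(ordered[-half:])
iqr = upper_quartile - lower_quartile

# Population standard deviation (divisor n).
sd = statistics.pstdev(data)

# Question 4: grouped data, each class represented by its midpoint.
classes = [(3, 5), (6, 8), (9, 11), (12, 14), (15, 17)]
freqs = [3, 4, 9, 6, 2]
midpoints = [(low + high) / 2 for low, high in classes]
n = sum(freqs)
grouped_mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
grouped_var = sum(f * (x - grouped_mean) ** 2 for f, x in zip(freqs, midpoints)) / n
grouped_sd = math.sqrt(grouped_var)

print(data_range, mean, median, upper_quartile, round(sd, 1), iqr)
print(grouped_mean, round(grouped_sd, 2))
```

Under these assumptions the values for Question 3 and the mean for Question 4 agree with the anticipated answers. The grouped standard deviation, however, comes out at about 3.4 (with either the population or the sample divisor) rather than the 6.2 stated above, so the stated value may rest on a different convention or may need rechecking.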


Results

The questions in this subtest were meant to check the computational and procedural skills of

the students. This kind of question was the typical question that the students were used to. I

wanted to ensure that the students in the experimental group could perform at least equally

well compared to students from the control group. Table 15 and Table 16 show the results for

Question 3 and 4 for both groups, respectively.

Initially, I expected students to use calculators for these questions and I asked and
urged them to bring calculators to their mathematics classes. However, it turned out that most
students did not own a calculator. This gives some idea of why students from both groups
did not perform well in items that asked them to compute the standard deviation. These
students were social science students whose arithmetic skills were not very good and who
perhaps suffered from mathematics anxiety.

So, were the performances of both groups comparable? Yes. First of all, comparing
the number of blank answers in the experimental and the control group, and between
the pre- and the posttest (Tables 15 and 16), I could conclude that the students of the control
group showed less effort to even try the questions. Secondly, the increase in the
number of correct answers from pre- to posttest in the experimental group was slightly larger
than that in the control group.


                       Experimental Group                  Control Group
                       Pretest          Posttest           Pretest          Posttest
Question Items         Correct  Blank   Correct  Blank     Correct  Blank   Correct  Blank
Range                  19       9       26       3         3        25      9        29
Mean                   16       5       21       2         6        23      9        23
Median                 20       4       26       0         13       15      11       27
Upper Quartile         9        12      7        12        2        23      2        24
Standard Deviation     0        26      2        12        0        12      0        30
Interquartile Range    9        15      13       7         1        19      0        24

Table 15. Distribution of the correct answers for Question 3.


                       Experimental Group                  Control Group
                       Pretest          Posttest           Pretest          Posttest
Question Items         Correct  Blank   Correct  Blank     Correct  Blank   Correct  Blank
Mean                   1        9       2        3         3        26      1        34
Standard Deviation     0        23      0        10        0        19      0        36

Table 16. Distribution of students’ answers for Question 4.

5.3. Subtest C: Question 5-10

The questions in this subtest were mainly related to statistical reasoning about variation.

Students were expected to use their understandings of histograms and central measures and


consideration of variation in data, simultaneously with drawing conclusions based on data and

graphical displays. There was also a question about understanding of central measures.

Question 5

Four histograms and two descriptions of data are given below.

i. A data set of Mathematics test scores where the test was very easy

ii. A data set of wrist circumferences of newborn female babies (measured in

centimeters).

a. Which histogram best matches the data in description (i)? Give your reason.

b. Which histogram best matches the data in description (ii)? Give your reason.

Purposes

One aspect of the teaching intervention was to teach variation by letting students analyze real
data sets. In data analysis, looking at and interpreting graphical displays of the data,
especially histograms, is important. This question assessed the ability to read and interpret
such displays. In addition, it assessed students’ ability to describe the variation of a data
distribution.

Anticipated Answer

This question was taken from the Comprehensive Assessment of Outcomes in a First

Statistics course (CAOS) test (Web ARTIST Project, 2005) and the answer key provided is:

a. Histogram B

b. Histogram A.

For sub-question a, students needed to reason that the marks will mostly be high and thus the
suitable histogram should be Histogram B. For sub-question b, the anticipated
reasoning was that the measures vary, just as the weights of newborn babies vary. Therefore,
Histogram C was also an answer that would reflect good consideration of variation.
However, the large majority of babies normally have more or less the same weight, so the
distribution is more symmetrical. Therefore, Histogram A is considered the most suitable.

Results

I considered Question 5a as more straightforward than Question 5b in the reasoning. The

context in 5a is more familiar to the students than the context in 5b. Both questions needed

consideration of variation but for Question 5b, in addition, students also needed to consider

the tendency to normality for a distribution of data such as the wrist circumferences of

newborn female babies. The different levels of difficulty of the two questions were reflected in
the distribution of the students’ answers (see Table 17). Out of 39 students, only 5 and 7
students answered Histogram A in Question 5b in the pre- and posttest, respectively, whereas
more students chose the correct Histogram B in Question 5a.

In general, even when students managed to consider the variation in the data, their
difficulty in reading histograms hindered them from answering this question correctly. For
example, the modal answer for Question 5a was Histogram D. Students’ explanations for
choosing Histogram D in 5a reflected one common misconception in understanding
histograms, namely, that students did not see the height of the bars as a frequency but as a case
value. For Question 5b, the modal answer was Histogram B. Students’ explanations showed
that they still had the misconception of seeing a data set not as an aggregate but as individual
values. Many of them reasoned that since a baby grows, the wrist circumference increases.
Combined with reading bars as case values, this made Histogram B the modal answer for
Question 5b.

                    Experimental Group             Control Group
                    5a            5b               5a            5b
Answer              Pre    Post   Pre    Post      Pre    Post   Pre    Post
A                   6      7      5      7         9      6      6      7
B                   9      11     12     13        13     12     11     10
C                   2      -      1      6         1      3      4      4
D                   17     18     11     8         11     14     9      8
Blank               3      2      8      4         3      3      8      9
Multiple Answer     2      1      2      1         2      1      1      1

Table 17. Distribution of the students’ answers of Question 5.

I used the SOLO taxonomy (Biggs & Collis, 1982) to categorize students’ reasoning. I
created the rubric shown in Table 18, which is based on students’ reasoning from both the
experimental and the control group. There were many common aspects in the reasoning of
students in the two groups. Clearly, students of both the experimental and
the control group still had difficulties understanding the histogram. Judging from the amount
of time spent in the lesson about histograms and the traditional style of teaching about
histograms that I observed in both groups, this was not really a surprise. In fact, both teachers
were ready to leave out histograms altogether, had I not asked them to teach histograms prior
to the teaching of variation. Several students showed some ability to read the histogram in one
question but failed to do so in the other. This indicated that the sophistication of
reasoning about variation also depended strongly on the context of the question.


Level 1: Prestructural
  General indicators: The task is not attacked appropriately; the student hasn’t really understood the point and uses too simple a way of going about it.
  Specific indicators: No consideration of variation in the data; e.g. for 5a, many high scores and no low scores in the test and for 5b, the measurements vary but most of them are around a central measure. Inability to read histograms.
  Example: Just stating that the chosen histogram fitted the description of the data. Explaining the shape of the chosen histogram. Blank or nonsensical answers.
  Students’ answers:
    “Histogram D because it best matches (the data)” (5a)
    “Histogram B because the scores increased drastically” (5a)
    “Histogram C because wrist circumference of newborns are very small” (5b)
    “Histogram B because babies grow so that the circumferences increase” (5b)

Level 2: Unistructural
  General indicators: The student's response only focuses on one relevant aspect.
  Specific indicators: Recognition of variation in the data (mentioned in level 1) but focused only on this. Inability to read histograms.
  Example: Choosing a histogram based on the height of the bars.
  Students’ answers:
    “Histogram C because there were many that got high scores” (5a)
    “Histogram C because it showed variation” (5b)

Level 3: Multistructural
  General indicators: The student's response focuses on several relevant aspects but they are treated independently and additively.
  Specific indicators: Able to consider variation in the data and to read the histogram, but independently.
  Example: Choosing a histogram (either correct or not) based on the two main aspects.
  Students’ answers:
    “Histogram B because most of the scores are high” (5a)
    “Histogram D because the values vary” (5b)

Level 4: Relational
  General indicators: The different aspects have become integrated into a coherent whole. This level is what is normally meant by an adequate understanding of some topic.
  Specific indicators: Able to consider variation in the data and to choose the correct histogram in the context.
  Example: Giving a good explanation of the variation in the data and connecting it to the shape of the histogram.
  Students’ answers: None

Table 18. Levels of students’ reasoning for Question 5 based on SOLO Taxonomy.


So, how did the students of the experimental group perform compared to the students
of the control group? The results listed in Table 17 do not give the correct picture of students’
reasoning, as I found that they might choose the correct histogram but for the wrong reasons. The
distribution of levels of reasoning based on the rubric in Table 18 can be found in Table 19.
From this table, I inferred that students of the experimental group performed better than the
students of the control group. For both questions, there was some improvement in the level of
reasoning in the posttest for the experimental group. On the other hand, students in the control
group barely showed any improvement.

         5a                                       5b
         Experimental Group    Control Group      Experimental Group    Control Group
Level    Pre      Post         Pre      Post      Pre      Post         Pre      Post
1        18       11           26       27        32       24           33       32
2        18       21           12       11        5        9            6        7
3        3        7            1        1         2        6            -        -
4        -        -            -        -         -        -            -        -

Table 19. Distribution of the students’ levels of reasoning for Question 5.

I also looked at the individual changes of level to compare the performance. From Table 20,
it can be seen that for Question 5a more students from the experimental group
improved their reasoning level (11 students) and fewer students decreased in performance.
For Question 5b, the students of the control group showed almost no change.

                   Experimental Group    Control Group
Change of level    5a        5b          5a        5b
-2                 1         -           1         -
-1                 1         4           5         -
0                  26        24          28        38
1                  9         9           4         1
2                  2         2           1         -

Table 20. Distribution of the individual changes of reasoning level for Question 5.

Comparison of the pretest results and the mean gains between groups

Firstly, I conducted for each sub-question a two-tailed independent-samples t-test to evaluate,
via the pretest, the null hypothesis that there was no difference in students’ levels of reasoning
at the start of the intervention. In both cases, the null hypothesis was supported. The pretest
mean scores of students in the control group were not statistically different from those of
students in the experimental group: in Question 5a (t = 1.59, df = 76, two-tailed p-value = 0.115)
and in Question 5b (t = 0.54, df = 76, two-tailed p-value = 0.592).


Secondly, I conducted a paired-samples t-test for both the experimental and control

group (each group size equal to 39) to evaluate via the pretest and posttest the null hypothesis

that there was no difference in students’ levels of reasoning before and after the teaching. For

the experimental group, the null hypothesis was not supported: in Question 5a (t = 2.24, two-

tailed p-value = 0.031) and in Question 5b (t = 2.04, two-tailed p-value = 0.048). For the

control group, the null hypothesis was supported: in Question 5a (t = -0.24, two-tailed p-value

= 0.812) and in Question 5b (t = 1.00, two-tailed p-value = 0.324). These results indicated that

there was a significant gain in the experimental group, but not in the control group.
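The paired t-values reported for Question 5 can be recomputed from the individual changes of level in Table 20, since a paired-samples t-test is a one-sample test on the gains. The sketch below is illustrative only: it assumes that the four columns of Table 20 are, in order, experimental 5a, experimental 5b, control 5a and control 5b, with empty cells read as zero students. The helper names are hypothetical.

```python
import math
import statistics

def expand(counts):
    """Turn a {change: number_of_students} table into a list of gains."""
    return [change for change, n in counts.items() for _ in range(n)]

def paired_t(gains):
    """t statistic of a paired-samples t-test, computed on the gains."""
    n = len(gains)
    return statistics.mean(gains) / (statistics.stdev(gains) / math.sqrt(n))

# Changes of level (posttest minus pretest) read from Table 20.
# The assignment of columns to groups is an assumption.
gains = {
    "experimental 5a": {-2: 1, -1: 1, 0: 26, 1: 9, 2: 2},
    "experimental 5b": {-1: 4, 0: 24, 1: 9, 2: 2},
    "control 5a": {-2: 1, -1: 5, 0: 28, 1: 4, 2: 1},
    "control 5b": {0: 38, 1: 1},
}

t_values = {name: paired_t(expand(c)) for name, c in gains.items()}
for name, t in t_values.items():
    print(f"{name}: t = {t:.2f}")
```

With this reading of the table, each group contains 39 students and the computed statistics round to the t-values quoted in the text.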

Finally, I conducted for each sub-question a two-tailed independent-samples t-test to
evaluate the null hypothesis that there was no difference in the mean gains between the
experimental and the control group. The null hypothesis was supported for both Question 5a
(t = 1.80, df = 76, two-tailed p-value = 0.076) and Question 5b (t = 1.77, df = 76, two-tailed
p-value = 0.081). It seemed that although there was a statistically significant gain in the
experimental group, this gain did not differ enough from the gain of the control group for the
difference to be statistically significant.
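The comparison of mean gains between the two groups can be sketched in the same way. The version below uses the pooled-variance (equal variances assumed) form of the independent-samples t-test; this choice is an assumption, made because it is consistent with the reported df = 76. The gain counts are read from Table 20 with the assumed column order experimental 5a, experimental 5b, control 5a, control 5b.

```python
import math
import statistics

def expand(counts):
    """Turn a {change: number_of_students} table into a list of gains."""
    return [change for change, n in counts.items() for _ in range(n)]

def independent_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two samples."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, df

# Gains per group and sub-question (assumed reading of Table 20).
exp_5a = expand({-2: 1, -1: 1, 0: 26, 1: 9, 2: 2})
ctl_5a = expand({-2: 1, -1: 5, 0: 28, 1: 4, 2: 1})
exp_5b = expand({-1: 4, 0: 24, 1: 9, 2: 2})
ctl_5b = expand({0: 38, 1: 1})

t_5a, df_5a = independent_t(exp_5a, ctl_5a)
t_5b, df_5b = independent_t(exp_5b, ctl_5b)
print(f"5a: t = {t_5a:.2f}, df = {df_5a}")
print(f"5b: t = {t_5b:.2f}, df = {df_5b}")
```

Because both groups have the same size, the unpooled (Welch) form would give the same t statistic here; only the degrees of freedom would differ.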

Question 6

Two students who took mathematics tests received the following scores (out of 100):

Student A: 60, 90, 80, 60, 80

Student B: 40, 100, 100, 40, 90

If you had an upcoming mathematics test, who would you prefer to be your study partner,

A or B? Why?

Purpose

In this question, I wanted to find out whether students use any measure of variation in

comparing two data sets. This question can be considered as a question that assesses the

ability to read beyond the data (in the sense of Curcio, 1987) in the process of analyzing and

interpreting data. Based on the two data sets, this question implicitly asked students to predict

the future performance of students A and B and thus I could verify whether students used any

consideration of variation in their reasoning.

Anticipated Answer

This question was taken from Meletiou (2000). She found the following: (i) out of 30
university students, 50% of the responses chose option ‘student A’ because there was less
variation (consistent); (ii) 33% of the responses chose ‘student B’, viewing the perfect score
as potential; and (iii) 15% of the responses said “it wouldn’t really matter who you study
with since they both are essentially the same grade standing of 74%. They compensate each
other” (p. 131). The three kinds of answers could also be found in my study. However, I
realized that the answer to this question might depend on the students’ experience and
expectations. Students might consider their own situation in answering the question. This is
not a problem because there is no right or wrong answer. I wanted to check whether students
considered variation in their answer and not only informally invented measures or central
measures. However, because student A and student B in the question have the same average
score, I considered ‘Student A’ the correct answer because his/her scores are less
variable, which makes him/her more reliable as a study partner.

Results

For this question, there were three types of students’ answers that I observed (Table 21):

Student A;

Student B;

Anyone is fine.

The modal answer was ‘Student A’.

Although ‘Student A’, which I considered the correct answer, was the modal answer,
the students’ reasoning behind it was mostly not what I had
expected. Students mostly based their conclusion on an informal measure such as the extreme
values or a particular standard value. In the pretest, none of the students seemed to consider
the standard deviation in their reasoning.

                 Experimental Group    Control Group
Answer           Pre       Post        Pre       Post
Blank/Unclear    -         -           2         -
Student A        35        34          28        24
Student B        1         3           6         11
Anyone           3         2           3         4

Table 21. Distribution of the students’ answers for Question 6.


To investigate the students’ reasoning, I used the framework of Mooney (2002) to
categorize students’ answers. There are four levels of statistical reasoning in Mooney’s
framework, and I defined the descriptors for each level based on students’ answers. The
descriptors of the four levels (see also Subsection 2.2.2) that I identified are:

1. Idiosyncratic: giving blank or unclear answers.
   Examples:
   “A because his marks are greater than B’s.”
   “A because (he) has satisfying marks.”
   “I chose A because his marks are little (numbers) so it is easy to compute without a calculator.”

2. Transitional: using informal or invented central measures, such as the extreme values
   or some values based on certain life experience (e.g., 60 as the standard passing
   score), or an informal measure of variation.
   Examples:
   “A because A’s scores are more satisfying. Although B got 100 twice but all A’s scores are above the standard.”
   “A because his marks are higher than B’s and the differences are not too far.”
   “In my opinion, I will choose B because his results are satisfying. Although there is a 40 but the 100 is enough to compensate the deficiency.”
   “B, because the standard deviation of B’s marks is higher than A’s.”

3. Quantitative: using formal central measures, namely the mean or median. Since the
   number of data values in the two sets is equal, I include the sum as a formal measure here.
   Example: “Because the average score of the two students are the same, so I choose student A because all his marks are above passing mark.”

4. Analytical: using measures of variation in combination with the central measures.
   Example: “A because his typical score is not too far away from the sum of his marks.”

The distribution of the students’ levels of reasoning is presented in Table 22. The
majority of students from both the experimental and the control group were at level 2. As
mentioned earlier, most students drew their conclusions from an informal measure, most
notably whether all marks passed 60, the standard passing mark. Another typical line of
reasoning was to base the answer on extreme values, e.g., the perfect mark. Not much changed
in the posttest, although the number of students at level 1 decreased in both groups.
Table 23 also shows few changes of level.

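         Experimental Group    Control Group
Level    Pre       Post        Pre       Post
1        9         5           12        8
2        25        29          23        25
3        5         4           4         6
4        -         1           -         -

Table 22. Distribution of the students’ levels of reasoning for Question 6.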



Change of Level    Experimental Group    Control Group
-1                 4                     1
0                  29                    32
1                  4                     5
2                  1                     1
3                  1                     -

Table 23. Distribution of the individual changes of levels of reasoning for Question 6.


Firstly, I conducted a two-tailed independent-samples t-test to evaluate, via the pretest, the
null hypothesis that there was no difference in students’ levels of reasoning at the start of the
intervention. The null hypothesis was supported. The pretest mean scores of students in the
control group were not statistically different from those of students in the experimental group
(t = 0.75, df = 76, two-tailed p-value = 0.457).
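For Question 6, the pretest comparison, including its p-value, can be approximated from the pretest level counts in Table 22 using only the Python standard library. This is a sketch under stated assumptions: the first and third number in each row of Table 22 are taken to be the experimental and control pretest counts, the pooled-variance t-test is used, and the two-tailed p-value is obtained by numerically integrating the density of Student's t distribution with Simpson's rule. The helper names are hypothetical.

```python
import math
import statistics

def expand(counts):
    """Turn a {level: number_of_students} table into a list of scores."""
    return [level for level, n in counts.items() for _ in range(n)]

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on the interval [0, |t|]."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (h / 3) * total

# Pretest levels for Question 6 (assumed reading of Table 22).
experimental = expand({1: 9, 2: 25, 3: 5})
control = expand({1: 12, 2: 23, 3: 4})

n1, n2 = len(experimental), len(control)
df = n1 + n2 - 2
pooled = ((n1 - 1) * statistics.variance(experimental)
          + (n2 - 1) * statistics.variance(control)) / df
se = math.sqrt(pooled * (1 / n1 + 1 / n2))
t = (statistics.mean(experimental) - statistics.mean(control)) / se
p = two_tailed_p(t, df)
print(f"t = {t:.2f}, df = {df}, p = {p:.3f}")
```

A statistics package would normally be used for the p-value; the numerical integration is included here only to keep the sketch self-contained.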

Secondly, I conducted a two-tailed paired t-test for each group to evaluate, via the
pretest and posttest, the null hypothesis that there was no difference in students’ levels of
reasoning before and after the intervention. For the experimental group, the null hypothesis
was supported (t = 1.09, two-tailed p-value = 0.281). For the control group, the null
hypothesis was also supported (t = 1.97, two-tailed p-value = 0.057). These results indicated that
there was no statistically significant gain in either the experimental group or the control
group.

Finally, I conducted a two-tailed independent-samples t-test to evaluate the null
hypothesis that there was no difference in the mean gains between the experimental and
the control group. The null hypothesis was supported (t = -0.18, df = 76, two-tailed p-value =
0.856). Thus, there was no statistically significant difference between the mean gain of the
experimental group and that of the control group.


Question 7

One day Dedi caught a very big catfish from his rice field. He wanted to be sure of the

weight of the fish and therefore he weighed it 7 times on the same scale/balance. Below

are the measurements (in kilogram) that he found:

2.9; 2.7; 5.1; 3.1; 3.0; 2.8; 3.0 kg.

a. How spread out are the measurements he obtained?

b. How many kilograms do you think the true weight of the catfish was? Give your

reason.


Purpose

This question was a modified version of one of the questions used by Groth (2003) as a tool to
describe high school students’ statistical thinking. In his study, Groth used the question to
investigate students’ thinking in using measures of centre (7b) and measures of spread (7a). In
my pretest, I used the same question but changed the context and the data. In Groth’s study,
the measurements were all different, but in mine I deliberately included two identical
measurement values to find out whether students would immediately use the formal measure of
mode. I wanted to find out which measures of centre and variation the students might use to
draw an informal conclusion and whether their views on the measures of centre were adjusted
when an outlier is present.

Anticipated Answer

For the first question, I expected that in the pretest most students would answer by mentioning
the easiest measure of variation, namely the range; for example, the measurements are spread out
from 2.7 kg to 5.1 kg. Besides this, since there is an outlier, I also expected that after the
lessons students might come up with the interquartile range, which is “around 1-2
ounces from 3.0 kg.”

For the second question, I thought most students would think of using the notion of
mode. However, I preferred that students either used the median for the estimation of the true
weight or used the mean after first omitting the outlier.
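The measures mentioned in the anticipated answers can be computed for the seven measurements as a short illustration. The sketch assumes the default "exclusive" quartile method of Python's statistics module and treats 5.1 kg as the outlier, in line with the reasoning above; it is not part of the original test.

```python
import statistics

# Measurements from Question 7, in kilograms.
weights = [2.9, 2.7, 5.1, 3.1, 3.0, 2.8, 3.0]

spread = max(weights) - min(weights)            # range
q1, _, q3 = statistics.quantiles(weights, n=4)  # quartiles (exclusive method)
iqr = q3 - q1
mode = statistics.mode(weights)
median = statistics.median(weights)
mean_all = statistics.fmean(weights)

# Omitting 5.1 kg as the outlier is a judgment, not a computed rule.
without_outlier = [w for w in weights if w != 5.1]
mean_trimmed = statistics.fmean(without_outlier)

print(f"range = {spread:.1f} kg, IQR = {iqr:.1f} kg (from {q1} to {q3})")
print(f"mode = {mode}, median = {median}")
print(f"mean of all data = {mean_all:.2f}, mean without outlier = {mean_trimmed:.2f}")
```

The single outlier pulls the mean of all data above every other measurement, whereas the mode, the median and the mean without the outlier all lie close to 3.0 kg.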

Results

Question 7a was perhaps the most misunderstood question in the pretest as well as in the
posttest. The phrase “how spread is…”, which I assumed to be easy to grasp, turned out to
be confusing for the students. I assumed that the answer students would most probably
think of immediately was one of the measures of spread, namely the range. The range had
previously been taught to the students of the experimental group by
the regular teachers as part of the five-number summary, and I assumed the same for the
students of the control group.

However, none of the students from either group gave the range as an answer in the
pretest, and some of them did not answer at all (see Table 24). In fact, for the control group,
even in the posttest no measure of spread was used in the answers. Their explanations did not
indicate any understanding of the question. I concluded that this was due to limited
linguistic skills. Therefore, I decided to exclude Question 7a from my analysis.


                                         Experimental Group    Control Group
Answer (7a)                              Pre       Post        Pre       Post
Blank                                    12        3           9         5
Not use any formal measures of spread    27        29          30        34
Use the range                            -         7           -         -

Table 24. Distribution of the students’ answers of Question 7a.

For the second question, Table 25 shows that when the data are not all different, many
students use the mode as the central measure. My expectation that students would consider the
effect of an outlier on the central measure was not met. No students computed
the mean by first excluding the outlier. However, more students in the experimental group
used the mean (of all data) or the median for the central measure (from 3 in the pretest to 11
students in the posttest). In contrast, there seemed to be almost no change in the control group
(from 4 in the pretest to 5 in the posttest). The modal answer in the control group was still not
to make use of any central measure. No students in the control group used the median. This
indicated that the teaching intervention stimulated students to use formal measures.

                                      Experimental Group    Control Group
Answer (7b)                           Pre       Post        Pre       Post
Blank or no explanation               8         2           6         3
Not use any formal central measure    12        8           19        19
Use the mode                          15        16          10        12
Use the mean of all data              3         11          4         5
Use the median                        1         2           -         -

Table 25. Distribution of the students’ answers for Question 7b.

Question 8

The histogram below shows the number of hours of exercise per week of the marketing
staff of a bank.

[Figure: Histogram of Number of Exercising Hours Per Week. Horizontal axis: Number of Hours (0 to 8); vertical axis: Number of People (0 to 9).]


a. Compute the median. _______________

b. Compute the mean. ________________

c. Based on the histogram, how many hours do the staff in this company usually
exercise per week? Give your reason.

Purpose

The third part of this question indirectly assessed students’ understanding of variation.
The main emphasis was on assessing the students’ knowledge of the statistics that
they had learned prior to the intervention, i.e., measures of centre, and whether this
knowledge had improved after the intervention. Parts a and b were explicitly about central measures.
However, in part c, students needed to be able to decide which measure of centre can explain the
nature of the data, taking into account the variation or the shape of the graphical display.
Herein lay the indirect assessment of any consideration of variation.

Anticipated Answer

The shape of the histogram is skewed, so it is better to use the median instead of the mean as the
answer to the question. To compute the median and the mean, students must be able to read
histograms. Otherwise, a misconception such as using the middle value of the x-axis as the median
could occur.

Results

The purpose of Question 8 was to probe students’ understanding of the meaning of central
measures, specifically their understanding of the use of mean versus median and their ability
to compute these values from the histogram. Because my study focused on students’
reasoning about variation, I did not delve deeply into the first two questions.

Research on students’ understanding of and misconceptions about the central measures,
in particular the mean and the median, is quite abundant (see, for example, Garfield &
Ben-Zvi, 2008, Chapter 9 and references therein), and although I saw the misconceptions
mentioned in the research literature, I do not discuss them here. My focus for Questions 8a
and 8b was whether students could compute the mean and the median. The
results in Table 26 indicated that students had difficulties in computing them. I suspected that,
similar to the result for Question 5, students’ understanding of the notion of a histogram was
poor and that this was the main reason that they could not compute the mean and median.


           8a                                       8b
           Experimental Group    Control Group      Experimental Group    Control Group
Answer     Pre      Post         Pre      Post      Pre      Post         Pre      Post
Blank      10       7            16       10        15       8            17       13
Wrong      28       29           20       21        24       29           22       26
Correct    1        3            3        8         -        2            -        -

Table 26. Distribution of the students’ answers for Question 8a and 8b.

For Question 8c, I checked which central measure the students used and the reasoning
behind it. The results for Question 8c are listed in Table 27. The performance in the pretest was
really poor. Most of the students did not answer or answered without using any measure of
centre. The lack of understanding of the histogram again played a role here. The title of the
histogram clearly mentioned that the data were about the number of exercising hours per week,
and thus the horizontal axis meant the number of hours. Students’ answers revealed that many
considered the horizontal axis as the number of hours spent by one person in different weeks. I
also found in the given answers that some students mistook the horizontal axis for the
number of hours per day, so that when they used the mode, they multiplied it by
7 and sometimes also by 24. In the posttest, the numbers of students using the mode and the median
increased, and Table 28 shows that there was more improvement for students in the
experimental group (17 students) than in the control group (9 students).

                         Experimental Group    Control Group
Categories (8c)          Pre       Post        Pre       Post
0 Blank                  17        10          18        14
1 No Central Measures    15        9           13        14
2 The Mean               -         9           -         -
3 The Mode               7         9           8         11
4 The Median             -         2           -         -

Table 27. Distribution of the students’ answers for Question 8c.

Change of Level (8c)    Experimental Group    Control Group
-3                      1                     -
-2                      -                     2
-1                      3                     3
0                       18                    25
1                       7                     4
2                       5                     2
3                       5                     3

Table 28. Distribution of the individual changes of levels of reasoning for Question 8c.


Question 9

Forty college students participated in a study of the effect of sleep on test scores. Twenty

of the students studied all night before the test in the following morning (no-sleep group


The test scores for each group are shown in the diagrams below. Each dot on the diagram
represents a particular student’s score. For example, the two dots above the 80 in the
bottom diagram indicate that two students in the sleep group scored 80 on the test.

[Figure: two dot plots of test scores, each on a horizontal axis from 30 to 100; the top diagram shows the no-sleep group and the bottom diagram shows the sleep group.]


Examine the two diagrams carefully. Which group is better: the sleep group or the no-

sleep group? Explain your reasons.

(Note: In the actual test sheet, the following question was on a different page at the

back of the page of the above question. This was designed so that students would give

their own reasoning before choosing one of the multiple choices below).

Then circle one of the 6 possible conclusions listed below that you mostly agree with.

a. The no-sleep group did better because none of these students scored below 35 and

the highest score was achieved by a student in this group

b. The no-sleep group did better because its average appears to be a little higher

than the average of the sleep group.

c. There is no difference between the two groups because there is considerable

overlap in the scores of the two groups.



e. The sleep group did better because more students in this group scored 80 or
above.

f. The sleep group did better because its average appears to be a little higher than
the average of the no-sleep group.

Purpose

In this question, I wanted to find out whether students would use the combination of measure

of centre and variation when comparing two data sets. This problem was taken from the

Statistical Reasoning Assessment (SRA) designed by Garfield (2003). The correct answer is

(d) and other choices are common misunderstandings of students, for example, paying

attention either to the extreme values only or to the average. I wanted to test whether students

realized that in comparing two data sets, they needed to consider not only central measures

but also measures of variation.

Anticipated Answer

Garfield (2003) designed the question to find out whether students took variation of the scores
into consideration when they compared the two data sets. A correct reasoning skill, namely
‘understanding sampling variability’, corresponds with option d, and a misconception,
‘comparing groups based on their averages’, corresponds to options b and f (Garfield, 2003, p.
27). I predicted that quite a few students would choose an option reflecting this misconception,
but I hoped that in the posttest many students would choose option d.

Results

This question was formatted such that students were first expected to freely explore the open-

ended question before choosing one of the multiple choices which were provided on the next

page. This question design was unsuccessful. From now on, I refer to the open-ended question

as the first part and the closed question (the multiple-choice question) as the second part. Due

to the unfamiliarity of the students with the first part, most students just browsed through the

pages and so saw the multiple choices. This resulted in students only copying one of the

multiple choices in the second part into the first part.

Adopting levels of statistical thinking for ‘Analyzing and Interpreting Data’ from Mooney
(2002), I categorized students’ answers into the following four ability levels and descriptors
for each level:

1. Idiosyncratic: students make no inferences or inferences that are not based on the data
   or based on irrelevant contextual issues.
   Blank answers or answers that do not make sense and answers based on anecdotal
   experience are categorized into this level.

2. Transitional: students make inferences that are primarily based on the data.
   Options a, c, and e correspond with this level. Students base their answers on extreme
   values, certain values or simple graphical properties. One typical answer from students
   was one that compares the number of high or low marks, for example “Sleep group,
   because the sleep group has more high marks than the no-sleep group” and “Sleep
   group, because in sleep group only 10 people got marks below 60.”

3. Quantitative: students make reasonable inferences based on data and the context.
   I consider options b and f, where the mean is the only basis for comparing data sets, to
   belong to this level.

4. Analytical: students make reasonable inferences based on data and the context using
   multiple perspectives.
   Consideration of variation and mean is the underlying aim for this question and
   therefore if students’ reasoning shows consideration of both variation and the central
   measures, for example option d, I categorize it in this level.
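The assignment of closed answers to the four levels described above can be sketched as a simple lookup. The answers in the example are hypothetical and serve only as an illustration; how answers with multiple circled options were classified is not specified here, so the sketch handles single options and blanks only.

```python
from collections import Counter

# Level for each closed option, following the descriptors above.
LEVEL_OF_OPTION = {"a": 2, "c": 2, "e": 2, "b": 3, "f": 3, "d": 4}

def level(answer):
    """Return the level (1-4) of a single-option answer; blank counts as 1."""
    if answer is None or answer.strip() == "":
        return 1
    return LEVEL_OF_OPTION[answer.strip().lower()]

# Hypothetical answers, for illustration only (not data from the study).
answers = ["e", "f", "", "d", "a", "b", None, "c"]
tally = Counter(level(a) for a in answers)

for lvl in sorted(tally):
    print(f"level {lvl}: {tally[lvl]} student(s)")
```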

The modal answer for the second part was option e in the experimental group and
option f in the control group (see Table 29). This indicated that most students of the
experimental group were at the Transitional level, since they used informal measures to draw
conclusions. Students of the control group showed this indication as well, but more of them
chose option f, which means that they primarily used the mean to draw
conclusions (Quantitative level). However, the results from the open question part (see Table
30) showed that the students from the experimental group seemed to improve more than
those of the control group. Fewer students reasoned idiosyncratically and more students
reasoned quantitatively. In the control group, the majority of the students were at the
idiosyncratic level in the open question part, and there was not as much change at the
quantitative level as in the experimental group.

Answer              Experimental            Control
                    Pretest   Posttest      Pretest   Posttest
a                      4         2             1         1
b                      1         6             5         3
c                      1         2             1         5
d                      5         5             7         4
e                     11        10             6         4
f                      6         8             9        14
Blank                 10         3             9         6
Multiple Answers       1         3             1         2

Table 29. Distribution of the students’ answers for the closed part of Question 9.
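The modal answers can be checked directly against the counts in Table 29. This is a small sketch only: it assumes that the two columns per group are pretest and posttest, in that order, and it leaves out the Blank and Multiple Answers rows. TABLE_29 and modal_option are hypothetical names.

```python
# Counts from Table 29; each pair is (pretest, posttest), assuming that column order.
TABLE_29 = {
    "experimental": {"a": (4, 2), "b": (1, 6), "c": (1, 2), "d": (5, 5),
                     "e": (11, 10), "f": (6, 8)},
    "control": {"a": (1, 1), "b": (5, 3), "c": (1, 5), "d": (7, 4),
                "e": (6, 4), "f": (9, 14)},
}


def modal_option(group, occasion):
    """Return the most frequently chosen option; occasion 0 = pretest, 1 = posttest."""
    counts = TABLE_29[group]
    return max(counts, key=lambda option: counts[option][occasion])
```

Under these assumptions the modal option is e for the experimental group and f for the control group on both occasions.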


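Level               Experimental                    Control
                    Open          Closed            Open          Closed
                    Pre   Post    Pre   Post        Pre   Post    Pre   Post
1-Idiosyncratic     12     6      10     3          22    25       9     7
2-Transitional      24    24      16    15          12    11       8    10
3-Quantitative       3     9       7    16           5     3      15    18
4-Analytical         -     -       6     5           -     -       7     4

Table 30. Distribution of the students’ levels of reasoning for the open and closed parts of Question 9.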
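The shift between pretest and posttest in the open part can be read off the counts in Table 30. The sketch below assumes that each Open/Closed column pair holds pretest and posttest counts in that order. TABLE_30_OPEN, shift and totals are hypothetical names used for illustration.

```python
# Open-part counts from Table 30 as (pretest, posttest); the column order is an
# assumption based on the header order Experimental/Control, Open/Closed.
TABLE_30_OPEN = {
    "experimental": {"Idiosyncratic": (12, 6), "Transitional": (24, 24),
                     "Quantitative": (3, 9)},
    "control": {"Idiosyncratic": (22, 25), "Transitional": (12, 11),
                "Quantitative": (5, 3)},
}


def shift(group):
    """Return posttest minus pretest count per level for the open part."""
    return {level: post - pre for level, (pre, post) in TABLE_30_OPEN[group].items()}


def totals(group):
    """Return (pretest total, posttest total); both should equal the group size."""
    rows = TABLE_30_OPEN[group].values()
    return sum(pre for pre, _ in rows), sum(post for _, post in rows)
```

With this reading, each column adds up to 39 students, and the experimental group loses six idiosyncratic responses and gains six quantitative ones, whereas the control group moves slightly in the opposite direction.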



Firstly, for each part I conducted a two-tailed independent two-sample t-test on the pretest results to evaluate the null hypothesis that there was no difference in students’ ability levels of reasoning at the start of the intervention. For the closed part, the null hypothesis was rejected (t = -3.87, df = 76, two-tailed p-value < 0.001). For the open part, the null hypothesis was also rejected (t = 3.36, df = 76, two-tailed p-value = 0.001). Thus, the experimental and control groups differed statistically significantly at the start. For this reason, I do not compare the two groups with each other below, but only look at the gain within each group of students.
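For readers who want to see the arithmetic behind such a comparison, here is a minimal sketch of a pooled-variance two-sample t statistic. It assumes that the reasoning levels were coded as numbers and that the equal-variance form of the test was used. The thesis does not state the variant, although df = 76 agrees with n1 + n2 - 2 for two groups of 39. The function name and the data are made up, and the p-value is not computed because it requires the t distribution.

```python
import math
import statistics


def independent_t(sample_a, sample_b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df


# Made-up level scores for two tiny groups, for illustration only.
t, df = independent_t([1, 2, 2, 3], [2, 3, 3, 4])
```

The sign of t only reflects which group is entered first, which is why a negative and a positive t can both indicate a significant difference.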

Secondly, I conducted a two-tailed paired t-test for each group of students (N = 39) on the pretest and posttest results to evaluate the null hypothesis that there was no difference in students’ levels of reasoning before and after the intervention. For the experimental group, the null hypothesis was rejected for the closed part (t = 2.51, two-tailed p-value = 0.017), but it was not rejected for the open part (t = 1.97, two-tailed p-value = 0.056). This indicated that in the experimental group there was a statistically significant gain in the closed part, but no statistically significant gain in the open part. For the control group, on the other hand, the null hypothesis was not rejected for either the closed part (t = -0.16, two-tailed p-value = 0.872) or the open part (t = -1.22, two-tailed p-value = 0.230). This indicated that there was no statistically significant gain in the control group. For the closed part this was not a big surprise, because the students in the control group were already at a higher level than those in the experimental group. In retrospect, normalized gain would have been a better variable to use in the statistical analysis.
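The paired comparison and the normalized gain mentioned above can be sketched as follows. Two assumptions are made: levels are coded 1 to 4, and normalized gain is taken in the usual sense of the gain divided by the maximum possible gain, which is not defined in the text. The function names and the data are made up.

```python
import math
import statistics


def paired_t(pre, post):
    """Paired t statistic for post minus pre, with n - 1 degrees of freedom."""
    gains = [after - before for before, after in zip(pre, post)]
    n = len(gains)
    standard_error = statistics.stdev(gains) / math.sqrt(n)
    return statistics.mean(gains) / standard_error, n - 1


def normalized_gain(pre, post, max_level=4):
    """Gain as a fraction of the room left for improvement (None at the ceiling)."""
    if pre >= max_level:
        return None
    return (post - pre) / (max_level - pre)


# Made-up levels for five students, for illustration only.
pre = [1, 2, 2, 3, 1]
post = [2, 2, 3, 4, 2]
t, df = paired_t(pre, post)
```

A student moving from level 3 to level 4 has a raw gain of 1 but a normalized gain of 1.0, whereas a student moving from level 1 to level 2 has the same raw gain and a normalized gain of only one third. This is why normalized gain is less sensitive to a group starting at a higher level.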

Question 10

Below is the histogram of the scores of a mathematics test in two classes.


[Figure: two histograms of test scores, one for Class A and one for Class B. Horizontal axis: Scores (55, 65, 75, 85, 95). Vertical axis: Frequency of scores (0 to 24 in steps of 3).]

a. Comparing the two histograms, one could infer

i. Variation of scores in Class A is higher than in class B. (The scores in

class A vary more than the scores in class B)

ii. Variation of scores in Class B is higher than in class A. (The scores in class B vary more than the scores in class A)

iii. Class A and class B have equal variation of scores.

iv. I don’t know.

b. Why? Give your reason.

Purpose

Finally, in the last question, students were asked to compare the variation in two data sets, in

the form of graphical displays, namely histograms. The ability to understand histograms was

essential here and I wanted to test whether the intervention improved this ability.

Anticipated Answer

In my opinion, the correct answer is that the variation of scores in Class B is higher than the variation of scores in Class A. This problem was taken from Cooper & Shore (2008), but I decided to use the word ‘variation’ instead of ‘variability’. Besides considering both the case value and the frequency aspects of histograms, I thought that students could also answer this question correctly by considering the standard deviation as a measure of variation. I think that this is especially true in the Indonesian curriculum, where students learn how to estimate the mean and the standard deviation of grouped data. Intuitively, as the scores in Class A are more concentrated around the centre than those in Class B, the standard deviation of Class A should be smaller and therefore its variation would be less.
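The grouped-data estimate referred to above can be illustrated with a short sketch. The frequencies below are hypothetical, since the actual bar heights are not reproduced here. The axis values 55 to 95 are assumed to be class midpoints, and the standard deviation divides by n (dividing by n - 1 would give the sample version).

```python
import math


def grouped_mean_sd(midpoints, frequencies):
    """Estimate mean and standard deviation from class midpoints and frequencies."""
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(midpoints, frequencies)) / n
    variance = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies)) / n
    return mean, math.sqrt(variance)


MIDPOINTS = [55, 65, 75, 85, 95]   # assumed class midpoints on the score axis
CLASS_A = [3, 6, 21, 6, 3]         # hypothetical frequencies, peaked at the centre
CLASS_B = [6, 9, 9, 9, 6]          # hypothetical frequencies, more spread out

mean_a, sd_a = grouped_mean_sd(MIDPOINTS, CLASS_A)
mean_b, sd_b = grouped_mean_sd(MIDPOINTS, CLASS_B)
```

Both hypothetical classes have the same mean, but the class whose scores are concentrated around the centre has the smaller standard deviation, which is the intuition described in the anticipated answer.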


Results

Similar to Question 9, by adopting the levels of statistical thinking for ‘Analyzing and Interpreting Data’ from Mooney (2002), I categorized students’ answers into four levels and identified descriptors for each level:

1. Idiosyncratic: students make no inferences or inferences that are not based on the data

or based on irrelevant contextual issues.

Blank or unclear answers and answers based on anecdotal experience are categorized

into this level.

2. Transitional: students make inferences that are primarily based on the data.

Students base their answers on extreme values, certain values or simple graphical

properties. A typical response was one that compared the number of high or low

marks, for example “because class A gets more mark 75 than class B” and “… and

also in class B there are many students who get pretty high marks.” A response based

on the range of values belongs to this level. Another example of reasoning in this level

was the following: “because both diagrams have symmetrical shapes.”

3. Quantitative: students make reasonable inferences based on data and the context.

I consider answers in which a formal statistical measure is the basis for comparing data sets to belong to this level. Students at this level did some further exploration such

as considering a central measure or a measure of variation. One example was the

following answer: “because the average value in class B is a little larger than the

average value in class A.”

4. Analytical: students make reasonable inferences based on the data and the context

using multiple perspectives.

In this level, students consider both the central measure and measure of variation. One

simple example was the consideration of the deviation from the central value: “… and

also values in Class A are far from the average value.”

The modal answer for this question in both the experimental and the control group was that the ‘variation of marks in Class B is higher than that in Class A’ (See Table 31). From Table 31, regarding the distribution of answers, it can be concluded that there was no change between the pretest and posttest results in either group.

Although the modal answer was the answer I considered correct, I could not conclude that students had a good understanding of variation or histograms. Most students in both the experimental and the control group gave explanations that did not make much sense (the idiosyncratic level) or based their reasons only on some invalid informal measures (the transitional level, see Table 32). Popular reasons were, for example, repeating the options, comparing the frequency of the highest mark, or simply comparing the height of the middle bar. This happened in both groups.

Answers Experimental Group Control Group


Variation of marks in Class A is higher than that in Class B 14 15 10 8

Variation of marks in Class B is higher than that in Class A 16 17 10 12

Class A and Class B have equal variation 7 5 5 7

I don't know 2

Blank 2 14 12


Level Experimental Group Control Group


1 Idiosyncratic 10 12 28 31

2 Transitional 27 22 11 8

3 Quantitative 2 5

4 Pre-Analytical 0



Firstly, I conducted a two-tailed independent samples t-test on the pretest results to evaluate the null hypothesis that there was no difference in students’ ability levels of reasoning at the start of the intervention. The null hypothesis was not rejected (t = 1.19, df = 37, two-tailed p-value = 0.240). Thus, the pretest mean scores of the students in the control group were not statistically different from those of the students in the experimental group.

Secondly, I conducted for each group a paired samples t-test on the pretest and posttest results to evaluate the null hypothesis that there was no difference in students’ levels of reasoning before and after the intervention. For the experimental group, the null hypothesis was not rejected (t = 0.22, two-tailed p-value = 0.831). For the control group, the null hypothesis was also not rejected (t = -0.90, two-tailed p-value = 0.373). These results indicated that there was no statistically significant gain in either the experimental group or the control group.

Finally, I conducted a two-tailed independent samples t-test to evaluate the null hypothesis that there was no difference in the mean gains between the experimental and the control group. The null hypothesis was not rejected (t = 0.70, df = 76, two-tailed p-value = 0.487). Thus, there was no statistically significant difference between the mean gain of the experimental group and that of the control group.

5.4. Result of the Interview

I interviewed six students of different levels of readiness from the experimental group after the teaching experiment and the posttest had ended. Because the examination period had started, I could not interview students from the control group. In the interviews, I tried to find out whether I could gain more insight into students’ reasoning in answering the pretest and posttest than the students had shown in their written responses. As mentioned in chapter 5, the students seemed to have some linguistic difficulties. I wanted to check whether students’ reasoning might be more sophisticated than they could express in written form.

The interviews with the students did not give me more insight than I had already obtained from the pretest and posttest. I went through the students’ answers in the pre- and posttest, and the students did not seem to remember what their answers were without looking at their written answers. This might indicate that the reasoning levels of the students were not so high. One of my questions was whether the students did any computation for any question in the test, for example in Question 6, in which pupils were to choose a study partner. It seemed that if they did not indicate it in their answer, then indeed they did not do any computation.

I interviewed a student who was active in the teaching experiment. According to the teacher, this student was an average learner in the class. However, his pretest result was amongst the better ones in the class, and during my interactive lecture and discussion he was one of the very few who came up with ideas. He did his posttest in the teachers’ room because he did not go to school on the day the posttest was given. I had the opportunity to observe his effort in doing the posttest. Due to the bad timing of the posttest (it was the last day of school before the exam period and school ended early), he was, however, impatient to finish. I saw that in Question 2, for example, he gave his answer without his reasoning, so I urged him to do it seriously because he still had much time. When I finally allowed him to finish, he told me that the pretest was not a mathematics test but more of a Bahasa test (language test), and that he had never had such a test before. The result of his posttest was better than that of the pretest, and I considered him among the better students in the class. In my interview, he showed a good interpretation of standard deviation. I asked him the following question:

Researcher (R): Now, if for example you want to sell Durian¹³. You are a Durian farmer, just did harvesting, and you have plenty of Durian. Because you do not have time to sell the Durians yourself, you want to look for two people.

Student (S): yes.

R: Now, suppose there are these two people. If- You hired them too last year, let’s name them A and B.

(I explained that he observed these two persons’ sales performance by recording their daily sales number for a week. He then computed the mean and standard deviation and found out that the means are equal but with different standard deviation.)

R: If A and B’s daily-sales means are equal, for example their mean is selling 50 Durian per day, but the standard deviation for A is 1 and standard deviation for B is 10. Which of them will you hire, if you only want to hire one person? A whose standard deviation is 1 or B whose standard deviation is 10?

S: the one with standard deviation 1, Mam.

R: Why?

(The student paused (thought) for around a minute so I told him he could use his native language if he had difficulty in Bahasa Indonesia)

S: because--- (almost two-minute pause/thinking)

S: because he can sell, in a day, Mam, for example (pause) 30. Maybe (he can sell) the next day 30.

R: Okay. If let say today 30, tomorrow 30. How about the other person?

S: The other person, if today (he can sell) 20, maybe tomorrow (he can sell) 1, and the day after that (he can) reach 50.

¹³ A popular seasonal fruit. I made up different contexts for this kind of question for different students to anticipate a student telling the next interviewee the questions I asked.

From his answer, I infer that he could connect the values of the standard deviation to making predictions. I could not see this understanding of standard deviation in the posttest, for example in Question 6. His answers for Question 6 were the same in the pre- and posttest: anyone could be a study partner because the average marks are the same. He did not seem to consider any measure of variation. This was partly due to the different context of the questions, but it might also indicate that at an early stage of learning statistical reasoning, students need to be directly pointed towards employing a measure of variation. This particular student had shown a glimpse of intuitive reasoning about the term ‘variation’ (in Questions 1 and 2) in the pretest, and his reasoning improved in the posttest.
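The interview scenario can be made concrete with a small numerical sketch. The daily sales below are hypothetical values chosen so that both sellers average 50 Durians per day over a week, with sample standard deviations of 1 and 10. They are not data from the study, and the name summary is illustrative.

```python
import statistics

# Hypothetical daily sales for one week; both sellers average 50 Durians per day.
SELLER_A = [49, 51, 49, 51, 49, 51, 50]   # sample standard deviation 1
SELLER_B = [40, 60, 40, 60, 40, 60, 50]   # sample standard deviation 10


def summary(sales):
    """Return mean, sample standard deviation, and (lowest, highest) day."""
    return statistics.mean(sales), statistics.stdev(sales), (min(sales), max(sales))
```

With these numbers, seller A stays between 49 and 51 every day, while seller B ranges from 40 to 60. This matches the student’s point that the seller with the smaller standard deviation is the more predictable one.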

I asked all the other interviewed students the above question (in different contexts

sometimes) and found that students who did not perform well in the posttest could not answer

the question.

In summary, it seemed that the interviews did not give me more information about the students’ reasoning than I had already observed in the pre- and posttest. My limited experience in interviewing might be a factor. However, I considered the one particular student I described above as one indication of positive effects of the teaching approach. He had some behavioral problems (meaning that he did not really take school seriously; for example, the teacher told me he faked the parent’s signature on his permission letter on the posttest day), and I believe that traditional teaching could let his potential go unnoticed and undeveloped.

5.5. Summary and Analysis of the Findings from Pretest and Posttest

5.5.1. Subtest A

First of all, I noticed that students had difficulty with the wording of the questions of this subtest. In future use, the wording needs to be made more appropriate for pupils. Secondly, the performance of students in the experimental and control groups was comparable. However, the teaching intervention seemed to give students more opportunity to change their prior understanding of the meaning of variation. For both questions in Subtest A, although the improvement in the experimental group was very limited, positive changes were present. The traditional teaching seemed to have less impact on students’ reasoning, as fewer students in the control group showed improvement.

Regarding the meaning of the term ‘variation’, the majority of students still intuitively understood it as ‘diversity’. Among the examples of variation given, I found more examples from the experimental group that showed reasoning connecting ‘variation’ and data distribution. Because students from the experimental group dealt with real data in their learning about variation, I expected many such examples in the posttest results, but the number was not as large as I had hoped.

I found the two types of meaning of variation that were also present in the results of Meletiou’s study (2000), from which the questions were taken. Several university students in Meletiou’s study gave definitions that indicated that they viewed ‘variability’ as ‘variety’ or something that takes multiple values. This is similar to Meaning A that I found: ‘varied or having many kinds’. Other students defined ‘variability’ by equating it with the mathematical notion of ‘variable’. This is similar to the mistake that students in my study made: equating ‘keragaman’ with ‘keseragaman’.

However, for Question 2, the university students in Meletiou’s study gave reasonable answers that indicated their understanding of when to expect high or low variability. In my study, the students did not seem to understand that the question asked for the desirable variation. I suspect that this was because of the students’ limited language proficiency. I found out afterwards that there were students who did not know the meaning of ‘diameter’. It seemed that understanding a text was still somewhat of a challenge for most students. In a future replication of this study, I would make the wording and/or layout of the question easier for this type of student.

5.5.2. Subtest B

Students from both the experimental and the control group did not perform well in Questions 3 and 4, especially in the task of finding the standard deviation. One possible reason was that the students did not use calculators to ease the computational process. Qualitative analysis showed that there was no significant difference in the performance of the students in the experimental and control groups. This indicated that the teaching experiment did not negatively affect the computational and procedural skills of the students.

5.5.3. Subtest C

The results of the t-tests I performed in this subtest are listed in Table 33. Based on the independent two-sample t-tests I performed on the pretest results, I conclude that there was no significant difference in the ability level of students in the two groups (3 out of 4 tests showed no statistically significant differences). The two groups were more or less comparable.

The t-test results for comparing the mean gain between groups did not give a strong indication that the students in the experimental group gained more or improved more in their reasoning level. However, the paired t-test results showed that, out of four questions, students in the experimental group seemed to gain more in one and a half questions: Question 5 and the closed part of Question 9. This is not a strong indication, but it shows positive potential.

Two-tailed t-test                                    Statistical Significance for Question (Yes/No)
                                                     5      6      9                         10
Difference of mean pretest results between groups    N      N      Y                         N
Gain within groups: experimental/control             Y/N    N/N    Y (closed part only)/N    N/N
Difference of mean gains between groups              N      N      not applied               N

Table 33. Summary of the t-test results for Question 5, 6, 9, and 10.

Regarding Questions 7 and 8, I chose not to perform any statistical tests because I decided that the categories of reasoning that I employed were not exactly hierarchical. Again, in these two questions, students from the control group seemed to keep their reasoning intact. There were few students who changed their reasoning after the learning (See Table 28). A striking difference is that more students in the experimental group than in the control group used formal central measures in the posttest. In fact, for Question 8c I could not find any responses from the control group that used the mean or median in their reasoning (See Table 27).


Finally, the results indicated that, regardless of the evidence of improvement, the students were mostly at a low level of reasoning about variation. Many of them were still at the idiosyncratic or prestructural level (Level 1) of reasoning after the learning process. The majority of students were found to be at Level 2, the level at which students only used one aspect of the data (see Tables 19, 22, 30, and 32). I hardly ever found a response that belonged to the highest level (relational level or pre-analytical level), a level which shows good reasoning about variation. Moreover, students rarely used any formal statistical measures when dealing with data.

One possible reason why the teaching experiment did not give a strong indication of improvement was that the students in the experimental group were only exposed to one type of data: the growth data. Jones et al. (2004) concluded that students need to have experiences with different kinds of data to help them move on from their idiosyncratic descriptions. In the original plan, I designed another activity with data on how students spend their time on activities. However, the plan could not be executed due to time constraints.

Another possible cause was, as my analysis of the teaching experiment indicated, the unfamiliarity of the students with open-ended problems. Another study in a similar setting seems to confirm this indication: Sharma (2006) studied 14 to 16 year-old Fijian students’ reasoning in understanding data in the form of tables and graphs through individual interviews. In one of his interview tasks, students were asked to compare the temperature data of two cities in Fiji and conclude which city is warmer. The question was similar to Question 6 in my pretest; the difference was that he presented the data in a table. Out of five students, four answered the question based on their everyday experiences (idiosyncratic or prestructural level in my categories). The students in Sharma’s study were also accustomed to mathematics tasks that expect one single correct answer and were not accustomed to expressing their reasoning verbally.

Finally, I think that students’ insufficient reasoning about central measures and histograms played a role too. It is not uncommon to find that students do not use any statistical measures in comparing data, even after having been taught them in class (cf., Gal et al., 1989; Sharma, 2006). Misconceptions that Lee & Meletiou-Mavrotheris (2003) found in students’ reasoning when comparing two histograms, for example seeing the height of bars as a case value instead of a frequency, also appeared in my study. I hoped to improve the understanding of central measures and histograms as well through the teaching experiment. But perhaps due to the short teaching period and the issues described in chapter four, the experiment has not revealed significant improvement in students’ understanding of central measures and histograms.


6. Conclusions and Discussions

I conclude this thesis by answering my research question and reflection question, describing the limitations of my study, and making recommendations for future research.

6.1. Conclusions

Research Question

To what extent did the student-centred teaching of variation using real data and open-ended tasks help to improve Indonesian social science stream (IPS) students’ reasoning about variation?

My overarching answer to this question is that the student-centred teaching of variation using real data and open-ended tasks provided the social science students in this particular study with a more conducive learning opportunity to develop their reasoning about variation. In all questions in the pretest and posttest, students in the control group showed little change in their answers. In the experimental group, I observed more change between the pretest and the posttest, albeit not as much as I had hoped. This is most probably due to the short duration of the teaching experiment. Students in the control group, on the other hand, seemed to keep whatever ideas they had prior to the teaching of variation. The traditional teaching seemed neither to add to nor to change students’ reasoning.

To go into more detail, I come back to my framework of two knowledge areas (Garfield & Ben-Zvi, 2005) that I hoped the teaching experiment would help to develop (see Appendix A).

Developing Intuitive Ideas of Variation

In this knowledge area of variation, I can focus on two things, namely whether students were

able to:

1. see that variation is present in both qualitative and quantitative variables and to see

data as an entity or aggregate.

2. see that variation can be expected to be high or low depending on the sources and

context of the variation.

In the first question of the pre- and posttest, I asked the students for their definition of ‘variation’ and/or an example. The responses show that developing ideas of data as an entity or aggregate is indeed not an easy task for students. Partly due to the justification I gave in the pretest (see Chapter 5, p. 52), the majority of students in the experimental group and the control group defined ‘variation’ as ‘varied or having many kinds’, which indicates the idea of variation in qualitative variables. Therefore I looked more closely at the examples the students had given. In the control group, I only observed one variable: marks. I saw more examples of quantitative variables in the posttest results of the experimental group, for example human height and weight, sizes of Durian fruit, and marks. However, those are variables that the students and I talked about in the classroom. Therefore, I conclude that the teaching approach gives more opportunity for teachers and students to discuss data as an entity or aggregate. For example, sizes of Durian fruit came up in discussion unplanned. In particular, the use of real data can start students thinking of ‘variation’ as ‘an entity in a distribution’, instead of only ‘variation of individual values’.

Regarding the ability of students to reason about the desirable variation, the majority of students from both groups did not show any consideration of variation. However, the quantitative analysis indicated that the individual positive changes in the experimental group were bigger than those in the control group.

Using Variation to Make Comparisons

Learning to reason about variation is a long and gradual process. The majority of students from both groups were at a low level of reasoning: Level Unistructural in the SOLO taxonomy of Biggs and Collis (1982) or Level Transitional in Mooney’s framework (2002). At this level, the students were mostly using only one aspect of data sets in making comparisons within or between groups, for example the extreme values or some standard values. Despite this, the results of the quantitative analysis showed that the mean gain of the experimental group in some of the questions (Questions 5 and 9) was statistically significant, while the mean gain of the control group was not. Thus, the students in the experimental group seemed to gain more. From the qualitative analysis, the teaching approach seemed to help students start using the mean in comparing data sets.

In summary, the teaching approach provided students with a more conducive learning opportunity to develop ideas of variation as an entity and to see variation in quantitative variables. Students also started to develop ideas of using the mean for comparisons within or between groups. With more exposure to the teaching approach, I am optimistic that the improvement of students’ reasoning about variation would be more substantial.


Reflection Question

What recommendations for the teaching of measures of variation in Indonesian secondary

school curriculum followed from the teaching experiment?

I answer this question from my own perspectives as both a researcher and a beginning teacher. As the researcher who designed this study, I identified and selected appropriate teaching concepts and principles (see Subsection 2.3). I believed that, once implemented, those teaching principles would help students to develop their statistical reasoning, particularly reasoning about variation. On the other hand, I am also a beginning teacher who had no experience with students in a social science stream. In this conclusion, I select several principles I have tested and reflect on my teaching experiment in order to come up with recommendations.

1. The Use of Real Data: Human Growth

In my study, I tested the suggestion to use real data within a context instead of artificial data, which are merely numbers without context and therefore meaningless. The feedback from the questionnaires was positive. The majority of students considered that the use of real data:

makes the learning of statistics interesting and fun;

makes it easier to understand the concepts (of standard deviation);

makes the students see the real-life application of statistics.

Unfortunately, human growth was not a familiar context for the particular students in my study. To some extent, this unfamiliarity was one of the factors that affected the students’ engagement in the activities. This problem would have been less likely to occur had I known the students’ background, or had the context been chosen by the regular teacher, who knows the students’ background very well. Therefore, I recommend the use of real data in the teaching of statistics within a context that is close to the students’ experience and background.

2. The Context of Teaching Variation: Doing Data Analysis

In my study I asked the students to do (real) data analysis of some given data sets in the first two lesson activities and then to create their own data in a following activity. I wanted to test which sequence works best in data analysis activities: should we start with letting students analyze their own data or with analyzing given data sets? Unfortunately, the third activity in my plan was not realized, so I cannot comment much on this sequence. However, after the first lesson, the cooperating teacher was inspired to do similar activities with data from the students themselves: let students measure themselves. Will it work better?

From the teaching experience in the study, the main problem I faced was that the students’ prerequisite knowledge was not sufficient. First, the students had to some extent mastered the procedural knowledge of central measures, namely the mean and median, but they were not able to reason with them. In comparing data sets, the students usually used extreme values or some values related to their experience. The use of the median did not appear. Secondly, the students’ understanding of histograms was poor. The usual teaching separates data representation, central measures, and measures of variation, and this is probably why students could not connect all these concepts when dealing with data.

In this study, the students had been taught the concepts of data representation and central measures in a teacher-centred approach and started doing data analysis only when learning the concepts of variation. Doing data analysis for the first time while reasoning about the standard deviation, which is one of the most difficult basic concepts, seemed to be problematic. This is especially so because I linked the central measures and measures of variation in the activities. I recommend doing data analysis right from the beginning of the statistics unit without separating data representation, central measures, and measures of variation, and I recommend having students collect their own data to analyze.

3. The Teaching Approach: Student-Centredness, Open Tasks, Group Work and Linking Data

Representations, Central Measures and Measures of Variation

The first three components of the teaching approach were new to the students and to me (as

the teacher), and I believe this is why the lessons did not go as well as I had hoped and

expected. The participating students had not performed well in mathematics.

When they first dealt with open tasks, I could see that they were unsure of what to do because

they were used to closed tasks. For example, the question “what is the mean of the fish

weight?” is more familiar and less confusing to them than the question “what is the true

weight of the fish?” When working in groups, students tended to work individually and went to

other groups to check answers. In groups of low-achieving students, it was even

worse because no member was willing to take charge. As their teacher, I

also had difficulties in managing 10 groups. Because none of the groups worked very well, I had

difficulty giving assistance or scaffolding to all groups in time.

Despite the difficulties, the students in the experimental group did not perform worse

than the students in the control group, even regarding procedural knowledge. In addition,

the group work was getting better in the third lesson. Reflecting upon my experience, I can

recommend this teaching approach. What must be taken into account is that teachers need to

be patient at first, in the sense of relaxing the time schedule for finishing the tasks,

especially if the students suffer from mathematics anxiety. Regarding curriculum demands

and time constraints, teachers can stress procedural knowledge in the homework, as

long as they make sure students care about their learning and do the homework.14

6.2. Limitations of my Study and Suggestions for Future Research

There are four main limitations of my study. Firstly, the study was conducted in a short period

of time. In fact, it was conducted toward the end of the semester, near the exam period. I had to

borrow other teachers' classes to complete the activities, especially for the control group,

which started the lessons about variation a little later than the experimental group.

Secondly, the study only involved students from a social science stream. It would give a more

complete picture of students' reasoning if the study also involved students from the natural

science stream or the language stream. Thirdly, the control and experimental groups were taught

by different teachers. Cognitively, the students are comparable. However, based on my

observation, there seemed to be differences in behaviour, for example in how seriously they

took their mathematics learning in class. Fourthly, the language of instruction in this

study was the students' second language, not the everyday language spoken in the

district. The socio-economic condition of the students had not given them enough exposure to

the language of instruction. All these limitations restrict the generalizability of my

conclusions.

Therefore, to obtain generalizable results, a longitudinal study or a large cross-sectional

study would give much better information about aspects of the best teaching practice and

the development of students' statistical reasoning. The results of this case study have indicated

that using real data analysis in a socio-constructivist approach is promising, even for the

students in this study, whose prior education could be categorized as less than optimal. It

would be enlightening to compare how students from other streams would perform

with the students of the social science stream in this study, or to do cross-sectional studies with

students from the social science stream in many regions. A longitudinal study could also address

the issue I dealt with in the teaching, namely that the students were new to the

socio-constructivist approach.

The limitations of this study also prevented me from making intensive use of ICT in the

teaching. It would also be informative to see whether the teaching approach deployed here,

combined with more intensive use of relevant ICT, could lead to better results. Further study

on the benefit of using real data analysis in a socio-constructivist approach plus ICT might

give us a better idea of students' reasoning about variation.

14 In my teaching experiment, students did not finish their homework.

Finally, in this study I deliberately separated the statistics unit and probability unit in

the learning of variation. I did not use probability contexts in the teaching and learning due to

the structure of the Indonesian curriculum. Trying out the approach I used here in probabilistic

contexts might be beneficial to the students in broadening their understanding of and

reasoning about variation.


References

Badan Litbang Puskur. (2007). Kajian Kebijakan Kurikulum Mata Pelajaran Matematika. Indonesia:

Ministry of National Education. Retrieved 6 September 2009 from

www.puskur.net/download/prod2007/50_kajian kebijakan kurikulum matematika.pdf

Batubara, J., Alisjahbana, A., Gerver-Jansen, A.J.G.M., Alisjahbana, B., Sadjimin, T., Juhariah, Y.T.,

Tririni, A., Padmosiwi, W.I., Listiaty, T., Delemarre-van de Waal, H.A., & Gerver, W.J. (2006).

Growth Diagrams of Indonesian Children. The Nationwide Survey of 2005. Paediatrica Indonesiana,

46 (5-6), 118-126.

Ben-Zvi, D., & Garfield, J. (Eds) (2004). The Challenge of Developing Statistical Literacy, Reasoning,

and Thinking. Dordrecht: Kluwer Academic Publishers.

Biggs, J.B., & Collis, K.F. (1982). Evaluating the Quality of Learning: The SOLO taxonomy

(Structure of the Observed Learning Outcome). London, UK: Academic Press.

Chance, B.L. (2002). Components of Statistical Thinking and Implications for Instruction and

Assessment. Journal of Statistics Education, 10 (3). Retrieved 27 July 2009 from

www.amstat.org/publications/jse/v10n3/chance.htm

Cobb, G.W., & Moore, D.S. (1997). Mathematics, Statistics, and Teaching. The American

Mathematical Monthly, 104 (9), 801-823.

Cooper, L. L., & Shore, F. S. (2008). Students' Misconceptions in Interpreting Center and Variability

of Data Represented via Histograms and Stem-and-Leaf Plots. Journal of Statistics Education, 16 (2).

Retrieved 28 July 2009 from www.amstat.org/publications/jse/v16n2/cooper.html

Curcio, F.R. (1987). Comprehension of Mathematical Relationships Expressed in Graphs. Journal for

Research in Mathematics Education, 18(5), 382-393.

Gal, I., Rothschild, K., & Wagner, D.A. (1989). Which group is better? The Development of Statistical

Reasoning in School Children. Paper presented at the meeting of the Society for Research in Child

Development, Kansas City, KS. Retrieved 21 July 2009 from www.eric.ed.gov/PDFS/ED315270.pdf

Garfield, J. (2003). Assessing Statistical Reasoning. Statistics Education Research Journal, 2 (1), 22-

38.

Garfield, J., & Ben-Zvi, D. (2005). A Framework for Teaching and Assessing Reasoning about

Variability. Statistics Education Research Journal, 4 (1), 92-99.

Garfield, J., & Ben-Zvi, D. (2007). How Students Learn Statistics Revisited: A Current Review of

Research on Teaching and Learning Statistics. International Statistical Review, 75 (3), 372-396.

Garfield, J., & Ben-Zvi, D. (2008). Developing Students' Statistical Reasoning. New York: Springer

Verlag.

Groth, R.E. (2003). Development of A High School Statistical Thinking Framework. Dissertation.

Retrieved from: www.stat.auckland.ac.nz/~iase/publications/dissertations/03.groth.disertation.pdf

Jones, G.A., Langrall, C.W., Mooney, E.S., & Thornton, C.A. (2004). Models of Development in

Statistical Reasoning. In J. Garfield, & D. Ben-Zvi (Eds.). The Challenge of Developing Statistical

Literacy, Reasoning, and Thinking (pp. 97-117). Dordrecht: Kluwer Academic Publishers.


Jones, G. A., Langrall, C. W., Thornton, C.A., Mooney, E.S., Wares, A., Jones, M.R., Perry, B., Putt,

I.J., & Nisbet, S. (2001). Using Students' Statistical Thinking to Inform Instruction. Journal of

Mathematical Behavior, 20 (1), 109-144.

Jones, G.A., Thornton, C.A., Langrall, C.W., Mooney, E.S., Perry, B., & Putt, I.J. (2000). A

Framework for Characterizing Children's Statistical Thinking. Mathematical Thinking and Learning,

2(4), 269-307.

Lee, C., & Meletiou-Mavrotheris, M. (2003). Some Difficulties of Learning Histograms in

Introductory Statistics. In 2003 Proceedings of the American Statistical Association, Statistics

Education Section, pp. 2326-2333. Alexandria, VA: American Statistical Association. Retrieved

from: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.8456&rep=rep1&type=pdf

Konold, C. & Pollatsek, A. (2002). Data Analysis as the Search for Signals in Noisy Processes.

Journal for Research in Mathematics Education, 33 (4), 259-289.

Meletiou, M.M. (2000). Developing Students' Conceptions of Variation: An Untapped Well into

Statistical Reasoning. Dissertation. Retrieved from:

www.stat.auckland.ac.nz/~iase/publications/dissertations/00.Meletiou.Dissertation.pdf

Mooney, E.S. (2002). A Framework for Characterizing Middle School Students' Statistical Thinking.

Mathematical Thinking and Learning, 4 (1), 23-63.

Moore, D.S. (1997). New Pedagogy and New Content: The Case of Statistics. International Statistical

Review, 65 (2), 123-137.

Moore, D.S., & McCabe, G.P. (2005). Introduction to The Practice of Statistics (5th Ed.). New York:

W.H. Freeman & Company.

Reading, C., & Shaughnessy, J.M. (2004). Reasoning About Variation. In J. Garfield, & D. Ben-Zvi

(Eds.). The Challenge of Developing Statistical Literacy, Reasoning, and Thinking (pp. 209-226).

Dordrecht: Kluwer Academic Publishers.

Sembiring, R.K., Hadi, S., & Dolk, M. (2008). Reforming Mathematics Learning in Indonesian

Classrooms through RME. ZDM Mathematics Education, 40 (6), 927-939.

Sharma, S. (2006). High School Students Interpreting Tables and Graphs: Implications for Research.

International Journal of Science and Mathematics Education, 4 (2), 241-268.

Shaughnessy, J.M. (2007). Research on Statistics Learning and Reasoning. In F.K. Lester, Jr. (Ed.),

Second Handbook Of Research On Mathematics Teaching And Learning (pp. 957-1009). Charlotte,

NC: Information Age Publishing.

Shaughnessy, J. M., Garfield, J., & Greer, B. (1996). Data Handling. In A.J. Bishop, K. Clements, C.

Keitel, J. Kilpatrick, & C. Laborde (Eds.), International Handbook of Mathematics Education (pp.

205–237). Dordrecht: Kluwer Academic Publishers.

Sumanto, Y.D., Kusumawati, H., & Aksin, N. (2008). Gemar Matematika 6 untuk SD/MI Kelas VI.

Jakarta: Pusat Perbukuan Departemen Pendidikan Nasional.

Watson, J.M., Kelly, B.A., Callingham, R.A., & Shaughnessy, J.M. (2003). The Measurement of School

Students' Understanding of Statistical Variation. International Journal of Mathematical Education in

Science and Technology, 34 (1), 1-29.

Web ARTIST Project. (2005). Comprehensive Assessment of Outcomes for a first course in Statistics

(CAOS) 4. Retrieved 28 July 2009 from https://app.gen.umn.edu/artist/


Extended Bibliography

Ben-Zvi, D. (2004). Reasoning about Variability in Comparing Distributions. Statistics Education

Research Journal, 3 (2), 42-63.

Cobb, P. (1999). Individual and Collective mathematical Development: The Case of Statistical Data

Analysis. Mathematical Thinking and Learning, 1 (1), 5-43.

Delmas, R., Garfield, J., Ooms, A., & Chance, B. (2007). Assessing Students' Conceptual

Understanding After a First Course in Statistics. Statistics Education Research Journal, 6 (2), 28-58.

Garfield, J. (2002). The Challenge of Developing Statistical Reasoning. Journal of Statistics

Education, 10 (3).

Groth, R.E. (2005). An Investigation of Statistical Thinking in Two Different Contexts: Detecting a

Signal in a Noisy Process and Determining a Typical value. Journal of Mathematical Behavior, 24 (2),

109-124.

Hancock, C. (1992). Authentic Inquiry with Data: Critical Barriers to Classroom Implementation.

Educational Psychologist, 27 (3), 337-364.

Lehrer, R., & Schauble, L. (2000). Inventing Data Structures for Representational Purposes:

Elementary Grade Students' Classification Models. Mathematical Thinking and Learning, 2 (1&2),

51-74.

Lehrer, R., & Schauble, L. (2004). Modeling Natural Variation through Distribution. American

Educational Research Journal, 41 (3), 635-679.

Lehrer, R., & Kim, M. (2009). Structuring Variability by Negotiating its Measure. Mathematics

Education Research Journal, 21 (2), 116-133.

Pfannkuch, M. (2005). Thinking Tools and Variation. Statistics Education Research Journal, 4 (1), 83-

91.

Makar, K., & Confrey, J. (2005). “Variation-Talk”: Articulating Meaning in Statistics. Statistics

Education Research Journal, 4 (1), 27-54.

Moore, D.S. (1990). Uncertainty. In L. Steen (Ed.) On the Shoulders of Giants: New Approaches to

Numeracy (pp. 95-137). Washington, DC: National Academy Press.

Reading, C. (2004). Student Description of Variation while Working with Weather Data. Statistics

Education Research Journal, 3 (2), 84-105.

Reading, C., & Reid, J. (2006). An Emerging Hierarchy of Reasoning about Distribution: From a

Variation Perspective. Statistics Education Research Journal, 5 (2), 42-68.

Torok, R., & Watson, J. (2000). Development of the Concept of Statistical variation: An Exploratory

Study. Mathematics Education Research Journal, 12 (2), 147-169.

Watson, J.M., & Kelly, B.A. (2002). Variation as part of chance and data in Grades 7 and 9, in B.

Barton, K.C. Irwin, M. Pfannkuch and M.O.J. Thomas (Eds.). Mathematics Education in the South

Pacific: Proceedings of the 25th Annual Conference of the Mathematics Education Research Group of

Australasia, Auckland, NZ, Vol. 2, MERGA Sydney, 682-689.

List of Appendices

Appendix A. Garfield and Ben-Zvi's Framework for Assessing Reasoning about Variability.

Appendix B. Students' Activity Sheets.

Appendix C. Pretest/Posttest.

Appendix D. The Questionnaire.

Appendix A. Garfield and Ben-Zvi’s Framework for Assessing Reasoning about Variability

Garfield, J., & Ben-Zvi, D. (2005). A Framework for Teaching and Assessing Reasoning about

Variability. Statistics Education Research Journal, 4 (1), 92-99.

1. Assessment - Developing intuitive ideas of variability

Items that provide descriptions of variables or raw data sets (e.g., the ages of children in a grade school, or the height of these children) and asking students to describe variability or shape of distribution.

Items that ask students to make predictions about data sets that are not provided (e.g., if the students in this class were given a very easy test, what would you predict for the expected graph and expected variability of the test scores?).

Given a context, students are asked to think of ways to decrease the variability of a variable (e.g., measurements of one student's jump).

Items that ask students to compare two or more graphs and reason about which one would have larger or smaller measures of variability (e.g., Range or Standard Deviation).

2. Assessment - Describing and representing variability

Items that provide a graph and summary measures, and ask students to interpret it and write a description of the variability for each variable.

Items that ask students to choose appropriate measures of variability for particular distributions (e.g., IQR for skewed distribution) and select measures of center that are appropriate (e.g., median with IQR, mean with SD).

Items that provide a data set with an outlier and ask students to analyze the effect on different measures of spread if the outlier is removed. Or, given a data set without an outlier, asking students what effect adding an outlier will have on measures of variability.

Items that ask students to draw graphs of distributions for data sets with given center and spread.

3. Assessment - Using variability to make comparisons

Items that present two or more graphs and ask students to make a comparison either to see if an intervention has made a difference or to see if intact groups differ. For example, asking students to compare two graphs to determine which one of two medicines is more effective in treating a disease, or whether there is a difference in length of first names for boys and girls in a class.

Items that ask students which graph shows less (or more) variability, where they have to coordinate shape, center, and different measures of spread.

4. Assessment - Recognizing variability in special types of distributions

Items that provide the mean and standard deviation for a data set that has a normal distribution and students are asked to use these to draw graphs showing the spread of the data.

Items that provide a scatterplot for a specific bivariate data set and students have to consider if values are outliers for either the x or y variables or for both.

Items that provide graphs of bivariate data sets where students are asked to determine if the variability in one variable (y) can be explained by the variability in the other variable (x).


5. Assessment - Identifying patterns of variability in fitting models

Items that ask students to determine if a set of data appear normal, or if a bivariate plot suggests a linear relationship, based on scatter from a fitted line.

6. Assessment - Using variability to predict random samples or outcomes

Items that provide students choices of sample statistics (e.g., proportions) from a specified population (e.g., colored candies) for a given sample size and ask which sequence of statistics is most plausible.

Items that ask students to predict one or more samples of data from a given population.

Items that ask students which outcome is most likely as a result of a random experiment when all outcomes are equally likely (e.g., different sequences of colors of candies)

Items that ask students to make conjectures about a sample statistic given the variability of possible sample means.

7. Assessment - Considering variability as part of statistical thinking

Items that give students a problem to investigate along with a data set, that requires them to graph, describe, and explain the variability in solving the problem.

Items that allow students to carry out the steps of a statistical investigation, revealing if and how the students consider


Appendix B. Students’ Activity Sheets

Activity 1 Am I Normal?

As we have discussed, it is common practice to check a child's height, weight, head

circumference, etc. to see whether his or her growth is normal.

In this activity, you will use the following data to create an easy rule to decide whether a boy

or girl of your age is growing normally, based on his or her height and weight.

Below are the height and weight data of 16-year-old boys and girls in Jakarta. These data

were collected in 2005 by a PhD student of VU University, Amsterdam, for the making of an

Indonesian growth chart.

          Boys                        Girls
Height (cm)  Weight (kg)    Height (cm)  Weight (kg)
   170.2        49.7           156.4        61.5
   175.3        43.9           153.5        44.8
   168.0        89.7           155.7        46.8
   161.4        48.8           160.4        58.7
   175.9        62.3           164.0        53.6
   169.7        51.4           150.9        45.5
   166.2        55.5           154.4        45.2
   148.9        82.8           152.8        53.2
   163.2        52.2           150.7        38.7
   164.4        40.2           157.4        38.7
   162.8        62.9           142.7        38.6
   159.3        42.7           164.8        48.3
   158.9        63.7           165.1        40.2
   165.0        46.9           158.6        42.4
   172.8        55.0           150.8        44.6
   159.9        50.8           146.6        50.1
   159.2        46.6           149.1        63.5
   169.7        73.3           150.2        44.2
   167.3        42.1           157.4        37.4
   167.3        40.0           146.9        43.8

a. Make a histogram of the boys' height and weight data. From that histogram, what can

you say about boys' height and weight? How is the data spread out?

b. Make a rule that allows you to determine whether a 16-year-old boy or girl has a

height that is:

- Very common;

- Still normal, but needs attention;

- Abnormal, which does not mean there is a health problem, only that it needs to be

checked.

Explain how your rule works, why it might be a good one, and how it could work in

practice.

If your rule uses numbers, you must explain how you compute those numbers.

c. Make an easy visual aid (for example, a table or a diagram) that allows you to:

- Quickly apply your rule;

- Explain it to others, for example, to your classmates or your parents.
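One possible answer to task b can be sketched in code. The rule below is only an illustration, not the rule expected from students or prescribed in the study: it assumes that a height within one standard deviation of the mean counts as "very common", within two standard deviations as "still normal, but needs attention", and anything further away as "should be checked". The function name `classify_height`, the cut-offs, and the choice of the population standard deviation are all assumptions made for this sketch.

```python
import statistics

# Boys' heights (cm) taken from the Activity 1 data table.
boys_height = [
    170.2, 175.3, 168.0, 161.4, 175.9, 169.7, 166.2, 148.9, 163.2, 164.4,
    162.8, 159.3, 158.9, 165.0, 172.8, 159.9, 159.2, 169.7, 167.3, 167.3,
]

mean_height = statistics.mean(boys_height)
# Population standard deviation; using it (rather than the sample SD) is an assumption.
sd_height = statistics.pstdev(boys_height)


def classify_height(height_cm: float) -> str:
    """Hypothetical rule: classify a height by its distance from the mean in SD units."""
    distance = abs(height_cm - mean_height) / sd_height
    if distance <= 1:
        return "very common"
    if distance <= 2:
        return "still normal, but needs attention"
    return "should be checked"


for height in (165.0, 175.9, 148.9):
    print(height, "->", classify_height(height))
```

A rule of this kind links a central measure to a measure of variation, which is the connection the activity aims at; other rules (for example, based on the median and quartiles) would be equally defensible.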


Activity 2a Who is Taller?

In 2005, a PhD student conducted a study about the growth of Indonesian children and

created a growth chart. Below is the histogram of boys' height data collected in Jakarta for this

study.

[Figure: Histogram of Boys' Height in Jakarta. Horizontal axis: Height (cm), labelled from 150 to 174; vertical axis: Frequency of height, from 0 to 12.]

2.a. Compute the mean and standard deviation of the boys' height in Jakarta, based on the

histogram above. Show your work/computation.


Activity 2b Who is Taller?

In 2007, the Ministry of Health had a social survey carried out in all provinces of Indonesia.

This survey covered many topics in public health. The histogram below shows the height data

of boys in Bengkulu obtained from this survey.

[Figure: Histogram of Boys' Height in Bengkulu. Horizontal axis: Height (cm), labelled from 105 to 177 in steps of 3; vertical axis: Frequency of height, from 0 to 30.]

The mean of the raw data of boys’ height in Bengkulu is 154.7.

On the next page, two histograms of boys’ height in Bengkulu and Jakarta are shown. Use

these histograms to answer the following questions.

2b.1. Without doing any computation, based only on the two histograms on the next page, is

the standard deviation of Bengkulu’s data higher than the standard deviation of

Jakarta’s data (6.1)? Give your reasons.

2b.2. Now check your answer to 2b.1 by computing: What is the standard deviation of the

boys’ height in Bengkulu?

2b.3. Can you conclude that boys in Jakarta are taller than boys in Bengkulu? Explain your

reason.

2b.4. What makes the histogram of boys’ height in Bengkulu look like the histogram

above?

[Figure: Histogram of Boys' Height in Bengkulu. Horizontal axis: Height (cm), labelled from 105 to 177 in steps of 3; vertical axis: Frequency of height, from 0 to 30.]

[Figure: Histogram of Boys' Height in Jakarta, drawn on the same scales. Horizontal axis: Height (cm), labelled from 105 to 177 in steps of 3; vertical axis: Frequency of height, from 0 to 30.]

Where does the time fly? Homework

I have collected from your questionnaires the data on the time that you spend for various activities.

Analyze the data (on the next page) and work in groups to answer the questions below:

a) On which activity do students in your class spend most time per week? Give your reasons.

b) Which activity is the most popular, that is, the one the most students participate in? Is this the same activity as the one identified in a)?

c) On which activity do students in your class spend the least amount of time per week? Give your reasons.

d) Which activity is the least popular? Is this also the one on which students in your class spend the least amount of time in a week?


Activity 3

Where does the time fly?

Rural Vs Urban?

In this activity you are to analyze similar data (on the next page), which were collected from SMAN No.5 Bengkulu.

There are 9 activities on the data sheet. Choose just one activity to be analyzed, for example doing homework.

We name the activity you have chosen as activity X.

Compare this data (of activity X) with the data of activity X from your class and answer the following question:

“Who spends more time on activity X: students from Bengkulu city or Lebong?”

Explain your reasons!

No  Activities                                       Number of Hours Per Week   Frequency

1.  Doing Homework                                   0-2
                                                     3-4
                                                     5-6
                                                     6-8
                                                     8-10
                                                     11-12
                                                     13-14

2.  Reading (not school work)                        0-2
                                                     3-4
                                                     5-6
                                                     6-8
                                                     8-10
                                                     11-12
                                                     13-14

3.  Playing computer, Play Station or video games    0-2
                                                     3-4
                                                     5-6
                                                     6-8
                                                     8-10
                                                     11-12
                                                     13-14

4.  Watching TV, videos or movies                    0-2
                                                     3-4
                                                     5-6
                                                     6-8
                                                     8-10
                                                     11-12
                                                     13-14

5.  Playing or listening to music                    0-2
                                                     3-4
                                                     5-6
                                                     6-8
                                                     8-10
                                                     11-12
                                                     13-14

6.  Doing jobs at home                               0-2
                                                     3-4
                                                     5-6
                                                     6-8
                                                     8-10
                                                     11-12
                                                     13-14

7.  Working for pay outside the home                 0-2
                                                     3-4
                                                     5-6
                                                     6-8
                                                     8-10
                                                     11-12
                                                     13-14

8.  Participating in sports                          0-2
                                                     3-4
                                                     5-6
                                                     6-8
                                                     8-10
                                                     11-12
                                                     13-14

9.  Hanging out with friends                         0-2
                                                     3-4
                                                     5-6
                                                     6-8
                                                     8-10
                                                     11-12
                                                     13-14

Questionnaire

Name : __________________ School: _________________________

Age :__________________ Grade : ___________________

In the last week, approximately how much time did you spend on each of the following activities?

No Activities Number of Hours Per Week

1. Doing Homework

2. Reading (not school work)

3. Playing computer, Play Station or video games

4. Watching TV, videos or movies

5. Playing or listening to music

6. Doing jobs at home

7. Working for pay outside the home

8. Participating in sports

9. Hanging out with friends


Appendix C. Pretest/Posttest

Name: __________________________

Class : __________________________

Do the following problems as carefully and as best as you can.

You may use a calculator if needed.

1. Based on your experience, what does variation mean to you? Give an explanation and/or

an example.

2. For each of the following cases, answer the following question:

“Which is more desirable: high variation or low variation?”

Add your reason.






3. Given the data: 11, 32, 17, 34, 24, 15, 28


Range

Mean

Median

Standard Deviation

Interquartile Range
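For reference, the five measures listed in item 3 can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the item does not say whether the population or the sample standard deviation is intended, so both are shown, and the quartile convention (Python's default "exclusive" method) is likewise an assumption, since textbooks differ on how quartiles are computed.

```python
import statistics

data = [11, 32, 17, 34, 24, 15, 28]

data_range = max(data) - min(data)
mean = statistics.mean(data)
median = statistics.median(data)

# Two common conventions for the standard deviation.
sd_population = statistics.pstdev(data)  # divides by n
sd_sample = statistics.stdev(data)       # divides by n - 1

# Quartiles with the default "exclusive" method of statistics.quantiles.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print("range:", data_range)
print("mean:", mean)
print("median:", median)
print("SD (population):", round(sd_population, 2))
print("SD (sample):", round(sd_sample, 2))
print("IQR:", iqr)
```

Note that the mean (23) and the median (24) are close here, so the item mainly tests procedural knowledge rather than the choice between measures.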


Monthly Income (in thousand Euros)    Number of People
3 – 5                                 3
6 – 8                                 4
9 – 11                                9
12 – 14                               6
15 – 17                               2
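Assuming this table is meant for computing summary measures from grouped data, the usual procedure is to replace each class by its midpoint and weight it by the class frequency. The sketch below illustrates that procedure; the use of class midpoints and of the population standard deviation are assumptions, and the variable names are chosen only for this illustration.

```python
import math

# Income classes (in thousand Euros) and their frequencies, from the table.
classes = [(3, 5), (6, 8), (9, 11), (12, 14), (15, 17)]
frequencies = [3, 4, 9, 6, 2]

# Represent every class by its midpoint (an approximation of the raw data).
midpoints = [(low + high) / 2 for low, high in classes]
n = sum(frequencies)

grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
grouped_variance = sum(
    f * (m - grouped_mean) ** 2 for f, m in zip(frequencies, midpoints)
) / n
grouped_sd = math.sqrt(grouped_variance)

print("n:", n)
print("mean:", grouped_mean)
print("SD:", round(grouped_sd, 2))
```

The same procedure applies to reading a mean and standard deviation off a histogram, as in Activity 2a, once the bar heights are known.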



5. Four histograms and two descriptions of data are displayed below.

i. A data set of Mathematics test scores where the test was very easy

ii. A data set of wrist circumferences of newborn female babies (measured in

centimeters).


a. Which histogram best matches the data in description (i)? Give your reason.

b. Which histogram best matches the data in description (ii)? Give your reason.

6. Two students who took mathematics tests received the following scores (out of 100):

Student A: 60, 90, 80, 60, 80

Student B: 40, 100, 100, 40, 90

If you had an upcoming mathematics test next week, who would you prefer to be your

study partner, A or B? Why?
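The point of item 6 becomes visible when the central measure and the spread of the two score lists are computed side by side. The following sketch does this with the standard library; the population standard deviation is used here as an assumption, and the item itself leaves the choice of measures to the student.

```python
import statistics

scores = {
    "A": [60, 90, 80, 60, 80],
    "B": [40, 100, 100, 40, 90],
}

# For each student: (mean, population standard deviation).
summary = {
    name: (statistics.mean(values), statistics.pstdev(values))
    for name, values in scores.items()
}

for name, (mean, sd) in summary.items():
    print(f"Student {name}: mean = {mean}, SD = {sd:.1f}")
```

Both students have the same mean, so any preference has to be argued from the variation in their scores, which is the kind of reasoning the item is designed to elicit.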

7. One day Jeroen caught a very big catfish. He wanted to be sure of the weight of the fish

and therefore he weighed it 7 times on the same scale/balance. Below are the

measurements (in kilograms) that he found:

2.9; 2.7; 5.1; 3.1; 3.0; 2.8; 3.0 kg.

e. How spread out are the measurements he obtained?

f. What do you think the true weight of the catfish was, in kilograms? Give your

reason.
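The effect of the single deviating measurement (5.1 kg) on different estimates of the "true weight" can be illustrated as follows. This is only a sketch of three candidate estimates; treating 5.1 kg as a measurement error and dropping it is an assumption a student would have to justify, not a prescribed answer.

```python
import statistics

weights = [2.9, 2.7, 5.1, 3.1, 3.0, 2.8, 3.0]  # kilograms

mean_all = statistics.mean(weights)
median_all = statistics.median(weights)

# Drop the largest value, assuming it is a measurement error.
trimmed = [w for w in weights if w != max(weights)]
mean_without_outlier = statistics.mean(trimmed)

print("mean of all measurements:", round(mean_all, 2))
print("median of all measurements:", median_all)
print("mean without 5.1:", round(mean_without_outlier, 2))
```

The median and the mean without the deviating value both lie close to 3 kg, whereas the mean of all seven measurements is pulled upward, which is why the question about the "true weight" is more open than a question about the mean.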

8. The histogram below shows the number of hours of exercise per week by the marketing

staff of a bank.

[Figure: Histogram of Number of Exercising Hours Per Week. Horizontal axis: Number of Hours, from 0 to 8; vertical axis: Number of People, from 0 to 9.]

g. Compute the median. _______________

h. Compute the mean. ________________

i. Based on the histogram, what is the typical number of hours of exercise per

week of the staff in this company? Give your reason.

9. Forty college students participated in a study of the effect of sleep on test scores. Twenty

of the students studied all night before the tests in the following morning (no-sleep group)


The test scores for each group are shown in the diagrams below. Each dot on the diagram

represents a particular student's score. For example, the two dots above the 80 in the

bottom diagram indicate that two students in the sleep group scored 80 on the test.

[Figure: two dot plots of the test scores, one for each group, each drawn on a score axis from 30 to 100.]


Examine the two diagrams carefully.

Which group is better: the sleep group or the no-sleep group? Explain your reasons.

Then circle the one conclusion, from the six possible conclusions listed below, that you most agree with.

a. The no-sleep group did better because none of these students scored below 35 and the

highest score was achieved by a student in this group

b. The no-sleep group did better because its average appears to be a little higher than the

average of the sleep group.

c. There is no difference between the two groups because there is considerable overlap in

the scores of the two groups.



e. The sleep group did better because more students in this group scored 80 or above.

f. The sleep group did better because its average appears to be a little higher than the

average of the no-sleep group.

10. Below are the histograms of the Mathematics test scores in two classes.

[Figure: two histograms, Class A and Class B. Horizontal axis: Scores, labelled from 55 to 95; vertical axis: Frequency of scores, from 0 to 24.]

b. Comparing the two histograms, one could infer

i. Variation of scores in Class A is higher than in class B. (The scores

in class A vary more than the scores in class B)

ii. Variation of scores in Class B is higher than in class A. (The scores in class B

vary more than the scores in class A)

iii. Class A and class B have equal variation.

iv. I don't know.

c. Why? Give your reason.


Appendix D. The Questionnaire

For statements no. 1-7, choose one answer that is suitable to your opinion.

1. The use of real data makes the learning of Statistics more interesting and fun.

a. I strongly agree b. I agree c. Neutral d. I disagree e. I strongly disagree

2. Analyzing real data makes it easier for me to understand statistical concepts; for example

standard deviation.


3. The use of real data shows me the application of Statistics in real life.


4. I am used to working in groups.


5. I like working in groups.


6. The group's discussion helps me understand statistical concepts.


7. I actively participate in contributing ideas and in group discussions.


For statement no. 8, put a tick mark √ in the box beside each statement that you agree

with. If you have something to add, please write it in the provided box.

8. I think that the way of learning and teaching in the last two lessons differs from the way of

learning mathematics I usually experience in the following sense:

Using real data, not only artificial numbers.

Requiring students to develop their ideas and then defend those ideas through

correct arguments.

Giving students the chance to try solving problems, not directly “telling” the correct

ways.

Using calculator is allowed.

Others (please write it in the box below)

For questions 9-11, please fill in your answers in the provided boxes.

9. In your opinion, what are the strengths and/or weaknesses of the last two lessons?

10. What suggestions do you have for future improvement of the last two lessons?

11. Do you have any other comment about the last two lessons? If so, please write it down.

Thank you for filling in the questionnaire!
