
47721775 Ines Descriptive Statistics Level i Asta 2010

Date post: 07-Apr-2018

  • 8/3/2019 47721775 Ines Descriptive Statistics Level i Asta 2010


    INES-RUHENGERI

    Faculty of Fundamental Applied Sciences

    Department of Applied Statistics

    LECTURE NOTES

    DESCRIPTIVE STATISTICS I

    LEVEL I APPLIED STATISTICS, 2010

    BY Ir. DANCILLE NYIRARUGERO, Tutorial Assistant


    DESCRIPTIVE STATISTICS I

    COURSE OBJECTIVE

    At the end of this course, students must know the vocabulary, concepts, and
    statistical procedures used in statistical studies.

    Students may be called on to conduct research in their fields, and statistical
    procedures are basic to research. To accomplish this, they must be able to collect,
    organize, analyze, summarize and present data, and to communicate the results of a
    study in their own words. Students must also be able to determine measures of central
    tendency, measures of dispersion and measures of position.

    COURSE CONTENTS

    Chapter 1: Introduction, definitions and statistics vocabulary;

    Chapter 2: Frequency distributions and graphs: organizing data, histograms,
    frequency polygons and ogives, other types of graphs;

    Chapter 3: Data description: measures of central tendency, measures of
    dispersion (variation), measures of position;

    Chapter 4: Exploratory data analysis: box plot, moments, skewness and
    kurtosis, contingency table, presentation and charts.




    CHAPTER 1: INTRODUCTION, DEFINITIONS AND STATISTICS

    VOCABULARY

    1.1 Introduction

    Statistics refers to the collection, organization, presentation, analysis, and interpretation
    of numerical data in order to make inferences and reach decisions in economics,
    business, medicine, and the other social and physical sciences.

    A. Definition: the Meaning of Statistics

    The word statistics has two meanings:

    1. In the plural sense, statistics is a numerical description of the quantitative
    aspects of things. It stands for numerical facts pertaining to a collection of objects.

    2. In the singular sense, statistics means the science of collection, organization,
    presentation, analysis and interpretation of numerical data to assist in making
    more effective decisions.

    The term statistics is thus used to mean either statistical data or statistical method.
    When it is used in the sense of statistical data, it refers to the quantitative aspects of
    things and is a numerical description.

    Not all data are statistics. Data must fulfil certain essential characteristics to be
    called statistics.

    B. Branches of Statistics

    Statistics is subdivided into two branches: descriptive and inductive or inferential.

    (i) Descriptive statistics consists of the collection, organization, summarization,
    and presentation of data in various forms such as tables, graphs and diagrams,
    or using a numerical summary. The purpose of descriptive statistics is to
    display and pass on information from which conclusions can be drawn and
    decisions made. Businesses, for example, use descriptive statistics when
    presenting their annual accounts and reports.


    (ii) Inferential or inductive statistics consists of generalizing from samples to

    populations, performing estimations and hypothesis tests, determining

    relationships among variables, and making decisions.


    1.2 Characteristics of statistics

    A. Statistics means an aggregate of facts.

    Facts can be analyzed only when there is more than one fact. A single fact cannot be
    analyzed.

    Example: the weights of 60 students of a class can be statistically analyzed, but the
    weight of one student cannot be called statistics.

    Hence, only a collection of many facts can be called statistics.

    B. Statistics are affected to a marked extent by a multiplicity of causes.

    The facts are the results of the action and interaction of a number of factors.

    C. Statistics are numerically expressed.

    Only numerical facts can be statistically analyzed. Therefore, a statement such as
    "price decreases with increasing production" cannot be called statistics.

    [Figure: Statistics branches into describing data (numerical summaries, visual display) and making inferences from samples (estimating parameters, testing hypotheses).]

    D. Statistics are enumerated or estimated according to reasonable standards of

    accuracy.

    The facts should be enumerated or estimated with required degree of accuracy. The

    degree of accuracy differs from purpose to purpose.

    E. Statistics are collected in a systematic manner.

    The facts should be collected according to planned and scientific methods. Otherwise,

    they are likely to be wrong and misleading.

    F. Statistics are collected for a predetermined purpose.

    There must be a definite purpose for collecting facts. Otherwise, the facts become useless

    and hence, they cannot be called statistics.

    G. Statistics are placed in relation to each other

    The facts must be placed in such a way that a comparative and analytical study becomes

    possible. Thus, only related facts which are arranged in logical order can be statistics.

    1.3 Functions of statistics

    The following are the six important functions of the science of statistics:

    i) To present facts in a precise and definite form (i.e., it helps proper comprehension
    and avoids ambiguity). Statistics adds precision to thinking.

    ii) To simplify a mass of figures (i.e., to condense the mass of data).

    iii) To facilitate comparison (by furnishing suitable devices). Statistics helps in
    comparing different sets of figures. For example, the imports and exports of a
    country may be compared with each other, or they may be compared with those
    of another country.

    iv) To help in the formulation and testing of hypotheses (by appropriate statistical
    tools). Statistics helps in studying the relationship between different factors. For
    example, statistical methods may be used to study the relation between the
    production and the price of commodities.

    v) To help in framing suitable policies and plans (i.e., in making predictions).
    Statistics indicates trends and tendencies, and knowledge of trends and
    tendencies helps future planning.

    vi) To help in the formulation of policies (i.e., to provide the basic material).
    Statistics guides the formulation of policies and helps in planning. Planning and
    policy making by the government are based on statistics of production, demand, etc.

    Limitations of statistics

    Statistics deals with only those subjects of inquiry which are capable of being

    quantitatively measured and numerically expressed.

    This is an essential condition for the application of statistical methods.

    1.4. Origin of statistics

    The term statistics is linked to the notion of the State: it comes from the Latin
    status, which gave the Latin word statisticum. Statisticum was the activity of
    collecting data that gave a government knowledge of the state's income and possessions.

    The history of statistics shows that the first census was made in the Sumerian
    Kingdom (Babylon) around 3000 BC. Around 2500 BC, data were collected in Egypt
    for tax purposes, and in 2238 BC an agricultural survey was carried out in China
    by King Yao.

    Statistics originated from two quite dissimilar fields, games of chance and political
    states. These two fields correspond to two disciplines:

    1. the first, primarily analytical;

    2. the second, essentially descriptive.

    The pioneers of statistics include Pascal (1623-1662) and Bernoulli (1654-1705).

    As regards the descriptive side, statistics is as old as statecraft. Since time
    immemorial, men must have been compiling information about wealth and
    manpower for purposes of peace and war. This activity expanded considerably at each
    upsurge of social and political development and received added impetus in periods of
    war.


    The development of statistics can be divided into three stages: the empirical stage
    (up to 1600), the comparative stage (1600-1800), and the modern stage (1800 to the
    present day).

    Statistics has now become a useful tool, and statistical methods of analysis are
    increasingly used in biology, psychology, education, economics and business.

    1.5. Statistics vocabulary

    Subject or individual: an item for study;

    Population or universe: all the subjects (the totality of all observations)
    that are being studied;

    Statistical units: the individual subjects or objects on which the data
    are collected;

    Raw data: collected data that have not yet been organized numerically;

    Array: an arrangement of raw numerical data in ascending
    or descending order of magnitude;

    Frequency: the number of values in a specific class of
    the distribution;

    Variable: a characteristic of the subject or individual which varies from
    unit to unit. Examples: height, weight, age, etc.

    1.6. Types of variables

    There are two main types of variables: qualitative and quantitative.

    A. Qualitative variable

    A qualitative variable is one that, generally, cannot be expressed in numbers. It is an

    attribute, and is descriptive in nature.

    Example: sex (male or female), state of birth, cause of death, religion (Catholic,
    Protestant, etc.).

    When the data are qualitative, we are usually interested in how many or what proportion

    in each category. Qualitative data are often summarized in charts and bar graphs.
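As an illustration of "how many or what proportion in each category", the following is a minimal Python sketch. The marital-status values are hypothetical (they do not come from these notes), and the helper name category_summary is an assumption made for this example only.

```python
from collections import Counter

# Hypothetical sample: marital status of 10 respondents (illustrative values only).
marital_status = ["single", "married", "married", "single", "widowed",
                  "married", "single", "married", "divorced", "married"]

def category_summary(values):
    """Return {category: (count, proportion)} for a qualitative variable."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, n / total) for category, n in counts.items()}

summary = category_summary(marital_status)
for category, (n, p) in sorted(summary.items()):
    # One line per category: how many, and what proportion of the sample.
    print(f"{category:<10} count={n:>2}  proportion={p:.2f}")
```

The counts are what a bar graph would display; the proportions are what a pie chart would display.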


    B. Quantitative variable

    A quantitative variable is numerical and can be ordered or ranked.

    Example: level of hemoglobin in the blood, age, height, weight, body temperature,
    the number of children in a family.

    A quantitative variable can be a discrete variable or a continuous variable.

    Discrete variables assume values that can be counted and represented by an integer
    such as 1, 2, 3, etc.

    Example: number of children in a family, number of rooms in a house, number of
    patients in a hospital, etc.

    Continuous variables can assume all values between any two specific values (within
    an interval). They are obtained by measuring (e.g., height, weight, age, level of
    protein in blood, etc.).

    Figure 1.1: Summary of the types of variables

    Types of variables
      Qualitative: gender, color, marital status
      Quantitative
        Discrete: children in a family, cows on a farm, patients in a hospital
        Continuous: age, weight, height, time

    1.7 Levels of Measurement

    Data can be classified according to levels of measurement. The level of measurement of

    the data often dictates the calculations that can be done to summarize and present the

    data. There are four levels of measurement: Nominal, Ordinal, Interval and Ratio.

    a) Nominal-level data or nominal measurement

    The word nominal comes from the Latin nomen, meaning name. Nominal data are also
    called qualitative, attribute, categorical, or classification data.

    At the nominal level, the data are classified into categories that cannot be arranged in
    any particular order. Examples: gender, eye color, religious affiliation, marital status.

    Nominal level variables must be: mutually exclusive and exhaustive.

    - Mutually exclusive means an individual or object is included in only one category.

    - Exhaustive means each individual or object must appear in a category.

    To summarise, the nominal-level data have the following properties:

    Data categories are mutually exclusive and exhaustive.

    Data categories have no logical order.

    Example: list of jobs in Rwanda, consumption in Rwanda

    We usually code nominal data numerically. However, the codes are arbitrary
    placeholders with no numerical meaning, so it is improper to perform mathematical
    analysis on them.

    Example: "yes" coded as 1, "no" coded as 2.

    b) Ordinal-level data or ordinal measurement

    Ordinal-level data are arranged in some order, but the differences
    between data values cannot be determined.

    Example: when assessing student dissertations we can have:
    superior, good, average, poor, inferior.

    Data classifications are mutually exclusive and exhaustive.

    Data classifications are ranked or ordered according to the particular trait they
    possess.


    c) Interval-level data or interval measurement

    Interval-level data are acquired through a process of measurement in which equal
    measuring units are employed: the step in magnitude from one value to the next one
    above or below it is the same throughout the population under consideration.

    Interval data have all the characteristics of nominal and ordinal data; the only
    difference is that the scale of measurement moves uniformly in equal intervals, and
    values can be real numbers with several decimal places.

    Example: temperature, shoe size.

    Data classifications are mutually exclusive and exhaustive.

    Data classifications are ordered according to the amount of characteristic they

    possess.

    Equal differences in the characteristic are represented by equal differences in the

    measurements.

    d) Ratio-level data or ratio measurement: Practically all quantitative data are at the ratio
    level of measurement. The ratio level is the highest level of measurement. It has all the
    characteristics of the interval level, but in addition the zero point is meaningful and the ratio
    between two numbers is meaningful. Examples: wages, weight, etc.

    Data classifications are mutually exclusive and exhaustive.

    Data classifications are ordered according to the amount of the characteristics

    they possess.

    Equal differences in the characteristic are represented by equal differences in the

    numbers assigned to classifications.

    The zero point is the absence of the characteristic.

    Figure 1.2: Summary of the characteristics of the levels of measurement

    Level       Characteristic                                     Example
    Nominal     Data may only be classified                        Type of residence (rural, urban)
    Ordinal     Data are ranked                                    Rank in class
    Interval    Meaningful difference between values               Temperature
    Ratio       Meaningful zero point and ratio between values     Number of patients

    1.8 Statistical method

    The following procedure may be adopted with advantage:

    Collect the data: information should be collected on the phenomenon under study;

    Organize the data obtained;

    Present this information by means of diagrams or other visual aids;

    Analyze the data to determine the average and the extent of the disparities that
    exist;

    Interpret the facts in order to understand the phenomenon.

    All this leads to a policy decision for the improvement of the existing situation.

    1.9 Collection of data

    Statistics is concerned with the analysis of numerical data, so the first stage in the
    statistical method must be the collection of the data to be analyzed. Data can be
    collected in two ways: as primary data or as secondary data.

    a) Primary data

    Primary data are data collected by the investigator himself with a specific
    objective. This means that primary data are original in character. Sources of primary data
    are either censuses or samples.

    Census

    A census is a survey which examines every item of the population.
    Three important official censuses are the population census, the census of distribution
    and the census of production.

    A census has the advantages of completeness and of being accepted as representative,
    but it must be paid for in terms of manpower, time and resources.

    Sample

    A sample is a relatively small subset of a population. Its advantage over a census is that
    it requires much less cost, time and resources. A sample is used when it is impossible or
    impractical to observe the entire group or population. Its main disadvantage is that
    laymen may be reluctant to accept its results.

    b) Secondary data

    Secondary data is data that has already been collected by some other investigator or

    agency, and used by an investigator for his purpose.

    As far as the investigator is concerned, the data he uses is from a secondary source, that

    is, he did not collect it.

    The prime example of secondary data is the official statistics that are published by the

    Government: Financial statistics, Economic trends, etc.

    The advantages of using secondary data are savings in time, manpower and resources in

    sampling and data collection.

    The dangers of secondary Data


    If we have to use secondary data, there are dangers to be aware of:

    (i) The data available may not be very up-to-date.

    (ii) We do not necessarily know how the data were collected and analyzed or for

    what reason. They may be biased because of poor collection techniques or

    simply because they were collected for a different purpose.

    (iii) We may not be able to find a complete set of data for our purposes in one

    place. This could mean we would have to collate data from several sources

    with the chance of making errors while doing so. Obtaining the data from

    more than one source may also compound the chances of bias discussed in (ii).

    (iv) There is the distinct possibility of transcription or printing errors in published
    data.

    If you are using secondary data to support arguments in reports, articles or essays, it is
    advisable to find out more about how the data were collected and analyzed and why
    they were collected. This means that, before using secondary data, it is necessary to
    scrutinize them in the light of the following points:

    (i) The type and purpose of the institution that publishes statistics as a routine;

    (ii) The purpose for which the data are issued and the consumers to whom they

    are addressed;

    (iii) The nature of the data themselves. Are the data biased? Are the data samples

    only or complete enumeration?

    (iv) In what types of units are the data expressed? Are they the same at different
    times, at different places, and for all cases at the same time or place?

    (v) Are the data accurate?

    (vi) Do the data refer to homogeneous condition?

    (vii) Are the data germane to the problem under study?


    1.10 Misuse of statistics

    The figures themselves cannot mislead, but the statisticians who present the figures
    certainly can. Data can be misused in the following ways:

    (i) They can be used for the wrong purpose, that is, one that is different from the
    purpose for which they were collected.

    (ii) They can be collected incorrectly so that they are biased.

    (iii) They can be analyzed carelessly so that the results obtained from them are
    misleading.

    1.11. Data classification

    The data collected from the sample is generally referred to as the raw data, because it is

    not arranged and organized into any format. Raw data conveys very little information to

    the investigator or to anyone interested in that investigation. Therefore, the mass of

    numbers must be classified.

    Classification is the process of arranging the available facts into groups or classes
    according to their resemblances, affinities and other relationships.

    The main objectives of classifying data are:

    1. To condense the mass of data into a concise format;

    2. To bring out the relevant points of similarity and dissimilarity, and thus
    facilitate comparison;

    3. To make the statistical treatment of the data easy.

    Types of classification


    Generally, classification of data may be of the following types: spatial or geographical,

    temporal or chronological, qualitative, and quantitative.

    Spatial or geographical classification: this classification is based on space, that is,
    geographical locations. For example, data on human population may be classified on the
    basis of different continents or countries or states of a country or districts of a state or
    towns and villages of a district.

    Temporal or chronological: data are arranged on the basis of time (years, months, days,

    hours, minutes and seconds).

    Qualitative classification: data are classified on the basis of quality or attribute such as

    sex, colour, behaviour, religion, marital status, literacy, etc.

    Quantitative classification: the classification of data is done according to some variable
    (characteristic) that may be measured, such as height, weight, etc. In this type of
    classification there are two elements: the variable and the frequency.

    Classification of units on the basis of one variable is called simple or one-way

    classification.

    Simultaneous classification of units on the basis of two variables is called two-way
    classification. A table that presents a two-way classification is called a contingency
    table.
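A two-way classification can be sketched in Python as follows. The records are hypothetical (illustrative values only, not data from these notes), and the function name contingency_table is an assumption made for this example.

```python
from collections import Counter

# Hypothetical records: (sex, marital status) for 8 people (illustrative only).
records = [("F", "single"), ("M", "married"), ("F", "married"), ("M", "single"),
           ("F", "married"), ("M", "married"), ("F", "single"), ("F", "married")]

def contingency_table(pairs):
    """Cross-classify units on two variables: rows = first, columns = second."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return rows, cols, table

rows, cols, table = contingency_table(records)

# Print the table with row totals; each cell is a joint frequency.
print("sex  " + "".join(f"{c:>9}" for c in cols) + "    total")
for r in rows:
    cells = "".join(f"{table[r][c]:>9}" for c in cols)
    print(f"{r:<5}{cells}{sum(table[r].values()):>9}")
```

Classifying on a single variable (one-way classification) corresponds to keeping only the row totals or only the column totals.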

    CHAPTER 2: FREQUENCY DISTRIBUTIONS AND CHARTS

    2.1 FREQUENCY DISTRIBUTIONS

    2.1.1. Introduction

    After collecting the data, the researcher must organize and present them so they can be

    understood by those who will benefit from reading the study. The most convenient

    method of organizing data is to construct a frequency distribution.

    The most useful method of presenting the data is by constructing charts and graphs.

    This chapter describes how to organize data by constructing frequency distributions and

    how to present the data by constructing charts and graphs. The charts and graphs

    illustrated here are histograms, frequency polygons, ogives, pie graphs.

    2.1.2. Organizing Data

    Before the data obtained from a statistical survey or investigation have been worked on,
    they are called raw data. Little information can be obtained from looking at raw
    data.

    The following table gives an example of a set of raw data.

    Table 2.1 Marks in Statistics obtained by 20 Students of Level I STEA in 2004

    Data as originally collected

    15 18 7 12 17 9 13 14 12 14

    16 11 10 8 9 16 13 14 10 8

    In order to make the data easily understandable, the first task of the researcher is to
    prepare an array. The array is prepared by arranging the values of the variable in
    ascending or descending order. A data array gives a general idea of the distribution.

    Example: the raw data of Table 2.1 have been arrayed and are shown in Table 2.2.

    Table 2.2 Raw data of Table 2.1 put into an array

    7 8 8 9 9 10 10 11 12 12

    13 13 14 14 14 15 16 16 17 18

    From this table, the highest and lowest marks are immediately seen and the marks

    which occur most frequently are readily identified.
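The passage from raw data to an array can be sketched in Python as follows, using the 20 marks of Table 2.1. The variable names are chosen for this example only.

```python
from collections import Counter

# Raw data of Table 2.1: marks in Statistics obtained by 20 students.
raw_marks = [15, 18, 7, 12, 17, 9, 13, 14, 12, 14,
             16, 11, 10, 8, 9, 16, 13, 14, 10, 8]

# An array is the raw data arranged in ascending (or descending) order.
array = sorted(raw_marks)

# Once arrayed, the extremes sit at the two ends of the list.
lowest, highest = array[0], array[-1]

# The most frequent mark and the number of times it occurs.
most_common_mark, occurrences = Counter(array).most_common(1)[0]

print(array)
print(lowest, highest, most_common_mark, occurrences)
```

Passing reverse=True to sorted would give the array in descending order instead.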


    After arranging the data, their bulk must be condensed, reduced, and simplified so

    that the mind comprehends them easily. A first step in such a condensation would be

    achieved by representing the repetitions of a particular value of observation by tallies

    instead of rewriting the value itself.

    The number of tallies corresponding to any given value is the frequency of that
    value, usually represented by the letter f. Frequency thus means the number of
    times a certain value of the variable is repeated in the given data. A table so formed
    is known as a frequency distribution.

    In other words a frequency distribution is the organization of raw data in table form,

    using classes and frequencies.

    Statistical table

    A statistical table presents numerical data in columns and rows. The main object of a
    statistical table is to arrange the physical presentation of numerical facts so that the
    attention of the reader is automatically directed to the information. Some of the
    advantages of statistical tables are:

    Tabulated data can be more easily understood than facts stated in the form of
    descriptions;

    They facilitate quick comparison;

    They leave a lasting impression;

    They make easier the summation of items and the detection of errors and
    omissions;

    A tabular arrangement makes it unnecessary to repeat explanations,
    phrases and headings;

    All unnecessary details and repetitions are avoided.

    2.1.3. Types of frequency distributions


    There are two types of frequency distributions: the simple frequency distribution, or
    one-way table, and the grouped frequency distribution.

    A. Simple frequency distributions

    A simple frequency distribution consists of a list of data values, each showing the

    number of items having that value.

    a) Quantitative

    Variable X    Frequencies
    x1            n1
    ...           ...
    xn            nn
    Total         N

    Example: The following marks were obtained by a class in a probability exam in 2005.

    16, 14, 5, 8, 15, 15, 9, 12, 10, 9, 11, 11, 10, 17, 12, 10, 14, 5

    Table 2.3. Frequency distribution of the marks obtained by 18 students

    Marks xi          5    8    9    10   11   12   14   15   16   17   Total N
    Tally marks       II   I    II   III  II   II   II   II   I    I    18
    Frequencies ni    2    1    2    3    2    2    2    2    1    1    18

    A tally chart is used to record the occurrence of repeated values systematically.
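A simple frequency distribution with its tally marks can be sketched in Python as follows. The frequencies are computed directly from the 18 marks listed in the example; the variable names are chosen for this illustration only.

```python
from collections import Counter

# The 18 marks listed in the example (probability exam, 2005).
marks = [16, 14, 5, 8, 15, 15, 9, 12, 10, 9, 11, 11, 10, 17, 12, 10, 14, 5]

# Frequency n_i of each distinct value x_i, in ascending order of x_i.
freq = dict(sorted(Counter(marks).items()))

print(f"{'x_i':>4}  {'tally':<6}{'n_i':>4}")
for x, n in freq.items():
    # One tally stroke per occurrence of the value.
    print(f"{x:>4}  {'I' * n:<6}{n:>4}")
print(f"{'N':>4}  {'':<6}{sum(freq.values()):>4}")
```

The total of the n_i column must equal the number of observations N, which is a quick check against tallying errors.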

    b) Qualitative

    Example: The experiment consists of counting the students of Level I Statistics in
    2010 according to their sex. There are 45 students in Level I STA; gender is coded
    as G for girl and B for boy.

    Table 2.4. Distribution of 45 students in Level I STA according to their sex in
    2009.

    Sex      Tally marks    Frequency ni
    B
    G
    Total

    To convert a frequency distribution to a relative frequency distribution, each of the
    frequencies is divided by the total number of observations.

    When a relative frequency is multiplied by one hundred it gives a percentage. The
    result is a percentage distribution.

    A cumulative frequency distribution is used when we require information on the number of
    observations whose characteristic is less than or equal to a given value. It is obtained by
    adding the numbers of observations cumulatively, value by value.

    Cumulative distributions may also be constructed for relative frequencies and percentages by
    adding either the relative frequencies or the percentages in a cumulative way, as is done
    for absolute frequencies.

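    Relative frequency:

    \[ f_i = \frac{n_i}{N}, \qquad \sum_{i=1}^{n} f_i = 1 \]

    Percentage:

    \[ f_i\,\% = \frac{n_i}{N} \times 100, \qquad \sum_{i=1}^{n} f_i\,\% = 100 \]

    Cumulative frequency up to the p-th value:

    \[ \sum_{i=1}^{p} n_i \qquad \text{or} \qquad \sum_{i=1}^{p} f_i = \frac{1}{N} \sum_{i=1}^{p} n_i \]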

    Example 2.3:

    Marks xi                 5    8    9    10   11   12   14   15   16   17   Total N
    Frequencies ni           2    1    2    3    2    2    2    2    1    1    18
    Relative frequency fi
    Cumulative frequency
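The relative and cumulative frequencies of Example 2.3 can be computed with a short Python sketch. It starts from the 18 marks listed for Table 2.3; the function name frequency_table and its return layout are assumptions made for this illustration.

```python
from collections import Counter
from itertools import accumulate

# The 18 marks listed for Table 2.3.
marks = [16, 14, 5, 8, 15, 15, 9, 12, 10, 9, 11, 11, 10, 17, 12, 10, 14, 5]

def frequency_table(values):
    """Return (x_i, n_i, f_i, cumulative n_i, cumulative f_i) as parallel lists."""
    counts = sorted(Counter(values).items())
    total = len(values)
    xs = [x for x, _ in counts]
    ns = [n for _, n in counts]
    rel = [n / total for n in ns]       # f_i = n_i / N
    cum = list(accumulate(ns))          # running sum of n_i
    cum_rel = list(accumulate(rel))     # running sum of f_i
    return xs, ns, rel, cum, cum_rel

xs, ns, rel, cum, cum_rel = frequency_table(marks)
for x, n, f, c, cf in zip(xs, ns, rel, cum, cum_rel):
    print(f"{x:>3} {n:>3} {f:>6.3f} {100 * f:>6.1f}% {c:>3} {cf:>6.3f}")
```

The last cumulative frequency equals N and the last cumulative relative frequency equals 1, which matches the totals stated for the formulas above.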

    B. Grouped frequency distribution

    When the number of distinct data values in a set of raw data is large (more than 20), a
    simple frequency distribution is not appropriate, since it presents too much
    information to be easily assimilated.

    In this case, a grouped frequency distribution is used. A grouped frequency distribution
    organizes data items into groups or classes of values, each showing how many items have
    values included within the group, known as the class frequency. The number of classes is
    usually between 5 and 15.

    Definitions associated with frequency distribution classes

    a) Class limits: the lower and upper values of the classes;

    b) The lower class limit is the smallest data value that can be
    included in the class;

    c) The upper class limit is the largest data value that can be
    included in the class;

    d) Class boundaries: the lower and upper values of a class that mark the
    common points between adjacent classes. Boundaries are used when the
    classes are given as closed intervals;

    e) Class width (or length): the difference between the upper and lower
    class boundaries. If all class intervals of a frequency distribution have
    equal widths, this common width is denoted by C; in that case C is equal
    to the difference between two successive lower class limits or two
    successive upper class limits. Class width = upper boundary - lower
    boundary;


    f) Class mark or class midpoint: the class midpoint $X_m$ is obtained by
    adding the lower and upper class limits and dividing by 2, or by adding the
    lower and upper boundaries and dividing by 2:

    $X_m = \dfrac{\text{lower boundary} + \text{upper boundary}}{2}$

    or

    $X_m = \dfrac{\text{lower limit} + \text{upper limit}}{2}$

    Formulation of grouped frequency distributions

    A grouped frequency distribution is a tabulation of n data values into k classes, called bins, based on the values of the data. The bin

    limits are cutoff points that define each bin. Bins must have equal widths and their limits

    cannot overlap.

    1) Calculate the range W: highest value minus lowest value.

    2) Find the number of classes K using one of the following rules:

    K is the smallest whole number such that $2^K \geq N$ (Sturges' rule; Herbert Sturges proposed $k = 1 + \log_2 N$);

    $K = 2.5\sqrt[4]{N}$ (Yule's rule).

    3) Calculate the class interval or class width (length):

    $h = \dfrac{W}{K}$

    4) The lower boundary of the first class of the frequency distribution equals the lowest value of the series

    minus $\dfrac{h}{2}$.

    The upper boundary of the last class of the frequency distribution equals the lower boundary of the first class plus

    $hK$.

    The completed frequency distribution has the form:

    Class limits    Frequency    Cumulative frequency    Relative frequency    Percentage

    Total
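The rules for choosing the number of classes and the class width can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and rounding the Sturges and Yule values up to the next whole number is an assumption, since the notes do not say how to round.

```python
import math

def classes_power_of_two(n):
    """Smallest whole number K such that 2**K >= n."""
    k = 0
    while 2 ** k < n:
        k += 1
    return k

def classes_sturges(n):
    """k = 1 + log2(n), rounded up (rounding rule is an assumption)."""
    return math.ceil(1 + math.log2(n))

def classes_yule(n):
    """K = 2.5 * n**(1/4), rounded up (rounding rule is an assumption)."""
    return math.ceil(2.5 * n ** 0.25)

def class_width(lowest, highest, k):
    """Class width h = W / K, where W = highest - lowest is the range."""
    return (highest - lowest) / k

n = 100  # hypothetical number of observations
print(classes_power_of_two(n), classes_sturges(n), classes_yule(n))
print(class_width(121, 158, 10))
```

The different rules generally give slightly different answers, so the result is best treated as a starting point to be adjusted to a convenient class width.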

    EXCLUSIVE AND INCLUSIVE CLASS INTERVALS

    Class intervals of the type $\{x : a \leq x < b\} = [a, b)$ are called exclusive (open) since they

    exclude the upper limit of the class. The following data are classified on this basis.

    Income            50-100   100-150   150-200   200-250   250-300
    No. of persons      88        70        52        30        23

    In this method, the upper limit of one class is the lower limit of the next class.

    Class intervals of the type $\{x : a \leq x \leq b\} = [a, b]$ are called inclusive since they include

    the upper limit of the class. The following data are classified on this basis.

    Income            50-99   100-149   150-199   200-249   250-299
    No. of persons      60       38        22        16         7

    However, to ensure continuity and to get correct class limits, the exclusive method of

    classification should be adopted. To convert inclusive class intervals into exclusive ones, we

    have to make an adjustment.


    Adjustment: find the difference between the lower limit of the second class and the upper

    limit of the first class. Divide it by 2, subtract the value so obtained from all the lower

    limits and add it to all the upper limits. In the above example, the adjustment factor is

    $\dfrac{100 - 99}{2} = 0.5$.

    The adjusted classes would then be as follows:

    Income           49.5-99.5   99.5-149.5   149.5-199.5   199.5-249.5   249.5-299.5
    No. of persons      60           38            22            16             7
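The adjustment described above can be written as a small function. This is a sketch only; the function name `to_exclusive` is hypothetical, and the classes are those of the income example.

```python
def to_exclusive(classes):
    """Convert inclusive limits such as (50, 99), (100, 149) into continuous boundaries."""
    # Adjustment factor: half the gap between the upper limit of the first
    # class and the lower limit of the second class.
    adjustment = (classes[1][0] - classes[0][1]) / 2
    return [(lower - adjustment, upper + adjustment) for lower, upper in classes]

income_classes = [(50, 99), (100, 149), (150, 199), (200, 249), (250, 299)]
adjusted = to_exclusive(income_classes)
print(adjusted)
```

After the adjustment, the upper boundary of each class coincides with the lower boundary of the next one, so the classes are continuous.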

    Example: the following data show the height in millimeters of 106 maize plants

    after 2 weeks.

    129 148 139 141 150 148 138 141 140 146 153 141 148 138

    145 141 141 142 141 141 143 140 138 138 145 141 142 131

    142 141 140 143 144 135 134 139 148 137 146 121 148 136

    141 140 147 146 144 142 136 137 140 143 148 140 136 146

    143 143 145 142 138 148 143 144 139 141 143 137 144 133

    146 143 158 149 136 148 134 138 145 144 139 138 143 141

    145 141 139 140 140 142 133 139 149 139 142 145 132 146

    140 140 140 132 145 145 142 149

    Construct a grouped frequency distribution for the data.

    Solution

    The procedure for constructing a grouped frequency distribution for numerical data

    is as follows:

    1. Determine the class intervals:

    Find the highest value and the lowest value: H = 158 and L = 121.

    Find the range: R = highest value - lowest value = H - L, so R = 158 - 121 = 37.

    Find the class width by dividing the range by the number of classes (here 10 classes):

    Width = $\dfrac{R}{\text{number of classes}} = \dfrac{37}{10} = 3.7 \approx 4$

    Find the lower limit of the first class by taking the lowest value of the

    series minus half the width: $121 - \dfrac{4}{2} = 119$.

    The upper limit of the first class = the lower limit + width = 119 + 4 = 123.

    Find the upper limit of the last class by taking the lower limit of the first class

    + width × number of classes = 119 + 4 × 10 = 159.

    The completed frequency distribution is shown below. Each class includes its lower limit and excludes its upper limit.

    Class limits   Class mark   Frequency   Cumulative   Relative    Percentage
                   (midpoint)               frequency    frequency
    119 - 123         121           1            1         0.009         0.9
    123 - 127         125           0            1         0.000         0.0
    127 - 131         129           1            2         0.009         0.9
    131 - 135         133           7            9         0.066         6.6
    135 - 139         137          15           24         0.142        14.2
    139 - 143         141          39           63         0.368        36.8
    143 - 147         145          28           91         0.264        26.4
    147 - 151         149          13          104         0.123        12.3
    151 - 155         153           1          105         0.009         0.9
    155 - 159         157           1          106         0.009         0.9
    Total                         106                      1.000       100
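The table can be checked by tallying the 106 heights directly. The sketch below assumes that each class contains its lower limit but not its upper limit (so 131 falls in 131 - 135); the function name `grouped_distribution` is hypothetical.

```python
heights = [
    129, 148, 139, 141, 150, 148, 138, 141, 140, 146, 153, 141, 148, 138,
    145, 141, 141, 142, 141, 141, 143, 140, 138, 138, 145, 141, 142, 131,
    142, 141, 140, 143, 144, 135, 134, 139, 148, 137, 146, 121, 148, 136,
    141, 140, 147, 146, 144, 142, 136, 137, 140, 143, 148, 140, 136, 146,
    143, 143, 145, 142, 138, 148, 143, 144, 139, 141, 143, 137, 144, 133,
    146, 143, 158, 149, 136, 148, 134, 138, 145, 144, 139, 138, 143, 141,
    145, 141, 139, 140, 140, 142, 133, 139, 149, 139, 142, 145, 132, 146,
    140, 140, 140, 132, 145, 145, 142, 149,
]

def grouped_distribution(data, first_lower, width, k):
    """Return rows (lower, upper, midpoint, frequency, cumulative, relative)."""
    counts = [0] * k
    for x in data:
        counts[(x - first_lower) // width] += 1   # class [lower, lower + width)
    rows, cumulative, n = [], 0, len(data)
    for i, c in enumerate(counts):
        lower = first_lower + i * width
        cumulative += c
        rows.append((lower, lower + width, lower + width / 2, c, cumulative, c / n))
    return rows

table = grouped_distribution(heights, first_lower=119, width=4, k=10)
for lower, upper, mid, c, cum, rel in table:
    print(f"{lower} - {upper}  {mid:>6.1f} {c:>4} {cum:>5} {rel:>7.3f} {100 * rel:>6.1f}")
```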

    Exercise

    Construct a grouped frequency distribution of the students of Applied Statistics Level

    I at INES in 2010 according to height, weight and age.


    2.1.4. Two-way Frequency Distribution (Bivariate Frequency Distribution)

    A two-way frequency distribution is used when two variables are involved.

    A two-way frequency table has the class intervals of one variable as columns and those of the

    other variable as rows. The boxes formed at the intersection of rows and columns thus

    represent joint classes.

    The column and the row that contain the totals are called marginal distributions.

    The other columns and rows are called conditional distributions.

    The frequency of a joint class is the number of items that have the value of the first

    variable in the class given by the column heading and the value of the second variable in

    the class given by the row heading.

    The method of constructing a two-way table consists of the following steps:

    Determine the class intervals for each of the variables;

    Place one of the variables at the top of the table and the other on the left-hand

    side;

    Place a tally for each item in the appropriate box;

    Total the tallies in each box and in each row and column. The grand totals of the rows

    and of the columns should agree with the total number of items.

    Example 1: the following table shows the performance of students in two subjects,

    Statistics and Accountancy.

    Roll number   Marks in Statistics   Marks in Accountancy
         1                15                     13
         2                 1                      1
         3                 1                      2
         4                 3                      7
         5                16                      8
         6                 2                      9
         7                18                     12
         8                 5                      9
         9                 4                     17
        10                17                     16
        11                 6                      6
        12                19                     18
        13                14                     11
        14                 9                      3
        15                 8                      5
        16                13                      4
        17                10                     10
        18                13                     11
        19                11                     14
        20                11                     17
        21                12                     18
        22                18                     15
        23                 9                     15
        24                 7                      3

    Construct a two-way frequency table for the data, taking the class intervals of both variables (Statistics

    and Accountancy) as 1-5, 6-10, etc., that is, 4 classes of width 5 for each variable.

    The two-way frequency table for marks in Statistics and Accountancy is shown

    below:

    Accountancy \ Statistics   1 - 5   6 - 10   11 - 15   16 - 20   Total
    1 - 5                        2       3         1         0        6
    6 - 10                       3       2         0         1        6
    11 - 15                      0       1         4         2        7
    16 - 20                      1       0         2         2        5
    Total                        6       6         7         5       24
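The tallying steps can also be carried out programmatically. The sketch below is an illustration under two assumptions: the marks are transcribed from the table of 24 students (the entries for rolls 22 and 23 are read as 18 and 9 in Statistics and 15 and 15 in Accountancy), and the helper name `class_index` is hypothetical. Rows are Accountancy classes and columns are Statistics classes.

```python
statistics_marks = [15, 1, 1, 3, 16, 2, 18, 5, 4, 17, 6, 19,
                    14, 9, 8, 13, 10, 13, 11, 11, 12, 18, 9, 7]
accountancy_marks = [13, 1, 2, 7, 8, 9, 12, 9, 17, 16, 6, 18,
                     11, 3, 5, 4, 10, 11, 14, 17, 18, 15, 15, 3]

def class_index(mark, width=5):
    """Marks 1-5 fall in class 0, 6-10 in class 1, 11-15 in class 2, 16-20 in class 3."""
    return (mark - 1) // width

k = 4
table = [[0] * k for _ in range(k)]          # table[row][column]
for s, a in zip(statistics_marks, accountancy_marks):
    table[class_index(a)][class_index(s)] += 1

row_totals = [sum(row) for row in table]             # marginal distribution of Accountancy
column_totals = [sum(col) for col in zip(*table)]    # marginal distribution of Statistics

for row, total in zip(table, row_totals):
    print(row, total)
print(column_totals, sum(column_totals))
```

Checking that the row totals and the column totals both add up to 24 is the same verification as the last step of the manual method.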

    Example 2: The ages of 20 husbands and wives are given below. Form a two-way

    frequency table showing the relationship between the ages of husbands and wives with

    the class intervals 20-24, 25-29, etc.

    S. No.   Age of husband   Age of wife      S. No.   Age of husband   Age of wife
      1           28              23             11          27              24
      2           37              30             12          39              34
      3           42              40             13          23              20
      4           25              26             14          33              31
      5           29              25             15          36              29
      6           47              31             16          32              35
      7           37              35             17          22              23
      8           35              25             18          29              27
      9           23              21             19          38              34
     10           41              38             20          48              47

    Solution

    Frequency distribution of the ages of husbands and wives (rows: age of husband; columns: age of wife)

    Age of H \ Age of W   20-24   25-29   30-34   35-39   40-44   45-49   Total
    20-24                  III                                               3
    25-29                  II      III                                       5
    30-34                                  I       I                         2
    35-39                          II      III     I                         6
    40-44                                          I       I                 2
    45-49                                                  I       I         2
    Total                  5       5       4       3       2       1        20

    Exercises

    1. Prepare a two-way frequency table and marginal frequency tables for the 25 values of

    the two variables x and y given below. Take the class intervals of x as 10-20, 20-30,
    etc., and those of y as 100-200, 200-300, etc.

     x     y       x     y
    12    140     51    250
    24    256     27    550
    33    360     42    360
    22    470     43    570
    44    470     52    290
    37    380     57    416
    26    280     44    380
    36    315     48    452
    55    420     48    370
    48    390     52    312
    27    440     41    330
    57    390     69    590
    21    590

    2. Prepare a bivariate frequency distribution for the following data:

    Marks in Law   Marks in Statistics     Marks in Law   Marks in Statistics
         10                20                   13                24
         11                21                   12                23
         10                22                   11                22
         11                21                   12                23
         11                23                   10                22
         14                23                   14                22
         12                22                   12                20
         12                21                   13                24
         13                24                   10                23
         10                23                   14                24

    2.2 . GRAPHIC REPRESENTATION OF A FREQUENCY DISTRIBUTION

    After the data have been organized into a frequency distribution, they can be presented in

    graphical form. It is easier to comprehend the meaning of data presented graphically than

    data presented numerically in tables or frequency distributions.

    The three most commonly used graphs in research are:

    1. The histogram ;

    2. The frequency polygon;

    3. The cumulative frequency graph or ogive.


    1. Histogram

    A histogram is a graphic presentation of a frequency distribution, in which the classes

    are marked on the horizontal axis and the class frequencies on the vertical axis. The class

    frequencies are represented by the heights of the rectangle. Each rectangle represents just

    one class; the rectangle width corresponds to the class width and the rectangles are drawn

    adjacent to each other.

    Notice: in drawing histograms class intervals must be equal and exclusive.

    Example: For the following frequency distribution of the heights of students, draw the

    histogram.

    Height            140-145   145-150   150-155   155-160   160-165   165-170   170-175
    No. of students      4         10        18        20        19         6         3

    Solution

    [Figure: Histogram of the distribution of height of students. Horizontal axis: height (classes 140-145 to 170-175); vertical axis: number of students (frequencies), scaled from 0 to 25.]

    In the frequency distribution, if the class intervals are of unequal width, we first have

    to calculate the frequency density on a convenient scale:

    $d_i = \dfrac{n_i}{a_i}$ or $d_i = \dfrac{f_i}{a_i}$,

    where $a_i$ is the width of class i. Sometimes the densities are multiplied by the width of the smallest class interval

    in the distribution; otherwise they are multiplied by a predetermined interval width:

    $d_i = \dfrac{n_i}{a_i}\, a_0$ or $d_i = \dfrac{f_i}{a_i}\, a_0$, with $a_0$ the smallest interval.

    Example: Average monthly earnings of 1035 employees in the construction industry

    Monthly earning   Number of workers $n_i$   Width $a_i$   $d_i = \dfrac{n_i}{a_i}\, a_0$
    60-70                      25                   10                  25
    70-80                     100                   10                 100
    80-90                     150                   10                 150
    90-100                    200                   10                 200
    100-120                   240                   20                 120
    120-140                   160                   20                  80
    140-150                    50                   10                  50
    150-180                    90                   30                  30
    180 and more               20                    -                   -

    Draw the histogram.

    [Figure: Histogram of average monthly earning of 1035 employees. Horizontal axis: average monthly earning (classes 60-70 to 150-180); vertical axis: number of workers (frequencies), scaled from 0 to 250.]
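The heights of the rectangles for unequal class widths follow directly from the density formula. The sketch below uses the closed classes of the earnings example and takes $a_0$ as the smallest class width; the open class "180 and more" is left out because it has no width.

```python
# (lower limit, upper limit, number of workers) for the closed classes
classes = [(60, 70, 25), (70, 80, 100), (80, 90, 150), (90, 100, 200),
           (100, 120, 240), (120, 140, 160), (140, 150, 50), (150, 180, 90)]

a0 = min(upper - lower for lower, upper, _ in classes)   # smallest class width

# d_i = (n_i / a_i) * a0: frequency per a0 units of earning
densities = [n * a0 / (upper - lower) for lower, upper, n in classes]

for (lower, upper, n), d in zip(classes, densities):
    print(f"{lower}-{upper}: n = {n}, width = {upper - lower}, d = {d:g}")
```

With this scaling, classes of the smallest width keep their original frequency as height, while wider classes are reduced in proportion to their width.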


    If the frequency distribution has inclusive class intervals, they should be converted

    into the exclusive type and only then, the histogram should be drawn.

    Example: Draw a histogram to present the following data.

    Income     No. of employees     Income     No. of employees
    100-149          21             300-349          62
    150-199          32             350-399          43
    200-249          52             400-449          18
    250-299         105             450-499           9

    Solution: here the grouped frequency distribution is not continuous because the class
    intervals are inclusive. We first convert it into a continuous distribution as follows:

    Adjustment factor = $\dfrac{150 - 149}{2} = 0.5$. Subtract it from each lower limit and add it to each

    upper limit so as to have exclusive class intervals. Thus:

    Income          No. of employees     Income          No. of employees
    99.5-149.5            21             299.5-349.5           62
    149.5-199.5           32             349.5-399.5           43
    199.5-249.5           52             399.5-449.5           18
    249.5-299.5          105             449.5-499.5            9

    [Figure: Frequency distribution of employees by earned income (histogram). Horizontal axis: income (classes 99.5-149.5 to 449.5-499.5); vertical axis: number of employees, scaled from 0 to 120.]

    2. Frequency polygon

    A frequency polygon is a graph of frequencies plotted against class marks. Class marks are the midpoints

    of the class intervals. The polygon is drawn by placing the class marks on the horizontal axis

    and the frequencies of the observations on the vertical axis.

    If the class intervals are of equal width, the class frequencies are plotted against the class

    mid values. If the class intervals are of unequal width, the graph is obtained by plotting

    frequency density against class mid values.

    Description of a frequency polygon:

    1) Each class is represented by a single point. The height of the point represents the

    class frequency; the position of the point must be directly above the

    corresponding class mid point;

    2) The points are joined by straight lines.

    3) The extremities of the graph are joined to the mid-values of the class preceding

    the first class and of the class following the last class, at zero frequency, i.e. on the

    x-axis.

    A curve of relative frequencies can also be drawn, and so can a curve of percentages.

    These are called frequency curves.

    Example: For the following frequency distribution, draw a frequency polygon.

    Income    300-400   400-500   500-600   600-700   700-800   800-900   900-1000
    Workers      18        32        35        30        21        12         4

    Solution

    Midpoint    350   450   550   650   750   850   950
    Workers      18    32    35    30    21    12     4

    [Figure: Frequency polygon of distribution of income.]
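The points of the polygon, including the two closing points at zero frequency, can be listed as follows. This is a sketch for equal class widths only, using the income example; drawing the lines between consecutive points is left to whatever plotting tool is available.

```python
# (lower limit, upper limit, number of workers)
classes = [(300, 400, 18), (400, 500, 32), (500, 600, 35), (600, 700, 30),
           (700, 800, 21), (800, 900, 12), (900, 1000, 4)]

width = classes[0][1] - classes[0][0]        # common class width

# One point per class: (class midpoint, class frequency)
points = [((lower + upper) / 2, f) for lower, upper, f in classes]

# Close the polygon on the x-axis at the midpoints of the classes just
# before the first class and just after the last class.
first_mid, last_mid = points[0][0], points[-1][0]
polygon = [(first_mid - width, 0)] + points + [(last_mid + width, 0)]

print(polygon)
```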

    3. Cumulative Frequency Curve or Ogive

    A cumulative frequency graph (traditionally called an ogive) is a graph that

    represents the cumulative frequencies for the classes in a frequency distribution.

    A cumulative frequency graph is used to show visually how many values are below a certain

    upper class boundary.

    There are two types of ogives:

    A) Less than ogive: plot the points with the upper limits of the classes as abscissae and

    the corresponding less than cumulative frequencies as ordinates.

    For less than distributions, the cumulation proceeds from the least to the greatest size,

    and the series so obtained is called the less than cumulative frequency distribution.

    B) More than ogive: plot the points with the lower limits of the classes as abscissae and

    the corresponding more than cumulative frequencies as ordinates. For more than distributions, the cumulation proceeds from the greatest to the least,

    and the series so obtained is called the more than cumulative frequency distribution.

    To draw the ogives, the points are joined with straight lines.


    Example

    Draw the two ogives for the following distribution showing the marks of 59

    students.

    Marks     No. of students     Marks     No. of students
    0-10            4             40-50          12
    10-20           8             50-60           6
    20-30          11             60-70           3
    30-40          15

    Solution

    Construction of the two ogives

    Marks    No. of students (f)   Less than cumulative f   More than cumulative f
    0-10             4                       4                       59
    10-20            8                      12                       55
    20-30           11                      23                       47
    30-40           15                      38                       36
    40-50           12                      50                       21
    50-60            6                      56                        9
    60-70            3                      59                        3

    Plotting the points (10, 4), (20, 12), (30, 23), (40, 38), (50, 50), (60, 56), (70, 59) and
    joining them freehand, the smooth rising curve so obtained is the less than ogive.

    Plotting the points (0, 59), (10, 55), (20, 47), (30, 36), (40, 21), (50, 9), (60, 3) and
    joining them freehand, the smooth falling curve so obtained is the more than ogive.

    [Figure: Less than and more than cumulative frequency curves of the marks distribution.]
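The two cumulative series and the points to plot can be derived from the class frequencies in a few lines. This is a minimal sketch using the marks distribution of the example.

```python
from itertools import accumulate

# (lower limit, upper limit, number of students)
classes = [(0, 10, 4), (10, 20, 8), (20, 30, 11), (30, 40, 15),
           (40, 50, 12), (50, 60, 6), (60, 70, 3)]
freqs = [f for _, _, f in classes]

# Less than: cumulate from the lowest class upwards.
less_than = list(accumulate(freqs))
# More than: cumulate from the highest class downwards, then restore class order.
more_than = list(accumulate(reversed(freqs)))[::-1]

# Less than ogive uses upper limits; more than ogive uses lower limits.
less_points = [(upper, c) for (_, upper, _), c in zip(classes, less_than)]
more_points = [(lower, c) for (lower, _, _), c in zip(classes, more_than)]

print(less_points)
print(more_points)
```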


    EXERCISES

    1. This table gives the sex, age, height and weight of 24 students of Level I AST at INES in

    2010.

    Order   Sex   Age   Height (in cm)   Weight (in kg)
      1      F     22        160               58
      2      F     19        170               60
      3      M     23        161               50
      4      M     26        180               61
      5      M     22        159               49
      6      M     27        172               70
      7      M     23        150               45
      8      M     22        150               48
      9      F     23        170               65
     10      M     23        160               58
     11      F     25        155               59
     12      F     23        162               60
     13      F     24        171               80
     14      F     24        170               62
     15      F     24        165               64
     16      F     23        173               61
     17      F     22        160               57
     18      F     18        163               52
     19      F     19        143               48
     20      F     25        167               67
     21      F     23        168               59
     22      F     22        172               63
     23      F     24        162               55
     24      F     22        174               63

    Draft a form of tabulation to show:

    Sex and age, weight and height, age and weight, age and height.
    Present the absolute, relative, percentage and cumulative frequency distributions.
    Draw the histogram, frequency polygon and ogive for age, height and weight.

    2. Draw a histogram for the following frequency distribution of heights of students. From
    the histogram, obtain the frequency polygon.

    Height            140-150   150-160   160-165   165-170   170-180   180-190
    No. of students      5         15        15        20        10         2


    3. The daily wages of the workers of a factory have the following distribution. Draw the less than

    cumulative frequency graph for the wages.

    Wages            30-39   40-49   50-59   60-69   70-79   80-89   90-99   100-109   Total
    No. of workers     9      25      34      25      19      13       7        2       134

    4. Draw a histogram and a frequency polygon for the following data:

    Marks     No. of students     Marks     No. of students
    0-10            5             50-60           4
    10-20          13             60-70           1
    20-30          12             70-80           3
    30-40          11             80-90           1
    40-50           8             90-100          2

    5. The following are the weights of 30 students. Draw up a frequency distribution with:

    a) Class intervals 40-44, 45-49, 50-54, ... kg.

    b) Class intervals of width 6 kg each.

    Weights (kg): 51, 47, 50, 54, 62, 52, 42, 49, 52, 49, 44, 50, 53, 58, 46, 50, 51, 53, 48,

    50, 55, 52, 55, 58, 63, 54, 52, 49, 50, 58.

    2.3. DIAGRAMMATIC AND CHART REPRESENTATION

    In section 2.2, graphs such as the histogram, frequency polygon, and ogive showed how

    data can be represented when the variable displayed on the horizontal axis is quantitative,

    such as heights and weights.

    On the other hand, when the variable displayed on the horizontal axis is qualitative or

    categorical, several types of charts are used, such as pictograms, statistical maps or
    cartograms, spider charts, Gantt charts, bar charts, Pareto charts, time series graphs, pie

    graphs and so on.

    This section is concerned with the presentation of non-numeric or qualitative frequency

    distributions. The types of diagram described in this section include various types of

    bar charts, pie charts, Pareto charts and time series graphs.

    1) Bar charts

    a) Simple bar charts

    A simple bar chart consists of a set of non-adjoining bars. A separate bar for each class is drawn

    to a height proportional to the class frequency.

    [Figures: simple bar charts of the percentage of cases in each category: tuberculoid 60.64%, lepromatous 27.31%, indeterminate 7.23%, borderline 4.82%.]

    Bar charts are used for discrete variables.


    Example: This table shows the details of the monthly expenditure of two families.

    Draw a bar diagram for the data.

    Items of expenditure   Family A   Family B
    Food                     140        240
    Clothing                  80        160
    House Rent               100        120
    Education                 30         80
    Fuel and Lighting         40         40
    Miscellaneous             40         80
    Saving                    70         80
    Total                    500        800

    Solution

    [Figure: bar chart "Detail for monthly expenditure of family A", one bar per item of expenditure (Food, Clothing, House Rent, Education, Fuel and Lighting, Miscellaneous, Saving); axis labels: Expenditure and Revenue; vertical scale 0 to 160.]

    [Figure: bar chart "Details for monthly expenditure of family B", one bar per item of expenditure; axis labels: Expenditure and Revenue; vertical scale 0 to 300.]

    b) Multiple bar charts

    These charts are used as an extension of simple bar charts, where another dimension of the

    data is given.

    [Figures: multiple bar charts comparing male and female percentages for the categories tuberculoid, lepromatous, indeterminate and borderline.]

    Example: draw bar charts to show the details of the monthly expenditure of the two families.

    Solution

    [Figure: multiple bar chart "Details for monthly expenditure of two families A and B", two bars (one per family) for each item of expenditure; axis labels: Expenditure and Revenue; vertical scale 0 to 300.]

    2) Pie charts

    A pie chart shows the totality of the data being represented using a single circle. The

    circle is split into sectors, the size of each one being drawn in proportion to the class

    frequency. Each sector can be shaded or colored differently if desired.

    The procedure for drawing a pie graph is:

    Step 1: Calculate the proportion of the total that each frequency represents, using the

    formula $\dfrac{f}{n}$, where f = frequency of the class and n = total number of values.

    Step 2: Find the number of degrees for each class, using the formula

    Degrees = $\dfrac{f}{n} \times 360^\circ$

    or

    Step 3: Find the percentage of values in each class by using the formula

    % = $\dfrac{f}{n} \times 100$.

    Step 4: Using a protractor and compass, draw each sector and write its name and

    corresponding degrees or percentage.


    Advantages and disadvantages of Pie charts

    Advantages: easy to construct; easy to understand, a sense of continuity is given by line

    diagram which is not present in a bar chart.

    Disadvantages: might be confusing if too many diagrams with closely associated values

    are compared together. Where several diagrams are displayed, there is no provision for

    total figures.

    Example 1: construct a pie chart for the following data.

    Monthly expenditure of family A    Rs    % age   Cumulative % age
    Food                               140     28           28
    Clothing                            80     16           44
    House Rent                         100     20           64
    Education                           30      6           70
    Fuel and Lighting                   40      8           78
    Miscellaneous                       40      8           86
    Saving                              70     14          100

    [Figure: pie chart "Monthly expenditure of Family A" with sectors Food 28%, Clothing 16%, House Rent 20%, Education 6%, Fuel and Lighting 8%, Miscellaneous 8%, Saving 14%.]

    This graph shows that food is the largest item of expenditure of family A.
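The sector angles and percentages for a pie chart follow from the proportion f/n of each class. The sketch below applies the two formulas to the expenditure of family A; the dictionary name `expenditure` is chosen only for this illustration.

```python
expenditure = {
    "Food": 140, "Clothing": 80, "House Rent": 100, "Education": 30,
    "Fuel and Lighting": 40, "Miscellaneous": 40, "Saving": 70,
}
total = sum(expenditure.values())            # n = 500

sectors = {}
for item, amount in expenditure.items():
    share = amount / total                   # proportion f / n
    sectors[item] = (share * 360, share * 100)   # (degrees, percentage)

for item, (degrees, percent) in sectors.items():
    print(f"{item:<18} {degrees:>6.1f} degrees {percent:>5.1f} %")
```

The angles should add up to 360 degrees and the percentages to 100, which is a useful check before drawing the sectors with a protractor.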


    For extra money                  18
    For something different to do    12
    Other                             8

    1. A questionnaire about how people get news resulted in the following information

    from 25 respondents. Construct a frequency distribution and a pie graph for the data

    (N = newspaper, T = television, R = radio, M = magazine).

    N N R T T R N T M R M M N R M T R M

    N M T R R N N

    2. A questionnaire on housing arrangements showed this information obtained from 25

    respondents. Construct a frequency distribution and a pie graph for the data (H = house, A = apartment, M = mobile home, C = condominium).

    H C H M H A C A M C M C A M A C

    C M C C H A H H M

    4) Pareto chart

    A pareto chart is used to represent a frequency distribution for a categorical or qualitative

    variable, and the frequencies are displayed by the heights of vertical bars, which are

    arranged in order from highest to lowest.

    Procedures of drawing a pareto chart

    1. Arrange the data from the largest to smallest according to frequency

    2. Draw and label the x and y axes

    3. Draw the bars corresponding to the frequencies.

    Example: The following data are based on a survey from the American Travel Survey on
    why people travel. Construct a Pareto chart for the data and comment.

    Purpose                       Number
    Personal business              146
    Visit friends or relatives     330
    Work related                   225
    Leisure                        299

    Source: USA TODAY

    [Figure: Pareto chart with bars in decreasing order: Visit friends or relatives, Leisure, Work related, Personal business; vertical scale 0 to 350.]

    This chart shows that the most common reason Americans travel is to visit friends or relatives and

    the least common is personal business.
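The first step of the Pareto procedure, arranging the categories from the largest to the smallest frequency, can be sketched as follows with the travel survey data. The percentage column is an addition for illustration and is not part of the original table.

```python
purposes = {
    "Personal business": 146,
    "Visit friends or relatives": 330,
    "Work related": 225,
    "Leisure": 299,
}

# Arrange the categories from largest to smallest frequency.
ordered = sorted(purposes.items(), key=lambda item: item[1], reverse=True)
total = sum(purposes.values())

for name, count in ordered:
    print(f"{name:<28}{count:>5}{100 * count / total:>7.1f} %")
```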

    5) Time series

    When data are collected over a period of time, they can be represented by a time series

    graph. A time series graph represents data that occur over a specific period of time.

    Procedures of drawing a time series

    Step 1: Draw and label the x and y axes

    Step 2: Label the x axis for years and the y axis for the number of

    Step 3: Plot each point according to the table

    Step 4: Draw line segments connecting adjacent points.

    Example 1: the number of bank failures in the United States during the years 1989-2000
    is shown. Draw a time series graph to represent the data and comment on the results.

    Year              1989   1990   1991   1992   1993   1994   1995   1996   1997   1998   1999   2000
    No. of failures    207    169    127    122     41     13      6      5      1      3      8      7

    [Figure: time series graph of the number of bank failures, 1989-2000; vertical scale 0 to 250.]

    The graph shows the bank failures from 1989 through 2000. Most of the bank failures occurred

    between 1989 and 1992.

    Example 2: The following table shows meat production for lamb for the years 1960-2000

    (data are in millions of pounds). Construct a time series graph for the data.

    Year    1960   1970   1980   1990   2000
    Lamb     769    551    318    358    234

    [Figure: time series graph of meat production for lamb. Horizontal axis: year (1950 to 2010); vertical axis: meat production for lamb (0 to 900).]

    The graph shows a decline in the quantity of meat production for lamb from 1960

    through 2000.


    Chapter 3: DATA DESCRIPTION: MEASURES OF CENTRAL TENDENCY,

    MEASURES OF DISPERSION, MEASURES OF POSITION.

    3.1 INTRODUCTION

    This chapter explains the basic ways to summarize data. These include measures of

    central tendency, measures of variation or dispersion, and measures of position.

    Central tendency refers to the location of a distribution. A measure of central tendency is

    any of a number of ways of specifying this "central value". Several types of averages can

    be defined, the most important being the mean, the median, the mode and the midrange.

    Means can be arithmetic, geometric, or harmonic.

    The three most commonly used measures of variation are the range, the variance, and the standard

    deviation. The most common measures of position are percentiles, quartiles, and deciles.

    3.2 MEASURES OF CENTRAL TENDENCY

    A. The arithmetic mean

    1. Definition of the arithmetic mean

    The arithmetic mean of a set of values is the simple arithmetic average of the
    observations. It is defined as the sum of the values of all the observations divided by

    the number of observations.

    The arithmetic mean is normally abbreviated to just the "mean" or "average".

    The mean, or average, of a population is represented by the Greek letter $\mu$ (mu), and that of

    a sample by the Roman letter $\bar{X}$ (read "X bar").

    That is, arithmetic mean = $\dfrac{\text{the sum of all the values of observations in the sample}}{\text{the number of values in the sample}}$


    2. The arithmetic mean for ungrouped data

    The formula for calculating the arithmetic mean is:

    $\mu = \dfrac{x_1 + x_2 + x_3 + \dots + x_N}{N} = \dfrac{\sum_{j=1}^{N} x_j}{N} = \dfrac{\sum x}{N}$ for the population

    $\bar{X} = \dfrac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \dfrac{\sum_{j=1}^{n} x_j}{n} = \dfrac{\sum x}{n}$ for the sample

    Where:

    The symbol $\Sigma$ (Greek capital letter "sigma") stands for summation: it means "the total of";

    x represents any particular value of an observation;

    $\sum x$ is the sum of all values in the sample or population;

    N represents the total number of observations in the population;

    n refers to the number of observations in the sample.

    Assume that the data are obtained from samples unless otherwise specified.

    Example 1: Find the arithmetic mean (the average) of the numbers 8, 3, 5, 12, and 10.

    Solution: in this data set, $x_1 = 8$, $x_2 = 3$, $x_3 = 5$, $x_4 = 12$, $x_5 = 10$, and n = 5.

    Then $\bar{x} = \dfrac{8 + 3 + 5 + 12 + 10}{5} = \dfrac{38}{5} = 7.6$
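The same calculation can be checked in a couple of lines; this is simply the definition (sum of the values divided by their number) applied to the five numbers of the example.

```python
values = [8, 3, 5, 12, 10]

total = sum(values)            # sum of all observations
n = len(values)                # number of observations
mean = total / n               # arithmetic mean

print(total, n, mean)
```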


    3. The mean of a simple (discrete) frequency distribution

    The mean of a simple frequency distribution is calculated using the following

    formula:

    Mean, $\bar{x} = \dfrac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \dots + f_k x_k}{f_1 + f_2 + f_3 + \dots + f_k} = \dfrac{\sum_{j=1}^{k} f_j x_j}{\sum_{j=1}^{k} f_j} = \dfrac{\sum fx}{n}$

    Where

    x represents the values;

    f represents the frequencies;

    $\sum f$ is the total frequency or the total number of observations (n);

    $\sum fx$ refers to the sum of each value x times its frequency f.

    Example: Calculate the arithmetic mean of the marks of 46 students given in the following table.

    Table 3.1 Frequency of marks of 46 students

    Marks (x)    Frequency (f)    fx
    9            1                9
    10           2                20
    11           3                33
    12           6                72
    13           10               130
    14           11               154
    15           7                105
    16           3                48
    17           2                34
    18           1                18
    Total        46               623


    The total of all these values is $\sum fx = 623$, and the total number of observations is $n = 46$.

    Therefore, the arithmetic mean of the marks of the 46 students is:

    $$\bar{x} = \frac{\sum fx}{n} = \frac{623}{46} = 13.54$$
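    As a check on the worked example for Table 3.1, the weighted sum can be computed directly. The sketch below is illustrative only; `frequency_mean` is a name chosen for this example.

```python
def frequency_mean(values, frequencies):
    """Mean of a simple frequency distribution: sum(f * x) / sum(f)."""
    total_fx = sum(f * x for x, f in zip(values, frequencies))
    n = sum(frequencies)
    return total_fx / n


# Marks and frequencies from Table 3.1.
marks = [9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
freq = [1, 2, 3, 6, 10, 11, 7, 3, 2, 1]

print(sum(f * x for x, f in zip(marks, freq)))  # 623
print(sum(freq))                                # 46
print(round(frequency_mean(marks, freq), 2))    # 13.54
```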

    4. The mean of a grouped frequency distribution

    For grouped data, $\mu$ and $\bar{x}$ are calculated by

    $$\mu = \frac{\sum fx}{N} \quad \text{and} \quad \bar{x} = \frac{\sum fx}{n}$$

    where $f$ is the frequency, $x$ the mid-point of the class interval, and $N$ (or $n$) the total number of observations.

    Procedure for finding the mean of a grouped frequency distribution

    1. Make a table with the following columns:

    Class interval    Frequency (f)    Midpoint (x) of class interval    f.x

    2. Find the midpoint of each class.

    3. Multiply the frequency by the midpoint for each class.

    4. Find the sum of the products f.x over all classes.

    5. Divide the sum obtained by the sum of the frequencies.


    Example 1: Calculate the arithmetic mean of the following data:

    Table 3.2 Profit per shop

    Profit (class interval)    No. of shops (f)    Mid-point of class interval (x)    f.x
    0-10                       12                  5                                  60
    10-20                      18                  15                                 270
    20-30                      27                  25                                 675
    30-40                      20                  35                                 700
    40-50                      17                  45                                 765
    50-60                      6                   55                                 330
    Total                      100                                                    2800

    The mean profit is:

    $$\bar{x} = \frac{\sum fx}{n} = \frac{2800}{100} = 28$$

    Example 2: The following data relates to the number of successful sales made by the

    salesmen in a particular quarter.

    Number of sales       0-4    5-9    10-14    15-19    20-24    25-29
    Number of salesmen    1      14     23       21       15       6

    Calculate the mean number of sales.

    Answer:

    Number of sales (class interval)    Number of salesmen (f)    Class midpoint (x)    fx
    0 to 4                              1                         2                     2
    5 to 9                              14                        7                     98
    10 to 14                            23                        12                    276
    15 to 19                            21                        17                    357
    20 to 24                            15                        22                    330
    25 to 29                            6                         27                    162
    Totals                              80                                              1225

    $\sum fx = 1225$ and $\sum f = 80$, so

    $$\bar{x} = \frac{\sum fx}{\sum f} = \frac{1225}{80} = 15.3$$
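    The procedure for grouped data (find each class midpoint, multiply by the frequency, sum, and divide by the total frequency) can be sketched as follows. The function name `grouped_mean` and the representation of each class as a `(lower, upper)` pair are assumptions made for this illustration.

```python
def grouped_mean(classes, frequencies):
    """Estimate the mean of grouped data from class midpoints."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    total_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    return total_fx / sum(frequencies)


# Example 1: profit per shop.
profit_classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
shops = [12, 18, 27, 20, 17, 6]

# Example 2: number of sales per salesman.
sales_classes = [(0, 4), (5, 9), (10, 14), (15, 19), (20, 24), (25, 29)]
salesmen = [1, 14, 23, 21, 15, 6]

print(grouped_mean(profit_classes, shops))    # 28.0
print(grouped_mean(sales_classes, salesmen))  # 15.3125
```

    The result is an estimate, because every observation in a class is treated as if it were located at the class midpoint.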

    The advantages of the mean


    The mean is the most commonly used measure of central tendency

    Every set of interval- or ratio- level data has a mean;

    It is easily understood;

    All the values are included in computing the mean;

    A set of data has only one mean. The mean is unique;

    It is used in performing many other statistical procedures and tests.

    It is not necessary to know the value of each individual observation in order to calculate the arithmetic mean: only the total of the observations and the number of observations are required.

    The disadvantages of the mean are:

    The mean is affected by extremely high or low values, called outliers, and may not be the appropriate average to use in these situations;

    It is time consuming to compute for a large body of ungrouped data;

    It cannot be calculated when the last class of grouped data is open ended (i.e., it includes the lower limit of the last class "and over").

    A further property of the mean is that the sum of the deviations of each value from the mean is always zero. Expressed symbolically:

    $$\sum (X - \bar{X}) = 0$$

    As an example, the mean of 3, 8, and 4 is 5. Then:

    $$\sum (X - \bar{X}) = (3 - 5) + (8 - 5) + (4 - 5) = -2 + 3 - 1 = 0$$

    B. THE MEDIAN

    The median is generally considered as an alternative average to the mean.


    The value of the variable which divides the distribution so that exactly half of the

    distribution has the same or larger values and exactly half has the same or lower values is

    called the median.

    1. The median for ungrouped data

    The median of a set of data is the middle value that separates the higher half from the lower half of the data set after the values have been ordered from the smallest to the largest, or from the largest to the smallest.

    Procedure for obtaining the median of a set of data:

    1. Order the given data from the smallest to the largest, or from the largest to the smallest;

    2. Select the middle point.

    Example: Find the median of the following five observations

    $x_1 = 10$, $x_2 = 15$, $x_3 = 6$, $x_4 = 12$ and $x_5 = 11$

    Solution: We must:

    1. Order the given numbers from the smallest to the largest: 6, 10, 11, 12, 15

    2. Select the middle point: the middle value is 11. Therefore the median (MD) = 11.

    Note 1. When a set of data contains an even number of items, there is no unique middle or central value. The convention in this situation is to use the mean of the middle two items to give a median.

    Example 1: Find the median of the following six observations:

    $x_1 = 10$, $x_2 = 15$, $x_3 = 6$, $x_4 = 12$, $x_5 = 11$ and $x_6 = 17$

    Solution: As before, arrange all the values of the observations in numerical order:

    6, 10, 11, 12, 15, 17


    Evidently there is no single middle value. However, two numbers lie in the middle: 11 and 12. The two must be added together and divided by 2, thus obtaining their average:

    $$\text{MD} = \frac{11 + 12}{2} = 11.5$$
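    Both cases (an odd and an even number of items) can be handled by one short routine. The following is a minimal sketch; the function name `median` is chosen for this illustration.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one observation is required")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([10, 15, 6, 12, 11]))      # 11
print(median([10, 15, 6, 12, 11, 17]))  # 11.5
```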

    Example 2: calculation of the median for the data given in table 3.1

    Solution:

    Arranging all the 24 values in ascending order of magnitude, we get the following data:

    2.90 3.57 3.73

    2.98 3.61 3.75

    3.30 3.62 3.76

    3.43 3.66 3.76

    3.43 3.68 3.77

    3.45 3.71 3.84

    3.55 3.72 3.88

    The 12th value is 3.66 and the 13th is 3.68; the median is the average of these two.

    $$\text{Median} = \frac{3.66 + 3.68}{2} = 3.67\,\%$$

    Note 2. For a set with an odd number ($n$) of items, the median can be precisely identified as the value of the $\left(\frac{n+1}{2}\right)$th item. Thus, in a size-ordered set of 15 items, the median would be the $\frac{15+1}{2} = 8$th item along.

    2. Median for a simple frequency distribution

    Where there is a large number of discrete items in a data set, but the range of values is

    limited, a simple frequency distribution will probably have been compiled.

    For a simple frequency distribution, the median is the value of the item in the following position:

    $$\text{position of MD} = \frac{\sum f + 1}{2}$$


    where $\sum f$ is the total frequency, also denoted $N$; the cumulative frequency is denoted $F$.

    Procedure for calculating the median

    To calculate the median for a simple (discrete) frequency distribution, the following procedure should be followed:

    1. Calculate the value of $\frac{\sum f + 1}{2}$;

    2. Form an F (cumulative frequency) column;

    3. Find the first F value that equals or exceeds $\frac{\sum f + 1}{2}$;

    4. The median is the x value corresponding to the F value identified in step 3.

    Example: calculate the median for the following distribution of delivery times of orders

    sent out from a firm.

    Delivery time (days) 0 1 2 3 4 5 6 7 8 9 10 11

    Number of orders 4 8 11 12 21 15 10 4 2 2 1 1

    Answer

    STEP 1 The median is the $\frac{N+1}{2}$th $= \frac{91+1}{2}$th $= 46$th item.

    STEP 2 The F column is shown in the following table:

    Delivery time in days (x)    Number of orders (f)    Cumulative number of orders (F)
    0                            4                       4
    1                            8                       12
    2                            11                      23
    3                            12                      35
    4                            21                      56
    5                            15                      71
    6                            10                      81
    7                            4                       85
    8                            2                       87
    9                            2                       89
    10                           1                       90
    11                           1                       91

    STEP 3 The first F value to reach or exceed 46 is F = 56.

    STEP 4 The median is thus 4 (days).
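    The four steps above can be sketched in Python. The helper names `item_value` and `frequency_median` are assumptions made for this illustration. When the total frequency is even, the position is not a whole number, so this sketch averages the two middle items, following the convention of Note 1.

```python
import math


def item_value(values, frequencies, k):
    """Return the value of the k-th item (1-based) using cumulative frequencies."""
    cumulative = 0
    for x, f in zip(values, frequencies):
        cumulative += f
        if cumulative >= k:
            return x
    raise ValueError("k exceeds the total frequency")


def frequency_median(values, frequencies):
    """Median of a simple frequency distribution (values must be in ascending order)."""
    n = sum(frequencies)
    position = (n + 1) / 2
    lower = item_value(values, frequencies, math.floor(position))
    upper = item_value(values, frequencies, math.ceil(position))
    return (lower + upper) / 2


# Delivery times (days) and number of orders from the example.
days = list(range(12))
orders = [4, 8, 11, 12, 21, 15, 10, 4, 2, 2, 1, 1]

print(sum(orders))                     # 91
print(frequency_median(days, orders))  # 4.0
```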

    3. Median for a grouped frequency distribution

    There are two methods commonly employed for estimating the median for a grouped frequency distribution:

    a) using an interpolation formula;

    b) by graphical interpolation.

    a) Estimating the median by formula

    Given a grouped frequency distribution, the best that can be done is to identify the class or group that contains the median item. From there, the median is estimated by interpolation within that class, using cumulative frequencies and the fact that the median must lie exactly one half of the way along the distribution.

    The formula for calculating the median for a grouped distribution is:

    $$\text{Median} = L + \frac{\frac{N}{2} - F}{f} \cdot c$$

    Where:

    $L$ = lower bound (limit) of the median class (the class that contains the middle item of the distribution);

    $F$ = sum of the frequencies of all classes lower than the median class;

    $c$ = median class width (interval of the median class);

    $f$ = frequency of the median class;

    $N$ = total number of observations.

    Example 1: Calculation of the median for the data of table 3.2

    Protein intake/consumption unit (g/day) (class interval)    No. of families (f)    Cumulative frequency (F)
    15-25                                                       30                     30
    25-35                                                       40                     70
    35-45                                                       100                    170
    45-55                                                       110                    280
    55-65                                                       80                     360
    65-75                                                       30                     390
    75-85                                                       10                     400
    Total                                                       400

    The median class is 45-55 and N = 400, so that L = 45, F = 170, f = 110 and c = 10.

    $$\text{Median} = L + \frac{\frac{N}{2} - F}{f} \cdot c = 45 + \frac{200 - 170}{110} \cdot 10 = 47.73 \text{ g}$$

    Procedure for estimating the median by formula

    The procedure for estimating the median (by formula) for a grouped frequency distribution is:

    1. Form a cumulative frequency (F) column;

    2. Find the value of $\frac{N}{2}$ (where $N = \sum f$);

    3. Find the first F value that equals or exceeds $\frac{N}{2}$; this identifies the median class;

    4. Calculate the median using the following interpolation formula:

    $$\text{Median} = L + \frac{\frac{N}{2} - F}{f} \cdot c$$

    Example: Estimate the median for the following data, which represents the ages of a set

    of 130 representatives who took part in a statistical survey.

    Age (years)        Number of representatives
    20 and under 25    2
    25 and under 30    14
    30 and under 35    29
    35 and under 40    43
    40 and under 45    33
    45 and under 50    9

    Answer

    1. Form the cumulative frequency (F) column:

    Age (years)        Number of representatives (f)    (F)
    20 and under 25    2                                2
    25 and under 30    14                               16
    30 and under 35    29                               45
    35 and under 40    43                               88
    40 and under 45    33                               121
    45 and under 50    9                                130

    2. $\frac{N}{2} = \frac{130}{2} = 65$

    3. The median class is the first class whose F value equals or exceeds 65. Here, it is 35 and under 40.

    4. The median can now be estimated using the interpolation formula, with L = 35, F = 45, f = 43 and c = 5:

    $$\text{Median} = L + \frac{\frac{N}{2} - F}{f} \cdot c = 35 + \frac{65 - 45}{43} \cdot 5 = 37.33$$

    Median = 37.33 years
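    The interpolation procedure can be written as a short routine. This is a sketch under stated assumptions: all classes have the same width, the classes are given by their lower bounds in ascending order, and the name `grouped_median` is chosen for this illustration.

```python
def grouped_median(lower_bounds, width, frequencies):
    """Estimate the median as L + ((N/2 - F) / f) * c for equal-width classes."""
    half = sum(frequencies) / 2
    cumulative = 0  # F: total frequency of the classes below the current one
    for lower, f in zip(lower_bounds, frequencies):
        if f > 0 and cumulative + f >= half:
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must contain at least one positive value")


# Ages of the 130 representatives.
age_bounds = [20, 25, 30, 35, 40, 45]
representatives = [2, 14, 29, 43, 33, 9]
print(round(grouped_median(age_bounds, 5, representatives), 2))  # 37.33

# Protein intake data from Example 1.
protein_bounds = [15, 25, 35, 45, 55, 65, 75]
families = [30, 40, 100, 110, 80, 30, 10]
print(round(grouped_median(protein_bounds, 10, families), 2))    # 47.73
```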

    b) Estimating the median graphically

    A percentage cumulative frequency curve (or ogive ) is drawn and the value of the

    variable that corresponds to the 50% point is read off and gives the median estimate.

    Procedure for estimating the median graphically

    1. Form a cumulative (percentage) frequency distribution;

    2. Draw the cumulative frequency curve by plotting class upper bounds against cumulative percentage frequency, and join the points with a smooth curve;

    3. Read off the 50% point to give the median.

    Properties of Median

    1. The median is particularly useful where:

    a) a set or distribution has extreme values present; and

    b) values at the end of a set or distribution are not known. This means that the median can be used for open-ended distributions.

    2. The median can be determined for all levels of data except nominal.

    3. The median is unique; there is only one median for a set of data.

    The advantages of the median

    The advantages of the median are:

    it is not affected by extremely large or small values ;


    it is easily understood (i.e. half the data are smaller than the median and half are greater);

    it can be calculated even when the last class is open ended, and when the data are qualitative rather than quantitative, provided they can be ranked.

    The disadvantages of the median

    It does not use much of the information available;

    It requires that observations be arranged into an array, which is time consuming for a large body of ungrouped data.

    C. THE MODE

    1. Definition

    The mode is the value of the observation that appears most frequently or, equivalently, has the largest frequency. The mode is especially useful for describing nominal and ordinal levels of measurement.

    It is possible for data not to have any mode at all, as in the case where all observations occur with equal frequency.

    Example: The mode of the set 2, 1, 3, 3, 1, 1, 2, 4 is 1, since this value occurs most often.

    For the data in table 3.1, the mode is 3.76, since this observation occurs most commonly.

    The mode of the following simple discrete frequency distribution:

    X    4    5    6     7     8    9    10
    f    2    5    21    18    9    2    1

    is 6, since this value has the largest frequency.
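    A minimal Python sketch of this definition is given below. It uses `collections.Counter` from the standard library; the function name `modes` is an assumption made for this illustration. It returns every value that shares the largest frequency, and an empty list when all distinct values occur equally often (the "no mode" case described above).

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] if all values are equally frequent."""
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []  # every distinct value occurs equally often: no mode
    return sorted(x for x, c in counts.items() if c == highest)


print(modes([2, 1, 3, 3, 1, 1, 2, 4]))  # [1]

# A frequency table can be expanded into individual observations first.
x_values = [4, 5, 6, 7, 8, 9, 10]
freqs = [2, 5, 21, 18, 9, 2, 1]
observations = [x for x, f in zip(x_values, freqs) for _ in range(f)]
print(modes(observations))              # [6]
```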

    2. The mode for grouped data

    For a grouped frequency distribution, the mode cannot be determined exactly and so must be estimated. The technique used is one of interpolation. There are two methods that can be used to estimate the mode:

    a) using an interpolation formula;

    b) graphically, using a histogram.

    Mode of a grouped frequency distribution by formula

    An estimate of the mode for a grouped frequency distribution can be obtained using the

    following procedure:

    1. Determine the modal class ( that class which has the largest frequency)

    2. Calculate D1= difference between the largest frequency and the frequency

    immediately preceding it.

    3. Calculate D2 = difference between the largest frequency and the frequency

    immediately following it.

    4. Use the following interpolation formula:

    Interpolation formula for the mode

    $$\text{Mode} = L + \frac{D_1}{D_1 + D_2} \cdot C$$

    Where: L = lower bound of the modal class

    C = modal class width

    And: $D_1$, $D_2$ are as described above in steps 2 and 3.

    Example 1: Estimate the mode of the following distribution of ages.

    Age (years) 20-25 25-30 30-35 35-40 40-45 45-50

    Number of employees 2 14 29 43 33 9

    Answer:

    Age (years) number of employees

    20 and under 25 2

    25 and under 30 14

    30 and under 35 29

    35 and under 40 43

    40 and under 45 33

    45 and under 50 9


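    The modal class is 35 and under 40, which has the largest frequency (43).

    $D_1 = 43 - 29 = 14$

    $D_2 = 43 - 33 = 10$

    The lower class bound of the modal class, L = 35

    The class width of the modal class, C = 5 (from 35 to 40)

    Thus:

    $$\text{mode} = L + \frac{D_1}{D_1 + D_2} \cdot C = 35 + \frac{14}{14 + 10} \cdot 5 = 37.92 \text{ years}$$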
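    The interpolation for the mode can be sketched as follows. The function name `grouped_mode` is chosen for this illustration, and equal class widths are assumed. One further assumption not stated in the notes: if the modal class is the first or last class, the missing neighbouring frequency is treated as zero.

```python
def grouped_mode(lower_bounds, width, frequencies):
    """Estimate the mode as L + D1 / (D1 + D2) * C for equal-width classes."""
    # Index of the modal class (the first one, if several share the largest frequency).
    i = max(range(len(frequencies)), key=lambda k: frequencies[k])
    # Assumption: a missing neighbouring class counts as frequency 0.
    f_prev = frequencies[i - 1] if i > 0 else 0
    f_next = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    d1 = frequencies[i] - f_prev
    d2 = frequencies[i] - f_next
    if d1 + d2 == 0:
        raise ValueError("the mode cannot be interpolated: neighbouring frequencies are equal")
    return lower_bounds[i] + d1 / (d1 + d2) * width


# Age distribution from Example 1.
age_lower_bounds = [20, 25, 30, 35, 40, 45]
employees = [2, 14, 29, 43, 33, 9]
print(round(grouped_mode(age_lower_bounds, 5, employees), 2))  # 37.92
```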

    Graphical estimation of the mode

    The graphical equivalent of the above interpolation formula is to construct three histogram bars, representing the class with the highest frequency and the ones on either side of it, and to draw two straight lines: one joining the top-left corner of the modal bar to the top-left corner of the following bar, and one joining the top-right corner of the modal bar to the top-right corner of the preceding bar. The mode estimate is the x value corresponding to the intersection of the two lines.

    Example 2: Estimation of the mode of a frequency distribution using the graphical method.

    Using the data of Example 1:

    Age (years) number of employees

    30 and under 35 29

    35 and under 40 43

    40 and under 45 33

    Draw the graph


    The advantages of the mode

    The mode has the advantage of not being affected by extremely high or low values;

    It is easily understood (it is simply the value that occurs most often), not difficult to calculate and can be used when the last class of a distribution is open ended;

    The mode can be used for all levels of data: nominal, ordinal, interval, and ratio.

    The disadvantages of the mode

    The disadvantages of the mode are:

    The mode does not use much of the information available;

    For many sets of data, there is no mode because no value appears more than once. For example, there is no mode for this set of price data: RWF 250, RWF 400, RWF 650 and RWF 1250;

    The mode is not always unique. Example: suppose the ages of the individuals in a scout club are 14, 16, 17, 18, 18, 20, 20, 22, 24, 24, and 25. The ages 18, 20 and 24 each occur twice, so all three are modes.

    In general, the mean is the most frequently used measure of central tendency and the mode is the least used.

    D. THE MIDRANGE

    The midrange is defined as the sum of the lowest and highest values in the data set, divided by 2. The symbol MR is used for the midrange.

    $$\text{MR} = \frac{\text{lowest value} + \text{highest value}}{2}$$

    Example: Find the midrange of these numbers: 2, 3, 6, 8, 4, and 1.

    $$\text{MR} = \frac{1 + 8}{2} = \frac{9}{2} = 4.5$$

    Then, the midrange is 4.5.
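    For completeness, the midrange calculation takes one line of Python. The function name `midrange` is chosen for this illustration.

```python
def midrange(values):
    """Return (lowest value + highest value) / 2."""
    return (min(values) + max(values)) / 2


print(midrange([2, 3, 6, 8, 4, 1]))  # 4.5
```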

    The Relationship between the Arithmetic Mean, the Median and the Mode

    In a symmetrical frequency distribution the mode, median, and mean are located at the center and are always equal. Fig. (a) illustrates this for a normal distribution. In this case any one of these measures may be used.

    [Fig. (a): symmetrical distribution, with the mean, median and mode coinciding at the centre]

    If the distribution of the variable is not symmetrical, we have a skew distribution:

    the arithmetic mean is not so typical of the distribution. In a positively skewed

    distribution, the mean is not at the centre. The mean is dragged to the right of

    centre by a few extremely high values of the variable that have been observed.

    The median is generally the next largest measure in a positively skewed

    frequency distribution. The mode is the smallest of the three measures. If the

    distribution is highly skewed, the mean would not be a good measure to use. The

    median and mode would be more representative.

    [Figure: positively skewed distribution, with the mode, median and mean in that order from left to right]

    In a negatively skewed distribution the mean is reduced by a few extremely low

    values of the variable and hence will be left of centre. The median is greater than

    the arithmetic mean, and the modal value is the largest of the three measures.

    Again, if the distribution is highly skewed, the mean should not be used to

    represent the data.

    In a moderately skew distribution the following relationships hold approximately:

    1. Mean - Mode = 3 (Mean - Median);

    2. Median - Mode = 2 (Mean - Median);

    3. Median = (2 Mean + Mode) / 3;

    4. Mode = 3 Median - 2 Mean;

    5. Mean = (3 Median - Mode) / 2.

    THE GEOMETRIC MEAN G

    The geometric mean is useful in finding the average of percentages, ratios, indexes, or

    growth rates. It has a wide application in business and economics because we are often

    interested in finding the percentage changes in sales, salaries, or economic figures, such

    as the Gross Domestic Product, which compound or build on each other.

    The geometric mean G of a set of n positive numbers $x_1, x_2, x_3, \dots, x_n$ is calculated using the formula:

    $$G = \sqrt[n]{x_1 \cdot x_2 \cdot x_3 \cdots x_n}$$

    where n is the number of observations made of the variable x, and $x_1, x_2, x_3, \dots, x_n$ are the values of these observations.

    Example: the geometric mean of the numbers 3, 25 and 45 is:

    $$G = \sqrt[3]{3 \cdot 25 \cdot 45} = \sqrt[3]{3375} = 15$$
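    A short sketch of the geometric mean in Python follows. It computes the n-th root through logarithms, which is mathematically equivalent to the formula above and avoids very large intermediate products; this choice, and the function name `geometric_mean`, are assumptions made for this illustration.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive numbers, computed via logarithms."""
    if not values or any(x <= 0 for x in values):
        raise ValueError("the geometric mean requires positive numbers")
    return math.exp(sum(math.log(x) for x in values) / len(values))


# Example from the text: the exact answer is 15 (floating-point output may differ
# from 15 in the last decimal places).
print(round(geometric_mean([3, 25, 45]), 6))
```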


    THE HARMONIC MEAN H

    The harmonic mean is another specialized measure of location used only in particular circumstances, namely when the data consist of a set of rates, such as prices, speeds or productivity.

    The harmonic mean H of a set of n numbers $x_1, x_2, x_3, \dots, x_n$ is the reciprocal of the arithmetic mean of the reciprocals of the numbers:

    $$H = \frac{1}{\frac{1}{n} \sum_{i=1}^{n} \frac{1}{x_i}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

    where n is the number of observations.

    Example: the harmonic mean of the numbers 2, 4, and 8 is:

    $$H = \frac{3}{\frac{1}{2} + \frac{1}{4} + \frac{1}{8}} = \frac{3}{\frac{7}{8}} = 3.43$$
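    The same example can be checked numerically. The sketch below is illustrative; the function name `harmonic_mean` is chosen for this example.

```python
def harmonic_mean(values):
    """Number of observations divided by the sum of their reciprocals."""
    if not values or any(x <= 0 for x in values):
        raise ValueError("this sketch expects positive numbers")
    return len(values) / sum(1 / x for x in values)


print(round(harmonic_mean([2, 4, 8]), 2))  # 3.43
```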

    The relation between the arithmetic, geometric, and harmonic means.

    The geometric mean of a set of positive numbers $x_1, x_2, x_3, \dots, x_n$ is less than or equal to their arithmetic mean but is greater than or equal to their harmonic mean. In symbols:

    $$H \le G \le \bar{X}$$

    The equality signs hold only if all the numbers $x_1, x_2, x_3, \dots, x_n$ are identical.

    Example: The set 2, 4, 8 has arithmetic mean 4.67, geometric mean 4, and harmonic

    mean 3.43.
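    The three means of this example can be compared using Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or later). This is only a numerical check of the inequality for one data set, not a proof; the variable names are chosen for this illustration.

```python
import statistics

data = [2, 4, 8]

h = statistics.harmonic_mean(data)
g = statistics.geometric_mean(data)
a = statistics.mean(data)

# Rounded to two decimals these are 3.43, 4.0 and 4.67 respectively.
print(round(h, 2), round(g, 2), round(a, 2))

# A small tolerance guards against floating-point rounding in the comparison.
tolerance = 1e-9
print(h <= g + tolerance and g <= a + tolerance)  # True
```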


    3.3 MEASURES OF DISPERSION

    Dispersion refers to the variability or spread in the data. A small value for a measure of dispersion indicates that the data are clustered closely, say, around the arithmetic mean. A large measure of dispersion indicates that the data are widely spread, so that the mean is less representative of the data.

    The most important measures of dispersion are:

    1. Range: the difference between the largest and the smallest values in a data set. The range is the simplest of these measures. The symbol R is used for the range.

    R = Largest value - smallest value

    Example: Find the range of the following distribution.

    35, 45, 30, 35, 40, 25

    R = 45 - 25 = 20
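    The range of the example data can be verified as follows. The function name `value_range` is an assumption made for this illustration (chosen to avoid shadowing Python's built-in `range`).

```python
def value_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)


print(value_range([35, 45, 30, 35, 40, 25]))  # 20
```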

    2. Mean Deviation (MD): the arithmetic mean of the deviations of the observations from the arithmetic mean, ignoring the sign of these deviations.

    a) The formula for the mean deviation for ungrouped data is

    $$\text{MD} = \frac{\sum |X - \mu|}{N} \quad \text{for populations}$$

    $$\text{MD} = \frac{\sum |X - \bar{X}|}{n} \quad \text{for samples}$$

    Where:

    $X$ is the value of each observation;

    $\bar{X}$ is the arithmetic mean of the values;


    is th

