
Introduction of Statistics

Date post: 12-Feb-2017
Upload: raja-ram-sharma
Page 1: Introduction of Statistics

Economic and Non-economic Activities

All human beings are engaged in some activity or other in order to satisfy their basic requirements. For example, farmers work in their fields, workers in factories, and teachers in schools or colleges. All human activities can be divided into two groups:

Human Activities
↓
Economic Activities (to earn money)
Non-economic Activities (to get satisfaction)

Economic Activities – Activities undertaken to earn money are known as economic activities. For example, a worker working on a construction site, a shopkeeper selling goods in a shop, or a teacher teaching in a school or college.

Economic activities are concerned with production, consumption or investment. Every economy therefore carries out three activities, which are as follows –

Consumption - It is an economic activity which deals with the use of goods and services for the satisfaction of human wants. For example, eating bread or watching TV.

Production - It refers to all activities which are undertaken to produce goods and services for generating income and satisfying human wants. For example, the work of a trader or a teacher.

Investment - It means expenditure made on the purchase of goods and services for generating further income.

Non-economic Activities – Activities that are not concerned with the creation of money or wealth are known as non-economic activities. For example, a housewife cooking food for her family or a teacher teaching his own son.

Statistics

The word ‘statistics’ is derived from the Latin word ‘Status’ or the French word ‘Statistique’, each of which means a political state. The word statistics conveys different meanings to different people. Some regard statistics as data, facts or measurements, while others believe it to be the study of figures.

Meaning of Statistics

Statistics has been defined differently by different writers from time to time, each emphasizing the meaning, scope and limitations of the subject. Some writers have defined statistics as statistical data (plural sense), whereas others have defined it as statistical methods (singular sense).

Statistics as a Plural Sense

In the plural sense, statistics refers to aggregates of facts, affected to a marked extent by a multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standards of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other. In simple words, it means a collection of numerical facts.

Features of Statistics as a Plural Sense

Statistics has the following features –

(a) Aggregates of facts – Statistics are a number of facts. Single and isolated figures are not statistics, as such figures cannot be compared. For example, a single student’s mark of 88 is not statistics, but a series of marks, or the average marks of the students in a class, will be called statistics.

(b) Affected by multiplicity of causes – Numerical data are influenced by a variety of factors. It is not easy to study the effect of any one factor separately while ignoring the other factors. For example, an agricultural crop like rice is affected by rainfall, fertilizers, seeds, method of cultivation etc. It is not possible to study separately the effect of each of these forces on the production of rice.

(c) Statistics are numerically expressed – The statistical approach to a subject is numerical. So any facts, to be called statistics, must be numerically or quantitatively expressed. For example, the statement that Ishita is taller than Manyata and Ankita will not be called statistics. However, if the same facts are expressed in numbers (like Ishita: 160 cm, Manyata: 150 cm and Ankita: 145 cm), they will be called statistics.

(d) Statistics should be collected with reasonable standard of accuracy – Data are either enumerated or estimated with reasonable accuracy. For example, when we say that 40 students were present in the class, we are enumerating the number of students present in the class. But when a news channel says that there are 2000 casualties in the earthquake in Nepal on April 25, 2015, the news channel is simply estimating the number of casualties.

(e) Statistics are collected for a predetermined purpose – The purpose of collecting statistical data must be decided in advance, otherwise the usefulness of the data collected would be negligible. Data collected in an unsystematic manner and without complete awareness of the purpose will be confusing and cannot be made the basis of valid conclusions.

(f) Statistics are collected in a systematic manner – For accuracy and reliability, the figures should be collected in a systematic manner. If data are collected haphazardly, their reliability will deteriorate.

(g) Statistics should be placed in relation to each other – Collection of statistical data is generally done with the motive to compare, so the figures must be comparable with one another.

Statistics as a Singular Sense

In the singular sense, the term statistics means statistical method, i.e. it is a method of dealing with numerical facts. It involves the following steps –

• Collection - It is the main and the first step in a statistical inquiry. The technique of collection of data depends upon the objective of the study.

• Organization of data - After collection of data, the data is organised in a proper form which involves editing and classification.

• Presentation of data - After classification, the data is presented in some suitable manner, in the form of text, table, diagram or graph.


• Analysis of data - After presentation of data, analysis is done with the help of simple statistical techniques, such as measures of central tendency or measures of dispersion.

• Interpretation of data - It is the last step in the statistical methodology.
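
As a rough illustration of the analysis step, the sketch below computes two measures of central tendency and two measures of dispersion using Python's standard `statistics` module. The marks are illustrative figures and the variable names are assumptions made for this example only.

```python
import statistics

# Illustrative marks of five students (assumed data for this sketch).
marks = [70, 72, 87, 95, 98]

# Measures of central tendency.
mean_mark = statistics.mean(marks)      # arithmetic mean: 84.4
median_mark = statistics.median(marks)  # middle value: 87

# Measures of dispersion.
value_range = max(marks) - min(marks)   # range: 28
spread = statistics.pstdev(marks)       # population standard deviation

print("Mean:", mean_mark)
print("Median:", median_mark)
print("Range:", value_range)
print("Standard deviation:", round(spread, 2))
```

Interpretation then means stating what these figures imply, for instance that marks in this small group cluster in the 70s and 90s around an average of about 84.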

Distinction between Plural Sense and Singular Sense

Plural Sense | Singular Sense
Statistics deals with numerical information. | Statistics is a body of various methods and tools.
It is descriptive in nature. | It is basically a tool of analysis.
It is often in the raw state. | It helps in processing the raw data.
It is quantitative. | It is an operational technique.

Function of Statistics

It performs many functions useful to human beings which are as follows –

1. To simplify complex facts – It is very difficult for an individual to understand and draw conclusions from huge numerical data. Statistical methods reduce a great mass of complex data to a simple and understandable form. For example, statistical techniques like mean, correlation, graph etc. make complex data intelligible in a shorter time and in a better way.

2. To present facts in definite form – Quantitative facts can be believed and trusted more easily than abstract and qualitative facts. Statistics summarizes generalized facts and presents them in a definite form. For example, the statement that inflation in India is 8% annually is more convincing than the statement that prices are rising.

3. To make comparison – Comparison is one of the main functions of statistics as the absolute figures convey a less concrete meaning. For comparison various statistical methods like averages, ratio etc. are used.

4. To facilitate planning and policy formulation – On the basis of numerical data and their analysis, businessmen and administrators can plan future activities and shape their policies.

5. To help in forecasting – As business is full of risks and uncertainties, correct forecasting is essential to reduce the uncertainties of business. Statistical tools (such as time series analysis) help in making projections for the future.

6. Formulation and testing of hypotheses – Statistical methods are extremely useful in formulating and testing hypotheses. For example, we can test the hypothesis that a rise in railway fares and freights will affect passenger traffic or goods traffic.

7. To enlarge individual knowledge and experience – Statistics enables people to enlarge their horizon. It sharpens the faculty of rational thinking and reasoning, and is helpful in propounding new theories and concepts.

Importance of Statistics

A. Importance to the Government
B. Importance in Economics
C. Importance in Economic Planning
D. Importance in Business

Importance to the Government

In the present scenario, Government collects the largest amount of statistics for various purposes.


The role of government has increased, and it requires much more information in the form of numerical figures to fulfil its welfare objectives, in addition to the efficient running of its administration.

Popular statistical methods such as time-series analysis, index numbers, forecasting and demand analysis are extensively used in formulating economic policies.

In a democratic country like India, various political groups are also guided by statistical analysis of their popularity among the masses.

Importance of Statistics in Economics

Formulation of economic laws – Law of demand and concept of elasticity of demand have been developed by the inductive method of generalization, which is also based on statistical principles.

Statistical data and statistical methods play a vital role in understanding and solving economic problems such as poverty, unemployment, disparities in the distribution of income and wealth etc.

Study of market structures requires statistical comparison of market prices, cost and profits of individual firms.

Statistical methods can be used to estimate mathematical relation between various economic variables.

Time-series analysis is used to study the behavior of prices, production and consumption of commodities, money in circulation, and bank deposits and clearings.

Statistical surveys of prices help in studying the theories of prices, price policy and price trends as well as their relationship to the general problem of inflation.

Importance of Statistics in Economic planning

At every stage of economic planning, there is a need for figures and statistical methods. Using statistical techniques, it is possible to assess the amounts of various resources available in the economy and accordingly determine whether the specified rate of growth is sustainable or not.

Statistical analysis of data regarding an economy may reveal certain crucial areas, like an increasing rate of inflation, which may require immediate attention.

Importance of Statistics in Business

• For establishing a business unit
• For estimating the demand for a product
• For production planning
• For quality control
• For marketing strategy
• For accounts writing and auditing

Limitations of Statistics

(a) Statistics does not study qualitative phenomena – Statistics can be applied in studying only those problems which can be stated and expressed quantitatively.

- Qualitative characteristics such as honesty, poverty, welfare, beauty, health etc. cannot be measured quantitatively.

(b) Statistics does not deal with individuals – Statistics deals only with aggregates of facts and no importance is attached to individual items. For example, the marks of one student of a class do not constitute statistics, but the average marks of the class have statistical relevance.

(c) Statistics can be misused – Statistics can be misused by ignorant or wrongly motivated persons. Any person can misuse statistics and draw any type of conclusion he likes.

(d) Statistical results are true only on average – Statistics, as a science, is not as accurate as many other sciences are. Natural sciences are exact as their results are universally true. However, statistical laws are not exact. For example, if the average number of thefts in a town is 3 per week, it does not mean that if 3 thefts have taken place on the first day of the week, there will be no more thefts in that week.

(e) Statistical laws are not exact – As statistical laws are probabilistic in nature, inferences based on them are only approximate and not exact like inferences based on mathematical or scientific laws.

(f) Only experts can make the best possible use of statistics – The techniques of statistics are not simple enough to be used by any layman. As they are complicated in nature, these techniques can be used properly only by experts.

(g) Statistical data should be uniform and homogeneous – It is essential that data be uniform and homogeneous. Heterogeneous data are not comparable. For example, it would be of no use to compare the heights of trees with the heights of men because these data are heterogeneous.

Assignment for Introduction of statistics

1. Define statistics in the plural sense.
2. What is meant by statistics in the singular sense?
3. State two examples of quantitative data.
4. State two examples of qualitative data.
5. What is meant by statistical tools?
6. Why does the problem of distrust of statistics arise?
7. Explain any three points of importance of statistics.

Revision Exercise

1. Define statistics in the plural sense.
2. Briefly explain the meaning of statistics in the singular sense.
3. What is meant by distrust of statistics?
4. What is meant by statistical tools?
5. State two functions of statistics.


Collection of Data

Introduction

Statistics has gained a significant place in the modern complex business world. Data is the base on which the superstructure of statistical investigation is built. The success or failure of an investigation mainly depends upon the quality, adequacy and accuracy of the data.

Important terms used in statistics are –

A. Statistical Enquiry – It means a search conducted by statistical methods.
B. Investigator – The person who conducts the statistical enquiry is termed the investigator.
C. Enumerator – The persons whose help the investigator requires to collect the information are termed enumerators.
D. Respondents – The persons from whom information is collected are called respondents.
E. Survey – It is a method of gathering information from individuals. The objective of a survey is to collect data to describe some features like price, quality or usefulness.

Collection of Data – It is the first step in any statistical investigation.

Sources of Data

Internal Sources of Data – When an organization collects data from its own reports and records, these are known as internal sources of data. For example, sales, salary, profit, dividend etc.

External Sources of Data – Information collected from outside agencies is called external data which can be obtained from primary sources or secondary sources. This type of data can be collected by census or sample methods.

Primary Data

Primary data is original and first hand information. The source from which the primary data is collected is called the primary source. For example, population census conducted by Government of India.

Secondary Data


The data which is not directly collected but rather obtained from the published or unpublished sources, is known as secondary data. It is also known as second hand data. For example, Economic survey published by Government of India.

Difference between Primary Data and Secondary Data

Basis | Primary Data | Secondary Data
Originality | They are original because they are collected by the investigator himself. | They are not original since the investigator makes use of data collected by other agencies.
Source | They are collected by some agency or person by using a method of data collection. | They are already collected and processed by some person or agency and are ready for use.
Time Factor | It requires longer time for data collection. | It requires less time.
Cost Factor | It requires a considerable amount of money and personnel, as the investigator himself plans the whole investigation and collects the data. | It is cheaper as it is taken from published or unpublished materials.
Reliability and Suitability | It is more reliable and suitable to the enquiry as the investigator himself collects it. | It is less reliable and less suitable as someone else collected the data, which may not serve the purpose.
Precautions | There is no great need for precautions while using primary data. | They should be used with great care and caution.
Organization Factor | Collection of primary data requires an elaborate organizational set-up. | There is no need for an organizational set-up in the case of secondary data.

Method of Collecting Data

A. Direct Personal Investigation
B. Indirect Oral Investigation
C. Information from Local Sources or Correspondents
D. Information through Questionnaires and Schedules

Direct Personal Investigation

When data are collected by the investigator personally from the persons concerned, the method is called direct personal investigation. He personally interviews everyone who is in a position to supply the information he requires. This method of collection can be used when the area of enquiry is limited or when a maximum degree of accuracy is needed. The success of this method requires that the investigator be very diligent, efficient, impartial and tolerant.

Suitability of this method

(a) When detailed information has to be collected.
(b) When the area of investigation is limited.
(c) When the nature of the enquiry is confidential.
(d) When a maximum degree of accuracy is needed.
(e) When importance is given to originality.

Merits of Direct Personal Investigation

(a) The data collected is original in nature.
(b) Data is fairly accurate when personally collected.
(c) There is uniformity in the collection of data.
(d) There is flexibility in the enquiry as the investigator is personally present.
(e) It is economical, in case the field of investigation is limited.


Demerits of Direct Personal Investigation

(a) It can be used only if the field of enquiry is small. It cannot be used when the field of enquiry is wide.
(b) It is a costly method and consumes more time.
(c) Personal bias can give wrong results.
(d) This method is lengthy and complex.

Indirect Oral Investigation

In this method, information is not obtained from the persons about whom it is needed. Instead, it is collected orally from other persons who are expected to possess the necessary information.

Suitability

(a) When the concerned informants are unable to give information due to their ignorance, or are not prepared to part with the information.
(b) When the area of investigation is very large.
(c) When secret or sensitive information about the informants has to be gathered.
(d) When the problem under investigation is complex and needs expert opinion.

Merits of Indirect Oral Investigation

(a) It is suitable when the area of investigation is large.
(b) It is economical in terms of time, money and manpower.
(c) It is relatively free from personal bias as the information is collected from persons who are well aware of the situation.

Demerits of Indirect Oral Investigation

(a) The result can be erroneous because information is obtained from other persons who are not directly connected.
(b) As compared with direct personal investigation, the degree of accuracy of the data is likely to be lower.
(c) The persons providing the information may be prejudiced or biased.
(d) The information collected from different persons may not be homogeneous and comparable.

Information from Local Sources or Correspondent

In this method, local agents or correspondents are appointed and trained to collect information from the different parts of the investigation area. These agents regularly supply the information to the central office.

This method is often adopted by newspapers and periodicals for information about politics, business, prices of agricultural and industrial products, the stock market, strikes etc.

Suitability of Information from Correspondent

(a) When regular and continuous information is required.
(b) When the area of investigation is very large.
(c) When a high degree of accuracy is not required.

Merits of Information from Correspondent

(a) It is comparatively cheap.
(b) It gives results easily and promptly.
(c) It covers a wide area under investigation.

Demerits of Information from Correspondent


(a) In this method original data is not obtained.
(b) It gives approximate and rough results.
(c) Different attitudes of different correspondents and agents may increase errors.

Information through Questionnaires and Schedules

Under this method, the investigator prepares a questionnaire keeping in view the objective of the enquiry. There are two ways of collecting information on the basis of questionnaire -

(a) Mailing Method and (b) Enumerator’s method

Mailing Method

Under this method, the investigator prepares a questionnaire pertaining to the field of investigation and sends it to the respondents, along with a covering letter, to collect information from them. The respondents are also assured that the information will be kept confidential.

Suitability of Mailing Method

(a) When the field of investigation is very large.
(b) When respondents are literate and likely to cooperate with the investigation.

Merits of Mailing Method

(a) It is economical in terms of time, money and effort involved.
(b) It is original and therefore fairly reliable. This is because the information is supplied by the concerned persons themselves.
(c) It allows wide coverage of the area of study.

Demerits of Mailing Method

(a) Informants often do not take interest in the questionnaire and fail to return it. Those who do return it often send incomplete answers.

(b) It lacks flexibility. When questions are not properly answered, they cannot be changed to obtain the required information.

(c) If the respondents are biased, then the information will also be biased.

Enumerator’s Method

Under this method, a questionnaire is prepared according to the purpose of the enquiry. The enumerator himself approaches the informant with the questionnaire. Questionnaires which are filled in by the enumerators themselves, by putting questions to the informants, are called schedules.

Construction of Questionnaire or Schedule

A questionnaire or a schedule is a list of questions relating to the problem under investigation.

Quality of a Good Questionnaire

(1) Limited Number of Questions – The number of questions should be as small as possible. Long questionnaires discourage people from completing them. Only those questions which have a direct relevance to the problem should be included.

(2) Simple and Short Questions – The questions should be clear, brief and simple. They should be framed in such a manner that their answers are specific and precise.

(3) Proper Order of the Questions – Questions must be placed in a proper order.

(4) No Undesirable Questions – Undesirable or personal questions must be avoided.

(5) Non-controversial – Questions should be such that they can be answered impartially.

(6) Avoid Questions requiring Calculation – Questions which require calculation or force the respondent to recollect from memory should not be asked. For example, informants should not be asked their yearly income, since in most cases they are paid monthly.

(7) Instructions to the Informants – The questionnaire should provide necessary instructions about the terms and units used in it. Clear and definite instructions for filling in the questionnaire, and the address to which the completed questionnaire should be sent, must be given.

(8) Questionnaire should look Attractive – A questionnaire should be made to look as attractive as possible. The printing and the paper should be of good quality and enough space should be provided for answers.

(9) Request for Return – A request should be made to the respondents to return the questionnaire completed in all respects.

Specimen Questionnaire – Consumer

1. Name ___________________________________
2. Age _______________
3. Address _______________________________________
4. Sex □ Male □ Female
5. Phone: Landline ________________ Mobile _________________
6. Monthly Family Income:
   □ Less than ₹10,000 □ ₹10,000 to ₹20,000 □ ₹20,000 to ₹30,000 □ More than ₹30,000
7. What kind of

Collection of Secondary Data


Census and Sample Methods of Collection of Data

Census Method

When a statistical investigation collects data from each and every element of the population or universe, it is termed the census method. Generally the term population is used to mean the total number of people living in a country. Population of India was 125 crore in 2015. But in statistics, the term population means the aggregate of all items about which we want to obtain information. For example, suppose there are 1000 students in a particular school. If an investigation relates to all the 1000 students, then these 1000 students would be taken as the universe or population. Each unit of these 1000 is called an item.

Census method is also known as ‘Complete Enumeration’ or 100% Enumeration or Complete Survey.

Merits of Census Method

(a) Intensive study of population
(b) High degree of accuracy and reliability
(c) Study of diverse characteristics

Demerits of Census Method

(a) Expensive
(b) Needs more time and manpower
(c) Not suitable for large investigations

Sample Method

In this method, data is collected from a sample, that is, a group of items taken from the population for examination, and conclusions about the population are drawn on that basis.

Merits of Sample Method

(i) Economical – It is more economical than the census technique as the task of collection and analysis of data is confined to only a fraction of the population.

(ii) Time Saving

(iii) Identification of Error – Because only a limited number of items are covered, errors can be easily identified. To that extent the sampling method shows better accuracy.

(iv) More Scientific – It is more scientific because the sample data can be conveniently investigated from various angles.

(v) Administrative Convenience – In the case of sampling, the scale of operation remains at a low level. So planning, organization and supervision can be conveniently managed, which leads to administrative convenience.

Demerits of Sample Method

(i) Partial – If the investigator is biased, he might select the sample deliberately. In such cases, the selected sample cannot be representative of all the characteristics of the population.

(ii) Wrong conclusion

(iii) Difficulty in selecting representative sample

(iv) Difficulty in framing sample

Types of Sampling


Random Sampling

Random sampling method refers to a method in which every item in the universe has a known chance of being chosen for the sample. It is also known as ‘Probability Sampling’.

(i) Lottery method
(ii) Table of Random Numbers
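
The lottery method can be imitated on a computer. The sketch below assumes a school of 1000 students identified by roll numbers 1 to 1000 and draws 50 of them without replacement using `random.sample`. The function name `lottery_sample`, the sample size and the seed are hypothetical choices made for illustration.

```python
import random

def lottery_sample(population, sample_size, seed=None):
    """Draw sample_size distinct items so that every item has an equal chance."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, sample_size)

# Assumed universe: roll numbers 1 to 1000.
students = list(range(1, 1001))
chosen = lottery_sample(students, 50, seed=42)

print(len(chosen), "students selected")
print(sorted(chosen)[:5], "...")
```

Because the draw is made by the random number generator and not by the investigator, personal bias does not enter the selection.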

Merits of Random Sampling

(i) It is free from personal bias of the investigator.
(ii) Each and every item of the population stands an equal chance of being selected.
(iii) The universe gets fairly represented by the sample.

Demerits of Random Sampling

(i) Unsuitable for small samples
(ii) Difficult to prepare sampling frame
(iii) Time consuming

Purposive Sampling

It is that sampling in which the investigator himself makes the choice of the sample items which in his opinion are the best representative of the universe.

Stratified or Mixed Sampling

In this method, the universe or the entire population is divided into a number of groups or strata and then certain numbers of items are taken from each group at random.
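
A minimal sketch of stratified sampling follows. It assumes three strata (classes of a school, with made-up roll numbers) and takes the same number of items at random from each stratum. Drawing an equal number per stratum is an assumption of this example; drawing in proportion to stratum size is another common choice. The name `stratified_sample` is hypothetical.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Pick per_stratum items at random from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(items, per_stratum) for name, items in strata.items()}

# Assumed strata: roll numbers grouped by class.
strata = {
    "Class IX": list(range(1, 41)),
    "Class X": list(range(41, 76)),
    "Class XI": list(range(76, 101)),
}

sample = stratified_sample(strata, per_stratum=5, seed=1)
for name, items in sample.items():
    print(name, sorted(items))
```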

Systematic Sampling


Under this method, out of the complete list of available population, the sample is selected by taking every nth item from this list.
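
Taking every nth item can be sketched with list slicing. The example below assumes a list of 100 roll numbers and an interval of 10; the function name `systematic_sample` and the starting position are assumptions for illustration. In practice the starting point is usually chosen at random from the first n items.

```python
def systematic_sample(population, n, start=0):
    """Return every n-th item of the list, beginning at index start."""
    return population[start::n]

# Assumed complete list of the population: roll numbers 1 to 100.
roll_numbers = list(range(1, 101))

# Start at the 10th item and then take every 10th item.
sample = systematic_sample(roll_numbers, 10, start=9)
print(sample)  # [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
```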

Quota Sampling

In this method, the population is divided into different groups or classes according to different characteristics of the population.

Convenience Sampling

In this method, sampling is done by the investigator in such a manner that suits his convenience. For example, to estimate the average height of an Indian, the investigator can take a convenience sample from Delhi city only and estimate the average height of an Indian.

Revision Exercise

Very Short Answer Type Questions

1. What do you mean by a statistical enquiry?
2. What are the two main sources of data?
3. What is the meaning of primary data?
4. What do you mean by secondary data?
5. State the merits of primary data.
6. Mention two demerits of primary data.
7. Expand NSSO.
8. What do you mean by enumerator?

Short Answer Type Questions

1. What do you mean by secondary data? Mention its sources.
2.


Organization of Data

What is Classification?

The quantitative information collected in any field of society or science is never uniform. It always differs from one item to another, e.g. prices of vegetables, students in different sections, income of families, height or weight of a person etc.

The process of grouping data into different classes or sub-classes according to their characteristics is termed classification. In the words of Conner, “Classification is the process of arranging things in groups or classes according to their resemblances and affinities and gives expression to the unity of attributes that may exist amongst a diversity of individuals”.

Attributes – The characteristics which are not capable of being measured quantitatively are called attributes. For example, blindness, literacy, beauty, intelligence etc.

Basis of Classification

Geographical – When data is classified according to geographical location or region, it is called geographical classification. For example, when the population of different states is presented:

States | Uttar Pradesh | Maharashtra | Bihar | Madhya Pradesh | Andhra Pradesh | Rajasthan
Population | 20 crore | 12 crore | 10 crore | 8 crore | 7.8 crore | 7.5 crore

Chronological – When data is classified with respect to different periods of time, the type of classification is known as chronological classification.

Qualitative – When data is classified on the basis of descriptive characteristics or attributes like gender, literacy, region, caste etc., which cannot be quantified, it is known as qualitative classification.

Quantitative – When data is classified on the basis of characteristics which can be measured, such as height, weight, income, expenditure, production or sale, it is known as quantitative classification.

Concept of Variable

A characteristic which is capable of being measured and which changes its value over time or from one item to another is called a variable. A single numerical observation of a variable may be called a variate. For example, price is a variable as the prices of different commodities are different.

There are two types of variable –


(a) Continuous Variable – Variables which can take all possible values (integral as well as fractional) in a given range are termed continuous variables.

Weight (kg)        30-35   35-40   40-45   45-50   50-55   55-60
No. of Students    22      12      8       5       6       3

(b) Discrete Variable – Variables which can take only exact values and not fractional values are termed discrete variables.

No. of children    0   1   2   3
No. of families    5   8   9   13

Frequency

Frequency refers to number of times a given value appears in a distribution. For example, suppose there are 30 students in a class and out of them –

15 students have got 70 marks
12 students have got 88 marks
3 students have got 95 marks

Class Frequency – The number of times an item repeats itself corresponding to a range of value (class interval) is termed class frequency. For example, if there are 5 students securing marks between 70-80, then 5 is the frequency corresponding to the class interval 70-80. Thus, 5 will be called frequency.

Tally Bars – Every time an item occurs, a tally bar, (I) is marked against that item.

Raw Data

A mass of data in its crude form is called raw data. It is an unorganized mass of the various items.

Series – Raw data are classified in the form of series. Series refers to those data which are presented in some order and sequence. Arranging of data in different classes according to a given order is called series. In simple words, series is arranged in some logical order.

Types of series

Individual Series


Individual series refers to a series in which items are listed singly, i.e., each item is given a separate value of measurement. It can be presented in two ways –

Ascending Order – When data are arranged systematically from the lowest value to the highest value, the arrangement is in ascending order. For example, 70, 72, 87, 95 and 98.

Descending Order – When data are arranged systematically from the highest value to the lowest value, the arrangement is in descending order. For example, 98, 95, 87, 72 and 70.

Discrete Series or Frequency Array

A discrete series is a series in which data are presented in such a way that the exact measurements of items are clearly shown. In this series, there are no class intervals.

Illustration –

10 students of Class XI have secured the following marks –

45, 50, 88, 98, 88, 45, 45, 85, 65 and 65.

Table – Discrete Series

Marks    Tally Bars    Frequency
45       III           3
50       I             1
65       II            2
85       I             1
88       II            2
98       I             1
Total                  10

Frequency Distribution

A table in which the frequencies and the associated values of a variable are written side by side, is known as frequency distribution.
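The tally for the marks of the 10 students can also be reproduced programmatically. The following is only a minimal sketch in Python; the helper name `frequency_array` is hypothetical and is not part of these notes.

```python
from collections import Counter

# Marks of the 10 students from the discrete series illustration.
marks = [45, 50, 88, 98, 88, 45, 45, 85, 65, 65]

def frequency_array(values):
    """Return (value, frequency) pairs in ascending order of value."""
    counts = Counter(values)
    return sorted(counts.items())

table = frequency_array(marks)
for value, freq in table:
    # One tally bar is printed for every occurrence of the value.
    print(f"{value:>5}  {'I' * freq:<5}  {freq}")
print("Total", sum(freq for _, freq in table))
```

The sum of the frequency column should always equal the number of raw observations, which is a quick check that no item was missed while tallying.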

Some Important Terms

Class – It means a group of numbers in which items are placed such as 10-20, 20-30, etc.

Class Limit – The lowest and highest values of the variable within a class are called the class limits.

Class-Interval – The difference between the upper limit (l2) and the lower limit (l1) of a class is known as the class-interval.

$i = l_2 - l_1$

Range – The range of a frequency distribution can be defined as the difference between the lower limit of the first class-interval and the upper limit of the last class-interval.

Mid-point – It is the central point of a class-interval.

$\text{Mid-point} = \frac{l_1 + l_2}{2}$

Class Frequency – The number of observations corresponding to a particular class is known as class frequency or the frequency of that class. It is denoted generally by f. The sum of frequencies is denoted as ∑f or N.

Types of Frequency Distribution

A. Exclusive Series
B. Inclusive Series
C. Open End
D. Cumulative Frequency
E. Mid-Value

Exclusive Series – It is a series in which every class interval excludes items equal to its upper limit.

Classes    Frequency
10-20      6
20-30      5
30-40      9
40-50      10
Total      30

Inclusive Series – It is a series in which every class interval includes all items up to and including its upper limit.

Classes    Frequency
10-19      6
20-29      5
30-39      9
40-49      10
Total      30

Difference between Exclusive Method and Inclusive Method

1. Exclusive Method: The upper limit of a class interval is counted in the next class.
   Inclusive Method: Both limits of a class interval are counted in the same class.

2. Exclusive Method: The upper limit of a class interval and the lower limit of the next class are the same.
   Inclusive Method: The upper limit of a class interval and the lower limit of the next class are different.

3. Exclusive Method: There is no need to convert it to the inclusive method prior to calculation.
   Inclusive Method: For simplicity in calculation, it is necessary to convert it into the exclusive method.

Open End Distribution – When the lower limit of the first class and the upper limit of the last class are not given, the distribution is known as an open end distribution.

Classes         Frequency
Below 20        15
20-40           12
40-60           8
60-80           5
80 and above    5
Total           45

Cumulative Frequency Series – It is a series in which the frequencies are successively added, class interval by class interval.

Classes    Cumulative Frequencies
5-10       5
15-20      9
20-25      15
25-30      20

Mid-Value Frequency Series – The mid-value is the middle value of a class interval. When only such mid-values are given with their frequencies, the series is called a mid-value series.

Mid-value    Frequency
15           5
25           6
35           4
45           5
Total        20

Assignment for Organization of Data

1. What is classification?2.

Presentation of Data – Textual and Tabular Presentation

Textual Presentation

A textual presentation is a descriptive form of presentation of data written in text or paragraph. It is also called descriptive presentation of data.

Tabular Presentation

It is a systematic presentation of numerical data in columns and rows in accordance with some important features or characteristics.

Component of a Table

(i) Table Number – A table should always be numbered for identification and reference in the future. A table must be numbered 1, 2, 3 etc.

(ii) Title – There must be a title on the top of the table. The title must be appealing and attractive.


(iii) Stubs – These are titles of the rows of a table. These titles indicate information contained in the row of the table.

(iv) Caption – It is the title given to the columns of a table.

(v) Body of the Table – This is the most important part of the table as it contains data.

(vi) Source – A source note refers to the source from which information has been taken.

(vii) Footnote – It is the last part of the table. A footnote explains a specific feature of the data content of the table which is not self-explanatory and has not been explained earlier.

ILLUSTRATION

Table – 1 Coffee Drinking Habits in Town X and Y

Kinds of Table

A. According to Purpose
B. According to Originality
C. According to Construction

According to Purpose – There are two types of table –

(i) General Purpose Table – This is also called a reference or repository table. It provides information for general use or reference, for example, the tables of the Census of India.

(ii) Special Purpose Table – It is called text, summary or analytical tables. Such tables are small in size and designed to highlight a particular set of facts in a simple and analytical form.

According to Originality – there are also two types of table –

(i) Original Table – An original table is that in which data are presented in the same form and manner in which they are collected.

(ii) Derived Table – It provides total, ratio, percentage and other statistical calculations. Such tables can be derived from general purpose tables.

According to Construction – There are two types of table –

(i) Simple or One Way Table – It is the simplest table, which shows only one characteristic and takes the form of a frequency table, for example,

Marks    No. of Students
0-20     5
20-40    25
40-60    20
Total    50

(ii) Complex Table – A table which presents data according to two or more characteristics is known as a complex table.

Classification of Data and Tabular Presentation

Tabular presentation is based on four fold classification of data –

(i) Qualitative Classification of Data and Tabular Presentation – It occurs when data are classified on the basis of qualitative attributes.


(ii) Quantitative Classification of Data and Tabular Presentation – It occurs when data are classified on the basis of quantitative characteristics of a phenomenon.

(iii) Temporal Classification of Data and Tabular Presentation – Data are classified according to time and time becomes the classifying variable.

(iv) Spatial Classification – In spatial classification, place becomes the classifying variable.

Assignment for Presentation of Data

1. What do you mean by presentation of data?
2. What is meant by table?
3. Define tabulation.
4. What are the main forms of a table?
5. What are the requisites of a good table?
6. What are the main forms of table?
7. Write three essentials of a satisfactory table.
8. What are the parts to be present in a table? Write any three.

Measures of Central Tendency – Arithmetic Mean

What is a Central Tendency?

The single value that represents the characteristics of a complex and varied mass of data is called an average or central value. This value always falls between the lowest and highest values of the data and is generally located in the centre or middle of the observations. Such an average, a figure that represents the whole group, is called a measure of central tendency or measure of location.

According to Clark, “An average is a figure that represents the whole group.”

Objective and Function of Average

(i) To present huge data in summarized form
(ii) To make comparison easier
(iii) To help in decision making
(iv) To know about the universe from a sample
(v) To trace precise relationships
(vi) To serve as a base for computing other measures

Characteristics of a Representative Average

(i) It should be simple to calculate and easy to understand.
(ii) It should be rigidly defined.
(iii) It should be based on all the observations.
(iv) It should be least affected by fluctuations of sampling.
(v) It should be capable of further algebraic treatment.
(vi) It should not be affected much by extreme values of data.

Types of Statistical Averages

Arithmetic Mean (Mean)

Mean is the number obtained by dividing the total of the values of different items by their number. In other words, mean is defined as the sum of the values of all observations divided by the number of observations. It is generally denoted by $\bar{X}$. It can be computed in two ways –

A. Simple Arithmetic Mean
B. Weighted Arithmetic Mean

Methods of Calculating Simple Arithmetic Mean

We know, there are three types of statistical series –

1. Individual Series
2. Discrete Series
3. Frequency distribution

Calculation of Mean in Case of Individual Series

There are three methods to calculate mean of individual series –

(i) Direct Method – According to this method, all the items are added, their total is divided by the number of items, and the quotient becomes the mean.

Steps of Direct Method
1. Let the items be X1, X2, ………. Xn.
2. Add up the values of all the items and obtain the total, i.e., ∑X.
3. Find out the total number of items in the series, i.e., N.
4. Divide the total of the items ∑X by the number of items N.

$\bar{X} = \frac{\sum X}{N}$

(ii) Short-Cut Method – This method is also called the assumed mean method. Deviations (d = X – A) are taken from an assumed mean (A).

$\bar{X} = A + \frac{\sum d}{N}$

(iii) Step Deviation Method – The step deviation method further simplifies the short-cut method. In this method, deviations from the assumed mean are divided by a common factor (h) to get step deviations (d’ = d/h).

$\bar{X} = A + \frac{\sum d'}{N} \times h$

Illustration

Calculate arithmetic mean from the following data – 30, 45, 60, 40, 15, 65, 85, 20.

Ans. Computation of Average Marks (A = 40, h = 5)

Marks (X)    d = X – A    d’ = (X – A)/h
30           -10          -2
45           5            1
60           20           4
40           0            0
15           -25          -5
65           25           5
85           45           9
20           -20          -4
∑X = 360     ∑d = 40      ∑d’ = 8

N = 8

Direct Method

$\bar{X} = \frac{\sum X}{N} = \frac{360}{8} = 45$

Short-Cut Method

$\bar{X} = A + \frac{\sum d}{N} = 40 + \frac{40}{8} = 45$

Step Deviation Method

$\bar{X} = A + \frac{\sum d'}{N} \times h = 40 + \frac{8}{8} \times 5 = 45$
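The three methods for an individual series can be checked against one another with a short Python sketch. The function names are hypothetical helpers. The data are the eight marks listed in the computation table, with A = 40, and the common factor h = 5 is inferred from the step deviations in that table.

```python
def mean_direct(values):
    """Direct method: sum of the items divided by their number."""
    return sum(values) / len(values)

def mean_short_cut(values, assumed_mean):
    """Short-cut method: A + (sum of deviations from A) / N."""
    deviations = [x - assumed_mean for x in values]
    return assumed_mean + sum(deviations) / len(values)

def mean_step_deviation(values, assumed_mean, h):
    """Step deviation method: A + (sum of step deviations / N) * h."""
    step_deviations = [(x - assumed_mean) / h for x in values]
    return assumed_mean + sum(step_deviations) / len(values) * h

marks = [30, 45, 60, 40, 15, 65, 85, 20]

print(mean_direct(marks))
print(mean_short_cut(marks, assumed_mean=40))
print(mean_step_deviation(marks, assumed_mean=40, h=5))
```

All three functions are algebraically equivalent, so they should agree for any choice of assumed mean and any non-zero common factor.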

Discrete Frequency Series

In a discrete series, values of the variable are repeated, i.e., frequencies are given corresponding to the different values of the variable. Mean in a discrete series can be computed by applying –

(i) Direct Method – In this method, the various items (X) are multiplied by their respective frequencies (f) and the sum of the products (∑fX) is divided by the total of frequencies (∑f) to determine the mean.

$\bar{X} = \frac{\sum fX}{\sum f}$

(ii) Short-Cut Method – This method saves considerable time in calculating the mean.
1. Denote the variable as X and the frequency as f.
2. Choose any item of the series as the assumed mean (A).
3. Calculate the deviations (d) of the items from the assumed mean.
4. Multiply the deviations (d) by the respective frequencies (f) and total the products to get ∑fd.

$\bar{X} = A + \frac{\sum fd}{\sum f}$

(iii) Step Deviation Method – In this method, the values of the deviations (d) are divided by a common factor (h).

$\bar{X} = A + \frac{\sum fd'}{\sum f} \times h$

Illustration

Calculate mean from the following series –

Size         8    10   12   14   16   18   20
Frequency    6    12   15   28   20   14   5

Ans. Computation of Mean in Discrete Frequency Series (A = 14, h = 2)

X        f      fX      d = X – A    fd     d’ = d/h    fd’
8        6      48      -6           -36    -3          -18
10       12     120     -4           -48    -2          -24
12       15     180     -2           -30    -1          -15
14       28     392     0            0      0           0
16       20     320     2            40     1           20
18       14     252     4            56     2           28
20       5      100     6            30     3           15
Total    100    1412                 12                 6

Here N = ∑f = 100.

Direct Method

$\bar{X} = \frac{\sum fX}{\sum f} = \frac{1412}{100} = 14.12$

Short-Cut Method

$\bar{X} = A + \frac{\sum fd}{N} = 14 + \frac{12}{100} = 14.12$

Step Deviation Method

$\bar{X} = A + \frac{\sum fd'}{N} \times h = 14 + \frac{6}{100} \times 2 = 14.12$
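For a discrete series the same calculation can be sketched in Python. The function names are hypothetical. The data are the sizes and frequencies of the illustration, with A = 14 and h = 2 as in the computation table.

```python
sizes = [8, 10, 12, 14, 16, 18, 20]
freqs = [6, 12, 15, 28, 20, 14, 5]

def discrete_mean_direct(values, freqs):
    """Direct method: sum(fX) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

def discrete_mean_step(values, freqs, assumed_mean, h):
    """Step deviation method: A + (sum(fd') / sum(f)) * h, with d' = (X - A) / h."""
    sum_fd_step = sum(f * (x - assumed_mean) / h for x, f in zip(values, freqs))
    return assumed_mean + sum_fd_step / sum(freqs) * h

print(discrete_mean_direct(sizes, freqs))
print(discrete_mean_step(sizes, freqs, assumed_mean=14, h=2))
```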

Calculation of Mean in Case of Frequency Distribution

In this series, the method of calculating the mean is the same as in the case of discrete series. The only difference is that in a frequency distribution the mid-points of the various class intervals must first be obtained.

(i) Direct Method – Steps
(a) Obtain the mid-points (m) of the classes, i.e., (l1 + l2)/2.
(b) Multiply each frequency by its mid-point (fm).
(c) Get the sum of the products, ∑fm.
(d) Divide ∑fm by the total number of observations (N).

$\bar{X} = \frac{\sum fm}{N}$

(ii) Short-Cut Method – Steps
(a) Obtain the mid-points.
(b) Decide the assumed mean (A).
(c) Calculate the deviations (d = m – A) from the assumed mean.
(d) Multiply each deviation by its frequency and get ∑fd.

$\bar{X} = A + \frac{\sum fd}{N}$

(iii) Step Deviation Method – Formula

$\bar{X} = A + \frac{\sum fd'}{N} \times h$

Illustration

Calculate mean of the following distribution of daily wages of workers in a factory –

Daily wages    No. of Workers
100-120        10
120-140        20
140-160        30
160-180        15
180-200        5

Ans. Computation of Mean by different methods (A = 150, h = 20)

Wages      f     m      fm       d = m – A    fd      d’ = d/h    fd’
100-120    10    110    1100     -40          -400    -2          -20
120-140    20    130    2600     -20          -400    -1          -20
140-160    30    150    4500     0            0       0           0
160-180    15    170    2550     20           300     1           15
180-200    5     190    950      40           200     2           10
Total      80           11700                 -300                -15

Direct Method

$\bar{X} = \frac{\sum fm}{N} = \frac{11700}{80} = 146.25$

Short-Cut Method

$\bar{X} = A + \frac{\sum fd}{N} = 150 + \frac{-300}{80} = 150 - 3.75 = 146.25$

Step Deviation Method

$\bar{X} = A + \frac{\sum fd'}{N} \times h = 150 + \frac{-15}{80} \times 20 = 150 - 3.75 = 146.25$
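The direct method for a frequency distribution differs from the discrete case only in that mid-points stand in for the values. A minimal Python sketch follows, using the daily wage data of the illustration; the function name `grouped_mean` is a hypothetical helper.

```python
# Class limits (lower, upper) and number of workers from the illustration.
classes = [(100, 120), (120, 140), (140, 160), (160, 180), (180, 200)]
workers = [10, 20, 30, 15, 5]

def grouped_mean(classes, freqs):
    """Direct method: sum(f * m) / N, where m is the mid-point of each class."""
    mid_points = [(lower + upper) / 2 for lower, upper in classes]
    total_fm = sum(f * m for f, m in zip(freqs, mid_points))
    return total_fm / sum(freqs)

mean_wage = grouped_mean(classes, workers)
print(mean_wage)
```

Using mid-points assumes that the items in each class are spread evenly around the class centre, so the result is an approximation of the mean of the underlying raw data.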

Calculation of Corrected Arithmetic Mean

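$\text{Corrected } \bar{X} = \frac{\sum X(\text{wrong}) + \text{correct value} - \text{incorrect value}}{N}$

Illustration

The mean marks obtained by 50 students were estimated to be 40. Later it was found that one value was read as 63 instead of 36. Find the corrected mean.

Ans. ∑X (wrong) = N × mean = 50 × 40 = 2000

$\text{Corrected } \bar{X} = \frac{2000 + 36 - 63}{50} = \frac{1973}{50} = 39.46$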
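The correction can be expressed as a small function. This is a sketch only; the function and parameter names are hypothetical.

```python
def corrected_mean(wrong_mean, n, wrong_value, correct_value):
    """Recover the wrong total, swap the misread value, and divide by N again."""
    wrong_total = wrong_mean * n
    corrected_total = wrong_total - wrong_value + correct_value
    return corrected_total / n

# 50 students, reported mean 40, one mark read as 63 instead of 36.
corrected = corrected_mean(wrong_mean=40, n=50, wrong_value=63, correct_value=36)
print(round(corrected, 2))
```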

Weighted Arithmetic Mean

Weighted mean refers to the average when different items of a series are given different weights according to their relative importance.

$\bar{X}_w = \frac{\sum wx}{\sum w}$

Illustration

Calculate the weighted mean of the following data –

Items     10   15   20   25   30   35
Weight    6    9    4    10   5    2

Ans. Calculation of Weighted Mean

Items (X)    Weight (w)    wx
10           6             60
15           9             135
20           4             80
25           10            250
30           5             150
35           2             70
Total        ∑w = 36       ∑wx = 745

$\bar{X}_w = \frac{\sum wx}{\sum w} = \frac{745}{36} = 20.69$
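A minimal Python sketch of the weighted mean, using the items and weights of the illustration; `weighted_mean` is a hypothetical helper name.

```python
items = [10, 15, 20, 25, 30, 35]
weights = [6, 9, 4, 10, 5, 2]

def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

print(round(weighted_mean(items, weights), 2))
```

When all weights are equal, the weighted mean reduces to the simple arithmetic mean.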

Combined Mean

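$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$

where N1 and N2 are the numbers of items in the two series and $\bar{X}_1$ and $\bar{X}_2$ are their respective means.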

Merits of Arithmetic Mean

Arithmetic mean is the most popularly used because of the following merits-

i. It is simple to understand and easy to calculate.
ii. It is based on all the observations of the series. Therefore, it is the most representative measure.
iii. Its value is always definite. It is rigidly defined and not affected by personal bias.
iv. It does not require any specific arrangement of data.
v. It is capable of further algebraic treatment and can be used for further mathematical calculation in statistics.
vi. It is least affected by fluctuations of sampling and ensures stability in calculation.
vii. It is a good base for comparison.
viii. It is a calculated value and not a positional value like median and mode.

Demerits of Arithmetic Mean

i. It sometimes gives absurd results which cannot possibly exist, e.g., an average of 3.2 or 2.2 children in a family, although a child cannot be divided into fractions. The mean need not be an actual item in the series, and it is therefore called a fictitious average.

ii. It is affected by extreme items, e.g., a General Manager’s salary in a firm is ₹1,35,000 as compared to other employees, say a clerk at ₹10,000 and a peon at ₹5,000. The average salary of the firm is ₹50,000. This average is not a representative figure, because it is affected by the extreme value of ₹1,35,000 paid to the General Manager.

iii. It cannot be calculated in the absence of one of the items. In open end distribution arithmetic mean is based on assumptions of the class interval.

iv. It can be a value that does not exist in the series at all, e.g., the mean of 4, 8 and 9 is 7.
v. It gives more importance to the bigger items and less importance to the small items of the series.
vi. It cannot be determined just by observation. It needs mathematical calculation.

Measures of Central Tendency – Median, Mode

In a statistical series, there is sometimes a value which is centrally located or which occurs most frequently in the series; it is called the central value of the series.

Median

Median may be defined as the middle value in the data set when its elements are arranged in a sequential order, i.e., in either ascending or descending order of magnitude. Its value is so located in a distribution that it divides in half, with 50% items below it and 50% above it.

It concentrates on the middle or centre of a distribution. It is that positional value of the variable which divides the distribution into two equal parts.

Computation

Median can be calculated in the following types of distributions –

A. Individual Series – To calculate median in an individual series, the following steps are needed –
(i) Arrange the data in ascending or descending order.
(ii) Apply the formula –

$M = \text{size of } \left(\frac{N+1}{2}\right)\text{th item}$

Example – Find out median from the following data –

151, 140, 149, 142, 147, 144, 145

Ans. Arranged in ascending order – 140, 142, 144, 145, 147, 149, 151

$M = \text{size of } \left(\frac{N+1}{2}\right)\text{th item} = \text{size of } \left(\frac{7+1}{2}\right)\text{th item} = \text{size of 4th item}$

Hence, median is 145.
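The positional rule can be sketched in Python as follows. The function name is hypothetical. For an even number of items the (N+1)/2 position is fractional, and the sketch then takes the average of the two middle items, which is the usual convention.

```python
def median_individual(values):
    """Median as the size of the ((N + 1) / 2)th item of the ordered series."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2      # 1-based position of the median item
    lower = ordered[int(position) - 1]     # item at the whole-number part of the position
    if position.is_integer():
        return lower
    upper = ordered[int(position)]         # next item, used when the position ends in .5
    return (lower + upper) / 2

heights = [151, 140, 149, 142, 147, 144, 145]
print(median_individual(heights))
```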

B. Discrete Series – In a discrete series, the values of the variable are given along with their frequencies. The steps are –
(i) Arrange the data in ascending or descending order.
(ii) Denote the variable as X and the frequency as f.
(iii) Calculate the cumulative frequencies (cf).
(iv) Find the median item as:

$M = \text{size of } \left(\frac{N+1}{2}\right)\text{th item}$

Example – Calculate median from the following series –

Marks              10   20   30   40   50   60   70   80
No. of students    2    8    16   26   20   16   7    4

Ans.

Marks    No. of Students    cf
10       2                  2
20       8                  10
30       16                 26
40       26                 52
50       20                 72
60       16                 88
70       7                  95
80       4                  99
Total    99

$M = \text{size of } \left(\frac{N+1}{2}\right)\text{th item} = \text{size of } \left(\frac{99+1}{2}\right)\text{th item} = \text{size of 50th item}$

The 50th item is covered by the cumulative frequency 52, which corresponds to marks of 40.

Median = 40.
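Locating the median item through the cumulative frequency column can be sketched in Python. The function name is hypothetical, and the sketch assumes the values are already in ascending order. It returns the first value whose cumulative frequency reaches the (N+1)/2 position, which is exact when N is odd, as here.

```python
from itertools import accumulate

marks = [10, 20, 30, 40, 50, 60, 70, 80]
students = [2, 8, 16, 26, 20, 16, 7, 4]

# Cumulative frequency column of the table.
cf_column = list(accumulate(students))

def median_discrete(values, freqs):
    """Return the value whose cumulative frequency first reaches (N + 1) / 2."""
    cum_freqs = list(accumulate(freqs))
    position = (cum_freqs[-1] + 1) / 2
    for value, cf in zip(values, cum_freqs):
        if cf >= position:
            return value

print(cf_column)
print(median_discrete(marks, students))
```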

C. Frequency Distribution (Continuous Series) – In a frequency distribution, the median cannot be located directly. In this case, the median lies between the lower and upper limits of a class interval. Steps –
a. Arrange the data in ascending or descending order.
b. Calculate the cumulative frequencies.
c. Find the median item as M = size of (N/2)th item.
d. By inspecting the cumulative frequencies, find the cf which is either equal to or just greater than N/2.
e. The class corresponding to this cf is called the median class.

$M = l_1 + \frac{\frac{N}{2} - cf}{f} \times h$

where l1 is the lower limit of the median class, cf is the cumulative frequency of the class preceding the median class, f is the frequency of the median class and h is the class interval.

Illustration

From the following figures, find out median:

Marks    No. of Students
10-20    15
20-30    21
30-40    35
40-50    52
50-60    49
60-70    17
70-80    3
80-90    1

Ans. Computation of median

Marks    No. of Students    Cumulative Frequency
10-20    15                 15
20-30    21                 36
30-40    35                 71 (cf)
40-50    52                 123 (Median Class)
50-60    49                 172
60-70    17                 189
70-80    3                  192
80-90    1                  193
Total    N = ∑f = 193

M = size of (N/2)th item = 193/2 = 96.5th item

The 96.5th item lies in the group 40-50.

l1 = 40, cf = 71, f = 52, h = 10

By applying the formula

$M = l_1 + \frac{\frac{N}{2} - cf}{f} \times h = 40 + \frac{96.5 - 71}{52} \times 10 = 44.90$
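The interpolation inside the median class can be sketched in Python. The function name is hypothetical. The frequencies are those of the computation table, which total N = 193.

```python
from itertools import accumulate

classes = [(10, 20), (20, 30), (30, 40), (40, 50),
           (50, 60), (60, 70), (70, 80), (80, 90)]
students = [15, 21, 35, 52, 49, 17, 3, 1]

def grouped_median(classes, freqs):
    """M = l1 + ((N/2 - cf) / f) * h, applied to the median class."""
    cum_freqs = list(accumulate(freqs))
    target = cum_freqs[-1] / 2                      # the (N/2)th item
    for i, cf in enumerate(cum_freqs):
        if cf >= target:                            # first class whose cf reaches N/2
            lower, upper = classes[i]
            cf_before = cum_freqs[i - 1] if i > 0 else 0
            return lower + (target - cf_before) / freqs[i] * (upper - lower)

print(round(grouped_median(classes, students), 2))
```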

Merits of Median

(i) It is easy to calculate and understand.
(ii) It is well defined, as an ideal average should be, and it indicates the value of the middle item in the distribution.
(iii) It can be determined graphically; mean cannot be graphically determined.
(iv) It is a proper average for qualitative data where items are not converted or measured but are scored.
(v) It is not affected by extreme values.

Demerits of Median

(i) For median, data need to be arranged in ascending or descending order.
(ii) It is not based on all the observations of the series.
(iii) It cannot be given further algebraic treatment.
(iv) It is affected by fluctuations of sampling.
(v) It is not accurate when the data is not large.

Quartiles (Partition Values)

When we are required to divide a series into more than two parts, the dividing places are known as partition values. Suppose we have a piece of cloth 100 metres long and we have to cut it into 4 equal pieces; we will have to cut it at three places.

Quartiles are those values which divide the series into four equal parts.


Calculation of Quartiles

Individual Series – Steps
Arrange the data in ascending order.
Locate the items by finding out the ((N+1)/4)th and 3((N+1)/4)th items.

Discrete Series – Steps
Arrangement of data in ascending order is necessary.
Calculate less than cumulative frequencies.
Locate the ((N+1)/4)th and 3((N+1)/4)th items.

Frequency Distribution – Steps
Calculate less than cumulative frequencies.
Locate the first quartile and third quartile classes from the cumulative frequency column, using the (N/4)th and 3(N/4)th items respectively.

$Q_1 = l_1 + \frac{\frac{N}{4} - cf}{f} \times h$

$Q_3 = l_1 + \frac{\frac{3N}{4} - cf}{f} \times h$

Here l1, cf, f and h refer to the quartile class concerned, in the same way as for the median class.
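For a frequency distribution, the quartile formulas have the same form as the median formula, with N/4 or 3N/4 in place of N/2. The following Python sketch uses a hypothetical helper, `grouped_partition_value`, and reuses the marks distribution with N = 193 from the median illustration purely as sample data.

```python
from itertools import accumulate

classes = [(10, 20), (20, 30), (30, 40), (40, 50),
           (50, 60), (60, 70), (70, 80), (80, 90)]
students = [15, 21, 35, 52, 49, 17, 3, 1]

def grouped_partition_value(classes, freqs, fraction):
    """l1 + ((fraction * N - cf) / f) * h for the class containing the item."""
    cum_freqs = list(accumulate(freqs))
    target = fraction * cum_freqs[-1]
    for i, cf in enumerate(cum_freqs):
        if cf >= target:
            lower, upper = classes[i]
            cf_before = cum_freqs[i - 1] if i > 0 else 0
            return lower + (target - cf_before) / freqs[i] * (upper - lower)

q1 = grouped_partition_value(classes, students, 1 / 4)
q3 = grouped_partition_value(classes, students, 3 / 4)
print(round(q1, 2), round(q3, 2))
```

Calling the same function with a fraction of 1/2 gives the median, which shows that the median is simply the second quartile.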

Mode

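Mode is another important measure of central tendency, which is conceptually very useful. Mode is the value occurring most frequently in a set of observations and around which the other items of the set cluster most densely.

Empirical relationship –

$\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}$

Mode (Z) in a frequency distribution –

$Z = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$

where l1 is the lower limit of the modal class, f1 is the frequency of the modal class, f0 and f2 are the frequencies of the classes preceding and succeeding the modal class, and h is the class interval.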
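Both mode formulas can be sketched in Python. The function names are hypothetical. The marks distribution with N = 193 from the median illustration is reused only as sample data, and the sketch assumes a single modal class of equal width with its neighbours.

```python
classes = [(10, 20), (20, 30), (30, 40), (40, 50),
           (50, 60), (60, 70), (70, 80), (80, 90)]
students = [15, 21, 35, 52, 49, 17, 3, 1]

def grouped_mode(classes, freqs):
    """Z = l1 + ((f1 - f0) / (2*f1 - f0 - f2)) * h for the modal class."""
    i = freqs.index(max(freqs))                     # modal class has the highest frequency
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0               # preceding class (0 if none)
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0  # succeeding class (0 if none)
    lower, upper = classes[i]
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)

def empirical_mode(mean, median):
    """Empirical relationship: Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean

print(grouped_mode(classes, students))
```

The empirical relationship is an approximation for moderately asymmetrical distributions, so its result will generally differ somewhat from the interpolated value.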

Assignments for Measures of Central Tendency

1. Define median.
2. When is an average known as positional average?
3. Mention any two merits of median.
4. Which graph is used to locate median graphically?
5. Which average divides the series into two equal parts?
6. Define mode.
7. Give two merits of mode.
8. State one merit of mode.
9. Show the empirical relationship between mean, median and mode.
10. Discuss merits and demerits of median.
11. Discuss the steps involved for calculating mode by grouping method.

Measures of Dispersion

Averages like mean, median and mode condense a series into a single figure. These measures indicate the central tendency of a frequency distribution in the form of an average. They tell us something about the general level of magnitude of the distribution, but they fail to show anything further about the distribution. Measures of central tendency are sometimes not fully representative of the data.

Dispersion is the extent to which values in a distribution differ from the average of the distribution. It indicates lack of uniformity in the size of items.

According to Conor, “Dispersion is a measure of the extent to which the individual items vary”.

Objective of Measure of Dispersion

(i) To test the reliability of an average
(ii) To serve as a basis for control of variability
(iii) To make a comparative study of two or more series
(iv) To serve as a basis for further statistical analysis

Methods of Measure of Dispersion

A. Dispersion from Spread of Values –
(a) Range
(b) Interquartile Range and Quartile Deviation
B. Dispersion from Average –
(a) Mean Deviation or Median Deviation
(b) Standard Deviation
C. Graphic Method – Lorenz Curve

Range

Range is the simplest measure of dispersion. It is the difference between the largest and the smallest value in the distribution.

R = L – S

Relative Range

$\text{Coefficient of Range} = \frac{L - S}{L + S}$

Merits of Range

(i) It is a measure of dispersion that is simple to calculate and easy to understand.
(ii) It gives a broad picture of the data quickly.
(iii) It is rigidly defined.
(iv) It depends on the unit of measurement of the variable.

Demerits of Range

(i) It is not based on all the observations of the series.
(ii) It is very much affected by extreme items.
(iii) It is influenced very much by fluctuations of sampling.
(iv) It cannot be calculated in case of open end series.
(v) It does not tell anything about the distribution of items in the series relative to a measure of central tendency.

Interquartile Range and Quartile Deviation


Range is a crude measure because it takes into account only two extreme values i.e., the largest and the smallest. The effect of extreme values on range can be avoided if we use the measure of interquartile range. Interquartile range refers to the difference between the values of two quartiles.

Interquartile Range=Q 3−Q1

Quartile Deviation (Semi-Interquartile Deviation)

It is half of the difference between the upper quartile (Q3) and the lower quartile (Q1), i.e., half of the inter-quartile range.

$QD = \frac{Q_3 - Q_1}{2}$

Coefficient of Quartile Deviation (CQD)

Quartile deviation is an absolute measure of dispersion. For comparative studies of the variability of two distributions, we make use of a relative measure, known as CQD.

$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1}$

Merits of Quartile Deviation

(i) It is quite easy to understand and calculate.
(ii) It is the only measure of dispersion which can be used to deal with a distribution having open-end classes.
(iii) In comparison to range, it is less affected by extreme values.

Demerits of Quartile Deviation

(i) It is not based on all the observations as it ignores the first 25% and the last 25% of the items. Thus, it cannot be regarded as a reliable measure of variability.

(ii) It is not capable of further algebraic treatment. It is in a way a positional average and does not study variation of the values of a variable from any average.

(iii) It is considerably affected by fluctuations in the sample. A change in the value of a single item, in many cases, affects its value considerably.

Mean Deviation

Mean deviation of a series is the arithmetic average of the deviation of various items from a measure of central tendency (mean, median or mode). Mean deviation is also known as ‘first moment of dispersion’.

Mean deviation is based on all the items of the series.

Theoretically, mean deviation can be calculated by taking deviations from any of the three averages. But in actual practice, mean deviation is calculated either from mean or from median.

While calculating deviations from the selected average, the signs (+ or -) of the deviations are ignored and the deviations are taken as positive.

Coefficient of Mean Deviation (CMD)

Mean deviation is an absolute measure of dispersion. In order to transform it into a relative measure, it is divided by the average, from which it has been calculated. It is then known as the coefficient of Mean Deviation.

$\text{CMD from mean} = \frac{MD_{\bar{X}}}{\bar{X}}$

$\text{CMD from median} = \frac{MD_M}{M}$

Calculation of Mean Deviation and its Coefficient

Individual Series – Steps
a. Calculate the specific average (mean or median) from which mean deviation is to be calculated.
b. Obtain the absolute deviation |d| of each observation from the specific average.
c. Total the absolute deviations to find ∑|d|.
d. Apply the formula –

$\text{MD from mean} = \frac{\sum |d|}{N}$, where $|d| = |X - \bar{X}|$

or

$\text{MD from median} = \frac{\sum |d|}{N}$, where $|d| = |X - M|$

Discrete Series – Steps
a. Calculate the specific average from which mean deviation is to be found.
b. Obtain the absolute deviation |d| of each observation from the specific average.
c. Multiply each absolute deviation |d| by its frequency (f) and sum the products to get ∑f|d|.
d. Apply the formula –

$MD = \frac{\sum f|d|}{N}$

Continuous Series – Steps
a. Calculate mean by the assumed mean method.
b. Take deviations of the mid-points from the mean and denote them |d|.
c. Multiply these deviations by the respective frequencies and find ∑f|d|.
d. Apply the formula –

$MD = \frac{\sum f|d|}{N}$

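1. Calculate mean deviation from mean and from median for the following series –

X           |d| from mean    |d| from median
12          3.6              3.5
10          5.6              5.5
15          0.6              0.5
19          3.4              3.5
21          5.4              5.5
16          0.4              0.5
18          2.4              2.5
9           6.6              6.5
25          9.4              9.5
11          4.6              4.5
∑X = 156    ∑|d| = 42        ∑|d| = 42

Mean deviation from mean

$\bar{X} = \frac{\sum X}{N} = \frac{156}{10} = 15.6$

Applying the formula, we get

$MD = \frac{\sum |d|}{N} = \frac{42}{10} = 4.2$

$CMD = \frac{MD}{\text{Mean}} = \frac{4.2}{15.6} = 0.269$

Mean deviation from median

Arranged in ascending order – 9, 10, 11, 12, 15, 16, 18, 19, 21, 25

$M = \text{size of } \left(\frac{N+1}{2}\right)\text{th item} = \text{size of } \left(\frac{10+1}{2}\right)\text{th item} = \text{size of 5.5th item} = \frac{15 + 16}{2} = 15.5$

$MD = \frac{\sum |d|}{N} = \frac{42}{10} = 4.2$

$CMD = \frac{MD}{\text{Median}} = \frac{4.2}{15.5} = 0.271$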
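Mean deviation about either average can be checked with a short Python sketch. The helper name `mean_deviation` is hypothetical. `statistics.mean` and `statistics.median` are standard library functions, and the data are the ten values of the illustration.

```python
from statistics import mean, median

x = [12, 10, 15, 19, 21, 16, 18, 9, 25, 11]

def mean_deviation(values, centre):
    """Arithmetic average of the absolute deviations from the chosen centre."""
    return sum(abs(v - centre) for v in values) / len(values)

md_mean = mean_deviation(x, mean(x))
md_median = mean_deviation(x, median(x))

# Coefficients: divide each mean deviation by the average it was taken from.
cmd_mean = md_mean / mean(x)
cmd_median = md_median / median(x)

print(round(md_mean, 2), round(cmd_mean, 3))
print(round(md_median, 2), round(cmd_median, 3))
```

Each absolute deviation is recomputed from the raw value, so the totals do not depend on a hand-built deviation column.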

Standard Deviation

The concept of standard deviation was introduced by Karl Pearson in 1893. It is the most commonly used measure of dispersion. It satisfies most of the properties laid down for an ideal measure of dispersion.

Standard deviation is the square root of the arithmetic average of the squares of the deviations measured from mean.

Standard deviation is also known as root mean square deviation because it is the square root of the mean of the squared deviations from the arithmetic mean.

$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$

Where $x = X - \bar{X}$,

$\sigma = \sqrt{\frac{\sum x^2}{N}}$

Calculation of Standard Deviation

Individual Series – Actual Mean Method
Steps –
i. Calculate the actual mean of the observations.
ii. Obtain the deviations of the values from the mean, i.e., calculate $X - \bar{X}$. Denote these deviations by x.
iii. Square the deviations and obtain the total ∑x².
iv. Apply the formula –

$\sigma = \sqrt{\frac{\sum x^2}{N}}$

Discrete Series – Actual Mean Method
Steps –
i. Calculate the actual mean ($\bar{X}$) of the series as $\bar{X} = \frac{\sum fX}{N}$.
ii. Find the deviations of the items from the actual mean ($x = X - \bar{X}$).
iii. Square the deviations, multiply them by their respective frequencies (f) and obtain the total, i.e., ∑fx².
iv. Apply the formula –

$\sigma = \sqrt{\frac{\sum fx^2}{N}}$

Continuous Series – Step Deviation Method
Steps –
i. Take any mid-point (m) in the series as the assumed mean (A).
ii. Find the deviations (d) of the mid-points from the assumed mean.
iii. Divide these deviations by a common factor (h) to obtain step deviations (d’).
iv. Multiply the step deviations by the respective frequencies and obtain the total, i.e., ∑fd’.
v. Calculate the squares of the step deviations, i.e., d’².
vi. Multiply these squared step deviations by the respective frequencies and obtain the total to get ∑fd’².
vii. Apply the formula –

$\sigma = \sqrt{\frac{\sum fd'^2}{N} - \left(\frac{\sum fd'}{N}\right)^2} \times h$

Mean Deviation
Absolute measure
Individual observations: $MD = \frac{\sum |X - \bar{X}|}{N}$
Discrete and continuous series: $MD = \frac{\sum f|d|}{N}$

Standard Deviation
Absolute measure
Individual series (direct method): $\sigma = \sqrt{\frac{\sum x^2}{N}}$, where $x = X - \bar{X}$

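Calculate Standard Deviation of the following data

25, 50, 45, 30, 42, 36, 48, 34, 60, 70

X        x = X – mean    x²      d = X – A (A = 45)    d²
25       -19             361     -20                   400
50       6               36      5                     25
45       1               1       0                     0
30       -14             196     -15                   225
42       -2              4       -3                    9
36       -8              64      -9                    81
48       4               16      3                     9
34       -10             100     -11                   121
60       16              256     15                    225
70       26              676     25                    625
Total    440             1710    -10                   1720

Actual Mean Method

$\bar{X} = \frac{\sum X}{N} = \frac{440}{10} = 44$

$\sigma = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\frac{1710}{10}} = \sqrt{171} \approx 13.08$

Assumed Mean Method

$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2} = \sqrt{\frac{1720}{10} - \left(\frac{-10}{10}\right)^2} = \sqrt{171} \approx 13.08$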
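Both routes to the standard deviation can be verified with a short Python sketch. The function names are hypothetical. The data list assumes ten observations, with the tenth value, 70, inferred from the totals ∑X = 440 and N = 10 in the computation table. `statistics.pstdev` from the standard library, which divides by N, is used only as a cross-check.

```python
import math
from statistics import pstdev

# Assumption: the tenth value (70) is inferred from the table totals.
data = [25, 50, 45, 30, 42, 36, 48, 34, 60, 70]

def sd_actual_mean(values):
    """sigma = sqrt(sum(x^2) / N), where x = X - mean."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)

def sd_assumed_mean(values, assumed_mean):
    """sigma = sqrt(sum(d^2)/N - (sum(d)/N)^2), where d = X - A."""
    n = len(values)
    d = [v - assumed_mean for v in values]
    return math.sqrt(sum(dev * dev for dev in d) / n - (sum(d) / n) ** 2)

print(round(sd_actual_mean(data), 3))
print(round(sd_assumed_mean(data, assumed_mean=45), 3))
print(round(pstdev(data), 3))
```

The assumed mean method gives the same result for any choice of A, because the correction term (∑d/N)² removes the effect of measuring deviations from a value other than the mean.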

Other Measure from Standard Deviation

Various measures are calculated from standard deviation. Some of the important measures are as under

(a) Coefficient of Standard Deviation – A relative measure of standard deviation is calculated to compare the variability in two or more than two series which is called ‘Coefficient of standard Deviation’. This relative measurement is calculated by dividing standard deviation by arithmetic mean of the data.CSD = SD/

(b) Coefficient of Variation – This relative measurement is developed by Karl Pearson and is most popularly used to measure relative variation of two or more than two series. It shows the relationship between standard deviation and arithmetic mean expressed in terms of percentage. This measure is used to compare uniformity, consistency and variability in two different series.

C.V. = (σ / X̄) × 100

(c) Variance – Variance is the square of standard deviation. Standard deviation and variance are both measures of variability and are closely related. The only difference between the two is that variance is the average squared deviation from the mean, while standard deviation is the square root of variance.

Variance = σ²

Standard Deviation = √Variance
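The three measures can be illustrated with Python's standard `statistics` module. The data below are assumed to be the ten observations of the standard deviation example (with 70 inferred from its totals); `pstdev` and `pvariance` divide by N, which matches the formulas used in these notes.

```python
import statistics

# Assumed data: the ten observations of the standard deviation example.
data = [25, 50, 45, 30, 42, 36, 48, 34, 60, 70]

mean = statistics.mean(data)
sd = statistics.pstdev(data)           # population SD (divides by N)
variance = statistics.pvariance(data)  # square of the population SD

coefficient_of_sd = sd / mean          # CSD = sigma / mean
cv = coefficient_of_sd * 100           # C.V. expressed as a percentage

print(round(coefficient_of_sd, 3), round(cv, 2), variance)
```

A series with a lower C.V. is the more consistent (less variable) of two series being compared.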

Mathematical Properties of Standard Deviation

1. The Sum of the Square of the Deviations from Arithmetic Mean is the Least, i.e., less than the sum of the squares of the deviations of the observations taken from any other value.

∑(X – X̄)² < ∑(X – A)², where A is any value other than X̄

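By contrast, the sum of absolute deviations is least when the deviations are taken from the median (M): ∑|X – M| ≤ ∑|X – X̄|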


2. Standard Deviation and Normal Curve – In a normal or symmetrical distribution, the mean, median and mode are identical and a large proportion of the items is concentrated around the mean. The relationship is as follows –
Mean ± 1σ covers 68.27% of the total items.
Mean ± 2σ covers 95.45% of the total items.
Mean ± 3σ covers 99.73% of the total items.

Absolute Measure

An absolute measure is expressed in the same units as the data. For instance, if the original data are in rupees, the absolute measure will also be in rupees; if the data are in kg, the measure will be in kg, and so on. For this reason absolute dispersion cannot be used to compare the scatter or variability of series whose units of measure are different or whose averages differ in size.

Relative Measure

For comparing two or more series where units of measure are different relative measures are used because they are calculated as the percentage or the coefficient of the absolute measure of dispersion.

Graphic Method (Lorenz Curve)

The graphic method of studying dispersion is known as the Lorenz Curve Method. It is named after Dr. Max O. Lorenz, who used it for the first time to measure the distribution of wealth and income. It is now also used to study the distribution of profits, wages, turnover etc. In this method the values and the frequencies are cumulated and their percentages are calculated. These percentages are plotted on a graph and the curve thus obtained is called the Lorenz Curve.

Steps –

(i) The sizes of the items are made cumulative. Taking the last cumulative total as equal to 100, the different cumulative totals are converted into percentages.

(ii) In the same way frequencies are made cumulative. Considering the last cumulative frequency item as equal to 100, all the different cumulative frequencies are converted into percentages.

(iii) Cumulative percentages of these two variables should be plotted on X – axis and Y – axis.

Profit   Cumulative profit   Cumulative profit %   No. of companies   Cumulative number   Cumulative number %
6        6                   0.6                   6                  6                   6
25       31                  3.1                   11                 17                  17
60       91                  9.1                   13                 30                  30
84       175                 17.5                  14                 44                  44
105      280                 28                    15                 59                  59
150      430                 43                    17                 76                  76
170      600                 60                    10                 86                  86
400      1000                100                   14                 100                 100

[Figure: Lorenz curve. X-axis: % of number of companies; Y-axis: % of profits]
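The cumulative percentages needed for a Lorenz curve can be computed as in the following sketch. The profit and company figures are those of the table above; the helper name `cumulative_percentages` is only illustrative.

```python
from itertools import accumulate

def cumulative_percentages(values):
    """Cumulate the values and express each running total as a % of the grand total."""
    cumulative = list(accumulate(values))
    total = cumulative[-1]
    return [round(100 * c / total, 1) for c in cumulative]

profits = [6, 25, 60, 84, 105, 150, 170, 400]
companies = [6, 11, 13, 14, 15, 17, 10, 14]

profit_pct = cumulative_percentages(profits)
company_pct = cumulative_percentages(companies)

# Points of the Lorenz curve: (% of companies, % of profits).
lorenz_points = list(zip(company_pct, profit_pct))
print(lorenz_points)
```

The further these points lie below the diagonal line of equal distribution (where both percentages are equal), the greater the dispersion.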

Assignment

Long Answer Questions

1. What is the meaning of dispersion and what are its objectives? Mention characteristics of a good measure of dispersion.

2. A measure of dispersion is a good supplement to the central value in understanding a frequency distribution. Comment.

Ans. A central value summarizes the frequency distribution into a single figure, which can be regarded as its representative. However, averages alone are not sufficient to describe the characteristics of statistical data. In order to understand the frequency distribution fully, it is essential to study the variability of the observations. Measures of dispersion improve the understanding of a distribution. For example, per capita income gives only the average income. A measure of dispersion can tell us about income inequalities, thereby improving the understanding of the relative standards of living enjoyed by different strata of society.

3. Explain merits and demerits of quartile deviation.

Measures of Correlation

In the previous chapter, we studied statistical problems and distributions relating to one variable. We discussed various measures of central tendency and dispersion, which are confined to a single variable. A distribution of this kind, involving one variable, is known as a univariate distribution.

But we may come across a number of situations with distributions having two variables. For example, we may have data relating to income and expenditure, price and demand, height and weight etc. The distribution involving two variables is called bivariate distribution.

In a bivariate distribution, we may be interested to find if there is any relationship between the two variables under study. In day-to-day life, we observe that there exists certain relationship between two variables like between income and expenditure, price and demand and so on. Correlation is a statistical tool which studies the relationship between two variables.

Meaning of Correlation


Correlation indicates the relationship between two variables of a series such that changes in the values of one variable are associated with changes in the values of the other variable.

Significance of correlation:

Correlation has immense utility in statistics.

i. It helps in determining the degree of relationship between variables.

ii. We can estimate the value of one variable on the basis of the value of another variable; correlation thus serves as the basis of regression.

iii. Correlation is useful for economists. An economist specifies the relationship between different variables, such as demand and supply or money supply and price level, by way of correlation.

Correlation and causation: Correlation measures co-variation, not causation. It should never be interpreted as implying a cause-and-effect relationship between two variables. The presence of correlation between two variables X and Y simply means that when the value of one variable changes in one direction, the value of the other variable changes either in the same direction or in the opposite direction.

Positive and Negative Correlation: - Correlation is classified into positive and negative correlation. When two variables move in the same direction, i.e. if the value of Y increases (or decreases) with an increase (or decrease) in the value of X, they are said to be positively correlated. On the other hand, when two variables move in opposite directions, i.e. if the value of X increases (or decreases) with a decrease (or increase) in the value of Y, they are said to be negatively correlated.

Linear and Non-linear correlation:- Correlation may be linear or non-linear. If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, the correlation is said to be linear. It is represented by a straight line. On the other hand, if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable, the correlation is said to be non-linear or curvilinear.

Simple, multiple and partial correlation :- Correlation may also be simple, multiple or partial. When two variables are studied to determine correlation, it is called simple correlation. When more than two variables are studied to determine correlation, it is called multiple correlation. When the correlation of only two variables is studied while keeping the other variables constant, it is called partial correlation.

Methods of studying correlation :- The correlation between the two variables can be determined by the following three methods:-

(a) Scatter diagram

(b) Karl Pearson’s method of correlation coefficient

(c) Spearman’s method of Rank correlation.

Scatter Diagram: It is a graphic (or visual) method of studying correlation. To construct a scatter diagram, the X variable is taken on the X-axis and the Y variable is taken on the Y-axis. The cluster of points so plotted is referred to as a scatter diagram. In a scatter diagram, the degree of closeness of the scatter points and their overall direction give us an idea of the nature of the relationship:-


(i) If the dots move upwards from left to right, the correlation is said to be positive, whereas a downward movement of the dots from left to right indicates negative correlation.

(ii) Dots lying on a straight line indicate perfect correlation.

(iii) Widely scattered dots indicate no correlation.

[Figure: scatter diagrams showing perfect positive correlation, perfect negative correlation and no correlation]

Karl Pearson’s coefficient of correlation:-

Karl Pearson’s coefficient of correlation is an important and widely used method of studying correlation. It measures the degree of relationship between two variables by means of a single number, the correlation coefficient.

Computation of Karl Pearson’s coefficient of correlation :- One of the formulae used to calculate the coefficient of correlation (r) is: -

r = ∑xy / √(∑x² × ∑y²), where x = X – X̄ and y = Y – Ȳ

Some of the important properties of Karl Pearson’s coefficient of correlation are : -

(i) The correlation coefficient is independent of the units of measurement of the variables.

(ii) The value of the correlation coefficient (r) lies between +1 and -1.

(iii) The correlation coefficient is independent of the choice of both origin and scale of observations.

(iv) The correlation coefficient of the variables x and y is symmetric, i.e., rxy = ryx.

Illustration 1. Calculate the coefficient of correlation, given the following data –

Age of Husband (X)   23  27  28  29  30  31  33  35  36
Age of Wife (Y)      18  20  22  27  29  27  29  28  29

Solution – Taking assumed means A = 30 for X and A = 27 for Y:

X     dx = X – A   dx²    Y     dy = Y – A   dy²    dxdy
23    -7           49     18    -9           81     63
27    -3            9     20    -7           49     21
28    -2            4     22    -5           25     10
29    -1            1     27     0            0      0
30     0            0     29     2            4      0
31     1            1     27     0            0      0
33     3            9     29     2            4      6
35     5           25     28     1            1      5
36     6           36     29     2            4     12

∑dx = 2, ∑dx² = 134, ∑dy = –14, ∑dy² = 168, ∑dxdy = 117, N = 9

r = [∑dxdy – (∑dx × ∑dy)/N] / [√(∑dx² – (∑dx)²/N) × √(∑dy² – (∑dy)²/N)]

= [117 – 2 × (–14)/9] / [√(134 – 4/9) × √(168 – 196/9)]

= (117 + 3.11) / √(133.56 × 146.22)

= 120.11/139.75

r = 0.86
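The assumed mean calculation can be checked with a short Python sketch. It assumes the Y values used in the solution table (18, 20, 22, 27, 29, 27, 29, 28, 29) and the assumed means 30 and 27; the function name is illustrative only.

```python
import math

def pearson_assumed_mean(xs, ys, ax, ay):
    """Karl Pearson's r computed from deviations taken from assumed means."""
    n = len(xs)
    dx = [x - ax for x in xs]
    dy = [y - ay for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dx2 = sum(d * d for d in dx)
    sum_dy2 = sum(d * d for d in dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    numerator = sum_dxdy - sum_dx * sum_dy / n
    denominator = math.sqrt(sum_dx2 - sum_dx ** 2 / n) * math.sqrt(sum_dy2 - sum_dy ** 2 / n)
    return numerator / denominator

# Y values as used in the solution table (an assumption about the intended data).
husband = [23, 27, 28, 29, 30, 31, 33, 35, 36]
wife = [18, 20, 22, 27, 29, 27, 29, 28, 29]

r = pearson_assumed_mean(husband, wife, ax=30, ay=27)
print(round(r, 2))  # 0.86
```

Because r is independent of the choice of origin, any other pair of assumed means gives the same value.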

Advantages of Karl Pearson’s method:-

Karl Pearson’s method assumes a linear relationship between the two variables x and y. If r = 0, it simply means that there is no linear correlation between x and y; a quadratic or cubic relationship between them may still exist. The most important advantage of this method is that it gives an idea of the co-variation of the values of the two variables and also indicates the direction of the relationship.

Rank Correlation :- Charles Edward Spearman evolved another method of finding the correlation between qualitative attributes. This is known as the rank correlation coefficient. When a group of individuals is arranged according to the degree to which they possess a characteristic (say, beauty, intelligence etc.), they are said to be ranked.

Spearman’s formula for the rank correlation coefficient is as follows:-

rk = 1 – 6∑D² / (N³ – N), where D is the difference between the two ranks of an item and N is the number of items

Illustration 2. Calculate the coefficient of correlation (Spearman’s rank) from the following data –

Economics Marks (X)   77  54  27  52  14  35  90  25  56  60
English Marks (Y)     36  58  60  46  50  40  35  56  44  42

Solution –

X     R1    Y     R2    D = R1 – R2    D²
77     2    36     9    -7             49
54     5    58     2     3              9
27     8    60     1     7             49
52     6    46     5     1              1
14    10    50     4     6             36
35     7    40     8    -1              1
90     1    35    10    -9             81
25     9    56     3     6             36
56     4    44     6    -2              4
60     3    42     7    -4             16

∑D² = 282

rk = 1 – 6∑D² / (N³ – N)

= 1 – (6 × 282) / (10³ – 10)

= 1 – 1692/990 = 1 – 1.71 = –0.71

rk = –0.71
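The ranking and the formula can be verified with the following Python sketch. It assumes that rank 1 goes to the highest mark and that there are no tied marks, as in this illustration; the function names are illustrative only.

```python
def ranks_descending(values):
    """Rank 1 goes to the largest value; assumes no tied values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_rank_correlation(xs, ys):
    """Return (rk, sum of D^2) using rk = 1 - 6*sum(D^2) / (N^3 - N)."""
    n = len(xs)
    r1 = ranks_descending(xs)
    r2 = ranks_descending(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n ** 3 - n), sum_d2

economics = [77, 54, 27, 52, 14, 35, 90, 25, 56, 60]
english = [36, 58, 60, 46, 50, 40, 35, 56, 44, 42]

rk, sum_d2 = spearman_rank_correlation(economics, english)
print(sum_d2, round(rk, 2))  # 282 -0.71
```

The negative sign indicates that students ranked high in Economics tend to be ranked low in English in this data set.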

Questions :-

(1) What is correlation?

(2) When are the two variables said to be in perfect correlation?

(3) Define Karl Pearson’s coefficient of correlation.

(4) Mention any two properties of Karl Pearson’s coefficient of correlation.

(5) Define covariance.

(6) Can the simple correlation coefficient measure any type of relationship?

(7) What is the difference between linear and non-linear correlation?

(8) What is the scatter diagram method and how is it useful in the study of correlation?

(9) State the merits of Spearman’s rank correlation.

(10) Name various methods of studying correlation. Describe any one.

INDEX NUMBERS

Index numbers are devices which measure the change in the level of a phenomenon with respect to time, geographical location or some other characteristic. An index number is a statistical device for measuring changes in the magnitude of a group of related variables. It is a measure of the average change in a group of related variables over two different situations.


Meaning: An index number is a statistical tool for measuring the relative change in a group of related variables over two or more different times.

“Index numbers are devices for measuring differences in the magnitude of a group of related variables”. – Croxton and Cowden

Features of an Index Number

a. They are expressed in percentages.

b. They are special types of averages.

c. They measure the effect of change over a period of time.

Problems in construction of Index Numbers

a. Defining the purpose of index numbers

b. Selection of items

c. Selection of base period

d. Selection of prices

e. Selection of weights

f. Choice of an average

g. Choice of the formulae

Price index numbers are of two types

a. Simple Index Number

b. Weighted price Index numbers

Construction of simple Index Numbers:- There are two methods

a. Simple Aggregate Method

P01 = (∑P1 / ∑P0) × 100

b. Simple Average of Price Relatives Method

P01 = ∑[(P1/P0) × 100] / N
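The two simple methods can be compared with a short Python sketch. For illustration only, it borrows the base year and current year prices of commodities A to D from Question 1 in the weighted index section of these notes; the notes themselves do not work this example.

```python
# Prices borrowed from Question 1 of these notes (illustrative use only).
p0 = [10, 8, 6, 4]    # base year prices
p1 = [12, 10, 6, 6]   # current year prices

# a. Simple aggregate method: ratio of total current prices to total base prices.
simple_aggregate = sum(p1) / sum(p0) * 100

# b. Simple average of price relatives: mean of (P1/P0) x 100 over the N items.
price_relatives = [100 * c / b for c, b in zip(p1, p0)]
simple_average = sum(price_relatives) / len(price_relatives)

print(round(simple_aggregate, 2), round(simple_average, 2))  # 121.43 123.75
```

The two results differ because the aggregate method is dominated by high-priced items, while the price relative method gives every item equal importance.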

6. Weighted Index Numbers

There are two methods:-

a. Weighted Aggregate Method:- In this method commodities are assigned weights on the basis of quantities purchased.

Laspeyre’s Method

Laspeyres in 1871 gave a weighted aggregate index in which the weights are the quantities of the commodities in the base year.

P01 = (∑p1q0 / ∑p0q0) × 100

Steps –

(i) Multiply the current year prices (P1) by base year quantity (q0) and total all such products to get ∑P1q0.

(ii) Similarly, multiply the base year prices (P0) by base year quantity (q0) and obtain the total to get ∑P0q0.

(iii) Divide ∑P1q0 by ∑p0q0 and multiply the quotient by 100. This will be the index number of the current year.

Paasche’s Method

The German statistician Paasche in 1874 constructed an index number in which weights are determined by quantities in the given year.

P01 = (∑p1q1 / ∑p0q1) × 100

Fisher’s Method

P01 = √[(∑p1q0 / ∑p0q0) × (∑p1q1 / ∑p0q1)] × 100

Why is Fisher’s method an ideal method?

1. The formula is based on the geometric mean, which is considered to be the best average for constructing index numbers.

2. It considers both base year and current year quantities as weights. So, it avoids the bias associated with Laspeyre’s and Paasche’s indexes.

3. It satisfies the time reversal test and the factor reversal test.

Question 1. Calculate Laspeyre’s, Paasche’s and Fisher’s Index numbers from the following data:

Commodity   Base Year Price in ₹ (p0)   Base Year Quantity (q0)   Current Year Price in ₹ (p1)   Current Year Quantity (q1)
A           10                          30                        12                             50
B           8                           15                        10                             25
C           6                           20                        6                              30
D           4                           10                        6                              20

Solution -

Commodity   P0    Q0    P1    Q1    P0Q0   P0Q1   P1Q0   P1Q1
A           10    30    12    50    300    500    360    600
B           8     15    10    25    120    200    150    250
C           6     20    6     30    120    180    120    180
D           4     10    6     20    40     80     60     120
Total                               580    960    690    1150

Laspeyre’s Index Number (P01) = (∑p1q0 / ∑p0q0) × 100

= (690/580) × 100 = 118.97

Paasche’s Index Number (P01) = (∑p1q1 / ∑p0q1) × 100

= (1150/960) × 100 = 119.79

Fisher’s Ideal Index Number (P01) = √[(∑p1q0 / ∑p0q0) × (∑p1q1 / ∑p0q1)] × 100

= √[(690/580) × (1150/960)] × 100

= 119.38
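The three weighted aggregate indexes of Question 1 can be recomputed with the following Python sketch. The data are those of the question; the helper name `aggregate` is illustrative only.

```python
import math

# Data of Question 1: commodities A, B, C, D.
p0 = [10, 8, 6, 4]     # base year prices
q0 = [30, 15, 20, 10]  # base year quantities
p1 = [12, 10, 6, 6]    # current year prices
q1 = [50, 25, 30, 20]  # current year quantities

def aggregate(prices, quantities):
    """Sum of price x quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))

laspeyres = aggregate(p1, q0) / aggregate(p0, q0) * 100  # base year weights
paasche = aggregate(p1, q1) / aggregate(p0, q1) * 100    # current year weights
fisher = math.sqrt(laspeyres * paasche)                  # geometric mean of the two

print(round(laspeyres, 2), round(paasche, 2), round(fisher, 2))  # 118.97 119.79 119.38
```

Fisher's index always lies between Laspeyre's and Paasche's values, since it is their geometric mean.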

b. Weighted Average of Price Relatives Method:- Under this method commodities are assigned weights on the basis of their base year value (W = P0Q0), or fixed weights (W) are used.

P01 = ∑RW / ∑W

Where R = (P1/P0) × 100
W = value in the base year (P0Q0) or fixed weights

Types of Index Numbers

(i) Consumer Price Index (CPI) – It reflects the average increase in the cost of the commodities consumed by a class of people so that they can maintain the same standard of living in the current year as in the base year.

They are designed to measure effects of change in prices of a basket of goods and services on purchasing power of a particular section of the society during any given (current) period with respect to some fixed (base) period.

CPI is also known as –
(a) Cost of living index numbers
(b) Retail price index numbers
(c) Price of living index numbers

Methods of Constructing CPI

(a) Aggregate Expenditure Method – This method is similar to the Laspeyre’s method of constructing weighted index.

CPI = (∑p1q0 / ∑p0q0) × 100


(b) Family Budget Method – In this method, the family budgets of a large number of people, for whom the index is meant, are carefully studied. Then, the aggregate expenditure of an average family on various commodities is estimated. These values constitute the weights.

CPI = ∑RW / ∑W

Question 2. An enquiry into the budgets of the middle class families in a certain city gave the following information. What is the cost of living index of 2015 as compared with 2010? Calculate it by – (i) Family Budget Method and (ii) Aggregate Expenditure Method

Expenses on items   Food (35%)   Fuel (10%)   Clothing (20%)   Rent (15%)   Misc. (20%)
Price (₹) in 2015   1500         250          750              300          400
Price (₹) in 2010   1400         200          500              200          250

Solution – CPI (Family Budget Method)

Items      Weights % (W)   Price in 2010 (P0)   Price in 2015 (P1)   Relative Price R = (p1/p0) × 100   Weighted Relative (RW)
Food       35              1400                 1500                 107.14                             3749.9
Fuel       10              200                  250                  125                                1250
Clothing   20              500                  750                  150                                3000
Rent       15              200                  300                  150                                2250
Misc.      20              250                  400                  160                                3200
Total      100                                                                                          13449.9

Cost of living index for 2015

CPI = ∑RW / ∑W

= 13449.9/100 = 134.499

Items      Weights (q0)   P0     P1     P0q0    P1q0
Food       35             1400   1500   49000   52500
Fuel       10             200    250    2000    2500
Clothing   20             500    750    10000   15000
Rent       15             200    300    3000    4500
Misc.      20             250    400    5000    8000
Total                                   69000   82500

Cost of living index by Aggregate Expenditure Method

CPI = (∑p1q0 / ∑p0q0) × 100

= (82500/69000) × 100

= 119.565
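Both methods of Question 2 can be reproduced with a short Python sketch. It uses the weights and prices of the question; note that it keeps the price relatives unrounded, so the Family Budget result comes out as 134.5 rather than the 134.499 obtained when the relative for Food is rounded to 107.14.

```python
# Data of Question 2: Food, Fuel, Clothing, Rent, Misc.
weights = [35, 10, 20, 15, 20]        # W (also used as q0 in the aggregate method)
p0 = [1400, 200, 500, 200, 250]       # prices in 2010
p1 = [1500, 250, 750, 300, 400]       # prices in 2015

# (i) Family Budget Method: weighted average of price relatives.
relatives = [100 * c / b for c, b in zip(p1, p0)]
cpi_family_budget = sum(r * w for r, w in zip(relatives, weights)) / sum(weights)

# (ii) Aggregate Expenditure Method: Laspeyre-type ratio with the weights as q0.
cpi_aggregate = (
    sum(c * w for c, w in zip(p1, weights))
    / sum(b * w for b, w in zip(p0, weights))
    * 100
)

print(round(cpi_family_budget, 1), round(cpi_aggregate, 3))  # 134.5 119.565
```

The two results differ here because the percentage weights are treated as value weights in the first method and as quantities in the second.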

Uses of Consumer Price Index:- (CPI)

a. It is used in calculating purchasing power of money

b. It is used for grant of Dearness Allowance.

c. It is used by government for framing wage policy, price policy etc.

d. CPI is used as price deflator of income

e. CPI is used as indicator of price movements in retail market.

(ii) Wholesale Price Index (WPI) –

a. It measures the relative change in the price of commodities traded in wholesale market.

b. It indicates the change in the general price level.

c. It does not include services

Uses of WPI

a. Basis of Dearness Allowance

b. Indicator of changes in economy

c. Measures the rate of inflation

(iii) Index Number of Industrial Production (IIP) –

It indicates the changes in level of Industrial production or a percentage change in physical volume of output of commodities in following industries

a. Mining

b. Quarrying

c. Manufacturing

d. Electricity etc.,

IIP = [∑(q1/q0) × W / ∑W] × 100

Where W = relative importance (weight) of the different outputs.

q0 = Base year quantity.

q1 = Current year quantity.

Uses of Index Numbers.


a. Help us to measure changes in price level

b. Help us to know changes in cost of living

c. Help government in adjustment of salaries and allowances

d. Useful to Business Community

e. Information to Politicians

f. Information regarding foreign trade

(iv) SENSEX

SENSEX is the short form of Bombay Stock Exchange Sensitive Index, with 1978-79 as the base. It is a useful guide for investors in the stock market. It covers 30 stocks representing 13 sectors of the economy.

Inflation and Index Numbers

Inflation refers to rise in the general price level in a country over a fairly long period of time. Often, inflation is measured in terms of WPI. A consistent rise in the wholesale price index over time implies a situation of inflation.

Rate of Inflation = [(A2 – A1) / A1] × 100

Where A1 = wholesale price index for week 1

A2 = wholesale price index for week 2
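The formula can be applied as in the following sketch. The two WPI values are hypothetical and are chosen only to show the arithmetic.

```python
def inflation_rate(a1, a2):
    """Percentage change in the wholesale price index from week 1 (a1) to week 2 (a2)."""
    return (a2 - a1) * 100 / a1

# Hypothetical WPI values (not from the text).
wpi_week1 = 120
wpi_week2 = 123

print(inflation_rate(wpi_week1, wpi_week2))  # 2.5
```

A negative result would indicate a fall in the wholesale price index over the period.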

Questions:-

1. What is an Index Number?

2. What is a Base Year?

3. What is SENSEX?

4. Mention any three problems in the construction of Index Numbers

5. Construct Cost of Living Index Number from the following data

Commodities   Price in 2010   Quantity in 2010   Price in 2015
A             25              16                 35
B             36              7                  48
C             12              3.5                16
D             6               2.5                10
E             28              4                  28


Revision Questions

Multiple Choice Questions (MCQs)

1. The Paasche index number is based on –
(a) Base year quantities
(b) Current year quantities
(c) Average of current and base year
(d) None of these

2. Index number for the base period is always taken as –
(a) 100
(b) 1
(c) 50
(d) 200

3. Fisher’s Ideal Index is the –
(a) Mean of Laspeyre’s and Paasche’s indices
(b) Median of Laspeyre’s and Paasche’s indices
(c) Geometric mean of Laspeyre’s and Paasche’s indices
(d) None of these

4. We use price index numbers –
(a) To measure and compare
(b) To compare prices
(c) To measure prices
(d) None of these

Very short answer type questions

1. Define index number.
2. State any one feature of index number.
3. Define base year.
4. What is meant by relative price?
5. State any one use of index number.

Short Answer Type Questions
