3. MB0040 Mba1 Stats

  Fall 2010 

Submitted By: Satish Patil Roll Number: 521053391 Learning Center: 01736

CANDIDATE NAME: Satish Patil 

ROLL NUMBER: 521053391 

LEARNING CENTER: 01736

COURSE: Master of Business Administration

SEMESTER: I 

SUBJECT NAME: MB0040 – STATISTICS FOR MANAGEMENT

 ASSIGNMENT NO: Set-1 

DATE OF SUBMISSION AT THE LEARNING CENTRE: 10 Dec 2010

FACULTY SIGNATURE:


MBA SEMESTER 1
MB0040 – STATISTICS FOR MANAGEMENT - 4 Credits

(Book ID: B1129) 

Assignment Set- 1 (60 Marks) 

Note: Each question carries 10 marks. Answer all the questions.

1. Why is it necessary to summarise data? Explain the approaches available to summarise data distributions.

 Answer:

Graphical representation is a good way to present summarised data. However, graphs provide only an overview and so cannot be used for further analysis. Hence, we use summary statistics, such as averages, to analyse the data. Mass data, which is collected, classified, tabulated and presented systematically, is analysed further to reduce it to a single representative figure. This single figure is a measure found at the central part of the range of all values, and it represents the entire data set. Hence, it is called a measure of central tendency.

In other words, the tendency of data to cluster around a figure in a central location is known as central tendency. A measure of central tendency, or average of the first order, describes the concentration of a large number of values around a particular value. It is a single value which represents all units.

Statistical Averages: The commonly used statistical averages are the arithmetic mean, the geometric mean and the harmonic mean.

Arithmetic mean: The arithmetic mean is defined as the sum of all values divided by the number of values, and is denoted by X̄.

Before we study how to compute the arithmetic mean, we have to be familiar with terms such as discrete data, frequency and frequency distribution, which are used in this unit.

If the number of values is finite, then the data is said to be discrete data. The number of occurrences of each value of the data set is called the frequency of that value. A systematic presentation of the values taken by a variable, together with the corresponding frequencies, is called a frequency distribution of the variable.

Median: Median of a set of values is the value which is the middle most value when they are

arranged in the ascending order of magnitude. Median is denoted by M.
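For ungrouped data, these three ideas can be computed directly with Python's standard library; a minimal sketch using a hypothetical data set:

```python
from collections import Counter
from statistics import mean, median, mode

# Hypothetical shoe sizes recorded by a retailer.
data = [7, 8, 8, 9, 8, 7, 10, 9, 8, 9]

# Frequency distribution: each distinct value with its number of occurrences.
freq = Counter(data)

print(sorted(freq.items()))  # [(7, 2), (8, 4), (9, 3), (10, 1)]
print(mean(data), median(data), mode(data))  # 8.3 8.0 8
```

Here the mode is 8 because it occurs most often (four times), and the median is the average of the two middle values of the sorted array.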


Mode: Mode is the value which has the highest frequency and is denoted by Z.

The modal value is most useful for business people. For example, shoe and ready-made garment manufacturers would like to know the modal size of their customers to plan their operations. For discrete data, with or without frequency, it is the value corresponding to the highest frequency.

 Appropriate Situations for the use of Various Averages

1. Arithmetic mean is used when:

a. In depth study of the variable is needed

b. The variable is continuous and additive in nature

c. The data are in the interval or ratio scale

d. The distribution is symmetrical

2. Median is used when:

a. The variable is discrete
b. There exist abnormal values
c. The distribution is skewed
d. The extreme values are missing
e. The characteristics studied are qualitative
f. The data are on the ordinal scale

3. Mode is used when:

a. The variable is discrete
b. There exist abnormal values
c. The distribution is skewed
d. The extreme values are missing
e. The characteristics studied are qualitative

4. Geometric mean is used when:

a. The rate of growth, ratios and percentages are to be studied
b. The variable is of a multiplicative nature

5. Harmonic mean is used when:

a. The study is related to speed or time
b. An average of rates which produce equal effects has to be found

Positional Averages

Median is the mid-value of a series of data. It divides the distribution into two equal portions. Similarly, we can divide a given distribution into four, ten, hundred or any other number of equal portions.


2. Explain the purpose of tabular presentation of statistical data. Draft a form of tabulation to show the distribution of population according to i) community by age, ii) literacy, iii) sex, and iv) marital status.

 Answer:

 Tabulation is an orderly arrangement of data in columns and rows systematically in a

tabular form. It is the logical listing of related quantitative data in vertical columns and

horizontal rows. The presentation of data in tables should be simple, systematic and

unambiguous.

The purpose of tabular presentation of statistical data is to:

• Simplify complex data: Tabulation simplifies complex data by presenting it systematically in columns and rows in a condensed form. It avoids all the unnecessary detail found in a narrative form.

• Highlight important characteristics: It also helps to highlight the important characteristics of the data.

• Present data in minimum space: Tabulation achieves economy of space in presenting the data. The textual matter is presented neatly in a short form without sacrificing the utility of the data.

• Facilitate comparison: Data presented in a tabular form is helpful for a comparative study. The relationship among the various items can be easily understood.

• Bring out trends and tendencies: Tabulation depicts the data and their significance at a glance in the form of figures, which cannot be grasped when the same data are in a narrative form.

• Facilitate further analysis: Tabulation is analytical in nature and hence helps in further analysis.

Marital Status   Sex      Educated                        Non-Educated
                          Below 20  20-40   Above 40      Below 20  20-40   Above 40
                          Yrs       Yrs     Yrs           Yrs       Yrs     Yrs
Married          Male
                 Female
Unmarried        Male
                 Female

3. Give a brief note on the measures of central tendency together with their merits and demerits. Which is the best measure of central tendency and why?

 Answer:

Condensation of data is necessary for a proper statistical analysis. A large mass of figures is not only confusing to the mind but also difficult to analyse. After a thorough scrutiny of the collected data, classification, which is the process of arranging data into different homogeneous classes according to resemblances and similarities, is carried out first. Then tabulation of the data is resorted to. The classification and tabulation of the collected data, besides removing complexity, render condensation and comparison. An average is defined as a value which represents the whole mass of data. It is a typical or central value summarising the whole data. It is also called a measure of central tendency because the individual values in the data show some tendency to centre about this average. It will be located between the minimum and the maximum of the values in the data.

There are five types of averages:

1. Arithmetic Mean

2. Median

3. Mode

4. Geometric Mean

5. Harmonic Mean

The arithmetic mean, or simply the mean, is the best known, most easily understood and most frequently used average in statistical analysis. It is defined as the sum of all the values in the data divided by the number of values.

Median: The median is another widely known and frequently used average. It is defined as the most central, or middle-most, value of the data given in the form of an array. By an array, we mean an arrangement of the data in either ascending or descending order of magnitude. In the case of ungrouped data, one has to form an array first and then locate the middle-most value, which is the median. For ungrouped data the median is fixed by using:

Median = the [(n + 1)/2]th value in the array.


Mode: The word mode seems to have been derived from the French 'à la mode', which means 'that which is in fashion'. It is defined as the value in the data which occurs most frequently. For ungrouped data we form the array and then fix the mode as the value which occurs most frequently. If all the values are distinct from each other, the mode cannot be fixed. A frequency distribution with just one highest frequency is called uni-modal; one with two highest frequencies is called bimodal. For a uni-modal frequency distribution, the mode is found by using the formula:

Mode = l + (c × f2)/(f1 + f2)

where l is the lower limit of the modal class, c is its class interval, f1 is the frequency preceding the highest frequency and f2 is the frequency succeeding the highest frequency.
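The grouped-mode formula above can be sketched in Python; the distribution used below is hypothetical:

```python
def grouped_mode(l, c, f1, f2):
    """Mode = l + (c * f2) / (f1 + f2): l is the lower limit of the modal
    class, c its class interval, f1 the frequency preceding the highest
    frequency and f2 the frequency succeeding it."""
    return l + (c * f2) / (f1 + f2)

# Hypothetical distribution: modal class 20-30, so l = 20 and c = 10;
# the frequencies around the peak are f1 = 8 and f2 = 12.
print(grouped_mode(20, 10, 8, 12))  # 26.0
```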

Relative merits and demerits of Mean, Median and Mode

Mean: The mean is the most commonly and frequently used average. It is a simple average, understandable even to a layman. It is based on all the values in a given data set. It is easy to calculate and is basic to the calculation of further statistical measures such as dispersion and correlation. Of all the averages, it is the most stable one. However, it has some demerits. It gives undue weightage to extreme values; in other words, it is greatly influenced by extreme values. Moreover, it cannot be calculated for data with open-ended classes at the extremes, and it cannot be fixed graphically, unlike the median or the mode. It is the most useful average when the analysis is made with full reference to the nature of the individual values of the data. In spite of these few shortcomings, it is the most satisfactory average.

Median: The median is another well-known and widely used average. It has a well-defined formula and is easily understood. It is advantageously used as a representative value of factors or qualities which cannot be measured numerically. Unlike the mean, the median can be located graphically. It is also possible to find the median for data with open-ended classes at the extremes. However, it is not readily amenable to further algebraic treatment, it is an average not based on all the values of the given data, and it is not as stable as the mean. It has only a limited use in practice.

Mode: The mode is a useful measure of central tendency, as a representative of the majority of values in the data. It is a practical average, easily understood even by laymen, and its calculation is not difficult. It can be ascertained even for data with open-ended classes at the extremes, and it can be located by graphical means using a frequency curve. However, the mode is not based on all the values in the data, and it becomes less useful when the data distribution is not uni-modal. Of all the averages, it is the most unstable.

Of the three, the arithmetic mean is generally considered the best measure of central tendency, since it is based on every observation, is rigidly defined and is amenable to further algebraic treatment; the median or the mode is preferred only when the data are skewed, qualitative or open-ended.

4. Machines are used to pack sugar into packets supposedly containing 1.20 kg

each. On testing a large number of packets over a long period of time, it was

found that the mean weight of the packets was 1.24 kg and the standard

deviation was 0.04 Kg. A particular machine is selected to check the total

weight of each of the 25 packets filled consecutively by the machine. Calculate

the limits within which the weight of the packets should lie, assuming that the machine has not been classified as faulty.

 Answer:

Mean weight of the packets = 1.24 kg

•   Standard deviation, SD = 0.04 kg

•   Variance = 0.04² = 0.0016

•   Standard error of the sample mean, SE = 0.04/√25 = 0.04/5 = 0.008 kg

•   Considering a 99.7% confidence level, the sample mean will lie between (1.24 − 3·SE) and (1.24 + 3·SE)

•   Lower limit: 1.24 − 0.024 = 1.216 kg

•   Upper limit: 1.24 + 0.024 = 1.264 kg
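As a quick check, the three-sigma limits can be recomputed from the stated figures (long-run mean 1.24 kg, SD 0.04 kg, n = 25):

```python
import math

mean_weight = 1.24   # observed long-run mean (kg)
sd = 0.04            # population standard deviation (kg)
n = 25               # packets in the sample

se = sd / math.sqrt(n)         # standard error of the sample mean
lower = mean_weight - 3 * se   # three-sigma limits (~99.7% coverage)
upper = mean_weight + 3 * se

print(round(se, 4), round(lower, 3), round(upper, 3))  # 0.008 1.216 1.264
```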

5. A packaging device is set to fill detergent powder packets with a mean weight of 5 kg. The standard deviation is known to be 0.01 kg. These are known to drift upwards over a period of time due to machine fault, which is not tolerable. A random sample of 100 packets is taken and weighed. This sample has a mean weight of 5.03 kg and a standard deviation of 0.21 kg. Can we conclude that the mean weight produced by the machine has increased? Use a 5% level of significance.

 Answer:

Population mean weight, μ = 5 kg
Population SD, σ = 0.01 kg
Sample size, n = 100
Sample mean weight, X̄ = 5.03 kg
Sample SD = 0.21 kg

Null hypothesis H0: μ = 5 kg; alternative hypothesis H1: μ > 5 kg (a one-tailed test, since the drift is upwards).

Test statistic: Z = (X̄ − μ)/(σ/√n) = (5.03 − 5)/(0.01/√100) = 0.03/0.001 = 30

At the 5% level of significance, the one-tailed critical value is 1.645. Since 30 > 1.645, we reject H0 and conclude that the mean weight produced by the machine has increased.
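The one-tailed test can be verified numerically; a sketch using the stated figures (known σ = 0.01 kg):

```python
import math

mu0, sigma = 5.0, 0.01       # hypothesised mean and known SD (kg)
n, sample_mean = 100, 5.03   # sample size and observed sample mean (kg)

# One-tailed z-test: has the filling mean drifted upwards?
z = (sample_mean - mu0) / (sigma / math.sqrt(n))
critical = 1.645             # one-tailed critical value at 5% significance

print(round(z, 1), z > critical)  # 30.0 True
```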

6. Find the probability that at most 5 defective bolts will be found in a box of 200 bolts if it is known that 2 per cent of such bolts are expected to be defective. (You may take the distribution to be Poisson; e⁻⁴ = 0.0183.)

 Answer:

Poisson distribution

A Poisson random variable is the number of successes that result from a Poisson experiment. The probability distribution of a Poisson random variable is called a Poisson distribution. Given the mean number of successes (μ) that occur in a specified region, we can compute the Poisson probability from the following formula.

Poisson formula: Suppose we conduct a Poisson experiment in which the average number of successes within a given region is μ. Then the Poisson probability is:

P(x; μ) = e⁻μ μ^x / x!

where x is the actual number of successes that result from the experiment and e is approximately equal to 2.71828.

The Poisson distribution has the following properties:

The mean of the distribution is equal to μ.

The variance is also equal to μ.

Here, μ = np = 200 × 0.02 = 4, and we want P(X ≤ 5):

P(X ≤ 5) = e⁻⁴ [4⁰/0! + 4¹/1! + 4²/2! + 4³/3! + 4⁴/4! + 4⁵/5!]
= 0.0183 × (1 + 4 + 8 + 10.667 + 10.667 + 8.533)
= 0.0183 × 42.867 ≈ 0.784

Thus, the probability that at most 5 defective bolts will be found in a box of 200 bolts is approximately 0.784.
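The cumulative Poisson probability can be checked with a short computation (using the exact value of e⁻⁴ rather than the rounded 0.0183, so the result differs slightly in the later decimals):

```python
import math

mu = 200 * 0.02   # expected defectives in a box: np = 4

# P(X <= 5) for Poisson(4), summed term by term.
p_at_most_5 = sum(math.exp(-mu) * mu**x / math.factorial(x) for x in range(6))

print(round(p_at_most_5, 4))  # 0.7851
```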


CANDIDATE NAME: Satish Patil 

ROLL NUMBER: 521053391 

LEARNING CENTER: 01736

COURSE: Master of Business Administration

SEMESTER: I 

SUBJECT NAME: MB0040 – STATISTICS FOR MANAGEMENT

 ASSIGNMENT NO: Set-2 

DATE OF SUBMISSION AT THE LEARNING CENTRE: 10 Dec 2010

FACULTY SIGNATURE: 


MBA SEMESTER 1

MB0040 – STATISTICS FOR MANAGEMENT- 4 Credits

(Book ID: B1129)

Assignment Set- 2 (60 Marks)

Note: Each question carries 10 Marks. Answer all the questions

1. What do you mean by Statistical Survey? Differentiate between

“Questionnaire” and “Schedule”.

 Answer:

A statistical survey is a scientific process of collection and analysis of numerical data. Statistical surveys are used to collect numerical information about units in a population. Surveys involve asking questions of individuals. Surveys of human populations are common in the government, health, social science and marketing sectors.

Stages of a Statistical Survey

Statistical surveys are categorised into two stages: planning and execution.

1) Planning a Statistical Survey: The relevance and accuracy of data obtained in a survey depend upon the care exercised in planning. A properly planned investigation can lead to the best results with the least cost and time.

A. The nature of the problem to be investigated should be clearly defined in an unambiguous manner.

B. The objective of the investigation should be stated at the outset. Objectives could be to:

➢ Obtain certain estimates

➢ Establish a theory

➢ Verify an existing statement

➢ Find relationships between characteristics

C. The scope of the investigation has to be made clear. The scope refers to the area to be covered, identification of the units to be studied, the nature of the characteristics to be observed, the accuracy of measurements, analytical methods, and the time, cost and other resources required.

D. Whether to use data collected from a primary or a secondary source should be determined in advance.

E. The organisation of the investigation is the final step in the process. It encompasses determining the number of investigators required, their training and supervision, and the funds required.

2) Execution of a Statistical Survey: Control methods should be adopted at every stage of carrying out the investigation to check the accuracy, coverage, methods of measurement, analysis and interpretation. The collected data should be edited, classified, tabulated and presented in diagrams and graphs. The data should be carefully and systematically analysed and interpreted.

Difference between “Questionnaire” and “Schedule”:

A questionnaire is a list of pre-designed and systematically arranged questions pertaining to the subject of enquiry. It is meant for the whole population group from whom the data are collected. Since a questionnaire is intended for the common man, it must be prepared with due care so that the necessary data may be easily collected. On the other hand, a schedule is only a list of items on which the data collector gathers data, so it is meant for the investigator rather than for the one who answers. Therefore, unlike a questionnaire, a schedule need not be complete; even in a schedule the questions may be written in incomplete sentences, since it depends solely on how the enquirer asks the questions of the people.

2. The table shows the expenditure of a family on food, clothing, education, rent and other items.

Items      Expenditure
Food       4300
Clothing   1200
Education  700
Rent       2000
Others     600

Depict the data shown in the table using a pie chart.

 Answer:
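A pie chart allots each item a central angle proportional to its share of total expenditure; a sketch of that computation:

```python
expenditure = {"Food": 4300, "Clothing": 1200, "Education": 700,
               "Rent": 2000, "Others": 600}

total = sum(expenditure.values())  # 8800

# Each slice's central angle is its share of the total, scaled to 360 degrees.
angles = {item: round(amount / total * 360, 1)
          for item, amount in expenditure.items()}

print(angles)
# {'Food': 175.9, 'Clothing': 49.1, 'Education': 28.6, 'Rent': 81.8, 'Others': 24.5}
```

Food dominates the chart at nearly half the circle (about 176°), with rent the next largest slice.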

3. Average weight of 100 screws in box ‘A’ is 10.4 gms. It is mixed with 150

screws of box ‘B’. Average weight of mixed screws is 10.9 gms. Find the

average weight of screws of box ‘B’.

 Answer:

Average weight of the 100 screws in box A = 10.4 gms, so their total weight = 100 × 10.4 = 1040 gms

Average weight of the 250 mixed screws = 10.9 gms, so their total weight = 250 × 10.9 = 2725 gms

Total weight of the 150 screws from box B = 2725 − 1040 = 1685 gms

Average weight of the screws in box B = 1685/150 ≈ 11.23 gms
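The same weighted-mean reasoning can be written out as a short computation:

```python
n_a, avg_a = 100, 10.4   # box A: number of screws and average weight (g)
n_b = 150                # screws added from box B
avg_mixed = 10.9         # average weight of the 250 mixed screws (g)

total_mixed = (n_a + n_b) * avg_mixed  # 250 * 10.9 = 2725 g
total_a = n_a * avg_a                  # 1040 g
avg_b = (total_mixed - total_a) / n_b  # average weight in box B

print(round(avg_b, 2))  # 11.23
```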

4. (a) Discuss the rules of “Probability”.

(b) What is meant by “Conditional Probability”?

 Answer:


Managers very often come across situations where they have to take decisions about implementing either course of action A or course of action B or course of action C. Sometimes, they have to take decisions regarding the implementation of both A and B.

•   Addition rule

 The addition rule of probability states that:

i) If ‘A’ and ‘B’ are any two events, then the probability of the occurrence of either ‘A’ or ‘B’ is given by:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

ii) If ‘A’ and ‘B’ are two mutually exclusive events, then the probability of occurrence of either A or B is given by:

P(A ∪ B) = P(A) + P(B)

iii) If A, B and C are any three events, then the probability of occurrence of either A or B or C is given by:

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(B ∩ C) − P(A ∩ C) + P(A ∩ B ∩ C)

In terms of Venn diagrams, from figure 5.4 we can calculate the probability of occurrence of either event ‘A’ or event ‘B’, given that events ‘A’ and ‘B’ are dependent events. From figure 5.5 we can calculate the probability of occurrence of either ‘A’ or ‘B’, given that events ‘A’ and ‘B’ are independent events. From figure 5.6 we can calculate the probability of occurrence of either ‘A’ or ‘B’ or ‘C’, given that events ‘A’, ‘B’ and ‘C’ are dependent events.

iv) If A1, A2, A3, …, An are ‘n’ mutually exclusive and exhaustive events, then the probability of occurrence of at least one of them is given by:

P(A1 ∪ A2 ∪ … ∪ An) = P(A1) + P(A2) + … + P(An) = 1


•  Multiplication rule

If ‘A’ and ‘B’ are two independent events, then the probability of occurrence of ‘A’ and ‘B’ is given by:

P(A ∩ B) = P(A) × P(B)

Conditional Probability 

Sometimes we wish to know the probability that the price of a particular petroleum product will rise, given that the finance minister has increased the petrol price. Such probabilities are known as conditional probabilities.

Thus the conditional probability of occurrence of an event ‘A’, given that the event ‘B’ has already occurred, is denoted by P(A | B). Here, ‘A’ and ‘B’ are dependent events. Therefore, we have the following rules.

If ‘A’ and ‘B’ are dependent events, then the probability of occurrence of ‘A and B’ is given by:

P(A ∩ B) = P(A) × P(B | A) = P(B) × P(A | B)

It follows that:

P(A | B) = P(A ∩ B) / P(B), provided P(B) > 0

For any bivariate distribution, there exist two marginal distributions and ‘m + n’ conditional distributions, where ‘m’ and ‘n’ are the numbers of classifications/characteristics studied on the two variables.
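The conditional-probability rule P(A | B) = P(A ∩ B)/P(B) can be illustrated with a small hypothetical two-dice experiment:

```python
from fractions import Fraction

# Hypothetical experiment: roll two fair dice.
# A = "the sum is 8", B = "the first die shows 6".
outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
A = {o for o in outcomes if o[0] + o[1] == 8}
B = {o for o in outcomes if o[0] == 6}

p_b = Fraction(len(B), len(outcomes))            # P(B) = 6/36
p_a_and_b = Fraction(len(A & B), len(outcomes))  # P(A and B) = 1/36

p_a_given_b = p_a_and_b / p_b                    # P(A | B)
print(p_a_given_b)  # 1/6
```

Knowing the first die shows 6 leaves only one favourable second roll (a 2), so the conditional probability is 1/6.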


5. (a) What is meant by “Hypothesis Testing”? Give Examples

(b) Differentiate between “Type-I” and “Type-II” Errors

 Answer:

Hypothesis Testing: The Basics

Say I hand you a coin. How would you tell if it’s fair? If you flipped it 100 times and it came up

heads 51 times, what would you say? What if it came up heads 5 times, instead?


In the first case you’d be inclined to say the coin was fair and in the second case you’d be

inclined to say it was biased towards tails. How certain are you? Or, even more specifically, how 

likely is it actually that the coin is fair in each case?

Hypothesis Testing

Questions like the ones above fall into a domain called hypothesis testing . Hypothesis testing is a

 way of systematically quantifying how certain you are of the result of a statistical experiment.

In the coin example the “experiment” was flipping the coin 100 times. There are two questions

you can ask. One, assuming the coin was fair, how likely is it that you’d observe the results we

did? Two, what is the likelihood that the coin is fair given the results you observed?

Of course, an experiment can be much more complex than coin flipping. Any situation where you're taking a random sample of a population and measuring something about it is an experiment, and for our purposes this includes A/B testing.

Let's focus on the coin flip example to understand the basics.

 The Null Hypothesis

 The most common type of hypothesis testing involves a null hypothesis . The null hypothesis,

denoted H0, is a statement about the world which can plausibly account for the data you observe.

Don’t read anything into the fact that it’s called the “null” hypothesis — it’s just the hypothesis

 we’re trying to test.

For example, “the coin is fair” is an example of a null hypothesis, as is “the coin is biased.” The

important part is that the null hypothesis be able to be expressed in simple, mathematical terms.

 We’ll see how to express these statements mathematically in just a bit.

The main goal of hypothesis testing is to tell us whether we have enough evidence to reject the null hypothesis. In our case we want to know whether the coin is biased or not, so our null hypothesis should be “the coin is fair.” If we get enough evidence that contradicts this hypothesis, say, by flipping it 100 times and having it come up heads only once, then we can safely reject it.

 All of this is perfectly quantifiable, of course. What constitutes “enough” and “safely” are all a

matter of statistics.

 The Statistics, Intuitively


So, we have a coin. Our null hypothesis is that this coin is fair. We flip it 100 times and it comes

up heads 51 times. Do we know whether the coin is biased or not?

Our gut might say the coin is fair, or at least probably fair, but we can't say for sure. The expected number of heads is 50, and 51 is quite close. But what if we flipped the coin 100,000 times and it came up heads 51,000 times? We see 51% heads both times, but in the second instance the coin is more likely to be biased.

Lack of evidence to the contrary is not evidence that the null hypothesis is true. Rather, it means

that we don’t have sufficient evidence to conclude that the null hypothesis is false. The coin

might actually have a 51% bias towards heads, after all.

If instead we saw 1 head for 100 flips that would be another story. Intuitively we know that the

chance of seeing this if the null hypothesis were true is so small that we would be comfortable

rejecting the null hypothesis and declaring the coin to (probably) be biased.

Let’s quantify our intuition.

 The Coin Flip

Formally, the flip of a coin can be represented by a Bernoulli trial. A Bernoulli trial is a random variable X such that

P(X = 1) = p and P(X = 0) = 1 − p

That is, X takes the value 1 (representing heads) with probability p, and 0 (representing tails) with probability 1 − p.

Now, let's say we have 100 coin flips. Let Xi represent the i-th coin flip. Then the random variable X = X1 + X2 + … + X100 represents the run of 100 coin flips.

 The Statistics, Mathematically

Say you have a set of observations O and a null hypothesis H0. In the coin example above we were trying to calculate

P(O | H0)


i.e., the probability that we observed what we did, given the null hypothesis. If that probability is sufficiently small, we are confident concluding that the null hypothesis is false.

We can use whatever level of confidence we want before rejecting the null hypothesis, but most people choose 90%, 95%, or 99%. For example, if we choose a 95% confidence level, we reject the null hypothesis if

P(O | H0) < 0.05

The Central Limit Theorem is the main piece of math here. Briefly, the Central Limit Theorem says that the mean of a large number of independent, identically distributed random variables approximates a normal distribution.

Remember our random variables from before? If we let

p = (X1 + X2 + … + X100)/100

then p is the proportion of heads in our sample of 100 coin flips. In our case, it is equal to 0.51, or 51%.

But by the Central Limit Theorem we also know that p approximates a normal distribution. This means we can estimate the standard deviation of p as

SD(p) ≈ √(p(1 − p)/100)

 Wrapping It Up

Our null hypothesis is that the coin is fair. Mathematically, we are saying that p = 0.50.


A 95% level of confidence means we reject the null hypothesis if p falls outside the central 95% of the area of the normal curve, which corresponds to approximately 1.96 standard deviations from the mean.

The so-called “z-score” tells us how many standard deviations away from the mean our sample is, and it is calculated as

z = (p − 0.50) / SD(p)

The numerator is “p − 0.50” because our null hypothesis is that p = 0.50. This measures how far the sample mean, p, diverges from the expected mean of a fair coin, 0.50.


Example:

Let’s say we flipped three coins 100 times each and got the following data.

Data for 100 Flips of a Coin

Coin     Flips   Pct. Heads   Z-score
Coin 1   100     51%          0.20
Coin 2   100     60%          2.04
Coin 3   100     75%          5.77

Using a 95% confidence level we'd conclude that Coin 2 and Coin 3 are biased using the techniques we've developed so far. Coin 2 is 2.04 standard deviations from the mean and Coin 3 is 5.77 standard deviations.

When your test statistic meets the 95% confidence threshold, we call it statistically significant. This means there is only a 5% chance of observing what you did, assuming the null hypothesis was true. Phrased another way, there is only a 5% chance that your observation is due to random variation.
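The z-scores in the table above can be reproduced with a short function; it estimates the standard deviation from the sample proportion, √(p(1 − p)/n), which matches the 2.04 and 5.77 figures:

```python
import math

def z_score(p_hat, n, p0=0.5):
    """Standard deviations between the observed proportion of heads and the
    fair-coin mean, estimating SD as sqrt(p_hat * (1 - p_hat) / n)."""
    sd = math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - p0) / sd

for p_hat in (0.51, 0.60, 0.75):
    print(round(z_score(p_hat, 100), 2))  # 0.2, then 2.04, then 5.77
```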

B. Statistical error: Type I and Type II

Statisticians speak of two significant sorts of statistical error. The context is that there is a "null

hypothesis" which corresponds to a presumed default "state of nature", e.g., that an individual is

free of disease, that an accused is innocent. Corresponding to the null hypothesis is an

"alternative hypothesis" which corresponds to the opposite situation, that is, that the individual

has the disease, that the accused is guilty. The goal is to determine accurately if the null

hypothesis can be discarded in favor of the alternative. A test of some sort is conducted and data

are obtained. The result of the test may be negative (that is, it does not indicate disease, guilt).

On the other hand, it may be positive (that is, it may indicate disease, guilt). If the result of the

test does not correspond with the actual state of nature, then an error has occurred, but if the

result of the test corresponds with the actual state of nature, then a correct decision has been

made. There are two kinds of error, classified as “type I error” and “type II error”, depending upon which hypothesis has incorrectly been identified as the true state of nature.

 Type I error

 Type I error, also known as an "error of the first kind", an α error, or a "false positive": the error

of rejecting a null hypothesis when it is actually true. Plainly speaking, it occurs when we are

observing a difference when in truth there is none, thus indicating a test of poor specificity. An

example of this would be if a test shows that a woman is pregnant when in reality she is not, or

telling a patient he is sick when in fact he is not. A Type I error can be viewed as the error of excessive credulity.

In other words, a Type I error indicates that a positive assumption is false.

 Type II error

 Type II error, also known as an "error of the second kind", a β error, or a "false negative": the

error of failing to reject a null hypothesis when in fact we should have rejected the null

hypothesis. In other words, this is the error of failing to observe a difference when in truth there

is one, thus indicating a test of poor sensitivity. An example of this would be if a test shows that

a woman is not pregnant, when in reality, she is. Type II error can be viewed as the error of 

excessive scepticism.

6. From the following table, calculate Laspeyres' Index Number, Paasche's Index Number, Fisher's Price Index Number and Dorbish & Bowley's Index Number, taking 2008 as the base year.

Commodity   2008                       2009
            Price (Rs)   Quantity     Price (Rs)   Quantity
            per Kg       in Kg        per Kg       in Kg
A           6            50           10           56
B           2            100          2            120
C           4            60           6            60
D           10           30           12           24
E           8            40           12           36
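All four index numbers reduce to four price-quantity sums over the table; a sketch of the computation:

```python
# (price_2008, qty_2008, price_2009, qty_2009) for commodities A-E
data = [(6, 50, 10, 56), (2, 100, 2, 120), (4, 60, 6, 60),
        (10, 30, 12, 24), (8, 40, 12, 36)]

sum_p0q0 = sum(p0 * q0 for p0, q0, p1, q1 in data)  # 1360
sum_p1q0 = sum(p1 * q0 for p0, q0, p1, q1 in data)  # 1900
sum_p0q1 = sum(p0 * q1 for p0, q0, p1, q1 in data)  # 1344
sum_p1q1 = sum(p1 * q1 for p0, q0, p1, q1 in data)  # 1880

laspeyres = sum_p1q0 / sum_p0q0 * 100    # weights: base-year quantities
paasche = sum_p1q1 / sum_p0q1 * 100      # weights: current-year quantities
fisher = (laspeyres * paasche) ** 0.5    # geometric mean of the two
bowley = (laspeyres + paasche) / 2       # arithmetic mean of the two

print(round(laspeyres, 2), round(paasche, 2),
      round(fisher, 2), round(bowley, 2))
# 139.71 139.88 139.79 139.79
```

So prices rose roughly 40% over the base year by every index; Fisher's and Dorbish & Bowley's indices agree to two decimals here because Laspeyres' and Paasche's values are so close.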
