
LECTURE.docx

Date post: 16-Jul-2016
Upload: sm-akash

Definition of Statistics: Statistics is the science that deals with the collection, classification, analysis, and interpretation of numerical facts or data. It is especially useful for drawing general conclusions about a population from sample observations selected from that population. In the plural sense, statistics are numerical facts obtained by analyzing information. Statistics is mainly concerned with quantitative information.

Terms related to Statistics:

Population: A population is the collection of all individuals, items, or units under consideration in a statistical study. The size of the population is denoted by N.

Sample: A sample is a representative part of the population from which information is to be collected. The size of the sample is denoted by n (n ≤ N).

Population and Sample

For example, suppose we want to study the average daily expenditure of the 4th-semester students in EEE at AIUB. There are two ways to measure the average. One way is to collect information from all students under study. Alternatively, we can collect information from a group of students that can be considered a representative part of all students. The average expenditure can be calculated by either approach. In the first approach we consider all students of the class, which is known as the population. In the second approach we consider a group representing all students of the class, which is called a sample.

Types of population:

1. Finite population
Example: Number of computer centers in Dhaka city; number of cities in Bangladesh; number of human beings in a country.

2. Infinite population
Example: Number of fish in a pond or sea; number of trees in Bangladesh; number of stars in the sky.

Parameter and Statistic: Parameter is an unknown but constant characteristic of the population. Statistic is a function of sample observations.

Variable: A characteristic which varies from one unit to another is called a variable.

Types of variables:

Qualitative Variable: A variable which cannot be measured by a numerical figure is called a qualitative variable or categorical variable.
Example: Gender, religion, color, etc. are qualitative variables.

Quantitative Variable: A variable which is measured by a numerical value is called a quantitative variable.
Example: An individual's age, weight, height, etc. are quantitative variables.

Discrete Variable: A quantitative variable which takes only integral values is called a discrete variable. It usually ranges from 0 to ∞.
Example: Number of bedrooms in each apartment in a multistoried building, number of students in each section of a statistics course, etc.

Continuous Variable: A quantitative variable which takes integral as well as fractional values is called a continuous variable. It usually ranges from -∞ to ∞.
Example: Age of human beings, the life of a battery, speeds of automobiles, etc.

We can summarize the types of variables as follows:

Classification of variables
Variable
    Qualitative (categorical)
    Quantitative
        Discrete
        Continuous

Statistical data are collected in two ways:

Census: If information is collected from all units of a population, then the process is called census. You might say a census is a sample survey of 100% units of the population.

Sample survey: A survey is a data collection activity involving a sample of the population.

Array: Raw data are collected data that have not been organized numerically. An array is an arrangement of raw numerical data in ascending or descending order of magnitude.

Data: Information obtained by observing the values of a variable is called data. "Data" is a plural word and conveys the idea of a collection of pieces of information. The singular word for data (very seldom used in statistical work) is "datum".

A common classification of statistical data has two types:

Primary data: Data which are collected from population units, either by census or by sample survey, are called primary data.
Example: Data collected by a student for his/her thesis or research project.

Secondary data: Data which are collected from official records or from published work are called secondary data.
Example: Census data being used to analyze the impact of education on career choice and earnings.

Some advantages of using primary data:
1. The investigator collects data specific to the problem under study.
2. There is no doubt about the quality of the data collected.
3. If required, it may be possible to obtain additional data during the study period.

Some disadvantages of using primary data:
1. The investigator has to contend with all the hassles of data collection.
2. The investigator has to ensure that the data collected is of a high standard.
3. The cost of obtaining the data is often the major expense in studies.

Some advantages of using secondary data:
1. There are no hassles of data collection.
2. It is less expensive.
3. The investigator is not personally responsible for the quality of the data.

Some disadvantages of using secondary data:
1. The investigator cannot decide what is collected (if specific data about something is required, for instance).
2. One can only hope that the data is of good quality.
3. Obtaining additional data (or even clarification) about something is most often not possible.

Sample questions

a. Mention the importance of statistical data and write down the sources of statistical data.
b. Distinguish between census and sample survey.
c. What are the differences between primary and secondary data?
d. What are the different types of variable? Write down some examples of quantitative and qualitative variables, and of discrete and continuous variables.
e. What are the different types of data? Write down some examples of different types of data.

Data Representation: A large and complicated set of data can often be presented in a more compact form that is easier to understand. Statistical data can be presented by
i) Tabulation
ii) Graphs and diagrams

Tabulation:
Frequency Distribution: A tabular arrangement of data by classes, together with the corresponding number of items in each class, is called a frequency distribution or frequency table. It is used to represent the values of the different levels of a quantitative variable. An example of a frequency distribution is:

Table: Tally marks for grouping the lengths of 40 laurel leaves

Class interval of length   Tally              Frequency
118-128                    ///                     3
128-138                    ///// //                7
138-148                    ///// ///// ///        13
148-158                    ///// ////              9
158-168                    /////                   5
168-178                    ///                     3
Total                                             40

Terms Associated with Frequency Distributions:
Class size or width: the difference between the lower and upper class limits.
Cumulative frequencies: the cumulative totals of successive frequencies of a frequency distribution.
Class mark or midpoint: the average of the class limits.

Length     Frequency   Midpoint   Cumulative frequency
118-128         3         123              3
128-138         7         133             10
138-148        13         143             23
148-158         9         153             32
158-168         5         163             37
168-178         3         173             40
Total          40

Find the number of leaves whose length is less than 158 mm.
Ans: There are 3 + 7 + 13 + 9 = 32 leaves whose lengths are less than 158 mm.

Find the percentage of leaves whose length is 148 mm or above.
Ans: There are 9 + 5 + 3 = 17 leaves whose lengths are 148 mm or above, which is (17 / 40) × 100 = 42.5% of the leaves.
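The midpoints, cumulative frequencies, and the two counting questions above can be reproduced with a short script. This is a minimal sketch in Python (a language the notes themselves do not use); the function names `count_below` and `percent_at_or_above` are illustrative, and the limits passed in are assumed to coincide with class boundaries.

```python
from itertools import accumulate

# Class intervals (mm) and frequencies from the laurel-leaf table.
classes = [(118, 128), (128, 138), (138, 148), (148, 158), (158, 168), (168, 178)]
freqs = [3, 7, 13, 9, 5, 3]

# Midpoint = average of the class limits; cumulative frequency = running total.
midpoints = [(lo + hi) / 2 for lo, hi in classes]
cum_freqs = list(accumulate(freqs))
total = sum(freqs)


def count_below(limit):
    """Count observations in classes whose upper limit is <= limit."""
    return sum(f for (lo, hi), f in zip(classes, freqs) if hi <= limit)


def percent_at_or_above(limit):
    """Percentage of observations in classes whose lower limit is >= limit."""
    count = sum(f for (lo, hi), f in zip(classes, freqs) if lo >= limit)
    return 100 * count / total


print(midpoints)
print(cum_freqs)
print(count_below(158), percent_at_or_above(148))
```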

Graphs and diagrams:
The different graphs and diagrams are: i) Bar diagram, ii) Pie diagram, iii) Stem-and-leaf plot, iv) Histogram, v) Frequency curve, vi) Scatter diagram.

Diagrammatic representation of data: Bar diagram and pie diagram are generally used to represent the value of qualitative variable diagrammatically.

Bar diagram: Bar diagrams are simple diagrams made up of a number of rectangular bars of equal width whose heights are proportional to the quantities or frequencies they represent. The quantities are plotted on the Y-axis against the levels of the qualitative variable, which are plotted on the X-axis. Each level is shown by a bar of uniform width. There should be a uniform gap between bars, and usually the gap is half the width of a bar.

Pie diagram: A pie diagram is a circle drawn to represent the totality of a given data set. The circle is divided into sectors, each sector proportional to the component of the variable it represents. A pie diagram is very useful for comparing the various components with one another or a part with the whole. Both diagrams are used to represent the values of the different levels of a qualitative variable.

Example: The following are the numbers of computers in different laboratories:

Laboratory   Number of computers
L1                   35
L2                   25
L3                   50
L4                   70
Total               180

a) Draw a bar diagram of the data.
b) Represent the data by a pie diagram.

Solution:
Title: Bar diagram of the number of computers of different laboratories

The angles of the pie diagram are calculated below, where Angle = (Number of computers / Total) × 360°:

Laboratory   Number of computers   Angle (degrees)
L1                   35                  70
L2                   25                  50
L3                   50                 100
L4                   70                 140
Total               180                 360

Title: Pie diagram of the number of computers of different laboratories
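The sector angles can be checked mechanically, using the rule that each sector gets the same share of 360° as its category has of the total. A minimal Python sketch; the helper name `pie_angles` is illustrative.

```python
# Number of computers per laboratory, from the example table.
computers = {"L1": 35, "L2": 25, "L3": 50, "L4": 70}


def pie_angles(counts):
    """Return the sector angle (in degrees) for each category."""
    total = sum(counts.values())
    return {name: value * 360 / total for name, value in counts.items()}


angles = pie_angles(computers)
for name, angle in angles.items():
    print(name, angle)
print("Total", sum(angles.values()))
```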

Stem-and-Leaf Plots: In statistics, data is represented in tables, charts, and graphs. One disadvantage of representing data in these ways is that the actual data values are often not retained. One way to ensure that the data values are kept intact is to graph the values in a stem-and-leaf plot. A stem-and-leaf plot is a method of organizing the data that includes sorting the data and graphing it at the same time. This type of graph uses a stem as the leading part of a data value and a leaf as the remaining part of the value. Usually the left side digit of a number is used as stem and right side digit/digits are used as leaf. Both stem and leaf are arranged in ascending order.

Example: At a local veterinarian school, the number of animals treated each day over a period of 20 days was recorded as: 28, 34, 23, 35, 16, 17, 47, 5, 60, 26, 39, 35, 47, 35, 38, 35, 55, 47, 54, and 48. Construct a stem-and-leaf plot for the data.

First arrange the observations as: 05, 16, 17, 23, 26, 28, 34, 35, 35, 35, 35, 38, 39, 47, 47, 47, 48, 54, 55, 60.

Title: Stem-and-leaf plot of the number of animals treated in a veterinarian school

Stem   Leaf
0      5
1      6, 7
2      3, 6, 8
3      4, 5, 5, 5, 5, 8, 9
4      7, 7, 7, 8
5      4, 5
6      0
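The sort-then-split procedure used for the veterinarian data can be sketched in Python as follows. The function name `stem_and_leaf` and the `leaf_unit` parameter are illustrative; the sketch assumes non-negative integers, with the last digit as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=10):
    """Group sorted values by stem (value // leaf_unit); leaf is the remainder."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, leaf_unit)
        plot[stem].append(leaf)
    return dict(plot)


animals = [28, 34, 23, 35, 16, 17, 47, 5, 60, 26,
           39, 35, 47, 35, 38, 35, 55, 47, 54, 48]
plot = stem_and_leaf(animals)

# Print every stem between the smallest and largest, even if it has no leaves.
for stem in range(min(plot), max(plot) + 1):
    leaves = ", ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```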

Example: The numbers of students enrolled in a research class over the past 12 years were recorded as 81, 94, 100, 84, 93, 102, 103, 85, 86, 110, and 111. Represent the data set by a stem-and-leaf plot.

First arrange the observations as: 081, 084, 085, 086, 093, 094, 100, 102, 103, 110, 111.

Title: Stem-and-leaf plot of number of students enrolled in a research class

Stem   Leaf
0      81, 84, 85, 86, 93, 94
1      00, 02, 03, 10, 11

Histogram: A histogram is the graphical representation of a frequency distribution with continuous classes. The class intervals are plotted on the X-axis and the frequency of each class is shown on the Y-axis by drawing a rectangle over the whole width of the class. The height of the rectangle is proportional to the class frequency.

Example: A frequency distribution is given as:

Class Interval   Frequency
0-10                 1
10-20                3
20-30                6
30-40                4
40-50                2

Draw a histogram using the given data set.

Title: Histogram of the given frequency distribution

Example: The following is the distribution of weights (in kg) of 50 persons:

Weight (in kg)      50-55  55-60  60-65  65-70  70-75  75-80  80-85  85-90  Total
Number of persons     12      8      5      4      5      7      6      3     50

Draw a histogram for the above data. (Do yourself in the class)

Hints: We represent the class limits along the X-axis on a suitable scale and the frequencies along the Y-axis on a suitable scale. Since the scale on the X-axis starts at 50, a kink (break) is indicated near the origin to signify that the graph is drawn to scale beginning at 50, and not at the origin.

Difference between bar diagram and histogram:
a. A bar diagram is used to represent a qualitative variable, and a histogram is used to represent a quantitative variable.
b. A bar diagram is one-dimensional, whereas a histogram is two-dimensional.
c. In a histogram all bars are adjacent, whereas in a bar diagram there are gaps between bars.

Frequency curve: It is a smooth graph of the class frequency plotted against the mid value. It can be obtained by connecting the midpoints of the tops of the rectangles in the histogram with a smooth freehand curve.

Example: The following data represent the number of miles run by 20 randomly selected runners during a recent road race. Represent the data by a frequency curve.

Distance   Frequency   Mid value
6-11           1          8.5
11-16          3         13.5
16-21          2         18.5
21-26          4         23.5
26-31          5         28.5
31-36          3         33.5
36-41          2         38.5
Total         20

Title: Frequency curve of selected runners during a recent road race

Example: The following data represent the marks made by 50 students on a math 10 test.

Marks    No. of students   Mid value
20-30          3              25
30-40          4              35
40-50          6              45
50-60         10              55
60-70         12              65
70-80          8              75
80-90          5              85
90-100         2              95
Total         50

a) Represent the data by a histogram.
b) Draw a frequency curve of the math score data.

Solution: Draw the histogram and then draw a frequency curve over it to represent the data.

Scatter diagram: A scatter (XY) plot has points that show the relationship between two sets of data. Each point is plotted with its Y value against its X value. It is used to show the pairs of values of two variables X and Y, where both values are observed on the same sample unit. As the two variables are observed on the same unit, they are expected to be correlated, and one variable, usually Y, depends on the other variable, usually X.

Example: The local ice cream shop keeps track of how much ice cream they sell versus the noon temperature on that day. Here are their figures for the last 12 days:

Temperature °C (X)   Ice Cream Sales (Y)
14.2                       215
16.4                       325
11.9                       185
15.2                       332
18.5                       406
22.1                       522
19.4                       412
25.1                       614
23.4                       544
18.1                       421
22.6                       445
17.2                       408

Title: Scatter diagram of temperature and ice-cream sales

It is now easy to see that warmer weather leads to more sales, but the relationship is not perfect.

Sample questions

1. a. Write down the differences between a histogram and a bar diagram.

b. Name the types of variable whose values are presented by a bar diagram and by a pie diagram. Also give some examples of variables whose values are presented by a histogram and by a frequency curve.

c. Name the important graphs and diagrams used to represent statistical data. Which of them are used for presenting qualitative data and which are used for quantitative data?

2. The following are the numbers of calls received by a person on different days:

Day               1   2   3    4    5   Total
Number of calls   7   8   5   15   10     45

Represent the number of calls on different days by a bar diagram and by a pie diagram.

3. Frequency distribution of the resting pulse rate in healthy volunteers (N = 63):

Pulse/min   No. of volunteers
60-65              2
65-70              7
70-75             11
75-80             15
80-85             10
85-90              9
90-95              6
95-100             3
Total             63

a) Represent the data by a histogram and a frequency curve.
b) Find the percentage of volunteers for whom fewer than 85 pulses are counted.
c) Find the percentage of volunteers for whom 70 or more pulses are counted.
d) Find the number of volunteers for whom fewer than 90 pulses are counted.

4. The following are the numbers of e-mails received on different days by different organizations:

Days (x)                    5    8    3    10    15
No. of mails received (y)  54   65   12   128   158

Draw a scatter diagram of the data.

5. The following are the numbers of customers who visited a mobile operator's office in the first hour of different days: 18, 25, 10, 22, 18, 23, 20, 15, 8, 20, 5, 32, 26, 9, 20, 36, 38, 42, 28, 25, 33, 12, 19, 46, 21.

Represent the data by a stem-and-leaf plot.

Measures of Central Tendency: A measure which usually tends to fall in the centre of the array is called a measure of central tendency. There are different types of measures of central tendency; each has its own advantages and disadvantages.
1) Mean
   a) Arithmetic mean
   b) Geometric mean
   c) Harmonic mean
2) Median
3) Mode

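Arithmetic mean:
For ungrouped data: Let $x_1, x_2, \ldots, x_n$ be n variates. Then the arithmetic mean is defined by
\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \]

Example: The marks scored by 6 students in a quiz in statistics are X: 5, 8, 10, 12, 13, 6.
Arithmetic mean,
\[ \bar{x} = \frac{5 + 8 + 10 + 12 + 13 + 6}{6} = \frac{54}{6} = 9 \]

For a frequency distribution: Let $x_1, x_2, \ldots, x_k$ be the variates (class mid values) with frequencies $f_1, f_2, \ldots, f_k$. Then the arithmetic mean is defined by
\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{k} f_i x_i \]
where $x_i$ = mid value, $f_i$ = frequency, and $n$ = total frequency = $\sum f_i$.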

Example: Calculate the arithmetic mean of marks for the following distribution:

Marks             10-20  20-30  30-40  40-50  50-60  Total
No. of students       3      5      9      3      2     22

Solution: The calculation is shown in the table below:

Marks                    10-20  20-30  30-40  40-50  50-60  Total
No. of students (f_i)        3      5      9      3      2     22
Mid value (x_i)             15     25     35     45     55
f_i x_i                     45    125    315    135    110    730

\[ \bar{x} = \frac{\sum f_i x_i}{n} = \frac{730}{22} \approx 33.18 \]
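Both forms of the arithmetic mean can be checked with a few lines of code. This is a minimal Python sketch under the assumption that each class is represented by its mid value; the function names are illustrative.

```python
def arithmetic_mean(values):
    """Mean of ungrouped data: sum of the values divided by their count."""
    return sum(values) / len(values)


def grouped_mean(classes, freqs):
    """Mean of a frequency distribution, using class mid values as x_i."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    n = sum(freqs)  # total frequency
    return sum(f * x for f, x in zip(freqs, mids)) / n


quiz_marks = [5, 8, 10, 12, 13, 6]
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [3, 5, 9, 3, 2]

print(arithmetic_mean(quiz_marks))
print(round(grouped_mean(classes, freqs), 2))
```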

Geometric mean: It is usually calculated if data are given in rates or ratios. The geometric mean of a set of n values of a variable is the nth root of their product.

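For ungrouped data: Let a variable x assume n non-zero and positive values $x_1, x_2, \ldots, x_n$. Then its geometric mean is defined by
\[ GM = (x_1 x_2 \cdots x_n)^{1/n} \quad\text{or}\quad \log GM = \frac{1}{n}\sum_{i=1}^{n} \log x_i \]

For a frequency distribution: Let $x_1, x_2, \ldots, x_k$ be the variates with frequencies $f_1, f_2, \ldots, f_k$. Then the geometric mean is defined by
\[ GM = \left(x_1^{f_1} x_2^{f_2} \cdots x_k^{f_k}\right)^{1/n} \quad\text{or}\quad \log GM = \frac{1}{n}\sum_{i=1}^{k} f_i \log x_i, \qquad \left[n = \sum f_i\right] \]

Example: What is the geometric mean of 4, 9, 9, and 2?
Solution:
\[ GM = (4 \times 9 \times 9 \times 2)^{1/4} = 648^{1/4} \approx 5.05 \]

Example: Find the geometric mean of the following values: x: 15, 12, 13, 19, 10.
Solution: The calculation is shown in the table below:

x        15       12       13       19       10      Total
log x    1.1761   1.0792   1.1139   1.2788   1       5.648

\[ \log GM = \frac{5.648}{5} = 1.1296, \qquad GM = \operatorname{antilog}(1.1296) \approx 13.48 \]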

Example: Find the geometric mean of the following data:

Marks             0-10  10-20  20-30  30-40  40-50  Total
No. of students      4      8     10      6      7     35

Solution: The calculation is shown in the table below:

Class     Mid-value (x)   log x     f    f log x
0 - 10          5         0.6990    4     2.7960
10 - 20        15         1.1761    8     9.4088
20 - 30        25         1.3979   10    13.9790
30 - 40        35         1.5441    6     9.2646
40 - 50        45         1.6532    7    11.5724
Total                              35    47.0208

\[ \log GM = \frac{47.0208}{35} = 1.3435, \qquad GM = \operatorname{antilog}(1.3435) \approx 22.05 \]
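The logarithm method used in the tables above translates directly into code. The following Python sketch assumes base-10 logarithms (as in the tables) and strictly positive values; the function names are illustrative.

```python
from math import log10


def geometric_mean(values):
    """GM of ungrouped data via logs: antilog of the mean of log10(x)."""
    return 10 ** (sum(log10(x) for x in values) / len(values))


def grouped_geometric_mean(classes, freqs):
    """GM of a frequency distribution: antilog of sum(f * log10(x)) / n."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    n = sum(freqs)
    return 10 ** (sum(f * log10(x) for f, x in zip(freqs, mids)) / n)


classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [4, 8, 10, 6, 7]

print(round(geometric_mean([4, 9, 9, 2]), 2))
print(round(geometric_mean([15, 12, 13, 19, 10]), 2))
print(round(grouped_geometric_mean(classes, freqs), 2))
```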

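Harmonic Mean: It is used if data are given in rates or ratios.

For ungrouped data: Let a variable x assume n non-zero and positive values $x_1, x_2, \ldots, x_n$. Then its harmonic mean is defined by
\[ HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} \]

For a frequency distribution: Let $x_1, x_2, \ldots, x_k$ be non-zero and positive values with frequencies $f_1, f_2, \ldots, f_k$. Then the harmonic mean is defined by
\[ HM = \frac{n}{\sum_{i=1}^{k} \frac{f_i}{x_i}}, \qquad \left[n = \sum f_i\right] \]

Example: Find the harmonic mean of the following data X: 8, 9, 6, 11, 10, 5.
Solution:
\[ HM = \frac{6}{\frac{1}{8} + \frac{1}{9} + \frac{1}{6} + \frac{1}{11} + \frac{1}{10} + \frac{1}{5}} = \frac{6}{0.7937} \approx 7.56 \]

Example: Find the harmonic mean for the following data:

Class       10-20  20-30  30-40  40-50  50-60  60-70  70-80  80-90  90-100
Frequency       3     12      8      7     15     26      5      3       1

Solution: The calculation is shown in the table below:

Class     Frequency (f)   Mid value (x)   f / x
10-20           3              15         0.200
20-30          12              25         0.480
30-40           8              35         0.229
40-50           7              45         0.156
50-60          15              55         0.273
60-70          26              65         0.400
70-80           5              75         0.067
80-90           3              85         0.035
90-100          1              95         0.011
Total       n = 80                        1.849

\[ HM = \frac{n}{\sum f/x} = \frac{80}{1.849} \approx 43.3 \]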
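The harmonic mean calculations can be reproduced with the sketch below. It is written in Python, assumes strictly positive values, and uses illustrative function names; because it keeps full precision in each f / x term, its grouped result can differ slightly in the second decimal from a hand calculation based on rounded table entries.

```python
def harmonic_mean(values):
    """HM of ungrouped data: n divided by the sum of reciprocals."""
    return len(values) / sum(1 / x for x in values)


def grouped_harmonic_mean(classes, freqs):
    """HM of a frequency distribution: n divided by sum(f / x) over mid values."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    n = sum(freqs)
    return n / sum(f / x for f, x in zip(freqs, mids))


# Classes 10-20, 20-30, ..., 90-100 from the example.
classes = [(lo, lo + 10) for lo in range(10, 100, 10)]
freqs = [3, 12, 8, 7, 15, 26, 5, 3, 1]

print(round(harmonic_mean([8, 9, 6, 11, 10, 5]), 2))
print(round(grouped_harmonic_mean(classes, freqs), 2))
```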

Exercise:
a) A fellow travels from city A to city B. For the first 10 miles, he drives at a constant speed of 20 miles per hour. Then he (instantaneously) increases his speed and, for the next 10 miles, keeps it at 30 miles per hour. Find the average speed of the journey.

b) A person runs 10 programs at a speed of 400 MB and 20 programs at a speed of 550 MB. Find the average speed of run per program.

Show by an example that AM ≥ GM ≥ HM.
Solution: Let the data set be X: 2, 3, 4, 5, 6. Then
AM = (2 + 3 + 4 + 5 + 6) / 5 = 20 / 5 = 4
GM = (2 × 3 × 4 × 5 × 6)^(1/5) = 720^(1/5) = 3.7279
HM = 5 / (1/2 + 1/3 + 1/4 + 1/5 + 1/6) = 5 / 1.45 = 3.4483
Here, AM > GM > HM. If the data set is X: 5, 5, 5, and 5, then AM = GM = HM = 5.
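The ordering AM ≥ GM ≥ HM can be verified numerically for the data set above. This sketch relies on Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or later); the helper name `three_means` is illustrative.

```python
from statistics import geometric_mean, harmonic_mean, mean


def three_means(values):
    """Return (AM, GM, HM) for a list of positive values."""
    return mean(values), geometric_mean(values), harmonic_mean(values)


am, gm, hm = three_means([2, 3, 4, 5, 6])
print(am, round(gm, 4), round(hm, 4))
print(am >= gm >= hm)

# With identical values the three means coincide (up to rounding error).
print(three_means([5, 5, 5, 5]))
```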

Median: The median is the value which divides the array of a data set into two equal parts. To find the median, arrange the data in an array and find the positional value which divides the array into two equal parts.

For ungrouped data: Make an array. Then
Me = the value of the ((n + 1)/2)-th observation [if n is odd]
Me = (1/2) × [the value of the (n/2)-th observation + the value of the (n/2 + 1)-th observation] [if n is even]

Exercise: Find the median of the given data set, x: 8, 4, 3, 3, 5, 7, 1.
Solution: Array, x: 1, 3, 3, 4, 5, 7, 8; n = 7 (an odd number).
Me = the value of the ((7 + 1)/2)-th = 4th observation = 4

Exercise: Find the median of the given data set, X: 8, 4, 3, 3, 5, 7, 1, 6.
Solution: Array, x: 1, 3, 3, 4, 5, 6, 7, 8; n = 8 (an even number).
Me = (1/2) × [the value of the (8/2)-th observation + the value of the (8/2 + 1)-th observation]
   = (1/2) × [the value of the 4th observation + the value of the 5th observation]
   = (4 + 5) / 2 = 4.5

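For a frequency distribution:
\[ Me = L + \frac{\frac{n}{2} - F}{f} \times c \]
The median class is that class for which c.f. ≥ n/2 for the first time. Here L is the lower limit of the median class, F is the cumulative frequency of the class preceding the median class, f is the frequency of the median class, c is the class width, and n is the total frequency.

Example: The following is the distribution of statistics scores of 20 students:

Scores            1-2  2-3  3-4  4-5  5-6  Total
No. of students     1    3    8    6    2     20

Find the value of the median of the scores.

Solution: The calculation is shown in the table below:

Class interval   Frequency, f   Cumulative frequency
1-2                   1                  1
2-3                   3                  4
3-4                   8                 12
4-5                   6                 18
5-6                   2                 20
Total                20

Here n/2 = 10, so the median class is 3-4, with L = 3, F = 4, f = 8, and c = 1.
\[ Me = 3 + \frac{10 - 4}{8} \times 1 = 3.75 \]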
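The grouped median can be computed by walking the cumulative frequencies until n/2 is reached and then interpolating inside that class. The sketch below assumes the standard interpolation formula Me = L + ((n/2 - F) / f) × c; the function name `grouped_median` is illustrative, and the ungrouped cases use Python's standard `statistics.median`.

```python
from statistics import median


def grouped_median(classes, freqs):
    """Interpolated median: L + ((n/2 - F) / f) * c for the median class."""
    n = sum(freqs)
    cum = 0  # cumulative frequency of the classes before the current one (F)
    for (lo, hi), f in zip(classes, freqs):
        if f > 0 and cum + f >= n / 2:
            return lo + (n / 2 - cum) / f * (hi - lo)
        cum += f
    raise ValueError("empty frequency distribution")


classes = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
freqs = [1, 3, 8, 6, 2]

print(median([8, 4, 3, 3, 5, 7, 1]))     # odd n: middle observation
print(median([8, 4, 3, 3, 5, 7, 1, 6]))  # even n: average of the two middle ones
print(grouped_median(classes, freqs))
```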

Mode: The mode is the number which appears most often. There can be more than one mode in a data set. If the two values are tied for being the most common values in the set, the data set can be said to be bimodal, whereas if three values are tied, the set is tri-modal, and so on.

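Example:
The data set is X: 2, 3, 3, 4. The value of the mode is 3.
The data set is X: 2, 3, 3, 4, 6, 6. The values of the mode are 3 and 6.
The data set is X: 2, 3, 3, 4, 4, 4, 6. The value of the mode is 4.
The data set is X: 2, 3, 4, 6. There is no mode.

For a frequency distribution:
\[ Mo = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times c \]
where L is the lower limit of the modal class (the class with the highest frequency), $\Delta_1$ is the frequency of the modal class minus the frequency of the preceding class, $\Delta_2$ is the frequency of the modal class minus the frequency of the following class, and c is the class width.

Example: The frequency distribution is given as:

Class Interval   85-91  91-97  97-103  103-109  109-115  Total
Frequency (f)        2     10       5        6        4     27

Calculate the value of the mode.
Answer: The modal class is 91-97, so L = 91, $\Delta_1$ = 10 - 2 = 8, $\Delta_2$ = 10 - 5 = 5, and c = 6. The mode of the distribution is:
\[ Mo = 91 + \frac{8}{8 + 5} \times 6 \approx 94.69 \]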
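A grouped mode can be sketched in code as follows, assuming the standard interpolation formula Mo = L + Δ1 / (Δ1 + Δ2) × c with equal class widths. The function name `grouped_mode` is illustrative; a missing neighbouring class is treated as having frequency 0, and ties for the highest frequency are resolved by taking the first such class, both of which are simplifying assumptions.

```python
def grouped_mode(classes, freqs):
    """Interpolated mode of a frequency distribution with equal class widths."""
    i = freqs.index(max(freqs))             # position of the modal class
    lo, hi = classes[i]
    before = freqs[i - 1] if i > 0 else 0   # frequency of the preceding class
    after = freqs[i + 1] if i < len(freqs) - 1 else 0  # following class
    d1 = freqs[i] - before
    d2 = freqs[i] - after
    return lo + d1 / (d1 + d2) * (hi - lo)


classes = [(85, 91), (91, 97), (97, 103), (103, 109), (109, 115)]
freqs = [2, 10, 5, 6, 4]

print(round(grouped_mode(classes, freqs), 2))
```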

Central tendency and skewness: The relationship between the mean, median, and mode may give some indication of the shape of the frequency distribution without having to create a histogram or frequency curve.

Skewness: The asymmetric property of a frequency distribution is called skewness. Measures of skewness based on central tendency:
i) Absolute measures of skewness
ii) Relative measures of skewness

Absolute measures of skewness:

Mean = Median = Mode [the distribution is symmetric]
Mean > Median > Mode [the distribution is positively skewed]
Mean < Median < Mode [the distribution is negatively skewed]

Measure of skewness: SK = mean - median, or SK = mean - mode.
If SK = 0 [the distribution is symmetric]
If SK > 0 [the distribution is positively skewed]
If SK < 0 [the distribution is negatively skewed]

Skewness using graph:

Find the mean, median, and mode for each set of data:
1. 4, 6, 9, 12, 5; mean = 7.2; median = 6; mode = no mode
2. 7, 13, 4, 7; mean = 7.75; median = 7; mode = 7
3. 10, 3, 8, 15; mean = 9; median = 9; mode = no mode
4. 9, 9, 9, 9, 8; mean = 8.8; median = 9; mode = 9
5. 300, 24, 40, 50, 60; mean = 94.8; median = 50; mode = no mode
6. 23, 23, 12, 12; mean = 17.5; median = 17.5; mode = 12, 23
Comment on the skewness of the above distributions.
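The three measures, and the sign of mean - median as an absolute measure of skewness, can be computed for any of the data sets above with a short helper. This is a hedged Python sketch: the names `modes` and `describe` are illustrative, and "no mode" is taken to mean that no value occurs more than once.

```python
from collections import Counter
from statistics import mean, median


def modes(values):
    """Most frequent values; an empty list when every value occurs once."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


def describe(values):
    """Return (mean, median, modes, shape), with shape from the sign of SK."""
    m, med = mean(values), median(values)
    sk = m - med  # absolute measure of skewness: mean - median
    if sk > 0:
        shape = "positively skewed"
    elif sk < 0:
        shape = "negatively skewed"
    else:
        shape = "symmetric"
    return m, med, modes(values), shape


for data in ([4, 6, 9, 12, 5], [300, 24, 40, 50, 60], [23, 23, 12, 12]):
    print(data, describe(data))
```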

Sample Questions

1. The distribution of missed calls recorded on a mobile set of an organization on different days is shown below:

Class interval of missed calls   4-8  8-12  12-16  16-20  20-24  Total
No. of days (f)                    8    24     10      5      3     50

Calculate the arithmetic mean, geometric mean, and harmonic mean of missed calls. Find the median and mode of missed calls, and comment on the skewness.

2. Define measure of central tendency and measure of location. Write down the difference between these measures. Why is the median the best measure of central tendency?

3. Find arithmetic mean, geometric mean, harmonic mean, median and mode of the observations (x): 10, 18, 5, 7, 12, and 5.

4. A train moves 1st 80 km at a speed of 75 km/h, 2nd 70 km at a speed of 85 km/h, 3rd 85 km at a speed of 66 km/h and 4th 55 km at a speed of 50 km/h. Find the average speed throughout the journey.

5. The distribution of service time (in hours) of batteries used in portable personal computers is given below:

Class interval of service time   2.0-2.5  2.5-3.0  3.0-3.5  3.5-4.0  4.0-4.5
No. of batteries (f)                  10       28        8        3        1

Represent the data by a frequency curve and comment on the skewness of the curve.

Measures of location: A measure of location is a measure which is located at a particular place in the array. The measures divide the array into several equal parts. The different measures are:
i) Quartiles
ii) Deciles
iii) Percentiles

Quartiles: Quartiles are measures which divide the array into 4 equal parts. They are denoted by Qi; i = 1, 2, 3. Q1 is the maximum value of the first 25% of the observations of the array. Q3 is the minimum value of the last 25% of the observations of the array.

For ungrouped data: Let X1, X2, ..., Xn be the set of observations. Then,

Example: A data set is given as X: 2, 5, 3, 6, 7, 4, 9. Calculate the values of the quartiles.
Solution: The array is x: 2, 3, 4, 5, 6, 7, 9. Here, n = 7 (an odd number).
[worked steps not preserved]

Example: A data set is given as X: 2, 5, 3, 6, 7, 4, 9, 13. Calculate the values of the quartiles.
Solution: The array is x: 2, 3, 4, 5, 6, 7, 9, 13. Here, n = 8 (an even number).
[worked steps not preserved]
Q3: (Do yourself.)

For a frequency distribution:

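\[ Q_i = L + \frac{\frac{i\,n}{4} - F}{f} \times c, \qquad i = 1, 2, 3 \]
The quartile ($Q_i$) group is that group for which c.f. ≥ i n / 4 for the first time. Here L is the lower limit of that group, F is the cumulative frequency of the preceding group, f is the frequency of the group, and c is the class width.

Example: Find Q1 from these grouped data:

Class Limit   Frequency   Cumulative frequency
0-10              2                2
10-20             3                5
20-30             5               10
30-40             2               12
40-50             6               18
50-60             2               20

Solution: n/4 = 20/4 = 5, so Q1 lies in the class 10-20 (c.f. = 5). With L = 10, F = 2, f = 3, and c = 10,
\[ Q_1 = 10 + \frac{5 - 2}{3} \times 10 = 20 \]

Deciles: Deciles are measures which divide the array into 10 equal parts. They are denoted by Di; i = 1, 2, ..., 9. D1 is the value of the array that is the maximum of the first 10% of the observations, and so on.

For ungrouped data: Let X1, X2, ..., Xn be the set of observations. Then,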

Example: A data set is given as X: 2, 5, 3, 6, 7, 4, 9. Calculate the values of the deciles. (Try)
Example: A data set is given as X: 2, 5, 3, 6, 7, 4, 9, 13. Calculate the values of the deciles. (Try)

For a frequency distribution:
\[ D_i = L + \frac{\frac{i\,n}{10} - F}{f} \times c, \qquad i = 1, 2, \ldots, 9 \]
The decile ($D_i$) group is that group for which c.f. ≥ i n / 10 for the first time.

Example: Find D4 from these grouped data: (Try)

Class Limit   Frequency   Cumulative frequency
0-10              2                2
10-20             3                5
20-30             5               10
30-40             2               12
40-50             6               18
50-60             2               20

Percentiles: Percentiles are measures which divide the array into 100 equal parts. They are denoted by Pi; i = 1, 2, ..., 99. P1 is the maximum value of the first 1% of the observations of the array. P99 is the minimum value of the last 1% of the observations of the array.

For ungrouped data: Let X1, X2, ..., Xn be the set of observations. Then,
Example: A data set is given as X: 2, 5, 3, 6, 7, 4, 9. Calculate the values of the percentiles. (Try)
Example: A data set is given as X: 2, 5, 3, 6, 7, 4, 9, 13. Calculate the values of the percentiles. (Try)

For a frequency distribution:
\[ P_i = L + \frac{\frac{i\,n}{100} - F}{f} \times c, \qquad i = 1, 2, \ldots, 99 \]
The percentile ($P_i$) group is that group for which c.f. ≥ i n / 100 for the first time.

Example: Determine P80 from the following distribution:

Class interval   Frequency   Cumulative frequency
0-5                 20              20
5-10                15              35
10-15               31              66
15-20               22              88
20-25               10              98
25-30                2             100

Solution: 80n/100 = (80 × 100)/100 = 80. The cumulative frequency just greater than 80 is 88, so the class (15-20) contains P80. With L = 15, F = 66, f = 22, and c = 5,
\[ P_{80} = 15 + \frac{80 - 66}{22} \times 5 \approx 18.18 \]
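Quartiles, deciles, and percentiles of a frequency distribution all follow the same pattern, so one function can serve for all three. The sketch below assumes the standard interpolation formula L + ((i·n/parts - F) / f) × c, with parts = 4, 10, or 100; the function name `grouped_quantile` and its parameters are illustrative.

```python
def grouped_quantile(classes, freqs, i, parts):
    """i-th quantile when the array is divided into `parts` equal parts."""
    n = sum(freqs)
    target = i * n / parts  # position i*n/parts in the cumulative frequencies
    cum = 0                 # cumulative frequency before the current class (F)
    for (lo, hi), f in zip(classes, freqs):
        if f > 0 and cum + f >= target:
            return lo + (target - cum) / f * (hi - lo)
        cum += f
    raise ValueError("quantile position lies beyond the data")


# Data of the Q1 example.
q_classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
q_freqs = [2, 3, 5, 2, 6, 2]

# Data of the P80 example.
p_classes = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25), (25, 30)]
p_freqs = [20, 15, 31, 22, 10, 2]

print(grouped_quantile(q_classes, q_freqs, 1, 4))               # Q1
print(round(grouped_quantile(p_classes, p_freqs, 80, 100), 2))  # P80
```

The same call with `parts=10` gives deciles, so the exercises marked "(Try)" can be checked after working them by hand.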

Sample Questions

1. The distribution of direct solar intensity measurements (Watts/m²) observed from different experiments is given below:

Class interval of intensity   500-600  600-700  700-800  800-900  900-1000
Number of experiments (f)          20       32       22       14        12

Find the values of Q1, Q2, Q3, D8 and P60 of the intensity.

2. The following are the numbers of customers who visited a mobile operator's office in the first hour of different days. Observations (x): 18, 25, 10, 22, 18, 23, 20, 15, 8, 12. Find the values of Q1, Q2, Q3, D6 and P80 of customers visited.

The measures of central tendency are not adequate to describe data. Two data sets can have the same mean and yet be entirely different. Thus, to describe data, one needs to know the extent of variability. This is given by the measures of dispersion. Dispersion means the deviations of observations from some central value. Another way of examining single-variable data is to look at how the data are spread out, or dispersed, about the mean.

Consider two data sets to understand dispersion:
X1: 23, 24, 25, 26, 27; n1 = 5; mean = 25
X2: 5, 10, 16, 28, 66; n2 = 5; mean = 25
We easily observe that the mean of both data sets is 25. But the sets are not identical: the values of X1 are closely packed, while those of X2 are widely scattered.

Measure of dispersion: The average deviation of observations from some central value is called a measure of dispersion. There are two types of measures of dispersion:

Absolute measures of dispersion: These give the average deviation from a central value. The measures are
i) Range
ii) Mean deviation
iii) Standard deviation

Relative measures of dispersion: These give the average deviation from a central value as a percentage of that central value. The measures are
i) Coefficient of range
ii) Coefficient of mean deviation
iii) Coefficient of standard deviation
iv) Coefficient of variation (C.V.)

Range: The range is the difference between the largest and the smallest observation in the data. Let the data set be X1, X2, ..., Xn. Then,
Range, R = largest observation - smallest observation

Example a: A student took 7 math tests in one marking period. What is the range of her test scores? X: 89, 73, 84, 91, 87, 77, 94.

Solution:Arranging the values in ascending order, we get:

X: 73, 77, 84, 87, 89, 91, 94

Range, R= largest - smallest = 94 - 73 = 21

Example b: A frequency distribution is given as:Class Interval10-1515-2020-2525-3030-3535-40Total

Frequency58151911765

Calculate the value of range.

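Solution: Range = upper limit of the last class - lower limit of the first class = 40 - 10 = 30.

Coefficient of range:
\[ \text{Coefficient of range} = \frac{X_{\max} - X_{\min}}{X_{\max} + X_{\min}} \times 100\% \]
For example a: Coefficient of range = (94 - 73) / (94 + 73) × 100% = 21 / 167 × 100% = 12.57%
For example b: Coefficient of range = (40 - 10) / (40 + 10) × 100% = 30 / 50 × 100% = 60%

Mean deviation: The mean deviation or average deviation is the arithmetic mean of the absolute deviations from the mean.
For ungrouped data:
\[ M.D.(\bar{x}) = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}| \]

Example a: Calculate the mean deviation of the data, X: 9, 3, 8, 8, 9, 8, 9, and 18.
Solution: Here $\bar{x}$ = 72 / 8 = 9 and $\sum |x_i - \bar{x}|$ = 0 + 6 + 1 + 1 + 0 + 1 + 0 + 9 = 18, so M.D.($\bar{x}$) = 18 / 8 = 2.25.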

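For grouped data:
\[ M.D.(\bar{x}) = \frac{1}{n}\sum_{i=1}^{k} f_i\,|x_i - \bar{x}|, \qquad n = \sum f_i \]

Example b: Calculate the mean deviation of the following distribution:

Class Interval   Frequency (f_i)
10-15                  3
15-20                  5
20-25                  7
25-30                  4
30-35                  2
Total                 21

Solution:

Class Interval   x_i    f_i   x_i f_i   |x_i - x̄|   |x_i - x̄| f_i
10-15            12.5     3     37.5      9.286        27.858
15-20            17.5     5     87.5      4.286        21.43
20-25            22.5     7    157.5      0.714         4.998
25-30            27.5     4    110        5.714        22.856
30-35            32.5     2     65       10.714        21.428
Total                    21    457.5                   98.57

\[ \bar{x} = \frac{457.5}{21} = 21.786, \qquad M.D.(\bar{x}) = \frac{98.57}{21} \approx 4.694 \]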

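Coefficient of mean deviation: The mean deviation is an absolute measure of dispersion. Its relative measure is called the coefficient of mean deviation, defined as:
\[ \text{Coefficient of M.D.} = \frac{M.D.(\bar{x})}{\bar{x}} \]

For example a: Coefficient of M.D. = 2.25 / 9 = 0.25
For example b: Coefficient of M.D. = 4.694 / 21.786 ≈ 0.215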
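The mean deviation about the mean, and its coefficient, can be sketched in Python as below. The function names are illustrative, and the grouped version assumes each class is represented by its mid value.

```python
def mean_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)


def grouped_mean_deviation(classes, freqs):
    """Return (mean, mean deviation) of a frequency distribution."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    n = sum(freqs)
    m = sum(f * x for f, x in zip(freqs, mids)) / n
    md = sum(f * abs(x - m) for f, x in zip(freqs, mids)) / n
    return m, md


data_a = [9, 3, 8, 8, 9, 8, 9, 18]
classes_b = [(10, 15), (15, 20), (20, 25), (25, 30), (30, 35)]
freqs_b = [3, 5, 7, 4, 2]

md_a = mean_deviation(data_a)
mean_b, md_b = grouped_mean_deviation(classes_b, freqs_b)

print(md_a, md_a / (sum(data_a) / len(data_a)))   # M.D. and its coefficient
print(round(md_b, 3), round(md_b / mean_b, 3))
```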

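Standard deviation: The standard deviation is defined as the positive square root of the mean of the squared deviations taken from the arithmetic mean of the data.
For ungrouped data:
\[ S = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2} \]
For grouped data:
\[ S = \sqrt{\frac{1}{n}\sum_{i=1}^{k} f_i\,(x_i - \bar{x})^2}, \qquad n = \sum f_i \]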
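A minimal Python sketch of the standard deviation follows, assuming the divisor n (the "mean of the squared deviations" in the definition) rather than n - 1. The function names are illustrative; the data are those of the two worked examples in this section, and the last helper expresses S as a percentage of the mean.

```python
from math import sqrt


def std_dev(values):
    """Return (mean, S) for ungrouped data, with divisor n."""
    n = len(values)
    m = sum(values) / n
    return m, sqrt(sum((x - m) ** 2 for x in values) / n)


def grouped_std_dev(classes, freqs):
    """Return (mean, S) for a frequency distribution, using mid values."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    n = sum(freqs)
    m = sum(f * x for f, x in zip(freqs, mids)) / n
    return m, sqrt(sum(f * (x - m) ** 2 for f, x in zip(freqs, mids)) / n)


def percent_of_mean(s, m):
    """Standard deviation expressed as a percentage of the mean."""
    return s / m * 100


mean_u, s_u = std_dev([2, 4, 8, 6, 10, 12])
mean_g, s_g = grouped_std_dev([(1, 3), (3, 5), (5, 7), (7, 9)], [40, 30, 20, 10])

print(round(s_u, 3), round(percent_of_mean(s_u, mean_u), 1))
print(s_g, percent_of_mean(s_g, mean_g))
```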

Coefficient of Standard Deviation: The standard deviation is an absolute measure of dispersion. Its relative measure is called the standard coefficient of dispersion or coefficient of standard deviation. It is defined as:
\[ \text{Coefficient of Standard Deviation} = \frac{S}{\bar{x}} \]

Coefficient of Variation: The most important of all the relative measures of dispersion is the Coefficient of Variation (CV), defined as:
\[ CV = \frac{S}{\bar{x}} \times 100 \]
Thus CV is the value of S when $\bar{x}$ is assumed equal to 100. It is a pure number, and the unit of the observations is not mentioned with its value. It is written in percentage form, like 20% or 25%. When its value is 20%, it means that the observations vary, on average, by 20% with respect to the mean.

Example: Calculate the coefficient of standard deviation and the coefficient of variation for the observations X: 2, 4, 8, 6, 10, and 12.
Solution:

x           (x - x̄)²
2           (2 - 7)² = 25
4           (4 - 7)² = 9
8           (8 - 7)² = 1
6           (6 - 7)² = 1
10          (10 - 7)² = 9
12          (12 - 7)² = 25
Σx = 42     Σ(x - x̄)² = 70

\[ \bar{x} = \frac{42}{6} = 7, \qquad S^2 = \frac{70}{6} = 11.67, \qquad S = \sqrt{11.67} = 3.416 \]

Coefficient of Standard Deviation = 3.416 / 7 = 0.488
Coefficient of Variation: CV = 0.488 × 100 = 48.8%

Example: Calculate the coefficient of standard deviation and the coefficient of variation from the following distribution of marks:

Marks   No. of Students
1-3           40
3-5           30
5-7           20
7-9           10

Solution:

Marks    f     x    fx    (x - x̄)²   f (x - x̄)²
1-3      40    2    80        4          160
3-5      30    4   120        0            0
5-7      20    6   120        4           80
7-9      10    8    80       16          160
Total   100        400                   400

\[ \bar{x} = \frac{400}{100} = 4, \qquad S^2 = \frac{400}{100} = 4, \qquad S = 2 \]

Coefficient of Standard Deviation = 2 / 4 = 0.50
Coefficient of Variation: CV = 0.50 × 100 = 50%

Covariance: In probability theory and statistics, covariance is a measure of how much two random variables change together. It is defined as:

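\[ \operatorname{Cov}(x, y) = \frac{1}{N - 1}\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) \]

where $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$ are the N pairs of values of the two variables X and Y.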
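As a hedged illustration, the covariance can be computed as below. The sketch assumes the sample form with divisor N - 1, which is the form that reproduces the value 1.53 reported in the example that follows; the function name `covariance` is illustrative, and the data are taken from that example.

```python
def covariance(xs, ys):
    """Sample covariance of paired data, with divisor N - 1 (an assumption)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of the paired deviations from the two means.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1)


growth = [2.1, 2.5, 4.0, 3.6]   # economic growth %, x_i
returns = [8, 12, 14, 10]       # S&P 500 returns %, y_i

print(round(covariance(growth, returns), 2))
```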

Example: The table below describes the rate of economic growth ($x_i$) and the rate of return on the S&P 500 ($y_i$):

Economic growth % (x_i)   S&P 500 returns % (y_i)
2.1                               8
2.5                              12
4.0                              14
3.6                              10

Calculate the value of the covariance of x and y.

Solution: Here $\bar{x}$ = 12.2 / 4 = 3.05 and $\bar{y}$ = 44 / 4 = 11.

Cov(x, y) = [(-0.95)(-3) + (-0.55)(1) + (0.95)(3) + (0.55)(-1)] / 3 = 4.6 / 3 = 1.53

Sample questions

1. Define dispersion and measure of dispersion. What are the different measures of dispersion?

2. Why is C.V. considered the best measure of dispersion? Explain why we need relative measures of dispersion.

3. Calculate mean square deviation from mean (variance), coefficient of mean deviation and C.V. of the observations (x): 2, 8, 7, 0, 5, - 9, and 12.

4. The distribution of diameter (in mm) of a hole drilled in a sheet metal component is shown below:

Class interval of diameter | 10-12 | 12-14 | 14-16 | 16-18 | 18-20
Number of holes (f) | 8 | 16 | 12 | 10 | 4

Calculate standard deviation of diameter. Find mean deviation about mean and the corresponding relative measures of dispersion. Calculate C. V. of the distribution of diameter.

5. Write down the advantage of relative measure of dispersion and disadvantage of standard deviation. What do you mean by covariance?

6. The data on wire bond pull strength ( y) and wire length ( x ) observed from different experiments are shown below: Y9.5820.510.61215.8209

X224205330542

Represent the data by a scatter diagram. Calculate covariance of x and y.
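The worked dispersion examples earlier in this section (coefficient of variation and covariance) can be checked numerically. The following is a minimal Python sketch, not part of the original lecture; the function names are illustrative. It assumes the divisor n (or the total frequency) for the standard deviation and the divisor N - 1 for the covariance, which are the choices that reproduce the lecture's results of 50% and 1.53.

```python
import math

def mean(values, freqs=None):
    """Arithmetic mean; freqs are optional class frequencies."""
    freqs = freqs or [1] * len(values)
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

def std_dev(values, freqs=None):
    """Standard deviation with divisor n (or the total frequency)."""
    freqs = freqs or [1] * len(values)
    m = mean(values, freqs)
    squared = sum(f * (x - m) ** 2 for x, f in zip(values, freqs))
    return math.sqrt(squared / sum(freqs))

def coeff_variation(values, freqs=None):
    """Coefficient of variation in percent: 100 * S / mean."""
    return 100 * std_dev(values, freqs) / mean(values, freqs)

def covariance(xs, ys):
    """Covariance with divisor N - 1 (assumed from the worked example)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Ungrouped observations, grouped marks (class mid values), and growth/returns data.
cv_ungrouped = coeff_variation([2, 4, 8, 6, 10, 12])
cv_grouped = coeff_variation([2, 4, 6, 8], [40, 30, 20, 10])
cov_xy = covariance([2.1, 2.5, 4.0, 3.6], [8, 12, 14, 10])
print(cv_ungrouped, cv_grouped, cov_xy)
```

For the ungrouped data the unrounded CV is slightly below the lecture's 48.86%, because the lecture rounds S to 3.42 before dividing by the mean.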

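Moments: The mean of the r-th powers of the deviations of the observations from a chosen value is known as the r-th moment. There are three types of moments. These are:
i) General moments
ii) Raw moments
iii) Central moments

General moment: The r-th general moment is the mean of the r-th powers of the deviations when the deviations are taken from a general value, say A: $\mu'_r(A) = \dfrac{1}{n}\sum (x_i - A)^r$.

Raw moment: The r-th raw moment is the mean of the r-th powers of the deviations when the deviations are taken from 0 (zero): $\mu'_r = \dfrac{1}{n}\sum x_i^r$.

Central moment: The r-th central moment is the mean of the r-th powers of the deviations when the deviations are taken from the arithmetic mean ($\bar{x}$): $\mu_r = \dfrac{1}{n}\sum (x_i - \bar{x})^r$.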

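Relative measures of skewness:

Coefficient of skewness (in terms of central tendency): $S_k = \dfrac{\text{Mean} - \text{Mode}}{\text{SD}}$

Coefficient of skewness (in terms of moments): $\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$, with the direction of skewness given by the sign of $\mu_3$.

$\beta_1 = 0$: the distribution is symmetric
$\mu_3 > 0$: the distribution is right skewed
$\mu_3 < 0$: the distribution is left skewed

Kurtosis: The height (peakedness) characteristics of a frequency curve are known as kurtosis.

Coefficient of kurtosis: $\beta_2 = \dfrac{\mu_4}{\mu_2^2}$

$\beta_2 = 3$: the distribution is mesokurtic
$\beta_2 > 3$: the distribution is leptokurtic
$\beta_2 < 3$: the distribution is platykurtic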

Example: Calculate the coefficient of skewness based on central tendency of the values, X: 2, 4, 8, 6, 10, and 12. (Try yourself)

Example: Calculate the coefficient of skewness based on central tendency of the following distribution of marks: (Try yourself)

Marks | No. of Students
1-3 | 40
3-5 | 30
5-7 | 20
7-9 | 10
Total | 100

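Example: Here are data for heights of 100 randomly selected male students:

Height (inches) | Frequency, f
59.5-62.5 | 5
62.5-65.5 | 18
65.5-68.5 | 42
68.5-71.5 | 27
71.5-74.5 | 8
Total | 100

a) Calculate the coefficient of skewness based on moments and comment.
b) Calculate the coefficient of kurtosis and comment.

Solution:

Mid value, x | Frequency, f | xf | $(x - \bar{x})$ | $(x - \bar{x})^2 f$ | $(x - \bar{x})^3 f$ | $(x - \bar{x})^4 f$
61 | 5 | 305 | -6.45 | 208.01 | -1341.68 | 8653.84
64 | 18 | 1152 | -3.45 | 214.25 | -739.15 | 2550.05
67 | 42 | 2814 | -0.45 | 8.51 | -3.83 | 1.72
70 | 27 | 1890 | 2.55 | 175.57 | 447.70 | 1141.63
73 | 8 | 584 | 5.55 | 246.42 | 1367.63 | 7590.35
Total | 100 | 6745 | | 852.75 | -269.33 | 19937.60

$\bar{x} = \dfrac{6745}{100} = 67.45$

$\mu_2 = \dfrac{852.75}{100} = 8.53$, $\mu_3 = \dfrac{-269.33}{100} = -2.69$, $\mu_4 = \dfrac{19937.60}{100} = 199.38$

$\beta_1 = \dfrac{\mu_3^2}{\mu_2^3} = \dfrac{(-2.69)^2}{(8.53)^3} = 0.012$ with $\mu_3 < 0$, and $\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{199.38}{(8.53)^2} = 2.74 < 3$

It is left skewed and platykurtic.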
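The moment calculation for the height data can be reproduced with a few lines of Python. This is a sketch under the same assumptions as the table above it: class mid values stand in for the observations, moments are divided by the total frequency, and $\beta_1 = \mu_3^2/\mu_2^3$, $\beta_2 = \mu_4/\mu_2^2$. The helper name `central_moment` is illustrative.

```python
def central_moment(mids, freqs, r):
    """r-th central moment of a grouped distribution (divisor: total frequency)."""
    n = sum(freqs)
    m = sum(f * x for x, f in zip(mids, freqs)) / n
    return sum(f * (x - m) ** r for x, f in zip(mids, freqs)) / n

mids = [61, 64, 67, 70, 73]    # class mid values of the height classes
freqs = [5, 18, 42, 27, 8]     # class frequencies

mu2, mu3, mu4 = (central_moment(mids, freqs, r) for r in (2, 3, 4))
beta1 = mu3 ** 2 / mu2 ** 3    # size of skewness; its direction is the sign of mu3
beta2 = mu4 / mu2 ** 2         # kurtosis, compared against 3

skew = "left" if mu3 < 0 else "right" if mu3 > 0 else "none"
shape = "platykurtic" if beta2 < 3 else "leptokurtic" if beta2 > 3 else "mesokurtic"
print(mu2, mu3, mu4, beta1, beta2, skew, shape)
```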

Example: Calculate the coefficient of skewness and coefficient of kurtosis based on moment of the data set, X: 2, 4, 11, 6, 10, 14, 5.

Sample questions

1. Define skewness and kurtosis. What are the different measures of skewness and kurtosis? How would you comment on skewness and kurtosis?

2. The distribution of service time (in hours) of batteries used in portable personal computers is given below:

Class interval of service time | 2 - 2.5 | 2.5 - 3 | 3 - 3.5 | 3.5 - 4 | 4 - 4.5
Number of batteries (f) | 10 | 28 | 8 | 3 | 1

Represent the data by a frequency curve and comment on the shape of the curve. Calculate the coefficient of skewness of the distribution in terms of measures of central tendency. Calculate $\beta_1$ and $\beta_2$ of the distribution and comment.

3. Calculate coefficient of skewness and coefficient of kurtosis in terms of moments and comment.Observations (x): 10, 18, 5, 7, 12, 5.

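Moment Generating Function (M.G.F): If X is a random variable, then its m.g.f. is defined by

$M_X(t) = E[e^{tX}] = \sum_x e^{tx}\, p(x)$, if X is discrete
$M_X(t) = E[e^{tX}] = \int_{-\infty}^{\infty} e^{tx} f(x)\, dx$, if X is continuous

For a discrete random variable X, we have

$M_X(t) = \sum e^{tx} p(x) = \sum \left[1 + tx + \dfrac{(tx)^2}{2!} + \dfrac{(tx)^3}{3!} + \dfrac{(tx)^4}{4!} + \dots\right] p(x)$
$= \sum p(x) + t \sum x\,p(x) + \dfrac{t^2}{2!} \sum x^2 p(x) + \dfrac{t^3}{3!} \sum x^3 p(x) + \dfrac{t^4}{4!} \sum x^4 p(x) + \dots$

We know, $E(X) = \sum x\,p(x) = \mu'_1$

For different values of r, we have $\mu'_2 = \sum x^2 p(x)$, $\mu'_3 = \sum x^3 p(x)$, $\mu'_4 = \sum x^4 p(x)$

Therefore, $M_X(t) = 1 + t\mu'_1 + \dfrac{t^2}{2!}\mu'_2 + \dfrac{t^3}{3!}\mu'_3 + \dfrac{t^4}{4!}\mu'_4 + \dots$ [since $\sum p(x) = 1$]

It is seen that,
$\mu'_1$ = coefficient of $t$ in $M_X(t)$
$\mu'_2$ = coefficient of $\dfrac{t^2}{2!}$ in $M_X(t)$
$\mu'_3$ = coefficient of $\dfrac{t^3}{3!}$ in $M_X(t)$
$\mu'_4$ = coefficient of $\dfrac{t^4}{4!}$ in $M_X(t)$

In general, $\mu'_r$ = coefficient of $\dfrac{t^r}{r!}$ in $M_X(t)$.

As $M_X(t)$ generates all the moments about the origin, $M_X(t)$ is known as the moment generating function. Thus, after expanding $M_X(t)$, we can pick up the coefficient of $\dfrac{t^r}{r!}$ in $M_X(t)$. This coefficient is the r-th moment. The r-th moment about the origin can also be found from the r-th derivative of $M_X(t)$ with respect to t, putting t = 0 in the r-th derivative. Thus

$\mu'_1 = \left[\dfrac{d}{dt} M_X(t)\right]_{t=0} = \left[\mu'_1 + t\mu'_2 + \dfrac{t^2}{2!}\mu'_3 + \dots\right]_{t=0} = \mu'_1$ = Mean
$\mu'_2 = \left[\dfrac{d^2}{dt^2} M_X(t)\right]_{t=0} = \left[\mu'_2 + t\mu'_3 + \dfrac{t^2}{2!}\mu'_4 + \dots\right]_{t=0} = \mu'_2$

In general, $\mu'_r = \left[\dfrac{d^r}{dt^r} M_X(t)\right]_{t=0}$

It has already been seen that there is a relation between central moments and raw moments. We have seen that $\mu_2 = \mu'_2 - (\mu'_1)^2$ = Variance. So, after calculating the moment generating function, we can calculate the central moments as:

$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3$ and $\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4$

Hence, we can study the shape characteristics of the frequency curve of the distribution. The shape characteristics are studied by:

$\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$ and $\beta_2 = \dfrac{\mu_4}{\mu_2^2}$

Thus moments are very important for studying the shape characteristics of the distribution.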

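Cumulant Generating Function (C.G.F): It is defined by

$K_X(t) = \ln M_X(t) = \ln\left[1 + t\mu'_1 + \dfrac{t^2}{2!}\mu'_2 + \dfrac{t^3}{3!}\mu'_3 + \dots\right]$

$= \left[t\mu'_1 + \dfrac{t^2}{2!}\mu'_2 + \dfrac{t^3}{3!}\mu'_3 + \dots\right] - \dfrac{1}{2}\left[t\mu'_1 + \dfrac{t^2}{2!}\mu'_2 + \dots\right]^2 + \dfrac{1}{3}\left[t\mu'_1 + \dfrac{t^2}{2!}\mu'_2 + \dots\right]^3 - \dots$

It is seen from $K_X(t)$ that:
Coefficient of $t$ in $K_X(t)$ = $\mu'_1$ = Mean
Coefficient of $\dfrac{t^2}{2!}$ in $K_X(t)$ = $\mu'_2 - (\mu'_1)^2$ = $\mu_2$ = Variance
Coefficient of $\dfrac{t^3}{3!}$ in $K_X(t)$ = $\mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3$ = $\mu_3$

After calculating $K_X(t)$ we can pick up the coefficients in $K_X(t)$. Let $K_r$ = coefficient of $\dfrac{t^r}{r!}$ in $K_X(t)$.

If r = 1, $K_1$ = coefficient of $t$ = $\mu'_1$
If r = 2, $K_2$ = coefficient of $\dfrac{t^2}{2!}$ = $\mu_2$
If r = 3, $K_3$ = coefficient of $\dfrac{t^3}{3!}$ = $\mu_3$
If r = 4, $K_4$ = coefficient of $\dfrac{t^4}{4!}$ = $\mu_4 - 3\mu_2^2$, so $\mu_4 = K_4 + 3\mu_2^2$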

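Example: The probability function of a discrete random variable X is given by $P(x) = \binom{n}{x} p^x q^{n-x}$; x = 0, 1, 2, …, n; p + q = 1. Calculate the mean and variance of X.

Solution: For a discrete random variable X, the m.g.f. is

$M_X(t) = \sum_{x=0}^{n} e^{tx} P(x) = \sum_{x=0}^{n} e^{tx} \binom{n}{x} p^x q^{n-x} = \sum_{x=0}^{n} \binom{n}{x} (pe^t)^x q^{n-x}$
$= q^n + \binom{n}{1} q^{n-1}(pe^t) + \binom{n}{2} q^{n-2}(pe^t)^2 + \dots = (q + pe^t)^n$

We know, $\mu'_r = \left[\dfrac{d^r}{dt^r} M_X(t)\right]_{t=0}$. Then

$\mu'_1 = \left[\dfrac{d}{dt}(q + pe^t)^n\right]_{t=0} = \left[n(q + pe^t)^{n-1} pe^t\right]_{t=0} = n(q + p)^{n-1} p = np$

$\mu'_2 = \left[\dfrac{d^2}{dt^2}(q + pe^t)^n\right]_{t=0} = \left[n(n-1)(q + pe^t)^{n-2}(pe^t)^2 + n(q + pe^t)^{n-1} pe^t\right]_{t=0} = n(n-1)p^2 + np$

$\mu_2 = \mu'_2 - (\mu'_1)^2$ = Variance $= n(n-1)p^2 + np - n^2p^2 = np(1 - p) = npq$ [since p + q = 1]

Here, np > npq, as q is a fractional value; that is, the mean exceeds the variance.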
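The result mean = np and variance = npq can be checked numerically by computing the raw moments directly from the binomial probability function. The sketch below is illustrative only: the values n = 10 and p = 0.3 are arbitrary choices, not taken from the lecture, and the function names are hypothetical.

```python
from math import comb

def binomial_pmf(n, p):
    """List of P(X = x) for x = 0..n under the binomial probability function."""
    q = 1 - p
    return [comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]

def raw_moment(pmf, r):
    """r-th raw moment: sum of x^r * p(x) over all x."""
    return sum(x ** r * px for x, px in enumerate(pmf))

n, p = 10, 0.3                 # arbitrary illustrative values
pmf = binomial_pmf(n, p)
m1 = raw_moment(pmf, 1)        # first raw moment, should equal n*p
m2 = raw_moment(pmf, 2)        # second raw moment, should equal n(n-1)p^2 + n*p
variance = m2 - m1 ** 2        # should equal n*p*q
print(m1, m2, variance)
```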

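Example: The probability function of a discrete random variable X is given by $P(x) = \dfrac{e^{-\lambda}\lambda^x}{x!}$; $\lambda > 0$, x = 0, 1, 2, 3, …. Calculate the mean, variance, $\beta_1$ and $\beta_2$ of X and comment.

Solution: For a discrete random variable X, the m.g.f. is

$M_X(t) = \sum_{x=0}^{\infty} e^{tx} P(x) = \sum_{x=0}^{\infty} e^{tx} \dfrac{e^{-\lambda}\lambda^x}{x!} = e^{-\lambda} \sum_{x=0}^{\infty} \dfrac{(\lambda e^t)^x}{x!} = e^{-\lambda}\left[1 + \lambda e^t + \dfrac{(\lambda e^t)^2}{2!} + \dots\right] = e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}$

$K_X(t) = \ln M_X(t) = \ln e^{\lambda(e^t - 1)} = \lambda(e^t - 1) = \lambda\left[t + \dfrac{t^2}{2!} + \dfrac{t^3}{3!} + \dfrac{t^4}{4!} + \dots\right]$

$K_r$ = coefficient of $\dfrac{t^r}{r!}$ in $K_X(t)$.
$K_1 = \mu'_1 = \lambda$ = mean, $K_2 = \mu_2 = \lambda$ = variance, $K_3 = \mu_3 = \lambda$.
$K_4 = \mu_4 - 3\mu_2^2 = \lambda$, so $\mu_4 = K_4 + 3\mu_2^2 = \lambda + 3\lambda^2$.

$\beta_1 = \dfrac{\mu_3^2}{\mu_2^3} = \dfrac{\lambda^2}{\lambda^3} = \dfrac{1}{\lambda} > 0$

$\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{\lambda + 3\lambda^2}{\lambda^2} = 3 + \dfrac{1}{\lambda} > 3$

So the distribution is positively skewed and leptokurtic.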
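The Poisson results $\beta_1 = 1/\lambda$ and $\beta_2 = 3 + 1/\lambda$ can be checked by summing the central moments numerically. This is a sketch only: $\lambda = 2.5$ is an arbitrary illustrative value, and the infinite sum is truncated at x = 200 (an assumption that is harmless for small $\lambda$, since the omitted probabilities are negligible).

```python
from math import exp

def poisson_pmf(lam, x_max=200):
    """P(X = x) for x = 0..x_max, built iteratively to avoid large factorials."""
    probs = [exp(-lam)]
    for x in range(1, x_max + 1):
        probs.append(probs[-1] * lam / x)
    return probs

def central_moments(pmf):
    """Return the mean and the 2nd, 3rd and 4th central moments of a pmf list."""
    mean = sum(x * p for x, p in enumerate(pmf))
    moments = [sum((x - mean) ** r * p for x, p in enumerate(pmf)) for r in (2, 3, 4)]
    return mean, moments

lam = 2.5                       # arbitrary illustrative value
mean, (mu2, mu3, mu4) = central_moments(poisson_pmf(lam))
beta1 = mu3 ** 2 / mu2 ** 3     # expected to be close to 1 / lam
beta2 = mu4 / mu2 ** 2          # expected to be close to 3 + 1 / lam
print(mean, mu2, beta1, beta2)
```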

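Example: The probability density function of a continuous random variable X is given by $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$; $-\infty < x < \infty$. Calculate the mean, variance, $\beta_1$ and $\beta_2$ of X and comment.

Solution: For a continuous random variable X, the m.g.f. is

$M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx = \dfrac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$

Let $z = \dfrac{x - \mu}{\sigma}$; $-\infty < z < \infty$, then $x = \mu + \sigma z$, $dx = \sigma\,dz$.

$M_X(t) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{t(\mu + \sigma z)}\, e^{-\frac{z^2}{2}}\,dz = \dfrac{e^{\mu t}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}(z^2 - 2\sigma t z)}\,dz = \dfrac{e^{\mu t + \frac{\sigma^2 t^2}{2}}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}(z - \sigma t)^2}\,dz$

Let $u = z - \sigma t$, $du = dz$; $-\infty < u < \infty$. Then,

$M_X(t) = e^{\mu t + \frac{\sigma^2 t^2}{2}} \cdot \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{u^2}{2}}\,du$

Since $\dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{u^2}{2}}\,du = 1$ (the total area under the standard normal density),

$M_X(t) = e^{\mu t + \frac{\sigma^2 t^2}{2}}$

$K_X(t) = \ln M_X(t) = \mu t + \sigma^2 \dfrac{t^2}{2!}$

$K_r$ = coefficient of $\dfrac{t^r}{r!}$ in $K_X(t)$
$K_1 = \mu'_1 = \mu$ = mean
$K_2 = \mu_2 = \sigma^2$ = variance
$K_3 = \mu_3 = 0$
$K_4 = \mu_4 - 3\mu_2^2 = 0$, so $\mu_4 = K_4 + 3\mu_2^2 = 0 + 3\sigma^4 = 3\sigma^4$
$\beta_1 = \dfrac{\mu_3^2}{\mu_2^3} = 0$
$\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{3\sigma^4}{\sigma^4} = 3$

So the distribution is symmetric and mesokurtic.

Basic concepts of Probability:

Experiment: It is an act that can be repeated under similar conditions.

Outcomes: The results of an experiment are known as outcomes.
Example: If tossing a coin is an experiment, then getting a head or a tail are the outcomes.

Random experiment: It is an experiment whose outcomes cannot be predicted with certainty in advance; the outcomes depend on chance.
Example: If an unbiased die is thrown once, any of the outcomes 1, 2, 3, 4, 5 or 6 may appear. Thus throwing a die is a random experiment.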

Exhaustive outcomes: All possible outcomes of a random experiment are exhaustive outcomes.
Example: In tossing an unbiased coin, the exhaustive outcomes are Head (H) and Tail (T).

Mutually exclusive outcomes: If two or more outcomes cannot occur together, then they are called mutually exclusive outcomes.
Example: In tossing a coin, if Head (H) occurs then Tail (T) does not occur.

Equally likely cases: If all the exhaustive outcomes of a random experiment have an equal chance of occurring, then the cases are called equally likely cases.
Example: Let an unbiased coin be tossed. The sample space is S = {H, T}, so the number of outcomes is n = 2. Since P(H) = 1/2 and P(T) = 1/2, both outcomes in the sample space are equally likely.

Not equally likely cases: If different outcomes have different chances of occurrence, then they are called not equally likely cases.
Example: Let a biased coin be tossed, where the sample space is S = {H, T}. If the probability of showing a head is P(H) = 2/3 and that of a tail is P(T) = 1/3, then the 2 outcomes in the sample space are not equally likely.

Sample point: Any of the possible outcomes of a random experiment is known as a sample point.
Example: In tossing a coin, Head (H) is a sample point and Tail (T) is another sample point.

Sample space: The set or collection of all possible outcomes of a random experiment is known as the sample space. It is denoted by S.
Example: In tossing a coin, the possible outcomes are Head (H) and Tail (T). So, the sample space for this random experiment is S = {H, T}.

Event: Any statement regarding one or more of the outcomes of a sample space recorded from a random experiment is known as an event. It is denoted by A, B, C, ….
Example: In throwing a die, the sample space is S = {1, 2, 3, 4, 5, 6}. Let the event of even numbers be A = {2, 4, 6} and the event that the die shows exactly 5 be B = {5}.

Favorable outcomes: The number of outcomes in favor of an event is known as its favorable outcomes. It is denoted by m (≤ n).
Example: In the previous example, for event A, m = 3 and for event B, m = 1.

Mutually Exclusive Events: If two or more events have no common outcome(s), they are called mutually exclusive events.
Example: In the above case, events A and B are mutually exclusive as they have no common outcome, i.e. $A \cap B = \emptyset$.

Probability: If a random experiment shows n exhaustive, mutually exclusive and equally likely outcomes and if m (≤ n) outcomes are in favor of an event A, then the probability of the event A is measured by:

$P(A) = \dfrac{m}{n} = \dfrac{\text{number of outcomes favorable to } A}{\text{total number of outcomes}}$; where $0 \le P(A) \le 1$

Complementary Event: If there are n equally likely outcomes in a sample space and if an event A is defined with m (≤ n) outcomes, then with the remaining (n - m) outcomes another event can be defined. This latter event is known as the complementary event. It is denoted by $\bar{A}$, where

$P(\bar{A}) = \dfrac{n - m}{n} = 1 - \dfrac{m}{n} = 1 - P(A)$, so that $P(A) = 1 - P(\bar{A})$ and $P(A) + P(\bar{A}) = 1$.

Independent Events: Two events A and B are said to be independent if and only if $P(A \cap B) = P(A)\,P(B)$.

Example: Let, 2 balls are randomly selected one by one with replacement from a box of 3 red and 2 black balls, then the selection of second red/black ball is not affected by the selection of first red/black ball. Here, second selection is independent of first selection. Again, if 2 balls are randomly selected one by one without replacement from a box of 3 red and 2 black balls, then the selection of second red/black ball is affected by the selection of first red/black ball selection. Here, second selection is not independent of first selection.

Problem: In a box there are six balls numbered 1, 2, 3, 4, 5, and 6. Two balls are selected one by one (A) with replacement, (B) without replacement, (C) at random. Find the probability that (i) both balls are of same number, (ii) sum of the numbers of the balls are 10 or more, (iii) sum of the number of the balls is less than 8, (iv) sum of the numbers of the balls are 9 or first selected ball bears the number 3, (v) sum of the numbers of the balls are 8 under the condition that second selected ball bears the number 4.

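Solution: The sample space of the experiment is (first digit = first ball, second digit = second ball):

11 | 21 | 31 | 41 | 51 | 61
12 | 22 | 32 | 42 | 52 | 62
13 | 23 | 33 | 43 | 53 | 63
14 | 24 | 34 | 44 | 54 | 64
15 | 25 | 35 | 45 | 55 | 65
16 | 26 | 36 | 46 | 56 | 66

With replacement: 2 balls are selected one by one from 6 balls in $6^2 = 36$ ways.
Without replacement: 2 balls are selected one by one from 6 balls in ${}^{6}P_{2} = 30$ ways.
At random: 2 balls are selected from 6 balls at random in $\binom{6}{2} = 15$ ways.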

(i) Let A be the event that both balls are of the same number.

With replacement: Favorable cases to A, m = 6; where A = {11, 22, 33, 44, 55, 66}. $P(A) = \dfrac{m}{n} = \dfrac{6}{36} = \dfrac{1}{6}$
Without replacement: Favorable cases to A, m = 0. $P(A) = \dfrac{m}{n} = \dfrac{0}{30} = 0$
At random: Favorable cases to A, m = 0. $P(A) = \dfrac{m}{n} = \dfrac{0}{15} = 0$

(ii) Let B be the event that the sum of the numbers of the balls is 10 or more.

With replacement: Favorable cases to B, m = 6; where B = {46, 55, 64, 56, 65, 66}. $P(B) = \dfrac{m}{n} = \dfrac{6}{36} = \dfrac{1}{6}$
Without replacement: Favorable cases to B, m = 4; where B = {46, 64, 56, 65}. $P(B) = \dfrac{m}{n} = \dfrac{4}{30} = \dfrac{2}{15}$
At random: Favorable cases to B, m = 2; where B = {46, 56}. $P(B) = \dfrac{m}{n} = \dfrac{2}{15}$

(iii) Let C be the event that the sum of the numbers of the balls is less than 8.

With replacement: Favorable cases to C, m = 21; where C = {11, 12, 21, 13, 22, 31, 14, 23, 32, 41, 15, 24, 33, 42, 51, 16, 25, 34, 43, 52, 61}. $P(C) = \dfrac{m}{n} = \dfrac{21}{36} = \dfrac{7}{12}$
Without replacement: Favorable cases to C, m = 18; where C = {12, 21, 13, 31, 14, 23, 32, 41, 15, 24, 42, 51, 16, 25, 34, 43, 52, 61}. $P(C) = \dfrac{m}{n} = \dfrac{18}{30} = \dfrac{3}{5}$
At random: Favorable cases to C, m = 9; where C = {12, 13, 14, 23, 15, 24, 16, 25, 34}. $P(C) = \dfrac{m}{n} = \dfrac{9}{15} = \dfrac{3}{5}$

(iv) Let D be the event that the sum of the numbers of the balls is 9 and E be the event that the first selected ball bears the number 3.

With replacement: Favorable cases to D, m = 4; where D = {36, 45, 54, 63}. Favorable cases to E, m = 6; where E = {31, 32, 33, 34, 35, 36}. Favorable cases to $D \cap E$, m = 1; where $D \cap E$ = {36}.
$P(D \text{ or } E) = P(D \cup E) = P(D) + P(E) - P(D \cap E) = \dfrac{4}{36} + \dfrac{6}{36} - \dfrac{1}{36} = \dfrac{9}{36} = \dfrac{1}{4}$
This formula for $P(D \cup E)$ is called the Addition Rule of Probability for two events that are not mutually exclusive.
Without replacement: Favorable cases to D, m = 4; where D = {36, 45, 54, 63}. Favorable cases to E, m = 5; where E = {31, 32, 34, 35, 36}. Favorable cases to $D \cap E$, m = 1; where $D \cap E$ = {36}.
$P(D \cup E) = P(D) + P(E) - P(D \cap E) = \dfrac{4}{30} + \dfrac{5}{30} - \dfrac{1}{30} = \dfrac{8}{30} = \dfrac{4}{15}$
At random: Not applicable.

(v) Let F be the event that the sum of the numbers of the balls is 8 and G be the event that the second selected ball bears the number 4.

With replacement: Favorable cases to F, m = 5; where F = {26, 35, 44, 53, 62}. Favorable cases to G, m = 6; where G = {14, 24, 34, 44, 54, 64}. Favorable cases to $F \cap G$, m = 1; where $F \cap G$ = {44}.
$P(F/G) = \dfrac{P(F \cap G)}{P(G)} = \dfrac{1/36}{6/36} = \dfrac{1}{6}$, given $P(G) \ne 0$
P(F/G) is called the Conditional Probability of F under the condition that G has occurred.
Without replacement: Favorable cases to F, m = 4; where F = {26, 35, 53, 62}. Favorable cases to G, m = 5; where G = {14, 24, 34, 54, 64}. Favorable cases to $F \cap G$, m = 0; where $F \cap G = \emptyset$.
$P(F/G) = \dfrac{P(F \cap G)}{P(G)} = \dfrac{0}{5/30} = 0$, given $P(G) \ne 0$
At random: Not applicable.

Problem: An urn contains 4 red and 3 white balls. Two balls are drawn one after another (a) with replacement, (b) without replacement. Find the probability that (i) both balls are white, (ii) one ball is white and the other is red.

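Solution: The sample space for drawing 2 balls one after another is S = {WW, WR, RW, RR}.

(i) Let A: both balls are white; where A = {WW}.
(a) $P(A) = P(WW) = \dfrac{3}{7} \times \dfrac{3}{7} = \dfrac{9}{49}$
(b) $P(A) = P(WW) = \dfrac{3}{7} \times \dfrac{2}{6} = \dfrac{6}{42} = \dfrac{1}{7}$

(ii) Let B: one ball is white and one is red; where B = {RW, WR}.
(a) $P(B) = P(RW) + P(WR) = \dfrac{4}{7} \times \dfrac{3}{7} + \dfrac{3}{7} \times \dfrac{4}{7} = \dfrac{12}{49} + \dfrac{12}{49} = \dfrac{24}{49}$
(b) $P(B) = P(RW) + P(WR) = \dfrac{4}{7} \times \dfrac{3}{6} + \dfrac{3}{7} \times \dfrac{4}{6} = \dfrac{12}{42} + \dfrac{12}{42} = \dfrac{4}{7}$

Problem: Two students A and B play a game to win a prize. They toss an unbiased coin alternately, and the one who gets a head first wins the prize. Find the probability of winning the prize by (i) A, (ii) B, if A starts the game.

Solution: The sample space of the experiment is S: H, TH, TTH, TTTH, TTTTH, TTTTTH, …, where the last toss of each sample point is made by A, B, A, B, A, B, … respectively. Here, the sample points H, TTH, TTTTH, … are in favor of A winning.

$P(A) = P(H) + P(TTH) + P(TTTTH) + \dots = \dfrac{1}{2} + \left(\dfrac{1}{2}\right)^3 + \left(\dfrac{1}{2}\right)^5 + \dots = \dfrac{1/2}{1 - 1/4} = \dfrac{2}{3}$

$P(B) = 1 - P(A) = 1 - \dfrac{2}{3} = \dfrac{1}{3}$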
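The infinite sum for the first player's winning chance can be approximated by adding the probabilities that the first head appears on an odd-numbered toss. The sketch below assumes the players toss alternately with A going first; the function name and the cutoff of 200 tosses are illustrative choices, not part of the lecture.

```python
def first_player_win_probability(p_head=0.5, max_tosses=200):
    """Sum P(first head on toss k) over odd k, the turns taken by the first player."""
    total = 0.0
    for k in range(1, max_tosses + 1, 2):
        # k - 1 tails followed by one head
        total += (1 - p_head) ** (k - 1) * p_head
    return total

p_a = first_player_win_probability()   # approaches 2/3 for a fair coin
p_b = 1 - p_a                          # approaches 1/3
print(p_a, p_b)
```

The same function also covers a biased coin by changing `p_head`, in which case the closed form is p / (1 - (1 - p)^2).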

Problem: In a box there are 30 bulbs. The bulbs are identified by identity number 1 to 30. One bulb is selected at random. Find the probability that the number of selected bulb has the identity number (i) either multiple of 3 or 5, (ii) either multiple of 5 or 7, (iii) even under the condition that it is multiple of 3.

Solution: One bulb is selected randomly from 30 bulbs in $n = \binom{30}{1} = 30$ ways.

(i) Let A: multiple of 3, m = 10; where A = {3, 6, 9, 12, 15, 18, 21, 24, 27, 30}.
B: multiple of 5, m = 6; where B = {5, 10, 15, 20, 25, 30}.
For $A \cap B$, m = 2; where $A \cap B$ = {15, 30}.
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \dfrac{10}{30} + \dfrac{6}{30} - \dfrac{2}{30} = \dfrac{14}{30} = \dfrac{7}{15}$

(ii) Let B: multiple of 5, m = 6; where B = {5, 10, 15, 20, 25, 30}.
C: multiple of 7, m = 4; where C = {7, 14, 21, 28}.
For $B \cap C$, m = 0; where $B \cap C = \emptyset$.
$P(B \cup C) = P(B) + P(C) = \dfrac{6}{30} + \dfrac{4}{30} = \dfrac{10}{30} = \dfrac{1}{3}$
This formula for $P(B \cup C)$ is called the Addition Rule of Probability for two mutually exclusive events.

(iii) Let D: even number, m = 15; where D = {2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30}.
A: multiple of 3, m = 10; where A = {3, 6, 9, 12, 15, 18, 21, 24, 27, 30}.
For $D \cap A$, m = 5; where $D \cap A$ = {6, 12, 18, 24, 30}.
$P(D/A) = \dfrac{P(D \cap A)}{P(A)} = \dfrac{5/30}{10/30} = \dfrac{1}{2}$
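The three answers for the bulb problem can be checked by building the events as Python sets and counting. This is a small sketch with illustrative names; `Fraction` keeps the probabilities exact.

```python
from fractions import Fraction

numbers = range(1, 31)                          # identity numbers 1 to 30
mult3 = {k for k in numbers if k % 3 == 0}
mult5 = {k for k in numbers if k % 5 == 0}
mult7 = {k for k in numbers if k % 7 == 0}
even = {k for k in numbers if k % 2 == 0}

def prob(event):
    """Classical probability m / n with n = 30 equally likely bulbs."""
    return Fraction(len(event), 30)

p_3_or_5 = prob(mult3 | mult5)                  # addition rule, overlapping events
p_5_or_7 = prob(mult5 | mult7)                  # addition rule, disjoint events
p_even_given_3 = Fraction(len(even & mult3), len(mult3))   # conditional probability
print(p_3_or_5, p_5_or_7, p_even_given_3)
```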

Problem: In a bag there are 20 cards bearing numbers 1 to 20. Three cards are taken at random and these are arranged in ascending order. Find the probability that the card in second position bears the number 12.

Solution: Three cards are taken randomly from 20 cards in $n = \binom{20}{3} = 1140$ ways.

Let A be the event that the middle card is 12, i.e. there is one card numbered below 12 and one card numbered above 12. There are 11 cards bearing numbers 1 to 11 (< 12) and there are 8 cards bearing numbers 13 to 20 (> 12). The number of favorable cases to A is $m = \binom{11}{1}\binom{1}{1}\binom{8}{1} = 88$.

$P(A) = \dfrac{m}{n} = \dfrac{88}{1140} = \dfrac{22}{285}$
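The count of 88 favorable cases out of 1140 can be confirmed by brute-force enumeration. The sketch below relies on the fact that `itertools.combinations` applied to an increasing range yields each selection already in ascending order; variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Every unordered selection of 3 cards from 1..20, each tuple in ascending order.
draws = list(combinations(range(1, 21), 3))

# Favorable cases: the card in the second (middle) position is 12.
favorable = [d for d in draws if d[1] == 12]

p_middle_is_12 = Fraction(len(favorable), len(draws))
print(len(draws), len(favorable), p_middle_is_12)
```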

Problem: In an area there are 4 centers to send e-mails. Four persons have gone there to send mails. Find the probability that (i) all 4 persons have entered in the same center, (ii) four persons have entered in 4 different centers.

Solution: Four persons can go to 4 centers to send e-mails in $n = 4^4 = 256$ ways.

(i) Let A be the event that all 4 persons have entered the same center. This can be done in m = 4 ways.
$P(A) = \dfrac{m}{n} = \dfrac{4}{256} = \dfrac{1}{64}$

(ii) Let B be the event that the 4 persons have entered 4 different centers. This can be done in m = 4! = 24 ways.
$P(B) = \dfrac{m}{n} = \dfrac{24}{256} = \dfrac{3}{32}$

Problem: From a pack of well shuffled 52 cards 2 cards are taken at random. Find the probability that (i) all are aces, (ii) all are kings, (iii) all are spades, (iv) one is spade and one is club, (v) all cards are of same color, (vi) all cards are of same number.

Solution: Two cards are taken randomly from 52 cards in $n = \binom{52}{2} = 1326$ ways.

(i) Let A: two cards are aces. There are 4 aces, so $m = \binom{4}{2} = 6$ ways.
$P(A) = \dfrac{m}{n} = \dfrac{6}{1326} = \dfrac{1}{221}$

(ii) Let B: two cards are kings. There are 4 kings, so $m = \binom{4}{2} = 6$ ways.
$P(B) = \dfrac{m}{n} = \dfrac{6}{1326} = \dfrac{1}{221}$

(iii) Let C: two cards are spades. There are 13 spades, so $m = \binom{13}{2} = 78$ ways.
$P(C) = \dfrac{m}{n} = \dfrac{78}{1326} = \dfrac{1}{17}$

(iv) Let D: one is a spade and one is a club. There are 13 spades and 13 clubs, so $m = \binom{13}{1}\binom{13}{1} = 169$ ways.
$P(D) = \dfrac{m}{n} = \dfrac{169}{1326} = \dfrac{13}{102}$

(v) Let E: two cards are of the same color. There are 26 black cards and 26 red cards. Two black cards are drawn in $\binom{26}{2} = 325$ ways. Similarly, two red cards are drawn in 325 ways. So m = 325 + 325 = 650 ways.
$P(E) = \dfrac{m}{n} = \dfrac{650}{1326} = \dfrac{25}{51}$

(vi) Let F: two cards are of the same number. There are 4 suits, each with 13 cards. Two cards of any one number can be drawn from the 4 suits in $\binom{4}{2} = 6$ ways. Since there are 13 numbers in a suit, two cards of the same number can be drawn in m = 13 × 6 = 78 ways.
$P(F) = \dfrac{m}{n} = \dfrac{78}{1326} = \dfrac{1}{17}$

Problem: Eighty five per cent e-mails sent from a cyber cafe reach to the destination properly. Once 3 mails are checked randomly, Find the probability that (i) all 3 reach properly, (ii) two reach properly, (iii) at least one reaches properly, (iv) at best two reach properly.

Solution: Let R = the e-mail reaches properly and N = the e-mail does not reach properly. Sending 3 e-mails can result in $n = 2^3 = 8$ outcomes. The sample space is S = {RRR, RRN, RNR, RNN, NRR, NRN, NNR, NNN}. Given P(R) = 0.85 and P(N) = 1 - 0.85 = 0.15. So, R and N are not equally likely.

(i) Let A: all 3 reach properly; where A = {RRR}.
P(A) = P(RRR) = 0.85 × 0.85 × 0.85 = 0.614125

(ii) Let B: two reach properly; where B = {RRN, RNR, NRR}.
P(B) = P(RRN) + P(RNR) + P(NRR) = (0.85 × 0.85 × 0.15) + (0.85 × 0.15 × 0.85) + (0.15 × 0.85 × 0.85) = 0.325125

(iii) Let C: at least one reaches properly; where C = {RRR, RRN, RNR, RNN, NRR, NRN, NNR} and $\bar{C}$ = {NNN}.
$P(C) = 1 - P(\bar{C}) = 1 - P(NNN) = 1 - (0.15 \times 0.15 \times 0.15) = 0.996625$

(iv) Let D: at best two reach properly; where D = {RRN, RNR, RNN, NRR, NRN, NNR, NNN} and $\bar{D}$ = {RRR}.
$P(D) = 1 - P(\bar{D}) = 1 - P(RRR) = 1 - (0.85 \times 0.85 \times 0.85) = 0.385875$
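Because the outcomes R and N are not equally likely, each sample point has to be weighted by its own probability. The sketch below enumerates the 8 sample points and sums the weights of the points in each event; it assumes the three e-mails are independent, as the multiplication in the solution does, and the helper names are illustrative.

```python
from itertools import product

P = {"R": 0.85, "N": 0.15}     # probability of each single-mail outcome

def outcome_prob(outcome):
    """Probability of one sample point such as 'RRN', assuming independent mails."""
    result = 1.0
    for symbol in outcome:
        result *= P[symbol]
    return result

sample_space = ["".join(s) for s in product("RN", repeat=3)]

def event_prob(condition):
    """Sum the probabilities of the sample points that satisfy the condition."""
    return sum(outcome_prob(o) for o in sample_space if condition(o))

p_all_three = event_prob(lambda o: o.count("R") == 3)
p_exactly_two = event_prob(lambda o: o.count("R") == 2)
p_at_least_one = event_prob(lambda o: o.count("R") >= 1)
p_at_best_two = event_prob(lambda o: o.count("R") <= 2)
print(p_all_three, p_exactly_two, p_at_least_one, p_at_best_two)
```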

Problem: In an office there are 30 computers, out of which 12 are Philips and 18 are Samsung. The computers are investigated and found that 8 Philips and 12 Samsung computers are good. One computer is selected at random. Find the probability that the selected computer is (i) Philips given that it is not good, (ii) Samsung and good, (iii) Samsung or good.

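Solution: Let P: Philips computer, $\bar{P}$: Samsung computer, G: Good, $\bar{G}$: Not good.

 | G | $\bar{G}$ | Total
P | 8 | 4 | 12
$\bar{P}$ | 12 | 6 | 18
Total | 20 | 10 | 30

(i) $P(P/\bar{G}) = \dfrac{P(P \cap \bar{G})}{P(\bar{G})} = \dfrac{4/30}{10/30} = \dfrac{2}{5}$

(ii) $P(\bar{P} \cap G) = \dfrac{12}{30} = \dfrac{2}{5}$

(iii) $P(\bar{P} \cup G) = P(\bar{P}) + P(G) - P(\bar{P} \cap G) = \dfrac{18}{30} + \dfrac{20}{30} - \dfrac{12}{30} = \dfrac{26}{30} = \dfrac{13}{15}$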

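Bayes' theorem: Let S be the sample space having n equally likely outcomes. With some of the outcomes let us define an event E. Let $H_1, H_2, \dots, H_k$ be separate mutually exclusive events, each of which may share some outcomes with E. Then:

[Figure: the sample space S divided into $H_1, H_2, \dots, H_k$, with the event E cut into the parts $H_1 \cap E, H_2 \cap E, \dots, H_k \cap E$]

We have, $E = (H_1 \cap E) \cup (H_2 \cap E) \cup \dots \cup (H_k \cap E)$

$P(E) = P(H_1 \cap E) + P(H_2 \cap E) + \dots + P(H_k \cap E)$

Again, $P(E/H_i) = \dfrac{P(H_i \cap E)}{P(H_i)}$; i = 1, 2, …, k, so that $P(H_i \cap E) = P(H_i)\,P(E/H_i)$.

Now, Bayes' theorem states that

$P(H_i/E) = \dfrac{P(H_i \cap E)}{P(E)} = \dfrac{P(H_i)\,P(E/H_i)}{\sum_{j=1}^{k} P(H_j)\,P(E/H_j)}$; i = 1, 2, …, k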

Problem: In a box there are 70% mathematics books and 30% electrical engineering books. Among mathematics books 40% are foreign books and among electrical engineering books 50% are foreign books. A foreign book is selected. What is the probability that the selected one is an electrical engineering book?

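Solution: Let E: Foreign book, $H_1$: Mathematics book, $H_2$: Electrical engineering book.

Given, $P(H_1) = 0.7$, $P(E/H_1) = 0.4$; $P(H_1 \cap E) = P(H_1)\,P(E/H_1) = 0.7 \times 0.4 = 0.28$
$P(H_2) = 0.3$, $P(E/H_2) = 0.5$; $P(H_2 \cap E) = P(H_2)\,P(E/H_2) = 0.3 \times 0.5 = 0.15$

$P(E) = P(H_1 \cap E) + P(H_2 \cap E) = 0.28 + 0.15 = 0.43$

So, $P(H_2/E) = \dfrac{P(H_2 \cap E)}{P(E)} = \dfrac{0.15}{0.43} = \dfrac{15}{43} \approx 0.349$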
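Bayes' theorem reduces to three steps: multiply each prior by its likelihood, add the products to get P(E), and divide. A minimal sketch follows; the function name `bayes_posteriors` is illustrative, and the inputs are the figures of the book problem (70% and 30% priors, 40% and 50% foreign books).

```python
def bayes_posteriors(priors, likelihoods):
    """Return P(H_i / E) for each i, given P(H_i) and P(E / H_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(H_i and E)
    total = sum(joint)                                     # P(E)
    return [j / total for j in joint]

# H1 = mathematics book, H2 = electrical engineering book, E = foreign book.
posteriors = bayes_posteriors([0.7, 0.3], [0.4, 0.5])
p_math_given_foreign, p_eee_given_foreign = posteriors
print(p_math_given_foreign, p_eee_given_foreign)
```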

Sample questions

1. Two digits are (a) randomly, (b) one by one with replacement, (c) one by one without replacement selected from the digits 1 , 2, 3, 4, 5 . Find the probability that (i) both the digits will be odd number, ( ii ) sum of the digits will be even.

2. A student becomes successful in 80% cases to write a program. One day he is asked to write 3 programs. Find the probability that (i) all 3 programs are written successfully, (ii) exactly two programs are written successfully, (iii) at best 2 programs are written successfully, (iv) no program is written successfully.

3. Out of 20 electrical installations 12 are installed by Company A and 8 are installed by Company B. Eight installations of A and 6 installations of B served well. One installation is chosen at random to observe its performance. Find the probability that the selected installation is- ( i ) of Company A under the condition that its performance is good, ( ii ) of Company B and its performance is not good, (iii ) performing well, (iv) either an installation of Company A or an installation of good service.

4. In an electrical installation there are 80% graduates who are of the field EEE and 20% are of other fields. Ten per cent of EEE and 20% of other fields are not satisfied with the authority. Once one unsatisfied graduate is identified. Find the probability that he is a graduate of EEE.

5. Signals are sent from Station - 1 and Station - 2. Fifty signals from Station - 1 and 30 signals from Station - 2 are sent. It is known that 20% sent from Station - 1 and 30% sent from Station - 2 do not reach properly. One day on random investigation it is found that one signal has not reached properly. Find the probability that the signal was sent from Station - 2.

6. In a class there are 32 male and 8 female students. Eight students are selected at random. Find the probability that (i) all 8 are female students, (ii) all are male students, (iii) five are male and 3 are female students.

7. Two students A and B have started independently to develop a program to solve a mathematical problem. It is known that A becomes successful in 80% cases and B becomes successful in 60% cases. Find the probability that- ( i ) the program will be developed, ( ii ) A becomes successful under the condition that B fails, ( iii ) both of them fail.

8. In an office there are 15 Philips computers, out of which 5 are defective. Four computers are selected at random. Find the probability that, out of 4 computers, 2 are defective and 2 are good computers.

9. In a class there are 40 students. They are identified by the serial number 1 to 40. One student is selected at random. Find the probability that the identification number of the selected student is either multiple of 3 or multiple of 5.

10. From a pack of 52 cards one card is selected at random. Find the probability that the selected card is an ace under the condition that it is a spade.

11. In a mobile operators office 12 electrical engineers and 8 computer engineers are working. Among electrical engineers 5 are experienced and 7 are newly appointed. The corresponding figures among computer engineers are 3 and 5. One of the engineers is selected at random. Find the probability that the selected one is experienced under the condition that he is computer engineer.

12. Two unbiased dice are thrown once. Find the probability that- (i) both dice show same number, (ii) first die shows even number, (iii) both dice show even number, (iv) sum of the upper faces of the dice is 8 or more, (v) sum of the upper faces of the dice is above 10, (vi) sum of the upper faces of the dice is less than 7, (vii) second dice shows number 5 or more.

13. In a packet there are 6 books, three of which are on mathematics and 3 are on statistics. Two books are taken at random. Find the probability that- (i) the drawn books are on mathematics, (ii) the drawn books are on statistics, (iii) one of the drawn books is on mathematics and another one is on statistics.

14. An urn contains 6 red and 4 black balls. Three balls are taken at random from the urn. Find the probability that- (a) all three are red, (b) two balls are red, (c) one ball is red.

15. In a box there are 30 tickets numbered 1, 2, 3, 4, 5, …, 30. Five tickets are drawn at random from the box and these are arranged in ascending order. Find the probability that the ticket in third position bears the number 20.

16. In a university 70% of the students are from the city centre and 30% are from outside the city. Among the students from the city centre 90% wear ties. The corresponding percentage among students from outside the city is 50%. Find the probability that a student who wears a tie comes from outside the city.

17. A surgeon operates on 70% male patients and 30% female patients. If in a day he operates on 3 patients, what is the probability that (i) 3 male patients are operated on, (ii) at best 2 male patients are operated on, (iii) no male patient is operated on, (iv) at least one male patient is operated on? If male and female patients are equiprobable, find the probabilities of these events.

Probability distribution: It is the distribution of a random variable. Since a random variable is of two types, (i) discrete random variable and (ii) continuous random variable, a probability distribution is also of two types. These are (i) Discrete Probability Distribution and (ii) Continuous Probability Distribution.

Under discrete probability distributions, we will learn (a) Binomial distribution, (b) Poisson distribution.
Under continuous probability distributions, we will learn (a) Normal distribution, (b) Exponential distribution, (c) Rayleigh distribution.

Binomial distribution: The results of a binomial experiment are of two types, usually denoted by Success (S) with probability p and Failure (F) with probability q = 1 - p. Assume that out of n trials there are x successes and n - x failures. If X is a discrete random variable indicating the number of successes, then the probability distribution of x successes is called the Binomial distribution. It is given by

$P(X = x) = \binom{n}{x} p^x q^{n-x}$; x = 0, 1, 2, …, n.

Here, n is a finite number and p + q = 1. Mean, E(X) = np and variance, V(X) = npq.

Examples of Binomial distribution:
Randomly selected products of an industry, where products are either good or defective.
Randomly selected literate or illiterate persons of a society.
Randomly selected candidates who are successful or not successful in getting a job.
Randomly selected injured or non-injured industrial workers.
Randomly selected machines operated successfully or not.

Problem: A group of students are equally well trained to develop a program and all of them have a 50% chance of developing the program successfully. Ten students are selected at random and they are asked to develop the program separately. Find the probability that (i) none becomes successful, (ii) five become successful, (iii) at best 1 becomes successful, (iv) at least 2 become successful.

Solution: Let X be the number of successful students. The probability that a student develops the program successfully is p = 0.5, so q = 1 - p = 0.5 and n = 10. Therefore the probability distribution of X is

$P(X = x) = \binom{10}{x} (0.5)^x (0.5)^{10-x} = \binom{10}{x} (0.5)^{10}$; x = 0, 1, 2, …, 10

(i) $P(X = 0) = \binom{10}{0} (0.5)^{10} = 0.00098$
(ii) $P(X = 5) = \binom{10}{5} (0.5)^{10} = 0.2461$
(iii) $P(X \le 1) = P(X = 0) + P(X = 1) = \left[\binom{10}{0} + \binom{10}{1}\right] (0.5)^{10} = 0.0107$
(iv) $P(X \ge 2) = 1 - P(X < 2) = 1 - [P(X = 0) + P(X = 1)] = 1 - 0.0107 = 0.9893$

Problem: Eighty percent of the devices of a workshop work properly. One day 10 devices are selected at random. Find the probability that, out of 10 devices, (i) all work properly, (ii) at best 2 work properly, (iii) at least 3 work properly, (iv) 2 to 4 work properly. What are the expected number and the variance of the number of devices which work properly?

Solution: Let X be the number of devices that work properly. The probability that a device works properly is p = 0.8, so q = 1 - p = 0.2 and n = 10. Therefore the probability distribution of X is

$P(X = x) = \binom{10}{x} (0.8)^x (0.2)^{10-x}$; x = 0, 1, 2, …, 10

(i) $P(X = 10) = \binom{10}{10} (0.8)^{10} (0.2)^{0} = 0.1073$
(ii) $P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = \binom{10}{0}(0.2)^{10} + \binom{10}{1}(0.8)(0.2)^{9} + \binom{10}{2}(0.8)^2(0.2)^{8} = 0.000078$
(iii) $P(X \ge 3) = 1 - P(X < 3) = 1 - [P(X = 0) + P(X = 1) + P(X = 2)] = 1 - 0.000078 = 0.99992$ [using no. (ii)]
(iv) $P(2 \le X \le 4) = P(X = 2) + P(X = 3) + P(X = 4) = \binom{10}{2}(0.8)^2(0.2)^{8} + \binom{10}{3}(0.8)^3(0.2)^{7} + \binom{10}{4}(0.8)^4(0.2)^{6} = 0.006365$

The expected number of devices is E(X) = np = 10 × 0.8 = 8 and V(X) = npq = 10 × 0.8 × 0.2 = 1.6.
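The binomial probabilities in the two problems above can be recomputed with the standard library. The sketch below uses `math.comb` for the binomial coefficient; the helper names `binom_pmf` and `binom_cdf` are illustrative.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for a binomial variable with n trials and success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """P(X <= x), obtained by summing the probability function."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

# Program problem: n = 10, p = 0.5
p_none = binom_pmf(0, 10, 0.5)
p_five = binom_pmf(5, 10, 0.5)
p_at_best_1 = binom_cdf(1, 10, 0.5)
p_at_least_2 = 1 - p_at_best_1

# Device problem: n = 10, p = 0.8
p_all_work = binom_pmf(10, 10, 0.8)
p_at_best_2 = binom_cdf(2, 10, 0.8)
p_at_least_3 = 1 - p_at_best_2
p_2_to_4 = binom_cdf(4, 10, 0.8) - binom_cdf(1, 10, 0.8)
print(p_none, p_five, p_at_best_1, p_all_work, p_at_best_2, p_2_to_4)
```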

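Poisson distribution: The Poisson distribution is a limiting case of the binomial distribution under the following conditions:
a. $n \to \infty$ (large)
b. $p \to 0$ (very small)
c. The mean of the binomial distribution is $np = \lambda$; where $\lambda$ is a finite and positive real number, i.e. $\lambda > 0$.
So, $\lambda = np$ = mean/expected number of successes, $p = \dfrac{\lambda}{n}$ and $q = 1 - \dfrac{\lambda}{n}$.

If X is a Poisson random variable, then the Poisson distribution is given by

$P(X = x) = \dfrac{e^{-\lambda}\lambda^x}{x!}$; x = 0, 1, 2, 3, …

$E(X) = \lambda = V(X)$, $\beta_1 = \dfrac{1}{\lambda}$, $\beta_2 = 3 + \dfrac{1}{\lambda}$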

Examples of Poisson distribution:

Number of defective materials produced in an industry. Number of telephone calls received at a particular time in a telephone exchange. Number of wrong connections received at a telephone exchange. Number of printing mistakes at each page of a book. Number of faded out signals sent from a station.

Problem: The average number of signals sent from a station is 3 per day which do not reach properly to the another station. Find the probability that the signals which are sent in a day but not reached properly are ( i ) 3 , ( ii ) at best 2, ( iii ) at least 3.

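Solution: Let X be the number of signals that do not reach properly. Given, $\lambda = 3$. Then the probability distribution of X is

$P(X = x) = \dfrac{e^{-\lambda}\lambda^x}{x!} = \dfrac{e^{-3}\,3^x}{x!}$; x = 0, 1, 2, 3, …

(i) $P(X = 3) = \dfrac{e^{-3}\,3^3}{3!} = 0.22404$
(ii) $P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = e^{-3}\left[\dfrac{3^0}{0!} + \dfrac{3^1}{1!} + \dfrac{3^2}{2!}\right] = 0.42319$
(iii) $P(X \ge 3) = 1 - P(X < 3) = 1 - P(X \le 2) = 1 - [P(X = 0) + P(X = 1) + P(X = 2)] = 1 - 0.42319 = 0.57681$ [using no. (ii)]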
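The Poisson probabilities for the signal problem can be recomputed directly from the probability function. This is a small sketch with illustrative helper names, using $\lambda = 3$ as given in the problem.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return exp(-lam) * lam ** x / factorial(x)

def poisson_cdf(x, lam):
    """P(X <= x), obtained by summing the probability function."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

lam = 3                                   # average number of signals not reaching per day
p_exactly_3 = poisson_pmf(3, lam)
p_at_best_2 = poisson_cdf(2, lam)
p_at_least_3 = 1 - p_at_best_2
print(p_exactly_3, p_at_best_2, p_at_least_3)
```

Calling the same helpers with `lam = 4` covers the case of a mean of 4 events per day.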

Problem: Two percent of the mobile sets produced by a company are usually found defective. The company produces 200 sets per day. Find the probability that in a day's production there will be (i) 4, (ii) at best 2, (iii) at least 3 defective sets.

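Solution: Let X be the number of defective mobile sets. Given, n = 200 and $p = \dfrac{2}{100} = 0.02$. So, $\lambda = np = 200 \times 0.02 = 4$. Then the probability distribution of X is

$P(X = x) = \dfrac{e^{-\lambda}\lambda^x}{x!} = \dfrac{e^{-4}\,4^x}{x!}$; x = 0, 1, 2, 3, …

(i) $P(X = 4) = \dfrac{e^{-4}\,4^4}{4!} = 0.19537$
(ii) $P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = e^{-4}\left[\dfrac{4^0}{0!} + \dfrac{4^1}{1!} + \dfrac{4^2}{2!}\right] = 0.2381$
(iii) $P(X \ge 3) = 1 - P(X < 3) = 1 - P(X \le 2) = 1 - [P(X = 0) + P(X = 1) + P(X = 2)] = 1 - 0.2381 = 0.7619$ [using no. (ii)]

Normal Distribution: The normal distribution is the limiting case of the binomial and Poisson distributions, i.e. when $n \to \infty$. Let X be a continuous random variable. This X is called a normal variable if its probability density function is given by

$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$; $-\infty < x < \infty$

where $E(X) = \mu$, $V(X) = \sigma^2$, $\beta_1 = 0$, $\beta_2 = 3$.

This distribution is called the Normal distribution. It is usually written as $X \sim N(\mu, \sigma^2)$.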

Examples: Age, height, weight, time, lifetime of light bulbs, etc., when these are observed in a random experiment.

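Characteristics of Normal distribution:
The normal curve is bell-shaped, with its peak at $x = \mu$. [Figure: normal curve]
Mean, median and mode of the normal distribution coincide.
The normal curve is symmetric and mesokurtic, i.e. $\beta_1 = 0$, $\beta_2 = 3$.
If $X \sim N(\mu, \sigma^{2})$, then $Z = \frac{X-\mu}{\sigma} \sim N(0, 1)$. This Z is called the standard normal variable.
In a normal distribution about 95% of the values fall within the limits $\mu \pm 2\sigma$ (i.e. from $\mu - 2\sigma$ to $\mu + 2\sigma$). As a rule of thumb, if about 95% of the values of a continuous random variable fall within these limits, the variable is treated as a normal variable and is taken to follow the normal distribution.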
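The standardisation and the two-sigma property can be illustrated with Python's `statistics.NormalDist`. This is a sketch under assumed parameters: the values of `mu` and `sigma` are arbitrary and do not come from the lecture.

```python
from statistics import NormalDist

mu, sigma = 50.0, 8.0          # illustrative parameters (assumed)
X = NormalDist(mu, sigma)      # NormalDist takes the standard deviation
Z = NormalDist(0.0, 1.0)       # standard normal variable

def standardize(x):
    """Z = (X - mu) / sigma."""
    return (x - mu) / sigma

# Probability of falling within mu +/- 2 sigma, computed directly on X ...
within_2_sigma = X.cdf(mu + 2 * sigma) - X.cdf(mu - 2 * sigma)
# ... and again after converting the limits to the Z scale.
same_via_z = Z.cdf(standardize(mu + 2 * sigma)) - Z.cdf(standardize(mu - 2 * sigma))

print(round(within_2_sigma, 4), round(same_via_z, 4))
print(X.mean, X.median, X.mode)  # the three coincide for a normal variable
```

Both routes give about 0.9545, whatever values of `mu` and `sigma` are used, which is why probabilities for any normal variable can be read from a single standard normal table.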

Problem: The life length (in days) of an electric bulb follows N(400, 5000). Find the probability that the life length of a randomly selected bulb is (i) less than 450 days, (ii) more than 350 days, (iii) between 300 and 500 days.

Solution: Let X be the life length (in days) of an electric bulb.

Since $X \sim N(400, 5000)$, we have $\mu = 400$, $\sigma^{2} = 5000$, and $\sigma = \sqrt{5000} = 70.71$.

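(i) $P(X < 450) = P\left(\frac{X-\mu}{\sigma} < \frac{450-400}{70.71}\right) = P(Z < 0.71) = 0.7611$
(ii) $P(X > 350) = P\left(\frac{X-\mu}{\sigma} > \frac{350-400}{70.71}\right) = P(Z > -0.71) = 1 - P(Z \le -0.71) = 1 - 0.2389 = 0.7611$
(iii) $P(300 < X < 500) = P\left(\frac{300-400}{70.71} < \frac{X-\mu}{\sigma} < \frac{500-400}{70.71}\right) = P(-1.41 < Z < 1.41) = P(Z < 1.41) - P(Z < -1.41) = 0.9207 - 0.0793 = 0.8414$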
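The three probabilities asked for in the bulb problem can also be computed without a table. The following is a minimal sketch with `statistics.NormalDist`; the variable names are assumptions. Exact values differ a little from table lookups, because the table rounds z to two decimals (0.71 and 1.41).

```python
from math import sqrt
from statistics import NormalDist

mu = 400          # mean life length in days
variance = 5000   # N(400, 5000) gives the variance, not the standard deviation
life = NormalDist(mu, sqrt(variance))

p_less_450 = life.cdf(450)                    # (i)   P(X < 450)
p_more_350 = 1 - life.cdf(350)                # (ii)  P(X > 350)
p_300_to_500 = life.cdf(500) - life.cdf(300)  # (iii) P(300 < X < 500)

print(round(p_less_450, 3), round(p_more_350, 3), round(p_300_to_500, 3))
```

The results are roughly 0.760, 0.760 and 0.843. Parts (i) and (ii) are equal because 450 and 350 lie at the same distance on either side of the mean and the normal curve is symmetric.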

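Exponential distribution: If X is a continuous random variable, then the exponential distribution is given by:
$f(x) = \frac{1}{\theta}\, e^{-x/\theta}$; $x \ge 0$, $\theta > 0$
Here, X = service time/length of time/waiting time, and $\theta$ = average of service time/length of time/waiting time. For this distribution, $E(X) = \theta$, $V(X) = \theta^{2}$, $\beta_1 = 4$, $\beta_2 = 9$.
For Math:
(i) $P(X > x) = \int_{x}^{\infty} f(t)\,dt = \int_{x}^{\infty} \frac{1}{\theta}\, e^{-t/\theta}\,dt = \left[-e^{-t/\theta}\right]_{x}^{\infty} = -(0 - e^{-x/\theta}) = e^{-x/\theta}$
(ii) $P(X < x) = 1 - e^{-x/\theta}$
(iii) $P(x_1 < X < x_2) = e^{-x_1/\theta} - e^{-x_2/\theta}$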

Other forms of the exponential distribution are
a. $f(x) = e^{-x}$; $x \ge 0$, $\theta = 1$
b. $f(x) = \lambda e^{-\lambda x}$; $x \ge 0$, $\lambda = \frac{1}{\theta}$

Examples of Exponential distribution: Time needed to send an email from a cyber café. Time needed to receive a signal. Time needed to get a service in a bank. Time needed to repair a mobile phone in a repair shop.

Problem: The average time needed to open a computer is 2 minutes. Find the probability that a computer will be opened (i) after 3 minutes, (ii) before 2 minutes, (iii) within 2 to 3 minutes. If a man has been in the queue for half an hour, what is the probability that he will be able to open the computer (iv) after 35 minutes, (v) before 35 minutes, (vi) within 32 to 35 minutes?

Solution: Let X be the time needed to open a computer. Given, $\theta = 2$.

(i) $P(X > 3) = e^{-3/\theta} = e^{-3/2} = e^{-1.5} = 0.2231$
(ii) $P(X < 2) = 1 - e^{-2/2} = 1 - e^{-1} = 0.63212$
(iii) $P(2 < X < 3) = e^{-2/2} - e^{-3/2} = 0.14475$
(iv) As the person has been in the queue for 30 minutes, he will open the computer after (35 − 30) = 5 more minutes. $P(X > 5) = e^{-5/\theta} = e^{-5/2} = e^{-2.5} = 0.08208$
(v) He will open the computer before (35 − 30) = 5 minutes. $P(X < 5) = 1 - e^{-5/2} = 1 - e^{-2.5} = 1 - 0.08208 = 0.91792$
(vi) He will open the computer within (32 − 30) = 2 minutes to (35 − 30) = 5 minutes. $P(2 < X < 5) = e^{-2/2} - e^{-5/2} = 0.2858$

[Theory: The exponential distribution is also called the distribution of Lack of Memory: the time already spent waiting before the event occurs (the queue time) is not counted, so $P(X > s + t \mid X > s) = P(X > t)$.]
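The computer-opening problem, including the subtraction of queue time, can be checked with a short sketch. The helper names below are assumptions made for illustration; the mean of 2 minutes and the 30-minute queue come from the problem.

```python
from math import exp

def exp_survival(x, theta):
    """P(X > x) for an exponential variable with mean theta."""
    return exp(-x / theta)

def exp_between(x1, x2, theta):
    """P(x1 < X < x2) as a difference of two survival probabilities."""
    return exp_survival(x1, theta) - exp_survival(x2, theta)

theta = 2  # average time (minutes) needed to open a computer

p_after_3 = exp_survival(3, theta)        # (i)
p_before_2 = 1 - exp_survival(2, theta)   # (ii)
p_2_to_3 = exp_between(2, 3, theta)       # (iii)

# Lack of memory: after s = 30 minutes in the queue, the chance of waiting
# t = 5 more minutes equals the unconditional chance of waiting 5 minutes.
s, t = 30, 5
conditional = exp_survival(s + t, theta) / exp_survival(s, theta)  # P(X > s+t | X > s)
unconditional = exp_survival(t, theta)                             # P(X > t), part (iv)

print(round(p_after_3, 4), round(p_before_2, 5), round(p_2_to_3, 5))
print(round(conditional, 5), round(unconditional, 5))
```

The last line prints the same value twice, which is the numerical form of the lack-of-memory property used in parts (iv) to (vi).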

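Rayleigh distribution: If X is a continuous random variable, then the Rayleigh distribution is given by:
$f(x) = \frac{x}{\sigma^{2}}\, e^{-\frac{x^{2}}{2\sigma^{2}}}$; $x \ge 0$, $\sigma > 0$
$\sigma$ = Mode of the Rayleigh distribution, and it is given by $\sigma = \frac{\mu}{\sqrt{\pi/2}} = \mu\sqrt{\frac{2}{\pi}}$, where $\mu = E(X)$ is the mean.
For math:
a. $P(X > x) = \int_{x}^{\infty} f(t)\,dt = \int_{x}^{\infty} \frac{t}{\sigma^{2}}\, e^{-\frac{t^{2}}{2\sigma^{2}}}\,dt = e^{-\frac{x^{2}}{2\sigma^{2}}}$
b. $P(X < x) = 1 - e^{-\frac{x^{2}}{2\sigma^{2}}}$
c. $P(x_1 < X < x_2) = e^{-\frac{x_1^{2}}{2\sigma^{2}}} - e^{-\frac{x_2^{2}}{2\sigma^{2}}}$
Examples: Wind speed, power of signals, etc.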
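The closed form for the tail probability in (a) follows from a substitution. The steps below are a sketch of that intermediate working, written with the dummy variable t and the substitution variable u, both of which are notation introduced here.

```latex
\begin{aligned}
P(X > x) &= \int_{x}^{\infty} \frac{t}{\sigma^{2}}\, e^{-t^{2}/(2\sigma^{2})}\, dt
  && \text{let } u = \frac{t^{2}}{2\sigma^{2}},\quad du = \frac{t}{\sigma^{2}}\, dt \\
&= \int_{x^{2}/(2\sigma^{2})}^{\infty} e^{-u}\, du \\
&= \Bigl[-e^{-u}\Bigr]_{x^{2}/(2\sigma^{2})}^{\infty} \\
&= e^{-x^{2}/(2\sigma^{2})}
\end{aligned}
```

Results (b) and (c) then follow from the complement rule and from subtracting two tail probabilities.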

Problem: The average wind speed of a day is 4.5 knots. Find the probability that on a randomly selected day the wind speed (i) will exceed 4 knots, (ii) will be less than 3 knots, (iii) will be between 2 and 5 knots.

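Solution: Let X be the wind speed. Given, $\mu = 4.5$; then $\sigma = \frac{\mu}{\sqrt{\pi/2}} = \frac{4.5}{1.2533} = 3.59$.
(i) $P(X > 4) = e^{-\frac{4^{2}}{2(3.59)^{2}}} = 0.5376$
(ii) $P(X < 3) = 1 - e^{-\frac{3^{2}}{2(3.59)^{2}}} = 0.2947$
(iii) $P(2 < X < 5) = e^{-\frac{2^{2}}{2(3.59)^{2}}} - e^{-\frac{5^{2}}{2(3.59)^{2}}} = 0.4771$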

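Problem: The mode of the density of faded-out signals is 0.5. Find the probability that the density will be (i) more than 0.8, (ii) less than 0.4, (iii) between 0.4 and 0.6.

Solution: Let X be the density of faded-out signals. Given, $\sigma = 0.5$.
(i) $P(X > 0.8) = e^{-\frac{(0.8)^{2}}{2(0.5)^{2}}} = 0.278$
(ii) $P(X < 0.4) = 1 - e^{-\frac{(0.4)^{2}}{2(0.5)^{2}}} = 0.2739$
(iii) $P(0.4 < X < 0.6) = e^{-\frac{(0.4)^{2}}{2(0.5)^{2}}} - e^{-\frac{(0.6)^{2}}{2(0.5)^{2}}} = 0.2394$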
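Both Rayleigh problems use the same tail formula, so they can be checked with one small set of helpers. This is a sketch with assumed function names. The wind problem gives the mean, so the mode is first recovered from it; the signal problem gives the mode directly.

```python
from math import exp, pi, sqrt

def rayleigh_survival(x, sigma):
    """P(X > x) for a Rayleigh variable with mode sigma."""
    return exp(-x**2 / (2 * sigma**2))

def rayleigh_between(x1, x2, sigma):
    """P(x1 < X < x2) as a difference of two survival probabilities."""
    return rayleigh_survival(x1, sigma) - rayleigh_survival(x2, sigma)

def sigma_from_mean(mean):
    """Recover the mode from the mean, using mean = sigma * sqrt(pi / 2)."""
    return mean / sqrt(pi / 2)

# Wind-speed problem: the mean is given as 4.5 knots.
sigma_wind = sigma_from_mean(4.5)
wind = (
    rayleigh_survival(4, sigma_wind),       # exceeds 4 knots
    1 - rayleigh_survival(3, sigma_wind),   # less than 3 knots
    rayleigh_between(2, 5, sigma_wind),     # between 2 and 5 knots
)

# Faded-out signal problem: the mode is given as 0.5.
sigma_signal = 0.5
signal = (
    rayleigh_survival(0.8, sigma_signal),
    1 - rayleigh_survival(0.4, sigma_signal),
    rayleigh_between(0.4, 0.6, sigma_signal),
)

print(round(sigma_wind, 2), [round(p, 4) for p in wind])
print([round(p, 4) for p in signal])
```

The code keeps sigma unrounded, so the wind-speed results can differ from a hand calculation with sigma = 3.59 in the fourth decimal place.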

Sample questions

1. Write down 5 examples each of a binomial variable, Poisson variable, normal variable and exponential variable, and 2 examples of a Rayleigh variable.

2. Under what conditions does a binomial variable transform into a Poisson variable?

3. How would you decide that a continuous variable follows a normal distribution? Write down the characteristics of the normal distribution.

4. Eighty percent of the computers in a laboratory are good. Eight computers are selected at random. Find the probability that, out of the 8 computers, (i) at least 2 are good, (ii) 3 are good, (iii) at most 3 are good, (iv) 2 to 4 are good. Find the mean and variance of the number of good computers.

5. Seventy percent of the emails received by an organization are from foreign countries. Ten emails are randomly selected. Find the probability that, out of the 10 emails, (i) at most 3 are from foreign countries, (ii) at least 2 are from foreign countries, (iii) 4 are from foreign countries, (iv) none is from a foreign country. Find the mean and variance of the number of emails received from foreign countries.

6. Two percent of the signals sent from a station don't reach properly. The station sends 100 signals per hour. Find the probability that in a randomly selected hour (i) more than 3 signals will not reach properly, (ii) less than 2 signals will not reach properly, (iii) 5 signals will not reach properly.

7. Three percent of the computers of a company are found defective. The company produces 100 computers per day. Find the probability that in a randomly selected day (i) at least 2, (ii) at most 3, (iii) 1 to 4 computers are defective.

8. Two percent of the items produced by a company are usually found defective. The company produces 200 items per hour. Find the probability that in a randomly selected hour there will be (i) at least 3 defective items, (ii) at most 2 defective items, (iii) 2 to 4 defective items.

9. The average number of defective circuits produced in an industry per day is 3. Find the probability that in a randomly selected day there will be (i) at least 2, (ii) at most 3, (iii) 1 to 3 defective circuits.

10. The average number of faded-out signals sent from a station per day is 4. Find the probability that in a randomly selected day (i) more than 2, (ii) less than 3, (iii) 2 to 4 signals will be faded out.

11. The average number of defective items produced in an industry is 3 per hour. Find the probability that in a randomly selected hour (i) at least 3, (ii) at most 2, (iii) exactly 3 defective items will be produced.

12. The average number of defective spare parts produced by a company is 3 per hour. Find the probability that in a randomly selected hour (i) at least 2, (ii) at most 3, (iii) exactly 5 defective spare parts will be produced.

13. The time (in minutes) to write a program follows N(30, 505). Find the probability that a program will be written (i) after 25 minutes, (ii) before 30 minutes, (iii) between 20 and 25 minutes.

14. The working hours of computers follow N(8.5, 4.6). Find the probability that the working hours of a randomly selected computer will be (i) more than 10 hours, (ii) less than 7 hours, (iii) between 12 and 15 hours.

15. The amount of CO in air (mm/m³) follows N(0.8, 0.56). Find the probability that in a randomly selected day the amount of CO will (i) exceed 0.52, (ii) be below 0.57, (iii) be between 0.50 and 0.58.

16. The amount of polluted matter (parts/m³) in air follows N(2.5, 6.75). Find the probability that in a randomly selected day the polluted matter will (i) exceed 2.0, (ii) be below 3.0, (iii) be between 1.5 and 2.5.

17. The consumption of electricity per day (in MW) follows N(3800, 45000). Find the probability that in a randomly selected day the consumption (i) exceeds 4500 MW, (ii) is less than 3700 MW, (iii) is between 3500 MW and 4200 MW.

18. The amount of arsenic (gm/liter) follows N(1.2, 4.5). Find the probability that the amount of arsenic will (i) exceed 0.8, (ii) be below 1.0, (iii) be between 1 and 2.

19. The average time needed to get an internet connection is 5 minutes. If a man has been in the queue for 15 minutes, what is the probability that he will get the connection (i) before 18 minutes, (ii) after 20 minutes, (iii) between 17 and 20 minutes?

20. The average time (in minutes) to get a service in a mobile operator's office is 30. Find the probability that a man will be served before 20 minutes. If the man has been in the queue for one hour, what is the probability that he will be served (i) after 2 hours, (ii) before 3 hours, (iii) between 3 and 5 hours?

21. The average time needed to send an email is 15 minutes. If a man has been in the queue for 45 minutes, what is the probability that he will be able to send the email (i) before 70 minutes, (ii) after 80 minutes, (iii) between 50 and 70 minutes?

22. The average time needed to send an email is 12 minutes. Find the probability that an email will be sent (i) before 15 minutes, (ii) after 20 minutes, (iii) between 15 and 20 minutes.

23. The average time needed to repair a mobile phone set is 1 hour. Find the probability that a set will be repaired (i) after 1 hour 30 minutes, (ii) by 2 hours, (iii) between 1 hour 20 minutes and 1 hour 40 minutes. If a customer has been in the queue for 1 hour, what is the probability that his set will be repaired within 90 minutes?

24. The average power of a signal is 1.2. Find the probability that the power of the signal will be (i) between 0.8 and 1.4, (ii) less than 1.2, (iii) more than 1.6.

25. The average power (x watts) of a signal is 3. Find the probability that the power of the signal will be (i) less than 2, (ii) between 2.5 and 3.5, (iii) more than 3.
