Chapter 7: Statistical Analysis with Excel

Date post: 06-Jan-2016
Upload: danton
Transcript
Page 1: Chapter 7: Statistical Analysis with Excel

Chapter 7: Statistical Analysis with Excel

Spreadsheet-Based Decision Support Systems

Prof. Name [email protected] (123) 456-7890 University Name

Page 2: Chapter 7: Statistical Analysis with Excel

Overview

7.1 Introduction

7.2 Understanding Data

7.3 Relationships in Data

7.4 Distributions

7.5 Summary

Page 3: Chapter 7: Statistical Analysis with Excel

Introduction

Performing basic statistical analysis of data using Excel functions

Statistical features of the Data Analysis ToolPak

Trend curves for analyzing data patterns

Basic linear regression techniques in Excel

Several different distribution functions in Excel

Page 4: Chapter 7: Statistical Analysis with Excel

Understanding Data

Statistical Functions

Descriptive Statistics

Histograms

Page 5: Chapter 7: Statistical Analysis with Excel

Statistical Functions

AVERAGE

– Finds the mean of a set of data.

– =AVERAGE(range or range_name)

MEDIAN

– Finds the middle number in a list of sorted data.

– =MEDIAN(range or range_name)

STDEV

– Finds the standard deviation of a set of data.

– This is equal to the square root of the variance, which measures how far the individual values lie from the mean of the data set.

– =STDEV.P(range or range_name) treats the data as an entire population.

– =STDEV.S(range or range_name) treats the data as a sample.
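
As a cross-check outside Excel, the same quantities can be computed with Python's standard `statistics` module. This is only a sketch: the `returns` list is made-up sample data, not the data shown in the figures.

```python
import statistics

# Hypothetical sample data (not taken from the slides).
returns = [0.02, 0.05, -0.01, 0.03, 0.07, 0.04]

mean_value = statistics.mean(returns)      # like =AVERAGE(range)
median_value = statistics.median(returns)  # like =MEDIAN(range)
stdev_pop = statistics.pstdev(returns)     # like =STDEV.P(range): divides by n
stdev_sample = statistics.stdev(returns)   # like =STDEV.S(range): divides by n - 1

print(mean_value, median_value, stdev_pop, stdev_sample)
```

The sample standard deviation is always slightly larger than the population one because it divides by n - 1 instead of n.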

Page 6: Chapter 7: Statistical Analysis with Excel

Figures 7.1 and 7.2

Page 7: Chapter 7: Statistical Analysis with Excel

Figures 7.3 and 7.4

Page 8: Chapter 7: Statistical Analysis with Excel

Data Analysis ToolPak

An Excel Add-In which includes several statistical analysis techniques

To ensure that it is an active Add-in:

– Display the Excel Options dialog box: select Options from the list of options in the File tab.

– Select the Add-Ins tab on the left side of the dialog box.

– Select Analysis ToolPak from the list in the Add-ins window.

Page 9: Chapter 7: Statistical Analysis with Excel

Descriptive Statistics

Provides a list of statistical information about your data set, including

– Mean

– Median

– Standard deviation

– Variance

Click the Data > Analysis > Data Analysis command to display the Data Analysis dialog box.

Choose the Descriptive Statistics option and click OK.

Page 10: Chapter 7: Statistical Analysis with Excel

Descriptive Statistics (cont’d)

The Input Range refers to the location of the data set.

Select the Columns or Rows option button to indicate how your data is grouped.

If there are labels in the first row of each column of data, then check the Labels in First Row box.

The Output Range refers to where the results of the analysis will be displayed in the current worksheet.

Check the Summary Statistics box to calculate the most commonly used statistics.

Page 11: Chapter 7: Statistical Analysis with Excel

Figure 7.7

Quarterly stock returns for three different companies are recorded. We want to know

– Average stock return

– Variability of stock returns

– Which quarters had the highest and lowest stock returns

Page 12: Chapter 7: Statistical Analysis with Excel

Figures 7.8 and 7.9

Page 13: Chapter 7: Statistical Analysis with Excel

Figure 7.10

Almost all of the data points lie within two standard deviations of the mean, that is, between –2s and +2s.

Outliers are data that are inconsistent with the main pattern of data.

Page 14: Chapter 7: Statistical Analysis with Excel

Figure 7.11

The standard deviation is used to identify outliers in a data set.
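
The two-standard-deviation rule can be sketched in a few lines of Python. The data below is hypothetical, with one deliberately extreme value, and the cutoff of 2s follows the rule of thumb stated on the previous slide.

```python
import statistics

# Hypothetical quarterly returns; the last value is deliberately extreme.
values = [0.03, 0.05, 0.04, 0.06, 0.02, 0.05, 0.04, 0.03, 0.05, 0.45]

mean_value = statistics.mean(values)
s = statistics.stdev(values)  # sample standard deviation, like =STDEV.S

# Flag any value more than two standard deviations away from the mean.
lower, upper = mean_value - 2 * s, mean_value + 2 * s
outliers = [v for v in values if v < lower or v > upper]

print(outliers)
```

Note that an extreme value inflates s itself, so in small data sets a single outlier can widen the band it is tested against.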

Page 15: Chapter 7: Statistical Analysis with Excel

Figure 7.12

Conditional Formatting with the Formula Is option is used to identify outliers.

– Select the column of values in the data set, and fill in the Conditional Formatting dialog box to highlight outlier points.

Page 16: Chapter 7: Statistical Analysis with Excel

Figure 7.13

The cell that holds an outlier is highlighted.

Page 17: Chapter 7: Statistical Analysis with Excel

More Descriptive Statistics

Confidence Level for Mean

– A margin of error for the mean is calculated using the specified confidence level (for example, 95% or 99%), the standard deviation, and the size of the sample data.

– The confidence level and this margin of error are then added to the analysis report.

– The sample mean plus or minus the margin of error gives the confidence interval: the range expected to contain the true mean at the specified confidence level.

Kth Largest

– Gives the k-th largest data value for a specified value of k.

– For k = 1, the maximum data value would be returned.

Kth Smallest

– Gives the k-th smallest data value for a specified value of k.

– For k = 1, the minimum data value would be returned.

Page 18: Chapter 7: Statistical Analysis with Excel

Descriptive Statistics Functions

PERCENTILE.INC

– Returns the value below which a desired percentile k of the specified data_set falls, where k is between 0 and 1, inclusive.

– =PERCENTILE.INC(data_set, k)

– For example, for the MSFT data, the value below which 95% of the data falls is =PERCENTILE.INC(B4:B27, 0.95) = 0.108

PERCENTILE.EXC

– Returns the same kind of value, but with k restricted to the range between 0 and 1, exclusive.

– =PERCENTILE.EXC(data_set, k)

– For the MSFT data, the value below which 95% of the data falls is =PERCENTILE.EXC(B4:B27, 0.95) = 0.135
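
The difference between the two functions comes down to how the position of the k-th percentile is located in the sorted data. The sketch below uses the interpolation rules commonly documented for these functions (position k*(n-1) for the inclusive version, k*(n+1) for the exclusive one); the helper names and the sample data are illustrative, and this is not Excel's own code.

```python
import math

def percentile_inc(data, k):
    """Interpolated percentile for 0 <= k <= 1 (the PERCENTILE.INC convention)."""
    ordered = sorted(data)
    rank = k * (len(ordered) - 1)            # zero-based fractional position
    low = math.floor(rank)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (rank - low) * (ordered[high] - ordered[low])

def percentile_exc(data, k):
    """Interpolated percentile for 1/(n+1) <= k <= n/(n+1) (the PERCENTILE.EXC convention)."""
    ordered = sorted(data)
    n = len(ordered)
    rank = k * (n + 1) - 1                   # zero-based fractional position
    if rank < 0 or rank > n - 1:
        raise ValueError("k is outside the range supported for this sample size")
    low = math.floor(rank)
    high = min(low + 1, n - 1)
    return ordered[low] + (rank - low) * (ordered[high] - ordered[low])

sample = [1, 2, 3, 4]  # hypothetical data
q_inc = percentile_inc(sample, 0.25)
q_exc = percentile_exc(sample, 0.25)
print(q_inc, q_exc)
```

For small data sets the two conventions can differ noticeably, which is consistent with the 0.108 and 0.135 results reported for the MSFT data.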

Page 19: Chapter 7: Statistical Analysis with Excel

Descriptive Statistics Functions (cont’d)

PERCENTRANK.INC

– Returns the rank of a given value within the data_set as a percentage between 0 and 1, inclusive; that is, the fraction of the data that falls below the value.

– =PERCENTRANK.INC(data_set, value)

– For example, the percentage of the MSFT data that falls below the value 0.108, inclusive of 0.108, is =PERCENTRANK.INC(B4:B27, 0.108) = 0.95, or 95%

PERCENTRANK.EXC

– Calculates the same rank as a percentage between 0 and 1, exclusive.

– =PERCENTRANK.EXC(data_set, value)

– For example, the percentage of the MSFT data that falls below the value 0.135, exclusive of 0.135, is =PERCENTRANK.EXC(B4:B27, 0.135) = 0.95, or 95%

Page 20: Chapter 7: Statistical Analysis with Excel

Histograms

Histograms calculate the number of occurrences, or frequency, with which values in a data set fall into various intervals.

Choose the Histogram option from the Analysis ToolPak list.

Page 21: Chapter 7: Statistical Analysis with Excel

Histograms (cont’d)

The Input Range is the range of the data set.

The Bin Range is used to specify the location of the bin values.

– Bins are the intervals into which values can fall; they can be defined by the user or can be evenly distributed among the data by Excel.

The Output Range is the location of the output, or the frequency calculations for each bin.

The chart options include a simple Chart Output (the actual histogram), Cumulative Percentage for each bin value, and a Pareto organization of the chart.

Page 22: Chapter 7: Statistical Analysis with Excel

Figures 7.15 and 7.16

Page 23: Chapter 7: Statistical Analysis with Excel

Figures 7.17 and 7.18

To create your own bin values, make a list of upper bounds for each interval.
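
How a list of upper bounds turns into frequencies can be sketched as follows. The sketch assumes each value is counted in the first bin whose upper bound is greater than or equal to it, with one extra slot for values above every bound; the data and bin values are made up.

```python
from bisect import bisect_left

def bin_frequencies(values, upper_bounds):
    """Count values per bin; a value goes in the first bin whose upper bound is >= the value."""
    bounds = sorted(upper_bounds)
    counts = [0] * (len(bounds) + 1)  # final slot collects values above every bound
    for v in values:
        counts[bisect_left(bounds, v)] += 1
    return counts

values = [0.01, 0.04, 0.05, 0.07, 0.12, 0.20]  # hypothetical data
upper_bounds = [0.05, 0.10, 0.15]              # hypothetical bin values
counts = bin_frequencies(values, upper_bounds)
print(counts)
```

Here 0.05 lands in the first bin because it equals that bin's upper bound, and 0.20 falls into the overflow slot.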

Page 24: Chapter 7: Statistical Analysis with Excel

Figure 7.19

Page 25: Chapter 7: Statistical Analysis with Excel

Histograms (cont’d)

To change the format of a Histogram:

– Click on the histogram to activate the Chart Tools contextual tabs.

– Use the commands listed on these tabs to change the design, layout and format of the histogram.

Page 26: Chapter 7: Statistical Analysis with Excel

Histograms (cont’d)

There are four basic shapes to a histogram:

– Symmetric: has peaks and dips with equal amplitude. A curve with only one peak is also symmetric; that is, there is a central high part and almost equal lower parts to the left and right of this peak.

– Positively skewed: has a peak on the left and many lower points (stretching) to the right.

– Negatively skewed: has a peak on the right and many lower points (stretching) to the left.

– Multiple peaks: imply that more than one source, or population, of data is being evaluated.

Page 27: Chapter 7: Statistical Analysis with Excel

Relationships in Data

Trend Curves

Regression

Page 28: Chapter 7: Statistical Analysis with Excel

Data Relationships

Relationships in data are usually identified by comparing two variables: the dependent variable and the independent variable.

– The dependent variable is the variable we are most interested in. By understanding its current behavior we can better predict its future behavior.

– The independent variable is the variable we use as the comparison in order to make this prediction.

Page 29: Chapter 7: Statistical Analysis with Excel

Trend Curves

Trend curves are used to graph and analyze these relationships between data.

Trend curves graph the data with – The independent variable on the x-axis

– The dependent variable on the y-axis

To add a trend curve to your chart:

– Click on the data points in an XY Scatter chart to activate the Chart Tools contextual tabs.

– Click on the Chart Tools Layout > Analysis > Trendline command.

– Select a trend curve from the trendline options listed.

Page 30: Chapter 7: Statistical Analysis with Excel

Trend Curves (cont’d)

There are six types of trend curves which Excel can model:

– Exponential

– Linear

– Logarithmic

– Polynomial

– Power

– Moving Average

Page 31: Chapter 7: Statistical Analysis with Excel

Trend Curves (cont’d)

Double click on a trendline to activate the Format Trendline dialog box.

We can modify:

– The type of the trendline by selecting one of the options listed.

– The trendline’s name.

We can specify a period forward or backward for which we want to predict the behavior of our dependent variable.

Page 32: Chapter 7: Statistical Analysis with Excel

Linear Trend Curves

The Number of Units Produced each month and the corresponding Monthly Plant Cost are recorded.

The company needs to estimate plant costs based on the planned production amounts.

The dependent variable is therefore the Monthly Plant Cost and the independent variable is the Units Produced.

Page 33: Chapter 7: Statistical Analysis with Excel

Figure 7.25

Begin this analysis by making an XY Scatter chart of the data.

Page 34: Chapter 7: Statistical Analysis with Excel

Figure 7.26

Right-click on any of the data points and choose Add Trendline from the short-cut menu.

The Format Trendline dialog box appears.

Select Linear from the Types listed.

Select the Display Equation on Chart checkbox.

Page 35: Chapter 7: Statistical Analysis with Excel

Figure 7.27

The trendline and the equation are then added to the chart.

Page 36: Chapter 7: Statistical Analysis with Excel

Figure 7.28

Use the displayed equation to predict future values.

Check the accuracy of the equation by calculating the error from the known data.

Linear trends have the relationship: y = a*x + b

Page 37: Chapter 7: Statistical Analysis with Excel

Figure 7.29

Copy the formula for “Predicted Cost” to the rest of the rows to calculate the predicted monthly costs.

Page 38: Chapter 7: Statistical Analysis with Excel

Exponential Trend Curves

Sales data for ten years is recorded.

We want to predict sales for the next few years.

The independent variable is Years and our dependent variable is Sales.

Page 39: Chapter 7: Statistical Analysis with Excel

Figure 7.31

Exponential trends have the following relationship:

– y = a*e^(b*x) or

– y = a*EXP(b*x)

Build an XY Scatter chart of the data.

Right-click on a data point to add the trendline.

Choose the Exponential curve to fit the data.
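
One common way to fit y = a*e^(b*x) is to take logarithms, which turns it into the straight line ln(y) = ln(a) + b*x, and then fit that line by least squares. The sketch below takes this approach; it is an assumption that it matches what the trendline does internally, and the data is generated, not taken from the figures.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by least squares on ln(y); requires every y > 0."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    x_bar = sum(xs) / n
    l_bar = sum(logs) / n
    b = (sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, logs))
         / sum((x - x_bar) ** 2 for x in xs))
    a = math.exp(l_bar - b * x_bar)
    return a, b

# Hypothetical data generated from y = 2 * exp(0.5 * x).
years = [1, 2, 3, 4, 5]
sales = [2 * math.exp(0.5 * x) for x in years]

a, b = fit_exponential(years, sales)
print(a, b)
```

Because the data here lies exactly on an exponential curve, the fit recovers a = 2 and b = 0.5 up to rounding.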

Page 40: Chapter 7: Statistical Analysis with Excel

Figures 7.32 and 7.33

Page 41: Chapter 7: Statistical Analysis with Excel

Figure 7.34

We use the formula to predict sales values for future years.

However, the Exponential trend curve has a sharply increasing slope that may not be accurate for many situations.

Page 42: Chapter 7: Statistical Analysis with Excel

Power Trend Curves

We are given yearly Production values and yearly Unit Cost for production.

We want to determine the relationship between Unit Cost and Production in order to predict future Unit Costs.

Page 43: Chapter 7: Statistical Analysis with Excel

Figure 7.36

Power trends have the relationship: y = a*x^b

Begin by creating the XY Scatter chart.

Right-click on a data point to add a trendline.

Choose a Power curve to fit the data.
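
A power relationship y = a*x^b becomes a straight line when both sides are logged: ln(y) = ln(a) + b*ln(x). The sketch below fits that line by least squares, assuming all x and y values are positive; the production and cost figures are generated for illustration only.

```python
import math

def fit_power(xs, ys):
    """Fit y = a * x**b by least squares on ln(x) and ln(y); requires x > 0 and y > 0."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    lx_bar, ly_bar = sum(lx) / n, sum(ly) / n
    b = (sum((u - lx_bar) * (v - ly_bar) for u, v in zip(lx, ly))
         / sum((u - lx_bar) ** 2 for u in lx))
    a = math.exp(ly_bar - b * lx_bar)
    return a, b

# Hypothetical data generated from y = 50 * x**(-0.3).
production = [100, 200, 400, 800, 1600]
unit_cost = [50 * x ** -0.3 for x in production]

a, b = fit_power(production, unit_cost)
print(a, b)
```

A negative exponent b, as here, describes a unit cost that falls as production grows.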

Page 44: Chapter 7: Statistical Analysis with Excel

Figures 7.37 and 7.38

Page 45: Chapter 7: Statistical Analysis with Excel

Regression Analysis

We can use some regression analysis parameters to ensure that the relationships we have chosen for our data are “good” fits.

These parameters include

– R-Squared value

– Standard error

– Slope

– Intercept

Page 46: Chapter 7: Statistical Analysis with Excel

R-Squared Value

The R-Squared value measures how much of the variation in the dependent variable is explained by the independent variable.

The closer the R-Squared value is to 1, the stronger the relationship is between the independent and dependent variables.

If the R-Squared value is closer to 0, then there may not be a relationship between these two variables.

Page 47: Chapter 7: Statistical Analysis with Excel

Figure 7.39

We fit a Linear trendline to the Monthly Plant Cost per Units Produced chart.

The R-Squared value is 0.8137, which is fairly close to 1, implying a good fit.

Page 48: Chapter 7: Statistical Analysis with Excel

Figure 7.40

We fit an Exponential trendline to the Sales per year chart.

The R-Squared value is 0.9828, which is fairly close to 1, implying a sound fit.

Page 49: Chapter 7: Statistical Analysis with Excel

Figure 7.41

We fit a Power trendline to the Unit Cost per Cumulative Production chart.

The R-Squared value is 0.9062, which is fairly close to 1, implying a good fit.

Page 50: Chapter 7: Statistical Analysis with Excel

Figure 7.42

The RSQ Excel function can calculate the R-squared value from a set of data.

– =RSQ(y_range, x_range)

Note that this function only works with Linear trend curves.
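
For a linear fit, the R-squared value is the square of the correlation between the two variables. A minimal sketch, using made-up units and costs and keeping the (y, x) argument order of RSQ:

```python
def r_squared(ys, xs):
    """Square of the Pearson correlation between ys and xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Hypothetical data (not the plant cost data from the figures).
units = [100, 200, 300, 400, 500]
cost = [1500, 2100, 2400, 3300, 3700]

rsq = r_squared(cost, units)
print(rsq)
```

For this data the result is 0.98, meaning 98% of the variation in cost is accounted for by the linear relationship with units.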

Page 51: Chapter 7: Statistical Analysis with Excel

Standard Error

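The standard error measures the accuracy of any predictions made: it is the typical size of the difference between the actual and predicted values of the dependent variable.

It can be calculated in Excel using the STEYX function

– =STEYX(y_range, x_range)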

This function can also only be used for Linear trend curves.
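
The calculation behind the standard error of a linear fit can be sketched as the square root of the unexplained variation divided by n - 2. The function name and data below are illustrative.

```python
import math

def standard_error(ys, xs):
    """Standard error of the predicted y for a least-squares line (n - 2 degrees of freedom)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return math.sqrt((syy - sxy ** 2 / sxx) / (n - 2))

# Hypothetical data (not the plant cost data from the figures).
units = [100, 200, 300, 400, 500]
cost = [1500, 2100, 2400, 3300, 3700]

se = standard_error(cost, units)
print(se)
```

The result is in the same units as the dependent variable, so it can be read directly as a typical prediction error in cost.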

Page 52: Chapter 7: Statistical Analysis with Excel

Slope and Intercept

Two Excel functions can be used to find the linear regression line of a collection of data.

SLOPE function

– =SLOPE(y_range, x_range)

INTERCEPT function

– =INTERCEPT(y_range, x_range)
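
The two values are the a and b of the least-squares line y = a*x + b. A short sketch with made-up data, which also uses the fitted line to predict a cost for a production level outside the data:

```python
def slope_and_intercept(ys, xs):
    """Least-squares slope and intercept of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data (not the plant cost data from the figures).
units = [100, 200, 300, 400, 500]
cost = [1500, 2100, 2400, 3300, 3700]

a, b = slope_and_intercept(cost, units)
predicted_cost = a * 600 + b  # predicted cost for 600 units
print(a, b, predicted_cost)
```

Here the slope can be read as the added cost per unit produced and the intercept as the fixed cost when nothing is produced.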

Page 53: Chapter 7: Statistical Analysis with Excel

Distributions

Many distributions have Excel functions associated with them.

– These functions are equivalent to using distribution tables.

– That is, given certain parameters of a set of data for a particular distribution, you would look at a distribution table to find the corresponding area from the distribution curve.

Some common distributions are

– Normal

– Exponential

– Uniform

– Binomial

– Poisson

– Beta

– Weibull

Page 54: Chapter 7: Statistical Analysis with Excel

Normal Distribution

The parameters for this distribution are simply the value we are interested in finding the probability for, and the mean and standard deviation of the set of data.

The function we use with the Normal distribution is NORM.DIST

– =NORM.DIST(x, mean, std_dev, cumulative)

Page 55: Chapter 7: Statistical Analysis with Excel

Normal Distribution (cont’d)

The cumulative parameter will be seen in many Excel distribution functions.

This parameter can take the values True or False to determine if you want the value returned from the cumulative distribution function or the probability density function, respectively.

– The cumulative distribution function (cdf) will find the probability that a value in the data set is less than or equal to x.

– The probability density function (pdf) will find the height of the distribution curve at x. For a discrete distribution, this is the probability that a value is exactly equal to x; for a continuous distribution such as the Normal, it is a density rather than a probability.

Page 56: Chapter 7: Statistical Analysis with Excel

Figure 7.45

Annual drug sales at a local drugstore are Normally distributed with a mean of $40,000 and standard deviation of $10,000.

The probability that the actual sales for the year are at most $42,000 is 0.58, or 58%.

Page 57: Chapter 7: Statistical Analysis with Excel

Figure 7.46

What is the probability that annual sales will be between $35,000 and $49,000?

To find this value, we will subtract the cdf values for these two bounds.

– =NORM.DIST(49000, 40000, 10000, True) - NORM.DIST(35000, 40000, 10000, True)

This will return a 0.51 probability, or 51% chance.
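
The drugstore figures can be reproduced with Python's `statistics.NormalDist`, used here as a stand-in for NORM.DIST. The variable names are illustrative.

```python
from statistics import NormalDist

# Annual sales: Normal with mean $40,000 and standard deviation $10,000.
sales = NormalDist(mu=40000, sigma=10000)

p_at_most_42000 = sales.cdf(42000)               # like NORM.DIST(42000, 40000, 10000, True)
p_between = sales.cdf(49000) - sales.cdf(35000)  # difference of the two cdf values
density_at_42000 = sales.pdf(42000)              # like cumulative = False: a density, not a probability

print(round(p_at_most_42000, 2), round(p_between, 2))  # 0.58 0.51
```

Subtracting the two cdf values works because the cdf at 49,000 includes everything below 35,000, which must be removed to leave only the interval.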

Page 58: Chapter 7: Statistical Analysis with Excel

Standard Normal Distribution

The Standard Normal distribution is a Normal distribution with mean 0 and standard deviation 1.

The STANDARDIZE function will convert an x value from a data set with mean not equal to 0 and standard deviation not equal to 1 into a value which does assume a mean of 0 and a standard deviation of 1.

– =STANDARDIZE(x, mean, std_dev)

The resulting standardized value is then used as the main parameter in the NORM.S.DIST function

– =NORM.S.DIST(standardized_x, cumulative)

Page 59: Chapter 7: Statistical Analysis with Excel

Figure 7.47

Consider the same example used previously to find the probability that a drugstore’s annual sales are at most $42,000.
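
Standardizing is just subtracting the mean and dividing by the standard deviation. The sketch below, with an illustrative `standardize` helper, shows that the two-step route gives the same cumulative probability as working with the original mean and standard deviation directly.

```python
from statistics import NormalDist

def standardize(x, mean, std_dev):
    """Convert x to a z value, like =STANDARDIZE(x, mean, std_dev)."""
    return (x - mean) / std_dev

z = standardize(42000, 40000, 10000)
p_standard = NormalDist().cdf(z)                   # standard Normal: mean 0, standard deviation 1
p_direct = NormalDist(40000, 10000).cdf(42000)     # same probability without standardizing

print(z, p_standard, p_direct)
```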

Page 60: Chapter 7: Statistical Analysis with Excel

Uniform Distribution

The Uniform distribution does not actually have a corresponding Excel function.

A simple formula can be used instead to model this discrete distribution.

– 1 / (b – a)

Given that a value x is Uniformly distributed between a and b, we can use this formula to determine the probability that x will take an integer value in this interval.

Page 61: Chapter 7: Statistical Analysis with Excel

Figure 7.48

Consider any values for a and b, then use the formula to calculate the Uniform value.

Page 62: Chapter 7: Statistical Analysis with Excel

Poisson Distribution

The Poisson distribution has only one parameter, the distribution mean.

The function we use for this distribution is POISSON.DIST

– =POISSON.DIST(x, mean, cumulative)

The value returned by the Poisson distribution is the probability that the number of events which occur within a time interval is either between 0 and x (cdf), or equal to x (pdf).

Page 63: Chapter 7: Statistical Analysis with Excel

Figure 7.49

For example, consider a bakery which serves an average of 20 customers per hour.

Find the probability that at most 35 customers will be served in the next two hours.
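
A sketch of this calculation, assuming the hourly rate scales to a mean of 40 customers over the two-hour interval (the slide does not show the parameters used in the figure). The `poisson_cdf` helper is illustrative and plays the role of POISSON.DIST with cumulative set to True.

```python
import math

def poisson_cdf(x, mean):
    """P(N <= x) for a Poisson count N with the given mean."""
    return sum(math.exp(-mean) * mean ** k / math.factorial(k) for k in range(x + 1))

rate_per_hour = 20
hours = 2
mean_customers = rate_per_hour * hours  # 40 expected customers in two hours (assumed scaling)

p_at_most_35 = poisson_cdf(35, mean_customers)
print(p_at_most_35)
```

The key step is matching the mean to the length of the interval in the question, not to the one-hour rate.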

Page 64: Chapter 7: Statistical Analysis with Excel

Exponential Distribution

The Exponential distribution has only one parameter: lambda = 1 / mean of the data set.

The function we use for this distribution is EXPON.DIST

– =EXPON.DIST(x, lambda, cumulative)

The Exponential distribution is commonly used for modeling interarrival times.

Page 65: Chapter 7: Statistical Analysis with Excel

Figure 7.50

Let us use the same example with the bakery data.

The arrival rate is said to be 20 customers per hour.

The interarrival mean, or the Exponential mean, is 1 / arrival rate. Therefore, for this example, the interarrival mean is 1/20 hours per customer arrival.

To find the probability that a customer arrives within 10 minutes, we would set

– x = 10/60 = 0.17 hours

– lambda = 1/(1/20) = 20 customers per hour

– =EXPON.DIST(0.17, 20, True)
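
The cumulative Exponential probability has the closed form 1 - e^(-lambda*x), so the bakery example can be checked by hand. In this sketch x is kept as the exact fraction 10/60 instead of the rounded 0.17, so the result differs slightly from what the rounded formula returns.

```python
import math

arrival_rate = 20   # customers per hour, so lambda = 20
x_hours = 10 / 60   # ten minutes expressed in hours

# Exponential cdf: probability the next arrival occurs within x_hours.
p_within_10_min = 1 - math.exp(-arrival_rate * x_hours)
print(round(p_within_10_min, 4))  # 0.9643
```

Both x and lambda must use the same time unit; here both are expressed in hours.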

Page 66: Chapter 7: Statistical Analysis with Excel

Binomial Distribution

The Binomial distribution has the following parameters: the number of trials and the probability of a success.

We are trying to determine the probability that the number of successes is less than or equal to (using cdf) or equal to (pdf) some x value.

The function we use for this distribution is BINOM.DIST

– =BINOM.DIST(x, trials, prob_success, cumulative)

Page 67: Chapter 7: Statistical Analysis with Excel

Figure 7.51

Suppose a survey shows that 40 percent of people pay more attention to ads in the newspaper, and 60 percent pay more attention to ads on television.

What is the probability that out of 100 people surveyed, 50 of them respond more to ads on television?
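
The question can be read as "exactly 50" (cumulative set to False) or "at most 50" (cumulative set to True), and the slide does not say which reading the figure uses. The sketch below computes both with illustrative helper functions.

```python
import math

def binom_pmf(x, trials, p):
    """Probability of exactly x successes in the given number of trials."""
    return math.comb(trials, x) * p ** x * (1 - p) ** (trials - x)

def binom_cdf(x, trials, p):
    """Probability of at most x successes."""
    return sum(binom_pmf(k, trials, p) for k in range(x + 1))

# 100 people surveyed, each preferring television ads with probability 0.6.
p_exactly_50 = binom_pmf(50, 100, 0.6)
p_at_most_50 = binom_cdf(50, 100, 0.6)
print(p_exactly_50, p_at_most_50)
```

Both probabilities are small because 50 is well below the expected count of 60.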

Page 68: Chapter 7: Statistical Analysis with Excel

Beta Distribution

The Beta distribution has the following parameters: alpha, beta, A, and B.

– Alpha and beta are determined from the data set

– A and B are optional bounds on the x value for which you want the Beta distribution value

The function we use for this distribution is BETA.DIST

– =BETA.DIST(x, alpha, beta, cumulative, A, B)

If A and B are omitted, then the standard Beta distribution is assumed and they are given the values 0 and 1, respectively.

Page 69: Chapter 7: Statistical Analysis with Excel

Figure 7.52

Determine the probability that a team can complete a project in 10 days.

The estimated total time needed is 1 to 2 weeks; these estimates will be the bound values, or the A and B parameters.

Use a mean and standard deviation of 12 and 3 days to compute the alpha and beta parameters.

Page 70: Chapter 7: Statistical Analysis with Excel

Weibull Distribution

The Weibull distribution has two parameters: alpha and beta.

The function we use for this distribution is WEIBULL.DIST

– =WEIBULL.DIST(x, alpha, beta, cumulative)

The Weibull distribution is most commonly used to model reliability functions.

Page 71: Chapter 7: Statistical Analysis with Excel

Figure 7.53

On average, a lightbulb will last 1,200 hours, with a standard deviation of 100 hours. We can use these values to calculate alpha and beta.

We can now use the WEIBULL distribution to determine the probability that a lightbulb will be reliable for 55 days = 1,320 hours.
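
The slide does not show how alpha and beta are obtained from the mean and standard deviation. One possible approach, assumed here, is to match moments: search for the shape (alpha) whose ratio of standard deviation to mean equals 100/1200, then set the scale (beta) so the mean is 1,200. The helper names are illustrative.

```python
import math

def weibull_cv(shape):
    """Ratio of standard deviation to mean for a Weibull distribution with this shape."""
    g1 = math.gamma(1 + 1 / shape)
    g2 = math.gamma(1 + 2 / shape)
    return math.sqrt(g2 / g1 ** 2 - 1)

def fit_weibull(mean, std_dev, lo=1.0, hi=100.0):
    """Find shape (alpha) and scale (beta) matching the mean and standard deviation by bisection."""
    target = std_dev / mean
    for _ in range(200):
        mid = (lo + hi) / 2
        if weibull_cv(mid) > target:  # the ratio falls as shape grows, so shape is too small
            lo = mid
        else:
            hi = mid
    shape = (lo + hi) / 2
    scale = mean / math.gamma(1 + 1 / shape)
    return shape, scale

alpha, beta = fit_weibull(1200, 100)

# Weibull cdf, like WEIBULL.DIST(1320, alpha, beta, True): probability of failing by 1,320 hours.
p_fail_by_1320 = 1 - math.exp(-((1320 / beta) ** alpha))
p_survive_1320 = 1 - p_fail_by_1320  # probability the bulb is still working at 1,320 hours
print(alpha, beta, p_survive_1320)
```

Reliability is the complement of the cdf: the cdf gives the probability of failure by a given time, so the probability of lasting at least that long is one minus that value.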

Page 72: Chapter 7: Statistical Analysis with Excel

Inverse Functions

When we build simulation models, we need to generate random numbers in Excel which are within a given distribution.

To accomplish this we must use the inverse function of the corresponding distribution function.

An inverse function returns the inverse of the cumulative probability function; that is, the x value whose cumulative probability equals a given probability.

These functions are listed under the Formulas > Function Library > More Functions drop-down menu on the Ribbon.

Some of the inverse functions of more common distributions are BETA.INV, BINOM.INV, LOGNORM.INV, NORM.INV.

Page 73: Chapter 7: Statistical Analysis with Excel

Figure 7.54

The format of the inverse functions is

– =DIST.INV(probability, distribution_parameters)

The probability parameter is a number between 0 and 1 associated with the given distribution.

We will use the RAND function as the value for this parameter to generate a number between 0 and 1.
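
The same idea, feeding a uniform random number between 0 and 1 into an inverse cumulative function, can be sketched in Python. `NormalDist.inv_cdf` stands in for NORM.INV and `random.random` for RAND; the helper names, the seed, and the parameter values are illustrative.

```python
import math
import random
from statistics import NormalDist

def random_normal(mean, std_dev, rng=random):
    """Inverse-transform sample, like =NORM.INV(RAND(), mean, std_dev)."""
    u = rng.random()
    while u == 0.0:  # inv_cdf needs 0 < u < 1, and random() can return 0.0
        u = rng.random()
    return NormalDist(mean, std_dev).inv_cdf(u)

def random_exponential(lam, rng=random):
    """Inverse of the Exponential cdf: x = -ln(1 - u) / lambda."""
    return -math.log(1 - rng.random()) / lam

rng = random.Random(42)  # fixed seed so runs are repeatable
draws = [random_normal(40000, 10000, rng) for _ in range(10000)]
sample_mean = sum(draws) / len(draws)
print(sample_mean)
```

With enough draws the sample mean settles close to the distribution mean, which is a quick check that the generated numbers follow the intended distribution.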

Page 74: Chapter 7: Statistical Analysis with Excel

Summary

The Analysis ToolPak is an Excel Add-In that includes statistical analysis techniques such as Descriptive Statistics, Histograms, Exponential Smoothing, Correlation, Covariance, Moving Average, and others.

The Descriptive Statistics option provides a list of statistical information about a data set, including the mean, median, standard deviation, and variance.

Histograms calculate the number of occurrences, or frequency, with which values in a data set fall into various intervals.

Relationships in data are usually identified by comparing the dependent variable and the independent variable.

There are six basic trend curves that Excel can model: Exponential, Linear, Logarithmic, Polynomial, Power, and Moving Average.

Some of the more common distributions that can be recognized when performing a statistical analysis of data are the Normal, Exponential, Uniform, Binomial, Poisson, Beta, and Weibull distributions.

Inverse distribution functions such as BETA.INV, BINOM.INV, LOGNORM.INV and NORM.INV are used in simulation models to generate random numbers from a specific distribution.

Page 75: Chapter 7: Statistical Analysis with Excel

Additional Links

(place links here)

