

1

Aaron Stevens
30 September 2010

CS101 Lecture 10: Excel Data Analysis

“There are three kinds of lies: lies, damned lies, and statistics.”

- Mark Twain paraphrasing Benjamin Disraeli

2

What You’ll Learn Today

– How do we describe data?
– How do we find relationships within data?
– How do we analyze data in Excel?
– To what extent do two datasets vary together?
– Can we describe relationships between data as equations?


3

What is Data Analysis?

Data analysis is the process used to get from raw data to results that can be used to make decisions.

Results of data analysis can be used for:
– Detecting trends
– Making predictions

4

Example Data

We have some data describing how well movies did at the box office and in video sales (sales in $ millions):


5

Descriptive Statistics

Descriptive statistics answer basic questions about the central tendency and dispersion of data observations:
– Range of values
– Middle value
– Frequency distribution

6

Descriptive Statistics

Calculating descriptive statistics using Excel:

Menu: Tools -> Data Analysis


7

Descriptive Statistics

– Mean, Standard Error, Median, Mode, Standard Deviation, Range, Min, Max, Sum, Count.
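The measures listed above can also be reproduced outside Excel. The following is a minimal Python sketch, assuming a small hypothetical sample (not the lecture's movie data) and the sample (n - 1) form of the standard deviation. The function name `describe` is made up for illustration.

```python
import math
import statistics

def describe(values):
    """Return the summary measures listed on the slide for a list of numbers."""
    count = len(values)
    std_dev = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return {
        "mean": statistics.mean(values),
        "std_error": std_dev / math.sqrt(count),  # standard error of the mean
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "std_dev": std_dev,
        "range": max(values) - min(values),
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "count": count,
    }

# Hypothetical observations, chosen only to make the arithmetic easy to check.
sample = [2, 4, 4, 5, 7, 9, 11]
for name, value in describe(sample).items():
    print(f"{name:>9}: {value}")
```

For this sample the mean is 6, the median is 5, and the mode is 4, which shows how the three "middle value" measures can differ on the same data.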

8

Histogram

A histogram describes the frequency distribution of the data observations as grouped into “buckets.”
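To make the bucketing idea concrete, here is a rough Python sketch. It assumes equal-width buckets of the form [start, start + width), which is one possible convention and not necessarily how Excel assigns values to bins. The sales figures and the function name `histogram` are hypothetical.

```python
from collections import Counter

def histogram(values, bucket_width):
    """Count how many values fall in each bucket [start, start + bucket_width)."""
    counts = Counter((v // bucket_width) * bucket_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical sales figures in $ millions (not the lecture's data).
sales = [12, 18, 25, 31, 33, 47, 52, 58, 59, 95]
for start, n in histogram(sales, 20).items():
    # One '#' per observation in the bucket gives a crude text bar chart.
    print(f"[{start:>2}, {start + 20:>3}) {'#' * n}")
```

Buckets that contain no observations are simply absent from the result in this sketch.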


9

Relationships Between Series

A dot plot (a scatter plot, in Excel's chart terminology) graphically shows the relationship between pairs of observations in two data series.

10

Relationships Between Series

This plot shows an apparent relationship between box office revenue and video sales revenue.


11

Correlation

Correlation is the extent to which variables in two different data series tend to move together (or apart) over time.

12

Inverse Correlation

An inverse correlation exists when two data series move in opposite directions.


13

Correlation

Example of a weak correlation

14

Describing Correlation

A correlation coefficient describes the strength of the correlation between two series. Its values lie in the range [-1.0, 1.0].

Positive correlation: large values of one set are associated with large values of the other, and vice versa.
Negative correlation: small values of one set are associated with large values of the other, and vice versa.
Zero correlation: the values in the two sets are not correlated linearly.
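As an illustration of how such a coefficient can be computed, the sketch below implements the standard Pearson correlation formula in Python. The two series are hypothetical and the function name `pearson_r` is made up here. In Excel, the `CORREL` worksheet function serves the same purpose.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two series vary together, and how each varies on its own.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(variation_x * variation_y)

# Hypothetical series: the second rises with the first, the third falls.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]
print(pearson_r(xs, rising))   # close to +1: positive correlation
print(pearson_r(xs, falling))  # close to -1: negative (inverse) correlation
```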


15

What exactly is the relationship?

Correlation measures whether a linear relationship exists between two series of data.

Linear regression attempts to find the relationship between the two series and expresses this relation with a linear equation.

Linear equation in the form: y = mx + b

16

Linear Regression

To run a linear regression:

Select a dependent variable (y) and an independent variable (x).
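For readers curious what the regression computes, here is a minimal Python sketch of an ordinary least-squares fit for one independent variable. It returns the m and b of y = mx + b. The data and the function name `fit_line` are hypothetical; Excel's `SLOPE` and `INTERCEPT` worksheet functions give the corresponding values.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - m * mean_x  # the fitted line passes through (mean_x, mean_y)
    return m, b

# Hypothetical data lying exactly on y = 2x + 1.
x_values = [1, 2, 3, 4]   # independent variable
y_values = [3, 5, 7, 9]   # dependent variable
m, b = fit_line(x_values, y_values)
print(f"y = {m}x + {b}")
```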


17

Linear Regression Analysis

What does this output tell us?

It describes the relationship in terms of an equation:

Video sales = -140 + 4.33 (Box office sales)
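Once the equation is known, it can be used to make predictions. The Python sketch below simply evaluates the equation above for a given box office figure, assuming both quantities are in $ millions as in the example data. The function name and the input value of 100 are hypothetical.

```python
def predict_video_sales(box_office_sales):
    """Evaluate the fitted line: video sales = -140 + 4.33 * box office sales."""
    return -140 + 4.33 * box_office_sales

# Hypothetical movie with $100 million at the box office.
print(round(predict_video_sales(100), 2))
```

Because the intercept is negative, the equation predicts negative video sales for box office figures below about $32 million, so it is probably best read only within the range of the data it was fitted to.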


19

Linear Regression Plot

20

How good is the fit?

The R-Square statistic describes how much of the variation in the Y variable is explained by variation in the X variable.
– R-Square = 1 is perfect.
– R-Square > 0.5 is considered good.
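One common way to compute R-Square is as 1 minus the ratio of unexplained variation to total variation in Y. The Python sketch below follows that definition for a line y = mx + b whose coefficients are already known. The data, the coefficients, and the function name `r_squared` are hypothetical.

```python
def r_squared(xs, ys, m, b):
    """Share of the variation in ys explained by the line y = m*x + b."""
    mean_y = sum(ys) / len(ys)
    unexplained = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - unexplained / total

# Hypothetical data; m = 0.5 and b = 1.0 is the least-squares line for it.
print(r_squared([1, 2, 3], [1, 3, 2], 0.5, 1.0))   # a weak fit
print(r_squared([1, 2, 3, 4], [3, 5, 7, 9], 2.0, 1.0))  # points lie on the line
```

With a single X variable, R-Square equals the square of the correlation coefficient between the two series.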


21

How good is the fit?

The P-value statistic describes the likelihood that randomness alone explains the value of each of the equation’s coefficients.
– P-value > 0.05 or 0.10 indicates randomness.
– P-value near 0.0 indicates non-randomness.
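Excel's regression output derives its P-values from a t-test. As a way to build intuition for "likelihood of randomness," the Python sketch below swaps in a different technique, an exhaustive permutation test: it reorders the Y values in every possible way and reports how often a reordering gives a slope at least as steep as the observed one. This is an illustration under that substitution, not Excel's calculation, and the data and function names are hypothetical.

```python
from itertools import permutations

def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return co_variation / sum((x - mean_x) ** 2 for x in xs)

def permutation_p_value(xs, ys):
    """Fraction of all reorderings of ys whose slope is at least as extreme as observed."""
    observed = abs(slope(xs, ys))
    slopes = [abs(slope(xs, shuffled)) for shuffled in permutations(ys)]
    # Small tolerance guards against floating-point rounding in the comparison.
    extreme = sum(1 for s in slopes if s >= observed - 1e-12)
    return extreme / len(slopes)

# Hypothetical data on a perfect line; only 2 of the 720 orderings
# (the original and its reverse) produce a slope this steep.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 12]
print(permutation_p_value(xs, ys))
```

Enumerating every ordering is only practical for very small samples, since the number of orderings grows factorially.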

22

What You Learned Today

– Data Analysis
– Descriptive Statistics
– Correlation
– Linear Regression


23

Student To Dos

– HW04 (Excel Data Analysis) due WED 10/6
– Quiz 2 is on TUE 10/5
  • Covers lectures 7, 8, 9, 10 (Excel)