Date post: 14-Aug-2020
Page 1: DATA WRANGLING: STATISTICAL ANALYSIS TOOLS...Excel also has a “Analysis ToolPak” add in that can be easily installed. The analysis toolpakincludes a suite of more advanced statistical

2/6/2013


DATA WRANGLING: STATISTICAL ANALYSIS TOOLS

Presentation By

Dr Tapan Rai

Senior Lecturer (Statistical Consulting),

School of Mathematical Sciences

University of Technology Sydney

Email: [email protected]

Summary

• SPSS

• The SPSS Data File

• Graphing: A histogram

• Descriptive Statistics: Explore

• Confidence Intervals

• Hypothesis Tests

• Where Can I learn more?

• Minitab: An Alternative to SPSS

• Can I use Excel?

• Should I use Matlab/Maple/Mathematica?

• An Introduction to R


The SPSS Data File: Data View

• Variables are organised in columns

• Usually, one row per case

The SPSS Data File: Variable View

• This is where you enter details about each variable


The SPSS Data File: Variable View

• When you enter a variable name, SPSS automatically fills in some of the other information

• It is important that you verify this information

The SPSS Data File: Variable View

• It is especially important that you verify the column marked “Measure”, which records the level of measurement of the data (nominal, ordinal or scale)


The SPSS Data File: Back to Data View

• Variable names show up as column headings

• Data entry is similar to Excel

The Graphing Menu


An Example: Creating a Histogram

Histogram of HR


Descriptive Statistics: The Explore Menu

The Explore Dialog Box


The “Explore” Output

Descriptives (variable: HR)

                                                 Statistic   Std. Error
Mean                                             71.33       1.218
95% Confidence Interval for Mean   Lower Bound   68.84
                                   Upper Bound   73.82
5% Trimmed Mean                                  71.39
Median                                           70.50
Variance                                         44.506
Std. Deviation                                   6.671
Minimum                                          56
Maximum                                          85
Range                                            29
Interquartile Range                              9
Skewness                                         -.104       .427
Kurtosis                                         -.093       .833

Confidence Intervals

• According to the Central Limit Theorem, approximately 95% of sample means lie within two standard errors of the population mean, where

$$\text{standard error} = \frac{\text{standard deviation}}{\sqrt{\text{sample size}}}$$


Determining Confidence Intervals

[Figure labels: limits of the 95% confidence interval for the population mean; 95% of sample means; one standard error]

• In SPSS, confidence intervals for means are obtained from descriptive statistics routines

Output for SPSS Explore Menu

Descriptives (variable: HR)

                                                 Statistic   Std. Error
Mean                                             71.33       1.218
95% Confidence Interval for Mean   Lower Bound   68.84
                                   Upper Bound   73.82
5% Trimmed Mean                                  71.39
Median                                           70.50
Variance                                         44.506
Std. Deviation                                   6.671
Minimum                                          56
Maximum                                          85
Range                                            29
Interquartile Range                              9
Skewness                                         -.104       .427
Kurtosis                                         -.093       .833
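
The interval limits in the Explore output can be checked by hand from the mean, the standard deviation and the sample size. The following is a minimal R sketch, not SPSS's own routine: it assumes n = 30 (the sample size of the HR example used later in these slides), and the variable names are illustrative only. It compares the "two standard errors" rule of thumb with the interval based on the t critical value.

```r
# Summary statistics copied from the SPSS Explore output for HR
hr_mean <- 71.33
hr_sd   <- 6.671
n       <- 30        # assumed: sample size of the heart-rate example

# Standard error = standard deviation / sqrt(sample size)
std_error <- hr_sd / sqrt(n)

# Rule of thumb: mean plus or minus two standard errors
rough_ci <- hr_mean + c(-2, 2) * std_error

# Interval using the t critical value with n - 1 degrees of freedom
t_crit   <- qt(0.975, df = n - 1)
exact_ci <- hr_mean + c(-1, 1) * t_crit * std_error

print(round(std_error, 3))
print(round(rough_ci, 2))
print(round(exact_ci, 2))
```

The t-based limits should agree with the Lower Bound and Upper Bound reported by SPSS to within rounding; the rule-of-thumb interval is slightly narrower because it uses 2 in place of the t critical value.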


Hypothesis Tests

• Hypothesis tests enable researchers to make inferences about a population based on data collected from a sample

• Usually designed to test the null hypothesis that there is no difference between two or more populations against the alternative hypothesis that there is a difference

Steps in Hypothesis Testing

• Set up the null (H0) and alternative (H1) hypotheses

• Assume that the null hypothesis H0 is true

• Calculate the probability, p, of obtaining data at least as extreme as those observed if H0 were true

• If p is low (usually, if p < 0.05), reject H0

• If p is not low (usually, if p ≥ 0.05), do not reject H0


Performing a Hypothesis Test

• We use a statistical package to calculate p.

• However, the hypotheses need to be set up correctly.

An Example

• It is known that the mean heart rate of a healthy population is 72 beats per minute.

• You are interested in determining whether this is true for the student body of UTS.

• You take a sample of 30 UTS students and measure their heart rates.

• You need to perform a hypothesis test to compare the mean heart rate from your sample to 72.


One-sample t-test: Hypotheses and Significance Level

• Write down the hypotheses:

  • Null (H0): the mean heart rate of UTS students is 72 beats per minute

  • Alternative (H1): the mean heart rate of UTS students is not 72 beats per minute

• Decide the significance level to be used:

  • α = 0.05 is standard in most cases

Performing a one-sample t-test in SPSS


Performing a one-sample t-test

Output for the one-sample t-test

One-Sample Statistics

      N    Mean    Std. Deviation   Std. Error Mean
HR    30   71.33   6.671            1.218

One-Sample Test (Test Value = 72)

                                                       95% Confidence Interval of the Difference
      t       df   Sig. (2-tailed)   Mean Difference   Lower     Upper
HR    -.547   29   .588              -.667             -3.16     1.82

Since the p-value (Sig. = .588) is greater than the significance level of 0.05, the decision is to not reject the null hypothesis.

Conclusion: The mean heart rate is not significantly different from 72.
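
The t statistic and p-value in the SPSS output can be reproduced from the reported summary figures. This is a hedged R sketch under the assumption that only those figures are available (the raw heart rates are not in the slides); variable names such as `mean_diff` and `decision` are illustrative.

```r
# Figures copied from the SPSS one-sample t-test output
n         <- 30
mean_diff <- -0.667   # "Mean Difference": sample mean minus the test value 72
std_error <- 1.218    # "Std. Error Mean"
alpha     <- 0.05     # significance level chosen before the test

# t statistic and two-tailed p-value with n - 1 degrees of freedom
t_stat  <- mean_diff / std_error
df      <- n - 1
p_value <- 2 * pt(-abs(t_stat), df)

# Decision rule from the steps in hypothesis testing
decision <- if (p_value < alpha) "reject H0" else "do not reject H0"

cat("t =", round(t_stat, 3), " df =", df,
    " p =", round(p_value, 3), " decision:", decision, "\n")

# With the raw data in a numeric vector (hypothetical name `hr`),
# the same test would be run directly as: t.test(hr, mu = 72)
```

Working from summary figures is only a cross-check; with the actual data, `t.test()` also returns the confidence interval for the mean.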


Other Hypothesis Tests

• Paired (matched) t-tests (paired readings from same sample)

• Independent samples t-tests

• Chi-squared tests

• ANOVA

• The general linear model

• Logistic Regression

These tests are implemented in most statistical software packages

Where can I learn more?

• Workshops offered by Graduate Research School:

  • Survey Design and Analysis

    • February 18-21

    • June 24-27

  • Introduction to Design of Experiments

    • April 8-11

    • July 29 - August 1

  • Regression Analysis

    • May 27-30

    • September 30 - October 3

For details and to register, contact Ella Chavez ([email protected])


Minitab: An Alternative to SPSS

Minitab: Graph Menu


Can I Use Excel Instead?

• Excel has built-in functions to calculate mean (average), standard deviation, median etc.

• Excel also has an “Analysis ToolPak” add-in that can be easily installed.

• The Analysis ToolPak includes a suite of more advanced statistical tools such as hypothesis tests

Problems with using Excel 1

• More complicated than it appears

• For example, there are four different functions for standard deviation (stdev, stdevp, stdeva, stdevpa), each of which may give different results:

VALUE
7
9
11

AVERAGE   9
STDEV     2
STDEVA    4.787136
STDEVP    1.632993
STDEVPA   4.145781
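
The four results can be reproduced outside Excel, which helps show where the differences come from. The R sketch below rests on one assumption that the slide does not state: that the range passed to STDEVA and STDEVPA included the text cell "VALUE", which those two functions count as 0. The helper `pop_sd` is defined here for illustration; base R's `sd()` uses the sample (n - 1) formula.

```r
values <- c(7, 9, 11)

# Population standard deviation (divide by n); sd() divides by n - 1
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

stdev  <- sd(values)       # sample SD, as in Excel's STDEV
stdevp <- pop_sd(values)   # population SD, as in Excel's STDEVP

# Assumption: the header text "VALUE" was in the range and counted as 0
with_text <- c(0, values)
stdeva  <- sd(with_text)      # candidate for the STDEVA figure
stdevpa <- pop_sd(with_text)  # candidate for the STDEVPA figure

print(round(c(STDEV = stdev, STDEVA = stdeva,
              STDEVP = stdevp, STDEVPA = stdevpa), 6))
```

Under that assumption the four numbers match the table, so the discrepancy is a matter of which formula is used (sample or population) and whether a text cell is silently treated as a zero.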


Problems with Excel 2

• Many problems with results from Excel’s Analysis ToolPak have been documented and published in refereed journals

• Some referees/examiners therefore take issue with all results obtained from Excel

Documented Issues with Excel 1

“Excel 2007, like its predecessors, fails a standard set of intermediate-level accuracy tests … Microsoft’s continuing inability to correctly fix errors is discussed. No statistical procedure in Excel should be used until Microsoft documents that the procedure is correct; it is not safe to assume that Microsoft Excel’s statistical procedures give the correct answer.”


Documented Issues with Excel 2

“We find that the accuracy of various statistical functions in Excel 2007 range from unacceptably bad to acceptable but significantly inferior in comparison to alternative implementations. In particular, …, it is possible to obtain results with zero accurate digits as shown with numerical examples.”

Documented Excel Issues 3

“This paper shows that Excel graphics defaults do not embody the appropriate principles [of statistical graphics]. Users who want to use Excel are advised to know the principles of good graphics well enough so that they can choose the appropriate options to override the defaults.”


Should I use Matlab or Maple or Mathematica?

• Not unless…

  • … you are doing mathematical modelling rather than statistical analysis

  • … you are a glutton for punishment

• If you need advanced statistical procedures, consider using R.

What is R?

• R is a Statistical Programming Package

• R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License

• R runs on a number of platforms, including Unix/Linux, Windows and Mac


Obtaining R

• The standard distribution of R can be downloaded from http://www.r-project.org

• Other (more user-friendly) versions include:

  • RStudio

    • RStudio can be downloaded free from www.rstudio.com/ide/

  • Revolution R Enterprise

    • Revolution R is a commercial distribution of R

    • A free license is available to university staff and students, after registering at www.revolutionanalytics.com/downloads/free-academic.php

The Revolution R Workspace


Entering Data in R

• There are several ways to enter data in R

• R also reads data in various forms

• My preference is to create a file in a spreadsheet package (e.g. Excel) and save it as a comma-separated values file (.csv)

Reading a .csv file into R

• To read the csv file, "houses.csv", use:

  • houses <- read.csv("houses.csv", header = TRUE)

• This assigns the data in the file to the object “houses”

• The “header = TRUE” option tells R that the variable names are stored in the first row.

• To view the data on the screen, use

  • houses
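
The read step can be tried end to end without the original file. The sketch below writes a tiny stand-in for "houses.csv" to a temporary location and reads it back; the three data rows are invented for illustration, and only the column names Floor, Rooms, Age and CentralHeating come from the slides.

```r
# Hypothetical stand-in for houses.csv, written to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Floor,Rooms,Age,CentralHeating",   # header row with variable names
  "1.2,6,4.5,yes",
  "0.9,5,9.2,no",
  "1.5,7,0.0,yes"
), csv_path)

# header = TRUE tells R that the first row holds the variable names
houses <- read.csv(csv_path, header = TRUE)

str(houses)          # column types as R has read them
print(head(houses))  # first rows, less overwhelming than printing everything
print(names(houses)) # variable names taken from the header row
```

Checking `str()` straight after reading plays the same role as verifying the Variable View in SPSS: it shows whether each column was read as a number or as text.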


Graphing Data

• plot(houses)

Other Graphs: Specific Variables

• hist(houses$Floor)

• boxplot(houses$Floor)
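
The default plots are usable but unlabelled. As a hedged sketch, the calls below add a title and axis label and split the boxplot by group; the data frame is invented for illustration (only the column names Floor and CentralHeating come from the slides), and the label text is a placeholder because the slides do not give the units of Floor.

```r
# Hypothetical data standing in for the houses data set
houses <- data.frame(
  Floor          = c(0.9, 1.1, 1.2, 1.3, 1.5, 1.6, 1.8, 2.0),
  CentralHeating = factor(c("no", "no", "yes", "no", "yes", "yes", "yes", "no"))
)

# Histogram with an explicit title and axis label;
# hist() also returns the bin counts, kept here in `h`
h <- hist(houses$Floor,
          main = "Histogram of Floor",
          xlab = "Floor")

# Boxplot of one variable split by a grouping factor (formula interface)
boxplot(Floor ~ CentralHeating, data = houses,
        main = "Floor by central heating",
        xlab = "Central heating", ylab = "Floor")

print(h$counts)  # number of observations in each histogram bin
```

Run interactively, each call opens a plot window; run as a script, R writes the plots to its default graphics device instead.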


Obtaining Descriptive Statistics

• summary(houses[4:6])

     Rooms            Age          CentralHeating
 Min.   :5.00    Min.   :0.000    no : 9
 1st Qu.:5.75    1st Qu.:1.850    yes:11
 Median :6.00    Median :4.250
 Mean   :6.00    Mean   :4.205
 3rd Qu.:6.25    3rd Qu.:6.300
 Max.   :7.00    Max.   :9.200
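
The yes/no counts shown for CentralHeating appear only when that column is stored as a factor. Since R 4.0.0, `read.csv()` reads text columns as character by default, so a current installation may print a different summary for that column. The sketch below uses invented data (only the column names come from the slides) to show the conversion.

```r
# Hypothetical data; stringsAsFactors = FALSE mimics what read.csv()
# returns by default in R 4.0.0 and later
houses <- data.frame(
  Rooms          = c(5, 6, 6, 7),
  Age            = c(0.0, 1.9, 4.3, 9.2),
  CentralHeating = c("no", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Character column: summary() reports length and class, not counts
print(summary(houses))

# Convert to a factor to get counts per level, as on the slide
houses$CentralHeating <- factor(houses$CentralHeating)
print(summary(houses))

# table() gives the same counts directly
heating_counts <- table(houses$CentralHeating)
print(heating_counts)
```

An alternative is to pass `stringsAsFactors = TRUE` to `read.csv()`, which converts every text column at read time.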

Pros and Cons of R

• Pros

  • Free

  • Extremely Powerful

  • Latest statistical techniques implemented in R, long before they are implemented elsewhere

• Cons

  • Steep learning curve: not menu driven

  • Output format may not be the best

  • Minimal technical support

