Organizing Your Data

Jenny Holcombe, PhD

UT College of Medicine Nuts & Bolts Conference

August 16, 2013

Learning Objectives

Identify Different Types of Variables

Appropriately Naming Variables

Constructing a Variable Code Book

Developing Excel Spreadsheets & Effectively Entering Data

Identifying Differences between Descriptive & Inferential Statistics

Identifying Differences between Parametric & Nonparametric Statistics

TYPES OF VARIABLES

Variables

A characteristic or condition that changes or has different values for different individuals

Anything that can be measured

Operational Definition – a definition of the variable in terms of how, specifically, it is to be measured

Qualitative Variables

Differ in kind rather than amount

Differ in quality, not quantity or magnitude

Also referred to as categorical or nominal

Examples – favorite color, treatment group, gender, race

Quantitative Variables

Assigned number values that represent differing quantities of the characteristics

Examples – medication dosage, # of doctor visits, annual income

Quantitative data can be either:

Discrete – separate, countable values (e.g., # of doctor visits last year)

Continuous – infinite continuum of possible real number values (e.g., # of minutes it takes to finish a book)

Quantitative Variables

Three types of quantitative variables:

Ordinal – categorical scales that have a natural ordering of values (e.g., SES Class – low, middle, high)

Interval – distances between adjacent scores are equal & consistent throughout the scale with no absolute zero point (e.g., IQ scores, temperature in °F or °C)

Ratio – same as interval with a true zero point (e.g., length, distance, time)

Variables – Final Points

It is possible to measure data on more than one scale

Variables should always be measured on the highest scale possible (listed from highest to lowest):

Ratio

Interval

Ordinal

Nominal

Fictitious Data

Four measurement levels for daily amount of sodium intake

Participant   Ratio -     Interval -            Ordinal -    Nominal -
              Actual mg   Values above 2500mg   Rank order   1=not high 2=high
Alan          4000        1500                  3            1
Nathan        7500        5000                  6            2
Chris         2500        0                     1            1
Mike          3500        1000                  2            1
Vadim         6000        3500                  5            2
Daniel        5500        3000                  4            2

Source: Polit, D. F. (2010). Statistics and data analysis for nursing research (2nd ed.). Boston: Pearson. ISBN: 978-0135085073
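
The fictitious sodium table can be rebuilt from its ratio column alone, which shows why the highest scale is worth recording: every lower level can be derived from it, but not the other way around. Below is a minimal Python sketch. The 2500 mg baseline comes from the column header; the 5000 mg "high" cutoff is an assumption (the slide does not state one, and any value above 4000 and up to 5500 fits the table), and the function name is hypothetical.

```python
# Ratio-level data from the table: actual daily sodium intake in mg.
intake_mg = {
    "Alan": 4000, "Nathan": 7500, "Chris": 2500,
    "Mike": 3500, "Vadim": 6000, "Daniel": 5500,
}

BASELINE_MG = 2500     # from the "Values above 2500mg" column header
HIGH_CUTOFF_MG = 5000  # ASSUMPTION: not given on the slide


def derive_levels(intake):
    """Return {participant: (ratio, interval, ordinal, nominal)}."""
    # Rank 1 = lowest intake; this data set has no tied values.
    ordered = sorted(intake, key=intake.get)
    rank = {name: position for position, name in enumerate(ordered, start=1)}
    levels = {}
    for name, mg in intake.items():
        interval = mg - BASELINE_MG                 # amount above 2500 mg
        nominal = 2 if mg >= HIGH_CUTOFF_MG else 1  # 1 = not high, 2 = high
        levels[name] = (mg, interval, rank[name], nominal)
    return levels


levels = derive_levels(intake_mg)
for name, row in levels.items():
    print(name, *row)
```

Going the other direction is not possible: knowing only that Alan is coded 1 (not high) says nothing about his actual intake in mg.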

NAMING VARIABLES

Naming Variables

The first row should include variable names

This makes transfer to other programs easier (e.g., SPSS, SAS)

Variable names can be up to 32 characters in length, but anything more than 8-12 becomes very cumbersome to manage

Each variable name must be unique; duplication is not allowed & names are not case sensitive

Naming Variables

Variable names should begin with a letter

Avoid periods, #, @, $, and only use underscores within the variable name (not at the beginning or end)

No spaces are allowed in variable names

Use meaningful names for variables

Makes variables more self-explanatory

Some exceptions – balance length/meaning

Naming Variables

Acceptable Names         Unacceptable Names
Q1; Q_1                  Q 1; 1Q; Q-1
Question1; Question_1    Question 1; Question-1
Q1_food                  Q1 food; Q1-food
Food                     _Food_
DRS1; DRS_1              DiabetesRiskScale1

The main thing is to be consistent when naming variables
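
The naming rules can be checked automatically before data are handed to an analyst. The sketch below is one possible Python check, not an SPSS or SAS feature: the function names are hypothetical, the 32-character limit is the one stated on the slide, and the 12-character soft limit is an assumption taken from the "8-12 becomes cumbersome" guidance.

```python
import re

# A letter first, then letters/digits/underscores, never ending in an underscore.
_NAME_PATTERN = re.compile(r"[A-Za-z](?:[A-Za-z0-9_]*[A-Za-z0-9])?")


def check_variable_name(name, hard_max=32, soft_max=12):
    """Return a list of problems with a variable name (empty list = acceptable)."""
    problems = []
    if not _NAME_PATTERN.fullmatch(name):
        problems.append("start with a letter; use only letters, digits, inner underscores")
    if len(name) > hard_max:
        problems.append(f"longer than {hard_max} characters")
    elif len(name) > soft_max:
        problems.append(f"longer than {soft_max} characters (cumbersome)")
    return problems


def find_case_insensitive_duplicates(names):
    """Return names that repeat an earlier name when case is ignored."""
    seen = set()
    duplicates = []
    for name in names:
        key = name.lower()
        if key in seen:
            duplicates.append(name)
        seen.add(key)
    return duplicates


for candidate in ["Q1_food", "Q1 food", "_Food_", "DiabetesRiskScale1"]:
    print(candidate, check_variable_name(candidate))
```

Running the check on the first row of a spreadsheet flags the same names the table marks as unacceptable, including names that are legal but too long to manage comfortably.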

CONSTRUCTING A VARIABLE CODE BOOK

Variable Code Books

Purpose: To create a data entry system

To assist with data entry

For statistical analysis

When archiving data files for follow-up

Code Book Construction

Elements to include:

1) Description of the Study

2) Sampling Information

3) Technical Information

4) Structure of the Data

Variable Name

Variable Label

Value Labels

5) Text of the Questions/Survey Instrument

Code Book Construction

Word or Excel format is acceptable

A columned list or table is acceptable

All variables should be included with appropriate labeling information

Variable labels can be any length, but keeping them to 256 characters or fewer is recommended

The variable labels can contain spaces & characters not allowed in variable names
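
The "Structure of the Data" part of a code book can be kept as a small table with one row per variable. The following Python sketch writes such a table as CSV text that opens in Excel. The two variables (sodium_mg, sodium_hi), their labels, and the function name are hypothetical examples based on the fictitious sodium data; the 256-character check mirrors the recommended label length.

```python
import csv
import io

# Hypothetical code book entries: one dict per variable.
codebook = [
    {"name": "sodium_mg", "label": "Daily amount of sodium intake (mg)", "values": {}},
    {"name": "sodium_hi", "label": "High sodium intake?",
     "values": {1: "not high", 2: "high", 9: "missing"}},
]

MAX_LABEL_LENGTH = 256  # recommended upper limit for variable labels


def codebook_to_csv(entries):
    """Return the code book as CSV text: name, label, value labels."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Variable Name", "Variable Label", "Value Labels"])
    for entry in entries:
        if len(entry["label"]) > MAX_LABEL_LENGTH:
            raise ValueError(f"label for {entry['name']} is too long")
        value_labels = "; ".join(
            f"{code}={text}" for code, text in entry["values"].items()
        )
        writer.writerow([entry["name"], entry["label"], value_labels])
    return buffer.getvalue()


csv_text = codebook_to_csv(codebook)
print(csv_text)
```

Keeping the value labels next to the variable name means the numeric codes entered in the spreadsheet can always be decoded later.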

Code Book Examples

Polit (2010) Data Files

Swedish Institute for Social Research

ACHA NCHA II

Code Book – Final Points

Be consistent in your coding!

Update the code book as you enter your data – if you make a change while entering your data, make sure you update your code book as well

Check & double check – your code book acts as a form of communication between you and your data analyst (and possibly between you & your future self!)

DEVELOPING EXCEL SPREADSHEETS & DATA ENTRY

Proper Data Layout

Allows you to:

Combine data

Separate data

Create charts that give insight into what the raw data has to say

When you enter your data without consideration of how you will use the data later, it becomes much more difficult to conduct any data analysis

Excel Basics

Each individual row of data is known as a record, an observation, or a case

Do not leave any blank rows

Information about an item cannot be spread across more than one row

Each column is a field labeled to identify the data it contains

All data in each column should be formatted the same

Do not leave blank columns in the table

Excel Basics

Once a database is created you can use Excel tools to manage the data:

Sorting Data

Filtering Data

Missing Values

Should be entered consistently

Use '9' or '99' or '999'

The value should be something that cannot represent a real numeric value for the variable in question

Excel will treat these 'missing' codes as real values, so be careful if you are using Excel for analysis
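
The warning about missing-value codes is easy to demonstrate. In this small Python sketch the data and the choice of 999 as the missing code are hypothetical; it compares a mean that wrongly includes the codes with one that drops them first.

```python
from statistics import mean

MISSING = 999  # hypothetical missing-value code, as defined in the code book

# Hypothetical data: number of doctor visits, with two missing responses.
visits = [2, 5, MISSING, 3, MISSING, 4]

# Treating the codes as real values inflates the average badly.
naive_mean = mean(visits)

# Drop the missing codes before computing any statistic.
valid = [v for v in visits if v != MISSING]
clean_mean = mean(valid)

print("including missing codes:", naive_mean)
print("excluding missing codes:", clean_mean)
```

The same filtering step is needed in Excel, for example by sorting or filtering out the coded rows before applying a formula to the column.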

Additional Points

Ensure rows below data are not 'activated' so they are not mistaken during transfer as additional cases/observations

Numeric values are always best to use for data entry regardless of the type of variable (quantitative vs. qualitative)

Values/labels can always be assigned in a code book or data analysis program

DESCRIPTIVE VS. INFERENTIAL STATISTICS

Descriptive vs. Inferential

• Descriptive Statistics

  • Used to summarize, organize, and simplify data for better understanding

  • Means, standard deviations, percents, frequencies, proportions, etc.

• Inferential Statistics

  • Statistical procedures that allow researchers to study samples & then make generalizations about the population from which they were selected

  • Allows the researcher to draw conclusions

Descriptive Statistics in Excel

This is the status bar. It will display various information about a selected set of values in the spreadsheet. To change the information displayed you simply right click on the status bar.

Central Tendency in Excel

The AVERAGE Function

Calculates the arithmetic mean

=AVERAGE(A1:A100)

The MEDIAN Function

Calculates the median (center value)

=MEDIAN(A1:A100)

The MODE Function

Calculates the most frequently occurring value

=MODE(A1:A100)
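
For readers who want to cross-check a spreadsheet outside Excel, the same three measures are available in Python's standard `statistics` module. This is a minimal sketch with hypothetical data; the cell ranges in the comments assume the seven values were typed into A1:A7.

```python
from statistics import mean, median, mode

# Hypothetical data: number of doctor visits for seven participants.
visits = [1, 2, 2, 3, 4, 2, 5]

average = mean(visits)       # counterpart of =AVERAGE(A1:A7)
middle = median(visits)      # counterpart of =MEDIAN(A1:A7)
most_common = mode(visits)   # counterpart of =MODE(A1:A7)

print("mean:", average)
print("median:", middle)
print("mode:", most_common)
```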

Variability in Excel

There is no range function in Excel, but…

=MAX(A1:A100)-MIN(A1:A100)

The VAR Function

Calculates sample variance

=VAR(A1:A100)

The STDEV Function

Calculates sample standard deviation

=STDEV(A1:A100)

Remember: STDEV = sqrt(VAR); STDEV^2 = VAR
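
The relationship between the sample variance and the sample standard deviation can be verified numerically. The sketch below uses Python's standard `statistics` module with hypothetical scores; `variance` and `stdev` use the sample (n - 1) formulas, matching what the slide says VAR and STDEV calculate.

```python
from statistics import stdev, variance

# Hypothetical scores for eight participants.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(scores) - min(scores)  # counterpart of =MAX(...)-MIN(...)
sample_var = variance(scores)            # sample variance (n - 1 denominator)
sample_sd = stdev(scores)                # sample standard deviation

print("range:", value_range)
print("variance:", sample_var)
print("standard deviation:", sample_sd)
print("sd squared:", sample_sd ** 2)
```

Squaring the standard deviation returns the variance, apart from tiny floating-point rounding differences.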

PARAMETRIC VS. NONPARAMETRIC STATISTICS

Parametric Statistics

A class of inferential statistical tests that involves: assumptions about the distribution of the variables,

the estimation of a parameter, and usually

the use of interval or ratio measures

Statistical tests designed to be used when data have certain characteristics – when they approximate a normal distribution & are measured with interval or ratio scales

Parametric Statistics

Bivariate

One-sample t test

Two-sample t test

Analysis of variance (ANOVA)

Repeated measures ANOVA

Pearson's product moment correlation (r)

Multivariate

Multiple correlation/regression

ANCOVA

MANOVA

MANCOVA

Mixed design RM-ANOVA

Canonical analysis

Discriminant analysis

Logistic regression

Factor analysis

Nonparametric Statistics

A general class of inferential statistical tests that does not involve rigorous assumptions about the distribution of the variables; most often used with small samples, when data are measured on the nominal or ordinal scales, or when a distribution is severely skewed

Statistical tests that are designed to be used when data being analyzed depart from the distributions that can be analyzed with parametric statistics

Nonparametric Statistics

Chi-square goodness-of-fit test

Chi-square test of independence

Fisher's exact test

McNemar test

Cochran's Q test

Mann-Whitney U test

Kruskal-Wallis test

Wilcoxon signed ranks test

Friedman test

Spearman's rank order correlation

Kendall's tau

Comparison of Parametric & Nonparametric Statistics

For most parametric tests there is at least one nonparametric equivalent

These tests fall into several categories:

1. Tests of differences between groups (independent samples)

2. Tests of differences between variables (dependent samples)

3. Tests of relationships between variables

Differences Between Independent Groups

Two groups/samples – compare mean value for some variable of interest

Parametric                       Nonparametric
t-test for independent samples   Wald-Wolfowitz runs test
                                 Mann-Whitney U test
                                 Kolmogorov-Smirnov two sample test

Multiple groups

Parametric                            Nonparametric
Analysis of Variance (ANOVA/MANOVA)   Kruskal-Wallis analysis of ranks
                                      Median test
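
To make the two-group contrast concrete, the sketch below computes the quantity each approach is built on: the difference in means (the basis of the t-test) and the Mann-Whitney U statistic, which depends only on how the values are ordered. It is a minimal Python illustration with hypothetical data and a hypothetical helper name, and it stops at the statistics themselves rather than computing p-values.

```python
from statistics import mean


def mann_whitney_u(sample_a, sample_b):
    """U for sample_a: each pair with a > b counts 1, each tie counts 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical scores; the treated group contains one extreme value.
control = [3, 4, 5, 6]
treated = [7, 8, 9, 100]
treated_no_outlier = [7, 8, 9, 10]

u_treated = mann_whitney_u(treated, control)
u_no_outlier = mann_whitney_u(treated_no_outlier, control)
mean_diff = mean(treated) - mean(control)
mean_diff_no_outlier = mean(treated_no_outlier) - mean(control)

print("U with outlier:", u_treated, "| without:", u_no_outlier)
print("mean difference with outlier:", mean_diff, "| without:", mean_diff_no_outlier)
```

The extreme value changes the mean difference considerably but leaves U untouched, because every treated score still outranks every control score.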

Differences Between Dependent Groups

Compare two variables measured in the same sample

Parametric                     Nonparametric
t-test for dependent samples   Sign test
                               Wilcoxon's matched pairs test

If more than two variables are measured in same sample

Parametric                Nonparametric
Repeated measures ANOVA   Friedman's two way analysis of variance
                          Cochran Q

Relationships Between Variables

Parametric                Nonparametric
Correlation coefficient   Spearman R
                          Kendall Tau
                          Coefficient Gamma

Two variables of interest are categorical

Nonparametric
Chi-Square
Phi coefficient
Fisher exact test

Kendall coefficient of concordance
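
The difference between the parametric correlation coefficient and Spearman's rank-based version can be seen on a small data set. The Python sketch below uses hypothetical data and hypothetical helper names; the ranking helper assumes there are no tied values, which keeps the example short but is not true of real data in general.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product moment correlation of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def ranks(values):
    """Rank 1 = smallest value. ASSUMPTION: no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_r(x, y):
    """Spearman R: the Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical data: y always rises with x, but not in a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]

r_pearson = pearson_r(x, y)
r_spearman = spearman_r(x, y)

print("Pearson r:", round(r_pearson, 3))
print("Spearman R:", round(r_spearman, 3))
```

Because the ordering of y matches the ordering of x perfectly, Spearman R reaches its maximum, while Pearson r is pulled below 1 by the extreme final value.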

Parametric vs. Nonparametric

                         Parametric                  Nonparametric
Assumed Distribution     Normal                      Any
Assumed Variance         Homogeneous                 Any
Typical Data             Ratio or Interval           Ordinal or Nominal
Data Set Relationships   Independent                 Any
Usual Central Measure    Mean                        Median
Benefits                 Can draw more conclusions   Simplicity; Less affected by outliers

EXCEL ANALYSIS TOOLPAK

More Statistics Using Excel

To get more statistical functionality from Excel, you need to add in the Analysis ToolPak

Refer to the screenshots on the next few pages

I followed this process in my version of Excel 2007 and had no trouble adding the ToolPak

Analysis ToolPak

Allows you to conduct: Summary descriptive statistics

Correlation

Histograms

Rank & Percentile

Regression

z-tests

t-tests

ANOVAs

Add in the Analysis ToolPak

Click the Microsoft Office button, then Excel Options

Add in the Analysis ToolPak

Click Add-Ins. In the “Manage” box, select Excel Add-ins. Click “Go”

Add in the Analysis ToolPak

Click the checkbox for the Analysis ToolPak, then 'OK'

Install it if it is not installed

When you have added it in, it will appear on the 'Data' tab all the way on the right hand side of your screen

Questions?

Jenny Holcombe, PhD

UTC School of Nursing

[email protected]

(423) 425-5542
