
Exercise 1

Date post: 23-Oct-2014
Upload: noraini-ismail

1. Exploratory Data Analysis

A normality test checks whether the data meet the assumption that the population is normally distributed. SPSS provides two statistics:

(i) Kolmogorov-Smirnov

(ii) Shapiro-Wilk

Case Processing Summary

                     Valid             Missing           Total
            strata   N     Percent     N     Percent     N     Percent
totfinwell  urban    480   100.0%      0     .0%         480   100.0%
            rural    320   100.0%      0     .0%         320   100.0%

There are no missing data among the 800 cases, which are assumed to be randomly selected.

1.1 From the descriptives table below, several observations can be made:

(i) Mean

The mean is 75.53 for the urban group and 72.86 for the rural group.

(ii) Trimmed Mean

To obtain this value, SPSS removes the top and bottom 5 per cent of the cases and calculates a new mean value. Comparing the original mean with the trimmed mean shows whether extreme scores are having a strong influence on the mean. For the urban group the trimmed mean is 75.50, which is very similar to the mean (75.53); for the rural group it is 73.11 against a mean of 72.86. Extreme scores therefore have little influence on either mean.
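The trimming idea can be sketched in a few lines of Python. This is only an illustration: the function name `trimmed_mean` and the sample scores are made up, and SPSS's exact handling of fractional case counts may differ, so the sketch is not meant to reproduce the SPSS output.

```python
def trimmed_mean(scores, proportion=0.05):
    """Mean after dropping the lowest and highest `proportion` of cases.

    Assumption: the number of cases cut from each end is rounded down.
    """
    ordered = sorted(scores)
    cut = int(len(ordered) * proportion)      # cases removed from each tail
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)


# Hypothetical scores: 19 ordinary values and one extreme value.
sample_scores = list(range(1, 20)) + [500]

plain_mean = sum(sample_scores) / len(sample_scores)
robust_mean = trimmed_mean(sample_scores)

print(plain_mean)   # pulled upwards by the extreme score
print(robust_mean)  # much closer to the bulk of the data
```

A large gap between the two values, as in this made-up sample, would signal that extreme scores are driving the mean; in the exercise the two values are almost identical.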

(iii) Skewness

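Skew is the tilt of the distribution. Skewness should be within the +2 to -2 range when the data are normally distributed. In this case, skewness is -.079 for the urban group and -.314 for the rural group (the values .111 and .136 in the table are the standard errors), so both are within the range accepted as normally distributed.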


(iv) Kurtosis

Kurtosis, on the other hand, provides information about the ‘peakedness’

of the distribution. If the distribution is perfectly normal, we would obtain a

skewness and kurtosis value of 0 (rather an uncommon occurrence in the

social sciences). Positive skewness values indicate positive skew (scores

clustered to the left at the low values). Negative skewness values indicate

a clustering of scores at the high end (right-hand side of a graph). Positive

kurtosis values indicate that the distribution is rather peaked (clustered in

the centre), with long thin tails. Kurtosis values below 0 indicate a

distribution that is relatively flat (too many cases in the extremes). With

reasonably large samples, skewness will not ‘make a substantive

difference in the analysis’ (Tabachnick & Fidell 2007, p. 80). Kurtosis can

result in an underestimate of the variance, but this risk is also reduced with

a large sample (200+ cases: see Tabachnick & Fidell 2007, p. 80). In this case, the kurtosis values are .109 for the urban group and -.070 for the rural group (.222 and .272 are the standard errors), and the sample is large: 800 cases.

Descriptives: totfinwell

strata   Measure                                          Statistic   Std. Error
urban    Mean                                             75.53       .778
         95% Confidence Interval for Mean: Lower Bound    74.00
         95% Confidence Interval for Mean: Upper Bound    77.06
         5% Trimmed Mean                                  75.50
         Median                                           77.50
         Variance                                         290.425
         Std. Deviation                                   17.042
         Minimum                                          19
         Maximum                                          120
         Range                                            101
         Interquartile Range                              24
         Skewness                                         -.079       .111
         Kurtosis                                         .109        .222
rural    Mean                                             72.86       .997
         95% Confidence Interval for Mean: Lower Bound    70.90
         95% Confidence Interval for Mean: Upper Bound    74.82
         5% Trimmed Mean                                  73.11
         Median                                           75.00
         Variance                                         318.000
         Std. Deviation                                   17.833
         Minimum                                          16
         Maximum                                          120
         Range                                            104
         Interquartile Range                              26
         Skewness                                         -.314       .136
         Kurtosis                                         -.070       .272
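The Std. Error values reported for skewness and kurtosis depend only on the group size, which makes it easy to tell them apart from the statistics themselves. The sketch below assumes the commonly used formulas for these standard errors; the function names are illustrative and not part of SPSS.

```python
from math import sqrt


def se_skewness(n):
    """Standard error of skewness for a sample of n cases (assumed formula)."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of kurtosis for a sample of n cases (assumed formula)."""
    return 2 * se_skewness(n) * sqrt((n * n - 1) / ((n - 3) * (n + 5)))


for group, n in (("urban", 480), ("rural", 320)):
    print(group, round(se_skewness(n), 3), round(se_kurtosis(n), 3))
# urban 0.111 0.222
# rural 0.136 0.272
```

The results match the Std. Error column of the Descriptives table, which confirms that .111, .136, .222 and .272 are standard errors rather than skewness or kurtosis values.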

1.2 Kolmogorov-Smirnov and Shapiro-Wilk statistic

Tests of Normality

                     Kolmogorov-Smirnov(a)        Shapiro-Wilk
            strata   Statistic   df    Sig.       Statistic   df    Sig.
totfinwell  urban    .058        480   .001       .994        480   .048
            rural    .064        320   .003       .988        320   .011

a. Lilliefors Significance Correction


The Kolmogorov-Smirnov and Shapiro-Wilk statistics assess the normality of the distribution of scores. A non-significant result (Sig. value of more than .05) indicates normality.

In this exercise, the Sig. values are .001 and .003 (Kolmogorov-Smirnov) and .048 and .011 (Shapiro-Wilk) for the urban and rural groups respectively. All four values are below .05, so the null hypothesis of normality is rejected: the tests suggest that the data are not normally distributed. This is quite common in larger samples.
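The decision rule can be written out explicitly. The sketch below only applies the .05 cut-off to the Sig. values reported in the Tests of Normality table; it does not recompute the tests, and the names `ALPHA` and `rejects_normality` are illustrative.

```python
ALPHA = 0.05  # conventional significance level used in this exercise

# Sig. values copied from the Tests of Normality table.
sig_values = {
    ("urban", "Kolmogorov-Smirnov"): 0.001,
    ("urban", "Shapiro-Wilk"): 0.048,
    ("rural", "Kolmogorov-Smirnov"): 0.003,
    ("rural", "Shapiro-Wilk"): 0.011,
}


def rejects_normality(sig, alpha=ALPHA):
    """True when the test result is significant, i.e. normality is rejected."""
    return sig < alpha


decisions = {key: rejects_normality(sig) for key, sig in sig_values.items()}

for (group, test), rejected in decisions.items():
    verdict = "reject normality" if rejected else "retain normality"
    print(f"{group:5s} {test:18s} Sig. = {sig_values[(group, test)]:.3f} -> {verdict}")
```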

1.3 Histograms


Histograms are used to display the distribution of a single continuous variable.

Inspection of the shape of the histogram provides information about the distribution

of scores on the continuous variable. Many of the statistics discussed in this manual

assume that the scores on each of the variables are normally distributed (i.e. follow

the shape of the normal curve). In this exercise, the scores are reasonably normally

distributed, with most scores occurring in the centre, tapering out towards the

extremes. It is quite common in the social sciences, however, to find that variables

are not normally distributed. Scores may be skewed to the left or right or,

alternatively, arranged in a rectangular shape. The actual shape of the distribution

for each group can be seen in the Histograms.

1.4 Normal Q-Q Plot

In this exercise, scores appear to be reasonably normally distributed. This is also

supported by an inspection of the normal probability plots (labelled Normal Q-Q

Plot). In this plot, the observed value for each score is plotted against the expected

value from the normal distribution. A reasonably straight line suggests a normal

distribution.


1.5 Detrended Normal Q-Q Plots

The Detrended Normal Q-Q Plots are obtained by plotting the actual deviation

of the scores from the straight line. There should be no real clustering of points, with

most collecting around the zero line.
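How the points of both plots are built can be sketched as follows. This is a minimal illustration, assuming Blom's plotting positions and a normal distribution fitted with the sample mean and standard deviation; the function name `qq_points` and the sample scores are hypothetical, and SPSS's own scaling of the expected values may differ.

```python
from statistics import NormalDist, mean, stdev


def qq_points(scores):
    """Return (observed, expected, deviation) triples for a normal Q-Q plot.

    `deviation` is observed minus expected, which is what a detrended
    Q-Q plot shows on its vertical axis.
    """
    ordered = sorted(scores)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    points = []
    for rank, observed in enumerate(ordered, start=1):
        p = (rank - 0.375) / (n + 0.25)   # Blom's plotting position (assumed)
        expected = fitted.inv_cdf(p)
        points.append((observed, expected, observed - expected))
    return points


# Hypothetical scores, for illustration only.
for observed, expected, deviation in qq_points([61, 70, 75, 78, 80, 84, 95]):
    print(f"{observed:5.1f} {expected:7.2f} {deviation:7.2f}")
```

Points whose deviations stay close to zero with no systematic pattern correspond to the "reasonably straight line" in the Normal Q-Q Plot.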


1.6 Boxplots

The final plot that is provided in the output is a boxplot of the distribution of

scores for the two groups. The rectangle represents 50 per cent of the cases, with

the whiskers (the lines protruding from the box) going out to the smallest and

largest values. Sometimes we will see additional circles outside this range—these

are classified by SPSS as outliers. The line inside the rectangle is the median value.

Any scores that SPSS considers are outliers appear as little circles with a number

attached (this is the ID number of the case). SPSS defines points as outliers if they

extend more than 1.5 box-lengths from the edge of the box. Extreme points

(indicated with an asterisk, *) are those that extend more than three box-lengths from

the edge of the box. In this exercise there are three outliers (two for urban and one for rural): ID numbers 550 and 686, and 711 for the rural sample.
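The 1.5 and three box-length rules can be sketched in Python. This is an approximation under stated assumptions: the quartiles come from `statistics.quantiles` with the inclusive method, whereas SPSS boxplots are based on its own hinge calculation, so borderline cases may be classified differently. The function name and the scores are hypothetical.

```python
import statistics


def classify_boxplot_points(scores):
    """Split scores into outliers (> 1.5 box-lengths) and extremes (> 3)."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    box_length = q3 - q1
    outliers, extremes = [], []
    for score in scores:
        # Distance beyond the nearer edge of the box (negative if inside it).
        distance = max(q1 - score, score - q3)
        if distance > 3 * box_length:
            extremes.append(score)
        elif distance > 1.5 * box_length:
            outliers.append(score)
    return outliers, extremes


# Hypothetical scores, for illustration only.
outliers, extremes = classify_boxplot_points([10, 12, 13, 14, 15, 16, 17, 18, 30, 60])
print(outliers)  # [30]
print(extremes)  # [60]
```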

Boxplots are useful when we wish to compare the distribution of scores on variables.

We can use them to explore the distribution of one continuous variable (e.g. positive

affect) or, alternatively, we can ask for scores to be broken down for different groups

(e.g. age groups). We can also add an extra categorical variable to compare (e.g.

males and females).

In the exercise presented below, the distribution of scores of total financial wellbeing

between urban and rural population is very similar.


1.7 Interpretation and Conclusion

This exercise tests the normality of the data on total financial wellbeing for the urban and rural populations. Because the sample is large (800 cases), we cannot rely on the Descriptives table and the KS and SW tests alone; we also have to look at the Histograms, Normal Q-Q Plots, Detrended Normal Q-Q Plots and Boxplots.

In the histograms, look at the tails of the distribution. Hardly any data points sit on their own out at the extremes, which suggests that there are few potential outliers. Furthermore, the scores drop away in a reasonably even slope.

In the Normal Q-Q Plots, the observed value for each score is plotted against the

expected value from the normal distribution. We see a reasonably straight line which

suggests a normal distribution of the data.

The same is observed in the Detrended Normal Q-Q Plots, which show the actual deviation of the scores from the straight line. There is no real clustering of points, with most collecting around the zero line. This also indicates the normality of the data.

In the Boxplots, the median line is in the middle of the box for the rural population and almost in the middle for the urban population, which also indicates that the assumption of normality is not violated.

To conclude, considering the large sample size and the observations from the plots above, the data fulfil the assumption of normality.


2. Assumptions for t-test

T-tests are used when we have two groups (e.g. males and females) or two sets of

data (before and after), and we wish to compare the mean score on some

continuous variable. There are two main types of t-tests:

(i) Paired sample t-tests

(also called repeated measures) are used when we are interested in changes

in scores for participants tested at Time 1, and then again at Time 2 (often

after some intervention or event). The samples are ‘related’ because they are

the same people tested each time.

(ii) Independent sample t-tests

are used when we have two different (independent) groups of people (males

and females), and we are interested in comparing their scores. In this case,

we collect information on only one occasion but from two different sets of

people.

2.1 Independent-samples t-test

The independent-samples t-test compares the means of two groups on a single interval or ratio variable.

Example of research question:

Is the urban population more financially satisfied than the rural population?

This exercise uses a categorical independent variable with only two groups (strata: urban/rural) and one continuous dependent variable (totfinwell). Respondents can belong to only one group.


2.1.1 Assumptions

(i) Level of measurement

- the DV is continuous (interval or ratio data)

(ii) Random sampling

- assuming that the data are obtained using a random sample from the population

(iii) Independence of observations

- the observations must be independent of one another, i.e. not influenced by any other observation or measurement

(iv) Normal distribution

- assuming that the populations from which the samples are taken are normally distributed

(v) Homogeneity of variance

- assuming that the samples are taken from populations of equal variance

2.2 Hypothesis

Null hypothesis (H0): there is no difference between the mean financial wellbeing of urban and rural residents.

Alternate hypothesis (HA): there is a difference between the mean financial wellbeing of urban and rural residents.

2.3 IV and DV

The IV is a categorical variable (strata: urban and rural area), and the DV is a continuous variable (financial wellbeing, totfinwell).


2.4 The result

Group Statistics

            strata   N     Mean    Std. Deviation   Std. Error Mean
totfinwell  urban    480   75.53   17.042           .778
            rural    320   72.86   17.833           .997

Independent Samples Test: totfinwell

                                              Equal variances   Equal variances
                                              assumed           not assumed
Levene's Test for Equality of Variances
  F                                           1.474
  Sig.                                        .225
t-test for Equality of Means
  t                                           2.132             2.112
  df                                          798               662.218
  Sig. (2-tailed)                             .033              .035
  Mean Difference                             2.671             2.671
  Std. Error Difference                       1.253             1.264
  95% Confidence Interval of the Difference
    Lower                                     .211              .188
    Upper                                     5.130             5.154
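The t statistics, degrees of freedom and standard errors in the Independent Samples Test table can be checked from the Group Statistics alone. The sketch below uses the standard pooled-variance and Welch formulas; the function names are illustrative. Because the means are rounded to two decimals, the results agree with the SPSS output only to about two decimal places.

```python
from math import sqrt


def pooled_t(m1, s1, n1, m2, s2, n2):
    """t, df and standard error with equal variances assumed."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, n1 + n2 - 2, se


def welch_t(m1, s1, n1, m2, s2, n2):
    """t, df and standard error with equal variances not assumed."""
    a, b = s1**2 / n1, s2**2 / n2
    se = sqrt(a + b)
    df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return (m1 - m2) / se, df, se


# Values from the Group Statistics table (urban first, then rural).
urban = (75.53, 17.042, 480)
rural = (72.86, 17.833, 320)

t_pooled, df_pooled, se_pooled = pooled_t(*urban, *rural)
t_welch, df_welch, se_welch = welch_t(*urban, *rural)

print(f"equal variances assumed:     t = {t_pooled:.2f}, df = {df_pooled}, SE = {se_pooled:.3f}")
print(f"equal variances not assumed: t = {t_welch:.2f}, df = {df_welch:.1f}, SE = {se_welch:.3f}")
```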

2.4 Interpretation of the result

2.4.1 Check assumptions using table of Independent Samples Test

- Result of Levene’s test for equality of variances

Levene’s test is not significant: Sig. = 0.225 > 0.05. Equal variances can therefore be assumed, and the analysis uses the t-test result in the first row (equal variances assumed): t = 2.132 and p = 0.033.

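2.4.2 The t-test result shows p = 0.033 < 0.05, so the null hypothesis is rejected. (The value 0.225 is the Sig. of Levene’s test, not of the t-test.)

Therefore, it can be concluded that there is a significant difference in the mean scores of financial wellbeing between urban residents (M = 75.53, SD = 17.042) and rural residents (M = 72.86, SD = 17.833); t(798) = 2.132, p = 0.033 (two-tailed).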

2.4.3 The effect size (eta squared) can be calculated:

\[
\eta^2 = \frac{t^2}{t^2 + (N_1 + N_2 - 2)} = \frac{(2.132)^2}{(2.132)^2 + (480 + 320 - 2)} = \frac{4.545}{802.545} \approx 0.0057
\]

The effect size, or magnitude of the difference, is very small. Only about 0.57 percent of the variance (effect size x 100) in financial wellbeing is explained by residential area. The strength of the difference is very low.
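Eta squared for an independent-samples t-test is t squared divided by the sum of t squared and the degrees of freedom (N1 + N2 - 2). A quick check of the arithmetic in Python, with an illustrative function name and the values reported in this exercise:

```python
def eta_squared(t, n1, n2):
    """Eta squared for an independent-samples t-test."""
    return t**2 / (t**2 + (n1 + n2 - 2))


eta = eta_squared(t=2.132, n1=480, n2=320)
print(f"eta squared = {eta:.4f} ({eta * 100:.2f}% of the variance)")
# eta squared = 0.0057 (0.57% of the variance)
```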
