Extra Exercises Basic Statistics:
Exercise 1:
Data results.txt (see Results on web-page: Goodyear)
This dataset contains the results of the students from the following study
disciplines some years ago: Chemistry, Biology and Geography. The
variables are as follows:
length : height of the student in cm;
gender : Male (M) or Female (F);
high_school : study results of the last year in high school (in
percentages);
bachelor : study results of the first bachelor year (in percentages);
study_direction : Chemistry (Ch), Biology (B) or Geography (G);
color : preferred car color: Light (L), Dark (D) or Red (R).
Check if the bachelor score is significantly higher than the high school score.
Exercise 2:
Data chol.txt
The data contains information about the cholesterol level of 200 persons.
AGE : age of a person;
HEIGHT : height of a body;
WEIGHT : weight of a body;
CHOL : cholesterol level;
SMOKE : nosmo/pipe/sigare;
BLOOD : blood group a/ab/b/o
MORT : alive/dead
and other variables.
a. Make a box plot of the cholesterol level for the smoker and
non-smoker groups.
b. Check if the average cholesterol level of the smokers is significantly
different from that of the non-smokers:
H0: µsm = µnon-sm vs. H1: µsm ≠ µnon-sm
Solution:
Exercise 1:
H0: µbach = µh_sch vs. H1: µbach > µh_sch
Step 1:
Create a new variable D = bachelor - high_school
Step 2: Reformulate the hypothesis
H0: µD = 0 vs. H1: µD > 0
Step 3: Check the normality of the variable D:
In the data window:
Analyze > Distribution: Select Y, Columns: D
In the D menu: Continuous Fit > Normal
In the D menu: Normal Quantile Plot
In the Fitted Normal menu: Goodness of Fit
The histogram and the Q-Q plot show no departure from normality.
The Shapiro-Wilk test gives p = 0.7028 > 0.05. Hence, we have no
reason to reject normality.
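For readers working outside JMP, the normality check in Step 3 can be sketched in Python. This is only an illustration under stated assumptions: the sample `d` is simulated and stands in for D, it is not the results.txt data, and SciPy is assumed to be installed.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for D = bachelor - high_school
# (simulated values, NOT the real results.txt data).
rng = np.random.default_rng(0)
d = rng.normal(loc=-5.0, scale=8.0, size=60)

# Shapiro-Wilk test: H0 says the sample comes from a normal distribution.
w_stat, p_value = stats.shapiro(d)

alpha = 0.05
reject_normality = p_value < alpha
print(f"W = {w_stat:.4f}, p = {p_value:.4f}, reject normality: {reject_normality}")
```

A p-value above 0.05, as in the JMP output above, means normality is not rejected; it does not prove that the data are normal.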
Step 4:
Since D is approximately normal, we can apply a paired t-test (that is,
a one-sample t-test on the differences) to test the significance of the
mean:
In the data window:
Analyze > Matched Pairs
JMP computes the difference as high_school - bachelor, which is -D.
Hence, we reformulate H1 in terms of this difference:
H1: µh_sch < µbach or, equivalently, H1: µ-D < 0, where
-D = high_school - bachelor.
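The corresponding p-value is then the one labelled “Prob < t”, which
equals 1.00. Hence, we do not reject H0: there is no evidence that the
bachelor score is higher than the high school score.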
Remark: the same result is obtained by applying a one-sample t-test to
the difference D. Here, we test H0: µD = 0 vs. H1: µD > 0, as originally
formulated.
In the Distribution window:
In the D menu: Test Mean
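The equivalence between the paired test and the one-sample test on D, and the effect of the order of subtraction, can be checked with a short Python sketch. The scores below are invented for illustration and are not the results.txt data; the `alternative` argument assumes SciPy 1.6 or later.

```python
import numpy as np
from scipy import stats

# Hypothetical paired scores in percent (NOT the real results.txt data).
high_school = np.array([72.0, 65.0, 80.0, 58.0, 69.0, 75.0, 62.0, 70.0])
bachelor = np.array([64.0, 60.0, 71.0, 55.0, 70.0, 66.0, 57.0, 61.0])

d = bachelor - high_school  # D as defined in Step 1

# One-sample t-test on D: H0: mu_D = 0 vs. H1: mu_D > 0.
one_sample = stats.ttest_1samp(d, popmean=0.0, alternative="greater")

# The same test in paired form, with the difference bachelor - high_school.
paired = stats.ttest_rel(bachelor, high_school, alternative="greater")

# Reversed order (high_school - bachelor), as in the JMP output: the
# t statistic changes sign, so the matching one-sided alternative is "less".
reversed_pair = stats.ttest_rel(high_school, bachelor, alternative="less")

print(one_sample.pvalue, paired.pvalue, reversed_pair.pvalue)
```

All three calls return the same p-value. With these invented scores the mean of D is negative, so the one-sided p-value is above 0.5, which mirrors the situation in the JMP output.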
Exercise 2:
(a.)
Step 1: Create a variable Sm_Status with 2 levels: Smoker/Non-Smoker:
Make a new column Sm_Status .
In the variable window:
Column Properties > Formula
Edit Formula
In the formula window:
Functions (grouped) > Conditional: select Match
Table Columns: select Smoke
Bottom: ^ (= insert)
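The recoding that the Match formula performs can be sketched in Python as a simple lookup. The mapping is an assumption inferred from the variable list (nosmo as non-smoker, pipe and sigare as smokers), and the names `SMOKE_TO_STATUS` and `smoking_status` are hypothetical.

```python
# Assumed coding, inferred from the variable list: "nosmo" is a non-smoker,
# "pipe" and "sigare" are smokers.
SMOKE_TO_STATUS = {
    "nosmo": "Non-Smoker",
    "pipe": "Smoker",
    "sigare": "Smoker",
}


def smoking_status(smoke):
    """Map a SMOKE code to the two-level Sm_Status value."""
    return SMOKE_TO_STATUS[smoke.strip().lower()]


# Hypothetical SMOKE column (NOT the real chol.txt data).
smoke_column = ["nosmo", "pipe", "sigare", "sigare", "nosmo"]
sm_status = [smoking_status(s) for s in smoke_column]
print(sm_status)
```

An unknown code raises a KeyError instead of being silently assigned to a group, which makes typing errors in the SMOKE column visible.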
Make a grouped box plot:
Analyze > Fit Y by X: Y, Response: Chol; X, Factor: Sm_Status
In the Oneway menu: Display Options > Box Plots
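The numbers that a grouped box plot displays (minimum, quartiles, maximum) can be computed per group with the Python standard library. This is a sketch on invented values, not the chol.txt data, and the quartile rule used here ("inclusive") may differ slightly from the one JMP applies.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a box plot displays."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical cholesterol values per group (NOT the real chol.txt data).
chol_by_status = {
    "Smoker": [180, 195, 210, 220, 235, 250, 290, 340],
    "Non-Smoker": [170, 185, 200, 205, 215, 230],
}

summaries = {g: five_number_summary(v) for g, v in chol_by_status.items()}
for group, summary in summaries.items():
    print(group, summary)
```

Comparing the distance from the median to Q3 with the distance from Q1 to the median gives a first impression of skewness before any formal check.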
(b.)
Step 1:
Make a bar chart to get an idea of the sample sizes:
Graph > Chart: Statistics: N(Sm_Status)
The data contain 49 non-smokers and 151 smokers.
Both sample sizes are larger than 30, but the groups are very unequal
in size. In this case, if the distributions are skewed, a t-test is not
suited for comparing the means. Hence, we check normality.
Step 2: H0: µsm = µnon-sm vs. H1: µsm ≠ µnon-sm
First, split the column:
Tables > Split: Split Columns: Chol; Split By: Sm_Status
Then, test normality in each group:
The distribution in the smokers group is skewed to the right. We try to
transform the data to improve normality, starting with a square root
transformation:
After the transformation the normality is satisfactory. Hence, we will
apply a t-test to the transformed data.
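Why a square root can help with right skewness is easy to see on a small example. The sketch below uses made-up values (perfect squares, not the chol.txt data) and a hypothetical helper `sample_skewness` that computes the moment-based skewness.

```python
import math


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (0 for symmetric data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up right-skewed values (NOT the real chol.txt data): perfect squares,
# chosen so that their square roots are evenly spaced.
raw = [float(k * k) for k in range(10, 21)]
transformed = [math.sqrt(v) for v in raw]

print(sample_skewness(raw), sample_skewness(transformed))
```

The raw values have positive skewness, while their square roots are symmetric. Real data will rarely become perfectly symmetric, so the normality check has to be repeated after the transformation, as done above.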
Step 3: Transform Chol data to sqrt(Chol): Sqrt_Chol.
Now, we will test the following:
H0: µsqrt_sm = µsqrt_non-sm vs. H1: µsqrt_sm ≠ µsqrt_non-sm.
We can reach a conclusion only about the equality of the means of the
transformed measurements of the samples.
Step 4: Check the equality of the variances.
H0: σsqrt_sm = σsqrt_non-sm
Analyze > Fit Y by X
In the Oneway window: Unequal Variances
Since p = 0.0126 < 0.05, we reject the equality of the variances. We
will therefore apply a t-test for unequal variances.
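A comparable check of equal variances can be sketched in Python. The source does not say which of the tests in the JMP report produced p = 0.0126, so the choice below (Levene's test centered at the median, also known as the Brown-Forsythe variant) is an assumption. The two samples are simulated with the group sizes from the text, 151 and 49; they are not the chol.txt data.

```python
import numpy as np
from scipy import stats

# Simulated square-root cholesterol values (NOT the real chol.txt data),
# using the group sizes reported in the text.
rng = np.random.default_rng(1)
sqrt_smokers = rng.normal(loc=15.0, scale=2.0, size=151)
sqrt_non_smokers = rng.normal(loc=15.0, scale=1.0, size=49)

# H0: both groups have the same variance.
# center="median" gives the Brown-Forsythe variant of Levene's test.
stat, p_value = stats.levene(sqrt_smokers, sqrt_non_smokers, center="median")

alpha = 0.05
assume_equal_var = p_value >= alpha
print(f"W = {stat:.3f}, p = {p_value:.4f}, assume equal variances: {assume_equal_var}")
```

The resulting flag can be passed on to the mean comparison to choose between the pooled and the unequal-variance t-test.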
Step 5: t-test for unequal variances
Since p = 0.1337 > 0.05, we do not reject H0: µsqrt_sm = µsqrt_non-sm.
The mean square-root-transformed cholesterol levels of the smokers and
the non-smokers are not significantly different.
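The t-test for unequal variances (Welch's test) from Step 5 can be sketched in Python as follows. The samples are simulated, not the chol.txt data, and the manual computation is included only to show what the library call does: the Welch statistic with Welch-Satterthwaite degrees of freedom.

```python
import numpy as np
from scipy import stats

# Simulated square-root cholesterol values (NOT the real chol.txt data).
rng = np.random.default_rng(2)
sqrt_smokers = rng.normal(loc=15.2, scale=2.0, size=151)
sqrt_non_smokers = rng.normal(loc=15.0, scale=1.0, size=49)

# Two-sided Welch t-test: H0: equal means, variances not assumed equal.
welch = stats.ttest_ind(sqrt_smokers, sqrt_non_smokers, equal_var=False)

# The same test by hand.
n1, n2 = sqrt_smokers.size, sqrt_non_smokers.size
v1 = sqrt_smokers.var(ddof=1) / n1  # squared standard error, group 1
v2 = sqrt_non_smokers.var(ddof=1) / n2  # squared standard error, group 2
t_manual = (sqrt_smokers.mean() - sqrt_non_smokers.mean()) / np.sqrt(v1 + v2)
df_manual = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
p_manual = 2 * stats.t.sf(abs(t_manual), df_manual)

print(f"t = {welch.statistic:.4f}, p = {welch.pvalue:.4f}")
```

As in the text, any conclusion applies to the means of the transformed values, not to the means on the original cholesterol scale.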