Biostatistics, statistical software VI.
Relationship between two continuous variables: correlation, linear regression, transformations.
Relationship between two discrete variables: contingency tables, test for independence.
Krisztina Boda PhD
Department of Medical Informatics, University of Szeged
INTERREG
Krisztina Boda
Relationship between two continuous variables: correlation, linear regression, transformations.
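Imagine that 6 students are given a battery of tests by a vocational guidance counsellor, with the results shown in the following table:

Name   Retail   Theater   Math   Language
Pat      51       30      525      550
Sue      55       60      515      535
Inez     58       90      510      535
Amie     63       50      495      520
Gene     85       30      430      455
Bob      95       90      400      420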
Variables measured on the same individuals are often related to each other.
Let us draw a graph called a scattergram (scatterplot) to investigate relationships.
Scatterplots show the relationship between two quantitative variables measured on the same cases.
In a scatterplot, we look for the direction, form, and strength of the relationship between the variables. The simplest relationship is linear in form and reasonably strong.
Scatterplots also reveal deviations from the overall pattern.
Creating a scatterplot
When one variable in a scatterplot explains or predicts the other, place it on the x-axis.
Place the variable that responds to the predictor on the y-axis.
If neither variable explains or responds to the other, it does not matter which axis you assign each to.
Possible relationships
[Figure: three scatterplots of the students' scores. Positive correlation: language score against math score. Negative correlation: retailing score against math score. No correlation: theater score against math score.]
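Describing a linear relationship with a number: the coefficient of correlation
Correlation is a numerical measure of the strength of a linear association. The formula for the coefficient of correlation treats x and y identically: there is no distinction between explanatory and response variable.
Let us denote the two samples by $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$. The coefficient of correlation can be computed according to the following formula:
$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \, \sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
where $\bar{x}$ and $\bar{y}$ are the sample means.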
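The definition of r can be tried out directly on the six students' scores. The following is a minimal sketch in plain Python; the function name `pearson_r` and the variable names are illustrative choices, not part of the lecture, and the scores are the ones read from the example data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Scores of the six students (Pat, Sue, Inez, Amie, Gene, Bob)
math_score = [525, 515, 510, 495, 430, 400]
language = [550, 535, 535, 520, 455, 420]
theater = [30, 60, 90, 50, 30, 90]

r_language = pearson_r(math_score, language)
r_theater = pearson_r(math_score, theater)
print(round(r_language, 4), round(r_theater, 4))  # 0.9989 -0.2157
```

Swapping the two arguments gives the same value, which reflects that the formula treats x and y identically.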
Properties of r
Correlations are always between -1 and +1: $-1 \le r \le 1$. Either extreme indicates a perfect linear association.
a) If r is near +1 or -1, we say that we have high correlation.
b) If r = 1, we say that there is perfect positive correlation. If r = -1, we say that there is perfect negative correlation.
c) A correlation of zero indicates the absence of linear association. When there is no tendency for the points to lie on a straight line, we say that there is no correlation (r = 0) or that we have low correlation (r is near 0).
[Figure: scatterplots of theater, language, and retailing scores against math score]
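Effect of outliers
Even a single outlier can change the correlation substantially. Outliers can create an apparently strong correlation where none would be found otherwise, or hide a strong correlation by making it appear weak.
Example: a seventh student, Joe (theater 160, math 800, language 410), is added to the data.
Math and theater scores: r = -0.22 without the outlier, r = 0.74 with it.
Math and language scores: r = 0.999 without the outlier, r = -0.26 with it.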
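The effect of a single outlier can be reproduced numerically. This is a sketch under the assumption that the outlier is the seventh row of the example data (Joe: theater 160, math 800, language 410); the helper `pearson_r` and the variable names are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

math_score = [525, 515, 510, 495, 430, 400]
theater = [30, 60, 90, 50, 30, 90]
language = [550, 535, 535, 520, 455, 420]
joe = {"math": 800, "theater": 160, "language": 410}  # the outlier

# Correlations without and with the outlier appended to each sample
r_theater = pearson_r(math_score, theater)
r_theater_joe = pearson_r(math_score + [joe["math"]], theater + [joe["theater"]])
r_language = pearson_r(math_score, language)
r_language_joe = pearson_r(math_score + [joe["math"]], language + [joe["language"]])

print(f"math vs. theater:  {r_theater:.2f} -> {r_theater_joe:.2f}")
print(f"math vs. language: {r_language:.3f} -> {r_language_joe:.2f}")
```

One added point turns a weak negative correlation into an apparently strong positive one, and an almost perfect positive correlation into a weak negative one.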
[Figure: four scatterplots, theater score against math score and language score against math score, each without and with the outlier]
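Two variables may be closely related and still have a small correlation if the form of the relationship is not linear.
Examples: for points on the parabola y = x² with x from -3 to 3, r = 2.8E-15 (zero up to rounding error); for points on an arc of a sine curve, r = 0.157.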
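A quick numerical check of the parabola case: y is determined exactly by x, yet the linear correlation vanishes. The grid of x values (from -3 to 3 in steps of 0.2) follows the example data; the helper `pearson_r` is an illustrative name.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# x = -3.0, -2.8, ..., 3.0 (31 points), y = x squared
x = [(k - 15) / 5 for k in range(31)]
y = [xi ** 2 for xi in x]

r_parabola = pearson_r(x, y)
print(r_parabola)  # zero up to floating-point rounding error
```

Because the x values are symmetric around 0, the positive and negative cross-products cancel, so r measures no linear trend even though the relationship is perfect.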
[Figure: scatterplots of the two nonlinear examples, a sine curve and the parabola y = x²]
Correlation and causation
A correlation between two variables does not show that one causes the other. Causation is a subtle concept, best demonstrated statistically by designed experiments.
Correlation by eye
The applet at http://onlinestatbook.com/stat_sim/reg_by_eye/index.html lets you estimate the regression line and guess the value of Pearson's correlation. Five possible values of Pearson's correlation are listed; one of them is the correlation for the data displayed in the scatterplot. Guess which one it is. To see the correct value, click on the "Show r" button.
When is a correlation high?
What is considered to be high correlation varies with the field of application. The statistician must decide when a sample value of r is sufficiently far from zero to reflect a real correlation in the population.
Testing the significance of the coefficient of correlation
The statistician must decide when a sample value of r is far enough from zero to be significant, that is, when it is sufficiently far from zero to reflect the correlation in the population.
H0: ρ = 0 (Greek rho; the correlation coefficient in the population is 0)
Ha: ρ ≠ 0 (the correlation coefficient in the population is different from 0)
This test can be carried out by expressing the t statistic in terms of r. The following t statistic has n-2 degrees of freedom:
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
Decision using a statistical table: if $|t| > t_{\alpha,n-2}$, the difference is significant at level α; we reject H0 and state that the population correlation coefficient is different from 0. If $|t| \le t_{\alpha,n-2}$, we do not reject H0.
For example, with n = 6 pairs (4 degrees of freedom) the critical value at the 5% level is $t_{0.05,4} = 2.776$: if |t| > 2.776, we reject H0 and claim that there is a significant linear correlation between the two variables at the 5% level.
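The test can be sketched in a few lines of Python. This is only an illustration: the function name `correlation_t` is hypothetical, the critical value 2.776 is the tabulated two-sided 5% value for 4 degrees of freedom, and the two r values are the rounded correlations of the six students' scores.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: rho = 0; it has n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

T_CRITICAL = 2.776  # t table, 5% level, 4 degrees of freedom (n = 6)

for label, r in [("math vs. language", 0.9989), ("math vs. theater", -0.2157)]:
    t = correlation_t(r, n=6)
    decision = "reject H0" if abs(t) > T_CRITICAL else "do not reject H0"
    print(f"{label}: r = {r}, t = {t:.4f}, {decision}")
```

For other sample sizes the critical value has to be looked up again, because it depends on the degrees of freedom n - 2.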
Example 2., cont.
Example 3.
The correlation coefficient between math skill and theater skill was found to be r = -0.2157. Is it significantly different from 0?
H0: the correlation coefficient in the population is 0, ρ = 0.
Ha: the correlation coefficient in the population is different from 0.
Let's compute the test statistic:
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{-0.2157\sqrt{4}}{\sqrt{1-0.2157^2}} = -0.4418$$
Degrees of freedom: df = 6-2 = 4.
The critical value in the table is $t_{0.05,4} = 2.776$. Because |-0.4418| = 0.4418 < 2.776, we do not reject H0: there is no significant linear correlation between the two variables at the 5% level.
Example 3., cont.
Prediction based on linear correlation: the linear regression
When the form of the relationship in a scatterplot is linear, we usually want to describe that linear form more precisely with numbers. We can rarely hope to find data values lined up perfectly, so we fit lines to scatterplots with a method that compromises among the data values. This method is called the method of least squares. The key to finding, understanding, and using least squares lines is an understanding of their failures to fit the data: the residuals.
A straight line that best fits the data, y = bx + a, is called the regression line.
Geometrical meaning of a and b:
b: the regression coefficient, the slope of the best-fitting line (regression line);
a: the y-intercept of the regression line.
Equation of the regression line for the data of Example 1:
y = 1.016x + 15.5
The slope of the line is 1.016.
Prediction based on the equation: what is the predicted language score for a student with 400 points in math?
y_predicted = 1.016 · 400 + 15.5 = 421.9
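The slope and intercept can be recomputed from the six students' math and language scores. This is a minimal sketch using the usual least squares formulas; the function name `least_squares_line` is an illustrative choice.

```python
def least_squares_line(x, y):
    """Return (a, b) of the least squares line y = b*x + a."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: sum of cross-products over sum of squares of x deviations
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x  # the line passes through the point of means
    return a, b

math_score = [525, 515, 510, 495, 430, 400]
language = [550, 535, 535, 520, 455, 420]

a, b = least_squares_line(math_score, language)
predicted = b * 400 + a
print(f"y = {b:.3f}x + {a:.1f}")
print(f"predicted language score at math = 400: {predicted:.1f}")
```

With unrounded coefficients the prediction is about 422.0; the value 421.9 comes from using the coefficients rounded to 1.016 and 15.5.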
How to get the formula for the line which is used to get the best point estimates
The general equation of a line is y = a + b x. We would like to find the values of a and b in such a way that the resulting line is the best-fitting line. Let's suppose we have n pairs of measurements $(x_i, y_i)$, where $x_i$ is the independent variable. We will approximate $y_i$ by the value of the line at $x_i$, that is, by $a + b x_i$. The approximation is good if the differences
$$y_i - (a + b x_i)$$
are small. These differences can be positive or negative, so let's square them and sum:
$$S(a,b) = \sum_{i=1}^{n} \bigl(y_i - (a + b x_i)\bigr)^2$$
This is a function of the unknown parameters a and b, also called the sum of squared residuals. To determine a and b, we have to find the minimum of S(a,b). To do so, we take the partial derivatives of S and solve the equations
$$\frac{\partial S}{\partial a} = -2\sum_{i=1}^{n}\bigl(y_i - (a + b x_i)\bigr) = 0, \qquad \frac{\partial S}{\partial b} = -2\sum_{i=1}^{n} x_i\bigl(y_i - (a + b x_i)\bigr) = 0$$
The solution of this system of equations gives the formulas for b and a:
$$b = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}$$
and
$$a = \bar{y} - b\,\bar{x}$$
It can be shown, using the second derivatives, that this is indeed a minimum.
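The second-derivative check can be written out as follows. This is a sketch of the standard argument, with S(a,b) the sum of squared residuals of the line a + b x; the symbol H for the matrix of second derivatives is introduced here for illustration.

```latex
\[
\frac{\partial^2 S}{\partial a^2} = 2n, \qquad
\frac{\partial^2 S}{\partial a\,\partial b} = 2\sum_{i=1}^{n} x_i, \qquad
\frac{\partial^2 S}{\partial b^2} = 2\sum_{i=1}^{n} x_i^2
\]
\[
\det H
  = 4\Bigl(n\sum_{i=1}^{n} x_i^2 - \Bigl(\sum_{i=1}^{n} x_i\Bigr)^{2}\Bigr)
  = 4n\sum_{i=1}^{n}(x_i-\bar{x})^2 > 0
\]
```

Since $2n > 0$ and the determinant is positive whenever the $x_i$ are not all equal, the matrix of second derivatives is positive definite, so the stationary point is a minimum.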
Computation of the correlation coefficient from the regression coefficient
There is a relationship between the correlation coefficient and the regression coefficient:
$$r = b\,\frac{s_x}{s_y}$$
where $s_x$ and $s_y$ are the standard deviations of the samples. From this relationship it can be seen that r and b have the same sign: if there is a negative correlation between the variables, the slope of the regression line is also negative. It can be shown that the same t-test can be used to test the significance of r and the significance of b.
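The relationship between r and b can be verified on the math and language scores. The sketch below uses the sample standard deviation from Python's `statistics` module; the variable names are illustrative.

```python
import statistics

math_score = [525, 515, 510, 495, 430, 400]
language = [550, 535, 535, 520, 455, 420]

mean_x = statistics.mean(math_score)
mean_y = statistics.mean(language)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(math_score, language))
sxx = sum((x - mean_x) ** 2 for x in math_score)
syy = sum((y - mean_y) ** 2 for y in language)

b = sxy / sxx                         # regression coefficient (slope)
r_direct = sxy / (sxx * syy) ** 0.5   # correlation from its definition
s_x = statistics.stdev(math_score)    # sample standard deviations
s_y = statistics.stdev(language)
r_from_b = b * s_x / s_y              # correlation from the slope

print(round(r_direct, 4), round(r_from_b, 4))
```

Both routes give the same value, and because the standard deviations are positive, r and b necessarily share their sign.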
Coefficient of determination
The square of the correlation coefficient, r², is called the coefficient of determination; multiplied by 100, it gives the percentage of the total variation explained by the linear regression.
Example. The correlation between math aptitude and language aptitude was found to be r = 0.9989. The coefficient of determination is r² = 0.9978, so 99.8% of the total variation of Y is explained by its linear relationship with X.
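The phrase "variation explained by the regression" can be made concrete: for a least squares line, r² equals one minus the ratio of the residual sum of squares to the total sum of squares of y. A sketch on the math and language scores, with illustrative variable names:

```python
math_score = [525, 515, 510, 495, 430, 400]
language = [550, 535, 535, 520, 455, 420]

n = len(math_score)
mean_x = sum(math_score) / n
mean_y = sum(language) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(math_score, language))
sxx = sum((x - mean_x) ** 2 for x in math_score)
syy = sum((y - mean_y) ** 2 for y in language)   # total variation of y

b = sxy / sxx
a = mean_y - b * mean_x
# Residual variation: what the fitted line leaves unexplained
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(math_score, language))

r_squared = sxy ** 2 / (sxx * syy)
explained = 1 - ss_residual / syy
print(f"r^2 = {r_squared:.4f}, explained share = {explained:.4f}")
```

Both numbers are about 0.9978, that is, roughly 99.8% of the variation in the language scores is accounted for by the line.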
Regression using transformations
Sometimes useful models are not linear in their parameters. In such cases the scatterplot of the data shows a functional, but not linear, relationship between the variables.
Example
A fast food chain opened in 1974. Each year from 1974 to 1988 the number of steakhouses in operation was recorded. The scatterplot of the original data suggests an exponential relationship between x (year) and y (number of steakhouses) (first plot). Taking the logarithm of y, we get a linear relationship (plot at the bottom).
Performing the linear regression procedure on x and log y, we get the equation
log y = 2.327 + 0.2569 x
that is,
$$y = e^{2.327 + 0.2569x} = e^{2.327}\,e^{0.2569x} = 10.25\,e^{0.2569x} = 10.25 \cdot 1.293^{x}$$
which is the equation of the best-fitting curve to the original data.
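The back-transformation from the log scale can be checked numerically. The sketch assumes that log denotes the natural logarithm here, as the use of e in the formula suggests; the names `multiplier`, `growth_factor`, and `predict` are illustrative.

```python
import math

# Fitted coefficients of log y = 2.327 + 0.2569 x (natural logarithm assumed)
intercept, slope = 2.327, 0.2569

multiplier = math.exp(intercept)    # e^2.327, the value of the curve at x = 0
growth_factor = math.exp(slope)     # e^0.2569, factor by which y grows per unit of x

def predict(x):
    """Fitted curve on the original scale: y = e^(intercept + slope * x)."""
    return multiplier * math.exp(slope * x)

print(f"y = {multiplier:.2f} * e^({slope} x) = {multiplier:.2f} * {growth_factor:.3f}^x")
```

The constant in front of the exponential is therefore about 10.25, while 1.293 is the yearly growth factor, so the two should not be confused.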
[Figure: the fitted line log y = 2.327 + 0.2569 x on the transformed data and the fitted curve y = 10.25 e^{0.2569x} on the original data]
Types of transformations
Some non-linear models can be transformed into a linear model by taking logarithms of either or both sides. Either the base-10 logarithm (denoted log or lg) or the natural (base e) logarithm (denoted ln) can be used. If a > 0 and b > 0, applying a logarithmic transformation to the model gives an equation that is linear in the transformed variables.
Exponential relationship -> take log y
Model: $y = a \cdot 10^{bx}$
Take the logarithm of both sides: $\lg y = \lg a + bx$,
so lg y is linear in x.
[Figure: scatterplots of y against x and of log y against x for the following example data]
x    y      lg y
0    1.1    0.0414
1    1.9    0.2788
2    4      0.6021
3    8.1    0.9085
4    16     1.2041
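The exponential model can be fitted by an ordinary least squares line on (x, lg y). This is a sketch on the five example points of the exponential chart; the helper `fit_line` is an illustrative name, not a library function.

```python
import math

def fit_line(u, v):
    """Least squares line v = intercept + slope * u; returns (intercept, slope)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    slope = (sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
             / sum((ui - mean_u) ** 2 for ui in u))
    return mean_v - slope * mean_u, slope

x = [0, 1, 2, 3, 4]
y = [1.1, 1.9, 4, 8.1, 16]

# Regress lg y on x, then transform the intercept back: a = 10^(lg a)
lg_a, b = fit_line(x, [math.log10(yi) for yi in y])
a = 10 ** lg_a
print(f"y = {a:.3f} * 10^({b:.4f} x)")
```

The fitted b is close to lg 2, which matches the impression that y roughly doubles with each unit of x.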
Logarithm relationship -> take log x
Model: $y = a + b \lg x$,
so y is linear in lg x.
[Figure: scatterplots of y against x and of y against log10 x for the following example data]
x     y       log x
1     0.1     0
4     2       0.6021
8     3.01    0.9031
16    3.9     1.2041
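For the logarithm model only x is transformed. A sketch on the four example points of the logarithm chart, again with the illustrative helper `fit_line`:

```python
import math

def fit_line(u, v):
    """Least squares line v = intercept + slope * u; returns (intercept, slope)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    slope = (sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
             / sum((ui - mean_u) ** 2 for ui in u))
    return mean_v - slope * mean_u, slope

x = [1, 4, 8, 16]
y = [0.1, 2, 3.01, 3.9]

# Regress y on lg x; no back-transformation of the coefficients is needed
a, b = fit_line([math.log10(xi) for xi in x], y)
print(f"y = {a:.3f} + {b:.3f} lg x")
```

Here the coefficients can be read directly, because y itself is left untransformed.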
Power relationship -> take log x and log y
Model: $y = a x^{b}$
Take the logarithm of both sides: $\lg y = \lg a + b \lg x$,
so lg y is linear in lg x.
[Figure: scatterplots of y against x and of log y against log x for the following example data]
x    y      log x     log y
1    2      0         0.3010
2    16     0.3010    1.2041
3    54     0.4771    1.7324
4    128    0.6021    2.1072
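For the power model both variables are transformed. A sketch on the four example points of the power chart, which lie on y = 2x³, so the fit should recover a = 2 and b = 3 up to rounding; `fit_line` is an illustrative helper.

```python
import math

def fit_line(u, v):
    """Least squares line v = intercept + slope * u; returns (intercept, slope)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    slope = (sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
             / sum((ui - mean_u) ** 2 for ui in u))
    return mean_v - slope * mean_u, slope

x = [1, 2, 3, 4]
y = [2, 16, 54, 128]

# Regress lg y on lg x: the slope is the exponent b, the intercept is lg a
lg_a, b = fit_line([math.log10(xi) for xi in x], [math.log10(yi) for yi in y])
a = 10 ** lg_a
print(f"y = {a:.3f} * x^{b:.3f}")
```

The slope of the log-log line is the exponent of the power law, which is why log-log plots are a common first check for this kind of relationship.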
Reciprocal relationship -> take the reciprocal of x
Model: $y = a + b/x = a + b \cdot \frac{1}{x}$,
so y is linear in 1/x.
[Figure: y plotted against x (y values 1.1, 0.45, 0.333, 0.23, 0.1999)]
[Figure: y plotted against 1/x (y values 1.1, 0.45, 0.333, 0.23, 0.1999)]
Example from the literature
Relationship between two discrete variables, contingency tables, test for independence
Comparison of categorical variables (percentages): χ² tests (chi-square)
Example: rates of diabetes in three groups: 31%, 27% and 25%*.
Frequencies can be arranged into contingency tables.
H0: the occurrence of diabetes is independent of groups (the rates are the same in the population)
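To make the null hypothesis concrete, the sketch below arranges frequencies into a 3 x 2 contingency table and computes the Pearson chi-square statistic. Under H0 the expected frequency of a cell is row total * column total / total. The counts are hypothetical: the slide gives only the rates, so 100 subjects per group are assumed purely to reproduce 31%, 27% and 25%. The function name `chi_square` is likewise illustrative.

```python
# HYPOTHETICAL counts: 100 subjects per group, chosen only so that the
# diabetes rates are 31%, 27% and 25%. The real group sizes are not given.
observed = [
    [31, 69],   # group 1: diabetes, no diabetes
    [27, 73],   # group 2
    [25, 75],   # group 3
]


def chi_square(table):
    """Return (statistic, degrees of freedom, expected frequencies)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    # Expected frequency under independence: row total * column total / total
    expected = [[r * c / total for c in col_totals] for r in row_totals]

    # Sum of (observed - expected)^2 / expected over all cells
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


stat, df, expected = chi_square(observed)

# Tabulated critical value of the chi-square distribution, alpha = 0.05, df = 2
CRITICAL_005_DF2 = 5.991

print(f"chi-square = {stat:.3f}, df = {df}")
print("reject H0" if stat > CRITICAL_005_DF2 else "do not reject H0")
```

With these assumed counts the statistic is far below the critical value, so the differences between 31%, 27% and 25% would not be significant at the 5% level. With larger groups the same percentages could lead to a different decision.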
χ² tests, assumptions
If H0 is true, the expected frequencies can be computed ($E_i$ = row total * column total / total)
χ² statistic: $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$
DF (degrees of freedom): (number of rows - 1) * (number of columns - 1)
Decision based on table: reject H0 if $\chi^2 > \chi^2_{\text{table},\,\alpha,\,df}$
Assumption: cells with expected frequencies