Date post: | 28-Dec-2015 |
Upload: | darrell-blair |
Math 15: Introduction to Scientific Data Analysis
Lecture 5: Association Statistics & Regression Analysis
University of California, Merced
Course Lecture Schedule
Week   Date          Concepts                                            Project Due
1
2      January 28    Introduction to data analysis
3      February 4    Excel #1 – General Techniques
4      February 11   Excel #2 – Plotting Graphs/Charts                   Quiz #1
5      February 18   Holiday
6      February 25   Excel #3 – Statistical Analysis                     Quiz #2
7      March 3       Excel #4 – Regression Analysis
8      March 10      Excel #5 – Interactive Programming                  Quiz #3
9      March 17      Introduction to Computer Programming – Part I
       March 24      Spring Recess
10     March 31      Introduction to Computer Programming – Part II      Project #1
11     April 7       Programming – #1                                    Quiz #4
12     April 14      Programming – #2
13     April 21      Programming – #3                                    Quiz #5
14     April 28      Programming – #4
15     May 5         Programming – #5                                    Quiz #6
16     May 12        Movies / Evaluations                                Project #2
Final  May ???       Final Examination
Quiz Next Week!
UC Merced 3
Project #1 – Due March 31st, 2008
Projects can be performed individually or in groups of up to three, with the following rules:
Teams turn in one project report, and all members get the same grade.
A team consists of at most 3 people. No copying between teams!
A team project report must include a title page that describes each team member’s contribution.
There is a 10% bonus for projects done individually.
Individual projects must not be copied from anyone else.
No late projects will be accepted!
Project #1 will be posted at UCMCROP by next Monday!
Review: Measures of dispersion or variability
Variance or Standard Deviation
The distribution on the left is more dispersed than the one on the right: it has a higher variance (and standard deviation).
[Figure: two distributions with different spread; labels: Average, Mode]
Which is the more precise measurement?
Although the standard deviation is a good measure of the precision of a given set of data, it is difficult to directly compare standard deviations from two different types of measurement.
You might need such a comparison to determine the largest source of uncertainty in an experimentally determined answer.
Average = 446 mg, s (standard deviation) = 23
Average = 35.49 ml, s = 4.5
Get the Right Tool for the Job!
Measures of dispersion or variability
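One way to do this comparison is the relative standard deviation (RSD), which is simply the ratio of the standard deviation s to the mean, expressed as a percentage:
$\mathrm{RSD} = 100 \times \dfrac{s}{\bar{x}}$
Average = 446 mg, s = 23: RSD = 100 × (23/446) = 5.2
Average = 35.49 ml, s = 4.5: RSD = 100 × (4.5/35.49) = 12.7
The smaller RSD (5.2 versus 12.7) marks the mg measurement as the more precise of the two in relative terms.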
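The RSD arithmetic above can be reproduced outside Excel. The following is a minimal Python sketch; the function name `relative_standard_deviation` is an illustrative choice, not something from the lecture.

```python
def relative_standard_deviation(std_dev: float, mean: float) -> float:
    """Return the RSD as a percentage: 100 * s / |mean|."""
    if mean == 0:
        raise ValueError("RSD is undefined when the mean is zero")
    return 100.0 * std_dev / abs(mean)


# The two measurements from the slide.
rsd_mg = relative_standard_deviation(23, 446)      # average 446 mg, s = 23
rsd_ml = relative_standard_deviation(4.5, 35.49)   # average 35.49 ml, s = 4.5

print(f"{rsd_mg:.1f}  {rsd_ml:.1f}")  # 5.2  12.7
```

Because the RSD is unitless, the two results can be compared directly even though the raw standard deviations are in different units.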
Any Questions?
Common Practice for Data Analysis
A common task in data analysis is to investigate an association between two variables:
To see if two variables vary together: Correlation
To see how one variable affects another: Regression
Correlation
A correlation tells us whether the two variables vary together, i.e. as one goes up the other goes up (or goes down).
Correlation Coefficient (Pearson product-moment correlation coefficient, or Pearson’s r)
Correlation Coefficient
Varies from +1 (perfect positive correlation) through 0 (no correlation) to -1 (perfect negative correlation).
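For reference, the slides do not spell out how r is calculated. The standard definition of Pearson's r for n paired observations is sketched below; the index i and the sample means written with bars are notation introduced here, not taken from the slides.

```latex
\[
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
\]
```

The numerator is positive when high x tends to go with high y and negative when high x goes with low y; dividing by the denominator scales the result into the range -1 to +1.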
[Figure: three example scatter plots ("sales A", sales vs. day; "Series 4"; "Series 1") illustrating perfect correlation (|r| = 1) and no correlation (r = 0)]
Correlation Coefficient – cont.
Always draw a diagram to check that:
There are no OUTLIERS. If there are outliers, the following may not apply.
The relation is not curved (r only refers to LINEAR correlation).

r (approx.)     strength of tendency   what with what
0.9 to 1        strong                 high y with high x and low y with low x
0.7 to 0.9      some                   high y with high x and low y with low x
0.3 to 0.7      little                 high y with high x and low y with low x
-0.3 to 0.3     none                   neither high nor low y with high or low x
-0.3 to -0.7    little                 low y with high x and high y with low x
-0.7 to -0.9    some                   low y with high x and high y with low x
-0.9 to -1      strong                 low y with high x and high y with low x
Excel Function – Correlation Coefficient
=CORREL(array1, array2)
or
=PEARSON(array1, array2)
[Figure: Positive correlation. Lengths of a leg bone (in cm) in penguin mating pairs]
Ice cream sales vs. number of people who drown at sea
Month   # of ice cream sales (in million)   # of drownings
1       0.30                                0
2       0.20                                0
3       0.90                                1
4       1.50                                1
5       2.00                                3
6       3.50                                5
7       5.50                                6
8       8.00                                8
9       7.50                                5
10      2.50                                1
11      0.80                                0
12      0.70                                0
Correlation coefficient: 0.927
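The coefficient that CORREL or PEARSON reports for this table can be checked by hand. Here is a minimal Python sketch using only the standard library; the function name `pearson_r` and the variable names are illustrative, and the data are the twelve monthly values from the table above.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Monthly data from the slide (months 1 to 12).
ice_cream_sales = [0.30, 0.20, 0.90, 1.50, 2.00, 3.50,
                   5.50, 8.00, 7.50, 2.50, 0.80, 0.70]
drownings = [0, 0, 1, 1, 3, 5, 6, 8, 5, 1, 0, 0]

r = pearson_r(ice_cream_sales, drownings)
print(round(r, 3))  # 0.927
```

The value says only that the two series rise and fall together; it says nothing about why.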
Wait!
What kinds of conclusions can we draw from a correlation?
Examples
Ice cream sales correlate with the number of people who drown at sea. Therefore, ice cream causes people to drown.
Since the 1950s, both the atmospheric CO2 level and crime levels have increased sharply. Hence, atmospheric CO2 causes crime.
Not Good Ones!
Correlation does not imply causation
The fact that A is correlated with B does not, by itself, support any conclusion about the existence or the direction of a cause-and-effect relationship.
The correlation coefficient only tells you whether the two variables vary together.
Determining whether there is an actual cause and effect relationship requires further investigation, even when the relationship between A and B is statistically significant, a large effect size is observed, or a large part of the variance is explained.
Any Questions?
Regression
Regression is used when we have some reason to believe that changes in one variable cause changes in the other.
A correlation coefficient by itself is not evidence of a causal relationship.
The simplest kind of causal relationship is a straight-line (or linear) relationship.
Linear regression
Linear regression
Linear regression assumes a linear relationship between two variables: a dependent variable, y, and an independent variable, x.
Mathematically, this relationship is described by the following linear equation:
y = ax + b
where a is called the slope and b is called the intercept.
The values of a and b are found by the least-squares method, and the resulting equation lets you calculate y (dependent) from x (independent).
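The least-squares method mentioned above picks the a and b that make the squared vertical distances between the data points and the line as small as possible. A sketch of the standard result, using the slide's a (slope) and b (intercept); the index i and the sample means written with bars are notation introduced here:

```latex
\[
\min_{a,\,b} \; \sum_{i=1}^{n} \bigl( y_i - (a x_i + b) \bigr)^2
\quad\Longrightarrow\quad
a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - a\,\bar{x}
\]
```

The second formula also shows that the fitted line always passes through the point of means.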
Review - Math
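Linear Equation: Slope and Intercept
y = ax + b
[Figure: graph of a straight line; y rises by a for each increase of 1 in x, and the line crosses the y-axis at b]
Example: y = 3x + 8 (slope 3, intercept 8)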
Slope & Intercept formula
Lengths of a leg bone (in cm) in penguin mating pairs
Pair   Female   Male
1      17.1     16.5
2      18.5     17.4
3      19.7     17.3
4      16.2     16.8
5      21.3     19.5
6      19.6     18.3

Slope       =SLOPE(C2:C7,B2:B7)       0.5205
Intercept   =INTERCEPT(C2:C7,B2:B7)   7.8830
In both functions the first range (C2:C7, Male) holds the Y-values and the second range (B2:B7, Female) holds the X-values.
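As a cross-check on the SLOPE and INTERCEPT results, the same least-squares fit can be computed directly. This is a minimal Python sketch rather than a description of Excel's internals; `least_squares_line` is an illustrative name, and the data are the six penguin pairs above with Female as x and Male as y.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


female = [17.1, 18.5, 19.7, 16.2, 21.3, 19.6]  # X-values
male = [16.5, 17.4, 17.3, 16.8, 19.5, 18.3]    # Y-values

slope, intercept = least_squares_line(female, male)
print(f"{slope:.4f} {intercept:.4f}")  # 0.5205 7.8830
```

Note the argument order: Excel's SLOPE and INTERCEPT take the Y-values first, whereas this sketch takes x first.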
y = ax + b
a: slope, b: intercept

Pair   Female   Male   Predicted Male
1      17.1     16.5   16.78
2      18.5     17.4   17.51
3      19.7     17.3   18.14
4      16.2     16.8   16.31
5      21.3     19.5   18.97
6      19.6     18.3   18.08

Slope (cell C10)       0.5205
Intercept (cell C11)   7.8830
Predicted Y-value for the X-value in cell B3: =$C$10*B3+$C$11
Don’t forget the $ sign! It keeps the references to the slope and intercept cells fixed when the formula is copied to other rows.
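The Predicted Male column can be reproduced with the same arithmetic as the spreadsheet formula (slope times x plus intercept). In this minimal Python sketch the fit is recomputed at full precision, because plugging in the rounded 0.5205 and 7.8830 would give 16.32 instead of 16.31 for pair 4; the variable names are illustrative.

```python
female = [17.1, 18.5, 19.7, 16.2, 21.3, 19.6]  # X-values
male = [16.5, 17.4, 17.3, 16.8, 19.5, 18.3]    # Y-values

# Least-squares slope and intercept, kept unrounded.
n = len(female)
mean_x = sum(female) / n
mean_y = sum(male) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(female, male))
sxx = sum((x - mean_x) ** 2 for x in female)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Same calculation as the spreadsheet formula, applied to every row.
predicted_male = [round(slope * x + intercept, 2) for x in female]
print(predicted_male)  # [16.78, 17.51, 18.14, 16.31, 18.97, 18.08]
```

In Python the slope and intercept are ordinary variables, so nothing like the $ sign is needed to keep them fixed across rows.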
Plot a linear regression (or trend) line – Part 1
[Figure: scatter plot of Male size (mm) against Female size (mm) for the six penguin pairs, shown twice: first the data points alone, then with the Predicted Male values added]
You can add a linear regression line
Plot a linear regression (or trend) line – Part 2
[Figure: scatter plot of Male size (mm) against Female size (mm) for the six penguin pairs]
Right-click on any data point on the graph.
Choose Add Trendline.
Click on the Options tab, and select Display equation and Display R-squared. Don’t forget to check these two parts!
Click “OK”.
Plot a linear regression (or trend) line – Part 2 – cont.
R² value (R-squared value, RSQ): a “measure of scatter”
The closer this value comes to 1, the closer the data points lie to the trendline and the more accurate the prediction.
y = 0.5205x + 7.883
R² = 0.7767
[Figure: scatter plot of Male size (mm) against Female size (mm) with the fitted trendline, its equation and its R² value]
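To see where the R² number comes from, it can be computed from the trendline equation and the data. The sketch below assumes the usual definition, R² = 1 - SS_res / SS_tot (residual sum of squares over total sum of squares), which the slides do not state; the variable names are illustrative.

```python
female = [17.1, 18.5, 19.7, 16.2, 21.3, 19.6]  # X-values
male = [16.5, 17.4, 17.3, 16.8, 19.5, 18.3]    # Y-values

# Trendline equation as displayed on the chart (rounded coefficients).
slope, intercept = 0.5205, 7.883

mean_y = sum(male) / len(male)
# Scatter of the points around the trendline.
ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(female, male))
# Scatter of the points around their own mean.
ss_tot = sum((y - mean_y) ** 2 for y in male)

r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 4))  # 0.7767
```

For a straight-line fit with one x variable, R² is the square of Pearson's r, which is why the correlation coefficient of 0.881 for these data squares to about 0.776, matching 0.7767 up to rounding.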
Let’s review the process!
Lengths of a leg bone (in cm) in penguin mating pairs
Pair   Female   Male
1      17.1     16.5
2      18.5     17.4
3      19.7     17.3
4      16.2     16.8
5      21.3     19.5
6      19.6     18.3

If there is some reason to believe there is a causal relationship between the two variables, then plot a graph!
[Figure: scatter plot of Male size (mm) against Female size (mm)]
Correlation: to see if two variables vary together.
Correlation coefficient: 0.881
Regression: to see how one variable affects another.
y = 0.5205x + 7.883, R² = 0.7767
[Figure: the same scatter plot with the fitted trendline]
Any Questions?