Date post: | 27-Dec-2015 |
Upload: | shanon-jacobs |
MAKE SURE YOU HAVE…
MULTIPLE PENCILS… 1 OR 2 CALCULATORS… EXTRA BATTERIES…
EAT A FULL MEAL BEFOREHAND!
Statistics is about…
Models
A model is an attempt to represent reality…
…but we know it’s not perfect.
Avoid confusion…
• known vs unknowable
• samples vs populations
• statistics vs parameters
THINK about Models
Probability Models: Normal Model, Geometric & Binomial Models, t-Models, Chi-Square Models
Range is…
A way to measure the “spread” of data
It is a single number (maximum minus minimum), such as 40, not an interval like 30-70.
Variance, standard deviation, IQR are other ways to measure the “spread” of data.
Adding a constant to every value in a data set…
Changes the central location of the data (such as mean or median)
Does NOT change the spread of the data (such as standard deviation, variance, IQR)
Multiplying every value in a data set by a constant…
Changes the central location of the data (such as the mean or the median)
DOES CHANGE the spread of the data (such as standard deviation, variance, IQR)
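A quick way to convince yourself of the two slides above is to try it on a small data set. This is only a sketch: the values in `data` are made up for illustration, and `summarize` is a hypothetical helper name, not something from the course materials.

```python
import statistics

# Made-up values, chosen only for illustration.
data = [30, 40, 50, 60, 70]

def summarize(values):
    """Return (mean, median, standard deviation, range) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.stdev(values),
        max(values) - min(values),
    )

shifted = [x + 10 for x in data]  # add a constant to every value
scaled = [x * 2 for x in data]    # multiply every value by a constant

print("original:", summarize(data))
print("shifted: ", summarize(shifted))
print("scaled:  ", summarize(scaled))
```

In the printed summaries, the shifted data should show a different mean and median but the same standard deviation and range, while the scaled data should show all four measures changed.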
When describing bivariate quantitative data… Form! Strength! Direction!
This is when describing a scatterplot, linear regression, or the like...
A residual is…
The vertical distance from a point to the LSRL
It is calculated as Observed Y – Predicted Y
All points ABOVE the LSRL have positive residuals!
All points BELOW the LSRL have negative residuals!
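The sign rule for residuals can be checked with a few lines of code. This is a minimal sketch: the line y-hat = 2 + 3x and the three points are invented for illustration, and `predict` and `residual` are hypothetical helper names.

```python
def predict(x):
    """Predicted y from a hypothetical LSRL: y-hat = 2 + 3x."""
    return 2 + 3 * x

def residual(observed_y, predicted_y):
    """Residual = Observed Y - Predicted Y."""
    return observed_y - predicted_y

# Invented (x, y) points: one above the line, one below, one on it.
points = [(1, 6), (2, 7), (3, 11)]

residuals = []
for x, y in points:
    res = residual(y, predict(x))
    residuals.append(res)
    if res > 0:
        position = "above"
    elif res < 0:
        position = "below"
    else:
        position = "on"
    print(f"point ({x}, {y}): residual = {res}, {position} the LSRL")
```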
You will see the word “COMPARE” at least once…
So COMPARE!!! Don’t list attributes… use
comparative words!
“Correlation does not imply causation!”
Be careful with the word “cause”. The only way to prove causation is a properly designed experiment.
R is called the… “Correlation Coefficient”
Or just…. “Correlation”
It measures the strength and direction of a linear relationship (it is not meaningful for nonlinear relationships)
R-squared is called…
“Coefficient of Determination”
And is interpreted as “the % of the variation in [y] that is explained by [x]”
It can also be thought of as (explained variation)/(total variation)
If r squared is .64
Then r = .8 OR r= -.8
Figure it out by looking at the direction of the scatterplot!
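The step from r-squared back to r can be written out as a tiny function. This is just a sketch: `r_from_r_squared` is a hypothetical name, and the direction of the scatterplot has to be supplied by you, because r-squared alone does not carry the sign.

```python
import math

def r_from_r_squared(r_squared, slope_is_positive):
    """Recover r from r-squared, taking the sign from the direction of the scatterplot."""
    magnitude = math.sqrt(r_squared)
    return magnitude if slope_is_positive else -magnitude

print(r_from_r_squared(0.64, slope_is_positive=True))   # positive association
print(r_from_r_squared(0.64, slope_is_positive=False))  # negative association
```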
Simpson’s Paradox is…
When a conclusion that holds within each of 2 (or more) groups reverses direction once the data from the groups are combined.
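A small numerical example can make the reversal concrete. The counts below are invented purely for illustration (they are not from any real study), and the method and group names are hypothetical.

```python
# (successes, trials) for two methods in two groups; all numbers are invented.
results = {
    "A": {"easy": (9, 10), "hard": (30, 100)},
    "B": {"easy": (80, 100), "hard": (2, 10)},
}

def rate(successes, trials):
    """Success rate within one group."""
    return successes / trials

def combined_rate(groups):
    """Success rate after pooling every group together."""
    total_successes = sum(s for s, _ in groups.values())
    total_trials = sum(t for _, t in groups.values())
    return total_successes / total_trials

for method, groups in results.items():
    for group, (s, t) in groups.items():
        print(f"method {method}, {group} cases: {rate(s, t):.2f}")
    print(f"method {method}, combined:   {combined_rate(groups):.2f}")
```

Method A has the higher success rate in both the easy and the hard cases, yet method B has the higher combined rate, because A was used mostly on hard cases and B mostly on easy ones.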
Double Blind
Neither the subjects nor the experimenter knows which treatment the subject is receiving
Central Limit Theorem
As sample size increases, the shape of the sampling distribution of the sample mean gets more and more normal, regardless of the shape of the parent distribution.
The center (mean) of the sampling distributions stays exactly the same.
The variability in the sampling distribution (standard deviation) decreases.
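A simulation is one way to see these three claims at work. This is a sketch under stated assumptions: the parent distribution is a right-skewed exponential with mean 1 (my choice, not from the slides), the seed and sample counts are arbitrary, and `sample_means` is a hypothetical helper name.

```python
import random
import statistics

def sample_means(sample_size, num_samples, rng):
    """Means of repeated samples drawn from a right-skewed (exponential, mean 1) parent."""
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return means

rng = random.Random(1)  # fixed seed so the run is repeatable
small = sample_means(sample_size=5, num_samples=2000, rng=rng)
large = sample_means(sample_size=50, num_samples=2000, rng=rng)

# The center should stay near the parent mean (1.0); the spread should shrink as n grows.
print("n = 5:  mean of sample means =", round(statistics.mean(small), 3),
      " std dev =", round(statistics.stdev(small), 3))
print("n = 50: mean of sample means =", round(statistics.mean(large), 3),
      " std dev =", round(statistics.stdev(large), 3))
```

Plotting a histogram of `small` and `large` (with whatever graphing tool you have) would show the shape claim as well: the n = 50 means look much closer to a normal curve than the n = 5 means.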
Notation is communication
a, b, n, p, q, r, s, t, x, y, z, E, H, P, π all have special meanings…
…and “hats” or “bars” change those meanings.
You are not free to substitute another letter even though it looks like algebra.
4 Requirements of a Binomial Setting are…
Independence
Success/Failure for each trial
Equal probability of success for each trial
Fixed number of trials
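When all four requirements hold, the probability of exactly k successes in n trials follows the usual binomial formula. The sketch below just writes that formula out; `binomial_pmf` is a hypothetical name and the example numbers are made up.

```python
import math

def binomial_pmf(n, k, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Made-up example: 10 trials, success probability 0.3 on each trial.
for k in range(4):
    print(f"P(X = {k}) = {binomial_pmf(10, k, 0.3):.4f}")
```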
The only difference between a binomial setting and a geometric setting is…
Geometric does not have a fixed number of trials; it is waiting for the first success…
The mean of a geometric distribution is…
1/p
(this is not on your formula sheet, but you should know it)
To calculate a geometric probability, use a tree diagram
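Following one branch of the tree diagram (failure, failure, …, success) gives the geometric probability directly. This is a minimal sketch; `geometric_pmf` and `geometric_mean` are hypothetical names and the value p = 0.25 is made up.

```python
def geometric_pmf(k, p):
    """P(first success happens on trial k): k - 1 failures followed by one success."""
    return (1 - p) ** (k - 1) * p

def geometric_mean(p):
    """Expected number of trials until the first success."""
    return 1 / p

p = 0.25  # made-up probability of success on each trial
for k in range(1, 5):
    print(f"P(first success on trial {k}) = {geometric_pmf(k, p):.4f}")
print("mean number of trials:", geometric_mean(p))
```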
Undercoverage Bias
When some groups are systematically left out of the sampling process (like people without phones in a phone survey)
The P of your “PHANTOMS” is “Define the parameter”. So define your parameter (either p or μ) as specifically as possible.
The grader of your test…
Doesn’t know what “PANIC” and “PHANTOMS” are… they are simply for your own organization.
Your hypotheses must…
Be about the PARAMETERS. Why make a hypothesis about the sample? (x-bar and p-hat shouldn’t ever be in the hypotheses.)
When you do inference…
You hypothesize about what the value of a single number is (that number is usually μ or p).
What IS an Assumption?
an underlying hypothesis about the situation required by the mathematical justification for the statistical method.
WE WILL PROBABLY NEVER KNOW
IF AN ASSUMPTION IS TRUE.
Draw the graph if you are given the data set! An outlier/skewness can dramatically affect your Test Statistic.
Your sample should be an SRS of the POPULATION OF INTEREST.
Be specific.
If the sample is randomly selected and unbiased, then we can generalize the findings to the population
We rarely know the population standard deviation, so the only time you do z-tests will probably be when using proportions.
Don’t Forget…
Degrees of Freedom! For all t-statistics and Chi-Square statistics.
*** (r-1)(c-1) for Chi-Square Tests for Homogeneity and Chi-Square Tests for Independence
*** n-2 for LinReg T Test
*** n-1 for all others (for a Chi-Square Goodness of Fit test, n is the number of categories).
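The degrees-of-freedom rules above are short enough to write as one-line functions. The function names here are hypothetical, chosen only to label which rule is which.

```python
def df_chi_square_table(rows, cols):
    """Chi-Square test for homogeneity or independence: (r - 1)(c - 1)."""
    return (rows - 1) * (cols - 1)

def df_linreg_t(n):
    """LinReg T Test with n data points: n - 2."""
    return n - 2

def df_one_sample_t(n):
    """One-sample t procedure with n observations: n - 1."""
    return n - 1

print(df_chi_square_table(rows=3, cols=4))
print(df_linreg_t(n=20))
print(df_one_sample_t(n=20))
```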
Can you find and interpret critical values?
1-sided or 2-sided? Draw a picture! Degrees of freedom?
A critical value is interpreted as the number of std. devs. away from the mean that a statistic must reach, usually in order to reject a hypothesis
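For z procedures, the 1-sided versus 2-sided question changes the critical value because the tail area is split in two. The sketch below covers only the standard normal case using Python's `statistics.NormalDist`; t and Chi-Square critical values still come from your table or calculator. The name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(alpha, two_sided):
    """Critical z value: put alpha in one tail, or alpha / 2 in each of two tails."""
    tail_area = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail_area)

print("alpha = 0.05, 2-sided:", round(z_critical(0.05, two_sided=True), 3))
print("alpha = 0.05, 1-sided:", round(z_critical(0.05, two_sided=False), 3))
```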
Your Calculations in PHANTOMS should…
Include equation and values you are plugging in…
Normal curves will earn you a PLUS (but don’t draw a normal curve for Chi-Square distributions; they are ALWAYS skewed).
Interpretation of Confidence Interval:
“We are 95% confident that _____ falls within the interval _________”
Interpretation of Confidence Level
“If we repeated this sampling and calculation process many times, about 95% of all calculated intervals would correctly contain ______.”
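The confidence level statement can be checked by actually repeating the process many times in a simulation. This is a sketch under assumptions I chose for illustration: a true proportion of 0.4, samples of size 200, a one-proportion z-interval, and an arbitrary seed. The function names are hypothetical.

```python
import random
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """One-proportion z-interval: p-hat plus or minus z* times the standard error."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - margin, p_hat + margin

def coverage(true_p, n, repetitions, rng):
    """Fraction of simulated intervals that contain the true proportion."""
    hits = 0
    for _ in range(repetitions):
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = proportion_interval(successes, n)
        if low <= true_p <= high:
            hits += 1
    return hits / repetitions

rng = random.Random(2024)  # fixed seed so the run is repeatable
print(coverage(true_p=0.4, n=200, repetitions=2000, rng=rng))
```

The printed fraction should land close to 0.95, which is what the confidence level describes: the long-run success rate of the method, not the chance that one particular interval is right.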
Interpretation of the P-Value
“A p-value of ______ indicates that if Ho is true, then we would obtain a sample statistic at least as extreme as ours about ____% of the time due to random chance alone.”
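To connect the interpretation to a calculation, here is a sketch of a one-proportion z-test. The function name, the `alternative` labels, and the example counts (60 successes in 100 trials against Ho: p = 0.5) are all made up for illustration.

```python
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0, alternative="two-sided"):
    """Return (z, p_value) for Ho: p = p0 using the normal model."""
    p_hat = successes / n
    standard_error = (p0 * (1 - p0) / n) ** 0.5  # uses p0 because we assume Ho is true
    z = (p_hat - p0) / standard_error
    cdf = NormalDist().cdf(z)
    if alternative == "greater":
        p_value = 1 - cdf
    elif alternative == "less":
        p_value = cdf
    else:
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value

z, p_value = one_proportion_z_test(successes=60, n=100, p0=0.5)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The p-value from this run is the blank you would fill into the sentence above: the percent of the time a result at least this extreme would occur by chance alone if Ho were true.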
You will be given an essay question that asks you to describe an experiment… Be thorough!
Your answer must include Repetition, Randomization, Control, and Comparison.
Don’t abbreviate “R.A.” Spell it out! – They will assume you don’t know what you are doing, but you do!
SAY WHAT YOU ARE COMPARING!
When describing an experiment or simulation…
Simply stating one of these things is not enough…
“use a random number generator…”
“use a table or random digit table…”
“use a coin to randomly select…”
YOU MUST DESCRIBE YOUR PROCEDURE IN DETAIL
Describing random assignment using a RDT…
Label subjects with digits
Peel off digits
Assign subjects to treatments
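The three steps can be traced in code to see what level of detail a full description needs. This is a sketch, not the required wording for an answer: the subject names and the line of digits are made up, the function name is hypothetical, and it assumes fewer than 100 subjects split into two equal groups.

```python
def assign_two_groups(subjects, digits):
    """Label subjects 01, 02, ...; read two-digit pairs; skip repeats and unused labels."""
    # Step 1: label each subject with a two-digit number.
    labels = {f"{i:02d}": name for i, name in enumerate(subjects, start=1)}
    half = len(subjects) // 2

    # Step 2: peel off digits two at a time, ignoring repeats and unused numbers.
    picked = []
    for start in range(0, len(digits) - 1, 2):
        pair = digits[start:start + 2]
        if pair in labels and pair not in picked:
            picked.append(pair)
        if len(picked) == half:
            break

    # Step 3: the first half picked get the treatment; everyone else is the control group.
    treatment = [labels[pair] for pair in picked]
    control = [name for label, name in labels.items() if label not in picked]
    return treatment, control

subjects = ["Ann", "Ben", "Cal", "Dee", "Eli", "Fay"]  # made-up names
digits = "19050271060403"  # made-up line from a random digit table
treatment, control = assign_two_groups(subjects, digits)
print("treatment:", treatment)
print("control:  ", control)
```

Reading the digits in pairs gives 19 (skip), 05, 02, 71 (skip), 06, so the subjects labeled 05, 02, and 06 receive the treatment. If the line of digits runs out before half the subjects are picked, you would continue on the next line of the table.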
How will each of these affect the power of a test?…
Using a larger alpha?
Using a larger sample size?
Using a smaller sigma?
Choosing an alternative that is further from the hypothesized mean?
Increasing Type 1 error?
Decreasing Type 2 error?
……They all will increase the power!!!
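The effect of each change on power can be checked with a direct calculation. This sketch assumes a one-sided z-test for a mean (Ho: μ = μ0 against Ha: μ > μ0) with known sigma, which is a simplification I chose so the formula stays short. The function name and all the numbers are made up.

```python
from statistics import NormalDist

def power_one_sided_z(mu0, mu_alt, sigma, n, alpha):
    """Power of a one-sided z-test (Ha: mu > mu0) when the true mean is mu_alt."""
    z = NormalDist()
    z_star = z.inv_cdf(1 - alpha)                # rejection cutoff in z units
    shift = (mu_alt - mu0) / (sigma / n ** 0.5)  # distance to the alternative in standard errors
    return 1 - z.cdf(z_star - shift)

# Made-up baseline scenario.
base = dict(mu0=100, mu_alt=103, sigma=10, n=25, alpha=0.05)
print("baseline:           ", round(power_one_sided_z(**base), 3))
print("larger alpha:       ", round(power_one_sided_z(**{**base, "alpha": 0.10}), 3))
print("larger sample size: ", round(power_one_sided_z(**{**base, "n": 100}), 3))
print("smaller sigma:      ", round(power_one_sided_z(**{**base, "sigma": 5}), 3))
print("further alternative:", round(power_one_sided_z(**{**base, "mu_alt": 106}), 3))
```

Each changed scenario should print a larger power than the baseline. Since power = 1 - P(Type 2 error), decreasing Type 2 error and increasing power are the same statement.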
Show all your work, even for little calculations!
AND WRITE NEATLY! Even show your work if the answer is 2+2 = 4
Please use…
All 90 minutes of each section…
It can’t hurt you and you can’t go anywhere…
Once you think you are done, go back and try to find at least 1 problem to change/correct/improve.
The Investigative Task…
Will be difficult, but it will be difficult for everybody.
Even if you can get a 2 on it, you are ahead of the game.
If you don’t know an answer to part A of a question…
Say “Suppose Part A is 3.6”… just so you can go on to part B…
You can still get full credit for Part B if Part A is incorrect.
If you want to review one more thing….
Review mean and variance of a discrete random variable (last pages in your review packet)
The answers and explanations are included in the packet.
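As a quick refresher for that last review topic, the mean and variance of a discrete random variable are weighted sums over its probability distribution. This is a minimal sketch: the distribution below is made up, and the function names are hypothetical.

```python
def discrete_mean(distribution):
    """Mean of a discrete random variable: sum of x * P(x)."""
    return sum(x * p for x, p in distribution.items())

def discrete_variance(distribution):
    """Variance of a discrete random variable: sum of (x - mean)^2 * P(x)."""
    mu = discrete_mean(distribution)
    return sum((x - mu) ** 2 * p for x, p in distribution.items())

# Made-up distribution, written as {value: probability}; the probabilities sum to 1.
distribution = {0: 0.25, 1: 0.5, 2: 0.25}

mean = discrete_mean(distribution)
variance = discrete_variance(distribution)
print("mean:", mean)
print("variance:", variance)
print("standard deviation:", variance ** 0.5)
```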