Prelim. Data Descriptive Statistics Prob. Models Hyp. Test & CI Linear Reg. Resources Upcoming Exercises
UCLA Department of Statistics, Statistical Consulting Center
Introductory Statistics with R
Presented by Kekona Sorenson, prepared by Mine Cetinkaya
November 9, 2010
Outline
1 Preliminaries
2 Data sets
3 Descriptive Statistics
4 Probability Models
5 Hypothesis Testing and Confidence Intervals
6 Linear Regression
7 Online Resources for R
8 Upcoming Mini-Courses
9 Exercises
1 Preliminaries
Software Installation
Installing R on a Mac
1 Go to http://cran.r-project.org/ and select MacOS X.
2 Select to download the latest version: 2.11.0 (2010-04-22).
3 Install and open R. The R window should look like this:
[Figure: screenshot of the R console window]
R Help
For help with any function in R, put a question mark before the function name to see its arguments, examples, and background information.
?plot
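A few related ways to look things up, as a minimal sketch. All calls are from base R and utils; sd and mean are used only as convenient examples.

```r
# help() is the long form of the question mark
help("plot")                      # same as ?plot

# args() prints just the argument list of a function
args(sd)

# formals() returns the arguments as a list, useful for checking names
sd_args = names(formals(sd))
sd_args

# example() runs the examples from a function's help page
example(mean)
```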
2 Data sets
Loading data into R
Loading a data set into R:
survey = read.table("http://www.stat.ucla.edu/~mine/students_survey_2008.txt", header = TRUE, sep = "\t")
Displaying the dimensions of the data set:
dim(survey)
[1] 1325 29
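The same read.table arguments can be tried offline. The sketch below reads a small tab-separated text that is made up for illustration (it is not the survey file), so the column values are assumptions.

```r
# Hypothetical tab-separated data, inlined so the example runs without a network connection
raw = "gender\thand\theight\nfemale\tleft\t64\nmale\tright\t70\nfemale\tright\t66"

# header = TRUE: first line holds the variable names; sep = "\t": fields are tab-separated
toy = read.table(text = raw, header = TRUE, sep = "\t")

dim(toy)   # number of rows, number of columns
str(toy)   # variable names and types
```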
Viewing data sets in R
Displaying the first 3 rows and 5 columns of the data set:
survey[1:3, 1:5]
  gender  hand eyecolor glasses california
1 female  left    hazel     yes        yes
2   male right    brown      no         no
3 female right    brown     yes        yes
Displaying the variable names in the data set:
names(survey)
[1] "gender" "hand" "eyecolor" "glasses" "california"
[6] "birthmonth" "birthday" "birthyear" "ageinmonths" "height"
[11] "graduate" "oncampus" "time" "walk" "hsclass"
...
Attaching / detaching data frames in R
Attaching the variables in a data set:
attach(survey)
The following object(s) are masked from package:datasets :
sleep
The message tells us that the attached data frame contains a column named sleep, which is also the name of a data set in the datasets package. If you type:
sleep
the object with that name in the data frame is found before the other object with the same name that is lower in the search() path. Thus, your object is "masking" the other.
To detach a data frame, i.e. remove it from the search() path of available R objects, use detach(), but we won't do that now.
detach(survey)
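Masking can be avoided by naming the data frame explicitly instead of attaching it. A minimal sketch, using a small made-up data frame (toy) in place of the survey data:

```r
# Hypothetical data frame standing in for the survey data
toy = data.frame(sleep = c(7, 6, 8), gender = c("female", "male", "female"))

# Refer to a column with $; nothing is added to the search() path
mean(toy$sleep)

# with() evaluates an expression using the columns of a data frame
with(toy, mean(sleep))

# The sleep data set from the datasets package is still reachable by name
class(sleep)
```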
3 Descriptive Statistics
Variable classes
Displaying the class of a variable:
class(instructor)
[1] "factor"
Changing the class of a variable:
instructor = as.character(instructor)
class(instructor)
[1] "character"
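One conversion deserves care: turning a factor with numeric-looking labels into numbers. The sketch below uses a made-up factor (year) to show the difference.

```r
# Hypothetical factor whose labels happen to be numbers
year = factor(c("2008", "2010", "2008"))
class(year)

# as.numeric() on a factor returns the internal level codes, not the labels
codes = as.numeric(year)

# Converting to character first recovers the labels as numbers
values = as.numeric(as.character(year))

codes
values
```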
Displaying categorical data
Tables
Tables are useful for displaying the distribution of categorical variables.
table(gender)
gender
female   male
   882    443
Contingency tables
Contingency tables display two categorical variables at a time.
table(gender, hand)
        hand
gender   ambidextrous left right
  female            9   67   806
  male             11   45   387
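Counts are often easier to compare as proportions. The sketch below re-enters the counts from the contingency table by hand (so it runs without the survey data) and uses margin.table and prop.table from base R.

```r
# Counts copied from the contingency table of gender by hand
counts = matrix(c(9, 67, 806,
                  11, 45, 387),
                nrow = 2, byrow = TRUE,
                dimnames = list(gender = c("female", "male"),
                                hand = c("ambidextrous", "left", "right")))
tab = as.table(counts)

# Row totals: number of females and males
row_totals = margin.table(tab, 1)
row_totals

# Proportions within each gender (each row sums to 1)
row_props = prop.table(tab, 1)
round(row_props, 3)
```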
Frequency bar plots
Display counts of each category next to each other for easy comparison.
barplot(table(gender), main = "Barplot of Gender")
[Figure: bar plot titled "Barplot of Gender" with one bar each for female and male; y-axis shows counts]
Relative frequency bar plots
Display relative proportions of each category.
barplot(table(gender)/length(gender), main = "Relative Frequency \n Barplot of Gender")
[Figure: bar plot titled "Relative Frequency Barplot of Gender" with one bar each for female and male; y-axis shows proportions from 0.0 to 0.6]
Segmented bar charts
Display two categorical variables at a time.
barplot(table(gender, hand), col = c("skyblue", "blue"), main = "Segmented Bar Plot \n of Gender")
legend("topleft", c("females", "males"), col = c("skyblue", "blue"), pch = 16, inset = 0.05)
[Figure: segmented bar plot titled "Segmented Bar Plot of Gender"; one bar per hand category (ambidextrous, left, right), each split into females and males; y-axis shows counts from 0 to 800]
Pie charts
Pie charts display counts as percentages of individuals in each category.
pct = round(table(gender) / length(gender) * 100)
lbls = paste(names(table(gender)), "\n", "%", pct)
pie(table(gender), labels = lbls)
[Figure: pie chart with slices labeled female (% 67) and male (% 33)]
Displaying quantitative data
Histograms
Display the number of cases in each bin.
hist(ageinmonths, main = "Histogram of Age in Months")
[Figure: histogram titled "Histogram of Age in Months"; x-axis ageinmonths (200 to 350), y-axis Frequency (0 to 400)]
Relative frequency histograms
Display the relative frequency of cases in each bin on a density scale, so that the area of each bar is the proportion of cases in that bin.
hist(ageinmonths, freq = FALSE, main = "Relative Frequency \n Histogram of Age in Months", xlab = "Age in Months")
[Figure: histogram titled "Relative Frequency Histogram of Age in Months"; x-axis Age in Months (200 to 350), y-axis Density (0.000 to 0.030)]
Stem-and-Leaf Plots
Preserve individual data values.
stem(ageinmonths)
The decimal point is 1 digit(s) to the right of the |
20 | 48
21 | 004444555566666666666666666777777777778888888888889999999999999999
22 | 00000000000000000000000000111111111111111122222222222222222222333333+258
23 | 00000000000000000000000000000000000000000000001111111111111111111111+379
24 | 00000000000000000000000000000000000000000000111111111111111111111111+170
25 | 00000000000001111111111111112222222222222222222223333333344444444445+24
26 | 000000000001111111111222222333334444444444556666778889
27 | 00111222222344566789
28 | 01334558888
29 | 0004569
30 | 267
31 | 02257
32 | 44
33 | 5
34 | 89
35 | 3
Boxplots
boxplot(ageinmonths, main = "Boxplot of Age in Months")
[Figure: boxplot titled "Boxplot of Age in Months"; axis from 200 to 350]
Five Number Summary (Min, Q1, Median, Q3, Max):
fivenum(ageinmonths)
[1] 204 228 235 243 353
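The five-number summary also explains which points a boxplot draws individually. By default, boxplot() extends the whiskers to the most extreme values within 1.5 times the IQR of the box. A minimal sketch using the five numbers reported for ageinmonths (the variable names are made up for illustration):

```r
# Five-number summary reported for ageinmonths
five = c(min = 204, q1 = 228, median = 235, q3 = 243, max = 353)

# Interquartile range: width of the box
iqr = five[["q3"]] - five[["q1"]]

# Values beyond these fences are drawn as individual points
lower_fence = five[["q1"]] - 1.5 * iqr
upper_fence = five[["q3"]] + 1.5 * iqr
c(lower_fence, upper_fence)

# Both the minimum and the maximum fall outside the fences here
five[["min"]] < lower_fence
five[["max"]] > upper_fence
```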
Describing distributions numerically
Summary
Categorical variables:
summary(hand)
ambidextrous         left        right
          20          112         1193
Quantitative variables:
summary(ageinmonths)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  204.0   228.0   235.0   237.8   243.0   353.0
Measures of center
Mean (arithmetic average):
mean(ageinmonths)
[1] 237.8309
Median (value that divides the histogram into two equal areas):
median(ageinmonths)
[1] 235
Mode (the most frequent value), for discrete data:
as.numeric(names(sort(table(ageinmonths), decreasing = TRUE))[1])
[1] 228
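The one-liner returns a single value even when several values tie for most frequent. A sketch that returns all tied modes, using made-up data:

```r
# Hypothetical discrete data with a tie for the most frequent value
x = c(228, 230, 228, 235, 230, 240)

# Frequency of each distinct value
counts = table(x)

# Keep every value whose count equals the largest count
modes = as.numeric(names(counts)[counts == max(counts)])
modes
```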
Mode (alternative)
To find the mode, you may also use the Mode function in the prettyR package.
install.packages("prettyR")
library(prettyR)
Mode(ageinmonths)
[1] "228"
Adding measures to plots
Adding mean and median to a histogram.
hist(ageinmonths, main = "Histogram of Age in Months")
abline(v = mean(ageinmonths), col = "blue")
abline(v = median(ageinmonths), col = "green")
legend("topright", c("Mean", "Median"), pch = 16, col = c("blue", "green"))
[Figure: histogram titled "Histogram of Age in Months" with vertical lines at the mean (blue) and the median (green); x-axis ageinmonths (200 to 350), y-axis Frequency (0 to 400)]
Measures of spread
Range (Min, Max):
range(ageinmonths)
[1] 204 353
IQR:
IQR(ageinmonths)
[1] 15
Standard deviation:
sd(ageinmonths)
[1] 16.03965
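As a check on what sd() computes, the sketch below rebuilds the sample standard deviation (divisor n - 1) by hand on a small made-up vector.

```r
# Hypothetical small sample
x = c(204, 228, 235, 243, 353)
n = length(x)

# Sample standard deviation: squared deviations from the mean, divided by n - 1
sd_by_hand = sqrt(sum((x - mean(x))^2) / (n - 1))
all.equal(sd_by_hand, sd(x))

# range() returns (min, max); diff() turns that into a single width
width = diff(range(x))
width
```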
4 Probability Models
Geometric
Geometric distribution
If the probability of success is 0.35, what is the probability that the first success will be on the 5th trial?
dgeom(4, 0.35)
[1] 0.06247719
The first argument of dgeom is the number of failures before the first success, so a first success on the 5th trial corresponds to 4 failures.
Note: dgeom gives the density (or probability mass function for discrete variables), pgeom gives the distribution function, qgeom gives the quantile function, and rgeom generates random deviates. The same d/p/q/r naming applies to the functions used for Binomial, Poisson and Normal calculations.
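The d/p/q/r family can be checked against the geometric formula directly. A minimal sketch; the seed value is arbitrary.

```r
p = 0.35

# P(first success on trial 5) = four failures followed by one success
by_hand = (1 - p)^4 * p
all.equal(by_hand, dgeom(4, p))

# pgeom: P(at most 4 failures), i.e. first success on or before trial 5
cdf4 = pgeom(4, p)
all.equal(cdf4, 1 - (1 - p)^5)

# qgeom: smallest number of failures whose cumulative probability reaches 0.5
q50 = qgeom(0.5, p)
q50

# rgeom: random draws of the number of failures before the first success
set.seed(1)
rgeom(3, p)
```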
Binomial
Binomial distribution
If the probability of success is 0.35, what is the probability of
3 successes in 5 trials?
dbinom(3, 5, 0.35)
[1] 0.1811469
at least 3 successes in 5 trials?
sum(dbinom(3:5, 5, 0.35))
[1] 0.2351694
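The "at least 3" probability can also be obtained from the distribution function pbinom, which avoids summing by hand. A short sketch:

```r
# Sum of the probabilities of 3, 4 and 5 successes
by_sum = sum(dbinom(3:5, 5, 0.35))

# Complement of P(X <= 2)
by_cdf = 1 - pbinom(2, 5, 0.35)

# Upper tail directly: P(X > 2)
by_upper = pbinom(2, 5, 0.35, lower.tail = FALSE)

all.equal(by_sum, by_cdf)
all.equal(by_sum, by_upper)
```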
Poisson
Poisson distribution
The number of traffic accidents per week in a small city has a Poisson distribution with mean equal to 3. What is the probability of
two accidents in a week?
dpois(2, 3)
[1] 0.2240418
at most one accident in a week?
sum(dpois(0:1, 3))
[1] 0.1991483
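The "at most one" probability is the Poisson distribution function at 1, and it also follows from the formula e^(-λ) λ^k / k!. A short sketch:

```r
lambda = 3

# Sum of P(X = 0) and P(X = 1)
by_sum = sum(dpois(0:1, lambda))

# Distribution function: P(X <= 1)
by_cdf = ppois(1, lambda)

# By hand: exp(-3) * (3^0 / 0! + 3^1 / 1!)
by_hand = exp(-lambda) * (1 + lambda)

all.equal(by_sum, by_cdf)
all.equal(by_sum, by_hand)
```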
Normal
Normal distribution
Scores on an exam are distributed normally with a mean of 65 and a standard deviation of 12. What percentage of the students have scores
below 50?
pnorm(50, 65, 12)
[1] 0.1056498
between 50 and 70?
pnorm(70, 65, 12) - pnorm(50, 65, 12)
[1] 0.5558891
Normal distribution (cont.)
What is the 90th percentile of the score distribution?
qnorm(0.90, 65, 12)
[1] 80.37862
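These answers can be cross-checked by standardizing to z-scores and by confirming that qnorm inverts pnorm. A minimal sketch:

```r
mu = 65
sigma = 12

# z-score of a score of 50
z = (50 - mu) / sigma

# Standardizing gives the same probability as passing mean and sd to pnorm
all.equal(pnorm(z), pnorm(50, mu, sigma))

# 90th percentile, then back to a probability
q90 = qnorm(0.90, mu, sigma)
all.equal(pnorm(q90, mu, sigma), 0.90)

# The same percentile from the standard normal quantile
all.equal(q90, mu + sigma * qnorm(0.90))
```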
5 Hypothesis Testing and Confidence Intervals
One sample means
Hypothesis testing for one sample means
Is there evidence to suggest that the average age in months for Stats 10 students is more than 235 months? Use α = 0.05.
sample100 = sample(1:1325, 100, replace = FALSE)
survey.sub = survey[sample100, ]
t.test(survey.sub$ageinmonths, alternative = "greater", mu = 235, conf.level = 0.95)
One Sample t-test
data: survey.sub$ageinmonths
t = 1.5922, df = 99, p-value = 0.05726
alternative hypothesis: true mean is greater than 235
95 percent confidence interval:
234.9118 Inf
sample estimates:
mean of x
237.06
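The t statistic and p-value that t.test reports can be reproduced by hand. The sketch below uses simulated ages (an assumption, since the survey sample is random and not reproduced here); the seed and the simulation parameters are arbitrary.

```r
set.seed(42)                                  # arbitrary seed so the sketch is reproducible
x = rnorm(100, mean = 237, sd = 16)           # simulated ages in months (hypothetical)
mu0 = 235                                     # hypothesized mean

# t statistic: distance of the sample mean from mu0 in standard errors
t_stat = (mean(x) - mu0) / (sd(x) / sqrt(length(x)))

# One-sided p-value: upper tail of the t distribution with n - 1 df
p_value = pt(t_stat, df = length(x) - 1, lower.tail = FALSE)

res = t.test(x, alternative = "greater", mu = mu0)
all.equal(unname(res$statistic), t_stat)
all.equal(res$p.value, p_value)
```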
Confidence intervals for one sample means
The t.test function prints out a confidence interval as well.
However, this function returns a one-sided interval when the alternative is "greater" or "less".
When alternative = "greater" is chosen, the lower confidence bound is calculated and the upper bound is given as Inf by default.
When alternative = "less" is chosen, the upper confidence bound is calculated and the lower bound is given as -Inf by default.
When alternative = "two.sided" is chosen, both the upper and the lower confidence bounds are calculated.
Confidence intervals for one sample means (cont.)
t.test(survey.sub$ageinmonths, alternative = "two.sided", mu = 235, conf.level = 0.90)
One Sample t-test
data: survey.sub$ageinmonths
t = 1.5922, df = 99, p-value = 0.1145
alternative hypothesis: true mean is not equal to 235
90 percent confidence interval:
234.9118 239.2082
sample estimates:
mean of x
237.06
Note that we changed the confidence level to 0.90 in order to correspond to a one-sided hypothesis test with α = 0.05.
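The correspondence can be seen numerically: the lower bound of a one-sided 95% interval equals the lower limit of a two-sided 90% interval, because both use the 0.95 quantile of the t distribution. A sketch on simulated data (the data and seed are assumptions):

```r
set.seed(42)                                  # arbitrary seed
x = rnorm(100, mean = 237, sd = 16)           # simulated ages in months (hypothetical)

# One-sided 95% interval: (lower bound, Inf)
one_sided = t.test(x, alternative = "greater", mu = 235, conf.level = 0.95)$conf.int

# Two-sided 90% interval: (lower limit, upper limit)
two_sided = t.test(x, alternative = "two.sided", mu = 235, conf.level = 0.90)$conf.int

one_sided[1]
two_sided[1]
all.equal(one_sided[1], two_sided[1])
```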
Confidence intervals for one sample means (cont.)
Alternative calculation of confidence interval:
onesample.mean.ci = function(x, conf.level) {
  # critical t value for a two-sided interval
  tstar = -qt(p = (1 - conf.level) / 2, df = length(x) - 1)
  xbar = mean(x)                      # sample mean
  sexbar = sd(x) / sqrt(length(x))    # standard error of the mean
  cilower = xbar - tstar * sexbar
  ciupper = xbar + tstar * sexbar
  return(c(cilower, ciupper))
}
onesample.mean.ci(survey.sub$ageinmonths, 0.90)
[1] 234.9118 239.2082
Two sample means
Hypothesis testing and CI for two sample means
Is there a difference between the ages of females and males? Construct a 95% confidence interval for the difference between the average ages of females and males.
t.test(survey.sub$ageinmonths[survey.sub$gender == "female"], survey.sub$ageinmonths[survey.sub$gender == "male"], alternative = "two.sided", conf.level = 0.95)
Welch Two Sample t-test
data: survey.sub$ageinmonths[survey.sub$gender == "female"] and
survey.sub$ageinmonths[survey.sub$gender == "male"]
t = 1.25, df = 95.736, p-value = 0.2143
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.765100 7.768572
sample estimates:
mean of x mean of y
238.1406 235.1389
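The same test can be written more compactly with the formula interface of t.test. The sketch below uses a simulated data frame (toy, an assumption standing in for survey.sub) and checks that both calls give the same interval.

```r
set.seed(7)                                   # arbitrary seed
# Hypothetical sample: 64 females and 36 males with simulated ages
toy = data.frame(
  gender = rep(c("female", "male"), times = c(64, 36)),
  ageinmonths = c(rnorm(64, 238, 12), rnorm(36, 235, 12))
)

# Subsetting each group explicitly, as on the slide
by_subset = t.test(toy$ageinmonths[toy$gender == "female"],
                   toy$ageinmonths[toy$gender == "male"])

# Formula interface: response ~ grouping variable
by_formula = t.test(ageinmonths ~ gender, data = toy)

all.equal(by_subset$conf.int[1:2], by_formula$conf.int[1:2])
```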
One sample proportions
Hypothesis testing for one sample proportions
64 out of 100 students in a random sample are females. Is there evidence to suggest that the population proportion of females is less than 65%? Use a 90% confidence level.
prop.test(64, 100, p = 0.65, alternative = "less", conf.level = 0.90)
1-sample proportions test with continuity correction
data: 64 out of 100, null probability 0.65
X-squared = 0.011, df = 1, p-value = 0.4583
alternative hypothesis: true p is less than 0.65
90 percent confidence interval:
0.0000000 0.7035286
sample estimates:
p
0.64
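The reported statistic and p-value can be rebuilt from the continuity-corrected chi-squared formula. A sketch under the assumption that the default correction of 0.5 applies, which holds here because |x - n p0| exceeds 0.5:

```r
x = 64
n = 100
p0 = 0.65
phat = x / n

# Chi-squared statistic with continuity correction: subtract 0.5 from |x - n * p0|
x2 = (abs(x - n * p0) - 0.5)^2 / (n * p0 * (1 - p0))

# Signed z value; negative because phat is below p0
z = sign(phat - p0) * sqrt(x2)

# One-sided p-value for alternative = "less"
p_less = pnorm(z)

res = prop.test(x, n, p = p0, alternative = "less", conf.level = 0.90)
all.equal(unname(res$statistic), x2)
all.equal(res$p.value, p_less)
```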
Confidence intervals for one sample proportions
Just like t.test, the prop.test function calculates both the upper and the lower bounds of the confidence interval only when alternative = "two.sided" is chosen. Otherwise a lower bound of 0 or an upper bound of 1 is produced. Here the confidence level is set to 0.80 so that the upper bound matches the one-sided 90% bound.
prop.test(64, 100, p = 0.65, alternative = "two.sided", conf.level = 0.80)
1-sample proportions test with continuity correction
data: 64 out of 100, null probability 0.65
X-squared = 0.011, df = 1, p-value = 0.9165
alternative hypothesis: true p is not equal to 0.65
80 percent confidence interval:
0.5715825 0.7035286
sample estimates:
p
0.64
Two sample proportions
Hypothesis testing and CI for two sample proportions
54 out of 64 females and 32 out of 36 males are right handed. Is there evidence to suggest that the proportions of males and females who are right handed are different?
prop.test(c(54, 32), c(64, 36))
2-sample test for equality of proportions with continuity correction
data: c(54, 32) out of c(64, 36)
X-squared = 0.1051, df = 1, p-value = 0.7458
alternative hypothesis: two.sided
95 percent confidence interval:
-0.2026789 0.1124012
sample estimates:
prop 1 prop 2
0.8437500 0.8888889
6 Linear Regression
Scatterplots, Association, and Correlation
Scatterplots
Is there an association between amount of alcohol consumed and maximum speed?
plot(speed ~ alcohol, main = "Scatterplot of Speed vs. Alcohol", pch = 20, cex = 0.5)
[Figure: scatterplot titled "Scatterplot of Speed vs. Alcohol"; x-axis alcohol (0 to 80), y-axis speed (0 to 150)]
Correlation
cor(alcohol, speed, use = "pairwise.complete.obs")
[1] 0.2309745
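The use argument tells cor to drop observations with missing values. A sketch with made-up vectors that computes the correlation by hand on the complete pairs and compares it with cor:

```r
# Hypothetical values; NA marks a missing response
alcohol = c(0, 2, 5, NA, 10, 3)
speed = c(80, 90, 95, 100, NA, 85)

# Keep only the rows where both values are present
ok = complete.cases(alcohol, speed)
a = alcohol[ok]
s = speed[ok]

# Pearson correlation from its definition
r_by_hand = sum((a - mean(a)) * (s - mean(s))) /
  sqrt(sum((a - mean(a))^2) * sum((s - mean(s))^2))

r_cor = cor(alcohol, speed, use = "pairwise.complete.obs")
all.equal(r_by_hand, r_cor)
```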
Simple Linear Regression
Build a linear regression model predicting speed from alcohol.
summary(lm(speed ~ alcohol))
Call:
lm(formula = speed ~ alcohol)
Residuals:
    Min      1Q  Median      3Q     Max
-90.769  -8.725   1.275  11.275  91.541
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  88.7248     0.6511 136.261   <2e-16 ***
alcohol       0.9469     0.1108   8.549   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 21.83 on 1297 degrees of freedom
(26 observations deleted due to missingness)
Multiple R-squared: 0.05335, Adjusted R-squared: 0.05262
F-statistic: 73.09 on 1 and 1297 DF, p-value: < 2.2e-16
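Storing the fitted model makes it easy to pull out pieces of this summary. The sketch below fits a model to simulated data (an assumption; the coefficients used to simulate are arbitrary) and shows that, in simple linear regression, R-squared equals the squared correlation.

```r
set.seed(1)                                         # arbitrary seed
alcohol = runif(200, 0, 40)                         # simulated predictor (hypothetical)
speed = 89 + 0.95 * alcohol + rnorm(200, sd = 20)   # simulated response (hypothetical)

# Fit and store the model instead of only printing its summary
fit = lm(speed ~ alcohol)

# Estimated intercept and slope
coef(fit)

# R-squared from the summary equals the squared correlation
r2 = summary(fit)$r.squared
all.equal(r2, cor(alcohol, speed)^2)

# Predicted speed for a new value of alcohol
pred = predict(fit, newdata = data.frame(alcohol = 10))
pred
```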
7 Online Resources for R
Download R: http://cran.stat.ucla.edu/
Search Engine for R: rseek.org
R Reference Card: http://cran.r-project.org/doc/contrib/Short-refcard.pdf
UCLA Statistics Information Portal: http://info.stat.ucla.edu/grad/
UCLA Statistical Consulting Center: http://scc.stat.ucla.edu
8 Upcoming Mini-Courses
November 16th, R Stats II: Linear Regression
November 23rd, R Stats III: Generalized Linear Models
For a schedule of all mini-courses offered please visit http://scc.stat.ucla.edu/mini-courses.
Thank you
Any questions?
9 Exercises
1 Construct side-by-side box plots for the distribution of the amount of time it takes students to get to class (time) by their means of transportation (walk).
2 Usually younger students live on campus and older students live off campus. Is there evidence to suggest this trend in this data set? (Use a random sample of 100 students and α = 0.05.)
3 Calculate a 90% confidence interval for the difference between the average ages of students who live on campus and off campus.
Solution to Exercise 1
boxplot(time ~ walk, main = "Time to get to class \n by type of transportation", ylab = "Minutes")
[Figure: side-by-side boxplots titled "Time to get to class by type of transportation"; y-axis Minutes (0 to 150); groups: bicycle, bus, car (by yourself), carpool, motorcycle, other, segway, skateboard, walk]
Solution to Exercise 2
t.test(survey.sub$ageinmonths[survey.sub$oncampus == "yes"], survey.sub$ageinmonths[survey.sub$oncampus == "no"], alternative = "less", conf.level = 0.95)
Welch Two Sample t-test
data: survey.sub$ageinmonths[survey.sub$oncampus == "yes"] and
survey.sub$ageinmonths[survey.sub$oncampus == "no"]
t = -5.3322, df = 34.867, p-value = 2.964e-06
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
-Inf -10.85376
sample estimates:
mean of x mean of y
232.6111 248.5000
Solution to Exercise 3
t.test(survey.sub$ageinmonths[survey.sub$oncampus == "yes"], survey.sub$ageinmonths[survey.sub$oncampus == "no"], alternative = "two.sided", conf.level = 0.90)
Welch Two Sample t-test
data: survey.sub$ageinmonths[survey.sub$oncampus == "yes"] and
survey.sub$ageinmonths[survey.sub$oncampus == "no"]
t = -5.3322, df = 34.867, p-value = 5.929e-06
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
-20.92402 -10.85376
sample estimates:
mean of x mean of y
232.6111 248.5000