eNote 1
Introduction to R
Updated: 01/02/16 at 16:10
Contents
1 Introduction to R
1.1 Getting started with R and Rstudio
1.1.1 Console and scripts
1.1.2 Assignments and vectors
1.1.3 Descriptive statistics
1.1.4 Use of R
1.2 Basic plotting, graphics - data visualisation
1.2.1 Frequency distributions and the histogram
1.2.2 Cumulative distributions
1.2.3 The Box-Plot and the modified Box-Plot
1.2.4 The Scatter plot
1.2.5 Bar plots and Pie charts
1.2.6 More plots in R?
1.2.7 R in 27411
1.2.8 Storage of Text and Graphics
1.2.9 Scatterplots
1.3 Introduction day
1.3.1 Height-weight example from introstat eNote1
1.3.2 Report ready tables with xtable
1.3.3 Height-weight example - continued
1.3.4 Height-weight example - continued: with some details related to PCA and SVD - singular value decomposition (Appendix 2.6 and 2.7)
1.4 Something more on correlation and covariance
1.5 Some additional matrix scatterplotting for the Varmuza toy data in 2.6.3
1.6 Matrix scatterplotting the mtcars data
1.7 Exercises
1.1 Getting started with R and Rstudio
The program R is an open source statistics program that you can download to your own laptop for free. Go to http://mirrors.dotsrc.org/cran/, select your platform (Windows, Mac, or Linux) and follow the instructions.
RStudio is a free and open source integrated development environment (IDE) for R. You can run it on your desktop (Windows, Mac, or Linux) or even over the web using RStudio Server. It works as (an extended) alternative to running R in the basic way. This will be used in the course. Download it from http://www.rstudio.com/ and follow the installation instructions. To use the software, you only need to open Rstudio (not R itself).
1.1.1 Console and scripts
Once you have opened Rstudio, you will see a number of different windows. One of them is the console. Here you can write commands and execute them by hitting Enter. For instance:
> ## Adding numbers in the console
> 2+3
[1] 5
In the console you cannot go back and change previous commands, and neither can you save your work for later. To do this you need to write a script. Go to File → New → R Script. In the script you can write a line and execute it in the console by hitting Ctrl+R (Windows) or Cmd+Enter (Mac). You can also mark several lines and execute them all at the same time.
1.1.2 Assignments and vectors
If you want to assign a value to a variable, you can use = or <-. The latter is preferred by R users, so for instance:
> y <- 3
It is often useful to assign a set of values to a variable like a vector. This is done with the function c (short for concatenate).
> x <- c(1, 4, 6, 2)
> x
[1] 1 4 6 2
Use the colon :, if you need a sequence, e.g. 1 to 10:
> x <- 1:10
> x
[1] 1 2 3 4 5 6 7 8 9 10
You can also make a sequence with a specific step size different from 1 with seq(from, to, by = stepsize):
> x <- seq( 0, 1, by = 0.1)
> x
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
If you are in doubt about how to use a certain function, the help page can be opened by typing ? followed by the function name, e.g. ?seq.
1.1.3 Descriptive statistics
All the basic summary statistics measures can be found as functions or part of functions in R:
• mean(x) - mean value of the vector x
• var(x) - variance
• sd(x) - standard deviation
• median(x) - median
• quantile(x,p) - finds the pth quantile. p can consist of several different values, e.g. quantile(x,c(0.25,0.75)) or quantile(x,c(0.25,0.75), type=2)
• cov(x, y) - the covariance of the vectors x and y
• cor(x, y) - the correlation
Please again note that the words quantiles and percentiles are used interchangeably - they are essentially synonyms meaning exactly the same, even though the formal distinction has been clarified earlier.
Example 1.1
Consider some n = 10 data on student heights. We can read these data into R and compute the sample mean and sample median as follows:
## Sample Mean and Median
x <- c(168, 161, 167, 179, 184, 166, 198, 187, 191, 179)
mean(x)
[1] 178
median(x)
[1] 179
The sample variance and sample standard deviation are found as follows:
## Sample variance and standard deviation
var(x)
[1] 149.1111
sqrt(var(x))
[1] 12.21111
sd(x)
[1] 12.21111
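As a check on what var and sd compute, the sample variance can also be calculated step by step. The sketch below assumes the usual definition with divisor n - 1 (the one used by R's var) and reuses the heights vector x; the names n, dev and s2 are only illustrative and not part of the original note.

```r
## Sample variance "by hand" (divisor n - 1), compared with var()
x <- c(168, 161, 167, 179, 184, 166, 198, 187, 191, 179)
n <- length(x)                # number of observations
dev <- x - mean(x)            # deviations from the sample mean
s2 <- sum(dev^2) / (n - 1)    # sample variance
s2
sqrt(s2)                      # sample standard deviation
all.equal(s2, var(x))         # TRUE if the two computations agree
```

This is the "pocket calculator" way of using R: the formula is applied explicitly, and the inbuilt function is only used to confirm the result.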
The sample quartiles can be found by using the quantile function as follows:
## Sample quartiles
quantile(x, type = 2)
0% 25% 50% 75% 100%
161 167 179 187 198
The option “type=2” makes sure that the quantiles are computed using the definition given in the basic section of eNote1 of the introstat course. By default, the quantile function would use another definition (not detailed here). Generally, we consider this default choice just as valid as the one explicitly given here; it is merely a different one. The quantile function also has an option called “probs” where any list of probability values from 0 to 1 can be given. For instance:
## Sample quantiles 0%, 10%,..,90%, 100%:
quantile(x, probs = seq(0, 1, by = 0.10), type = 2)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
161.0 163.5 166.5 168.0 173.5 179.0 184.0 187.0 189.0 194.5 198.0
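To see how much the choice of definition matters for these data, the quartiles can be computed with both type = 2 and the default. This is only a small sketch; the default in R is type = 7, which interpolates linearly between neighbouring observations, and the names q_type2 and q_default are illustrative.

```r
## Quartiles with the eNote definition (type = 2) and with R's default (type = 7)
x <- c(168, 161, 167, 179, 184, 166, 198, 187, 191, 179)
p <- c(0.25, 0.50, 0.75)
q_type2 <- quantile(x, probs = p, type = 2)
q_default <- quantile(x, probs = p)      # same as type = 7
rbind(type2 = q_type2, default = q_default)
```

The median is the same under both definitions, while Q1 and Q3 differ slightly because the default interpolates between observations.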
1.1.4 Use of R
Apart from access to probability distributions, the R software can be used in several ways in our course (and in your future engineering activity):
1. As a pocket calculator substitute - that is, making R calculate "manually", by simple routines (plus, minus, square root etc.), whatever needs to be calculated, as you have identified it by applying the right formulas from the proper definitions and methods in the written material.
2. As a "statistical analysis machine" where, with some data fed into it, it will by inbuilt functions and procedures do all relevant computations for you and present the final results in some overview tables and plots.
3. As a high level graphics tool - using it for visualizing both data and models.
We will see and present all types of applications of R during the course, and any kind of flexibility jumping between the three is possible.
It must be stressed that even though the program is able to calculate things for the user, understanding the background of the calculations must NOT be forgotten - understanding the methods will always be good.
Remark 1.2 R is not a substitute for your brain activity in this course!
The software R should be seen as the most fantastic and easy computational companion that we can have for doing statistical computation. A good question to ask yourself each time you apply an inbuilt R-function is: "Do I really understand what R is computing for me now?"
1.2 Basic plotting, graphics - data visualisation
A really important part of working with data analysis is the visualisation of both the raw data and the results of the statistical analysis. Let us focus on the first part now. Depending on the data at hand, different types of plots and graphics could be relevant. One can distinguish between quantitative and categorical data. We will touch on the following types of basic plots:
• Quantitative data:
– Frequency plots and histograms
– Boxplots
– Cumulative distribution
– Scatter plot (xy plot)
• Categorical data:
– Bar charts
– Pie charts
1.2.1 Frequency distributions and the histogram
The frequency distribution of the data for a certain grouping of the data is nicely depicted by the histogram, which is a bar plot of the raw frequencies (or, alternatively, the densities) for some number of classes.
Example 1.3
Consider again the n = 10 data from Example 1.1.
## A histogram of the heights:
hist(x)
[Figure: "Histogram of x"; x-axis: x (160 to 200), y-axis: Frequency (0 to 4)]
The default histogram uses equidistant class widths (the same width for all classes) and depicts the raw frequencies/counts in each class. One may change the scale to show what we will learn to be densities, that is, dividing the raw counts by both n and the class width:
$$\text{Density in histogram} = \frac{\text{Class counts}}{n \cdot (\text{Class width})}$$
In a density histogram the areas of all the bars add up to 1.
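The density formula can be checked numerically from the object that hist returns. The following is a minimal sketch using the heights from Example 1.1; plot = FALSE only suppresses the drawing, and the names h, widths and dens are illustrative.

```r
## Densities computed from class counts, compared with what hist() reports
x <- c(168, 161, 167, 179, 184, 166, 198, 187, 191, 179)
h <- hist(x, plot = FALSE)        # class breaks, counts and densities without plotting
n <- length(x)
widths <- diff(h$breaks)          # class widths
dens <- h$counts / (n * widths)   # density = count / (n * class width)
all.equal(dens, h$density)        # agrees with the densities stored by hist()
sum(dens * widths)                # total area of the bars
```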
Example 1.4
## A density histogram of the heights:
hist(x, freq = FALSE, col = "red", nclass = 8)
[Figure: "Histogram of x" (density scale, red bars); x-axis: x (160 to 200), y-axis: Density (0.00 to 0.06)]
The R-function hist makes some choice of the number of classes based on the number of observations - it may be changed by the user option nclass as illustrated here, although the original choice seems better in this case due to the very small data set.
1.2.2 Cumulative distributions
The cumulative distribution can be visualized simply as the cumulated relative frequencies either across data classes, as also used in the histogram, or individual data points, which is then called the empirical cumulative distribution function:
Example 1.5
plot(ecdf(x), verticals = TRUE)
[Figure: "ecdf(x)"; step function with x-axis: x (160 to 200), y-axis: Fn(x) (0.0 to 1.0)]
The empirical cumulative distribution function Fn is a step function with jumps i/n at observation values, where i is the number of identical (tied) observations at that value.
For observations (x1, x2, . . . , xn), Fn(x) is the fraction of observations less than or equal to x, i.e.,
$$F_n(x) = \frac{\#\{x_i \le x\}}{n}$$
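The definition of Fn can be tried out directly, since ecdf returns a function that can be evaluated at any value. A small sketch with the heights from Example 1.1 (the evaluation point 179 is just an arbitrary choice):

```r
## Evaluating the empirical cumulative distribution function at a point
x <- c(168, 161, 167, 179, 184, 166, 198, 187, 191, 179)
Fn <- ecdf(x)       # Fn is itself an R function
Fn(179)             # fraction of observations <= 179
mean(x <= 179)      # the same fraction computed directly from the definition
```

Six of the ten heights are at most 179, so both lines give 0.6. Because 179 occurs twice, the step at that value has height 2/10.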
One amazing thing about R is the thousands of available free/open source extra packages, where one can find basically anything you could imagine.
Be sure that you have installed the qcc package on your local computer before you try to carry out the R-code in the next example! As this is the first time in the material that we explicitly refer to and use an add-on package, here is the instruction on how to install the qcc package:
1. Make sure that you are online
2. In the top of the lower right window of Rstudio, click Packages, then click Install
3. Write "qcc" in the empty field (without the quotation marks)
4. Click Install
Alternatively, simply run install.packages("qcc") at the command prompt.
Example 1.6
## A Pareto diagram based on the class counts from the hist-function:
library(qcc)
myhist <- hist(x,plot=FALSE)
mycounts=myhist$counts
names(mycounts)=myhist$breaks[-1]
pareto.chart(mycounts)
[Figure: "Pareto Chart for mycounts"; bars for the classes labelled 170, 180, 190 and 200; left axis: Frequency, right axis: Cumulative Percentage (0% to 100%)]
1.2.3 The Box-Plot and the modified Box-Plot
The so-called boxplot in its basic form depicts the five quartiles (min, Q1, median, Q3, max) with a box from Q1 to Q3 emphasizing the Inter Quartile Range (IQR):
Example 1.7
## A basic boxplot of the heights: (range=0 makes it "basic")
boxplot(x, range = 0, col = "red", main = "Basic boxplot")
text(1.3, quantile(x), c("Minimum","Q1","Median","Q3","Maximum"), col="blue")
[Figure: "Basic boxplot" of the heights (y-axis 160 to 190), with the labels Minimum, Q1, Median, Q3 and Maximum]
In the modified boxplot the whiskers only extend to the largest/smallest observation if it is not too far away from the box, where "too far" is defined as more than 1.5 × IQR. Observations further away than that (extreme observations) are plotted individually. In other words, the whiskers extend to the largest/smallest observations within a distance of 1.5 × IQR of the box (that is, no more than 1.5 × IQR above Q3 or 1.5 × IQR below Q1).
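The 1.5 × IQR rule can be written out explicitly. The sketch below uses the ten heights plus one extra, clearly extreme value of 235 cm (the value also used in Example 1.8). It is an illustration under assumptions: the limits are computed from the default quantile function, whereas boxplot itself uses the closely related "hinges", so for other data sets the two may differ slightly. The names x_ext, lower and upper are illustrative.

```r
## Which observations lie more than 1.5 * IQR outside the box?
x_ext <- c(168, 161, 167, 179, 184, 166, 198, 187, 191, 179, 235)
q <- quantile(x_ext, probs = c(0.25, 0.75))
iqr <- unname(q[2] - q[1])               # inter quartile range
lower <- unname(q[1]) - 1.5 * iqr        # lowest value a whisker may reach
upper <- unname(q[2]) + 1.5 * iqr        # highest value a whisker may reach
x_ext[x_ext < lower | x_ext > upper]     # observations plotted individually
boxplot.stats(x_ext)$out                 # what boxplot() itself flags as extreme
```

For these data both approaches single out only the value 235.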
Example 1.8
If we add an extreme observation, 235cm, to the heights data, and then make both the so-called modified boxplot - the default in R - and the basic one, we get the following. (Note that since there are no extreme observations among the original 10 observations, the two "different" plots are actually the same, so we cannot illustrate the difference without having at least one extreme data point.)
boxplot(c(x, 235), col = "red", main = "Modified boxplot")
text(1.4, quantile(c(x, 235)), c("Minimum","Q1","Median","Q3","Maximum"),
col = "blue")
boxplot(c(x, 235), col = "red", main = "Basic boxplot", range = 0)
text(1.4, quantile(c(x, 235)),c("Minimum","Q1","Median","Q3","Maximum"),
col = "blue")
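[Figure, two panels: "Modified boxplot" (the extreme observation is plotted as an individual point) and "Basic boxplot" (the whisker extends to the maximum); y-axis 160 to 220 in both, with the labels Minimum, Q1, Median, Q3 and Maximum]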
The boxplot hence is an alternative to the histogram in visualising the distribution of the data. It is a convenient way of comparing distributions in different groups, if such data is at hand.
Example 1.9
This example shows some ways of working with R to illustrate data.
In another sample of statistics course participants, the following heights of 17 females and 23 males were found:
Males:   152 171 173 173 178 179 180 180 182 182 182 185 185 185 185 185 186 187 190 190 192 192 197
Females: 159 166 168 168 171 171 172 172 173 174 175 175 175 175 175 177 178
The two modified boxplots to visualize the height sample distributions for each gender can be constructed by a single call to the boxplot function:
Males <- c(152, 171, 173, 173, 178, 179, 180, 180, 182, 182, 182, 185,
           185, 185, 185, 185, 186, 187, 190, 190, 192, 192, 197)
Females <- c(159, 166, 168, 168, 171, 171, 172, 172, 173, 174, 175, 175,
             175, 175, 175, 177, 178)
boxplot(list(Males, Females), col = 2:3, names = c("Males", "Females"))
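[Figure: modified boxplots of the heights for Males and Females side by side; y-axis 160 to 190; extreme observations plotted as individual points]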
At this point, it should be noted that in real work with data using R, one would generally not import data into R by explicit listings in an R-script file as done here. This only works for very small data sets like this one. The more realistic approach is to import the data from somewhere else, e.g. from a spreadsheet program such as Microsoft Excel.
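To get a feeling for such an import without any external file at hand, one can write a few rows to a temporary semicolon-separated file and read them back. This is only a sketch: the toy rows, the file name toyheights.csv and the object names are made up for illustration, and stringsAsFactors = TRUE is an assumption made here so that the text column is read as a categorical variable (a factor).

```r
## A tiny round trip: write a semicolon-separated file and read it back
toy <- data.frame(Height = c(152, 171, 159, 166),
                  Gender = c("male", "male", "female", "female"))
csv_file <- file.path(tempdir(), "toyheights.csv")    # hypothetical file name
write.table(toy, csv_file, sep = ";", dec = ".",
            row.names = FALSE, quote = FALSE)
toy_in <- read.table(csv_file, sep = ";", dec = ".", header = TRUE,
                     stringsAsFactors = TRUE)         # Gender becomes a factor
str(toy_in)                                           # quick look at the structure
```

The arguments sep and dec must match the file: files saved from a spreadsheet program with Danish settings would typically use ";" as separator and "," as decimal mark.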
Example 1.10
Some gender grouped student heights data is available as a .csv-file via http://www2.compute.dtu.dk/courses/introstat/data/studentheights.csv. The structure of the data file, as it would appear in Excel, is two columns and 40+1 rows including a header row:
1 Height Gender
2 152 male
3 171 male
4 173 male
. . .
. . .
24 197 male
25 159 female
26 166 female
27 168 female
. . .
. . .
39 175 female
40 177 female
41 178 female
The data can now be imported into R by the read.table function:
studentheights <- read.table("studentheights.csv", sep = ";", dec = ".",
                             header = TRUE,
                             stringsAsFactors = TRUE)  # from R 4.0.0 this is needed for Gender to be read as a factor
The resulting object studentheights is now a so-called data.frame, which is the R-name for data sets within R. There are some ways of getting a quick look at what kind of data is really in a data set:
## Have a look at the first 6 rows of the data:
head(studentheights)
Height Gender
1 152 male
2 171 male
3 173 male
4 173 male
5 178 male
6 179 male
## Get a summary of each column/variable in the data:
summary(studentheights)
Height Gender
Min. :152.0 female:17
1st Qu.:172.8 male :23
Median :177.5
Mean :177.9
3rd Qu.:185.0
Max. :197.0
For quantitative variables we get the quartiles and the mean. For categorical variables we see (some of) the category frequencies. A data structure like this would be the most commonly encountered (and needed) for statistical analysis of data. The gender grouped boxplot could now be done by the following:
boxplot(Height ~ Gender, data = studentheights, col=2:3)
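[Figure: modified boxplots of Height for the groups female and male; y-axis 160 to 180 (tick labels); extreme observations plotted as individual points]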
The R-syntax Height ~ Gender with the tilde symbol “~” is one that we will use a lot in various contexts such as plotting and model fitting. In this context it can be understood as “Height is plotted as a function of Gender”.
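The same formula syntax is understood by many other functions. As a small sketch, group-wise means can be computed with aggregate; the data frame heights is rebuilt here from the numbers in Example 1.9 so that the code runs without the .csv-file, and the names heights and group_means are only illustrative.

```r
## The formula Height ~ Gender used for group-wise summaries
Males <- c(152, 171, 173, 173, 178, 179, 180, 180, 182, 182, 182, 185,
           185, 185, 185, 185, 186, 187, 190, 190, 192, 192, 197)
Females <- c(159, 166, 168, 168, 171, 171, 172, 172, 173, 174, 175, 175,
             175, 175, 175, 177, 178)
heights <- data.frame(
  Height = c(Males, Females),
  Gender = factor(rep(c("male", "female"),
                      times = c(length(Males), length(Females)))))
## "Height summarised as a function of Gender":
group_means <- aggregate(Height ~ Gender, data = heights, FUN = mean)
group_means
```

Replacing aggregate(..., FUN = mean) by boxplot(Height ~ Gender, data = heights) gives the grouped boxplot with exactly the same formula.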
1.2.4 The Scatter plot
The scatter plot can be used when there are two quantitative variables at hand, and is simply one variable plotted versus the other using some plotting symbol.
Example 1.11
Now we will use a data set available as part of R itself. Both base R and many add-on R-packages include data sets that can be used for testing, trying and practicing. Here we will use the mtcars data set. If you write:
?mtcars
you will be able to read the following as part of the help info:
“The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models). A data frame with 32 observations on 11 variables. Source: Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391-411.”
Let us plot the gasoline use (mpg = miles per gallon) versus the weight (wt):
## To make 2 plots on a single plot-region:
par(mfrow=c(1,2))
## First the default version:
plot(mtcars$wt, mtcars$mpg)
## Then a nicer version:
plot(mpg ~ wt, xlab = "Car Weight (1000lbs)", data = mtcars,
ylab = "Miles pr. Gallon", col = factor(am),
sub = "Red: manual transmission", main = "Inverse fuel usage vs. size")
[Figure, two panels. Left: default scatter plot of mtcars$mpg (10 to 30) versus mtcars$wt (2 to 5). Right: "Inverse fuel usage vs. size"; x-axis: Car Weight (1000lbs), y-axis: Miles pr. Gallon, subtitle: "Red: manual transmission"]
In the second plot call we have used the so-called formula syntax of R, that was introduced above for the grouped boxplot. Again, it can be read: “mpg is plotted as a function of wt.”
1.2.5 Bar plots and Pie charts
All the plots described so far were for quantitative variables. For categorical variables the natural basic plot would be a bar plot or pie chart visualizing the relative frequencies in each category.
Example 1.12
For the gender grouped student heights data we can plot the gender distribution:
## Barplot:
barplot(table(studentheights$Gender), col=2:3)
[Figure: bar plot of the gender counts with bars for female and male; y-axis 0 to 20]
## Pie chart:
pie(table(studentheights$Gender), cex=1, radius=1)
[Figure: pie chart of the gender distribution with the slices female and male]
1.2.6 More plots in R?
A good place for getting more inspired on how to do easy and nice plots in R is: http://www.statmethods.net/.
1.2.7 R in 27411
Some of the methods in this course can be covered with R-tools found in the base R installation, and some are available in various add-on R-packages. Not uncommonly, a method will be available in more than one version, as there are by now thousands of R-packages available. This R-note should be seen more as a help to find a good way through (or into) the tools, help and material already available than as providing lots of specific new tutorial help in itself. So it is mostly "copy-and-paste" from, and links to, other sources! The two main sources of inspiration for us are the two R-packages chemometrics and ChemometricswithR, which again are companions to the two books:
• K. Varmuza and P. Filzmoser (2009). Introduction to Multivariate Statistical Analysis in Chemometrics, CRC Press.
• Ron Wehrens (2012). Chemometrics With R: Multivariate Data Analysis in the Natural Sciences and Life Sciences. Springer, Heidelberg.
Both of these books are available as ebooks (that you can access using your DTU ID):
• http://www.crcnetbase.com.globalproxy.cvt.dk/isbn/978-1-4200-5947-2
• http://link.springer.com.globalproxy.cvt.dk/book/10.1007/978-3-642-17841-2/page/1
In what follows we will mostly base our introduction on the Varmuza and Filzmoser book.
1.2.8 Storage of Text and Graphics
Text from windows in R can be copied into other programs in the usual manner:
• mark the text by holding the left mouse button down and dragging it over the desired text.
• open another program (e.g. StarOffice), place the pointer at the desired location and press the middle mouse button.
All text in the 'Commands Window' or the 'Report Window' can be stored in a text-file by activating the window and choosing 'File' → 'Save As . . . '.
Graphics can be stored in a graphics-file by activating the graphic window and choosing 'File' → 'Save As . . . '. It is possible to choose from a range of graphics formats (JPEG is the default). One may also explicitly define graphics devices, e.g. "pdf", to write the graphics directly to a pdf-file.
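Writing a plot directly to a pdf-file with an explicit graphics device could look as follows. This is a minimal sketch: the file name heights_hist.pdf and the use of a temporary directory are arbitrary choices made for illustration, and the figure size is given in inches.

```r
## Writing a plot directly to a pdf-file with the pdf graphics device
x <- c(168, 161, 167, 179, 184, 166, 198, 187, 191, 179)
out_file <- file.path(tempdir(), "heights_hist.pdf")   # hypothetical file name
pdf(out_file, width = 6, height = 4)   # open the device
hist(x)                                # the plot goes into the file, not to the screen
dev.off()                              # close the device to finish writing the file
file.exists(out_file)
```

Nothing is shown on screen while the pdf device is open, and the file is not complete until dev.off() has been called.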
1.2.9 Scatterplots
Have a look at: http://www.statmethods.net/graphs/scatterplot.html
1.3 Introduction day
1.3.1 Height-weight example from introstat eNote1
(http://introstat.compute.dtu.dk/enote/afsnit/NUID172/) Illustrating centering and scaling cf. Varmuza & Filzmoser, sec. 2.2.2.
# Reading data:
X1 <- c(168, 161, 167, 179, 184, 166, 198, 187, 191, 179)
X2 <- c(65.5, 58.3, 68.1, 85.7, 80.5, 63.4, 102.6, 91.4, 86.7, 78.9)
Basic means and sds:
mean(X1)
[1] 178
mean(X2)
[1] 78.11
sd(X1)
[1] 12.21111
sd(X2)
[1] 14.07184
Centered and standardized data:
# Only centering:
X_cent1 <- X1-mean(X1)
X_cent2 <- X2-mean(X2)
# Standardization
X_auto1 <- X_cent1/sd(X1)
X_auto2 <- X_cent2/sd(X2)
# Table 2.1:
Tab21 <- cbind(X1, X2, X_cent1, X_cent2, X_auto1, X_auto2)
Tab21
X1 X2 X_cent1 X_cent2 X_auto1 X_auto2
[1,] 168 65.5 -10 -12.61 -0.81892664 -0.89611621
[2,] 161 58.3 -17 -19.81 -1.39217528 -1.40777654
[3,] 167 68.1 -11 -10.01 -0.90081930 -0.71134998
[4,] 179 85.7 1 7.59 0.08189266 0.53937526
[5,] 184 80.5 6 2.39 0.49135598 0.16984280
[6,] 166 63.4 -12 -14.71 -0.98271196 -1.04535048
[7,] 198 102.6 20 24.49 1.63785327 1.74035576
[8,] 187 91.4 9 13.29 0.73703397 0.94443969
[9,] 191 86.7 13 8.59 1.06460463 0.61043920
[10,] 179 78.9 1 0.79 0.08189266 0.05614051
1.3.2 Report ready tables with xtable
Nice tables can be produced by the xtable function of the xtable-package. An example:
library(xtable)
first5obs <- Tab21[1:5,]
xtable(first5obs)
% latex table generated in R 3.2.1 by xtable 1.7-4 package
% Mon Feb 01 16:07:04 2016
\begin{table}[ht]
\centering
\begin{tabular}{rrrrrrr}
\hline
& X1 & X2 & X\_cent1 & X\_cent2 & X\_auto1 & X\_auto2 \\
\hline
1 & 168.00 & 65.50 & -10.00 & -12.61 & -0.82 & -0.90 \\
2 & 161.00 & 58.30 & -17.00 & -19.81 & -1.39 & -1.41 \\
3 & 167.00 & 68.10 & -11.00 & -10.01 & -0.90 & -0.71 \\
4 & 179.00 & 85.70 & 1.00 & 7.59 & 0.08 & 0.54 \\
5 & 184.00 & 80.50 & 6.00 & 2.39 & 0.49 & 0.17 \\
\hline
\end{tabular}
\end{table}
And then when this tex-code is included in your tex-file it will appear in the report as a nice table:
     X1      X2     X_cent1  X_cent2  X_auto1  X_auto2
1  168.00   65.50   -10.00   -12.61    -0.82    -0.90
2  161.00   58.30   -17.00   -19.81    -1.39    -1.41
3  167.00   68.10   -11.00   -10.01    -0.90    -0.71
4  179.00   85.70     1.00     7.59     0.08     0.54
5  184.00   80.50     6.00     2.39     0.49     0.17
Note how the input to xtable was a matrix here. The function is prepared to recognize a number of different R-objects, see e.g.:
methods(xtable)
[1] xtable.anova* xtable.aov*
[3] xtable.aovlist* xtable.coxph*
[5] xtable.data.frame* xtable.glm*
[7] xtable.lm* xtable.matrix*
[9] xtable.prcomp* xtable.summary.aov*
[11] xtable.summary.aovlist* xtable.summary.glm*
[13] xtable.summary.lm* xtable.summary.prcomp*
[15] xtable.table* xtable.ts*
[17] xtable.zoo*
see ’?methods’ for accessing help and source code
For instance, ANOVA-tables will be recognized. So a LaTeX user can then copy these tex-lines into the report .tex-document. Or, to integrate the R-code into the tex-code, use the knitr package to create the pure tex-file from a .Rnw file, which is a kind of tex-file with all the R-code integrated into it, with a lot of flexibility in controlling what will be shown/evaluated etc. in the output. This can be used for raw code/results, tables and figures.
A Word user may also use xtable through the html-print-option:
print(xtable(first5obs), type = "html")
<!-- html table generated in R 3.2.1 by xtable 1.7-4 package -->
<!-- Mon Feb 01 16:07:04 2016 -->
<table border=1>
<tr> <th> </th> <th> X1 </th> <th> X2 </th> <th> X_cent1 </th> <th> X_cent2 </th> <th> X_auto1 </th> <th> X_auto2 </th> </tr>
<tr> <td align="right"> 1 </td> <td align="right"> 168.00 </td> <td align="right"> 65.50 </td> <td align="right"> -10.00 </td> <td align="right"> -12.61 </td> <td align="right"> -0.82 </td> <td align="right"> -0.90 </td> </tr>
<tr> <td align="right"> 2 </td> <td align="right"> 161.00 </td> <td align="right"> 58.30 </td> <td align="right"> -17.00 </td> <td align="right"> -19.81 </td> <td align="right"> -1.39 </td> <td align="right"> -1.41 </td> </tr>
<tr> <td align="right"> 3 </td> <td align="right"> 167.00 </td> <td align="right"> 68.10 </td> <td align="right"> -11.00 </td> <td align="right"> -10.01 </td> <td align="right"> -0.90 </td> <td align="right"> -0.71 </td> </tr>
<tr> <td align="right"> 4 </td> <td align="right"> 179.00 </td> <td align="right"> 85.70 </td> <td align="right"> 1.00 </td> <td align="right"> 7.59 </td> <td align="right"> 0.08 </td> <td align="right"> 0.54 </td> </tr>
<tr> <td align="right"> 5 </td> <td align="right"> 184.00 </td> <td align="right"> 80.50 </td> <td align="right"> 6.00 </td> <td align="right"> 2.39 </td> <td align="right"> 0.49 </td> <td align="right"> 0.17 </td> </tr>
</table>
And then print the table directly into a file:
print(xtable(first5obs), type = "html", file = "myhtmltable.html")
Open the file in a browser and copy-paste to Word.
1.3.3 Height-weight example - continued
Centering and standardization can most easily be performed by the scale-function:
# Raw data in matrix X:
X <- cbind(X1, X2)
# Using scale function to only center:
X_cent <- scale(X, scale = FALSE)
# Using scale function to center and standardize:
X_auto <- scale(X)
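To confirm that scale does the same as the explicit computations in the previous subsection, the two results can be compared. This is a sketch only; the name manual is illustrative, and the comparison is done on the plain numbers because scale also stores the column means and standard deviations it used as attributes.

```r
## scale() compared with centering and standardizing "by hand"
X1 <- c(168, 161, 167, 179, 184, 166, 198, 187, 191, 179)
X2 <- c(65.5, 58.3, 68.1, 85.7, 80.5, 63.4, 102.6, 91.4, 86.7, 78.9)
X <- cbind(X1, X2)
X_auto <- scale(X)                             # center and standardize each column
manual <- cbind((X1 - mean(X1)) / sd(X1),
                (X2 - mean(X2)) / sd(X2))      # the same, column by column
all.equal(as.numeric(X_auto), as.numeric(manual))
attr(X_auto, "scaled:center")                  # the column means that were subtracted
attr(X_auto, "scaled:scale")                   # the column sds that were divided by
```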
Means and standard deviations in each column of a matrix can easily be found by the apply-function (and using the round function to show fewer decimals):
# Means by columns:
round(apply(Tab21, 2, mean), 2)
X1 X2 X_cent1 X_cent2 X_auto1 X_auto2
178.00 78.11 0.00 0.00 0.00 0.00
# Standard deviations by columns:
round(apply(Tab21, 2, sd), 2)
X1 X2 X_cent1 X_cent2 X_auto1 X_auto2
12.21 14.07 12.21 14.07 1.00 1.00
par(mfrow=c(1, 2)) # To make two plots in one page in a 1x2 structure
plot(X1, X2, las = 1)
plot(X_auto1, X_auto2, las = 1)
abline(h = 0, v = 0) # Adding horizontal and vertical lines at zeros
arrows(0, 0, 1, 1) # Adding the arrow
[Figure, two panels. Left: scatter plot of X2 (60 to 100) versus X1 (160 to 190). Right: scatter plot of X_auto2 versus X_auto1 (both -1.5 to 1.5) with lines through zero and an arrow from (0, 0) to (1, 1)]
1.3.4 Height-weight example - continued: with some details related to PCA and SVD - singular value decomposition (Appendix 2.6 and 2.7)
How to find the correlation and the arrow direction from data:
# Correlation using function:
cor(X)
X1 X2
X1 1.0000000 0.9656034
X2 0.9656034 1.0000000
# Correlation using matrix-multiplication:
t(X_auto) %*% X_auto/9
X1 X2
X1 1.0000000 0.9656034
X2 0.9656034 1.0000000
# How to find the arrow direction from data:
eigen(cor(X))
$values
[1] 1.96560343 0.03439657
$vectors
[,1] [,2]
[1,] 0.7071068 -0.7071068
[2,] 0.7071068 0.7071068
# Using standard notation:
W <- eigen(cor(X))$vectors
W # In PCA: loadings
[,1] [,2]
[1,] 0.7071068 -0.7071068
[2,] 0.7071068 0.7071068
# The projected values:
z1 <- X_auto %*% W[,1]
z2 <- X_auto %*% W[,2]
plot(z1, z2, ylim = c(-3, 3), xlim = c(-3, 3))
[Figure: scatter plot of z2 versus z1, both axes from -3 to 3]
par(mfrow = c(1, 2))
var(z1)
[,1]
[1,] 1.965603
var(z2)
[,1]
[1,] 0.03439657
hist(z1, xlim = c(-3, 3))
hist(z2, xlim = c(-3, 3))
[Figure, two panels: "Histogram of z1" and "Histogram of z2"; x-axis from -3 to 3, y-axis: Frequency (0 to 4) in both]
These variances are also found as the eigenvalues of the correlation matrix (for a symmetric, positive semi-definite matrix such as this one, the eigenvalues coincide with its so-called singular values):
eigen(cor(X))$values
[1] 1.96560343 0.03439657
# And the D-matrix is the diagonal of the square-roots of these:
D <- diag(sqrt(eigen(cor(X))$values))
D # In PCA: the standard deviations of z1 and z2 (square roots of the explained variances)
[,1] [,2]
[1,] 1.402 0.0000000
[2,] 0.000 0.1854631
# And the z1 and z2 can be standardized by these sds:
z_auto1 <- z1/sd(z1)
z_auto2 <- z2/sd(z2)
cbind(z_auto1, z_auto2)
[,1] [,2]
[1,] -0.86499187 -0.29429720
[2,] -1.41217204 -0.05948223
[3,] -0.81310699 0.72238105
[4,] 0.31334011 1.74422312
[5,] 0.33347947 -1.22581868
[6,] -1.02286513 -0.23881902
[7,] 1.70381944 0.39080657
[8,] 0.84806106 0.79076636
[9,] 0.84481813 -1.73157589
[10,] 0.06961784 -0.09818407
# Or the same could be extracted from the matrices:
Z_auto <- X_auto %*% W %*% solve(D)
Z_auto # In PCA: The standardized scores
[,1] [,2]
[1,] -0.86499187 -0.29429720
[2,] -1.41217204 -0.05948223
[3,] -0.81310699 0.72238105
[4,] 0.31334011 1.74422312
[5,] 0.33347947 -1.22581868
[6,] -1.02286513 -0.23881902
[7,] 1.70381944 0.39080657
[8,] 0.84806106 0.79076636
[9,] 0.84481813 -1.73157589
[10,] 0.06961784 -0.09818407
var(Z_auto)
[,1] [,2]
[1,] 1.000000e+00 4.700949e-16
[2,] 4.700949e-16 1.000000e+00
# So we have done the SVD, check:
cbind(Z_auto %*% D %*% t(W), X_auto)
X1 X2
[1,] -0.81892664 -0.89611621 -0.81892664 -0.89611621
[2,] -1.39217528 -1.40777654 -1.39217528 -1.40777654
[3,] -0.90081930 -0.71134998 -0.90081930 -0.71134998
[4,] 0.08189266 0.53937526 0.08189266 0.53937526
[5,] 0.49135598 0.16984280 0.49135598 0.16984280
[6,] -0.98271196 -1.04535048 -0.98271196 -1.04535048
[7,] 1.63785327 1.74035576 1.63785327 1.74035576
[8,] 0.73703397 0.94443969 0.73703397 0.94443969
[9,] 1.06460463 0.61043920 1.06460463 0.61043920
[10,] 0.08189266 0.05614051 0.08189266 0.05614051
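The decomposition built step by step above can be compared with R's inbuilt functions svd and prcomp. This is a sketch under the assumption that only the diagonal of D is compared: the columns of the loading and score matrices are only determined up to sign, so they may differ by a factor of -1 between the methods. The names sd_from_svd, sd_from_eigen and sd_from_prcomp are illustrative.

```r
## The diagonal of D obtained in three ways
X1 <- c(168, 161, 167, 179, 184, 166, 198, 187, 191, 179)
X2 <- c(65.5, 58.3, 68.1, 85.7, 80.5, 63.4, 102.6, 91.4, 86.7, 78.9)
X <- cbind(X1, X2)
X_auto <- scale(X)
n <- nrow(X)
sd_from_eigen <- sqrt(eigen(cor(X))$values)        # as in the text above
sd_from_svd <- svd(X_auto)$d / sqrt(n - 1)         # singular values of X_auto, rescaled
sd_from_prcomp <- prcomp(X, scale. = TRUE)$sdev    # PCA on standardized data
rbind(sd_from_eigen, sd_from_svd, sd_from_prcomp)
```

The factor sqrt(n - 1) appears because the correlation matrix is t(X_auto) %*% X_auto divided by n - 1, so the singular values of X_auto itself are sqrt(n - 1) times larger than the values in D.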
1.4 Something more on correlation and covariance
In section 2.3 in the Varmuza book, some toy data are simulated from a so-called 2-dimensional normal distribution:
library(mvtnorm)
library(StatDA)
sigma <- matrix(c(1, 0.8, 0.8, 1), ncol=2) # sigma1 in Fig. 2.8
X <- rmvnorm(200, mean = c(0, 0), sigma = sigma)
par(mfrow = c(1, 3))
plot(X[,1], X[,2])
edaplot(X[,1])
edaplot(X[,2])
[Figure, three panels: scatter plot of X[, 2] versus X[, 1]; edaplot of X[, 1] (titled "Histogram of X[, 1]", y-axis: Frequency); edaplot of X[, 2] (titled "Histogram of X[, 2]", y-axis: Frequency)]
The covariance:
## On raw data:
cov(X)
[,1] [,2]
[1,] 1.2278453 0.9541705
[2,] 0.9541705 1.1094357
## On centered data with Matrix multiplication
X_cent <- scale(X, scale = FALSE)
t(X_cent)%*%X_cent/199
[,1] [,2]
[1,] 1.2278453 0.9541705
[2,] 0.9541705 1.1094357
The correlation:
## On raw data:
cor(X)
[,1] [,2]
[1,] 1.0000000 0.8175289
[2,] 0.8175289 1.0000000
## On centered AND scaled data with Matrix multiplication
X_auto <- scale(X)
t(X_auto)%*%X_auto/199
[,1] [,2]
[1,] 1.0000000 0.8175289
[2,] 0.8175289 1.0000000
The same matrix computations also work for data matrices with more than two columns.
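As a minimal sketch of the higher-dimensional case, the two computations can be repeated on the 11 columns of the built-in mtcars data, with the divisor written as n - 1 instead of the hard-coded 199. The names X_wide, Xw_cent, Xw_auto, S_manual and R_manual are chosen here for illustration only.

```r
X_wide <- as.matrix(mtcars)              # 32 rows, 11 columns
n <- nrow(X_wide)

# Covariance from centered data
Xw_cent <- scale(X_wide, scale = FALSE)
S_manual <- t(Xw_cent) %*% Xw_cent / (n - 1)

# Correlation from centered and scaled data
Xw_auto <- scale(X_wide)
R_manual <- t(Xw_auto) %*% Xw_auto / (n - 1)

# Both agree with the built-in functions
all.equal(S_manual, cov(X_wide))
all.equal(R_manual, cor(X_wide))
```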
1.5 Some additional matrix scatterplotting for the Varmuza toy data in 2.6.3
A good first step when exploring several variables simultaneously is to draw multiple scatter plots on a single page. We show here a few ways of doing this using the small artificial data set from Table 2.4 of the Varmuza book, which is also used in Exercise 1 below.
# Importing data: (the Table 2.4 data with an x0 column added)
tab24data <- read.table("Tab24ArtificialData.txt",
header = TRUE, sep = ",", dec = ".")
tab24data
x0 x1 x2 y
1 0.9 0.8 3.5 1
2 0.2 3.0 4.0 1
3 -0.2 4.2 4.8 1
4 -0.7 6.0 6.0 1
5 0.3 6.7 7.1 1
6 0.8 1.5 1.0 2
7 -1.1 4.0 2.5 2
8 -0.9 5.5 3.0 2
9 -0.7 7.3 3.5 2
10 -0.4 8.5 4.5 2
X <- tab24data[,1:3]
# Scatterplot Matrices using pairs:
pairs(X)
[Figure: scatterplot matrix of x0, x1 and x2 produced by pairs(X).]
# Scatterplot Matrices using pairs WITH color coding of groups
Gender <- factor(tab24data[,4]) # Defining column 4 as a grouping factor
pairs(X, main = "Scatterplots - Gender grouping",
col = c("red", "blue")[Gender])
[Figure: scatterplot matrix of x0, x1 and x2 titled "Scatterplots - Gender grouping", with points coloured red and blue by group.]
library(GGally)
ggpairs(X)
[Figure: ggpairs plot of x0, x1 and x2 with scatter plots below the diagonal and correlations above it: Corr(x0, x1) = -0.618, Corr(x0, x2) = -0.0724, Corr(x1, x2) = 0.53.]
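If the GGally package is not available, a similar display can be sketched with base R alone. The helper panel.cor below is a hypothetical name (modelled on the example in the pairs help page), and the data are typed in directly from the table listing so that the block runs on its own.

```r
# Table 2.4 data with the x0 column, typed in from the listing
X_tab <- data.frame(
  x0 = c(0.9, 0.2, -0.2, -0.7, 0.3, 0.8, -1.1, -0.9, -0.7, -0.4),
  x1 = c(0.8, 3.0, 4.2, 6.0, 6.7, 1.5, 4.0, 5.5, 7.3, 8.5),
  x2 = c(3.5, 4.0, 4.8, 6.0, 7.1, 1.0, 2.5, 3.0, 3.5, 4.5))

# Panel function that prints the correlation instead of drawing points
panel.cor <- function(x, y, digits = 3, ...) {
  old <- par(usr = c(0, 1, 0, 1))   # use unit coordinates inside the panel
  on.exit(par(old))                  # restore the coordinates afterwards
  text(0.5, 0.5, paste("Corr:", format(cor(x, y), digits = digits)))
}

# Scatter plots below the diagonal, correlations above it
pairs(X_tab, upper.panel = panel.cor)

# The same correlations as a table
round(cor(X_tab), 3)
```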
# 3D Scatterplot
library(scatterplot3d)
scatterplot3d(X, main="3D Scatterplot")
[Figure: "3D Scatterplot" of x0, x1 and x2.]
# 3D Scatterplot with Coloring and Vertical Drop Lines
scatterplot3d(X, pch = 16, highlight.3d = TRUE,
type = "h", main = "3D Scatterplot")
[Figure: "3D Scatterplot" of x0, x1 and x2 with highlighted points and vertical drop lines.]
# 3D Scatterplot with Coloring and Vertical Lines
# and Regression Plane
s3d <- scatterplot3d(X, pch = 16, highlight.3d = TRUE,
type = "h", main = "3D Scatterplot")
fit <- lm(X[,3] ~ X[,1] + X[,2])
s3d$plane3d(fit)
[Figure: "3D Scatterplot" of x0, x1 and x2 with vertical drop lines and the fitted regression plane.]
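The plane drawn by plane3d comes from the least-squares fit of x2 on x0 and x1. As a small sketch, the same model can be written with column names and the data argument, which gives more readable coefficient labels. X_tab and fit_named are names introduced here for illustration, with the data typed in from the table listing.

```r
# Table 2.4 data with the x0 column, typed in from the listing
X_tab <- data.frame(
  x0 = c(0.9, 0.2, -0.2, -0.7, 0.3, 0.8, -1.1, -0.9, -0.7, -0.4),
  x1 = c(0.8, 3.0, 4.2, 6.0, 6.7, 1.5, 4.0, 5.5, 7.3, 8.5),
  x2 = c(3.5, 4.0, 4.8, 6.0, 7.1, 1.0, 2.5, 3.0, 3.5, 4.5))

# Same model as lm(X[,3] ~ X[,1] + X[,2]), written with variable names
fit_named <- lm(x2 ~ x0 + x1, data = X_tab)
coef(fit_named)      # intercept and the two slopes defining the plane

# Height of the plane at a chosen point (values picked arbitrarily)
predict(fit_named, newdata = data.frame(x0 = 0, x1 = 5))
```

Writing the formula with names also makes predict usable with new data, which is awkward when the model is specified through X[,1] and X[,2].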
An interactive spinning 3D-plot can be done with the rgl-package:
# Spinning 3d Scatterplot
library(rgl)
plot3d(X, col="red", size=3)
The code shown here will start up a separate plotting window, in which you can spin the plot using the mouse.
1.6 Matrix scatterplotting the mtcars data
data(mtcars)
?mtcars
# Inspect the data:
head(mtcars) # List the top of the data set
summary(mtcars) # Summarize each variable in the data set
dim(mtcars) # Show number of rows and columns in the data set
pairs(mtcars)
[Figure: scatterplot matrix of the 11 mtcars variables (mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb) produced by pairs(mtcars).]
# Scatterplot Matrices from the gclus Package
library(cluster)
library(gclus)
dta <- mtcars # get data - just put your own data here!
dta.r <- abs(cor(dta)) # get correlations
dta.col <- dmat.color(dta.r) # get colors
# reorder variables so those with highest correlation
# are closest to the diagonal
dta.o <- order.single(dta.r)
cpairs(dta, dta.o, panel.colors = dta.col, gap =.5,
main = "Variables Ordered and Colored by Correlation" )
[Figure: scatterplot matrix of the mtcars variables titled "Variables Ordered and Colored by Correlation", with the variables in the order qsec, carb, vs, hp, cyl, disp, wt, mpg, drat, am, gear.]
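The reordering idea can be sketched in base R as well. The block below is only an approximation of the idea, not a reimplementation of order.single: it clusters the variables with single linkage on the distance 1 - |r| and uses the resulting dendrogram order, so the order it gives may differ from the one gclus produces. The names r_abs, hc and ord are illustrative.

```r
r_abs <- abs(cor(mtcars))                 # absolute correlations
hc <- hclust(as.dist(1 - r_abs),          # strongly correlated = close
             method = "single")
ord <- hc$order                           # variable order along the dendrogram
names(mtcars)[ord]

# Scatterplot matrix with the variables in the new order
pairs(mtcars[, ord], gap = 0.5)
```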
# 10 normal probability plots on the same plot page:
library(car)
par(mfrow=c(2,5))
for (i in 1:10) qqPlot(mtcars[,i])
[Figure: normal Q-Q plots (norm quantiles on the x-axis) of the first 10 mtcars variables, in a 2 x 5 layout.]
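A comparable set of plots can be sketched with base R only, here with the variable name as panel title so that the panels are easier to tell apart than with the default mtcars[, i] label. This is an alternative under stated assumptions, not a replacement for qqPlot from the car package: it draws a reference line through the quartiles but no confidence envelope.

```r
op <- par(mfrow = c(2, 5))             # 2 x 5 grid of panels
for (v in names(mtcars)[1:10]) {
  qqnorm(mtcars[[v]], main = v)        # normal Q-Q plot of one variable
  qqline(mtcars[[v]], col = "red")     # line through the quartiles
}
par(op)                                # restore the previous layout
```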
# 8 histograms with color:
par(mfrow=c(2,4))
for (i in 1:8) hist(mtcars[,i],col=i)
[Figure: coloured histograms of the first 8 mtcars variables, in a 2 x 4 layout.]
# mpg against each of the other variables, with car names as labels:
par(mfrow=c(2,5))
for (i in 2:11) {plot(mtcars$mpg ~ mtcars[,i], type="n",xlab=names(mtcars)[i])
text(mtcars[,i],mtcars$mpg,labels=row.names(mtcars))
abline(lm(mtcars$mpg~mtcars[,i]), col="red")
}
[Figure: mpg plotted against each of cyl, disp, hp, drat, wt, qsec, vs, am, gear and carb in a 2 x 5 layout, with car names as point labels and the fitted regression line in red.]
# Saving directly to a png-file:
png("mpg_relations.png", width = 800, height = 600)
par(mfrow = c(2, 5))
for (i in 2:11) {
  plot(mtcars$mpg ~ mtcars[,i], type = "n", xlab = names(mtcars)[i])
  text(mtcars[,i], mtcars$mpg, labels = row.names(mtcars))
  abline(lm(mtcars$mpg ~ mtcars[,i]), col = "red")
}
dev.off() # close the file device so the png-file is written
# detach(mtcars) is only needed if attach(mtcars) was called earlier
# Correlation (with 2 decimals)
round(cor(mtcars),2)
mpg cyl disp hp drat wt qsec vs am gear carb
mpg 1.00 -0.85 -0.85 -0.78 0.68 -0.87 0.42 0.66 0.60 0.48 -0.55
cyl -0.85 1.00 0.90 0.83 -0.70 0.78 -0.59 -0.81 -0.52 -0.49 0.53
disp -0.85 0.90 1.00 0.79 -0.71 0.89 -0.43 -0.71 -0.59 -0.56 0.39
hp -0.78 0.83 0.79 1.00 -0.45 0.66 -0.71 -0.72 -0.24 -0.13 0.75
drat 0.68 -0.70 -0.71 -0.45 1.00 -0.71 0.09 0.44 0.71 0.70 -0.09
wt -0.87 0.78 0.89 0.66 -0.71 1.00 -0.17 -0.55 -0.69 -0.58 0.43
qsec 0.42 -0.59 -0.43 -0.71 0.09 -0.17 1.00 0.74 -0.23 -0.21 -0.66
vs 0.66 -0.81 -0.71 -0.72 0.44 -0.55 0.74 1.00 0.17 0.21 -0.57
am 0.60 -0.52 -0.59 -0.24 0.71 -0.69 -0.23 0.17 1.00 0.79 0.06
gear 0.48 -0.49 -0.56 -0.13 0.70 -0.58 -0.21 0.21 0.79 1.00 0.27
carb -0.55 0.53 0.39 0.75 -0.09 0.43 -0.66 -0.57 0.06 0.27 1.00
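A table like this is often scanned for the variables most strongly related to one variable of interest. A small sketch, using only base R, that ranks the other variables by their absolute correlation with mpg (the names r_mpg and ranked are illustrative):

```r
r_mpg <- cor(mtcars)[, "mpg"]                  # correlations with mpg
r_mpg <- r_mpg[names(r_mpg) != "mpg"]          # drop mpg itself
ranked <- r_mpg[order(abs(r_mpg), decreasing = TRUE)]
round(ranked, 2)
```

With the values in the table, wt comes first (-0.87), followed by cyl and disp (both -0.85 after rounding).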
# Making a table for viewing in a browser - to be included in e.g. Word/Powerpoint
# Go find/use the cortable.html file afterwards
library(xtable)
capture.output(print(xtable(cor(mtcars)),type="html"),file="cortable.html")
ggpairs(mtcars)
[Figure: ggpairs plot of the 11 mtcars variables, with scatter plots below the diagonal and the pairwise correlations above it.]
The ggplot2-package offers some really nice plotting options, although for pairwise matrix scatterplots specifically the suggestions above should be used.
The ggplot2-package is, for example, particularly useful for plotting various combinations of factors in the data. Two of the variables in the mtcars data set, cyl and am, are really categorical (am is binary), which we can let R know with the factor function:
mtcars$cyl <- factor(mtcars$cyl)
mtcars$am <- factor(mtcars$am)
This could then be used to create the following plot of mpg versus wt conditioned on the factors:
library(ggplot2)
# First for each cyl:
p <- ggplot(mtcars, aes(wt, mpg, colour = cyl,
label = row.names(mtcars)))
p <- p + geom_text()
p <- p + geom_smooth(method = lm, fullrange=T)
print(p)
[Figure: mpg versus wt with car names as labels, coloured by cyl (4, 6, 8), with a fitted line and shaded band for each cyl group.]
The shaded area is, by default, the (pointwise) 95% confidence interval for the line estimate in each subgroup.
Note the special way of building plots this way. It takes some getting used to, so copy and paste from here and search for more help, e.g.
http://www.cookbook-r.com/Graphs/Scatterplots_(ggplot2)/
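The pointwise confidence band can also be computed numerically. As a sketch, assuming the 4-cylinder cars as the subgroup, the 95% interval for the fitted line comes from predict with interval = "confidence"; this should correspond to what geom_smooth(method = lm) shades, although the grid of wt values used here is an arbitrary choice. The names cars4, fit4, grid and band are illustrative.

```r
cars4 <- mtcars[mtcars$cyl == 4, ]               # one subgroup
fit4 <- lm(mpg ~ wt, data = cars4)

# Evenly spaced weights covering the subgroup's range
grid <- data.frame(wt = seq(min(cars4$wt), max(cars4$wt), length.out = 5))
band <- predict(fit4, newdata = grid, interval = "confidence", level = 0.95)
cbind(grid, band)                                # columns: wt, fit, lwr, upr
```

The interval is narrowest near the mean weight of the subgroup and widens towards the ends of the range, which is the shape seen in the shaded areas.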
library(ggplot2)
# Then for each transmission type:
p <- ggplot(mtcars, aes(wt, mpg, colour = am,
label = row.names(mtcars)))
p <- p + geom_text()
p <- p + geom_smooth(method = lm, fullrange=T)
print(p)
[Figure: mpg versus wt with car names as labels, coloured by am (0, 1), with a fitted line and shaded band for each am group.]
# Or all combinations on separate plots: (not so nice here)
library(ggplot2)
p <- ggplot(mtcars, aes(wt, mpg, label = row.names(mtcars)))
p <- p + geom_text()
p <- p + geom_smooth(method = lm, fullrange=F)
p <- p + facet_wrap(~ am + cyl)
print(p)
[Figure: mpg versus wt with car names as labels, one panel for each combination of am (0, 1) and cyl (4, 6, 8).]
We can use the same feature to create nicer versions of the multiple scatterplot of mpg versus all the x-variables made above. Using the melt function of the reshape2-package, we create a version of the data set in which the (relevant) variables are "stringed out on top of each other" as a single variable (value), with a new variable (variable) coding which original variable each value came from:
library(reshape2)
mtcars2 <- melt(mtcars, measure.vars=c(3:8, 10:11))
summary(mtcars2)
mpg cyl am variable value
Min. :10.40 4: 88 0:152 disp :32 Min. : 0.000
1st Qu.:15.43 6: 56 1:104 hp :32 1st Qu.: 2.982
Median :19.20 8:112 drat :32 Median : 4.000
Mean :20.09 wt :32 Mean : 51.126
3rd Qu.:22.80 qsec :32 3rd Qu.: 30.175
Max. :33.90 vs :32 Max. :472.000
(Other):64
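To make clear what melt does here, the same long format can be built by hand in base R. This is a sketch only: the column names mirror those in the summary output, while measure_vars and long are names introduced for illustration, and only mpg is carried along as an identifying column.

```r
measure_vars <- names(mtcars)[c(3:8, 10:11)]   # disp, hp, drat, wt, qsec, vs, gear, carb
n <- nrow(mtcars)

long <- data.frame(
  mpg      = rep(mtcars$mpg, times = length(measure_vars)),
  variable = factor(rep(measure_vars, each = n), levels = measure_vars),
  value    = unlist(mtcars[measure_vars], use.names = FALSE))

dim(long)             # 8 variables x 32 cars = 256 rows
table(long$variable)  # 32 values per original variable
mean(long$value)      # matches the mean of value in the summary
```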
And then using this new "variable coding" factor to produce a separate plot for each variable:
library(ggplot2)
p <- ggplot(mtcars2, aes(value, mpg))
p <- p + geom_point(shape=1)
p <- p + geom_smooth(method = lm)
p <- p + facet_wrap(~ variable, scales="free")
print(p)
[Figure: mpg versus value with a linear fit, one panel for each of disp, hp, drat, wt, qsec, vs, gear and carb.]
One can then easily use other fit types than the linear one:
library(ggplot2)
p <- ggplot(mtcars2, aes(value, mpg))
p <- p + geom_point(shape=1)
p <- p + geom_smooth(method="loess")
p <- p + facet_wrap(~ variable, scales="free")
print(p)
[Figure: scatterplots of mpg (y-axis) versus value (x-axis) in eight panels (disp, hp, drat, wt, qsec, vs, gear, carb), each with free axis scales and a fitted loess curve.]
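For a single x-variable, a comparable smooth curve can be computed outside ggplot2 with the loess function from base R. This is only a sketch: it assumes the default loess settings, and the names lo, grid and out are introduced here for illustration.

```r
# Local regression of mpg on wt, using the built-in mtcars data
lo <- loess(mpg ~ wt, data = mtcars)

# Evaluate the fitted curve at five evenly spaced weights within the data range
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 5))
out  <- cbind(grid, fitted_mpg = predict(lo, newdata = grid))
out
```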
1.7 Exercises
Exercise 1 Table 2.4 artificial data from the Varmuza book
a) Have a look at the R introduction in Appendix 3 of the Varmuza book: http://www.crcnetbase.com.globalproxy.cvt.dk/doi/pdfplus/10.1201/9781420059496.ax3
(They do not mention Rstudio, but this is still recommended)
b) Consider the (extended) artificial data from Table 2.4:
1. If needed: Install R and Rstudio
2. Start Rstudio
3. Import the Table 2.4 artificial data from the Varmuza book (Hint: Use the file tab24Artificialdata.txt available under File Sharing in Campusnet)
#import:
tab24data <- read.table("Tab24ArtificialData.txt",
header = TRUE, sep = ",", dec = ".")
tab24data
x0 x1 x2 y
1 0.9 0.8 3.5 1
2 0.2 3.0 4.0 1
3 -0.2 4.2 4.8 1
4 -0.7 6.0 6.0 1
5 0.3 6.7 7.1 1
6 0.8 1.5 1.0 2
7 -1.1 4.0 2.5 2
8 -0.9 5.5 3.0 2
9 -0.7 7.3 3.5 2
10 -0.4 8.5 4.5 2
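If the file is not at hand, the same data frame can be reconstructed from the values printed above. This sketch assumes that the file is comma-separated with a header line, as the header and sep arguments in the call suggest; the actual file layout is not shown in this note, and the name tab24text is introduced here for illustration.

```r
# The printed Table 2.4 values, written as comma-separated text
tab24text <- "x0,x1,x2,y
0.9,0.8,3.5,1
0.2,3.0,4.0,1
-0.2,4.2,4.8,1
-0.7,6.0,6.0,1
0.3,6.7,7.1,1
0.8,1.5,1.0,2
-1.1,4.0,2.5,2
-0.9,5.5,3.0,2
-0.7,7.3,3.5,2
-0.4,8.5,4.5,2"

# Same call as above, but reading from the text string instead of a file
tab24data <- read.table(text = tab24text, header = TRUE, sep = ",", dec = ".")
str(tab24data)
```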
#Select the relevant columns:
X <- tab24data[,1:3]
X
x0 x1 x2
1 0.9 0.8 3.5
2 0.2 3.0 4.0
3 -0.2 4.2 4.8
4 -0.7 6.0 6.0
5 0.3 6.7 7.1
6 0.8 1.5 1.0
7 -1.1 4.0 2.5
8 -0.9 5.5 3.0
9 -0.7 7.3 3.5
10 -0.4 8.5 4.5
c) Find the "centroid" vector of X, that is, the 3 means:
?apply
apply(X, 2, mean)
x0 x1 x2
-0.18 4.75 3.99
d) Find the mean-centered matrix $X_c$:
?scale
X_cent <- scale(X, scale = FALSE)
X_cent
x0 x1 x2
[1,] 1.08 -3.95 -0.49
[2,] 0.38 -1.75 0.01
[3,] -0.02 -0.55 0.81
[4,] -0.52 1.25 2.01
[5,] 0.48 1.95 3.11
[6,] 0.98 -3.25 -2.99
[7,] -0.92 -0.75 -1.49
[8,] -0.72 0.75 -0.99
[9,] -0.52 2.55 -0.49
[10,] -0.22 3.75 0.51
attr(,"scaled:center")
x0 x1 x2
-0.18 4.75 3.99
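Centering simply subtracts each column mean from every entry in that column. A minimal sketch of doing this by hand with sweep is given below; X is rebuilt from the printed values so that the block runs on its own, and the name X_cent_manual is introduced here for illustration.

```r
# The three x-columns of Table 2.4, typed in from the printed values
X <- data.frame(x0 = c(0.9, 0.2, -0.2, -0.7, 0.3, 0.8, -1.1, -0.9, -0.7, -0.4),
                x1 = c(0.8, 3.0, 4.2, 6.0, 6.7, 1.5, 4.0, 5.5, 7.3, 8.5),
                x2 = c(3.5, 4.0, 4.8, 6.0, 7.1, 1.0, 2.5, 3.0, 3.5, 4.5))

# Subtract the column means (the centroid) from each column
X_cent_manual <- sweep(as.matrix(X), 2, colMeans(X), FUN = "-")

# The centered columns have mean zero (up to rounding error)
round(colMeans(X_cent_manual), 12)

# Same numbers as scale(X, scale = FALSE), ignoring the extra attributes
all.equal(X_cent_manual, scale(X, scale = FALSE), check.attributes = FALSE)
```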
e) Find the standardized matrix $X_a$ ("autoscaled"):
apply(X, 2, sd)
x0 x1 x2
0.7036413 2.5074334 1.7400192
X_auto <- scale(X)
X_auto
x0 x1 x2
[1,] 1.53487290 -1.5753160 -0.281606095
[2,] 0.54004787 -0.6979248 0.005747063
[3,] -0.02842357 -0.2193478 0.465512116
[4,] -0.73901288 0.4985177 1.155159696
[5,] 0.68216573 0.7776877 1.787336644
[6,] 1.39275504 -1.2961461 -1.718371886
[7,] -1.30748433 -0.2991106 -0.856312411
[8,] -1.02324860 0.2991106 -0.568959253
[9,] -0.73901288 1.0169762 -0.281606095
[10,] -0.31265930 1.4955532 0.293100221
attr(,"scaled:center")
x0 x1 x2
-0.18 4.75 3.99
attr(,"scaled:scale")
x0 x1 x2
0.7036413 2.5074334 1.7400192
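The standard deviations used for autoscaling are sample standard deviations, that is, with n - 1 in the denominator. As a small check for the column x1 alone (the names sd_manual and z1 are introduced here for illustration):

```r
# Column x1 of Table 2.4, typed in from the printed values
x1 <- c(0.8, 3.0, 4.2, 6.0, 6.7, 1.5, 4.0, 5.5, 7.3, 8.5)
n  <- length(x1)

# Sample standard deviation by hand: divide the sum of squares by n - 1
sd_manual <- sqrt(sum((x1 - mean(x1))^2) / (n - 1))
all.equal(sd_manual, sd(x1))

# Autoscaled column: centered, then divided by the standard deviation
z1 <- (x1 - mean(x1)) / sd_manual
round(z1, 4)
```

The values in z1 should agree with the x1 column of X_auto above.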
f) Find the sum of squares matrix $X_c^t X_c$:
?cov
t(X_cent) %*% X_cent
x0 x1 x2
x0 4.456 -9.820 -0.798
x1 -9.820 56.585 20.805
x2 -0.798 20.805 27.249
9*cov(X)
x0 x1 x2
x0 4.456 -9.820 -0.798
x1 -9.820 56.585 20.805
x2 -0.798 20.805 27.249
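The product t(X_cent) %*% X_cent can also be written with crossprod, which computes the same matrix in one call. A short sketch, with X rebuilt from the printed values so that the block runs on its own (the name SS is introduced here for illustration):

```r
# The three x-columns of Table 2.4, typed in from the printed values
X <- data.frame(x0 = c(0.9, 0.2, -0.2, -0.7, 0.3, 0.8, -1.1, -0.9, -0.7, -0.4),
                x1 = c(0.8, 3.0, 4.2, 6.0, 6.7, 1.5, 4.0, 5.5, 7.3, 8.5),
                x2 = c(3.5, 4.0, 4.8, 6.0, 7.1, 1.0, 2.5, 3.0, 3.5, 4.5))
X_cent <- scale(X, scale = FALSE)

# crossprod(A) computes t(A) %*% A
SS <- crossprod(X_cent)
SS

# The sum of squares matrix is (n - 1) times the covariance matrix
all.equal(SS, (nrow(X) - 1) * cov(X), check.attributes = FALSE)
```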
g) Find the covariance matrix $X_c^t X_c/(n-1)$:
(t(X_cent)%*%X_cent)/9
x0 x1 x2
x0 0.49511111 -1.091111 -0.08866667
x1 -1.09111111 6.287222 2.31166667
x2 -0.08866667 2.311667 3.02766667
cov(X)
x0 x1 x2
x0 0.49511111 -1.091111 -0.08866667
x1 -1.09111111 6.287222 2.31166667
x2 -0.08866667 2.311667 3.02766667
h) Find the correlation matrix $X_a^t X_a/(n-1)$:
(t(X_auto)%*%X_auto)/9
x0 x1 x2
x0 1.00000000 -0.6184267 -0.07241942
x1 -0.61842671 1.0000000 0.52983638
x2 -0.07241942 0.5298364 1.00000000
cor(X)
x0 x1 x2
x0 1.00000000 -0.6184267 -0.07241942
x1 -0.61842671 1.0000000 0.52983638
x2 -0.07241942 0.5298364 1.00000000
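The correlation matrix can also be obtained from the covariance matrix, by dividing each covariance by the product of the two corresponding standard deviations. A minimal sketch, with X rebuilt from the printed values so that the block runs on its own (the names S, d and R_manual are introduced here for illustration):

```r
# The three x-columns of Table 2.4, typed in from the printed values
X <- data.frame(x0 = c(0.9, 0.2, -0.2, -0.7, 0.3, 0.8, -1.1, -0.9, -0.7, -0.4),
                x1 = c(0.8, 3.0, 4.2, 6.0, 6.7, 1.5, 4.0, 5.5, 7.3, 8.5),
                x2 = c(3.5, 4.0, 4.8, 6.0, 7.1, 1.0, 2.5, 3.0, 3.5, 4.5))

S <- cov(X)          # covariance matrix
d <- sqrt(diag(S))   # standard deviations of the three columns

# Element (j, k) of the correlation matrix is S[j, k] / (d[j] * d[k])
R_manual <- S / outer(d, d)

all.equal(R_manual, cor(X))
all.equal(cov2cor(S), cor(X))   # cov2cor() does the same rescaling
```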
Exercise 2 Matrix scatterplotting
a) Work your way through some initial explorative analysis of the mtcars data; see the subsection on this above.