Initiation to R
Nicolas Sutton-Charani
Initiation to R 2019-04-29 1 / 28
1. Introduction
2. Software installation
3. Types and basic operations with R
4. Data import
5. Data simulation
6. Plots
7. Packages
8. Useful functions
Introduction
What is R?
Programming language and statistical computing environment for data analysis
GNU package (implemented in C, Fortran and R)
Freely available under the GNU General Public License
Collaborative project
Comprehensive R Archive Network (CRAN)
History
1975: J. Chambers (Bell Laboratories) → S
1995: R. Ihaka and R. Gentleman (University of Auckland) → R
Employment
[Figure not reproduced. Source: http://r4stats.com/articles/popularity/]
R vs Python
[Figure not reproduced. Source: http://r4stats.com/articles/popularity/]
Analytic tool
[Figure not reproduced. Source: http://r4stats.com/articles/popularity/]
[Figure not reproduced. Source: http://r4stats.com/articles/popularity/]
Software installation
R: the software
http://www.r-project.org/ → CRAN → choose one of the French mirrors → Download R for Windows/Mac/Linux → base → Download R 3.5.1 for XXX
RStudio: development environment
https://www.rstudio.com/ → Download RStudio → RStudio Desktop, Open Source License: Download → choose the correct installer
Run the two downloaded installers (.exe files on Windows)
Types and basic operations with R
Variable types
No type declaration: the type is determined by the value assigned. R objects:
Vectors
Lists
Matrices
Arrays
Factors
Data frames
Data types:
logical (TRUE, FALSE)
numeric (e.g. 12.3, 5, 999)
character (e.g. "a", "good", "TRUE", "23.4"), or factor when all the levels (categories) are known
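The data types listed above can be checked with class(). This is a minimal sketch; the variable names and values are illustrative and do not come from the slides.

```r
# Illustrative values only (hypothetical names)
flag  <- TRUE       # logical
size  <- 12.3       # numeric
label <- "23.4"     # character, even though it looks like a number
grade <- factor(c("low", "high", "low"), levels = c("low", "high"))

# class() reports the type R inferred at assignment
types <- c(class(flag), class(size), class(label), class(grade))
print(types)

# A character string must be converted explicitly before arithmetic
converted <- as.numeric(label) + 1
print(converted)
```

Writing label + 1 without as.numeric() would stop with an error, since R does not silently convert character strings to numbers.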
Operators
Arithmetic                Comparison                      Logical
+    addition             <   less than                   !x         logical NOT
-    subtraction          >   greater than                x & y      logical AND (element-wise)
*    multiplication       <=  less than or equal to       x && y     logical AND (single values)
/    division             >=  greater than or equal to    x | y      logical OR (element-wise)
^    power                ==  equal                       x || y     logical OR (single values)
%%   modulo               !=  not equal                   xor(x, y)  exclusive OR
%/%  integer division
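A short sketch of how some of these operators behave, in particular the difference between the element-wise & and the single-value &&. The vector x is an illustrative assumption.

```r
int_div   <- 7 %/% 2    # integer division
remainder <- 7 %% 2     # modulo
power     <- 2 ^ 10     # power

x <- c(1, 5, 10)        # hypothetical example vector

# & compares element by element and returns one result per element
in_range <- x > 2 & x <= 10
print(in_range)

# && works on single values and stops as soon as the result is known
single <- (length(x) > 0) && (x[1] == 1)
print(single)

exclusive <- xor(TRUE, FALSE)
print(c(int_div, remainder, power))
```

As a rule of thumb, & and | are used to filter vectors, while && and || are used in if() conditions.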
Vectors
Value assignment: '<-' or '=' (e.g. x <- 3 or x = 3)
Data generation
1:10
[1] 1 2 3 4 5 6 7 8 9 10
seq(-3, +3, length.out = 13)
[1] -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0
v <- c(4, 10, 16)
v[3]
[1] 16
Functions on vectors
mean(), sum(), median()
var() and sd()
length()
summary()
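The functions listed above can be tried on the vector v from the previous example. A minimal sketch; the name res is a hypothetical container for the results.

```r
v <- c(4, 10, 16)

res <- c(mean   = mean(v),     # 10
         sum    = sum(v),      # 30
         median = median(v),   # 10
         var    = var(v),      # 36 (sample variance, divisor n - 1)
         sd     = sd(v),       # 6
         length = length(v))   # 3
print(res)

# summary() returns the minimum, quartiles, mean and maximum in one call
print(summary(v))
```

Note that var() and sd() use the divisor n - 1, not n.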
Matrices
m <- matrix(data = 1:12, nrow = 3, ncol = 4)
m
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
m[2, 3]
[1] 8
dim(m)
[1] 3 4
cbind(m, v)
              v
[1,] 1 4 7 10  4
[2,] 2 5 8 11 10
[3,] 3 6 9 12 16
rbind(m, c(v, 5))
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
[4,]    4   10   16    5
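Leaving one index empty selects a whole row or column. The sketch below reuses the m and v of the slide; the names second_row, third_column, wider and taller are illustrative.

```r
m <- matrix(data = 1:12, nrow = 3, ncol = 4)   # filled column by column
v <- c(4, 10, 16)

second_row   <- m[2, ]    # all columns of row 2
third_column <- m[, 3]    # all rows of column 3
print(second_row)
print(third_column)

wider  <- cbind(m, v)         # adds v as a 5th column: 3 x 5
taller <- rbind(m, c(v, 5))   # adds a 4th row: 4 x 4
print(dim(wider))
print(dim(taller))
```

cbind() needs a vector with as many elements as the matrix has rows, and rbind() one with as many elements as it has columns, which is why v is extended with a fourth value before rbind().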
Data frames
Table-like structure, similar to a matrix
Columns can hold different types of data
Columns are indexed by name → code is insensitive to column reordering
df <- data.frame(id = c("id1", "id2", "id3", "id4", "id5"),
                 poids = c(85, 78, 56, 102, 91),         # weight
                 taille = c(170, 176, 155, 187, 202))    # height
df
   id poids taille
1 id1    85    170
2 id2    78    176
3 id3    56    155
4 id4   102    187
5 id5    91    202
Call columns by their names: df$poids (vector) or df['poids'] (data frame)
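A data frame is a list of equal-length vectors that can hold different variable types → function 'class':
class(df)
[1] "data.frame"
class(df$taille)
[1] "numeric"
class(df$id)
[1] "factor"
(In R 4.0.0 and later, strings are no longer converted to factors by default, so this returns "character" unless stringsAsFactors = TRUE is set.)
Variable names → selection, filter:
df1 <- subset(df, select = c(id, taille)) ⇔ df1 <- subset(df, select = -poids)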
df2 <- df[df$poids > 80, ]
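Selection and filtering can be combined. The sketch below rebuilds the same data frame, with stringsAsFactors = FALSE set explicitly so that it behaves the same in every R version; the names df1_bis and df3 and the second condition on taille are illustrative assumptions.

```r
df <- data.frame(id = c("id1", "id2", "id3", "id4", "id5"),
                 poids = c(85, 78, 56, 102, 91),         # weight
                 taille = c(170, 176, 155, 187, 202),    # height
                 stringsAsFactors = FALSE)

# Two equivalent ways of keeping the columns id and taille
df1     <- subset(df, select = c(id, taille))
df1_bis <- subset(df, select = -poids)
print(all.equal(df1, df1_bis))

# Row filter: the condition goes before the comma
df2 <- df[df$poids > 80, ]
print(df2$id)

# Several conditions are combined with the element-wise operator &
df3 <- df[df$poids > 80 & df$taille < 190, ]
print(df3)
```

Forgetting the comma in df[df$poids > 80, ] makes R interpret the condition as a column selection instead of a row filter.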
Data import
Import
from a text or csv file:
df <- read.table(file = "file.txt", sep = ";", dec = ",", header = TRUE)
from a csv file:
df <- read.csv(file = "file.csv")
from an Excel file:
library(readxl)
df <- read_excel("my_file.xls")
from a database:
library(RODBC)
connexion <- odbcDriverConnect('driver={SQL Server};server=mysqlhost;database=mydbname;trusted_connection=true')
df <- sqlQuery(connexion, 'SELECT * FROM information_schema.tables')
odbcClose(connexion)
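To try read.table() without an existing data file, one can write a small temporary file first. This is only a sketch: the file content is made up, and the temporary path stands in for "file.txt".

```r
# Temporary file with hypothetical content: ";" separator, "," decimal mark
path <- tempfile(fileext = ".txt")
writeLines(c("id;poids;taille",
             "id1;85,5;170",
             "id2;78,0;176"), path)

df <- read.table(file = path, sep = ";", dec = ",", header = TRUE)
str(df)

# read.csv2() has sep = ";" and dec = "," as defaults
df_bis <- read.csv2(file = path)

unlink(path)   # remove the temporary file
```

With dec = "," the column poids is read as numeric; with the default dec = "." it would be read as text.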
Data simulation
Simulations
Simple random sampling: s <- sample(N, size = 7, replace = FALSE) (7 distinct values drawn from 1, ..., N)
Random generation (n = number of values to draw)
Normal (Gauss): v <- rnorm(n, mean = 0, sd = 1)
Poisson: v <- rpois(n, lambda)
Binomial: v <- rbinom(n, size, prob)
...
Corresponding density / probability functions: dnorm, dpois, dbinom, ...
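A minimal simulation sketch. The seed, the sample sizes and the parameter values (N = 100, lambda = 3, and so on) are arbitrary assumptions chosen for illustration.

```r
set.seed(42)    # fixes the random number generator so results are reproducible

N <- 100
s <- sample(N, size = 7, replace = FALSE)   # 7 distinct integers from 1:N
print(s)

g <- rnorm(1000, mean = 0, sd = 1)          # 1000 normal draws
p <- rpois(1000, lambda = 3)                # 1000 Poisson draws
b <- rbinom(1000, size = 10, prob = 0.5)    # 1000 binomial draws

# Empirical means should be close to 0, 3 and 10 * 0.5 = 5
print(c(mean(g), mean(p), mean(b)))

# d* functions give the density (or probability) at a point
d0 <- dnorm(0)                              # 1 / sqrt(2 * pi)
p5 <- dbinom(5, size = 10, prob = 0.5)      # choose(10, 5) / 2^10
print(c(d0, p5))
```

Calling set.seed() before a simulation makes the "random" results identical from one run to the next, which helps when sharing code.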
Plots
x <- seq(-10, +10, length.out = 10000)
y <- cos(x)
z <- dnorm(x)
plot(x, y)
[Figure: cos(x) plotted against x, for x from -10 to 10]
plot(x, z, main = "Normal distribution",
     cex.main = 3, font.main = 6,
     xlab = "x", ylab = "f(x)", pch = "+",
     cex.axis = 1.5, cex.lab = 1.5, col = "red")
[Figure: "Normal distribution", f(x) plotted against x from -10 to 10 with red "+" symbols]
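Plots can also be written to a file instead of the screen. The sketch below assumes a PDF output in a temporary location (the name out is hypothetical) and draws both curves on the same axes with lines().

```r
x <- seq(-10, 10, length.out = 1000)

out <- tempfile(fileext = ".pdf")   # hypothetical output path
pdf(out)                            # open a PDF graphics device

# type = "l" draws a line instead of one symbol per point
plot(x, cos(x), type = "l", xlab = "x", ylab = "value",
     main = "cos(x) and normal density")
lines(x, dnorm(x), col = "red")     # add a second curve to the same plot
legend("topright", legend = c("cos(x)", "dnorm(x)"),
       col = c("black", "red"), lty = 1)

dev.off()                           # close the device and write the file
print(file.exists(out))
```

Nothing is written to disk until dev.off() is called, so forgetting it is a common cause of empty or unreadable files.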
Packages
Installation: install.packages("<the package's name>", repos = 'http://cran.us.r-project.org')
Loading: library(<the package's name>)
package name   description
ggplot2        advanced plotting
MASS           statistical tools
matlab         MATLAB-style functions
dplyr          data manipulation
doParallel     parallelisation
caret          machine learning
e1071          SVM
shiny          interactive web interfaces
...            ...
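A common pattern is to install a package only when it is missing. In this sketch pkg is a hypothetical variable; it is set to "stats", which ships with R, so nothing is downloaded here. Replace it with, say, "dplyr" to see an actual installation.

```r
pkg <- "stats"   # ships with R; swap in any CRAN package name

# requireNamespace() returns FALSE, without an error, if the package is absent
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg, repos = "http://cran.us.r-project.org")
}

# character.only = TRUE is needed because the name is held in a variable
library(pkg, character.only = TRUE)

loaded <- paste0("package:", pkg) %in% search()
print(loaded)
```

Installation is done once per machine, whereas library() has to be called in every new R session.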
Useful functions
grep:
grep("mtpl", c("PSG", "OL", "MSCHmtplPro", "GazélecAjac"))
[1] 3
apply:
df <- data.frame(A = c('hello', 'bye', 'thanks'),
                 B = 1:3,
                 C = c(TRUE, FALSE, FALSE))
sapply(df, class)
        A         B         C
 "factor" "integer" "logical"
(In R 4.0.0 and later, A is "character" by default.)
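sapply() applies a function to each column of a data frame, and apply() does the same along the rows or columns of a matrix. A minimal sketch; the matrix m and the result names are illustrative, and stringsAsFactors = FALSE is set so the output does not depend on the R version.

```r
df <- data.frame(A = c("hello", "bye", "thanks"),
                 B = 1:3,
                 C = c(TRUE, FALSE, FALSE),
                 stringsAsFactors = FALSE)

col_classes <- sapply(df, class)    # one result per column, named by column
print(col_classes)

m <- matrix(1:12, nrow = 3)         # hypothetical 3 x 4 matrix
row_sums  <- apply(m, 1, sum)       # margin 1 = rows:    22 26 30
col_means <- apply(m, 2, mean)      # margin 2 = columns: 2 5 8 11
print(row_sums)
print(col_means)

# grepl() is the logical counterpart of grep(): TRUE/FALSE per element
hits <- grepl("mtpl", c("PSG", "OL", "MSCHmtplPro", "GazélecAjac"))
print(hits)
```

grep() returns the positions of the matches, while grepl() returns a logical vector that can be used directly as a filter.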
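cat, paste:
n <- 10
cat(paste("run number", n))
run number 10
system.time (svm() comes from the e1071 package):
learCT <- system.time(model <- svm(target ~ ., data = trainData))
head / tail: first / last elements or rows of an object
which: indices at which a logical condition is TRUE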
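The functions of this slide can be tried without any extra package. In the sketch below, the timed expression is a simple sum standing in for a model fit, and the vector v is an illustrative assumption.

```r
n <- 10
msg <- paste("run number", n)    # paste() joins its arguments with a space
cat(msg, "\n")                   # cat() prints without quotes or index

# system.time() returns user, system and elapsed times in seconds
timing <- system.time({
  total <- sum(sqrt(1:1e6))      # stand-in for a longer computation
})
print(timing["elapsed"])

v <- c(4, 10, 16, 3, 12)         # hypothetical example vector
first_two <- head(v, 2)          # 4 10
last_two  <- tail(v, 2)          # 3 12
positions <- which(v > 9)        # 2 3 5
print(positions)
```

which() returns positions, so v[which(v > 9)] and v[v > 9] select the same elements here.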