Statistical Software R
More data sets …. See http://www.statsci.org
What is R?
A new(?) standard for exchanging statistical ideas.
- The first version was released in the early 1990s.
- Free software from the GNU project, under the GPL (it's free).
- S language + math/stat libraries + graphical tools
- More information: http://www.cran.r-project.org
Time vs Time
[Figure: development time vs. run time for C/FORTRAN, Excel, and R]
Develop for 1 month, run in 1 second.
Or, develop for 1 day, run in 10 min.
Range of applicability vs. convenience
[Figure: C/FORTRAN, R, Excel, and a calculator compared by range of applicability and by convenience]
R, Excel and C
- Excel is general-purpose software.
- R is professional software.
- C is a development tool with a wide range of applicability.
GUI?
• Clicking is slower and harder than typing!
• Clicking is not good for repetitive jobs at a company.
• Clicking makes it easy to generate garbage!
A GUI is a good feature, especially for novices!
R is ~
R = S language + math & stat libraries + graphical tools
Easy and efficient handling of data
Rich, modern statistical routines
Free under the GNU GPL
- R is at the center of statistical development.
- It turns ideas into software, quickly and faithfully.
- R is a tool for saving and exchanging statistical data.
A very good book, but a little difficult for novices.
Easier alternatives
There are many easy books (try searching on Amazon)
and free tutorial guides on the internet.
Official free introductory guide:
http://cran.r-project.org/doc/manuals/R-intro.pdf
Free self-study guide sites:
http://tryr.codeschool.com/
http://www.sr.bham.ac.uk/~ajrs/R/index.html
Download
R ver. 2.10.1, base package, executable binary file:
http://www.cran.r-project.org/bin/windows/base/R-2.10.1-win32.exe
Contributed packages: download from inside R.
Click the install icon to install R easily.
ENIAC programming, 1946
A journey for easy scientific computing
[Figure: language lineage from ENIAC to R, covering FORTRAN, COBOL, Algol60, Lisp, APL, Pascal, C, Smalltalk, Scheme, S, C++, and S-plus, with labels for syntax, semantics, and OO sense]
Features of R
1. Vector arithmetic (APL, S-plus)
2. Object-oriented property (Smalltalk, S-plus)
3. Lazy evaluation (S-plus)
4. (Nested) lexical scoping (Scheme, Pascal)
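Features 3 and 4 can be seen in a few lines. The following is a minimal sketch; the function names lazy_demo and make_counter are made up for illustration and are not part of R or of the lecture.

```r
# Lazy evaluation: an argument is evaluated only when the function body uses it.
lazy_demo <- function(a, b) {
  if (a > 0) "b was never touched" else b
}
# The second argument would raise an error if evaluated, but it never is.
lazy_result <- lazy_demo(1, stop("this error is never raised"))
print(lazy_result)

# Lexical scoping: the inner function finds `count` in the environment
# where it was defined, not where it is called.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # update the enclosing `count`
    count
  }
}
counter <- make_counter()
first_call  <- counter()
second_call <- counter()
print(c(first_call, second_call))
```

Each call to make_counter() creates a fresh enclosing environment, so two counters built this way do not share their count.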
1. Vector Arithmetic
x <- c(10,20,30) + c(5,5,5)
y <- c(10,20,30) + c(1,2,3)
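A short sketch of what these two lines produce, plus the recycling of a single number across a vector. The names x_recycled and scaled are illustrative only.

```r
# Element-by-element addition of two vectors of equal length
x <- c(10, 20, 30) + c(5, 5, 5)
y <- c(10, 20, 30) + c(1, 2, 3)
print(x)
print(y)

# A single number is recycled to the length of the other operand,
# so this gives the same values as x.
x_recycled <- c(10, 20, 30) + 5

# Other arithmetic operators are vectorized in the same way.
scaled <- 2 * y
print(scaled)
```

No explicit loop is needed; the operation is applied to every element at once.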
2. Object-oriented property
Smalltalk (1970, A. Kay, Xerox)
Everything is an object, and every object has a class.
Is everything an object?
Integrated concept: variable, data, function, …
A unified framework for the user to work in.
The class holds the information about the object (like the type of a variable).
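Geosigi (a Korean catch-all word, roughly "do the thing")
Let's geosigi the armor (let's put on the armor, let's take off the armor)
class: armor; method: geosigi; object: each actual suit of armor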
Concept of OO
Clicking the mouse button!
(open a file, run a program, delete a file, …)
Let the function work properly
according to the characteristics of the object!
Make the human's command easier,
and make the computer work harder
to understand the command.
OO in R
- diag(3), diag(c(1,2,3)), diag(diag(3))
- plot(sunspots), plot(Titanic), plot(USJudgeRatings)
- attributes(sunspots), attributes(Titanic), attributes(USJudgeRatings)
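The reason one plot() call behaves differently for these three data sets is that each object carries a class, and the function dispatches on it. The sketch below shows the classes and then a toy generic; the name describe and its methods are hypothetical, written only to illustrate the mechanism.

```r
# The three built-in data sets have different classes.
print(class(sunspots))        # a time series
print(class(Titanic))         # a contingency table
print(class(USJudgeRatings))  # a data frame

# A toy generic function: the same command, different behaviour per class.
describe <- function(obj, ...) UseMethod("describe")
describe.default    <- function(obj, ...) "some object"
describe.ts         <- function(obj, ...) "a time series"
describe.data.frame <- function(obj, ...) "a data frame"

results <- c(describe(sunspots), describe(USJudgeRatings), describe(Titanic))
print(results)
```

Titanic has no describe method of its own in this sketch, so the default method answers for it.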
How to use R
1) Help: by menu, help(plot), ?title
2) demo(); demo(nlm); demo(image)
3) x <- matrix(1:4,2,); ls(); attributes(x)
4) # Install & load package tseries; search()
5) save.image("C:/temp/a.RData"); q()
Memory & HDD
[Figure: a computer (CPU and memory) and a peripheral device (HDD)]
How R works
Frame for computing: Input → Output
Environment: .GlobalEnv, library, …
Namespace & loaded value
> search()
> searchpaths()
[Figure: memory and HDD, with new objects and loaded packages]
> ls() # shows objects in .GlobalEnv; ls("package:stats") lists the objects of an attached package
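A small sketch of how the search path and the workspace relate. The variable names my_object and where_is_sd are illustrative; sd is used only as an example of a function that lives in an attached package.

```r
# The search path: .GlobalEnv first, then the attached packages.
print(search())

# A new object is created in the workspace (.GlobalEnv), which lives in memory.
my_object <- 1:3
print(exists("my_object"))

# Objects supplied by an attached package are listed by naming the package.
print(head(ls("package:stats")))

# find() reports where on the search path a name is defined.
where_is_sd <- find("sd")
print(where_is_sd)
```

When a name is typed at the prompt, R looks through the search path in this order and uses the first match.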
R data sets
R has its own data sets for testing
- data();
- Titanic; ?Titanic
- plot(Titanic)
Data sets of SVV: http://www.aw.com/sharpe
Download the text files and Excel files to your computer
and decompress them.
Copy the text files to "C:\temp\text".
SDV data: see p. 188, #32, Economic Analysis data
You can draw it yourself very simply!
data.svv <- dir("c:/temp/text")    # file names in the folder
dfile.svv <- paste("c:/temp/text/", data.svv, sep="")    # full paths
dsv <- read.table(dfile.svv[37], header=TRUE, sep="\t")    # 37th file, tab-separated
y <- dsv[,3]
x <- dsv[,4]
plot(x, y, pch=16, col="purple", xlab="Sogang Stat")
points(20000, 40, pch=1, cex=10, col="blue")    # large circle around (20000, 40)
title("Economic Analysis")
Install & load packages
[Figure: Install copies a package from an internet server to the HDD; Load brings it from the HDD into memory]
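The two steps can be sketched as follows, using tseries as the example package. This is a cautious version that does not download anything by itself; the variable names are illustrative.

```r
pkg <- "tseries"

# Installed = present on the HDD. This check does not need the internet.
is_installed <- requireNamespace(pkg, quietly = TRUE)

if (is_installed) {
  # Load: bring the package from the HDD into memory and attach it.
  library(pkg, character.only = TRUE)
} else {
  # Install: needs an internet connection, done once per machine.
  message("Not installed yet; run install.packages(\"tseries\") once.")
}

# Attached packages appear on the search path.
is_attached <- paste0("package:", pkg) %in% search()
print(is_attached)
```

Installing is needed only once, while loading with library() is needed in every new R session.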
Stock price data from finance.yahoo.com
library(tseries)    # load the package "tseries"
ghq <- get.hist.quote    # short alias
time <- "1996-01-01"
kospi <- ghq(instrument = "^ks11", start = time, quote = "Close")
dscon <- ghq(instrument = "011160.ks", start = time, quote = "Close")
tm <- ghq(instrument = "tm", start = time, quote = "Close")
plot(tm, xlab="Toyota Motor")
plot(kospi, dscon, type="l", xlab="KOSPI composite index", ylab="Doosan Construction")
Hanoi Tower
With simple programming, a graphical implementation of the
Hanoi tower is possible in R. The code and program
were uploaded to the cyber campus.
- hanoi(4)
- hanoi(14)
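The hanoi() program itself is on the cyber campus and is not reproduced here. As a rough, text-only sketch of the recursion behind it, the hypothetical function hanoi_moves below lists the moves instead of drawing them; the peg names A, B, C are assumptions.

```r
# Move n disks from peg `from` to peg `to`, using `via` as the spare peg.
hanoi_moves <- function(n, from = "A", to = "C", via = "B") {
  if (n == 0) return(character(0))
  c(hanoi_moves(n - 1, from, via, to),              # clear the n-1 smaller disks
    sprintf("disk %d: %s -> %s", n, from, to),      # move the largest disk
    hanoi_moves(n - 1, via, to, from))              # put the smaller disks back on top
}

moves <- hanoi_moves(4)
print(moves)
print(length(moves))   # 2^4 - 1 moves
```

The number of moves is 2^n - 1, so 14 disks already need 16383 moves.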
Business Statistics, Sogang Business School
# This is a comment line.
# Download R from cran.r-project.org
# Explain the menu first
q() # stop the R session; do not save the workspace
# .First <- function() cat("Hello everyone ?\n")
# .Last <- function() { cat("Bye, SBS Students !") }
# ls()
# ls(all.names=TRUE)
q()
# Save the workspace
# Now, we know the first and the last of R
# That is, we know everything of R
q
help
help(q)
data()
help(data)
sunspots
help(sunspots)
hist(sunspots)
help(hist)
args(hist) # arguments of the function
hist()
hist(sunspots, nclass=10) # with more intervals
par(mfrow=c(1,2)) # set graphic layout
hist(sunspots) # in different layout
hist(sunspots, nclass=20) # two in a picture
hist(sunspots, nclass=20, plot=FALSE) # without plot
?co2
# co2 and sunspots from Jan 1959 to Dec 1983
co2x<- co2[1:(12*(83-58))]
sunpt<-sunspots[-(1:(12*(1958-1748)))]
par(mfrow=c(2,1))
plot(co2x)
plot(sunpt)
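# A hedged alternative to the index arithmetic above: window() selects a date
# range from a time series directly and keeps the time-series attributes,
# so the plots get a proper time axis. The names co2w and sunw are illustrative.

```r
# Select Jan 1959 to Dec 1983 from both monthly series
co2w <- window(co2, start = c(1959, 1), end = c(1983, 12))
sunw <- window(sunspots, start = c(1959, 1), end = c(1983, 12))

# Both have 25 years x 12 months of observations
print(length(co2w))
print(length(sunw))

par(mfrow = c(2, 1))
plot(co2w)   # x axis now shows calendar years
plot(sunw)
par(mfrow = c(1, 1))
```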
x <- rnorm(100,0,1) # random number generator
y <- rnorm(100,0,1) # each has 100 elements
x # show x
y # show y
xy <- x + y
( z <- rnorm(100,0,1) ) # assign and show
ls() # show objects in …
# tuning for graphic layout
help(par)
# Text and Symbols: cex, pch, type, xlab, ylab, ....
# The Plot Area: bty, pty, xlim, ylim, ....
# Figure and Page Areas: mfrow, ....
# Miscellaneous: lty, ....
plot(x,y)
plot(xy, y)
# set the graphic parameters
par(mfrow=c(2,2), pty="s")
plot(x, y, pch=0, cex=0.7) # pch and cex
plot(xy, y, pch=16, cex=0.7)
plot(x, y, pch=0, cex=1.2)
plot(xy, y, pch=16, cex=1.2)
par(mfrow=c(1,1)) # mfrow
plot(xy, y, pch=16, cex=1.2)
plot(xy, y, type="n") # prepare axis only
points(xy, y, pch=16, cex=1.2)
lines(xy, y)
# plot only points, but not axes
plot(xy, y, axes=FALSE, xlab="x+y", ylab="y")
cbind(x, y, xy) # column binding
y[y>0]
xy[y>0]
cbind(x, y, xy)[y>0, ] # rows where y > 0
plot(xy, y, type="n", xlab="x+y", ylab="y") # axis only
points(xy[y>0], y[y>0], pch=16, cex=0.6) # for y > 0
points(xy[y<=0], y[y<=0], pch=1, cex=0.8) # for y <= 0
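# A small sketch of logical indexing on a vector and on a matrix.
# Indexing a matrix with one logical vector flattens the result,
# while adding a comma selects whole rows. set.seed() and the names
# pos, m, rows_pos and flat are additions for illustration.

```r
set.seed(1)                 # make the random numbers reproducible
x  <- rnorm(100, 0, 1)
y  <- rnorm(100, 0, 1)
xy <- x + y

pos <- y > 0                # logical vector: TRUE where y is positive
m   <- cbind(x, y, xy)

rows_pos <- m[pos, , drop = FALSE]  # matrix with the rows where y > 0
flat     <- m[pos]                  # plain vector: pos is recycled over all columns

print(dim(rows_pos))
print(length(flat))
```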
# pch
plot(c(-1,8), c(-1,8), type="n")
for(i in 0:7) for(j in 0:7) points(i, j, pch=i+8*j, cex=1.2)
points(-0.5, -0.5, pch="9", cex=1.2)
points(7.5, 7.5, pch="한", cex=1.2) # a Hangul character as the symbol
identify(xy, y, x)
# to pick the points, use the (left) mouse button
identify(xy, y, round(x,2), cex=0.6)
# to stop, use the (right) mouse button
pts<-locator(5)
polygon(pts)
help(polygon)
par() # all graphic parameters
par()$usr # usr
uc <- par()$usr # to simplify
lines( c(uc[1], uc[2]), c(0,0), lty=2) # center line
lines( c(0,0), c(uc[3], uc[4]), lty=2) # lty
# diagonal line
lines( c(uc[1], uc[2]), c(uc[3], uc[4]) , lty=1)
text( 1.0, -1.2, " positive y-values ! ")
title(" (x+y) and y from N(0,1) ", cex=0.6 )
help(USJudgeRatings)
USJudgeRatings
pairs(USJudgeRatings)
pairs(USJudgeRatings[1:5])
## put histograms on the diagonal
panel.hist <- function(x, ...) {
    usr <- par("usr"); on.exit(par(usr = usr))   # restore the user coordinates on exit
    par(usr = c(usr[1:2], 0, 1.5))               # keep the x range, rescale y to 0..1.5
    h <- hist(x, plot = FALSE)
    breaks <- h$breaks; nB <- length(breaks)
    y <- h$counts; y <- y/max(y)                 # bar heights scaled to at most 1
    rect(breaks[-nB], 0, breaks[-1], y, col="cyan", ...)
}
pairs(USJudgeRatings[1:5],
panel=panel.smooth,
cex = 1.5, pch = 24, bg="light blue",
diag.panel=panel.hist, cex.labels = 2,
font.labels=2)
# You can fix and modify the picture in PowerPoint
# Class Assignment.
# draw the picture of (2x+y, 2y)
# for different pch parameters
# in a plot and put a legend.
# Important functions to understand R
# ls(); search(); searchpaths()
# attributes()
# c(); data.frame(); factor(); ordered()
# apply()
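# A brief sketch of the functions listed above on made-up toy data.
# The values and the names scores, grade, level, df and m are
# illustrative only and do not come from the lecture.

```r
scores <- c(70, 85, 90, 60)                       # c(): combine values into a vector
grade  <- factor(c("B", "A", "A", "C"))           # factor(): unordered categories
level  <- ordered(c("low", "high", "high", "mid"),
                  levels = c("low", "mid", "high"))  # ordered(): categories with an order

df <- data.frame(scores, grade, level)            # data.frame(): columns of mixed types
print(attributes(df))                             # names, class, row.names

print(levels(grade))                              # levels are sorted alphabetically
print(level[1] < level[2])                        # comparisons work for ordered factors

m <- matrix(1:6, nrow = 2)                        # filled column by column
row_sums <- apply(m, 1, sum)                      # apply(): 1 = over rows
col_max  <- apply(m, 2, max)                      # apply(): 2 = over columns
print(row_sums)
print(col_max)
```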
Thank you !!