Combining the Power of R and Excel: RExcel

A LISA Short Course
February 2012

Matthew Lanham
Ph.D. Student, Business Information Technology
M.S. Student, Statistics

As you come in, please get materials here:

https://filebox.vt.edu/users/lanham/LISA/


Motivation for this course: Two Facts

1. Excel is the most prevalent software used for data storage and analysis. Excel has many built-in statistical functions, in addition to the “Analysis ToolPak.”

2. R is a free, open-source program, and one of the most powerful and fastest-growing statistics programs.

Why not use them both together?

Outcome from this course: I hope to give you some examples that you can incorporate into your own work.

[Images: “This is you with Excel” and “This is you with Excel + R”]


Let's get started:

1) Double-click the “RExcel2010 with Rcommander” icon.

This will open Excel and R Commander. R Commander is similar to the standard R GUI, but looks a bit different. You will find R in the Excel Ribbon as well.


Transferring data between R and Excel
• Data from Excel to R
• Data from R to Excel

RExcel drop-down menu
• Close R – Closes the open instance of R, and R Commander as well
• Run Code – Runs R code
• Get R Value (Array or Dataframe) – Gets data from R into Excel
• Put R Value (Array or Dataframe) – Defines a cell or range for R
• Get R Output – Retrieves code output from R to Excel
• Set R working dir – Defines the folder on your PC that you want to work from
• Load R file – Loads a data set or .R file
• Copy code – Copies code in Excel
• Debug R – If checked, opens a debugger when an error occurs
• Error log – Shows all the R errors
• Options – Offers a few basic options
• Set R server – Lets you select the server type, server name (for remote servers), and R process name (for servers from a server pool)
• RExcel Help – Takes you here: file:///C:/Program%20Files%20%28x86%29/RExcel/doc/RExcel.html
• Rhelp – Takes you here: http://127.0.0.1:18357/doc/html/index.html
• Rcommander – Opens R Commander with menus in the Excel Ribbon or in R Commander
• Demo worksheets – There are five demos for learning how to use the software
• Mark calc cells – If activated, marks all cells containing calculated results with a special marker in the upper-left corner
• About RExcel

Part 1


Functions, Arrays, and Dataframes
• Advantages
– Use Excel as a container for dependencies
– Use R code functions without lengthy “IF” statements
– Allows automatic recalculations via Excel’s computation engine (R will not do this by itself)

See RExcelExamples workbook, Part1 tab
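To illustrate the point about avoiding lengthy “IF” statements, here is a minimal R sketch. The function name `sales_tier`, the cutoffs (100 and 300), and the tier labels are all hypothetical and are not taken from the course workbook. How the function is then called from a worksheet depends on RExcel's own worksheet functions, which this sketch does not cover.

```r
# Hypothetical example: classify sales figures into tiers.
# In Excel this would need nested IFs, e.g.
#   =IF(A1<100,"Low",IF(A1<300,"Medium","High"))
sales_tier <- function(sales) {
  cut(sales,
      breaks = c(-Inf, 100, 300, Inf),       # assumed cutoffs, for illustration only
      labels = c("Low", "Medium", "High"),
      right = FALSE)                          # intervals are [lower, upper)
}

# Works on a whole vector (or an Excel range passed in as a vector) at once.
tiers <- as.character(sales_tier(c(50, 100, 299.9, 450)))
print(tiers)
```

Adding another tier means adding one break and one label, rather than nesting another IF.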

Part 1


Regression: Excel and RExcel

Excel
1. Excel functions
   TREND(Y-range, X-range, X-value for prediction) function
   LINEST(Y-range, X-range, Const, Stats) array function
2. Excel’s Analysis ToolPak
   Data -> Data Analysis -> Regression -> then fill in the dialog box (see example sheet)

R
3. Use R Commander
   "Statistics" -> "Fit models" -> "Linear regression..."
4. Use R code via RExcel
   myfit <- lm(formula = Sales ~ Advertising, data = salesdata)
   summary(myfit)

Benefits of each: use whichever you prefer and whichever best suits your problem
• The Excel functions automatically update
• Analysis ToolPak outputs the statistics in a nice, readable table
• R Commander has nice drop-down menus
• R provides plots that are not easily available via Excel alone
• R is more extensible and allows more advanced modeling
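The R-code route can be sketched as follows. The real `salesdata` lives in the course workbook, so the data frame below is a simulated stand-in (an assumption, with made-up coefficients and noise). The comments relate each R call to the Excel function it roughly corresponds to.

```r
# Simulated stand-in for the course's `salesdata` (NOT the real values).
set.seed(1)
salesdata <- data.frame(Advertising = seq(40, 110, length.out = 18))
salesdata$Sales <- -100 + 6 * salesdata$Advertising + rnorm(18, sd = 30)

# Fit the simple linear regression, as on the slide.
myfit <- lm(Sales ~ Advertising, data = salesdata)
print(summary(myfit))

# Rough counterpart of Excel's LINEST: the intercept and slope.
print(coef(myfit))

# Rough counterpart of Excel's TREND: predicted Sales at a new Advertising value.
newx <- data.frame(Advertising = 80)
print(predict(myfit, newdata = newx))

# 95% confidence intervals for the intercept and slope.
print(confint(myfit, level = 0.95))
```

Unlike the Excel functions, the R results do not update on their own when the data change; the code has to be rerun (or triggered through RExcel).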

Part 2


Regression: Assumption Review

[Figure: scatter plot “Sales vs. Advertising”. x-axis: Advertising (in $1000s), 35 to 115; y-axis: Sales (in $1000s), 0.0 to 600.0]

Gauss-Markov Theorem

Tells us that our OLS estimators (our intercept and slope) are unbiased and have minimum variance among all linear unbiased estimators IF two assumptions hold:
(1) Independence => $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$
(2) Equal variance (a.k.a. homoscedasticity, same finite variance) => $\mathrm{Var}(\varepsilon_i) = \sigma^2$ for all $i$

To make inferences, run statistical tests, and create confidence intervals, we need to assume a third condition:
(3) Error is normally distributed => $\varepsilon_i \sim N(0, \sigma^2)$

Part 2


Regression: Assumption investigation (a.k.a. Diagnostics)

The linear relationship between Sales and Advertising looks fine.

What do you think about our independence assumption?

What do you think about the constant finite variance assumption?

What about normality?

Anything else stand out?
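One way to look into these questions in R is sketched below. The data are simulated (an assumption; the real `salesdata` is in the workbook), and the particular checks chosen here are common defaults rather than necessarily the ones used in the course.

```r
# Simulated stand-in for the course's `salesdata` (NOT the real values).
set.seed(1)
salesdata <- data.frame(Advertising = seq(40, 110, length.out = 18))
salesdata$Sales <- -100 + 6 * salesdata$Advertising + rnorm(18, sd = 30)
myfit <- lm(Sales ~ Advertising, data = salesdata)

res <- residuals(myfit)

# Constant variance: residuals vs. fitted values should show no funnel shape.
plot(fitted(myfit), res, xlab = "Fitted Sales", ylab = "Residuals")
abline(h = 0, lty = 2)

# Independence: residuals in observation order should show no trend or cycle.
# (Only meaningful if the rows have a natural order, e.g. time.)
plot(res, type = "b", xlab = "Observation order", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normality: points should follow the reference line in the Q-Q plot.
qqnorm(res)
qqline(res)

# Shapiro-Wilk test as a formal complement to the Q-Q plot.
sw <- shapiro.test(res)
print(sw$p.value)
```

With only 18 observations, the plots are usually more informative than the formal test.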

Part 3


Regression: Fit without influential points

Here we see our new fitted line, in addition to how well our model performed at estimating sales.
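The slides do not say how the influential points were identified. A minimal sketch of one common approach, using Cook's distance with a 4/n rule-of-thumb cutoff (both are assumptions, not the course's stated method), on simulated data with one planted unusual point:

```r
# Simulated stand-in for the course's `salesdata` (NOT the real values).
set.seed(1)
salesdata <- data.frame(Advertising = seq(40, 110, length.out = 18))
salesdata$Sales <- -100 + 6 * salesdata$Advertising + rnorm(18, sd = 30)
salesdata$Sales[18] <- salesdata$Sales[18] - 250   # plant one unusual point

fit_all <- lm(Sales ~ Advertising, data = salesdata)

# Cook's distance measures how much each observation moves the fitted line.
cd <- cooks.distance(fit_all)
influential <- which(cd > 4 / nrow(salesdata))     # rule-of-thumb cutoff

# Refit without the flagged rows and compare the coefficients.
keep <- setdiff(seq_len(nrow(salesdata)), influential)
fit_trim <- lm(Sales ~ Advertising, data = salesdata[keep, ])
print(rbind(all = coef(fit_all), trimmed = coef(fit_trim)))
```

Dropping points changes the question being answered, so it is worth reporting both fits and explaining why the points were removed.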

Part 3


Regression: More diagnostics

The linear relationship between Sales and Advertising is fine.

What do you think about our independence assumption?

What do you think about the constant finite variance assumption?

What about normality?

Anything else stand out?

Part 3


Regression: Interpretation

Regression Statistics: What is this calculation?

| Statistic | Value | What is this calculation? |
|---|---|---|
| Multiple R | 0.956 | The Pearson correlation of x and y for simple regression, or the square root of "R Square" |
| R Square | 0.914 | A.k.a. Multiple R-squared; the fraction of total variation explained by the model |
| Adjusted R Square | 0.909 | Similar to R Square, but adjusts for the number of covariates in the model |
| Standard Error | 32.707 | The standard error of our residuals |
| Observations | 18 | The total number of observations we used |

Regression Statistics: What does it mean?

| Statistic | Value | What does it mean? |
|---|---|---|
| Multiple R | 0.956 | There is a strong positive linear relationship between advertising and sales |
| R Square | 0.914 | 91.4% of the variation in sales is explained by the variation in advertising |
| Adjusted R Square | 0.909 | 90.9% of the variation in sales is explained by the variation in advertising, after adjusting for the number of covariates used |
| Standard Error | 32.707 | Our measure of spread or variability for the residuals in the model |
| Observations | 18 | The total number of observations we used |

ANOVA: a table that summarizes the sources of variation.

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 181861.7 | 181861.7 | 170.0 | 0.000 |
| Residual | 16 | 17116.4 | 1069.8 | | |
| Total | 17 | 198978.0 | | | |

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | k = # covariates | SSR = variation in the mean response | MSR = SSR/k | MSR/MSE | p-value of F-test |
| Residual | n - 1 - k | SSE = variation in residuals | MSE = SSE/(n - 1 - k) | | |
| Total | n - 1 | SST = total variation in the response | | | |

The F statistic and Significance F tell us that our slope is statistically significant; that is, it is highly unlikely that the true slope is 0.

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | -109.64 | 37.46 | -2.93 | 0.010 | -189.06 | -30.23 |
| Advertising | 6.26 | 0.48 | 13.04 | 0.000 | 5.24 | 7.28 |

| | Coefficients | Standard Error | t Stat | P-value | 95% CI for parameter |
|---|---|---|---|---|---|
| Intercept | y-int | seY = s.e. of y-intercept | y-int/seY | significance for y-int | Lower limit, Upper limit |
| Advertising | slope | seX = s.e. of slope | slope/seX | significance for slope | Lower limit, Upper limit |
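As a check on how the pieces fit together, the headline statistics can be recomputed from the ANOVA and coefficient values reported above, with n = 18 and k = 1. Small differences in the last digit come from rounding in the reported values; the critical value t_{0.975,16} ≈ 2.120 is taken from standard t tables and is not shown on the slide.

```latex
\begin{align*}
R^2 &= \frac{SSR}{SST} = \frac{181861.7}{198978.0} \approx 0.914 \\
R^2_{\text{adj}} &= 1 - (1 - R^2)\,\frac{n-1}{n-1-k}
                  = 1 - (0.086)\,\frac{17}{16} \approx 0.909 \\
\text{Standard Error} &= \sqrt{MSE} = \sqrt{1069.8} \approx 32.7 \\
F &= \frac{MSR}{MSE} = \frac{181861.7}{1069.8} \approx 170.0 \\
t_{\text{slope}} &= \frac{6.26}{0.48} \approx 13.04,
  \qquad t_{\text{slope}}^2 \approx F \quad \text{(simple regression)} \\
\text{95\% CI for slope} &= 6.26 \pm t_{0.975,\,16} \times 0.48
  = 6.26 \pm 2.120 \times 0.48 \approx (5.24,\ 7.28)
\end{align*}
```

Because there is only one covariate, the F-test and the t-test on the slope are the same test, which is why both give a p-value of 0.000.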

Part 3


Using R Commander

• Obtain data sets from R libraries or load in your own
• Nice drop-down menus for basic statistics and plots (code prints to the R Commander window)
• Common distributions are available via drop-down menus

See RExcelExamples workbook, Part4 tab

Part 4


Using built-in R Commander plug-ins

Let's look at RcmdrPlugin.HH

[Figures: XY conditioning plot (HH); side-by-side boxplot]

These plots are useful, but somewhat dull. The code that generates them shows up in the R Commander window (very useful for newbies). As with plotting in Excel, the defaults give you what you need, but you’ll probably have to modify the graph a bit.
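For comparison, plots of the same two kinds can be drawn with base R graphics. This sketch does not use the HH plug-in's own functions, and the built-in `ToothGrowth` data set is an arbitrary choice for illustration, not a data set from the course.

```r
# Side-by-side boxplot: one box per group, using base graphics.
bp <- boxplot(len ~ supp, data = ToothGrowth,
              xlab = "Supplement", ylab = "Tooth length",
              main = "Side-by-side boxplot")

# Conditioning plot: len vs. dose, drawn separately for each supplement.
coplot(len ~ dose | supp, data = ToothGrowth,
       xlab = "Dose", ylab = "Tooth length")
```

Arguments such as `main`, `xlab`, and `ylab` are where the kind of tidying mentioned above usually starts.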

Part 5


Additional References
• http://rcom.univie.ac.at/RExcelDemo/

