SAS Statistics

Date post: 24-Feb-2016
Upload: ernst
Page 1: SAS Statistics

SAS Statistics

Technology Short Courses: Spring 2010 Kentaka Aruga


Object of the course

• Performing simple descriptive statistics (proc means, proc freq, and proc corr)

• Performing basic test statistics (Chi-square test, T-test, F-test)

• Basic commands for regression analysis and how to export the results into a table (proc reg)


Section 1 Preparation

Getting data and importing data


Getting data

• Download the SAS commands that will be used in this practice from

http://www.uri.edu/its/research/sasstat.txt

• Download the data files that will be used in this course from

http://www.uri.edu/its/research/auto.xls
http://www.uri.edu/its/research/vote.txt

• Save the files to the ‘C:/’ drive of your Windows computer.


Importing Excel file to SAS

• Open SAS and copy and paste the following commands from the file you have just downloaded, “sasstat.txt”:

libname car 'c:/';
proc import out=car.auto
    datafile="c:/auto.xls"
    dbms=excel2000 replace;
    sheet="auto";
    getnames=yes;
run;


Then highlight the command lines and execute (submit) them.


Proc import

• Look at the ‘trunk’ column.

• Do you see an empty column?

• SAS determines the data type of each column from the most common data type in the first 8 rows. The ‘trunk’ column has mixed data (since the first eight rows are all zero, the remaining rows also become zero).


Proc import

• Add the following statement:

mixed=yes;

• Now the commands should look like this:

proc import out=car.auto
    datafile="c:/auto.xls"
    dbms=excel2000 replace;
    sheet="auto";
    getnames=yes;
    mixed=yes;   /* ADDED */
run;

• Execute these commands.


Importing Excel file from the main menu bar

• From the main menu click “File,” and then click “Import Data.”


Importing Excel file from the main menu bar

• Under the “Import Wizard” specify the data source (in this example select MS Excel) and click next.

• Under the “Connect to MS Excel” wizard, browse to the Excel file you are importing.


Importing Excel file from the main menu bar

• Under the “Select Table” wizard, select the name of the sheet in your Excel file and click Next.

• Under the “Select library and member” wizard, specify the library into which you want to import the Excel file.

• Enter a name in the “Member” box; this becomes the name of the SAS data set created from the Excel file.


Saving the syntax for importing an Excel file

• You can save the syntax for the import we just performed through the main menu bar.

• Browse to a location and name the file in the “Create SAS Statements” step of the wizard.

• Open the “.sas” file you just saved to see the commands.


Section 2

Performing simple descriptive statistics (proc means, proc freq, and proc corr)


How to perform simple descriptive statistics (Review from SAS basics course)

• How would you see the number of observations, mean, standard deviation, minimum, and maximum of all numeric variables in SAS?

Ans.
proc means data=car.auto;
run;

• How do you analyze the frequencies of the variables?

Ans.
proc freq data=car.auto;
run;


Proc means

• By default, PROC MEANS reports the number of observations, mean, standard deviation, minimum, and maximum of all numeric variables:

proc means data=car.auto;
run;

• Specifying certain variables: var variable names;

Q. How would you run the MEANS procedure for the variables “price,” “mpg,” and “weight”?

• Creating an output table: output out=file name;

Q. How would you save the output of the MEANS procedure for the variables “price,” “mpg,” and “weight” to a data set?


Proc means (Answers)

proc means data=car.auto;
    var price mpg weight;
    output out=car.means;
run;
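To make the five default statistics concrete, here is a minimal Python sketch (not SAS code) that computes them for a single variable. It assumes missing values are represented as None and uses the n − 1 divisor for the standard deviation, which is SAS's default (VARDEF=DF). The function name and data values are hypothetical.

```python
import math

def means_summary(values):
    """Compute the five default PROC MEANS statistics for one variable.

    Missing values (None) are skipped, mirroring how SAS ignores missing values.
    """
    xs = [v for v in values if v is not None]
    n = len(xs)
    mean = sum(xs) / n
    # Sample standard deviation: divide by n - 1 (SAS default VARDEF=DF)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return {"N": n, "Mean": mean, "Std Dev": std,
            "Minimum": min(xs), "Maximum": max(xs)}

# Hypothetical mpg values (not from auto.xls); None stands for a missing value
print(means_summary([22, 17, None, 20, 15, 18]))
```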


Proc freq

• By default, this procedure creates frequency tables for all variables:

proc freq data=car.auto;
run;

• Specifying certain variables: tables variable names;

Q. How would you run the FREQ procedure for the variable “foreign”?

• Creating an output table: add the option /out=file name to the TABLES statement

Q. How would you save the output of the FREQ procedure for the variable “foreign” to a data set?


Proc freq (Answers)

proc freq data=car.auto;
    tables foreign / out=car.frn;
run;
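A one-way PROC FREQ table lists Frequency, Percent, Cumulative Frequency, and Cumulative Percent for each level (the OUT= data set keeps the COUNT and PERCENT columns). The Python sketch below (not SAS code) reproduces those four columns; it assumes missing values are None and are excluded from the table, and the function name and data are hypothetical.

```python
from collections import Counter

def freq_table(values):
    """One-way frequency table: (level, Frequency, Percent,
    Cumulative Frequency, Cumulative Percent) for each level."""
    counts = Counter(v for v in values if v is not None)  # drop missing values
    total = sum(counts.values())
    rows, cum = [], 0
    for level in sorted(counts):  # levels listed in sorted order
        cum += counts[level]
        rows.append((level, counts[level], 100 * counts[level] / total,
                     cum, 100 * cum / total))
    return rows

# Hypothetical values of 'foreign' (0 = domestic, 1 = foreign); None is missing
for row in freq_table([0, 1, 0, 0, 1, 0, None]):
    print(row)
```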


Proc freq: Creating a two-way table

• How would you create a two-way table using the FREQ procedure for the variables “rep78” and “foreign”?

Ans.
proc freq data=car.auto;
    tables rep78*foreign;
run;


Proc freq: two-way table

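In each cell of the two-way table:

Percent (total %) = cell frequency / table total (= 8/13 in the example shown)

Row Pct (row %) = cell frequency / row total (= 8/9)

Col Pct (column %) = cell frequency / column total (= 8/10)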
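The three percentages in each cell of a two-way PROC FREQ table (Percent, Row Pct, Col Pct) differ only in their denominator. This Python sketch (not SAS code) computes them for one cell; the 2 × 2 counts are hypothetical, chosen only so that the top-left cell reproduces the 8/13, 8/9, and 8/10 ratios quoted above.

```python
def cell_percentages(table, i, j):
    """Return (Percent, Row Pct, Col Pct) for cell (i, j) of a two-way
    frequency table given as a list of rows of counts."""
    cell = table[i][j]
    total = sum(sum(row) for row in table)
    row_total = sum(table[i])
    col_total = sum(row[j] for row in table)
    return (100 * cell / total, 100 * cell / row_total, 100 * cell / col_total)

# Hypothetical counts: table total 13, first row total 9, first column total 10
counts = [[8, 1],
          [2, 2]]
print(cell_percentages(counts, 0, 0))
```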


Proc corr

• The CORR procedure generates ‘Simple Statistics’ based on nonmissing values, and the ‘Pearson Correlation Coefficient’, an index that quantifies the strength and direction of the linear relationship between a pair of variables.

• An insignificant (large) p-value means there is no evidence of a linear relationship between the two variables.
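As an illustration of the coefficient and its test, here is a minimal Python sketch (not SAS code) of Pearson's r together with the t statistic on n − 2 degrees of freedom that underlies the p-value for H0: rho = 0. Converting t to a p-value needs a t distribution, which the Python standard library lacks, so that step is omitted. The function name and data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson r and its test statistic t = r * sqrt((n - 2) / (1 - r^2))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    # A perfect correlation (|r| = 1) gives an infinite t with the sign of r
    t = r * math.sqrt((n - 2) / (1 - r ** 2)) if abs(r) < 1 else math.copysign(math.inf, r)
    return r, t

# Hypothetical paired measurements
print(pearson([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```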


Proc corr

• Finding correlations between pairs of variables

1) All variables

proc corr data=car.auto;
run;

2) Three specific variables

proc corr data=car.auto;
    var price mpg weight;
run;


The low p-value shows that the negative correlation between weight and mpg is statistically significant, indicating a strong negative linear relationship: the heavier the car, the lower its mpg.


Section 3

Performing basic test statistics (Chi-square test, T-test, F-test)


Chi-square test of independence

• What is the Chi-square test of independence? Ans. It tests whether the row variable and the column variable of a two-way table are independent or related.

• What is the null hypothesis? Ans. The row and column variables are independent: there is no relationship between the row and column frequencies.

• SAS performs this test through an option of “proc freq”: simply add chisq to the TABLES statement.

• To display the expected frequency for each cell, use the option “expected.”


Chi-square test of independence: exercise

There are 34 students in the classroom and there was a vote on whether they wanted to have a turtle in their classroom as a pet. The data file “vote.txt” contains the result of the vote (Yes=y, No=n), and gender of the students (male=m, female=f).

• Q1 Import the file “vote.txt” into SAS and name the variables “answers” and “gender.”

• Q2 Using the option “chisq,” test whether or not the answers to the vote and gender are associated with each other.


Answers

Q1
data vote;
    infile 'c:/vote.txt';
    input answers $ gender $;
run;

Q2
proc freq data=vote;
    tables answers*gender / expected chisq;
run;


Results

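Expected frequency for a cell:

$$\text{Expected freq} = \frac{\text{Row total }(15) \times \text{Column total }(16)}{\text{Table total }(34)} = \frac{240}{34} \approx 7.06$$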
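The whole test can be sketched in a few lines of Python (not SAS code). The sketch computes the expected counts with the row total × column total / table total rule, the Pearson chi-square statistic (the one PROC FREQ labels “Chi-Square”; for 2 × 2 tables it also reports others, such as the continuity-adjusted chi-square), and, because a 2 × 2 table has 1 degree of freedom, an exact p-value via erfc. The counts are hypothetical, chosen only to match the row total 15, column total 16, and table total 34 above; they are not the actual vote.txt data.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2 x 2 table.

    With (2-1)*(2-1) = 1 degree of freedom the statistic is a squared
    standard normal, so p = erfc(sqrt(chi2 / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count = row total * column total / table total
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return expected, chi2, p_value

# Hypothetical counts (NOT the vote.txt data): rows = answers, columns = gender
votes = [[12, 3],
         [4, 15]]
expected, chi2, p = chi_square_2x2(votes)
print(round(expected[0][0], 2), round(chi2, 2), p)
```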


What does the result tell you?

• The null hypothesis that the two variables are independent is rejected even at the 1% significance level (the p-value of the chi-square test is lower than 0.01).

• The two variables “answers” and “gender” are associated with each other (they are dependent).


Proc ttest

• This procedure performs t-tests on means; its classic use is testing the hypothesis of equal means for two normal populations from which independent samples have been obtained.

– Three cases in SAS

• One-sample t-test
– Computes the sample mean of the variable and compares it with a given number.

• Two-sample t-test
– Compares the mean of the first sample minus the mean of the second sample to a given number.

• Paired-observations t-test
– Compares the mean of the differences in the observations to a given number.
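The one-sample and paired cases share one formula, since a paired test is a one-sample test on the within-pair differences. The Python sketch below (not SAS code) shows the t statistic and its degrees of freedom; the h0 argument plays the role of PROC TTEST's H0= option (default 0). Taking differences as first minus second is an assumption of this sketch, p-values are omitted because the standard library has no t distribution, and the function names and data are hypothetical.

```python
import math

def one_sample_t(xs, h0=0.0):
    """t = (sample mean - h0) / (s / sqrt(n)) with n - 1 degrees of freedom."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return (mean - h0) / (s / math.sqrt(n)), n - 1

def paired_t(first, second, h0=0.0):
    """Paired t-test: a one-sample t-test on the differences first - second."""
    diffs = [a - b for a, b in zip(first, second)]
    return one_sample_t(diffs, h0)

# Hypothetical data
print(one_sample_t([12, 14, 16], h0=10))
print(paired_t([5, 7, 9], [3, 4, 5]))
```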


Assumptions of “proc ttest”

• The observations are random samples drawn from normally distributed populations. This can be checked using the UNIVARIATE procedure.
– If the normality assumption is not satisfied, use the NPAR1WAY procedure (nonparametric tests).

• The two populations in a group comparison must be independent.
– If they are not independent, consider whether a paired comparison is more appropriate.

• The default null hypothesis value is zero. To change it, use the option H0=number, e.g., h0=10.

• The default significance level is 5% (alpha=0.05, which gives 95% confidence limits). To change it, use the option alpha=level, e.g., alpha=0.01.

Source: http://www.okstate.edu/sas/v8/saspdf/stat/chap67.pdf


Proc ttest: exercise

• How would you perform a t-test on the mpg variable classified by the foreign variable? Hint: use the “class” and “var” statements.

• What will the null hypothesis be in this case?


Proc ttest (Cont’d)

• The command

proc ttest data=car.auto;
    class foreign;
    var mpg;
run;

– CLASS statement: contains a variable that distinguishes the groups being compared.

– VAR statement: specifies the response variable to be used in calculations.

• The null hypothesis

$$H_0: \mu_{\text{domestic}} - \mu_{\text{foreign}} = 0$$

• The alternative hypothesis

$$H_1: \mu_{\text{domestic}} - \mu_{\text{foreign}} \neq 0$$


• The first table shows the basic statistics.

• The second table is the t-test for equal means. Before using this table, look at the third table to determine whether the assumption of equal variances is reasonable.

• The third table is a test of equal variances. In this example, the null hypothesis of equal variances is not rejected (high p-value).

• Thus you should read the “Equal” variances row of the second table.

The second table suggests that there is no significant difference in mean mpg between domestic and foreign cars.
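The statistics behind the second and third tables can be sketched in Python (not SAS code): the pooled t statistic used when variances are equal, the Satterthwaite t statistic and approximate degrees of freedom used when they are not, and the folded F statistic (larger sample variance divided by smaller) for the equal-variance test. p-values are omitted because the Python standard library has no t or F distribution; the function names and data are hypothetical.

```python
import math

def _n_mean_var(xs):
    """Sample size, mean, and sample variance (n - 1 divisor)."""
    n = len(xs)
    m = sum(xs) / n
    return n, m, sum((x - m) ** 2 for x in xs) / (n - 1)

def two_sample_tests(g1, g2):
    n1, m1, v1 = _n_mean_var(g1)
    n2, m2, v2 = _n_mean_var(g2)
    # Equal variances: pooled variance, df = n1 + n2 - 2
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    # Unequal variances: Satterthwaite t and approximate df
    se1, se2 = v1 / n1, v2 / n2
    t_satt = (m1 - m2) / math.sqrt(se1 + se2)
    df_satt = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    # Folded F for equality of variances: larger variance over smaller
    f_folded = max(v1, v2) / min(v1, v2)
    return {"t_pooled": t_pooled, "df_pooled": n1 + n2 - 2,
            "t_satterthwaite": t_satt, "df_satterthwaite": df_satt,
            "folded_F": f_folded}

# Hypothetical mpg values for two groups
print(two_sample_tests([1, 2, 3], [2, 4, 6]))
```

When the two groups have equal sizes and equal sample variances, the pooled and Satterthwaite results coincide, which is a quick sanity check on the formulas.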


Section 4

Basic commands for regression analysis and how to export the results into a table (proc reg)


Regression analysis

• Regression analysis: finding a reasonable mathematical model of the relationship between a response variable (y) and a set of explanatory variables (x1, x2, …, xp)

• General model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$$
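PROC REG estimates the coefficients of this model by ordinary least squares. As a rough illustration of that step only, the Python sketch below (not SAS code) solves the normal equations (X'X)b = X'y by Gauss-Jordan elimination. The function name and data are hypothetical; a real analysis should rely on PROC REG, which also produces standard errors and tests.

```python
def ols(X, y):
    """Least-squares estimates [b0, b1, ..., bp] for y = b0 + b1*x1 + ... + bp*xp.

    X holds one list of p explanatory values per observation. The normal
    equations (X'X) b = X'y are solved by Gauss-Jordan elimination with
    partial pivoting, which is adequate for a small teaching example.
    """
    rows = [[1.0] + [float(v) for v in xi] for xi in X]  # prepend intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    for col in range(k):
        # Swap in the row with the largest pivot to limit rounding error
        best = max(range(col, k), key=lambda idx: abs(A[idx][col]))
        A[col], A[best] = A[best], A[col]
        # Eliminate this column from every other row
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
print(ols(X, y))
```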


Proc reg

• General command

proc reg data=file name;
    model DV = IV;
run;

DV: dependent variable; IV: independent variable(s)

• This procedure also performs the following tests:

– F-test: tests the null hypothesis that none of the independent variables has any effect.

– T-test: tests, for each IV, the null hypothesis that the independent variable has no effect on the dependent variable.
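For reference, these are the standard OLS test statistics (general results, not taken from the slides), assuming n observations and p independent variables. SSR and SSE are the Model and Error sums of squares in the Analysis of Variance table, and SE(b_i) is the standard error reported in the Parameter Estimates table:

$$F = \frac{\text{SSR}/p}{\text{SSE}/(n-p-1)} \sim F_{p,\;n-p-1}, \qquad t_i = \frac{b_i}{\operatorname{SE}(b_i)} \sim t_{n-p-1}$$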


Proc reg: exercise

• Let ‘price’ be the response variable (dependent variable, DV), and ‘mpg’ and ‘length’ be the explanatory variables (independent variables, IV).

Q1 What will be the commands?

Q2 What null hypotheses will be tested?

Q3 Will the model be significant?


Proc reg: answers

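Q1
proc reg data=car.auto;
    model price = mpg length;
run;

Q2
F-test

$$H_0: \text{price} = \beta_0 \qquad H_1: \text{price} = \beta_0 + \beta_1\,\text{mpg} + \beta_2\,\text{length}$$

T-test (for each i = 1, 2)

$$H_0: \beta_i = 0 \qquad H_1: \beta_i \neq 0$$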


Proc reg

Q3


Proc reg: Confidence and prediction intervals

• You can construct 95% confidence and prediction intervals by adding two options to the MODEL statement: ‘clm’ (confidence limits for the mean response) and ‘cli’ (prediction limits for individual values).

• How would you add these options to the previous model?

proc reg data=car.auto;
    model price = mpg length / clm cli;
run;
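For reference, the standard OLS formulas behind the two options (general results, not taken from the slides) are shown below. They assume x0 = (1, mpg, length) is one row of explanatory values with fitted value ŷ0, s is the root mean squared error, X is the design matrix, and the model has n − p − 1 error degrees of freedom; α = 0.05 gives the 95% limits.

$$\text{clm: } \hat y_0 \pm t_{\alpha/2,\,n-p-1}\, s \sqrt{\mathbf{x}_0^\top (X^\top X)^{-1} \mathbf{x}_0} \qquad \text{cli: } \hat y_0 \pm t_{\alpha/2,\,n-p-1}\, s \sqrt{1 + \mathbf{x}_0^\top (X^\top X)^{-1} \mathbf{x}_0}$$

The extra 1 under the square root is why prediction intervals for individual values are always wider than confidence intervals for the mean.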


Proc reg: creating an output table

• Add the option “outest=file name” to the PROC REG statement:

proc reg data=car.auto outest=car.est1;
    model price = mpg length / clm cli;
run;
quit;

• In order to see the output data set “car.est1,” you need to add the statement “quit;” at the end.

Note that there is no semicolon between “data=car.auto” and “outest=car.est1”: outest= is an option of the PROC REG statement.


• You can drop the columns you do not want to see by using the “keep=” or “drop=” data set option, e.g.

data car.est2 (keep=intercept mpg length);
    set car.est1;
run;

data car.est3 (drop=price _model_ _depvar_ _type_ _RMSE_);
    set car.est1;
run;


Proc reg: creating an output table

• To see other output options, go to “Help,” type in “REG,” open “The REG procedure,” and click “Syntax.”



Exporting the output data to Excel

• General commands

proc export data=name of the SAS data set you are exporting
    outfile="path to the output file on your computer"
    dbms=excel2000 replace;
run;

• How would you export the data set “car.est2” into an Excel file?

Ans.
proc export data=car.est2
    outfile="c:/est.xls"
    dbms=excel2000 replace;
run;


Useful supports: other useful sites

• Online SAS manuals: http://www.uri.edu/sasdoc

This link redirects you to http://support.sas.com/documentation/onlinedoc/sas9doc.html

• Statbookstore, a useful site for finding program examples: http://www.geocities.com/statbookstore/


For further questions: [email protected]

