8/13/2019 MS Introduction to SAS Training
http://slidepdf.com/reader/full/ms-introduction-to-sas-training 1/31
Ground Rules
• Switch off all mobile phones
• No breaks allowed in between the training program
• No discussion among the trainees
• Questions are open for discussion so that everyone can learn, and they must be addressed to the Trainer
• Though the Trainer would try to address all questions/clarifications sought, in some specific cases he may decide to provide the necessary clarifications post the training and continue ahead with the training
• Every trainee must complete the exercises provided at the end of the training program individually. Training would be considered complete only upon satisfactory completion of the exercises
• All exercises submitted would be discussed with the trainee individually / another meeting might be set up to share the learning among the group
Overview of how the training is organized
Expectations from the Training Program
Numerous software packages are used in academia and industry for data management, statistical analysis and optimization
Statistical Analysis
Predictive Modeling
Decision Trees & Segmentation
Forecasting & Simulation
Optimization
Campaign Management
Win Cross
KnowledgeSeeker
CART
Unica
Evolver Risk Optimizer
SAS provides a complete set of solutions for enterprise-wide business users for data management and analysis
Brief History
• Stands for Statistical Analysis System
• Developed in the early 70's at North Carolina State University
• SAS Institute Inc. formed in 1976

SAS Products
• Base SAS – Data Management and Basic Procedures
• SAS/STAT – Statistical Analysis
• SAS/GRAPH – Presentation Quality Graphics
• SAS/OR – Operations Research
• SAS/ETS – Econometrics and Time Series Analysis
• SAS/IML – Interactive Matrix Language
• PROC SQL (part of Base SAS) – Structured Query Language

Applications of SAS
• Data Entry, Retrieval and Management
• Report Writing
• Statistical and Mathematical Analysis
• Business Planning, Forecasting and Decision Support
• Operations Research
• Quality Improvement
• Applications Development
There are five basic windows available in the SAS software, irrespective of whether it's Windows SAS or Unix SAS

• Editor – Write the SAS program
• Log – Check the log after running the SAS code
• Output – Check the output, if applicable, post SAS processing
• Explorer – Navigate and check libraries and datasets
• Results – Stores past results for review

[Screenshot of the SAS windows: Results, Explorer, Output, Log, Editor, SAS Help]
“DATA” and “PROC” steps are the basic and most important data processing methods available in SAS

• DATA step is used to create a SAS dataset,
o either temporary or permanent data
o from raw data or another SAS dataset
• A SAS dataset can be created in multiple ways
o In-stream raw data
    DATA Employee;
        INPUT Name $ Id Dateofjoining :mmddyy10.;
        DATALINES;
    Sundar 1012 09152005
    Indrajit 1000 06012005
    Anindya 1017 12262005
    ;
    RUN;
o Existing SAS dataset
    DATA Employee2;
        SET Employee;
        Empperiod = today() - dateofjoining;
    RUN;
o An external delimited file (details later!!)
o Remote Databases (Oracle, DB2 etc.)

• SAS Procedure (PROC) is used to perform an action on a SAS dataset, for example:
o Sorting a SAS dataset by one or more variables
o Running a frequency distribution on a variable in a SAS dataset
o Ordinary least squares linear regression model (comes with SAS/STAT)
o Creating a final report in a client presentable format
Raw data files in any input format can be read into SAS using the INFILE and INPUT statements in a DATA step

• To create a SAS dataset from a raw data file
o Start a DATA step and name the dataset being created
o Mention the location of the raw data file to be read using an INFILE statement
    INFILE 'filename' <options>;
o Mention the data fields from the raw data file using the INPUT statement
    INPUT varname <$> <var-specifications>;

Raw data in columns
    0031GOLDENBERG DESIREE
    0040WILLIAMS ARLENE M.
    0071PERRY ROBERT A.
    0082MCGWIER-WATTSCHRISTINA

Delimited raw data
    0031,GOLDENBERG,DESIREE
    0040,WILLIAMS,ARLENE M.
    0071,PERRY,ROBERT A.
    0082,MCGWIER-WATTS,CHRISTINA

[Figure: resulting SAS dataset]
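The two raw-data layouts above could be read roughly as follows. This is a minimal sketch, not the trainer's own code: the dataset names, variable names, lengths and column positions (1-4, 5-17, 18-32) are assumptions chosen for illustration, the column rows are padded with blanks to match those positions, and the records are supplied in-stream with DATALINES so the example runs without an external file.

```sas
/* Delimited raw data: DSD with DLM=',' splits each record on commas, */
/* so a value such as "ARLENE M." keeps its embedded blank.           */
DATA names_delim;
    INFILE DATALINES DSD DLM=',' TRUNCOVER;
    LENGTH id $ 4 lastname $ 15 firstname $ 15;  /* assumed lengths */
    INPUT id $ lastname $ firstname $;
    DATALINES;
0031,GOLDENBERG,DESIREE
0040,WILLIAMS,ARLENE M.
0071,PERRY,ROBERT A.
0082,MCGWIER-WATTS,CHRISTINA
;
RUN;

/* Raw data in columns: column input reads fixed positions.           */
/* The positions below are assumed, and the data is padded to match.  */
DATA names_cols;
    INPUT id $ 1-4 lastname $ 5-17 firstname $ 18-32;
    DATALINES;
0031GOLDENBERG   DESIREE
0040WILLIAMS     ARLENE M.
0071PERRY        ROBERT A.
0082MCGWIER-WATTSCHRISTINA
;
RUN;
```

To read from a file instead, INFILE DATALINES would be replaced by INFILE 'filename' with the same options.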
Lengths, Informats and Formats

• Every field in a SAS dataset has these three properties defined at the time of creation
• Length of a field is the number of bytes SAS allocates for storing the values of the field in the SAS dataset. By default the length of numeric and character variables is 8 bytes.
• Besides an explicit LENGTH statement, the length of a variable can be set in two ways
o Using an appropriate informat in the INPUT statement
o By assigning the variable to a constant value, in which case the length is set to the length of the first constant value encountered for the variable

    DATA CUSTOMER;
        LENGTH age 3;
        INPUT name $15. age 4.;
        …
    RUN;
“AGE” is allocated 3 bytes

    DATA NAMES;
        INPUT name $;
        CARDS;
    Tony Hargis
    Dave Eagle
    ;
    RUN;
“NAME” is allocated 8 bytes by default
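A short sketch of how length is fixed, assuming made-up dataset and variable names: a LENGTH statement placed before INPUT widens a character variable beyond the 8-byte default, and a variable created by assignment takes its length from the first constant assigned to it.

```sas
/* Without the LENGTH statement, simple list input would store only */
/* the first 8 characters of "Christopher".                         */
DATA long_names;
    LENGTH name $ 20;      /* declared before INPUT so it takes effect */
    INPUT name $;
    DATALINES;
Christopher
Tony
;
RUN;

/* Length set by assignment: the first constant fixes the length.   */
DATA flags;
    status = "No";         /* length of STATUS becomes 2             */
    status = "Yes";        /* truncated to 2 characters: "Ye"        */
RUN;
```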
Lengths, Informats and Formats…(contd.)

• Informat instructs SAS how to read raw data into a variable. If no informat is specified, then it's BEST12. for numeric, and $w. for character variables where 'w' is either 8 or the length of the first constant value encountered
• You can specify an informat
o for numeric variables by using w.d where 'w' is the total width and 'd' is the number of places after the decimal
o for character variables by using $w. where 'w' is the maximum number of characters for the variable
• Format is a layout specification for how a variable should be printed or displayed. By default it is BEST12. for numeric and $w. for character variables
• The format of a variable can be changed by using the FORMAT statement in the DATA step.
o Overrides the default setting of length for a variable when it is created by assignment to a character constant
o Can be used to display numbers, dates, currency, etc. in a user friendly manner
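As an illustration of the difference between an informat (how a value is read) and a format (how it is displayed), here is a minimal sketch. The dataset, variable names and values are invented for the example; MMDDYY10., DOLLAR12.2 and DATE9. are standard SAS informats/formats.

```sas
DATA payments;
    /* :mmddyy10. is the informat used to READ the date text */
    INPUT customer $ amount paid_on :mmddyy10.;
    /* FORMAT controls how the stored values are DISPLAYED   */
    FORMAT amount DOLLAR12.2 paid_on DATE9.;
    DATALINES;
Asha 1234.5 09/15/2005
Ravi 980 12/26/2005
;
RUN;

/* The stored values are plain numbers; PROC PRINT shows them */
/* with the attached formats, e.g. $1,234.50 and 15SEP2005.   */
PROC PRINT DATA = payments;
RUN;
```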
If… Then… Else conditions can be used effectively to execute the SAS process conditionally

• Create a new variable in the same or a different dataset
    DATA Output_Data;
        SET Input_Data;
        IF Variable_1 > 500 THEN Variable_2 = 1;
        ELSE Variable_2 = 0;
    RUN;

• Create more than one new variable using a DO… END block
    DATA Output_Data;
        SET Input_Data;
        IF Variable_1 > 500 THEN DO;
            Variable_2 = 1;
            Variable_3 = "Sundar";
        END;
    RUN;

• Create a different dataset based on the criterion
    DATA Output_Data;
        SET Input_Data;
        IF Variable_1 > 500 THEN OUTPUT Output_Data;
    RUN;

Q: How will you output to more than one SAS dataset using the IF Statement?

Comparison Operators
    Mnemonic   Operator
    EQ         =
    NE         ~=
    GT         >
    LT         <
    GE         >=
    LE         <=

Logical Operators
    Mnemonic   Operator
    AND        &
    OR         |
    NOT        ~

Multiple conditions can be specified using a combination of Logical and Comparison operators
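One possible way to approach the question above is to name several datasets on the DATA statement and direct each observation with OUTPUT. This is a sketch under assumed names (input_data, high_value, low_value, and the cut-off of 500 borrowed from the slide), not the only answer.

```sas
/* Small illustrative input; names and values are made up. */
DATA input_data;
    INPUT variable_1;
    DATALINES;
250
750
500
900
;
RUN;

/* Both output datasets are listed on the DATA statement;      */
/* OUTPUT <name> writes the current observation to one of them. */
DATA high_value low_value;
    SET input_data;
    IF variable_1 > 500 THEN OUTPUT high_value;
    ELSE OUTPUT low_value;   /* includes 500 and any missing values */
RUN;
```

Here high_value receives 750 and 900, and low_value receives 250 and 500. Note that a missing value compares as smaller than any number, so it would fall into the ELSE branch.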
The MERGE statement in a DATA step is used to combine multiple datasets based on values of specified common variables

• Common variables in the input datasets are used in the BY statement
• Datasets must be sorted by the common variable(s) prior to merging

    DATA Merged_data;
        MERGE Input_Data_1 (in = AA) Input_Data_2 (in = BB);
        /* Use combinations of aa and bb to control what is written to the output dataset. */
        BY <Common Variables>;
    RUN;

Q: How will you perform a Many-to-Many merge in SAS?
Q: What will happen if you don't use the “BY” statement while merging?

[Figures: One-to-One Merge; One-to-Many Merge]
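The IN= flags in the merge syntax above can be used as in the following sketch. The datasets customers and orders and their contents are hypothetical; the subsetting IF keeps only the keys found in both inputs, which is one of several combinations of aa and bb.

```sas
DATA customers;
    INPUT cust_id name $;
    DATALINES;
1 Asha
2 Ravi
3 Meena
;
RUN;

DATA orders;
    INPUT cust_id amount;
    DATALINES;
1 250
1 400
3 120
4 75
;
RUN;

/* Both inputs must be sorted by the BY variable before merging. */
PROC SORT DATA = customers; BY cust_id; RUN;
PROC SORT DATA = orders;    BY cust_id; RUN;

DATA matched;
    MERGE customers (IN = aa) orders (IN = bb);
    BY cust_id;
    IF aa AND bb;   /* keep only cust_id values present in both inputs */
RUN;
```

With this data, matched holds three observations (two orders for customer 1 and one for customer 3). Replacing the condition with IF aa AND NOT bb would instead keep customers that have no orders.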
SAS Procedures
SAS Procedures – An Introduction

• A SAS Procedure or a PROC step always starts with the word PROC
• Some commonly used Base SAS procedures are listed below

Base SAS Procedures
    Report Writing Procedures: PRINT, FREQ, MEANS, SUMMARY, TABULATE, PLOT, SQL
    Statistical Procedures: CHART, FREQ, MEANS, CORR, SQL, SUMMARY, UNIVARIATE
    Utility Procedures: EXPORT, IMPORT, APPEND, CONTENTS, DATASETS, SORT, TRANSPOSE
A typical SAS procedure has a few key words that are a part of the syntax

    PROC <PROCEDURE NAME> DATA = <DSN Name> OPTIONS;
        BY <Variable List>;
        CLASS <Variable List>;
        VAR <Variable List>;
        WHERE <Condition>;
    RUN;

• Most SAS procedures require an input dataset which is specified using the DATA= option
• VAR specifies the variables on which the procedure is applicable. If no variables are specified, then SAS will automatically apply the procedure on all applicable variables.
• WHERE allows usage of a particular filter criterion on the procedure.
• The SAS dataset “Sales” is used to illustrate the SAS procedures going forward
• Only a few of the most frequently used SAS procedures are covered in the training
• Further, not all options available on each SAS procedure are covered in the training
SAS Procedure – PROC CONTENTS – displays the structure of the dataset

    PROC CONTENTS DATA = SALES OUT = VAR_LIST VARNUM;
    RUN;

• DATA =: Name of the input data set
• VARNUM: Option lists all the variables in the same order as present in the data set
• OUT =: Name of the output data set, which will contain the list of the variables with their formats

Q: What is the output if you don't use the option “varnum”?

The output shows
• # of observations in the dataset
• List of variables with their type, format, length and label
• Whether the dataset has been sorted or not
SAS Procedure – PROC PRINT – prints the observations of a SAS dataset in the SAS Output window

    PROC PRINT DATA = SALES (FIRSTOBS = X OBS = Y);
        VAR <Variable List>;
        WHERE <Condition>;
    RUN;

• DATA =: Name of the input data set
• FIRSTOBS = / OBS =: Option to print only sample records from row # “X” to row # “Y” of the SAS dataset
• VAR: Option to print only selected variables from the SAS dataset
• WHERE: Option to print sample observations satisfying the criteria

Q: What will the syntax be if you want to print the last 10 observations in the output?

[Figure: Output of Proc Print Procedure]
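One way to approach the "last 10 observations" question is to look up the number of observations first and feed it to FIRSTOBS=. The sketch below is only one of several possible answers; it uses SASHELP.CLASS (a sample dataset shipped with SAS, assumed to be available) as a stand-in for Sales, and the macro variable name first_row is made up.

```sas
/* Step 1: capture the row number at which the last 10 rows begin.   */
DATA _NULL_;
    IF 0 THEN SET sashelp.class NOBS = n;  /* NOBS= is set at compile time */
    CALL SYMPUTX('first_row', MAX(n - 9, 1));
    STOP;                                   /* no observations are read */
RUN;

/* Step 2: print from that row to the end of the dataset.            */
PROC PRINT DATA = sashelp.class (FIRSTOBS = &first_row);
RUN;
```

The MAX(…, 1) guard keeps FIRSTOBS valid when the dataset has fewer than 10 observations.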
SAS Procedure – PROC DATASETS – is a utility procedure that manages SAS files

• The DATASETS procedure helps to
o Copy or append SAS files from one library to another
o Rename, repair or delete SAS files
o List the SAS files that are contained in a SAS library
o Create or delete indexes

    PROC DATASETS MEMTYPE = DATA LIB = WORK NOLIST;
        APPEND BASE = <DSN Name> DATA = <DSN Name>;
        CHANGE old_name = new_name;
        COPY IN = libref-1 OUT = libref-2;
        SELECT sas_files;
        DELETE sas_files;
    RUN;
    QUIT;

• MEMTYPE =: Specifies the kind of files to process
• LIB =: Specifies the library
• NOLIST: Option suppresses the listing of the library directory
• CHANGE: Specifies the dataset to be renamed
• COPY: Specifies the library to copy SAS datasets
• SELECT: Only specified datasets will be copied
• DELETE: Specifies SAS datasets to be deleted
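A small, self-contained sketch of these statements in use follows. The dataset names (sales_jan, sales_feb, sales_ytd) and their contents are invented, and the example assumes a fresh session in which sales_ytd does not already exist in WORK.

```sas
/* Two tiny datasets with identical structure. */
DATA work.sales_jan; month = "JAN"; amount = 100; RUN;
DATA work.sales_feb; month = "FEB"; amount = 150; RUN;

PROC DATASETS LIB = WORK MEMTYPE = DATA NOLIST;
    APPEND BASE = sales_jan DATA = sales_feb;  /* add Feb rows to Jan  */
    CHANGE sales_jan = sales_ytd;              /* rename the combined  */
    DELETE sales_feb;                          /* drop the appended    */
RUN;
QUIT;   /* PROC DATASETS stays active until QUIT (or another step) */
```

After this runs, WORK contains sales_ytd with two observations and sales_feb is gone.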
SAS Procedure – PROC SORT – orders the SAS dataset observations by the values of one or more variables

• SORT procedure can be used either to modify the original dataset or create a new sorted dataset
• SAS, by default, sorts the datasets in an ascending order, unless specified otherwise
• Variables should be mentioned in the same order as sorting is required
• Using the NODUPKEY option without the OUT = option overwrites your original dataset, so the removed duplicates might not be available for any future analysis

    PROC SORT DATA = SALES OUT = <DSN Name> NODUPKEY DUPOUT = <DSN Name>;
        BY <Variable List>;
        WHERE <Condition>;
    RUN;

• DATA =: Name of the input data set
• OUT =: Name of the output data set that stores the sorted observations
• NODUPKEY: Option to remove duplicates
• DUPOUT =: Option to store only duplicates in a separate SAS dataset
• BY: Option to sort values by one or more variables
• WHERE: Option to apply a filter criterion

Q: What option will be used if you want to remove duplicate records, when duplicates are to be identified using all the variables in the SAS dataset?
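The following sketch shows NODUPKEY used together with OUT= and DUPOUT= so that the original dataset is left untouched. The sales data here is a made-up stand-in for the training dataset, and the output names sales_unique and sales_dups are assumptions.

```sas
DATA sales;
    INPUT region $ rep $ amount;
    DATALINES;
East Asha 100
East Asha 250
West Ravi 300
East Meena 75
;
RUN;

/* OUT= keeps SALES intact; DUPOUT= captures the rows that */
/* NODUPKEY removes for repeated BY-variable combinations. */
PROC SORT DATA = sales OUT = sales_unique NODUPKEY DUPOUT = sales_dups;
    BY region rep;
RUN;
```

With this data, sales_unique holds three observations (one per region/rep pair) and sales_dups holds the second "East Asha" row.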
SAS Procedure – PROC FREQ – produces one-way to n-way frequency and cross-tabulation (contingency) tables

• For two-way tables, PROC FREQ can compute tests and measures of association

    PROC FREQ DATA = SALES;
        WEIGHT <Weight Variable>;
        TABLES <Variable List> / MISSING NOROW NOCOL NOPERCENT ALL;
        WHERE <Condition>;
    RUN;

• WEIGHT: Option to give different weight to the observations
• Options on the TABLES statement:
o Missing: Treats missing values as a separate category
o Norow: Removes row percentages
o Nocol: Removes column percentages
o Nopercent: Removes cell percentage

Q: What option would you use to generate three way tables?
Q: How would you output the results of Freq procedure to a SAS dataset?

[Figures: Sample Proc Freq Procedure Output; Output with Statistical Test]
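As a hedged illustration of the two questions above: variables joined with asterisks on the TABLES statement request an n-way table, and the OUT= option on TABLES writes the counts to a dataset. The sketch uses SASHELP.CARS (a sample dataset shipped with SAS, assumed to be available) in place of Sales; the output dataset name is made up.

```sas
PROC FREQ DATA = sashelp.cars;
    /* Three-way table: one Type-by-DriveTrain table per Origin */
    TABLES origin * type * drivetrain / NOROW NOCOL NOPERCENT;
    /* Two-way table whose counts are also saved to a dataset   */
    TABLES origin * type / OUT = origin_type_counts;
RUN;
```

The dataset created by OUT= contains the table variables plus COUNT and PERCENT.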
SAS Procedure – PROC MEANS – produces summary statistics

• PROC MEANS also
o Computes descriptive statistics based on moments and quantiles
o Calculates confidence intervals for means
o Performs t-tests

    PROC MEANS DATA = SALES;
        CLASS <Variable List>;
        VAR <Variable List>;
        OUTPUT OUT = <DSN Name> <Summary Procedure>;
    RUN;

Example:
    SUM(Variable) = New Variable 1
    MEAN(Variable) = New Variable 2
    MIN(Variable) = New Variable 3
    MAX(Variable) = New Variable 4

[Figure: Sample Proc Means Procedure Output]

Q: Which SAS dataset will contain the results without the use of the “output out =” option?
Q: Which default variables will SAS create in the output SAS dataset?
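A minimal sketch of the OUTPUT statement, using SASHELP.CLASS (a sample dataset shipped with SAS, assumed to be available) instead of Sales. The output dataset and new variable names are invented for the example.

```sas
/* NOPRINT suppresses the printed report; results go to the dataset. */
PROC MEANS DATA = sashelp.class NOPRINT;
    CLASS sex;
    VAR height weight;
    OUTPUT OUT = class_summary
        MEAN(height) = mean_height
        MAX(weight)  = max_weight;
RUN;

PROC PRINT DATA = class_summary;
RUN;
```

Alongside the requested statistics, the output dataset carries the automatic variables _TYPE_ and _FREQ_. Because no NWAY option is used, it also includes an overall row (_TYPE_ = 0) in addition to one row per value of the CLASS variable.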
List of audit checks to keep in mind while working with SAS for data processing and analysis

1. Check the SAS log after every step of data processing
2. Use “Proc Print” after every data processing step
3. Ensure that meaningful names are given to SAS variables and SAS datasets
4. Comment your code – use comments judiciously to indicate the purpose of SAS processing and for future reference as well
5. Indent the code so that it's easy to read
6. Use a “BY” while merging SAS datasets. Check the # of observations pre and post data merge
7. Use “else if” when chaining several “if” statements on mutually exclusive conditions in a Data Step
8. Be careful of Divide by Zero errors during Data Step processing!
9. Run your code on a sample of 10 observations before running it on the entire SAS dataset
10. Have your code audited and verified by someone else to confirm there are no logical issues
11. Check the processing for issues after creation of a new variable
12. Be careful of missing values while processing
13. Pay extra attention while reading external files into SAS. There is a separate list of audit checks to be followed to ensure there are no issues

ADD THE WORD “ALWAYS” IN FRONT OF EACH STATEMENT
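Checks 7, 8 and 12 can be illustrated together in one short DATA step. This is only a sketch: the dataset, variable names, the price bands and the cut-off of 10 are all made up for the example.

```sas
DATA ratios;
    LENGTH band $ 7;                 /* long enough for "Unknown"       */
    INPUT sales units;
    /* Guard against divide-by-zero (and missing units).               */
    IF units > 0 THEN price = sales / units;
    ELSE price = .;
    /* ELSE IF: later conditions are tested only if earlier ones fail. */
    IF price = . THEN band = "Unknown";
    ELSE IF price < 10 THEN band = "Low";
    ELSE band = "High";
    DATALINES;
100 20
90 0
500 10
;
RUN;
```

The three input rows end up in the bands Low, Unknown and High respectively. Testing for the missing price first matters because a missing value compares as smaller than any number and would otherwise be labelled "Low".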
SAS Help

• Other than the built-in SAS help, there are many websites which provide assistance
o Website 1: http://v8doc.sas.com/sashtml/
o Website 2: http://www.ats.ucla.edu/stat/sas/
o Website 3: www.google.com

IF NONE OF THESE HELP, ASK YOUR COLLEAGUE !!