BMTRY 789 Lecture 4: Construction of Flat Files from vertical files, merging, subsetting
Readings – Chapter 14
In-Class Practice - 14.1, 14.4, 14.5, 14.6, 14.11
Summer 2008 BMTRY 789 Introduction to SAS Programming 2
You have to have Goals …
Summer 2008 BMTRY 789 Introduction to SAS Programming 3
Vertical & Flat File Definitions
A vertical file is a file that has multiple observations per identifier (i.e. patient)
A flat file is a file that has one observation per identifier.
Subj Age Date HR BP 001 26 07/23/98 58 12080 [Vertical]001 27 02/14/99 62 11060…Subj Age Date1 HR1 BP1 Date2 HR2 BP2001 26 07/23/98 58 12080 02/14/99 62 11060 [Flat]002 25 02/28/00 65 11070 09/01/01 65 12070…
Summer 2008 BMTRY 789 Introduction to SAS Programming 4
Data Manipulation Why do you spend 85% of your
programming time doing data manipulation? You get data one way, but it needs to be
another way in order to perform the analysis.
Will need to do things like, subset, merge, rename variables, create new variables, etc. in order to structure the data into the format that you need it to be.
Summer 2008 BMTRY 789 Introduction to SAS Programming 5
Conditional Logic [If-Then-Else]
Frequently, you want an assignment statement to apply to some observations but not all - under some conditions, but not others.
1) IF condition THEN action;2) IF condition THEN action;
ELSE IF condition THEN action;ELSE IF condition THEN action;
3) IF condition THEN action;ELSE IF condition THEN action;ELSE action;
4) IF condition THEN DO;Perform a set of actions;
END;
Summer 2008 BMTRY 789 Introduction to SAS Programming 6
IF-THEN-ELSE Rules A single IF-THEN statement can only have one
action. If you add the keywords DO and END, then you can execute more than one action.
You can also specify multiple conditions with the keywords AND and OR
The IN operator also makes comparisons, IN compares the value of a variable to a list of values.
*Remember SAS considers missing values to be smaller than non-missing values.
Summer 2008 BMTRY 789 Introduction to SAS Programming 7
Comparison Operators These operators can be coded using Symbols or
Mnemonics.Symbol Mnemonic Meaning= EQ Equals~= NE Not Equal> GT Greater Than< LT Less Than>= GE Greater than or Equal<= LE Less than or Equal& AND All comparisons must be true| OR Only one comparison must be true
Summer 2008 BMTRY 789 Introduction to SAS Programming 8
Subsetting Often programmers find that they want
to use some of the observations in a data set and exclude the rest. The most common way to do this is with a subsetting IF statement in a DATA step.
Syntax: IF expression; Ex: IF Sex = ‘f’;
Summer 2008 BMTRY 789 Introduction to SAS Programming 9
Computing a Difference Between first and last observationsProc Sort Data= Annie.PATIENTS;
By id Date;Run;Data First_Last;
Set Annie.PATIENTS;By Id;Format Date mmddyy10.;Retain First_HR First_SBP First_DBP;If First.id and Last.id then delete;***Purpose of code: if it is the first visit assign values to the first visit variables, same for last visit;If First.ID Then Do;
First_HR = HR;First_SBP = SBP;First_DBP = DBP;
End;If Last.ID Then Do;
Last_HR = HR;Last_SBP = SBP;Last_DBP = DBP;Diff_HR = Last_HR - First_HR;Diff_SBP = Last_SBP - First_SBP;Diff_DBP = Last_DBP - First_DBP;
Output;End;
Run;
Summer 2008 BMTRY 789 Introduction to SAS Programming 10
Subsetting (cont.) If the expression is true, then SAS continues with
the DATA step. If the expression is false, then no further statements are processed for that observation; that observation is not added to the data set being created; and SAS moves to the next observation.
While the subsetting IF statement tells SAS which observations to include, the DELETE statement tells SAS which observations to exclude:IF expression THEN DELETE;IF Sex = ‘f’ THEN DELETE;
Summer 2008 BMTRY 789 Introduction to SAS Programming 11
Stacking Data Sets Used to combine data sets with all or most of
the same variables but different observations. The order of observations is determined by the
order of the list of old data sets If one of the data sets has a variable not
contained in the other data sets, then the observations from the other data sets will have missing values for that variable.DATA new-data-set;
SET old-data-set1 old-data-set2;RUN;
Summer 2008 BMTRY 789 Introduction to SAS Programming 12
Interleaving Data Sets If your data sets are already sorted, combine
data sets while preserving that order. PROC SORT first!
If one of the data sets has a variable not contained in the other data sets, then the observations from the other data sets will have missing values for that variable.DATA new-data-set;
SET old-data-set1 old-data-set2;BY variable-list;
RUN;
Summer 2008 BMTRY 789 Introduction to SAS Programming 13
Combining Data Sets Using One-to-One Match Merge
When you have two data sets with related data and you want to combine them.
If you merge two data sets, and they have variables with the same names – besides the BY variables, then variables from the second data set will overwrite any variables having the same names in the first data set.
All observations from old data sets will be included in the new data set whether they have a match or not.
Summer 2008 BMTRY 789 Introduction to SAS Programming 14
Match Merge ExampleProc Sort Data = Rat;
BY RatID Date;Run;Proc Sort Data = Rat2;
BY RatID Date;Run;DATA BigRat;
MERGE Rat Rat2;BY RatID Date;
Run;
Summer 2008 BMTRY 789 Introduction to SAS Programming 15
Tracking and Selecting Obs with the IN=Option
This is most often used with MERGE IN=varName is temporary, existing only
during the current DATA step SAS gives the IN=varName variable a value
of ‘0’ if the data set did not contribute to the current observation and a value of ‘1’ if the data set did contribute
Summer 2008 BMTRY 789 Introduction to SAS Programming 16
IN=varName ExampleProc Sort Data = Worker;
By Year;Run;Proc Sort Data = Exp;
By Year;Run;Data WorkExp;
Merge Worker (IN=InWork) Exp;By Year;
If InWork;Run;