Get the Scoop on the Loop: How Best to Write a Loop in the DATA Step
Arthur Li
Department of Information Science
City of Hope Comprehensive Cancer Center, Duarte, CA
INTRODUCTION
Loops: execute one statement or a group of statements repeatedly until a predefined condition is reached
In SAS, there are implicit and explicit loops
Programmers sometimes cannot clearly distinguish between the two kinds of loop
Knowing when a situation calls for an explicit loop is one of a programmer's challenges
COMPILATION AND EXECUTION PHASES
A DATA step is processed in two phases:
Compilation phase
Each statement is scanned for syntax errors
The PDV (program data vector) is created according to the descriptor portion of the input dataset
Execution phase (entered only if there is no syntax error)
SAS uses the PDV to build the new dataset
IMPLICIT LOOP
During the execution phase, the DATA step works like a loop: an implicit loop
It repeatedly executes statements, reads data values, and creates observations in the PDV, one at a time
Each pass through the loop is called an iteration
Suppose you have the following dataset, Patient, which contains patient IDs for a clinical trial
Patient:
Obs  ID
1    M2390
2    F2390
3    F2340
4    M1240
You would like to assign each patient to either a drug or a placebo (50% chance of each)
IMPLICIT LOOP: THE RANUNI FUNCTION
RANUNI(SEED)
It generates a number from the Uniform(0, 1) distribution, e.g. 0.13567, 0.34567, 0.56789, etc.
SEED is a nonnegative integer
The RANUNI function generates a stream of numbers based on SEED
When SEED is set to 0, the generated numbers cannot be reproduced
When SEED is a non-zero number, the generated numbers can be reproduced
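The effect of the seed can be checked with a short sketch like the one below. The variable names I, X, and Y are illustrative assumptions, and no specific values are claimed: the point is only that the first step should print the same numbers on every run, while the second should not.

```sas
/* Sketch: a fixed nonzero seed gives a reproducible stream. */
data _null_;
   do i = 1 to 3;
      x = ranuni(2);   /* nonzero seed: same values each time the step is rerun */
      put i= x=;
   end;
run;

/* Sketch: seed 0 is initialized from the system clock. */
data _null_;
   do i = 1 to 3;
      y = ranuni(0);   /* values are expected to differ from run to run */
      put i= y=;
   end;
run;
```

Running both steps twice and comparing the log output is a quick way to confirm the behavior described above.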
IMPLICIT LOOP: COMPILATION PHASE
data trial1 (drop=rannum);
   set patient;
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
run;
Patient:
Obs  ID
1    M2390
2    F2390
3    F2340
4    M1240
COMPILATION:
SAS checks for syntax errors
The PDV is created
PDV:
_N_ (D)   _ERROR_ (D)   ID (K)   RANNUM (D)   GROUP (K)
D = dropped from the output dataset
K = kept in the output dataset
Automatic variables (_N_ and _ERROR_):
_N_ = 1: the 1st observation is being processed; _N_ = 2: the 2nd observation is being processed
_ERROR_ = 1: signals a data error in the observation currently being processed
ID: a variable that exists in the input dataset
SAS sets each such variable to missing in the PDV only before the 1st iteration of the execution phase
These variables retain their values in the PDV until they are replaced by new values
RANNUM and GROUP: variables created in the DATA step
SAS sets each such variable to missing in the PDV at the beginning of every iteration of the execution phase
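The automatic variables can be watched in the log with a small sketch like the following. It assumes the Patient dataset is built from the four IDs listed above; the DATALINES step is an illustrative assumption, since the slides do not show how Patient was created.

```sas
/* Assumed construction of the Patient dataset from the IDs shown above. */
data patient;
   input id $;
   datalines;
M2390
F2390
F2340
M1240
;
run;

/* Write the automatic variables to the log once per iteration. */
data _null_;
   set patient;
   put _n_= _error_= id=;   /* _N_ counts iterations of the implicit loop */
run;
```

With four observations in Patient, the second step should write four lines to the log, with _N_ running from 1 to 4 and _ERROR_ equal to 0 when no data error occurs.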
IMPLICIT LOOP: EXECUTION PHASE, 1ST ITERATION
(The program is the Trial1 DATA step shown above)
Step 1: _N_ is set to 1 and _ERROR_ to 0; the rest of the variables are set to missing
Step 2: The SET statement copies the 1st observation to the PDV
Step 3: RANNUM is generated
Step 4: GROUP is set to 'P' since RANNUM is not > 0.5
Step 5: The implicit OUTPUT statement writes the variables marked with K to the final dataset
PDV after each step (. = missing numeric value, blank = missing character value):
Step  _N_ (D)  _ERROR_ (D)  ID (K)  RANNUM (D)  GROUP (K)
1     1        0                    .
2     1        0            M2390   .
3     1        0            M2390   0.36993
4     1        0            M2390   0.36993     P
5     1        0            M2390   0.36993     P
Trial1 after the 1st iteration:
Obs  ID     GROUP
1    M2390  P
REVIEW: OUTPUT STATEMENT
Implicit OUTPUT:
data trial1 (drop=rannum);
   set patient;
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
run;
Explicit OUTPUT:
data trial1 (drop=rannum);
   set patient;
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
   output;
run;
The explicit OUTPUT statement:
Writes the current observation from the PDV to a SAS dataset immediately, not at the end of the DATA step
The implicit OUTPUT statement:
Without an explicit OUTPUT statement, every DATA step contains an implicit OUTPUT statement at the end of the DATA step
It tells SAS to write the observation to the dataset at the end of the DATA step
Placing an explicit OUTPUT statement in a DATA step:
Overrides the implicit OUTPUT
SAS adds an observation to a dataset only when an explicit OUTPUT is executed
More than one OUTPUT statement can be used in a DATA step
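As a sketch of using more than one OUTPUT statement, the step below routes each patient to one of two datasets. The dataset names DRUG and PLACEBO are hypothetical, and Patient is rebuilt here from the four IDs used earlier so that the example runs on its own.

```sas
/* Assumed construction of the Patient dataset. */
data patient;
   input id $;
   datalines;
M2390
F2390
F2340
M1240
;
run;

/* Two explicit OUTPUT statements, each naming a target dataset. */
data drug placebo;
   set patient;
   if ranuni(2) > 0.5 then output drug;   /* written only to DRUG */
   else output placebo;                   /* written only to PLACEBO */
run;
```

Because the step contains explicit OUTPUT statements, no implicit OUTPUT takes place at the end of the step; each observation lands in exactly one of the two datasets.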
IMPLICIT LOOP: EXECUTION PHASE, LATER ITERATIONS
(The program is the Trial1 DATA step shown above)
2nd iteration:
_N_ is incremented to 2; ID is retained since it comes from the input dataset; GROUP and RANNUM are set to missing
The SET statement copies the 2nd observation to the PDV
(The remaining steps repeat as in the 1st iteration, so a few iterations are skipped here)
End of the 4th iteration:
The implicit OUTPUT statement writes the variables marked with K to the final dataset
SAS returns to the beginning of the DATA step
5th iteration:
_N_ is incremented to 5; ID is retained; GROUP and RANNUM are set to missing
SAS reaches the end-of-file marker, which means that there are no more observations to read
The execution phase is completed, and SAS goes on to the next DATA or PROC step
PDV at each point (. = missing numeric value, blank = missing character value):
Point                        _N_ (D)  _ERROR_ (D)  ID (K)  RANNUM (D)  GROUP (K)
Start of the 2nd iteration   2        0            M2390   .
2nd iteration, after SET     2        0            F2390   .
End of the 4th iteration     4        0            M1240   0.51880     D
5th iteration                5        0            M1240   .
Trial1 (final):
Obs  ID     GROUP
1    M2390  P
2    F2390  D
3    F2340  D
4    M1240  D
(end-of-file marker)
EXPLICIT LOOP
Suppose you don't have a dataset containing the patient IDs
You are asked to assign four patients, 'M2390', 'F2390', 'F2340', and 'M1240', to either the drug or the placebo, with a 50% chance of each
You can create the IDs and assign each ID to a group in the same DATA step. For example:
data trial2 (drop=rannum);
   id = 'M2390';
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
   output;

   id = 'F2390';
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
   output;

   id = 'F2340';
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
   output;

   id = 'M1240';
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
   output;
run;
The IDs are assigned in the DATA step
There are 4 explicit OUTPUT statements
There are 4 almost identical blocks of code
Putting the identical code in a loop that runs along the IDs reduces the amount of coding
ITERATIVE DO LOOP
General form of an iterative DO loop:
DO INDEX-VARIABLE = VALUE1, VALUE2, …, VALUEN;
   SAS STATEMENTS
END;
INDEX-VARIABLE: contains the value of the current iteration
The loop executes once for each value, VALUE1 through VALUEN
The VALUES can be either character or numeric
For the Trial2 program:
INDEX-VARIABLE: ID
VALUE1 through VALUEN: 'M2390', 'F2390', 'F2340', 'M1240'
SAS STATEMENTS:
   rannum = ranuni(2);
   if rannum > 0.5 then group = 'D';
   else group = 'P';
   output;
The four repeated blocks of Trial2 therefore become:
data trial2 (drop=rannum);
   do id = 'M2390', 'F2390', 'F2340', 'M1240';
      rannum = ranuni(2);
      if rannum > 0.5 then group = 'D';
      else group = 'P';
      output;
   end;
run;
ITERATIVE DO LOOP
DO INDEX-VARIABLE = START TO STOP <BY INCREMENT>;
   SAS STATEMENTS
END;
Usually the iterative DO loop runs along a sequence of integers
The loop executes from the START value to the STOP value
The optional BY clause specifies an increment between START and STOP
The default value for INCREMENT is 1
START, STOP, and INCREMENT can be numbers, variables, or SAS expressions
These values are set upon entry into the DO loop and cannot be modified during the processing of the DO loop
INDEX-VARIABLE can be changed within the loop
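The two rules above (bounds fixed on entry, index variable changeable) can be illustrated with a small sketch. The variable names N, I, and J are arbitrary choices for this illustration.

```sas
data _null_;
   /* STOP is taken from N when the loop is entered. */
   n = 3;
   do i = 1 to n;
      n = 10;                       /* does not change the stop value already set */
      put 'first loop: ' i=;
   end;
   put 'after first loop: ' i=;     /* I has been incremented past the stop value */

   /* The index variable itself can be changed inside the loop. */
   do j = 1 to 10;
      put 'second loop: ' j=;
      if j = 3 then j = 10;         /* forces the loop to end after this pass */
   end;
run;
```

Under these rules the first loop is expected to run three times (I = 1, 2, 3) and leave I at 4, while the second loop should stop after J = 3 because J is pushed to 10 and then incremented beyond the stop value.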
ITERATIVE DO LOOP
Suppose you are using a sequence of numbers, say 1 to 4, as patient IDs
data trial3 (drop=rannum);
   do id = 1 to 4;
      rannum = ranuni(2);
      if rannum > 0.5 then group = 'D';
      else group = 'P';
      output;
   end;
run;
INDEX-VARIABLE: ID
START: 1
STOP: 4
INCREMENT: 1
ITERATIVE DO LOOP: EXECUTION PHASE
(The program is the Trial3 DATA step shown above)
Since we didn't read an input dataset, there will be only one iteration of the DATA step
_N_ will be 1 for the entire execution phase
1st iteration of the DO loop:
ID is set to 1
RANNUM is generated
GROUP is set to 'P' since RANNUM is not > 0.5
The OUTPUT statement instructs SAS to write the observation to the output dataset
SAS reaches the end of the DO loop
2nd iteration of the DO loop:
ID is incremented to 2; since 2 ≤ 4, the 2nd iteration continues
RANNUM is generated
GROUP is set to 'D' since RANNUM > 0.5
The OUTPUT statement instructs SAS to write the observation to the output dataset
(Let's skip two iterations)
4th iteration of the DO loop:
SAS reaches the end of the DO loop
5th iteration of the DO loop:
ID is incremented to 5; since 5 > 4, the loop ends
There is no implicit OUTPUT statement, because the DATA step contains an explicit OUTPUT
Since we didn't read an input dataset, the DATA step execution ends
PDV at each point (. = missing numeric value, blank = missing character value):
Point                          _N_ (D)  _ERROR_ (D)  ID (K)  RANNUM (D)  GROUP (K)
Start of execution             1        0            .       .
1st iteration, ID set          1        0            1       .
1st iteration, RANNUM set      1        0            1       0.36993
1st iteration, GROUP set       1        0            1       0.36993     P
2nd iteration, ID incremented  1        0            2       0.36993     P
2nd iteration, RANNUM set      1        0            2       0.94018     P
2nd iteration, GROUP set       1        0            2       0.94018     D
End of the 4th iteration       1        0            4       0.51880     D
Loop ends                      1        0            5       0.51880     D
Trial3 (final):
Obs  ID  GROUP
1    1   P
2    2   D
3    3   D
4    4   D
EXECUTING LOOPS CONDITIONALLY
Using an iterative DO loop requires specifying the number of iterations in advance
Sometimes you need to execute statements repeatedly until a condition is met
In this situation, you need to use either the DO WHILE or the DO UNTIL statement
DO WHILE
DO WHILE (EXPRESSION);
   SAS STATEMENTS
END;
EXPRESSION is evaluated at the top of the DO loop
The DO loop will not execute if the EXPRESSION is false
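A minimal sketch of the "evaluated at the top" rule: when the expression is already false on entry, the body of a DO WHILE loop never runs. The variable COUNT and the message text are illustrative assumptions.

```sas
data _null_;
   count = 5;
   do while (count < 4);            /* false on the very first check */
      put 'inside DO WHILE: ' count=;
      count = count + 1;
   end;
   put 'after DO WHILE: ' count=;   /* COUNT is still 5 here */
run;
```

Only the "after DO WHILE" line should appear in the log, since the loop body is skipped entirely.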
DO WHILE
Iterative DO loop:
data trial3 (drop=rannum);
   do id = 1 to 4;
      rannum = ranuni(2);
      if rannum > 0.5 then group = 'D';
      else group = 'P';
      output;
   end;
run;
DO WHILE loop:
data trial4 (drop=rannum);
   do while (id < 4);
      id + 1;
      rannum = ranuni(2);
      if rannum > 0.5 then group = 'D';
      else group = 'P';
      output;
   end;
run;
At the beginning of the execution phase:
_N_ is 1 and _ERROR_ is 0
ID is 0 because of the SUM statement (id + 1)
The rest of the variables are set to missing
1st iteration of the DO WHILE loop:
Since ID < 4, the loop continues
ID is incremented to 1
RANNUM is generated
GROUP is set to 'P'
The OUTPUT statement instructs SAS to write the observation to the output dataset
SAS reaches the end of the DO loop
2nd iteration of the DO WHILE loop:
Since ID < 4, the loop continues
ID is incremented to 2
(Let's skip a few iterations)
5th iteration:
Now ID is not < 4, so the loop stops
The execution phase ends
PDV at each point (. = missing numeric value, blank = missing character value):
Point                          _N_ (D)  _ERROR_ (D)  ID (K)  RANNUM (D)  GROUP (K)
Start of execution             1        0            0       .
1st iteration, ID incremented  1        0            1       .
1st iteration, RANNUM set      1        0            1       0.36993
1st iteration, GROUP set       1        0            1       0.36993     P
2nd iteration, ID incremented  1        0            2       0.36993     P
End of the 4th iteration       1        0            4       0.51880     D
5th check, loop stops          1        0            4       0.51880     D
Trial4 (final):
Obs  ID  GROUP
1    1   P
2    2   D
3    3   D
4    4   D
DO UNTIL
DO UNTIL (EXPRESSION);
   SAS STATEMENTS
END;
Unlike the DO WHILE loop, the DO UNTIL loop evaluates the condition at the end of the loop
The DO UNTIL loop will not continue for another iteration if the EXPRESSION evaluates to true at the end of the current iteration
That means the DO UNTIL loop always executes at least once
The three loops compared (the loop bodies of Trial3 and Trial4 are the ones shown earlier):
Iterative DO loop (Trial3): do id = 1 to 4;
DO WHILE loop (Trial4): do while (id < 4);
Will not continue if the EXPRESSION is false
DO UNTIL loop (Trial5): do until (id >= 4);
Will not continue for another iteration if the EXPRESSION is true
data trial5 (drop=rannum);
   do until (id >= 4);
      id + 1;
      rannum = ranuni(2);
      if rannum > 0.5 then group = 'D';
      else group = 'P';
      output;
   end;
run;
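To see the "executes at least once" behavior, the sketch below starts with a condition that is already true. COUNT and the message text are illustrative assumptions.

```sas
data _null_;
   count = 5;
   do until (count >= 4);           /* checked at the bottom of the loop */
      put 'inside DO UNTIL: ' count=;
      count = count + 1;
   end;
   put 'after DO UNTIL: ' count=;   /* COUNT is 6 after the single pass */
run;
```

The body runs exactly once (printing COUNT = 5) even though the expression was true from the start; a DO WHILE loop with the complementary condition (count < 4) would not run at all.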
NESTED LOOPS
Suppose that you would like to assign 12 patients to either a drug or a placebo
These 12 subjects are from 3 cancer centers ("COH", "UCLA", and "USC") with 4 subjects per center
data trial6;
   length center $4;
   do center = "COH", "UCLA", "USC";   /* outer loop */
      do id = 1 to 4;                  /* inner loop */
         if ranuni(2) > 0.5 then group = 'D';
         else group = 'P';
         output;
      end;
   end;
run;
Trial6:
Obs  center  id  group
1    COH     1   P
2    COH     2   D
3    COH     3   D
4    COH     4   D
5    UCLA    1   D
6    UCLA    2   D
7    UCLA    3   P
8    UCLA    4   P
9    USC     1   P
10   USC     2   P
11   USC     3   D
12   USC     4   P
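The Trial6 program includes a LENGTH statement for CENTER. A likely reason, sketched below, is that the length of a character variable is fixed at compilation from its first use, so without the LENGTH statement CENTER would be expected to take the length of "COH" (3) and truncate "UCLA". The dataset names NO_LENGTH and WITH_LENGTH and the variable LEN are hypothetical; VLENGTH is used only to report the compiled length.

```sas
/* Without LENGTH: the length of CENTER is expected to come from "COH". */
data no_length;
   do center = "COH", "UCLA", "USC";
      len = vlength(center);   /* compile-time length of CENTER */
      put center= len=;
      output;
   end;
run;

/* With LENGTH: CENTER is declared wide enough for every value. */
data with_length;
   length center $4;
   do center = "COH", "UCLA", "USC";
      len = vlength(center);
      put center= len=;
      output;
   end;
run;
```

Comparing the two logs shows whether "UCLA" survives intact in each version.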
COMBINING IMPLICIT AND EXPLICIT LOOPS
In the previous program, all the observations were created in a single iteration of the DATA step, since we didn't read any input data
Suppose the values for CENTER are stored in a SAS dataset
For each center, you need to assign 4 patients to either a drug or a placebo
Cancer_center:
Obs  CENTER
1    COH
2    UCLA
3    USC
data trial7;
   set cancer_center;       /* DATA step: implicit loop */
   do id = 1 to 4;          /* explicit loop */
      if ranuni(2) > 0.5 then group = 'D';
      else group = 'P';
      output;
   end;
run;
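A self-contained sketch of the combined loops follows. The DATALINES step that builds Cancer_center is an assumption (the slides only show its contents), and the PUT statement is added purely to make the implicit loop visible in the log.

```sas
/* Assumed construction of the Cancer_center dataset. */
data cancer_center;
   length center $4;
   input center $;
   datalines;
COH
UCLA
USC
;
run;

data trial7;
   set cancer_center;        /* implicit loop: one iteration per center */
   do id = 1 to 4;           /* explicit loop: four patients per center */
      if ranuni(2) > 0.5 then group = 'D';
      else group = 'P';
      output;
   end;
   put 'DATA step iteration ' _n_ 'done for ' center=;
run;
```

The log should show three iterations of the DATA step (one per center), while Trial7 should contain 3 x 4 = 12 observations.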
UTILIZING LOOPS TO CREATE SAMPLES
DIRECT ACCESS MODE
Sbp:
Obs  ID  SBP
1    01  145
2    02  119
3    03  126
4    04  106
5    05  151
6    06  112
7    07  127
8    08  119
9    09  113
(end-of-file marker)
When reading a SAS dataset, by default, SAS reads the dataset sequentially
SAS reads one observation for each iteration of the DATA step
This process stops once SAS reaches the end-of-file marker
SAS can also access an observation directly via direct-access mode
There are 3 important components for using the direct-access mode
Step 1: Tell SAS which observation you would like to select by using POINT= in the SET statement
SET SAS-DATA-SET POINT = VARIABLE;
VARIABLE is a temporary variable and is not written to the output dataset
It is set to 0 in the PDV at the very beginning of the DATA step
VARIABLE must be assigned an observation number before the SET statement executes
For example, to select the 5th observation (this first version is not yet complete):
data sample1;
   obs_n = 5;
   set sbp point=obs_n;
run;
Step 2: Use the STOP statement
When using direct-access mode, SAS will not be able to detect the end-of-file marker
Without being told explicitly when to stop processing, SAS will loop infinitely
data sample1;
   obs_n = 5;
   set sbp point=obs_n;
   stop;
run;
Step 3: Use the OUTPUT statement
Recall: if there is no explicit OUTPUT, SAS writes the observation to the output dataset at the end of the DATA step (implicit OUTPUT)
With STOP, DATA step processing stops BEFORE the end of the DATA step, so the implicit OUTPUT will not be reached!
Add the OUTPUT statement before the STOP:
data sample1;
   obs_n = 5;
   set sbp point=obs_n;
   output;
   stop;
run;
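The three components can be tried end to end with the sketch below. The DATALINES step is an assumed way of building Sbp from the values listed on the slides, with ID read as a character variable to preserve the leading zero.

```sas
/* Assumed construction of the Sbp dataset from the listed values. */
data sbp;
   input id $ sbp;
   datalines;
01 145
02 119
03 126
04 106
05 151
06 112
07 127
08 119
09 113
;
run;

data sample1;
   obs_n = 5;               /* observation number to fetch */
   set sbp point=obs_n;     /* direct access: read observation 5 only */
   output;                  /* explicit OUTPUT, since STOP skips the implicit one */
   stop;                    /* no end-of-file is detected, so stop explicitly */
run;

proc print data=sample1;
run;
```

Sample1 should contain a single observation (ID 05, SBP 151), and OBS_N should not appear in it because the POINT= variable is temporary.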
CREATING A SYSTEMATIC SAMPLE
A systematic sample is created by selecting every kth observation from an original dataset
For example, selecting every 3rd observation from Sbp gives:
Obs  ID  SBP
1    01  145
2    04  106
3    07  127
The systematic sample is created here with direct-access mode rather than by reading the dataset sequentially
You can create a systematic sample by using an iterative DO loop
DO INDEX-VARIABLE = START TO STOP <BY INCREMENT>;
   SAS STATEMENTS
END;
START: 1
STOP: the total number of observations
INCREMENT: k (select every kth observation)
To find out the total number of observations, use the NOBS= option in the SET statement
SET SAS-DATA-SET NOBS = VARIABLE;
VARIABLE is a temporary variable that contains the number of observations in SAS-DATA-SET
It will not be written to the final dataset
It is created automatically based on the descriptor portion of SAS-DATA-SET during the compilation phase
It retains its value throughout the execution phase
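Because the NOBS= variable is filled in during compilation, its value is available even if the SET statement never executes. The sketch below uses a hypothetical 9-observation dataset named DEMO and a variable named TOTAL to show this.

```sas
/* Hypothetical dataset with 9 observations. */
data demo;
   do x = 1 to 9;
      output;
   end;
run;

data _null_;
   if 0 then set demo nobs=total;   /* never executed, but compiled */
   put total=;                      /* TOTAL already holds the observation count */
   stop;
run;
```

The log should show TOTAL=9, which is why the stop value of a DO loop can refer to the NOBS= variable before the SET statement inside the loop has run.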
CREATING A SYSTEMATIC SAMPLE
_N_ D CHOOSE D TOTAL D ID K SBP K
1 0 9 .PDV:
At the beginning of the execution phase:_N_ 1_N_ will be 1 throughout the execution phase because
SAS didn’t read the input data sequentially
data sample2; do choose = 1 to total by 3; set sbp point = choose nobs = total; output; end; stop;run;
ID SBP
1 01 145
2 02 119
3 03 126
4 04 106
5 05 151
6 06 112
7 07 127
8 08 119
9 09 113
_ERROR_ is not shown for simplicity
CREATING A SYSTEMATIC SAMPLE
_N_ D CHOOSE D TOTAL D ID K SBP K
1 0 9 .PDV:
At the beginning of the execution phase:CHOOSE 0TOTAL 9, based on the descriptor portion of SbpThe rest of variables missing
data sample2; do choose = 1 to total by 3; set sbp point = choose nobs = total; output; end; stop;run;
ID SBP
1 01 145
2 02 119
3 03 126
4 04 106
5 05 151
6 06 112
7 07 127
8 08 119
9 09 113
CREATING A SYSTEMATIC SAMPLE
_N_ D CHOOSE D TOTAL D ID K SBP K
1 1 9 .PDV:
1st iteration of the DO loop:CHOOSE 1
data sample2; do choose = 1 to total by 3; set sbp point = choose nobs = total; output; end; stop;run;
ID SBP
1 01 145
2 02 119
3 03 126
4 04 106
5 05 151
6 06 112
7 07 127
8 08 119
9 09 113
CREATING A SYSTEMATIC SAMPLE
_N_ D CHOOSE D TOTAL D ID K SBP K
1 1 9 01 145PDV:
1st iteration of the DO loop:SAS reads the 1st observation via direct-access mode
data sample2; do choose = 1 to total by 3; set sbp point = choose nobs = total; output; end; stop;run;
ID SBP
1 01 145
2 02 119
3 03 126
4 04 106
5 05 151
6 06 112
7 07 127
8 08 119
9 09 113
CREATING A SYSTEMATIC SAMPLE
_N_ D CHOOSE D TOTAL D ID K SBP K
1 1 9 01 145PDV:
1st iteration of the DO loop:The OUTPUT statement instructs SAS to write the
contents from PDV to Sample2
data sample2; do choose = 1 to total by 3; set sbp point = choose nobs = total; output; end; stop;run;
ID SBP
1 01 145
2 02 119
3 03 126
4 04 106
5 05 151
6 06 112
7 07 127
8 08 119
9 09 113
ID SBP
1 01 145
Sample2:
CREATING A SYSTEMATIC SAMPLE
_N_ D CHOOSE D TOTAL D ID K SBP K
1 1 9 01 145PDV:
1st iteration of the DO loop:SAS reaches the end of 1st iteration
data sample2; do choose = 1 to total by 3; set sbp point = choose nobs = total; output; end; stop;run;
ID SBP
1 01 145
2 02 119
3 03 126
4 04 106
5 05 151
6 06 112
7 07 127
8 08 119
9 09 113
ID SBP
1 01 145
Sample2:
CREATING A SYSTEMATIC SAMPLE
_N_ D CHOOSE D TOTAL D ID K SBP K
1 4 9 01 145PDV:
2nd iteration of the DO loop:CHOOSE ↑4Since 4 ≤ TOTAL (9), the 2nd iteration continues
data sample2; do choose = 1 to total by 3; set sbp point = choose nobs = total; output; end; stop;run;
ID SBP
1 01 145
2 02 119
3 03 126
4 04 106
5 05 151
6 06 112
7 07 127
8 08 119
9 09 113
ID SBP
1 01 145
Sample2:
CREATING A SYSTEMATIC SAMPLE
_N_ D CHOOSE D TOTAL D ID K SBP K
1 4 9 04 106PDV:
2nd iteration of the DO loop:SAS reads the 4th observation via direct-access mode
data sample2; do choose = 1 to total by 3; set sbp point = choose nobs = total; output; end; stop;run;
ID SBP
1 01 145
2 02 119
3 03 126
4 04 106
5 05 151
6 06 112
7 07 127
8 08 119
9 09 113
ID SBP
1 01 145
Sample2:
CREATING A SYSTEMATIC SAMPLE
_N_ D CHOOSE D TOTAL D ID K SBP K
1 4 9 04 106PDV:
2nd iteration of the DO loop:The OUTPUT statement instructs SAS to write the
contents from PDV to Sample2
data sample2; do choose = 1 to total by 3; set sbp point = choose nobs = total; output; end; stop;run;
ID SBP
1 01 145
2 02 119
3 03 126
4 04 106
5 05 151
6 06 112
7 07 127
8 08 119
9 09 113
ID SBP
1 01 145
2 04 106
Sample2:
CREATING A SYSTEMATIC SAMPLE
_N_ D CHOOSE D TOTAL D ID K SBP K
1 4 9 04 106PDV:
2nd iteration of the DO loop:SAS reaches the end of 2nd iteration
data sample2; do choose = 1 to total by 3; set sbp point = choose nobs = total; output; end; stop;run;
ID SBP
1 01 145
2 02 119
3 03 126
4 04 106
5 05 151
6 06 112
7 07 127
8 08 119
9 09 113
ID SBP
1 01 145
2 04 106
Sample2:
CREATING A SYSTEMATIC SAMPLE
PDV:
_N_ D CHOOSE D TOTAL D ID K SBP K
1 7 9 04 106
3rd iteration of the DO loop: CHOOSE is incremented to 7; since 7 ≤ TOTAL (9), the 3rd iteration continues
data sample2; do choose = 1 to total by 3; set sbp point = choose nobs = total; output; end; stop;run;
ID SBP
1 01 145
2 02 119
3 03 126
4 04 106
5 05 151
6 06 112
7 07 127
8 08 119
9 09 113
ID SBP
1 01 145
2 04 106
Sample2:
CREATING A SYSTEMATIC SAMPLE
PDV:
_N_ D CHOOSE D TOTAL D ID K SBP K
1 7 9 07 127
3rd iteration of the DO loop: SAS reads the 7th observation via direct-access mode
data sample2; do choose = 1 to total by 3; set sbp point = choose nobs = total; output; end; stop;run;
ID SBP
1 01 145
2 02 119
3 03 126
4 04 106
5 05 151
6 06 112
7 07 127
8 08 119
9 09 113
ID SBP
1 01 145
2 04 106
Sample2:
CREATING A SYSTEMATIC SAMPLE
PDV:
_N_ D CHOOSE D TOTAL D ID K SBP K
1 7 9 07 127
3rd iteration of the DO loop: The OUTPUT statement instructs SAS to write the contents from the PDV to Sample2
data sample2; do choose = 1 to total by 3; set sbp point = choose nobs = total; output; end; stop;run;
ID SBP
1 01 145
2 02 119
3 03 126
4 04 106
5 05 151
6 06 112
7 07 127
8 08 119
9 09 113
ID SBP
1 01 145
2 04 106
3 07 127
Sample2:
CREATING A SYSTEMATIC SAMPLE
PDV:
_N_ D CHOOSE D TOTAL D ID K SBP K
1 7 9 07 127
3rd iteration of the DO loop: SAS reaches the end of the 3rd iteration
data sample2; do choose = 1 to total by 3; set sbp point = choose nobs = total; output; end; stop;run;
ID SBP
1 01 145
2 02 119
3 03 126
4 04 106
5 05 151
6 06 112
7 07 127
8 08 119
9 09 113
ID SBP
1 01 145
2 04 106
3 07 127
Sample2:
CREATING A SYSTEMATIC SAMPLE
PDV:
_N_ D CHOOSE D TOTAL D ID K SBP K
1 10 9 07 127
4th iteration of the DO loop: CHOOSE is incremented to 10; since 10 > TOTAL (9), the loop ends
data sample2; do choose = 1 to total by 3; set sbp point = choose nobs = total; output; end; stop;run;
ID SBP
1 01 145
2 02 119
3 03 126
4 04 106
5 05 151
6 06 112
7 07 127
8 08 119
9 09 113
ID SBP
1 01 145
2 04 106
3 07 127
Sample2:
CREATING A SYSTEMATIC SAMPLE
PDV:
_N_ D CHOOSE D TOTAL D ID K SBP K
1 10 9 07 127
The STOP statement stops the DATA step processing
data sample2; do choose = 1 to total by 3; set sbp point = choose nobs = total; output; end; stop;run;
ID SBP
1 01 145
2 02 119
3 03 126
4 04 106
5 05 151
6 06 112
7 07 127
8 08 119
9 09 113
ID SBP
1 01 145
2 04 106
3 07 127
Sample2:
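The walkthrough above can be reproduced with a short self-contained sketch. The SBP values are the ones shown on the slides; the macro variable k is a hypothetical name introduced here for the sampling interval and is not part of the original program.

```sas
/* Recreate the nine-observation SBP dataset shown on the slides */
data sbp;
   input id $ sbp @@;
   datalines;
01 145 02 119 03 126 04 106 05 151 06 112 07 127 08 119 09 113
;
run;

%let k = 3;   /* hypothetical: sampling interval (every k-th observation) */

data sample2;
   /* TOTAL is filled in by NOBS= at compilation, so it is known before SET executes */
   do choose = 1 to total by &k;
      set sbp point = choose nobs = total;   /* direct access to observation CHOOSE */
      output;
   end;
   stop;   /* POINT= never reaches an end-of-file marker, so stop explicitly */
run;

proc print data = sample2;
run;
```

With k = 3 the printed dataset should match Sample2 above (observations 1, 4, and 7).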
CREATING A RANDOM SAMPLE WITH REPLACEMENT
A random sample: a sample whose observations are selected from the original dataset at random
A random sample with replacement: an observation is placed back into the original dataset after it is chosen, so any observation can be chosen more than once
data sample2; do choose = 1 to total by 3; set sbp point = choose nobs = total; output; end; stop;run;
ID SBP
1 01 145
2 02 119
3 03 126
4 04 106
5 05 151
6 06 112
7 07 127
8 08 119
9 09 113
Systematic sample: CHOOSE is incremented by k (here, k = 3)
Random sample: CHOOSE must instead be a random integer between 1 and the total number of observations
CREATING A RANDOM SAMPLE WITH REPLACEMENT
data sample3 (drop = i);
   do i = 1 to 3;
      choose = ceil(ranuni(5) * total);
      set sbp point = choose nobs = total;
      output;
   end;
   stop;
run;
How to generate a random integer between 1 and the total number of observations?
N: the total number of observations
RANUNI(SEED): a randomly generated real number in (0, 1)
RANUNI(SEED)*N: a real number in (0, N)
CEIL(RANUNI(SEED)*N): an integer in [1, N]
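As a minimal sketch of the steps above, the program below rebuilds the SBP dataset from the slide values and runs the sample3 step. The PUT statement is an addition for illustration only; it writes each selected observation number to the log.

```sas
/* Recreate the nine-observation SBP dataset shown on the slides */
data sbp;
   input id $ sbp @@;
   datalines;
01 145 02 119 03 126 04 106 05 151 06 112 07 127 08 119 09 113
;
run;

data sample3 (drop = i);
   do i = 1 to 3;
      /* RANUNI returns a value in (0, 1), so CEIL(...) lands in 1..TOTAL */
      choose = ceil(ranuni(5) * total);
      put choose=;                          /* illustration only: show the pick in the log */
      set sbp point = choose nobs = total;
      output;
   end;
   stop;
run;
```

Because the draws are independent, the same observation number can appear more than once in the log, which is what sampling with replacement means.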
CREATING A RANDOM SAMPLE WITHOUT REPLACEMENT
SELF STUDY!
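For readers starting the self study, one common approach is sketched below. It is not taken from the slides: it walks through the observations once and keeps each one with probability (number still needed) / (number still left). The variable names NEED and LEFT and the sample size of 3 are assumptions made for this illustration.

```sas
/* Recreate the nine-observation SBP dataset shown on the slides */
data sbp;
   input id $ sbp @@;
   datalines;
01 145 02 119 03 126 04 106 05 151 06 112 07 127 08 119 09 113
;
run;

data sample4 (drop = need left);
   need = 3;        /* hypothetical: how many observations are still to be selected */
   left = total;    /* how many observations have not been considered yet */
   do while (need > 0);
      choose + 1;   /* sum statement: move to the next observation number */
      if ranuni(5) < need / left then do;
         set sbp point = choose nobs = total;
         output;
         need = need - 1;
      end;
      left = left - 1;
   end;
   stop;
run;
```

Each observation number is visited at most once, so no observation can be selected twice. When NEED equals LEFT the ratio is 1, so the remaining observations are always taken and the loop ends.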
UTILIZING LOOPS TO READ A LIST OF EXTERNAL FILES
THE INFILE STATEMENT WITH THE END= OPTION
To read an external file, you can use the INFILE statement
For example, text1.txt is located in "C:\"
text1.txt:
01 145
02 119
data example13;
   infile "C:\text1.txt";
   input id $ sbp;
run;
text1.txt contains 2 records, so SAS uses 2 DATA step iterations to read the data
Like a SAS dataset, the external file also contains an end-of-file marker
When SAS reaches the end-of-file marker, it stops reading
End-of-file marker
THE INFILE STATEMENT WITH THE END= OPTION
When reading a SAS dataset …
Input dataset:
ID
1 M2390
2 F2390
3 F2340
4 M1240
PDV:
_N_ D _ERROR_ D ID K RANNUM D GROUP K
4 0 M1240 0.51880 D
Output dataset:
ID GROUP
1 M2390 P
2 F2390 D
3 F2340 D
4 M1240 D
THE INFILE STATEMENT WITH THE END= OPTION
When reading a raw dataset …
Input raw data (text1.txt):
01 145
02 119
Input buffer (used to hold raw data):
1 2 3 4 5 6 …
0 2 1 1 9 …
PDV:
_N_ D _ERROR_ D ID K SBP K
2 0 02 119
Output dataset:
ID SBP
1 01 145
2 02 119
THE INFILE STATEMENT WITH THE END= OPTION
You can use an explicit loop to read the external file
To construct an explicit loop, you need to specify the number of iterations for the iterative DO loop or a condition for the DO WHILE / DO UNTIL loops
One way to specify a condition is by telling SAS to read the observations until it reads the last record
To identify the last record, use the END = option in the INFILE statement
INFILE FILE-SPECIFICATION END = VARIABLE;
VARIABLE is set to 1 when SAS reads the last record of the external file; otherwise it is set to 0
THE INFILE STATEMENT WITH THE END= OPTION
The following program uses the DO UNTIL loop to read the external file
data example14;
   infile "C:\text1.txt" end = last;
   do until (last = 1);
      input id $ sbp;
      output;
   end;
run;
There is only one DATA step iteration
Within this iteration, the DO UNTIL loop iterates twice to read the two observations in text1.txt
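The behavior described above can be checked with a small sketch that does not depend on C:\text1.txt. The fileref DEMO and the PUT message are assumptions added for illustration; the TEMP device asks SAS for a scratch file.

```sas
filename demo temp;   /* hypothetical fileref pointing to a temporary file */

/* Write the two records shown for text1.txt */
data _null_;
   file demo;
   put "01 145";
   put "02 119";
run;

data example14;
   infile demo end = last;
   do until (last = 1);
      input id $ sbp;   /* LAST becomes 1 when the final record is read */
      output;
   end;
   /* illustration only: reached once per completed DATA step iteration */
   put "DATA step iteration " _n_ "completed";
run;
```

The log should show the message only once, for iteration 1, while EXAMPLE14 receives both observations from the DO UNTIL loop.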
THE INFILE STATEMENT WITH THE FILEVAR = OPTION
Generally, you specify the name and the location of the external file directly in the INFILE statement:
infile "C:\text1.txt";
Alternatively, you can use the FILEVAR= option in the INFILE statement to read an external file whose name is stored in a variable:
INFILE FILE-SPECIFICATION FILEVAR = VARIABLE;
FILE-SPECIFICATION: a placeholder, not an actual filename
VARIABLE: contains the name of the external file; must be created before the INFILE statement
THE INFILE STATEMENT WITH THE FILEVAR = OPTION
For example,
data example15;
   filename = "C:\text1.txt";
   infile dummy filevar = filename;
   input id $ sbp;
run;
167  data example15;
168     filename = "C:\text1.txt";
169     infile dummy filevar = filename;
170     input id $ sbp;
171  run;
NOTE: The infile DUMMY is:
      File Name=C:\text1.txt,
      RECFM=V,LRECL=256
NOTE: 2 records were read from the infile DUMMY.
      The minimum record length was 6.
      The maximum record length was 6.
NOTE: The data set WORK.EXAMPLE15 has 2 observations and 2 variables.
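A portable variant of this example is sketched below for readers who cannot write to C:\. It assumes the WORK library directory is writable and that the host accepts "/" as a path separator; the macro variable DIR is a hypothetical name introduced here.

```sas
%let dir = %sysfunc(pathname(work));   /* assumption: WORK directory is writable */

/* Create text1.txt with the two records shown on the slides */
data _null_;
   file "&dir/text1.txt";
   put "01 145";
   put "02 119";
run;

data example15;
   filename = "&dir/text1.txt";       /* must be assigned before INFILE executes */
   infile dummy filevar = filename;   /* DUMMY is only a placeholder */
   input id $ sbp;
run;

proc contents data = example15;
run;
```

PROC CONTENTS should list only ID and SBP: the FILEVAR= variable is not written to the output dataset, which agrees with the "2 variables" note in the log above.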
READING MULTIPLE EXTERNAL FILES
Three external files with an identical format:
text1.txt:
01 145
02 119
text2.txt:
03 126
04 106
text3.txt:
05 140
06 118
Read and concatenate them into one dataset:
ID SBP
1 01 145
2 02 119
3 03 126
4 04 106
5 05 140
6 06 118
You can read them all in a single DATA step by using the FILEVAR= option in the INFILE statement
A change in the value of the FILEVAR= variable causes the INFILE statement to close the current input file and open a new one, the file named by the new value
READING MULTIPLE EXTERNAL FILES
Start from the program that reads a single file:
data example15; filename = "C:\text1.txt"; infile dummy filevar = filename; input id $ sbp; run;
Step 1: The assignment, INFILE, and INPUT statements need to be placed inside a loop
Step 2: The names of the external files (text1.txt, text2.txt, text3.txt) suggest an iterative DO loop that iterates from 1 to 3
do i = 1 to 3; ... end;
Step 3: Modify the assignment statement for FILENAME by using the || operator
filename = "C:\text" || put(i, 1.) || ".txt";
Step 4: Add the OUTPUT statement within the loop
Step 5: The FILEVAR= option controls closing the current input file and opening a new file, so SAS will not be able to detect the end-of-file marker; place a STOP statement outside the loop
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
_N_ is set to 1; other variables are missing
_N_ D I D FILENAME D ID K SBP K
1 . .
At the beginning of the DATA step:
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
I is set to 1
_N_ D I D FILENAME D ID K SBP K
1 1 .
1st iteration of the DO loop:
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
FILENAME is set to C:\text1.txt
_N_ D I D FILENAME D ID K SBP K
1 1 C:\text1.txt .
1st iteration of the DO loop:
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
INFILE reads: 1st data line from ‘text1.txt’ → input buffer
_N_ D I D FILENAME D ID K SBP K
1 1 C:\text1.txt .
1st iteration of the DO loop:
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
1 2 3 4 5 6 …
0 1 1 4 5 …
Input buffer:
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
INPUT reads data values: input buffer → PDV
_N_ D I D FILENAME D ID K SBP K
1 1 C:\text1.txt 01 145
1st iteration of the DO loop:
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
1 2 3 4 5 6 …
0 1 1 4 5 …
Input buffer:
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
OUTPUT tells SAS to write observations: PDV → output dataset
_N_ D I D FILENAME D ID K SBP K
1 1 C:\text1.txt 01 145
1st iteration of the DO loop:
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
ID SBP
1 01 145
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
SAS reaches the end of the DO loop
_N_ D I D FILENAME D ID K SBP K
1 1 C:\text1.txt 01 145
1st iteration of the DO loop:
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
ID SBP
1 01 145
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
I is incremented to 2
_N_ D I D FILENAME D ID K SBP K
1 2 C:\text1.txt 01 145
2nd iteration of the DO loop:
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
ID SBP
1 01 145
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
FILENAME is set to C:\text2.txt
_N_ D I D FILENAME D ID K SBP K
1 2 C:\text2.txt 01 145
2nd iteration of the DO loop:
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
ID SBP
1 01 145
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
INFILE reads: 1st data line from ‘text2.txt’ → input buffer
_N_ D I D FILENAME D ID K SBP K
1 2 C:\text2.txt 01 145
2nd iteration of the DO loop:
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
1 2 3 4 5 6 …
0 3 1 2 6 …
Input buffer:
ID SBP
1 01 145
???
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
Why is the 2nd record of ‘text1.txt’ never read? Once one iteration of the DO loop has completed, the following iteration starts to read a new file, the one specified by the FILENAME variable
_N_ D I D FILENAME D ID K SBP K
1 2 C:\text2.txt 01 145
2nd iteration of the DO loop:
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
1 2 3 4 5 6 …
0 3 1 2 6 …
Input buffer:
ID SBP
1 01 145
???
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
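The problem shown above can be observed directly with the sketch below. It assumes the WORK library directory is writable and that the host accepts "/" as a path separator; DIR is a hypothetical macro variable, and the files are written there instead of C:\ only so that the sketch is self-contained.

```sas
%let dir = %sysfunc(pathname(work));   /* assumption: WORK directory is writable */

/* Create the three files with the values shown on the slides */
data _null_;
   file "&dir/text1.txt"; put "01 145" / "02 119";
   file "&dir/text2.txt"; put "03 126" / "04 106";
   file "&dir/text3.txt"; put "05 140" / "06 118";
run;

/* Single loop: one INPUT per file, so only one record per file is read */
data example15 (drop = i);
   do i = 1 to 3;
      filename = "&dir/text" || put(i, 1.) || ".txt";
      infile dummy filevar = filename;
      input id $ sbp;
      output;
   end;
   stop;
run;

proc print data = example15;
run;
```

The printed dataset should contain three observations, the first record of each file, because each change in FILENAME switches to the next file before the second record is read.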
READING MULTIPLE EXTERNAL FILES
To read every record of each file, the INFILE, INPUT, and OUTPUT statements are placed in a DO UNTIL loop that is controlled by the END= variable LAST:
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
do until (last); infile dummy filevar = filename end = last; input id $ sbp; output; end;
end; stop;run;
At the beginning of the DATA step: _N_ is set to 1, LAST is set to 0, and the other variables are missing
_N_ D I D FILENAME D LAST D ID K SBP K
1 . 0 .
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
READING MULTIPLE EXTERNAL FILES
_N_ D I D FILENAME D LAST D ID K SBP K
1 1 0 .
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
do until (last); infile dummy filevar = filename end=last; input id $ sbp; output;end;
1st Iteration of the DO loop (outer loop): I is set to 1
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
_N_ D I D FILENAME D LAST D ID K SBP K
1 1 C:\text1.txt 0 .
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
do until (last); infile dummy filevar = filename end=last; input id $ sbp; output;end;
1st Iteration of the DO loop (outer loop): FILENAME is set to C:\text1.txt
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
_N_ D I D FILENAME D LAST D ID K SBP K
1 1 C:\text1.txt 0 .
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
do until (last); infile dummy filevar = filename end=last; input id $ sbp; output;end;
1st Iteration of the DO loop (outer loop):
The DO UNTIL loop evaluates the condition at the end of the loop
1st Iteration of the DO UNTIL loop (inner loop):
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
_N_ D I D FILENAME D LAST D ID K SBP K
1 1 C:\text1.txt 0 .
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
do until (last); infile dummy filevar = filename end=last; input id $ sbp; output;end;
1st Iteration of the DO loop (outer loop):
INFILE reads: 1st data line from ‘text1.txt’ → input buffer
1st Iteration of the DO UNTIL loop (inner loop):
1 2 3 4 5 6 …
0 1 1 4 5 …
Input buffer:
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
_N_ D I D FILENAME D LAST D ID K SBP K
1 1 C:\text1.txt 0 01 145
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
do until (last); infile dummy filevar = filename end=last; input id $ sbp; output;end;
1st Iteration of the DO loop (outer loop):
INPUT statement reads data values: input buffer → PDV
1st Iteration of the DO UNTIL loop (inner loop):
1 2 3 4 5 6 …
0 1 1 4 5 …
Input buffer:
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
_N_ D I D FILENAME D LAST D ID K SBP K
1 1 C:\text1.txt 0 01 145
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
do until (last); infile dummy filevar = filename end=last; input id $ sbp; output;end;
1st Iteration of the DO loop (outer loop):
1st Iteration of the DO UNTIL loop (inner loop):
1 2 3 4 5 6 …
0 1 1 4 5 …
Input buffer:
OUTPUT statement: PDV → output dataset
ID SBP
1 01 145
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
_N_ D I D FILENAME D LAST D ID K SBP K
1 1 C:\text1.txt 0 01 145
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
do until (last); infile dummy filevar = filename end=last; input id $ sbp; output;end;
1st Iteration of the DO loop (outer loop):
1st Iteration of the DO UNTIL loop (inner loop):
1 2 3 4 5 6 …
0 1 1 4 5 …
Input buffer:
SAS reaches the end of the inner loop; since LAST ≠ 1, the inner loop continues
ID SBP
1 01 145
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
_N_ D I D FILENAME D LAST D ID K SBP K
1 1 C:\text1.txt 0 01 145
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
do until (last); infile dummy filevar = filename end=last; input id $ sbp; output;end;
1st Iteration of the DO loop (outer loop):
2nd Iteration of the DO UNTIL loop (inner loop):
1 2 3 4 5 6 …
0 1 1 4 5 …
Input buffer:
The DO UNTIL loop evaluates the condition at the end of the loop
ID SBP
1 01 145
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
_N_ D I D FILENAME D LAST D ID K SBP K
1 1 C:\text1.txt 1 01 145
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
do until (last); infile dummy filevar = filename end=last; input id $ sbp; output;end;
1st Iteration of the DO loop (outer loop):
2nd Iteration of the DO UNTIL loop (inner loop):
1 2 3 4 5 6 …
0 2 1 1 9 …
Input buffer:
INFILE reads: 2nd data line from ‘text1.txt’ → input buffer; this is the last record of ‘text1.txt’, so LAST is set to 1
ID SBP
1 01 145
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
_N_ D I D FILENAME D LAST D ID K SBP K
1 1 C:\text1.txt 1 02 119
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
do until (last); infile dummy filevar = filename end=last; input id $ sbp; output;end;
1st Iteration of the DO loop (outer loop):
2nd Iteration of the DO UNTIL loop (inner loop):
1 2 3 4 5 6 …
0 2 1 1 9 …
Input buffer:
The INPUT statement reads data values: input buffer → PDV
ID SBP
1 01 145
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
_N_ D I D FILENAME D LAST D ID K SBP K
1 1 C:\text1.txt 1 02 119
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
do until (last); infile dummy filevar = filename end=last; input id $ sbp; output;end;
1st Iteration of the DO loop (outer loop):
2nd Iteration of the DO UNTIL loop (inner loop):
1 2 3 4 5 6 …
0 2 1 1 9 …
Input buffer:
OUTPUT statement: PDV → output dataset
ID SBP
1 01 145
2 02 119
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
_N_ D I D FILENAME D LAST D ID K SBP K
1 1 C:\text1.txt 1 02 119
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
do until (last); infile dummy filevar = filename end=last; input id $ sbp; output;end;
1st Iteration of the DO loop (outer loop):
2nd Iteration of the DO UNTIL loop (inner loop):
1 2 3 4 5 6 …
0 2 1 1 9 …
Input buffer:
SAS reaches the end of the inner loop; since LAST = 1, the inner loop ends
ID SBP
1 01 145
2 02 119
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
_N_ D I D FILENAME D LAST D ID K SBP K
1 1 C:\text1.txt 1 02 119
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
do until (last); infile dummy filevar = filename end=last; input id $ sbp; output;end;
1st Iteration of the DO loop (outer loop):
SAS reaches the end of the outer loop
ID SBP
1 01 145
2 02 119
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
_N_ D I D FILENAME D LAST D ID K SBP K
1 2 C:\text1.txt 1 02 119
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
do until (last); infile dummy filevar = filename end=last; input id $ sbp; output;end;
2nd Iteration of the DO loop (outer loop):
I is incremented to 2; since I ≤ 3, the 2nd iteration of the outer loop continues
ID SBP
1 01 145
2 02 119
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
_N_ D I D FILENAME D LAST D ID K SBP K
1 2 C:\text2.txt 1 02 119
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
do until (last); infile dummy filevar = filename end=last; input id $ sbp; output;end;
2nd Iteration of the DO loop (outer loop):
ID SBP
1 01 145
2 02 119
FILENAME is set to C:\text2.txt
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
_N_ D I D FILENAME D LAST D ID K SBP K
1 2 C:\text2.txt 1 02 119
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
do until (last); infile dummy filevar = filename end=last; input id $ sbp; output;end;
2nd Iteration of the DO loop (outer loop):
ID SBP
1 01 145
2 02 119
The DO UNTIL loop evaluates the condition at the end of the loop
1st Iteration of the DO UNTIL loop (inner loop):
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
_N_ D I D FILENAME D LAST D ID K SBP K
1 2 C:\text2.txt 0 02 119
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
do until (last); infile dummy filevar = filename end=last; input id $ sbp; output;end;
2nd Iteration of the DO loop (outer loop):
ID SBP
1 01 145
2 02 119
INFILE reads: 1st data line from ‘text2.txt’ → input buffer; this is not the last record of ‘text2.txt’, so LAST is set to 0
1st Iteration of the DO UNTIL loop (inner loop):
1 2 3 4 5 6 …
0 3 1 2 6 …
Input buffer:
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
_N_ D I D FILENAME D LAST D ID K SBP K
1 2 C:\text2.txt 0 03 126
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
do until (last); infile dummy filevar = filename end=last; input id $ sbp; output;end;
2nd Iteration of the DO loop (outer loop):
ID SBP
1 01 145
2 02 119
INPUT statement reads data values: input buffer → PDV
1st Iteration of the DO UNTIL loop (inner loop):
1 2 3 4 5 6 …
0 3 1 2 6 …
Input buffer:
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
_N_ D I D FILENAME D LAST D ID K SBP K
1 2 C:\text2.txt 0 03 126
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
do until (last); infile dummy filevar = filename end=last; input id $ sbp; output;end;
2nd Iteration of the DO loop (outer loop):
ID SBP
1 01 145
2 02 119
3 03 126
OUTPUT statement: PDV → output dataset
1st Iteration of the DO UNTIL loop (inner loop):
1 2 3 4 5 6 …
0 3 1 2 6 …
Input buffer:
data example15 (drop = i); do i = 1 to 3; filename = "C:\text" || put(i, 1.) || ".txt";
infile dummy filevar = filename; input id $ sbp; output;
end; stop;run;
READING MULTIPLE EXTERNAL FILES
_N_ D I D FILENAME D LAST D ID K SBP K
1 2 C:\text2.txt 0 03 126
text1.txt: 01 145 02 119
text2.txt: 03 126 04 106
text3.txt: 05 140 06 118
do until (last); infile dummy filevar = filename end=last; input id $ sbp; output;end;
2nd Iteration of the DO loop (outer loop):
ID SBP
1 01 145
2 02 119
3 03 126
The remaining iterations follow the same pattern and are skipped here
1st Iteration of the DO UNTIL loop (inner loop):
1 2 3 4 5 6 …
0 3 1 2 6 …
Input buffer:
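The nested-loop program traced above can be run end to end with the sketch below. It assumes the WORK library directory is writable and that the host accepts "/" as a path separator; DIR is a hypothetical macro variable used in place of C:\ so that the sketch is self-contained.

```sas
%let dir = %sysfunc(pathname(work));   /* assumption: WORK directory is writable */

/* Create the three files with the values shown on the slides */
data _null_;
   file "&dir/text1.txt"; put "01 145" / "02 119";
   file "&dir/text2.txt"; put "03 126" / "04 106";
   file "&dir/text3.txt"; put "05 140" / "06 118";
run;

data example15 (drop = i);
   do i = 1 to 3;                          /* outer loop: one pass per file */
      filename = "&dir/text" || put(i, 1.) || ".txt";
      do until (last);                     /* inner loop: one pass per record */
         infile dummy filevar = filename end = last;
         input id $ sbp;
         output;
      end;
   end;
   stop;                                   /* no end-of-file marker will end the step */
run;

proc print data = example15;
run;
```

The printed dataset should contain all six observations, 01 through 06, in file order.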
ARRAY
There is a wide range of applications in using loop structures with ARRAY processing
Since ARRAY is a large and different topic, we are not covering ARRAY in this talk
CONCLUSION
Loops allow us to write simpler and more efficient programs
In order to use loop structures correctly, we need to understand how DATA steps are processed
When trying to debug our programs, we often realize that most of the errors are closely related to programming fundamentals, in particular to understanding how the PDV works
CONTACT INFORMATION
Arthur Li
City of Hope
Division of Information Science
1500 East Duarte Road
Duarte, CA 91010 - 3000
Phone: (626) 256-4673 ext. 65121
E-mail: [email protected]