Date post: | 28-Dec-2015 |
Upload: | shannon-parsons |
Chapter 21 Reading Hierarchical Files
Reading Hierarchical Raw Data Files
2
Objectives
– Read data with mixed record types.
– Read a hierarchical file and create one observation per detail record.
– Read a hierarchical file and create one observation per header record.
3
Mixed Record Types
Not all records have the same format.
   101 USA 1-20-1999 3295.50
   3034 EUR 30JAN1999 1876,30
   101 USA 1-30-1999 2938.00
   128 USA 2-5-1999 2908.74
   1345 EUR 6FEB1999 3145,60
   109 USA 3-17-1999 2789.10
Because the layout differs by record type, more than one INPUT statement is needed, with a conditional statement deciding which one runs for each record.
4
Desired Output
   Sales ID   Location   Sale Date   Amount
   101        USA        14264       3295.50
   3034       EUR        14274       1876.30
   101        USA        14274       2938.00
   128        USA        14280       2908.74
   1345       EUR        14281       3145.60
   109        USA        14320       2789.10
5
The INPUT Statement
• Multiple INPUT statements are needed for different formats of the same variable:
   input SalesID $ Location $;
   if Location='USA' then
      input SaleDate : mmddyy10. Amount;
   else if Location='EUR' then
      input SaleDate : date9. Amount : commax8.;
6
The INPUT Statement
   NOTE: 6 records were read from the infile 'sales.dat'.
         The minimum record length was 24.
         The maximum record length was 26.
   NOTE: The data set WORK.SALES has 3 observations and 4 variables.
This is NOT correct. The file has 6 records, so the data set should have 6 observations, not 3.
7
Undesirable Output
   Sales ID   Location   Sale Date   Amount
   101        USA        .           .
   101        USA        .           .
   1345       EUR        .           .
This is NOT correct. There should be 6 observations, not 3, and every Sale Date and Amount value is missing.
8
The program:
   input SalesID $ Location $;
   if Location='USA' then
      input SaleDate : mmddyy10. Amount;
   else if Location='EUR' then
      input SaleDate : date9. Amount : commax8.;
The raw data:
   101 USA 1-20-1999 3295.50
   3034 EUR 30JAN1999 1876,30
   101 USA 1-30-1999 2938.00
   128 USA 2-5-1999 2908.74
   1345 EUR 6FEB1999 3145,60
   109 USA 3-17-1999 2789.10
NOTE: Without a trailing @, each INPUT statement loads a new record. The first INPUT statement reads SalesID and Location from one record, and the conditional INPUT statement then tries to read SaleDate and Amount from the next record, where the fields do not match the informats.
The output:
   Sales ID   Location   Sale Date   Amount
   101        USA        .           .
   101        USA        .           .
   1345       EUR        .           .
9
Use the Single Trailing @ to Read One Record with More Than One INPUT Statement
• The single trailing @ holds a raw data record in the input buffer until SAS
   – executes an INPUT statement with no trailing @, or
   – reaches the bottom of the DATA step.
• General form of an INPUT statement with the single trailing @:
   INPUT var1 var2 var3 … @;
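To try the trailing @ without an external file, the following is a minimal sketch that reads three of the sales records from in-stream data. The data set name work.sales_demo and the use of DATALINES are assumptions made for illustration; the course program reads a raw data file instead.

```sas
/* Sketch only: work.sales_demo and the in-stream records are illustrative. */
data work.sales_demo;
   length SalesID $ 4 Location $ 3;
   infile datalines;
   /* The trailing @ holds the current record in the input buffer. */
   input SalesID $ Location $ @;
   /* The next INPUT statement continues reading the held record. */
   if Location='USA' then
      input SaleDate : mmddyy10. Amount;
   else if Location='EUR' then
      input SaleDate : date9. Amount : commax8.;
   datalines;
101 USA 1-20-1999 3295.50
3034 EUR 30JAN1999 1876,30
101 USA 1-30-1999 2938.00
;
run;

proc print data=work.sales_demo noobs;
run;
```

Each record should produce one observation, because the conditional INPUT statement has no trailing @ and therefore releases the record before the next iteration.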
10
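Processing the Trailing @
   input SalesID $ Location $ @;
   if Location='USA' then
      input SaleDate : mmddyy10. Amount;
   else if Location='EUR' then
      input SaleDate : date9. Amount : commax8.;
– The trailing @ on the first INPUT statement holds the record for the next INPUT statement.
– The INPUT statements without a trailing @ release the record, so the next iteration loads the next record.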
11
   data sales;
      length SalesID $ 4 Location $ 3;
      infile 'raw-data-file';
      input SalesID $ Location $ @;
      if Location='USA' then
         input SaleDate : mmddyy10. Amount;
      else if Location='EUR' then
         input SaleDate : date9. Amount : commax8.;
   run;
Raw Data File
   101 USA 1-20-1999 3295.50
   3034 EUR 30JAN1999 1876,30
   101 USA 1-30-1999 2938.00
   128 USA 2-5-1999 2908.74
   1345 EUR 6FEB1999 3145,60
   109 USA 3-17-1999 2789.10
Compile
   The PDV is created with SalesID, Location, SaleDate, and Amount. The input buffer is empty.
Execute
   1. The PDV is initialized: SalesID and Location are blank, SaleDate and Amount are missing (.).
   2. The first record, 101 USA 1-20-1999 3295.50, is loaded into the input buffer.
   3. input SalesID $ Location $ @; sets SalesID=101 and Location=USA. Hold record.
   4. Location='USA' is true, so the USA INPUT statement reads the held record: SaleDate=14264, Amount=3295.50.
   5. Implicit output: write out observation to sales.
   6. Implicit return: control returns to the top of the DATA step, and SaleDate and Amount are reset to missing.
   7. Continue processing until end of the raw data file.
20
Mixed Record Types
Partial Log
   NOTE: 6 records were read from the infile 'sales.dat'.
         The minimum record length was 24.
         The maximum record length was 26.
   NOTE: The data set WORK.SALES has 6 observations and 4 variables.
21
Mixed Record Types
   proc print data=sales noobs;
   run;
PROC PRINT Output
   Sales ID   Location   Sale Date   Amount
   101        USA        14264       3295.50
   3034       EUR        14274       1876.30
   101        USA        14274       2938.00
   128        USA        14280       2908.74
   1345       EUR        14281       3145.60
   109        USA        14320       2789.10
22
Subsetting from a Raw Data File
This scenario uses the raw data file from the previous example.
   101 USA 1-20-1999 3295.50
   3034 EUR 30JAN1999 1876,30
   101 USA 1-30-1999 2938.00
   128 USA 2-5-1999 2908.74
   1345 EUR 6FEB1999 3145,60
   109 USA 3-17-1999 2789.10
23
Desired Output
The sales manager wants to see sales for the European branch only.
   Sales ID   Location   Sale Date   Amount
   3034       EUR        14274       1876.30
   1345       EUR        14281       3145.60
24
The Subsetting IF Statement
   data europe;
      length SalesID $ 4 Location $ 3;
      infile 'raw-data-file';
      input SalesID $ Location $ @;
      if Location='USA' then
         input SaleDate : mmddyy10. Amount;
      else if Location='EUR' then
         input SaleDate : date9. Amount : commax8.;
      if Location='EUR';
   run;
This works, but it is not efficient: every record is read in full before the EUR observations are selected.
25
The Subsetting IF Statement
• The subsetting IF should appear as early in the program as possible but after the variables used in the condition are calculated.
• In this case, we should read only the EUR cases by adding the IF statement right after reading Location.
26
The Subsetting IF Statement
Because the program reads only European sales, the INPUT statement for USA sales is not needed.
   data europe;
      length SalesID $ 4 Location $ 3;
      infile 'raw-data-file';
      input SalesID $ Location $ @;
      if Location='EUR';
      input SaleDate : date9. Amount : commax8.;
   run;
27
The Subsetting IF Statement
   proc print data=europe noobs;
   run;
   Sales ID   Location   Sale Date   Amount
   3034       EUR        14274       1876.30
   1345       EUR        14281       3145.60
28
Processing Hierarchical Files
• Many files are hierarchical in structure, consisting of
   – a header record
   – one or more related detail records.
• Typically, each record contains a field that identifies whether it is a header record or a detail record.
   Header
   Detail
   Detail
   Header
   Header
   Detail
   Header
   Detail
   Detail
29
Processing Hierarchical Files
You can read a hierarchical file into a SAS data set by creating one observation per detail record and storing the header information as part of each observation.
Hierarchical File
   Header 1
   Detail 1
   Detail 2
   Detail 3
   Header 2
   Detail 1
   Header 3
   Detail 1
   Detail 2
SAS Data Set
   Header Variables   Detail Variables
   Header 1           Detail 1
   Header 1           Detail 2
   Header 1           Detail 3
   Header 2           Detail 1
   Header 3           Detail 1
   Header 3           Detail 2
30
Processing Hierarchical Files
You can also create one observation per header record and store the information from detail records in summary variables.
Hierarchical File
   Header 1
   Detail 1
   Detail 2
   Detail 3
   Header 2
   Detail 1
   Header 3
   Detail 1
   Detail 2
SAS Data Set
   Header Variables   Summary Variables
   Header 1           Summary 1
   Header 2           Summary 2
   Header 3           Summary 3
31
Creating One Observation Per Detail
The raw data file dependents has a header record containing the name of the employee and a detail record for each dependent on the employee’s health insurance.
   E:Adams:Susan
   D:Michael:C
   D:Lindsay:C
   E:Porter:David
   D:Susan:S
   E:Lewis:Dorian D.
   D:Richard:C
   E:Dansky:Ian
   E:Nicholls:James
   D:Roberta:C
   E:Slaydon:Marla
   D:John:S
Record types: E = Employee, D = Dependent.
Relation codes: C = Child, S = Spouse.
Each data value is separated by a colon (:).
32
Desired Output
Personnel wants a list of all the dependents and the name of the associated employee.
   EmpLName   EmpFName    DepName   Relation
   Adams      Susan       Michael   C
   Adams      Susan       Lindsay   C
   Porter     David       Susan     S
   Lewis      Dorian D.   Richard   C
   Nicholls   James       Roberta   C
   Slaydon    Marla       John      S
33
A Hierarchical File
– Not all the records are the same.
– The fields are separated by colons.
– There is a field indicating whether the record is a header or a detail record.
   E:Adams:Susan
   D:Michael:C
   D:Lindsay:C
   E:Porter:David
   D:Susan:S
   E:Lewis:Dorian D.
   D:Richard:C
   E:Dansky:Ian
   E:Nicholls:James
   D:Roberta:C
   E:Slaydon:Marla
   D:John:S
34
How to Read the Hierarchical Data
   input Type $ @;
   if Type='E' then
      input EmpLName $ EmpFName $;
   else
      input DepName $ Relation $;
35
How to Output Only the Dependents
   input Type $ @;
   if Type='E' then
      input EmpLName $ EmpFName $;
   else do;
      input DepName $ Relation $;
      output;
   end;
Try the following program and observe what is wrong with the result.
36
   data dependents(drop=Type);
      length Type $ 1 EmpLName EmpFName DepName $ 20 Relation $ 1;
      infile 'raw-data-file' dlm=':';
      input Type $ @;
      if Type='E' then
         input EmpLName $ EmpFName $;
      else do;
         input DepName $ Relation $;
         output;
      end;
   run;
Compile
   The PDV is created with Type, EmpLName, EmpFName, DepName, and Relation. Type is flagged D (drop).
Execute
   1. The PDV is initialized to missing.
   2. The first record, E:Adams:Susan, is loaded into the input buffer.
   3. input Type $ @; sets Type=E. Hold record.
   4. Type='E' is true, so the INPUT statement reads EmpLName=Adams and EmpFName=Susan.
   5. No implicit output: the step contains an explicit OUTPUT statement, so nothing is written at the bottom of the step.
   6. Implicit return.
   7. Reinitialize PDV: EmpLName and EmpFName are reset to missing.
   8. The second record, D:Michael:C, is loaded into the input buffer.
   9. input Type $ @; sets Type=D. Hold record.
   10. Type='E' is false, so the DO group reads DepName=Michael and Relation=C.
   11. Explicit output: write out observation to dependents. EmpLName and EmpFName are blank.
   12. Implicit return.
53
Undesirable Output
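   EmpLName   EmpFName   DepName   Relation
                         Michael   C
                         Lindsay   C
                         Susan     S
                         Richard   C
                         Roberta   C
                         John      S
EmpLName and EmpFName are blank: they are reinitialized to missing at the top of each iteration, so the employee's name is gone by the time a dependent record is written.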
54
The RETAIN Statement (Review)
• General form of the RETAIN statement:
   RETAIN variable-name <initial-value>;
• The RETAIN statement prevents SAS from reinitializing the values of new variables at the top of the DATA step. This means that values from previous records are available for processing.
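As a small illustration of what retaining means, the sketch below carries the last nonmissing value forward across iterations. The data set work.retain_demo, the variables Value and LastSeen, and the in-stream numbers are all hypothetical and are not part of the course data.

```sas
/* Sketch only: names and data are illustrative. */
data work.retain_demo;
   /* LastSeen keeps its value from one iteration to the next. */
   retain LastSeen .;
   input Value;
   /* Update LastSeen only when a nonmissing value is read. */
   if Value ne . then LastSeen = Value;
   datalines;
10
.
.
20
.
;
run;

proc print data=work.retain_demo noobs;
run;
```

Without the RETAIN statement, LastSeen would be reset to missing at the top of every iteration, so it would be missing on every row where Value is missing.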
55
   data dependents(drop=Type);
      length Type $ 1 EmpLName EmpFName DepName $ 20 Relation $ 1;
      retain EmpLName EmpFName;
      infile 'raw-data-file' dlm=':';
      input Type $ @;
      if Type='E' then
         input EmpLName $ EmpFName $;
      else do;
         input DepName $ Relation $;
         output;
      end;
   run;
The RETAIN statement holds EmpLName and EmpFName.
56
   data dependents(drop=Type);
      length Type $ 1 EmpLName EmpFName DepName $ 20 Relation $ 1;
      retain EmpLName EmpFName;
      infile 'raw-data-file' dlm=':';
      input Type $ @;
      if Type='E' then
         input EmpLName $ EmpFName $;
      else do;
         input DepName $ Relation $;
         output;
      end;
   run;
Compile
   The PDV is created with Type, EmpLName, EmpFName, DepName, and Relation. Type is flagged D (drop). EmpLName and EmpFName are flagged R (retain).
Execute
   1. The PDV is initialized to missing.
   2. The first record, E:Adams:Susan, is loaded into the input buffer.
   3. input Type $ @; sets Type=E. Hold record.
   4. Type='E' is true, so the INPUT statement reads EmpLName=Adams and EmpFName=Susan.
   5. No implicit output.
   6. Implicit return.
   7. Reinitialize PDV: Type, DepName, and Relation are reset to missing. EmpLName=Adams and EmpFName=Susan are retained.
   8. The second record, D:Michael:C, is loaded into the input buffer.
   9. input Type $ @; sets Type=D. Hold record.
   10. Type='E' is false, so the DO group reads DepName=Michael and Relation=C.
   11. Explicit output: write out observation to dependents, with EmpLName=Adams, EmpFName=Susan, DepName=Michael, Relation=C.
   12. Implicit return.
   13. Continue processing until end of the raw data file.
74
Creating One Observation Per Detail
   proc print data=work.dependents noobs;
   run;
PROC PRINT Output (correct result)
   EmpLName   EmpFName    DepName   Relation
   Adams      Susan       Michael   C
   Adams      Susan       Lindsay   C
   Porter     David       Susan     S
   Lewis      Dorian D.   Richard   C
   Nicholls   James       Roberta   C
   Slaydon    Marla       John      S
75
Create One Observation Per Header Record
   E:E01442
   D:Michael:C
   D:Lindsay:C
   E:E00705
   D:Susan:S
   E:E01577
   D:Richard:C
   E:E00997
   E:E00955
   D:Roberta:C
   E:E00224
   D:John:S
– Employee insurance is free for the employees.
– Each employee pays $50 per month for a spouse’s insurance.
– Each employee pays $25 per month for a child’s insurance.
76
Desired Output
• Personnel wants a list of all employees and their monthly payroll deductions for insurance.
   ID       Deduct
   E01442   50
   E00705   50
   E01577   25
   E00997   0
   E00955   25
   E00224   50
77
Calculating the Value of Deduct
   E:E01442
   D:Michael:C
   D:Lindsay:C
   E:E00705
   D:Susan:S
   E:E01577
   D:Richard:C
   E:E00997
   E:E00955
   D:Roberta:C
   E:E00224
   D:John:S
The value of Deduct changes according to the
– type of record read
– value of Relation when Type='D'.
78
Retaining ID
Values of ID and Deduct must be held across iterations of the DATA step.
– ID must be retained with a RETAIN statement.
– Deduct is created with a sum statement, which automatically retains.
   retain ID;
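The automatic retain of a sum statement can be seen in isolation with a short sketch. The data set work.sum_demo, the variable names Amount and Total, and the three in-stream values are hypothetical, chosen only to mirror the $25 and $50 deductions.

```sas
/* Sketch only: names and data are illustrative. */
data work.sum_demo;
   input Amount;
   /* Sum statement: Total starts at 0 and is retained automatically, */
   /* so no RETAIN statement is needed for it.                        */
   Total + Amount;
   datalines;
25
25
50
;
run;

proc print data=work.sum_demo noobs;
run;
```

Total accumulates from row to row (25, 50, 100). Writing Total = Total + Amount; instead would give a missing Total on every row, because an ordinary assignment target is reset to missing at the top of each iteration.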
79
When to Output?
   E:E01442
   D:Michael:C
   D:Lindsay:C      End Observation 1
   E:E00705
   D:Susan:S        End Observation 2
   E:E01577
   D:Richard:C      End Observation 3
   E:E00997         End Observation 4
   E:E00955
   D:Roberta:C      End Observation 5
   E:E00224
   D:John:S         End Observation 6
80
When SAS Loads a Type E Record
1. Output what is currently in the PDV (unless this is the first time through the DATA step).
2. Read the next employee’s identification number.
3. Reset Deduct to 0.
   if Type='E' then do;
      if _N_ > 1 then output;
      input ID $;
      Deduct=0;
   end;
NOTE: When _N_ = 1, the first record with Type='E' is being read, and no employee has been processed yet, so there is nothing to output.
81
When SAS Loads a Type D Record
1. Read the dependent’s name and relationship.
2. Check the relationship.
3. Increment Deduct appropriately.
   else do;
      input DepName $ Relation $;
      if Relation='C' then Deduct+25;
      else Deduct+50;
   end;
82
   data work.insurance(drop=Type DepName Relation);
      length Type $ 1 ID $ 6 DepName $ 20 Relation $ 1;
      retain ID;
      infile 'raw-data-file' dlm=':';
      input Type $ @;
      if Type='E' then do;
         if _N_ > 1 then output;
         input ID $;
         Deduct=0;
      end;
      else do;
         input DepName $ Relation $;
         if Relation='C' then Deduct+25;
         else Deduct+50;
      end;
   run;
83
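What About the Last Record?
   E:E01442
   D:Michael:C
   D:Lindsay:C
   E:E00705
   D:Susan:S
   E:E01577
   D:Richard:C
   E:E00997
   E:E00955
   D:Roberta:C
   E:E00224
   D:John:S
After the last record, D:John:S, no further E record arrives to trigger the OUTPUT statement, and there is no implicit output because the step contains an explicit OUTPUT statement. As written, the last employee is never written out.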
84
Recall: The END= Option in the INFILE Statement
• General form of the END= option:
   INFILE 'file-name' END=variable-name;
• where variable-name is any valid SAS variable name.
• The END= option creates a variable that has the value
   – 1 if it is the last record of the input file
   – 0 otherwise.
• Variables created with END= are automatically dropped.
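The sketch below shows END= on its own: it counts the records in a small file and writes a single observation when the last record has been read. The fileref demo, the temporary file, the data set work.end_demo, and the variable NumRecords are assumptions made for illustration. A temporary file is used rather than in-stream data because the SAS documentation lists END= as restricted with the DATALINES statement.

```sas
/* Sketch only: the fileref, file contents, and names are illustrative. */
filename demo temp;

/* Write three sample records to a temporary file. */
data _null_;
   file demo;
   put 'E:E01442';
   put 'D:Michael:C';
   put 'D:Lindsay:C';
run;

data work.end_demo(keep=NumRecords);
   infile demo end=LastRec;
   input;                     /* load the next record; no variables are read */
   NumRecords + 1;            /* sum statement counts the records            */
   if LastRec then output;    /* write one observation at end of file        */
run;

proc print data=work.end_demo noobs;
run;
```

LastRec does not appear in work.end_demo, which matches the point above that END= variables are not written to the output data set.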
85
   data work.insurance(drop=Type DepName Relation);
      length Type $ 1 ID $ 6 DepName $ 20 Relation $ 1;
      retain ID;
      infile 'raw-data-file' dlm=':' end=LastRec;
      input Type $ @;
      if Type='E' then do;
         if _N_ > 1 then output;
         input ID $;
         Deduct=0;
      end;
      else do;
         input DepName $ Relation $;
         if Relation='C' then Deduct+25;
         else Deduct+50;
      end;
      if LastRec then output;
   run;
86
RELATION
data work.insurance(drop=Type DepName Relation); length Type $ 1 ID $ 6 DepName $ 20 Relation $ 1; retain ID; infile 'raw-data-file'
dlm=':' end=LastRec; input Type $ @; if Type='E' then do; if _N_>1 then output; input ID $; Deduct=0; end; else do; input DepName $ Relation $; if Relation='C' then Deduct+25; else Deduct+50; end; if LastRec then output;run;
Compile
TYPE
RID DEPNAME DEDUCT
D_N_ LASTREC
D D R D D
...
Input Buffer
E:E01442D:Michael:CD:Lindsay:CE:E00705D:Susan:SE:E01577D:Richard:CE:E00997E:E00955 D:Roberta:CE:E00224D:John:S
87
Execute: iteration 1 (_N_=1)

Initial PDV: Deduct=0, _N_=1, LastRec=0; Type, ID, DepName, and Relation are missing.

Statement                        Result
input Type $ @;                  Input buffer: E:E01442. Type=E. Hold record.
if Type='E' then do;             True
if _N_ > 1 then output;          False
input ID $;                      ID=E01442
Deduct=0;                        Deduct=0
end;
if LastRec then output;          False
(end of step)                    Implicit return

PDV at the end of iteration 1: Type=E, ID=E01442, DepName and Relation missing, Deduct=0, _N_=1, LastRec=0.
97
Execute: iteration 2 (_N_=2)

Reinitialize PDV: Type, DepName, and Relation are reset to missing; ID=E01442 and Deduct=0 are retained; _N_=2, LastRec=0.

Statement                        Result
input Type $ @;                  Input buffer: D:Michael:C. Type=D. Hold record.
if Type='E' then do;             False
input DepName $ Relation $;      DepName=Michael, Relation=C
if Relation='C' then Deduct+25;  True. Deduct = 0 + 25 = 25
if LastRec then output;          False
(end of step)                    Implicit return

PDV at the end of iteration 2: Type=D, ID=E01442, DepName=Michael, Relation=C, Deduct=25, _N_=2, LastRec=0.
106
Execute: iteration 3 (_N_=3)

Reinitialize PDV: Type, DepName, and Relation are reset to missing; ID=E01442 and Deduct=25 are retained; _N_=3, LastRec=0.

Statement                        Result
input Type $ @;                  Input buffer: D:Lindsay:C. Type=D. Hold record.
if Type='E' then do;             False
input DepName $ Relation $;      DepName=Lindsay, Relation=C
if Relation='C' then Deduct+25;  True. Deduct = 25 + 25 = 50
if LastRec then output;          False
(end of step)                    Implicit return

PDV at the end of iteration 3: Type=D, ID=E01442, DepName=Lindsay, Relation=C, Deduct=50, _N_=3, LastRec=0.
115
Execute: iteration 4 (_N_=4)

Reinitialize PDV: Type, DepName, and Relation are reset to missing; ID=E01442 and Deduct=50 are retained; _N_=4, LastRec=0.

Statement                        Result
input Type $ @;                  Input buffer: E:E00705. Type=E. Hold record.
if Type='E' then do;             True
if _N_ > 1 then output;          True. Explicit output: write out observation to insurance.

PDV when the observation is written: Type=E, ID=E01442, DepName and Relation missing, Deduct=50, _N_=4, LastRec=0.
122
Execute: last record (_N_=12)

The remaining records are processed in the same way. When the last record is read:

Input buffer: D:John:S

PDV: Type=D, ID=E00224, DepName=John, Relation=S, Deduct=50, _N_=12, LastRec=1.

Because LastRec=1, the statement if LastRec then output; is true, so the observation for E00224 is written before the implicit return.
123
Creating One Observation Per Header

proc print data=insurance noobs;
run;

PROC PRINT Output

ID        Deduct
E01442      50
E00705      50
E01577      25
E00997       0
E00955      25
E00224      50
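To reproduce this output without the course data file, the sketch below first writes the twelve sample records to a temporary file and then runs the same DATA step against it. The fileref rawins is an assumption that stands in for 'raw-data-file'; everything else follows the program shown on the slides.

```sas
/* Illustration only: the fileref rawins stands in for 'raw-data-file'. */
filename rawins temp;

data _null_;
   file rawins;
   put 'E:E01442';
   put 'D:Michael:C';
   put 'D:Lindsay:C';
   put 'E:E00705';
   put 'D:Susan:S';
   put 'E:E01577';
   put 'D:Richard:C';
   put 'E:E00997';
   put 'E:E00955';
   put 'D:Roberta:C';
   put 'E:E00224';
   put 'D:John:S';
run;

data work.insurance(drop=Type DepName Relation);
   length Type $ 1 ID $ 6 DepName $ 20 Relation $ 1;
   retain ID;
   infile rawins dlm=':' end=LastRec;
   input Type $ @;                /* read the type and hold the record */
   if Type='E' then do;
      if _N_ > 1 then output;     /* write the previous employee */
      input ID $;
      Deduct=0;                   /* reset the total for the new employee */
   end;
   else do;
      input DepName $ Relation $;
      if Relation='C' then Deduct+25;
      else Deduct+50;
   end;
   if LastRec then output;        /* write the last employee */
run;

proc print data=work.insurance noobs;
run;
```

Removing the if LastRec then output; statement and rerunning is a quick way to see the problem raised earlier: the data set should then contain five observations, with E00224 missing.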
Exercise 1
Open program c21_1. Check the data structure carefully, and go through each program statement to make sure you know why it is needed. Then run the program to see how hierarchical data is read.